In [None]:
import matplotlib.gridspec as gs
import matplotlib.pyplot as pp
import pandas as pd
import pingouin as pg
import seaborn as sb

from pandas.api.types import CategoricalDtype

In [None]:
def format_p_val(p):
    """Given a p-value, format it as a readable string.

    Examples:
        format_p_val(0.43242323) -> "p = 0.432"
        format_p_val(0.000001) -> "p < 0.001"
    """
    if p < 0.001:
        return "p < 0.001"
    # Otherwise report the value to three decimal places
    return "p = {:.3f}".format(p)

First let's load the dataset and set the columns to the appropriate type.

The key point is to set the categorical/ordinal columns using the `astype` method and the `CategoricalDtype` class, which lets us state the levels and their order explicitly.

## Loading the data

In [None]:
# Read the data
file_name = "../data/mouse_linear_regression.csv"
df = pd.read_csv(file_name)
# Set relevant variables to categorical/ordinal with appropriate levels and order
df["Genotype"] = df["Genotype"].astype(
    CategoricalDtype(categories=["WT", "-/-"], ordered=True)
)
df["Sex"] = df["Sex"].astype(CategoricalDtype(categories=["F", "M"], ordered=False))
df["Treatment group"] = df["Treatment group"].astype(
    CategoricalDtype(categories=["Control", "TX", "TX2"], ordered=True)
)
# Take a look at the data
df.head()
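
To see what the conversion does, here is a minimal sketch on a made-up toy column (the values, including the stray `"unknown"` entry, are invented for illustration and are not from the mouse dataset). It shows that the levels are stored as integer codes in the order we declared, that sorting follows that order rather than alphabetical order, and that any value not listed in `categories` becomes missing.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Toy data (hypothetical): one value is not a declared level
toy = pd.DataFrame(
    {"Treatment group": ["TX2", "Control", "TX", "Control", "unknown"]}
)

# Declare the levels and their order explicitly
dtype = CategoricalDtype(categories=["Control", "TX", "TX2"], ordered=True)
toy["Treatment group"] = toy["Treatment group"].astype(dtype)

# Integer codes follow the declared order; -1 marks a missing value
codes = toy["Treatment group"].cat.codes.tolist()

# Values outside the declared categories are converted to NaN
n_missing = int(toy["Treatment group"].isna().sum())

# Sorting respects the declared order, with NaN placed last
sorted_levels = toy["Treatment group"].sort_values().tolist()

print(codes)
print(n_missing)
print(sorted_levels)
```

It is worth checking `isna().sum()` after a conversion like this, since a misspelled level in the CSV would otherwise be dropped silently.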

## Exploratory data analysis

Now let's take a quick look at the data.
Here we use the seaborn [`catplot`](https://seaborn.pydata.org/generated/seaborn.catplot.html) function which allows us to create panelled plots for categorical data.
Various types of plot are supported through the `kind` argument.

In [None]:
sb.catplot(
    df,
    kind="swarm",
    col="Sex",
    row="Genotype",
    x="Treatment group",
    y="Weight",
    color="k",
    aspect=1,  # Width-to-height ratio of each panel
    height=4,  # Height of each panel in inches
)

The `catplot` function is useful for quickly getting a look at the data, but lacks flexibility if we want to combine different types of plots.
Previously we saw how we could combine the `pointplot` and `swarmplot` functions to plot data with error bars.
One way to build such combined plots in a panelled layout is to use the [plotnine](https://plotnine.org) package, which attempts to mimic R's ggplot2 library.
I prefer to use [matplotlib](https://matplotlib.org) along with seaborn.
It is a steeper learning curve, but ultimately gives a lot more control.

The key idea below is to use the `GridSpec` class, which allows us to create a grid of `Axes` objects.
`GridSpec` only lays out the positions of the boxes for the `Axes`. To create them, we first create a `Figure` object and then create `Axes` in the relevant locations using `add_subplot`.

> Note: The `tight_layout` method is really useful for getting the `Axes` to be nicely spaced.
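
Stripped of the data, the `GridSpec` pattern looks roughly like the sketch below. The 2 x 2 grid size and the panel titles are placeholders chosen for illustration, and the `matplotlib.use("Agg")` line is only there so the snippet also runs outside a notebook; inside Jupyter it can be dropped.

```python
import matplotlib

matplotlib.use("Agg")  # Non-interactive backend; not needed inside a notebook

import matplotlib.gridspec as gs
import matplotlib.pyplot as pp

# GridSpec only describes where the boxes go
grid = gs.GridSpec(nrows=2, ncols=2)
# The Figure is the canvas that will hold the Axes
fig = pp.figure(figsize=(6, 6))

axes = {}
for i in range(2):
    for j in range(2):
        # Indexing the grid picks a cell; add_subplot creates the Axes there
        ax = fig.add_subplot(grid[i, j])
        ax.set_title("row {} | col {}".format(i, j))
        axes[(i, j)] = ax

# Adjust the spacing so titles and labels do not overlap
grid.tight_layout(fig)
```

Because each `Axes` is created explicitly, any plotting function that accepts an `ax` argument can draw into a chosen panel.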

In [None]:
# Start by defining which columns we need for the plot
col = "Sex"
row = "Genotype"
x = "Treatment group"
y = "Weight"

In [None]:
# Now we need to find out how many rows and columns we need.
# To do so we just ask how many unique values are present in the relevant columns of our dataset.
ncols = df[col].nunique()
nrows = df[row].nunique()

In [None]:
# Create the grid and do our plotting
grid = gs.GridSpec(nrows=nrows, ncols=ncols)
# Create a figure to plot to
fig = pp.figure(figsize=(8, 8))
# Now we loop through the different row and column values to create our plots
for i, row_val in enumerate(df[row].unique()):
    for j, col_val in enumerate(df[col].unique()):
        # Create an Axes object for plotting
        ax = fig.add_subplot(grid[i, j])
        # Subset our DataFrame to the relevant entries
        plot_df = df[(df[row] == row_val) & (df[col] == col_val)]
        # Now we can plot using the code from worksheet 2
        sb.swarmplot(
            plot_df,
            ax=ax,
            x=x,
            y=y,
            color="k",
        )
        sb.pointplot(
            plot_df,
            ax=ax,
            x=x,
            y=y,
            capsize=0.4,
            color="k",
            estimator="median",
            errorbar=("pi", 50),
            linestyle="none",
        )
        ax.set_xlabel(x)
        ax.set_ylabel(y)
        # Build the title from the row/col variables so it stays correct if they change
        ax.set_title("{0} = {1} | {2} = {3}".format(row, row_val, col, col_val))
# Usually a good idea to do tight_layout
grid.tight_layout(fig)

### Testing difference of means

Our EDA has revealed that genotype -/- seems to have an interaction with the treatment, whereas the WT genotype doesn't seem to be impacted by treatment.
To start with let's split the data by genotype and test the difference of means.

In [None]:
# Subset by genotype
df_geno = df[df["Genotype"] == "WT"]
# Create one group for control
x = df_geno.loc[df_geno["Treatment group"] == "Control", "Weight"]
# Create another group for single treatment
y = df_geno.loc[df_geno["Treatment group"] == "TX", "Weight"]
# Use pingouin to do a t test
pg.ttest(x, y)

As expected, no effect for the WT genotype.
Let's try for the knockout genotype.

In [None]:
# Subset by genotype
df_geno = df[df["Genotype"] == "-/-"]
# Create one group for control
x = df_geno.loc[df_geno["Treatment group"] == "Control", "Weight"]
# Create another group for single treatment
y = df_geno.loc[df_geno["Treatment group"] == "TX", "Weight"]
# Use pingouin to do a t test
pg.ttest(x, y)

Now we see an effect.
We should conduct our assumptions check on the data.
Here we will use the Shapiro-Wilk test for normality and Levene's test for equal variance of the populations.

In [None]:
# Shapiro-Wilk
pg.normality(
    data=df_geno,
    dv="Weight",
    group="Treatment group"
)

In [None]:
# Levene's test
pg.homoscedasticity(
    data=df_geno,
    dv="Weight",
    group="Treatment group"
)
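
For comparison, the same two checks can be sketched with `scipy.stats` directly. This is an illustration on randomly generated toy weights (the group means, spread, and sample sizes are invented), not a description of how pingouin works internally. In both tests a small p-value is evidence against the assumption.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Toy data (hypothetical): 10 normally distributed weights per group
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    {
        "Treatment group": np.repeat(["Control", "TX", "TX2"], 10),
        "Weight": np.concatenate(
            [rng.normal(25, 2, 10), rng.normal(27, 2, 10), rng.normal(30, 2, 10)]
        ),
    }
)

# Shapiro-Wilk is run separately within each group
shapiro_p = {}
for name, group in toy.groupby("Treatment group"):
    w_stat, p_val = stats.shapiro(group["Weight"])
    shapiro_p[name] = p_val

# Levene's test compares the variances of all groups at once
samples = [group["Weight"].to_numpy() for _, group in toy.groupby("Treatment group")]
levene_stat, levene_p = stats.levene(*samples)

print(shapiro_p)
print(levene_p)
```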

Normality and equal variance hold.
So far this looks like a lot more work than JASP.
The benefit comes if we want to do a few different comparisons.
We can reuse the previous code but put it inside a `for` loop.

In [None]:
# We will split by the genotypes
genos = ["-/-", "WT"]
# We will save our results in a list
results = []
# Now we loop over the genotypes
for g in genos:
    # Do the t test
    df_g = pg.pairwise_tests(
        data=df[df["Genotype"] == g],
        dv="Weight",
        between="Treatment group"
    )
    # Add the information about which genotype we looked at to the data frame
    df_g.insert(0, "Genotype", g)
    # Add the genotype results to our results list
    results.append(df_g)
# Turn our results into a dataframe by concatenating them
results = pd.concat(results)
results

Now we should also be doing multiple test correction on the p-values.
The `multicomp` function from `pingouin` provides this functionality.
Below I will use Bonferroni to adjust the p-values and Benjamini-Hochberg to control the false discovery rate.

In [None]:
pg.multicomp(results["p-unc"], method="bonferroni")

In [None]:
pg.multicomp(results["p-unc"], method="fdr_bh")

The `multicomp` function returns two arrays:

1. Whether each corrected test rejects the null hypothesis at a given significance level (`alpha`, 0.05 by default).
2. The adjusted p-values.

If we just want the p-values we can select them using the indexing notation `[1]` since they are the second element. 
> Recall that Python uses zero-based indexing i.e. starts counting from zero.
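
To make the two corrections less of a black box, here is a minimal NumPy sketch of the textbook Bonferroni and Benjamini-Hochberg adjustments. The function names and the four p-values are made up for illustration, and this is not pingouin's own code. Each function returns a `(reject, adjusted)` pair, so the `[1]` indexing works the same way.

```python
import numpy as np


def bonferroni(pvals, alpha=0.05):
    """Multiply each p-value by the number of tests, capped at 1."""
    p = np.asarray(pvals, dtype=float)
    adjusted = np.minimum(p * p.size, 1.0)
    return adjusted < alpha, adjusted


def benjamini_hochberg(pvals, alpha=0.05):
    """Scale the i-th smallest p-value by m / i, then enforce monotonicity."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest p-value down keeps the order consistent
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted < alpha, adjusted


# Hypothetical unadjusted p-values
p_unc = [0.01, 0.04, 0.03, 0.20]

# Index [1] picks the second element of the returned pair
p_bon = bonferroni(p_unc)[1]
# Tuple unpacking is an alternative to indexing
reject_bh, p_bh = benjamini_hochberg(p_unc)

print(p_bon)
print(p_bh)
```

Bonferroni is the more conservative of the two, which is why its adjusted values are never smaller than the Benjamini-Hochberg ones.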

In [None]:
results["p_bon"] = pg.multicomp(results["p-unc"], method="bonferroni")[1]
results["fdr"] = pg.multicomp(results["p-unc"], method="fdr_bh")[1]
results

If the normality tests had failed, we could perform the Mann-Whitney U test instead.

> Caveat: scipy does not seem to support CIs for Mann-Whitney.

In [None]:
df_geno = df[df["Genotype"] == "-/-"]
# Mann Whitney
pg.pairwise_tests(
    data=df_geno,
    dv="Weight",
    between="Treatment group",
    parametric=False
)
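
As a rough intuition for what the Mann-Whitney U statistic measures, the sketch below counts, over every pair of observations, how often a value from the first group exceeds a value from the second, and compares that count with `scipy.stats.mannwhitneyu`. The two small samples are invented and contain no ties.

```python
import numpy as np
from scipy import stats

# Toy samples (hypothetical), with no tied values
x = np.array([1.0, 2.5, 4.0])
y = np.array([2.0, 3.0, 5.0, 6.0])

# U for x: number of (x_i, y_j) pairs in which x_i is larger than y_j
u_manual = int(sum(xi > yj for xi in x for yj in y))

# scipy reports the same statistic along with a p-value
u_scipy, p_val = stats.mannwhitneyu(x, y, alternative="two-sided")

print(u_manual, u_scipy, p_val)
```

Because the test only uses the ordering of the values, it does not rely on the weights being normally distributed.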

If the equal variance test fails we use Welch's t test.

In [None]:
df_geno = df[df["Genotype"] == "-/-"]
# Welch's t test (correction=True drops the equal variance assumption)
pg.pairwise_tests(
    data=df_geno,
    dv="Weight",
    between="Treatment group",
    correction=True
)
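
The difference between the two t tests comes down to how the standard error is computed. The sketch below uses `scipy.stats.ttest_ind` on invented samples with clearly unequal spread, and recomputes the Welch statistic by hand from the per-group variances.

```python
import numpy as np
from scipy import stats

# Toy samples (hypothetical): similar means, very different spread
control = np.array([24.1, 25.3, 24.8, 25.0, 24.6, 25.2])
treated = np.array([27.9, 22.4, 31.0, 25.5, 29.8])

# Student's t test pools the variances; Welch's test keeps them separate
student = stats.ttest_ind(control, treated, equal_var=True)
welch = stats.ttest_ind(control, treated, equal_var=False)

# Welch standard error: each group contributes its own variance over its own n
se = np.sqrt(
    control.var(ddof=1) / control.size + treated.var(ddof=1) / treated.size
)
t_manual = (control.mean() - treated.mean()) / se

print(student.statistic, student.pvalue)
print(welch.statistic, welch.pvalue)
print(t_manual)
```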

## One way ANOVA

Since treatment group has three levels (Control, TX, TX2), comparing all of them requires multiple pairwise t tests.

An alternative approach is to use an ANOVA as an omnibus test, followed by post-hoc pairwise tests.

In [None]:
# We'll subset by genotype here
df_geno = df[df["Genotype"] == "-/-"]
# Run a one way ANOVA
pg.anova(
    data=df_geno,
    dv="Weight",
    between="Treatment group",
)
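
The F statistic reported by a one way ANOVA is the ratio of between-group to within-group variability. As a minimal worked sketch, the code below computes it by hand on three invented groups of four weights each and checks the result against `scipy.stats.f_oneway`; the numbers are made up and unrelated to the mouse data.

```python
import numpy as np
from scipy import stats

# Toy groups (hypothetical weights)
groups = [
    np.array([20.1, 21.3, 19.8, 20.7]),
    np.array([22.4, 23.0, 21.9, 22.8]),
    np.array([25.2, 24.7, 26.1, 25.5]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Between-group sum of squares: how far each group mean is from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of the values around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F is the ratio of the two mean squares
f_manual = (ss_between / df_between) / (ss_within / df_within)
# The p-value is the upper tail of the F distribution
p_manual = stats.f.sf(f_manual, df_between, df_within)

f_scipy, p_scipy = stats.f_oneway(*groups)

print(f_manual, p_manual)
print(f_scipy, p_scipy)
```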

You can use the same functions as we did in the t test section to check assumptions.

Since the omnibus test was significant we can run a post-hoc test.
We will use the Tukey test.

In [None]:
pg.pairwise_tukey(
    data=df_geno,
    dv="Weight",
    between="Treatment group",
)

The `pairwise_tukey` function returns Tukey-HSD corrected p-values.
Based on these we note a significant difference between Control and TX2, but not Control and TX.

## Two way ANOVA

So far we have been subsetting by genotype.
We can unify the entire analysis by doing a two way ANOVA.

In [None]:
pg.anova(
    data=df,
    dv="Weight",
    between=["Genotype", "Treatment group"],
)

We see a significant interaction effect.
We can use post-hoc tests to see which specific combinations of factor levels differ.
However, pingouin does not currently support this.
There is a simple fix, which is to create a dummy column containing the combination of the factors in the interaction.
We will do this for Genotype and Treatment group.

In [None]:
df["interaction"] = df["Genotype"].astype(str) + ":" + df["Treatment group"].astype(str)
df.iloc[::5]

Now we can do the post-hoc test on the interaction.

In [None]:
pg.pairwise_tukey(
    data=df,
    dv="Weight",
    between="interaction",
)

We can also include Sex.

In [None]:
pg.anova(
    data=df,
    dv="Weight",
    between=["Genotype", "Sex", "Treatment group"],
).round(3)

The results suggest that sex on its own has a significant main effect.
This makes sense because the data was generated to reflect a baseline difference in weight between male and female mice.
However, there are no significant interactions between sex and other variables.