In [None]:
from pathlib import Path
from scipy import stats
import pandas as pd
from statsmodels.stats.weightstats import ttest_ind
base_path = Path().cwd().parent/"data/stats"

# Statistics
Statistics is one of those subjects where I feel the classes need to be updated for the modern world. Most of the statistics you do will be run on a computer, yet in the stats class I took we never touched one. It is important to know some of the math under the hood of statistical models, but we have limited time and cannot become experts in everything. Very smart people have created statistical tools that we should use, and programming lets you see more of the inner workings of statistical tests. This tutorial will not go deep into the math but will show you best practices for using Python (this chapter) and R (the next chapter) to run statistics. The Python chapter covers more of the whys, so the R chapter will mostly be programming examples. The R chapter will also cover more mixed model statistics since the R ecosystem for those is much better. Lastly, we will cover only frequentist methods, not Bayesian ones.

## Basic terminology
Before we start covering stats we will cover some basic terminology.<br>
**Mean**: The mean is the central tendency of your data. Many statistical tests that use the normal distribution under the hood are comparing the mean.<br>
**Variance (Var)**: The spread of your data around the mean. It is just the standard deviation squared, so it is not in the units of your data but in $units^2$.<br>
**Standard deviation (STD or STDEV)**: The spread of your data around the mean. The standard deviation is very important for statistics because most parametric statistical models assume the standard deviation of both groups is the same.<br>
**Standard error of the mean (SEM)**: The precision of your estimated mean. SEM decreases as n increases, but not linearly (it scales with $1/\sqrt{n}$). Many people plot $mean \pm SEM$ because it looks nice. SEM is also indirectly related to statistical tests, since both the SEM and p-values depend on sample size.<br>
**Homogeneity of variances (homoscedastic)**: When two or more groups have the same variance. Homogeneity of variances is one of the assumptions of many parametric statistics.<br>
**Heterogeneity of variances (heteroscedastic)**: When two or more groups have different variances. Heteroscedasticity can lead to incorrect results from hypothesis tests. Sometimes heteroscedasticity can be corrected by transforming your data; other times you will need specialized statistical tests.<br>
**Model**: A model is essentially anything you can put your data into and get something out of that helps you describe your data. All statistical tests model your data. Models make assumptions about your data, so you need to make sure your data fits those assumptions.<br>
**Within-subject (repeated measures)**: Some sort of measure that is collected from a subject two or more times. This can be cells within a mouse, mice within a litter, or students within a school within a district within a state. Typical repeated measures are things like baseline, drug, drug+drug in the same cell or timepoints in the same mouse. Mixed models and repeated measures regression/ANOVA are two ways to control for within-subject effects; however, these models have different assumptions. Within-subject effects are often ignored in neuroscience and biology. Ignoring them can give misleading p-values, for example by treating many cells from a few mice as if they were independent samples, because the test can no longer separate within-mouse from between-mouse differences. Within-subject tests often require larger sample sizes for better estimates.<br>
**Between-subject**: Between-subject factors are things like genotype, sex or age (if you have restricted age groups like adult, child, infant). Between-subject tests are the most common in basic neuroscience and biology.<br>
**Test statistic**: This is a value calculated from the test that can be used for hypothesis testing. You will see $F$, $t$, $z$ and $\chi^2$ test statistics. Test statistics also have associated degrees of freedom that are used to look up the resulting p-value. $F$ and $t$ values are in some cases interchangeable, such as for simple regression or for a two-group categorical predictor and the t-test (where $F = t^2$).<br>
**Predictor or independent variable**: These are the "x" values in your data. Many tests are essentially solving for coefficients that show how much these predictor variables affect the outcome or dependent variable.<br>
**Outcome or dependent variable**: These are the "y" values in your data. These depend on your "x" values or are the outcome of your "x" values.<br>
**Parametric**: We covered this in the distributions chapter but parametric just means you have distributions that have parameters to describe them such as mean and STD. Parametric statistics are most of what we will cover.<br>
**Non-parametric**: These statistical tests are often rank-based. They essentially rank the values from, say, two groups and compare how many values in one group are larger than those in the other group. Non-parametric statistics do not assume your data follows a known distribution but do assume that each group follows the same distribution.<br>
**Multiple comparisons**: This is when you compare one group with many other groups. The p-values need to be corrected for the number of comparisons made. Multiple comparisons are usually run as post hoc tests for ANOVA or on RNA expression data. Multiple comparisons are not a substitute for ANOVA, a mistake I have seen too many times.<br>
**Bootstrapping**: Resampling from a set of data over and over again. You can resample with replacement (a value can be drawn multiple times) or without replacement (a value cannot be drawn more than once). Bootstrapping can be used to construct non-parametric statistics. Bootstrapping assumes your data is independent and identically distributed.<br>
**Independent and identically distributed**: Independent means that each data point was not influenced by or related to any other data point in your dataset. Knowledge of one value does not give you information about another value. When this assumption is violated you use within-subject tests. Identically distributed means that all datasets are drawn from the same probability distribution. In a way, many statistical tests with categorical independent variables (predictors) are testing whether "identically distributed" is false: they test whether the means are different under the assumption that the variance is the same. Other tests, such as Levene's test, check whether the variance is different.<br>
**Null hypothesis**: Baseline assumption that there is no difference in a parameter between different groups. Usually the parameter is the mean for normally distributed data but could be the variance for something like Levene's test.<br>
**Hypothesis testing**: Testing to determine whether the data provided are sufficient to reject a hypothesis. Most statistical tests we run are used to determine whether we can reject the null hypothesis, usually the hypothesis that there is no difference between groups. Typically your hypothesis does not tell you the direction or magnitude of the difference.<br>
**Significance testing**: Generating a number, usually the p-value, that we use to determine whether we reject the null hypothesis. Random fact, Karl Pearson and Ronald Fisher are thought to have developed significance testing due to their interest in eugenics and testing for differences between different populations of people to justify their views.<br>
**Sample size**: The number of samples or n in your dataset.<br>
**Effect size**: A number that tells you how large your difference or effect is. Examples include $r^2$, Cohen's d, eta/partial-eta squared, omega/partial-omega squared and the odds ratio. You can also use confidence intervals to judge the size of an effect, though they are not formally considered an effect size.
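
To make the first few terms concrete, here is a minimal sketch that computes the mean, variance, standard deviation and SEM with NumPy. The numbers in `values` are made up for illustration and are not part of this tutorial's datasets.

```python
import numpy as np

# Made-up measurements for illustration only.
values = np.array([4.0, 6.0, 8.0, 10.0, 12.0])

n = values.size
mean = values.mean()
# ddof=1 gives the sample (n - 1) estimate used by most statistical tests.
variance = values.var(ddof=1)
std = values.std(ddof=1)
# SEM shrinks with the square root of n, not linearly with n.
sem = std / np.sqrt(n)

print(f"n={n}, mean={mean:.2f}, var={variance:.2f}, std={std:.3f}, sem={sem:.3f}")
```

Note that `variance` equals `std ** 2`, which is why the variance ends up in squared units.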

## Interpreting the output of your results
If you are running parametric tests, there are many ways to assess your statistical test other than just the p-value. The p-value depends on the number of samples: the more samples you have, the more likely you are to get a significant p-value, even when the difference is small. So the next thing you need to do is find some other number to assess the importance of your significant p-value. You can use effect sizes, but those can be biased. One of the best things to use is the confidence interval of the difference. This can be calculated from the standard errors in the ANOVA or by bootstrapping. For mixed model ANOVA it is hard to calculate due to the within and between parameters.
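
As a rough illustration of the bootstrapping route, the sketch below builds a percentile bootstrap confidence interval for the difference between two group means. The helper name `bootstrap_diff_ci`, the simulated groups and the choice of 5000 resamples are assumptions made for this example, not a prescribed method.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated groups, for illustration only.
group_a = rng.normal(loc=10.0, scale=2.0, size=30)
group_b = rng.normal(loc=12.0, scale=2.0, size=30)


def bootstrap_diff_ci(a, b, n_boot=5000, alpha=0.05, rng=None):
    """Percentile bootstrap CI for mean(a) - mean(b) (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choice(a, size=a.size, replace=True)
        resample_b = rng.choice(b, size=b.size, replace=True)
        diffs[i] = resample_a.mean() - resample_b.mean()
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return lower, upper


observed_diff = group_a.mean() - group_b.mean()
ci_low, ci_high = bootstrap_diff_ci(group_a, group_b, rng=rng)
print(f"difference = {observed_diff:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

If the interval excludes zero and is narrow, you have both a direction and a plausible range for the size of the difference, which a p-value alone does not give you.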

## The t-test
The t-test is perhaps one of the simplest statistical tests you can run. Usually in electrophysiology we are comparing two different (independent) groups, comparing two different timepoints or drug treatments within a group (paired, not independent), or testing whether a single group is different from a predefined mean (I have never seen this used, but t-test tutorials always seem to start with it). You can run a t-test using Scipy or Statsmodels in Python. I always recommend getting the confidence interval of the difference, which tells you how large your difference is. Since the p-value depends on sample size, this helps you interpret the p-value. If you are running a t-test between two different groups there are several things to consider.
1. Are your data normally distributed? Yes, go to 2.
2. Are your two groups independent? Yes, use z-test, t-test or Welch's t-test. No, use paired t-test.
3. Is the variance of your two groups the same? Yes, use standard t-test. No, use Welch's t-test.
4. Are your sample sizes unequal? Use Welch's t-test.
5. How large are your sample sizes? If larger than ~30, use the z-test.
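
The checklist above can be sketched as a small helper. This is only an illustration under some assumptions: the function name `choose_and_run_ttest` is hypothetical, Levene's test with a 0.05 cutoff stands in for the equal-variance question, the data are simulated, and the normality check (step 1) and the z-test (step 5) are left out to keep it short.

```python
import numpy as np
from scipy import stats


def choose_and_run_ttest(a, b, paired=False, alpha=0.05):
    """Hypothetical helper that mirrors steps 2-4 of the checklist."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if paired:
        # Non-independent groups: paired t-test.
        return "paired", stats.ttest_rel(a, b)
    # Levene's test: a small p-value suggests the variances differ.
    unequal_var = stats.levene(a, b).pvalue < alpha
    if unequal_var or a.size != b.size:
        return "welch", stats.ttest_ind(a, b, equal_var=False)
    return "student", stats.ttest_ind(a, b, equal_var=True)


rng = np.random.default_rng(1)
# Simulated groups with different sizes and spreads, for illustration only.
group_one = rng.normal(0.0, 1.0, size=20)
group_two = rng.normal(0.5, 3.0, size=35)

name, result = choose_and_run_ttest(group_one, group_two)
print(name, result.statistic, result.pvalue)
```

In practice you would decide which test to run from your experimental design before looking at the data; the helper is just a compact way to see how the questions map onto Scipy calls.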

### The output
You will get the test statistic, a p-value and the degrees of freedom. You can also get confidence intervals. For simple tests like t-tests I prefer to get bootstrapped confidence intervals.

### Scipy vs Statsmodels
I prefer Scipy for the t-tests but, I will include the code for Statsmodels as well. 

### Two-sample t-test with equal variances

#### Scipy

In [None]:
data = pd.read_csv(base_path/"ttest.csv", header=None)
one = data.loc[data[1] == "one", 0]
two = data.loc[data[1] == "two", 0]
output = stats.ttest_ind(one, two)
print(output)
print(output.confidence_interval())

#### Statsmodels

In [None]:
one = data.loc[data[1] == "one", 0]
two = data.loc[data[1] == "two", 0]
output = ttest_ind(one, two, alternative='two-sided', usevar='pooled')
print(output)

### Comparing two-sample t-test with unequal variances with and without Welch's correction
You will see that the p-value and the degrees of freedom (df) change when Welch's correction is applied; the uncorrected df is never smaller than the Welch df, so without the correction the df is inflated. One thing to note is that Welch's correction will converge towards a standard t-test if the variances and sample sizes are equal.
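
To see where the corrected degrees of freedom come from, here is a minimal sketch that computes Welch's t statistic and the Welch-Satterthwaite df by hand and compares them with Scipy. The two groups are simulated for this example and are not the data in `ttest_welch.csv`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Simulated groups with unequal sizes and variances, for illustration only.
a = rng.normal(0.0, 1.0, size=12)
b = rng.normal(1.0, 4.0, size=25)

# Variance of each group's mean (s^2 / n).
va = a.var(ddof=1) / a.size
vb = b.var(ddof=1) / b.size

t_welch = (a.mean() - b.mean()) / np.sqrt(va + vb)
# Welch-Satterthwaite approximation of the degrees of freedom.
df_welch = (va + vb) ** 2 / (va**2 / (a.size - 1) + vb**2 / (b.size - 1))
p_welch = 2 * stats.t.sf(abs(t_welch), df_welch)

# The standard (pooled) t-test always uses n1 + n2 - 2.
df_pooled = a.size + b.size - 2
scipy_welch = stats.ttest_ind(a, b, equal_var=False)

print(f"Welch df = {df_welch:.2f}, pooled df = {df_pooled}")
print(f"manual t = {t_welch:.4f}, scipy t = {scipy_welch.statistic:.4f}")
```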

#### Scipy

In [None]:
data = pd.read_csv(base_path/"ttest_welch.csv", header=None)
one = data.loc[data[1] == "one", 0]
two = data.loc[data[1] == "two", 0]
output = stats.ttest_ind(one, two, equal_var=True)
output_w = stats.ttest_ind(one, two, equal_var=False)
print(output, output_w, sep="\n")

#### Statsmodels

In [None]:
one = data.loc[data[1] == "one", 0]
two = data.loc[data[1] == "two", 0]
# Statsmodels only accepts usevar='pooled' (standard) or 'unequal' (Welch)
output = ttest_ind(one, two, alternative='two-sided', usevar='pooled')
output_w = ttest_ind(one, two, alternative='two-sided', usevar='unequal')
print(output, output_w, sep="\n")

### One tailed t-test
One tailed t-tests are used when you have a specific hypothesis about the direction of your effect. This needs to be chosen before you collect your data. An example of when this would be used is if you have a treatment that "needs" to be more effective than a previous treatment, or if you are replicating data and already have a hypothesis about the direction of your effect. The direction of your effect is determined by which group you have as one and two. If you expect one to be larger than two (positive difference between the groups) you want larger/greater. If you expect one to be smaller than two (negative difference between the groups) then you want less/smaller.
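
Here is a small sketch of how the direction is set in Scipy, using simulated groups rather than the tutorial's data. Scipy calls the alternatives `"greater"` and `"less"`; Statsmodels' `ttest_ind` uses `"larger"` and `"smaller"` for the same two options.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Simulated groups where "one" is expected to be larger than "two".
one = rng.normal(1.0, 1.0, size=25)
two = rng.normal(0.0, 1.0, size=25)

two_sided = stats.ttest_ind(one, two, alternative="two-sided")
# Alternative hypothesis: mean(one) > mean(two).
greater = stats.ttest_ind(one, two, alternative="greater")
# Alternative hypothesis: mean(one) < mean(two).
less = stats.ttest_ind(one, two, alternative="less")

print(f"two-sided p = {two_sided.pvalue:.4g}")
print(f"greater p   = {greater.pvalue:.4g}")
print(f"less p      = {less.pvalue:.4g}")
```

The test statistic is the same in all three cases; only the tail used for the p-value changes. The smaller of the two one-tailed p-values is half the two-sided p-value, which is exactly why the direction has to be fixed before you see the data.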

### Paired t-test
The paired t-test is used when you have a subject that has two measures and you want to test whether the two measures are different. Paired data (aka within-subject data) violates the independence assumption of a traditional t-test. A paired t-test also increases your statistical power by removing between-subject variance, since each subject essentially acts as its own control.
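
One way to see why pairing helps is that a paired t-test is the same as a one-sample t-test on the per-subject differences. The sketch below uses simulated baseline/treated values (an assumption for illustration, not the tutorial's data) where subjects differ a lot from each other but the treatment effect within each subject is consistent.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Simulated subjects: large between-subject spread, small consistent shift.
baseline = rng.normal(10.0, 3.0, size=15)
treated = baseline + rng.normal(1.0, 0.5, size=15)

paired = stats.ttest_rel(baseline, treated)
# Equivalent test: are the per-subject differences different from zero?
one_sample = stats.ttest_1samp(baseline - treated, popmean=0.0)
# Ignoring the pairing treats the two columns as independent groups.
unpaired = stats.ttest_ind(baseline, treated)

print(f"paired     t = {paired.statistic:.3f}, p = {paired.pvalue:.3g}")
print(f"one-sample t = {one_sample.statistic:.3f}, p = {one_sample.pvalue:.3g}")
print(f"unpaired   t = {unpaired.statistic:.3f}, p = {unpaired.pvalue:.3g}")
```

With data like these, the unpaired test is swamped by the between-subject variance while the paired test only sees the variance of the differences.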

#### Scipy

In [None]:
data = pd.read_csv(base_path/"ttest.csv", header=None)
one = data.loc[data[1] == "one", 0]
two = data.loc[data[1] == "two", 0]
output = stats.ttest_rel(one, two)
print(output)