# Choosing the Correct Statistical Test

References:<br>
http://www.ats.ucla.edu/stat/mult_pkg/whatstat/<br>
http://www-users.cs.umn.edu/~ludford/stat_guide.html



___

# Short answer

  &nbsp; | Categorical Dependent Variable | Continuous Dependent Variable
  ------------- | ------------- | -------------
  **Categorical<br>Independent Variable** | Chi square | t-test or ANOVA
  **Continuous<br>Independent Variable** | LDA or QDA | Regression
  
____

# Long answer

# 0 Independent variables
___

Comparing a dataset against a hypothesized value (e.g., average age of people is 30 years old).

##  1 continuous dependent variable with normal distribution
### One sample t-test

Null hypothesis is that the mean value of the dataset is equal to the test/hypothesized value ($\mu_0$).
$$ H_0: \mu = \mu_0 $$
$$ H_A: \mu < \mu_0 \ \text{ or } \ \mu > \mu_0 \ \text{ or } \ \mu \ne \mu_0 $$

The test determines if there is a statistically significant difference between $\mu$ and $\mu_0$. 

This requires calculating the sample mean ($\bar{x}$), the sample standard deviation ($s$), the degrees of freedom ($df$), the corresponding t-value, and the p-value, which is then compared with the predetermined significance level (most commonly $\alpha=0.05$).

If the p-value calculated from the t-distribution at the given t-value is smaller than the selected significance level (e.g., $p<0.05$), then the null hypothesis can be rejected in favor of the alternative hypothesis.

Important calculations:

$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \ , \quad df = n-1 \ , \quad t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} \ , \quad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}$$

Typically, $p$ is looked up in a table or computed with software, because it requires integrating the t-distribution to find the probability from the area under the curve.

If the alternative hypothesis includes $\ne$, then a ***two-tail test*** is used where the significance level is split at the two extremes of the probability density curve. Otherwise for $<$ or $>$, a ***one-tail test*** is used where the significance level is all at the left or right side of the curve, respectively.
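
As a rough illustration of how the tail choice changes the p-value, the sketch below evaluates all three alternatives for a single t-value. The helper name `t_pvalues` and the inputs `t_value=2.1`, `df=14` are made up for the example; `scipy.stats.t.cdf` and `scipy.stats.t.sf` give the area to the left and to the right of the t-value.

```python
from scipy import stats

def t_pvalues(t_value, df):
    """Return (two-tail, left-tail, right-tail) p-values for a t statistic."""
    p_left = stats.t.cdf(t_value, df)          # H_A: mu < mu_0 (area to the left)
    p_right = stats.t.sf(t_value, df)          # H_A: mu > mu_0 (area to the right)
    p_two = 2 * stats.t.sf(abs(t_value), df)   # H_A: mu != mu_0 (both extremes)
    return p_two, p_left, p_right

# Illustrative values only, not taken from any real dataset.
p_two, p_left, p_right = t_pvalues(t_value=2.1, df=14)
print(f"two-tail: {p_two:.4f}, left-tail: {p_left:.4f}, right-tail: {p_right:.4f}")
```

For a positive t-value the two-tail p-value is twice the right-tail one, so a result can be significant in a one-tail test and not in the two-tail test at the same $\alpha$.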

***This test is not appropriate if the number of data points is below 30 ($n<30$) and the data is not normally distributed.*** For larger values of $n$ the test is more robust to departures from normality.

In Python using scipy
```
scipy.stats.ttest_1samp(data, popmean, axis=0, nan_policy='propagate')
```
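
A minimal worked sketch, assuming a made-up sample of ages and the hypothesized mean of 30 from the example above. It computes $t$ and the two-tail $p$ directly from the formulas and checks them against `scipy.stats.ttest_1samp`; the array `ages` and the variable names are illustrative only.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of ages; mu_0 = 30 is the hypothesized mean.
ages = np.array([27.0, 31.5, 29.0, 35.2, 30.1, 26.4, 33.3, 28.8, 32.0, 29.7])
mu_0 = 30.0

n = ages.size
x_bar = ages.mean()
s = ages.std(ddof=1)                                  # sample std (n - 1 in the denominator)
t_manual = (x_bar - mu_0) / (s / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)    # two-tail p-value

# The same test through scipy (two-sided by default).
t_scipy, p_scipy = stats.ttest_1samp(ages, popmean=mu_0)

reject_null = p_scipy < 0.05
print(f"t = {t_scipy:.3f}, p = {p_scipy:.3f}, reject H_0: {reject_null}")
```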

## 1 continuous dependent variable without normal distribution

### One sample median test

This test is non-parametric (it does not assume the data follow a specific distribution), so it does not share the normality requirement of the t-test. Typically a ***one-sample Wilcoxon Signed Rank Test*** is used to compare the sample median with a hypothesized value.
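
One way this might look in Python is with `scipy.stats.wilcoxon`, which tests whether a set of differences is symmetric about zero; subtracting the hypothesized median first turns it into a one-sample test. The sample `data` and the hypothesized median `m_0` below are invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed sample and hypothesized median m_0.
data = np.array([1.3, 0.8, 3.9, 2.4, 7.5, 1.15, 0.6, 5.3, 2.2, 1.7, 9.8, 0.95])
m_0 = 1.0

# wilcoxon() works on differences from zero, so shift by the hypothesized median.
statistic, p_value = stats.wilcoxon(data - m_0, alternative='two-sided')

print(f"W = {statistic}, p = {p_value:.4f}")
```

The values were chosen so that no difference is zero and no two absolute differences are tied, which keeps the signed-rank computation straightforward.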

## 1 binary categorical dependent variable

### Binomial test

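For $n$ independent success/failure trials with $k$ observed successes, tests whether the true probability of success ($\pi$) equals a hypothesized value $p$. The returned p-value is the probability, under the null hypothesis, of observing a number of successes at least as extreme as $k$ (in both directions, greater than, or less than, depending on the alternative).
$$ H_0: \pi = p $$
$$ H_A: \pi \ne p \ \text{ or } \ \pi > p \ \text{ or } \ \pi < p $$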

In Python using scipy,
```
# binomtest replaces binom_test, which was removed in SciPy 1.12
scipy.stats.binomtest(k, n, p=0.5, alternative='two-sided')  # or 'greater', 'less'
```

For example, consider getting $k$ heads in $n$ coin flips. A fair coin has only two outcomes and each is equally likely ($p=0.5$ for heads). The binomial test determines whether observing $k$ heads is plausible under that assumption. If the null hypothesis is rejected at the pre-determined significance level, this can be interpreted as evidence that the coin is not fair.
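
The coin example could be sketched as follows, assuming a made-up result of 62 heads in 100 flips and using `scipy.stats.binomtest` (the current name of SciPy's binomial test, available from SciPy 1.7). The cross-check relies on the binomial distribution being symmetric when $p=0.5$.

```python
from scipy import stats

n_flips = 100   # hypothetical number of coin flips
k_heads = 62    # hypothetical number of heads observed

# p is the hypothesized probability of heads under the null hypothesis.
result = stats.binomtest(k_heads, n_flips, p=0.5, alternative='two-sided')
p_value = result.pvalue

# Cross-check: with p = 0.5 the distribution is symmetric, so the two-sided
# p-value is twice the upper-tail probability P(X >= k).
p_check = 2 * stats.binom.sf(k_heads - 1, n_flips, 0.5)

print(f"p = {p_value:.4f}, evidence of an unfair coin at alpha=0.05: {p_value < 0.05}")
```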

## 1 categorical dependent variable

### Chi square goodness of fit test (Pearson's)

Similar to the binomial test, but for more than two categories. This test will determine if the observed proportions of each category differ significantly from the hypothesized proportions.

In other words, it tests a null hypothesis stating that the frequency distribution of certain events observed in a sample is consistent with a particular theoretical distribution. The events (or categories) must be mutually exclusive and have total probability 1. 

A simple example is testing the outcome from a six-sided die to determine if the die is fair (all 6 outcomes equally likely to occur).

This assumes the observations are independent of one another (i.e., unpaired) and that the sample is large enough for the test statistic to be approximately chi-squared distributed.

Here the test statistic $\chi^2$ approximately follows the chi-squared distribution (cumulative distribution function shown, *from Wikipedia*):
<img src="https://upload.wikimedia.org/wikipedia/commons/8/8e/Chi-square_distributionCDF-English.png" width="350">

#### Procedure
1. Calculate the test statistic $\chi^2$, where $O_i$ and $E_i$ are the observed and expected counts in category $i$ and $n$ is the number of categories
$$ \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} $$
1. Determine the degrees of freedom $df = n-p$ where $n$ is the number of outcomes/categories and $p=s+1$ where $s$ is the number of fitted parameters of the distribution ($s=2$ for normal, mean and standard deviation; $s=0$ for discrete uniform, no parameters)
1. Select a significance level $\alpha$ ($\alpha = 0.05$ typically)
1. Compare the calculated $\chi^2$ with the critical value from the chi-squared distribution with the appropriate $df$ (one-sided, since the only check is whether $\chi^2$ exceeds the critical value)
1. Reject the null hypothesis that the observed frequency distribution is consistent with the theoretical distribution if the test statistic exceeds the critical value of $\chi^2$; otherwise fail to reject it. ***Alternatively***, report the corresponding p-value and compare it with the pre-determined significance level.

In Python using scipy,
```
scipy.stats.chisquare(f_obs, f_exp=None, ddof=0, axis=0)
```
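This returns the calculated chi-square statistic and the corresponding p-value, which can be compared with the desired significance level. By default (`f_exp=None`), the test assumes all categories are equally likely (a discrete uniform distribution). The `ddof` argument is subtracted from the default degrees of freedom of $n-1$, so it corresponds to $s$ in the procedure above.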

A rule of thumb states that the expected count in each category should be at least 5 for this test to be valid.
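
The procedure can be sketched for the die example, assuming made-up counts from 120 rolls. The statistic, degrees of freedom, critical value and p-value are computed by hand with `scipy.stats.chi2` and then compared against `scipy.stats.chisquare`; the `observed` counts are illustrative only.

```python
import numpy as np
from scipy import stats

# Hypothetical counts for faces 1-6 from 120 rolls of a die.
observed = np.array([18, 22, 16, 25, 19, 20])
expected = np.full(6, observed.sum() / 6)        # fair die: 20 expected per face

chi2_manual = ((observed - expected) ** 2 / expected).sum()
df = observed.size - 1                           # s = 0 fitted parameters, so df = n - 1
p_manual = stats.chi2.sf(chi2_manual, df)        # upper tail only (one-sided)
critical = stats.chi2.ppf(1 - 0.05, df)          # critical value at alpha = 0.05

# Same test through scipy; f_exp defaults to equal expected counts.
chi2_scipy, p_scipy = stats.chisquare(observed)

print(f"chi2 = {chi2_scipy:.2f}, critical = {critical:.2f}, p = {p_scipy:.3f}")
```

With these counts the statistic falls below the critical value, so the null hypothesis of a fair die is not rejected.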

A good example can be found at: https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test#Examples.

# 1 Independent binary categorical variable
___
Comparing two samples, such as control vs. treatment groups in an experiment or male vs. female.

## 1 continuous dependent variable with normal distribution

### Two sample t-test

This t-test assesses whether the means of two groups/categories of data are statistically different.

It is very similar to the one-sample t-test with a few exceptions:
1. The t-statistic is calculated from,
$$ t = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$

1. And the degrees of freedom are,
$$ df = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1-1) + (s_2^2/n_2)^2/(n_2-1)} $$

The calculation for the t-statistic is appropriate for equal or unequal sample sizes and unequal variances between the two independent categories. This variation is known as ***Welch's t-test***.
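
A short sketch of Welch's t-test, assuming two made-up groups of unequal size. It evaluates the t-statistic and degrees of freedom from the formulas above and compares the result with `scipy.stats.ttest_ind`, where `equal_var=False` selects Welch's variant (the default pools the variances). The arrays `control` and `treatment` are illustrative only.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for two independent groups.
control = np.array([5.1, 4.8, 6.0, 5.5, 5.3, 4.9, 5.7, 5.2])
treatment = np.array([6.2, 5.9, 7.1, 6.8, 5.4, 6.5, 7.4, 6.0, 6.6, 7.0])

v1 = control.var(ddof=1) / control.size       # s_1^2 / n_1
v2 = treatment.var(ddof=1) / treatment.size   # s_2^2 / n_2

t_manual = (control.mean() - treatment.mean()) / np.sqrt(v1 + v2)
df_manual = (v1 + v2) ** 2 / (
    v1 ** 2 / (control.size - 1) + v2 ** 2 / (treatment.size - 1)
)
p_manual = 2 * stats.t.sf(abs(t_manual), df_manual)   # two-tail p-value

# equal_var=False requests Welch's t-test instead of the pooled-variance test.
t_scipy, p_scipy = stats.ttest_ind(control, treatment, equal_var=False)

print(f"t = {t_scipy:.3f}, df = {df_manual:.2f}, p = {p_scipy:.4f}")
```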

For ***paired samples*** (e.g., experiments comparing before/after treatment values to find a statistically significant difference), the t-statistic is calculated from,
$$ t = \frac{\bar{x}_D - \mu_0}{s_D / \sqrt{n}} $$

where the $D$ subscript represents the difference between pairs, so the mean and standard deviation are calculated from the paired differences and not the measured values, and $\mu_0$ is the hypothesized mean difference (often 0). Here the degrees of freedom are the same as in the one-sample t-test, $df = n-1$, where $n$ is the number of pairs.
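
To illustrate that the paired test is a one-sample t-test on the differences, the sketch below uses made-up before/after values with $\mu_0 = 0$. It computes the statistic by hand and compares it with `scipy.stats.ttest_rel` and with `scipy.stats.ttest_1samp` applied to the differences; the arrays `before` and `after` are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical before/after measurements on the same eight subjects.
before = np.array([82.0, 75.5, 90.1, 68.4, 77.3, 85.0, 79.2, 88.6])
after = np.array([79.5, 74.0, 86.3, 69.0, 73.8, 81.1, 78.0, 84.9])

diff = after - before
n = diff.size
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))   # mu_0 = 0
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)         # two-tail p-value

# Paired test, and the equivalent one-sample test on the differences.
t_rel, p_rel = stats.ttest_rel(after, before)
t_one, p_one = stats.ttest_1samp(diff, popmean=0.0)

print(f"t = {t_rel:.3f}, p = {p_rel:.4f}")
```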

The paired test can help reduce the influence of confounding variables. For an explanation see Wikipedia: https://en.wikipedia.org/wiki/Paired_difference_test#Use_in_reducing_confounding.