# Notes

### Tests

### Parametric Tests
**Z-Test**<br>
Requires $\bar{x}$, $\mu$, and a known population $\sigma$, or n > 30

**T-Tests and Tails**<br>
* One-sample one-tail: $H_{a}: \mu < 3$
* One-sample two-tail: $H_{a}: \mu \neq 3$
* Two-sample one-tail: $H_{a}: \mu_{1} < \mu_{2}$
* Two-sample two-tail: $H_{a}: \mu_{1} \neq \mu_{2}$

**One Sample T-Test**<br>
$\bar{x}$, $\mu$, $s$ and n < 30

**Two Sample T-Test**<br>
$\bar{x}_1$, $\bar{x}_2$, $s_1$, $s_2$, $n_1$, $n_2$

**ANOVA**<br>
3 or more samples.<br>
ANOVA is an omnibus test. It can tell us that a difference exists<br>
somewhere among the samples, but not which samples differ.

---

### Non-parametric Tests
**$\chi^{2}$ Test**<br>
Use when one categorical (discrete) variable is in question.<br>
$p_1$, $p_2$

---


## Z-Test

$$\large \text{z-statistic} = \dfrac{\bar x - \mu_0}{{\sigma}/{\sqrt{n}}} $$

To calculate your P-Value with the z-statistic (upper-tail test; use `stats.norm.cdf(z)` alone for a lower-tail test):<br>
`pvalue = 1 - stats.norm.cdf(z)`
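
As a rough sketch of the formula above, the snippet below computes the z-statistic and the lower-tail, upper-tail, and two-tail P-Values. The helper name `z_test` and the input numbers are made up for illustration; only `stats.norm.cdf` and `stats.norm.sf` come from SciPy.

```python
import math
from scipy import stats

def z_test(x_bar, mu_0, sigma, n):
    """Hypothetical helper: z-statistic plus the three usual p-values."""
    z = (x_bar - mu_0) / (sigma / math.sqrt(n))
    p_lower = stats.norm.cdf(z)           # H_a: mu < mu_0
    p_upper = stats.norm.sf(z)            # H_a: mu > mu_0, same as 1 - cdf(z)
    p_two = 2 * stats.norm.sf(abs(z))     # H_a: mu != mu_0
    return z, p_lower, p_upper, p_two

# Illustrative numbers only (not from any real data set)
z, p_lower, p_upper, p_two = z_test(x_bar=103.0, mu_0=100.0, sigma=15.0, n=100)
print(z, p_lower, p_upper, p_two)
```

`stats.norm.sf(z)` is the survival function, which equals `1 - stats.norm.cdf(z)` but is more accurate for large `z`.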

## t-Tests

### One Sample

$$\large t = \frac{\bar{x}-\mu}{\frac{s}{\sqrt{n}}}$$

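To find your p-value from your t-statistic, use the upper tail ($H_{a}: \mu > \mu_0$):<br>
`pvalue = stats.t.sf(t, df = degrees_of_freedom)`<br>
or the lower tail ($H_{a}: \mu < \mu_0$):<br>
`pvalue = stats.t.cdf(t, df = degrees_of_freedom)`<br>
For a two-tail test, double the smaller of the two.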

You can also find the critical t value to compare against your t-statistic (upper-tail test at significance level `alpha`):<br>
`t_crit = stats.t.ppf(1 - alpha, df = degrees_of_freedom)`
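
The one-sample pieces above can be put together roughly as follows. The sample values, `mu_0`, and `alpha` are invented for illustration, and `stats.ttest_1samp` is used only as a cross-check on the manual calculation.

```python
import numpy as np
from scipy import stats

# Illustrative sample (made-up numbers) tested against mu_0 = 3
sample = np.array([2.9, 3.1, 2.7, 3.4, 2.8, 3.0, 2.6, 3.2])
mu_0 = 3.0
alpha = 0.05

n = sample.size
df = n - 1                               # degrees of freedom
s = sample.std(ddof=1)                   # sample standard deviation
t_stat = (sample.mean() - mu_0) / (s / np.sqrt(n))

p_lower = stats.t.cdf(t_stat, df=df)     # H_a: mu < mu_0
p_upper = stats.t.sf(t_stat, df=df)      # H_a: mu > mu_0
p_two = 2 * stats.t.sf(abs(t_stat), df=df)  # H_a: mu != mu_0

t_crit_one = stats.t.ppf(1 - alpha, df=df)      # one-tail critical value
t_crit_two = stats.t.ppf(1 - alpha / 2, df=df)  # two-tail critical value

# SciPy's built-in test (two-tail by default) should agree
check = stats.ttest_1samp(sample, popmean=mu_0)
print(t_stat, p_two, check.statistic, check.pvalue)
```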

### Two Samples

$$\large t = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{{\frac{s_1^2}{n_1}}+{\frac{s_2^2}{n_2}}}}$$

Where:
* $\bar{x}_i$ - mean of sample i
* $s_i^2$ - variance of sample i
* $n_i$ - sample size of sample i

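If given these statistics, you can use the following (`s1` and `s2` are standard deviations, not variances, and `equal_var=False` matches the formula above):<br>
`t_stat, pvalue = stats.ttest_ind_from_stats(x1, s1, n1, x2, s2, n2, equal_var=False)`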

If given two lists or arrays, you can use:<br>
`t_stat, pvalue = stats.ttest_ind(array1, array2, equal_var=False)`
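
A minimal sketch comparing the formula with both SciPy calls is shown below. The two arrays are made-up data. Note that both functions default to `equal_var=True` (pooled variance); the unpooled formula in this section corresponds to `equal_var=False` (Welch's t-test).

```python
import numpy as np
from scipy import stats

# Illustrative samples (made-up numbers)
array1 = np.array([5.1, 4.9, 5.6, 5.8, 5.0, 5.3])
array2 = np.array([5.9, 6.1, 5.7, 6.4, 6.0, 5.5, 6.2])

x1, x2 = array1.mean(), array2.mean()
s1, s2 = array1.std(ddof=1), array2.std(ddof=1)  # sample standard deviations
n1, n2 = array1.size, array2.size

# Manual t-statistic from the formula in this section
t_manual = (x1 - x2) / np.sqrt(s1**2 / n1 + s2**2 / n2)

# From summary statistics (takes standard deviations, not variances)
from_stats = stats.ttest_ind_from_stats(x1, s1, n1, x2, s2, n2, equal_var=False)

# From the raw arrays
from_arrays = stats.ttest_ind(array1, array2, equal_var=False)

print(t_manual, from_stats.statistic, from_arrays.statistic)
print(from_stats.pvalue, from_arrays.pvalue)   # two-tail p-values
```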

## ANOVA

### One-way

Given a dictionary of arrays (your samples to test), use:<br>
`results = stats.f_oneway(*groups.values()); f_stat, p = results`

Where `results` is a tuple holding your F statistic and your P-Value.

The term `groups` is the dictionary for your test.
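
For illustration, here is a small sketch of the call with a made-up `groups` dictionary. The manual F calculation (between-group mean square over within-group mean square) is not part of these notes and is included only as a sanity check on `stats.f_oneway`.

```python
import numpy as np
from scipy import stats

# Illustrative samples (made-up numbers), one array per group
groups = {
    "control": np.array([4.2, 4.8, 5.1, 4.5, 4.9]),
    "dose_a":  np.array([5.6, 5.9, 5.2, 6.1, 5.7]),
    "dose_b":  np.array([6.3, 6.8, 6.0, 6.5, 7.1]),
}

f_stat, p = stats.f_oneway(*groups.values())

# Manual cross-check of the F statistic
all_values = np.concatenate(list(groups.values()))
grand_mean = all_values.mean()
k, N = len(groups), all_values.size
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups.values())
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups.values())
f_manual = (ss_between / (k - 1)) / (ss_within / (N - k))
p_manual = stats.f.sf(f_manual, k - 1, N - k)

print(f_stat, p, f_manual, p_manual)
```

A small P-Value here only says that at least one group mean differs; a follow-up (post hoc) comparison is needed to say which.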

## $\chi^{2}$-Test

The $\chi^2$ statistic is:

$$\large \chi^2 = \sum \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i}$$

To calculate your P-Value with the $\chi^2$ statistic:<br>
`pvalue = 1 - stats.chi2.cdf(chisq_stat, df=degrees_of_freedom)`

If you have your data in a set of lists, you can also calculate your P-Value with this:<br>
`results = stats.chisquare(f_obs=observations, f_exp=expectations)`

Where results returns your ($\chi^2$ statistic , P-Value) as a tuple.
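
A minimal goodness-of-fit sketch, assuming made-up counts for a six-sided die and a uniform expectation, shows that the manual formula and `stats.chisquare` agree. Degrees of freedom are taken as the number of categories minus one.

```python
import numpy as np
from scipy import stats

# Illustrative counts (made-up): 120 rolls of a die, fair die expected
observations = np.array([18, 22, 20, 25, 15, 20])
expectations = np.full(observations.size, observations.sum() / observations.size)

# Manual chi-square statistic and p-value
chisq_stat = ((observations - expectations) ** 2 / expectations).sum()
degrees_of_freedom = observations.size - 1
pvalue = stats.chi2.sf(chisq_stat, df=degrees_of_freedom)  # same as 1 - cdf

# SciPy's built-in test; observed and expected totals must match
results = stats.chisquare(f_obs=observations, f_exp=expectations)

print(chisq_stat, pvalue)
print(results.statistic, results.pvalue)
```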

#### $\chi^{2}$ Test of independence

$df = (n_{rows} - 1)\cdot(n_{cols} -1)$

Given an array built from two or more lists, you can use this function to determine your statistics:<br>
`results = stats.contingency.chi2_contingency(array); chi, p, dof, exp = results`

Where results returns your $\chi^{2}$, P-Value, df, and the array of expected frequencies (not the input array).<br>
**Input arrays must be vertically stacked** <br>
Use `np.vstack((list_1, list_2, list_3,..., list_n))`
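
The following sketch stacks two made-up lists of counts into a contingency table and runs the test. It calls `stats.chi2_contingency`, which is the same function exposed at the top level of `scipy.stats`; the manual expected-frequency calculation is included only as a cross-check.

```python
import numpy as np
from scipy import stats

# Illustrative counts (made-up): two groups across three categories
list_1 = [30, 10, 20]
list_2 = [20, 25, 15]
table = np.vstack((list_1, list_2))      # shape (2, 3): one row per list

chi, p, dof, expected = stats.chi2_contingency(table)

# Cross-check: expected counts are (row total * column total) / grand total
row_totals = table.sum(axis=1)
col_totals = table.sum(axis=0)
expected_manual = np.outer(row_totals, col_totals) / table.sum()
chi_manual = ((table - expected_manual) ** 2 / expected_manual).sum()

print(chi, p, dof)
print(expected)
```

For a 2x2 table, `chi2_contingency` applies Yates' continuity correction by default, so the manual statistic would only match there with `correction=False`.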