# Hypothesis Tests

```python
from scipy import stats
```

## z-test
- prerequisite: normally distributed data; for a z-test of a mean, the population standard deviation must also be known
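
The z-scores computed below only standardize single observations. A z-test of a sample mean uses $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$, which is where the known population standard deviation $\sigma$ comes in. A minimal sketch; `sigma = 1.5` and `mu_0 = 3` are hypothetical values chosen only for illustration:

```python
import numpy as np
from scipy import stats

x = np.array([2, 4.2, 5.7, 3.7, 1, 3, 2, 1, 4, 3, 2, 3, 6])
mu_0 = 3.0   # hypothetical mean under H_0
sigma = 1.5  # hypothetical, assumed known population standard deviation

# standard error of the mean when sigma is known
se = sigma / np.sqrt(len(x))
z_mean = (x.mean() - mu_0) / se

# two-sided p-value from the standard normal distribution
p_value = 2 * stats.norm.sf(abs(z_mean))
print(z_mean, p_value)
```

If $\sigma$ is not known and has to be estimated from the sample, the t-test is the appropriate tool instead.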

```python
# convert a set of observations into z-scores
# z = (x - mean) / std   (stats.zscore uses the population std, ddof=0, by default)
z = stats.zscore([2, 4.2, 5.7, 3.7, 1, 3, 2, 1, 4, 3, 2, 3, 6])
z
```

```
array([-0.73901556,  0.70864505,  1.69568638,  0.37963128, -1.39704311,
       -0.08098801, -0.73901556, -1.39704311,  0.57703954, -0.08098801,
       -0.73901556, -0.08098801,  1.89309464])
```
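
As a cross-check, the z-scores can be computed by hand with NumPy. A minimal sketch; since `stats.zscore` divides by the population standard deviation (`ddof=0`) unless told otherwise, the manual version does the same:

```python
import numpy as np
from scipy import stats

x = np.array([2, 4.2, 5.7, 3.7, 1, 3, 2, 1, 4, 3, 2, 3, 6])

# z = (x - mean) / std, using the population standard deviation (ddof=0)
z_manual = (x - x.mean()) / x.std(ddof=0)

# same result as scipy's helper
z_scipy = stats.zscore(x)
print(np.allclose(z_manual, z_scipy))  # True

# with the sample standard deviation (ddof=1) the std is larger, so |z| shrinks slightly
z_sample = stats.zscore(x, ddof=1)
```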

```python
# probability of getting a z-value less than or equal to the observed one
# cdf: cumulative distribution function of the standard normal distribution
# left-tailed
stats.norm.cdf(z[-1])
```

```
0.9708273577780218
```

```python
# probability of getting a value more extreme than the observed one
# at the upper end of the distribution
# right-tailed
1 - stats.norm.cdf(z[-1])
```

```
0.02917264222197824
```
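
The right tail can also be read directly from the survival function `stats.norm.sf`, which equals `1 - cdf` but stays numerically accurate far out in the tail. When the alternative hypothesis is just "different" (no direction), the two-sided p-value doubles the tail probability of $|z|$. A minimal sketch, reusing the last z-score from above:

```python
import numpy as np
from scipy import stats

z_obs = 1.89309464  # last z-score from the cell above

p_left = stats.norm.cdf(z_obs)                # P(Z <= z_obs)
p_right = stats.norm.sf(z_obs)                # P(Z > z_obs), same as 1 - cdf
p_two_sided = 2 * stats.norm.sf(abs(z_obs))   # P(|Z| >= |z_obs|)

print(round(p_right, 4), round(p_two_sided, 4))  # 0.0292 0.0583
```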

## t-test
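- one-sample t-test: $H_0: \mu = \mu_{given}$
- two-sample t-test: $H_0: \mu_1 = \mu_2$
- paired t-test (pairwise differences): $H_0: \mu_{before} = \mu_{after}$, equivalently $\mu_{difference} = 0$
- the t-test does not require N > 30 (it was designed for small samples), but it assumes approximately normally distributed data; N > 30 is only a rough rule of thumb for when the central limit theorem makes the test robust to non-normal data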

### one-sample

```python
# [observations], hypothesized mean
# 15.2 is the mean we want to test
# how plausible is a mean of 15.2, given these observations?
# output: t-statistic, p-value (two-sided)
tt = stats.ttest_1samp([2, 4, 5.1, 7, 8], 15.2)
tt
```

```
Ttest_1sampResult(statistic=-9.353692699235728, pvalue=0.0007274958924473985)
```
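
To see what `ttest_1samp` computes, the statistic $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$ (with the sample standard deviation $s$, i.e. `ddof=1`) can be evaluated by hand and compared with a t distribution with $n - 1$ degrees of freedom. A minimal sketch:

```python
import numpy as np
from scipy import stats

x = np.array([2, 4, 5.1, 7, 8])
mu_0 = 15.2  # hypothesized mean
n = len(x)

# t = (sample mean - mu_0) / standard error of the mean
se = x.std(ddof=1) / np.sqrt(n)
t_manual = (x.mean() - mu_0) / se

# two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

res = stats.ttest_1samp(x, mu_0)
print(np.isclose(t_manual, res.statistic), np.isclose(p_manual, res.pvalue))  # True True
```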

### two-sample

```python
# do two samples have the same mean?
# sample sizes may differ; by default equal variances are assumed (pooled t-test)
t2 = stats.ttest_ind([10, 15, 12, 13], [50, 30, 60, 70, 50, 20])
t2
```

```
Ttest_indResult(statistic=-3.5825284929430237, pvalue=0.007161878118066208)
```
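
By default `ttest_ind` pools the two variances, i.e. it assumes both groups share the same variance. Here the second sample is far more spread out than the first, so Welch's t-test (`equal_var=False`), which drops that assumption, may be the safer choice. A minimal sketch comparing both:

```python
from scipy import stats

a = [10, 15, 12, 13]
b = [50, 30, 60, 70, 50, 20]

pooled = stats.ttest_ind(a, b)                  # Student's t-test, equal variances assumed
welch = stats.ttest_ind(a, b, equal_var=False)  # Welch's t-test, unequal variances allowed

print(pooled)
print(welch)
```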

### pairwise-difference

```python
# equal sample sizes: the same statistical units are measured before and after
# H_0: the mean difference is 0
# p < 0.05: reject H_0 (significant difference); p >= 0.05: H_0 cannot be rejected
tp0 = stats.ttest_rel([4, 5, 6, 7], [7, 8, 7, 9])
tp1 = stats.ttest_rel([4, 5, 6, 7], [3, 6, 7, 9])
tp2 = stats.ttest_1samp([1, -1, -1, -2], 0)  # same as tp1: test whether the mean of the differences is 0

tp2
```

```
Ttest_1sampResult(statistic=-1.1920791213585396, pvalue=0.31893179191277526)
```
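
The comment on `tp2` can be checked directly: `ttest_rel(a, b)` is a one-sample t-test of the differences `a - b` against 0. A minimal sketch:

```python
import numpy as np
from scipy import stats

before = np.array([4, 5, 6, 7])
after = np.array([3, 6, 7, 9])

paired = stats.ttest_rel(before, after)
# the differences before - after are [1, -1, -1, -2]
one_sample = stats.ttest_1samp(before - after, 0)

print(np.isclose(paired.statistic, one_sample.statistic),
      np.isclose(paired.pvalue, one_sample.pvalue))  # True True
```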

## Chi-squared Test
$\chi^2$ test
- The **purpose** of the test is to evaluate how likely the observed frequencies are, assuming the null hypothesis is true.
- In other words: is there a significant difference between the observed and the expected frequencies?
- $H_0$: the observed frequencies follow the expected distribution, i.e. the null states there is no difference <br>

**prerequisites**: 
  - expected frequency > 5 in every category
  - the test works on absolute frequencies (counts), not on relative values (e.g. percentages)
  - the sample is random<br>
  
**restrictions**:
  - sample size matters: the bigger the sample, the easier it is to detect small differences
  - the $\chi^2$ test only states whether there is a difference, not its direction
  - it says nothing about the strength of the effect
- $\chi^2$ is always non-negative, so only the upper tail is tested: the test is always one-sided
- the test statistic $\chi^2 = \sum_i \frac{(h_i - \hat{h}_i)^2}{\hat{h}_i}$, with observed frequencies $h_i$ and expected frequencies $\hat{h}_i$, is large when observed and expected values differ

```python
# [observed frequencies], [expected frequencies]
# do the observed counts agree with the expected counts?
c2 = stats.chisquare([30, 111, 10], [25, 116, 10])
c2
```

```
Power_divergenceResult(statistic=1.2155172413793103, pvalue=0.5445700903289235)
```
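
The statistic from the formula above can be computed by hand: sum $(h_i - \hat{h}_i)^2 / \hat{h}_i$ over the categories, then take the upper tail of a $\chi^2$ distribution with $k - 1$ degrees of freedom for $k$ categories (the default of `stats.chisquare` when no parameters were estimated). A minimal sketch that also checks the expected-frequency rule of thumb:

```python
import numpy as np
from scipy import stats

observed = np.array([30, 111, 10])
expected = np.array([25, 116, 10])

# prerequisite from above: every expected frequency > 5
assert (expected > 5).all()

# test statistic: sum over categories of (h_i - h_hat_i)^2 / h_hat_i
chi2_stat = ((observed - expected) ** 2 / expected).sum()

# upper tail only (one-sided), with k - 1 degrees of freedom
dof = len(observed) - 1
p_value = stats.chi2.sf(chi2_stat, dof)

print(chi2_stat, p_value)
```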

## ANOVA
- **AN**alysis **O**f **VA**riance
- generalization of the t-test to more than two groups
- $H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_n$

**prerequisites:**
  - all groups have the same standard deviation (homogeneity of variances)
  - error terms are normally distributed<br>

**problem: multiple comparisons**
- ANOVA only says that at least one group mean differs, not which of the samples doesn't fit
- one could run pairwise t-tests for all combinations, <br> 
 but the more tests are run, the higher the probability that at least one of them gives a false positive, so $\alpha$ needs to be corrected (e.g. Bonferroni: test each of the $m$ comparisons at $\alpha / m$)
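
The $\alpha$ correction can be sketched with the Bonferroni method (one common choice, used here as an example): with $m$ pairwise tests, each p-value is multiplied by $m$ (capped at 1) and compared with the original $\alpha$, which is equivalent to testing each pair at $\alpha / m$. The groups are the ones from the ANOVA cell; the group names and `alpha = 0.05` are assumptions for illustration:

```python
from itertools import combinations
from scipy import stats

groups = {
    "g1": [1, 2, 3, 4, 1, 1, 1],
    "g2": [3, 4, 2, 4],
    "g3": [5, 4, 3, 2, 1],
}
alpha = 0.05  # assumed significance level

pairs = list(combinations(groups, 2))  # all pairs of group names
m = len(pairs)                         # number of comparisons

results = {}
for name_a, name_b in pairs:
    p = stats.ttest_ind(groups[name_a], groups[name_b]).pvalue
    p_adj = min(p * m, 1.0)  # Bonferroni-adjusted p-value
    results[(name_a, name_b)] = (p, p_adj, p_adj < alpha)

for pair, (p, p_adj, reject) in results.items():
    print(pair, round(p, 4), round(p_adj, 4), "reject H_0" if reject else "keep H_0")
```

Bonferroni is simple but conservative; with many groups, more powerful post-hoc procedures exist.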


```python
# one-way ANOVA: do the three groups share the same mean?
an = stats.f_oneway([1, 2, 3, 4, 1, 1, 1], [3, 4, 2, 4], [5, 4, 3, 2, 1])
an
```

```
F_onewayResult(statistic=1.9043388429752064, pvalue=0.18821911126180027)
```
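
The name "analysis of variance" comes from how the F statistic is built: the variance between the group means is compared with the variance within the groups, $F = \frac{SS_{between} / (k - 1)}{SS_{within} / (N - k)}$ for $k$ groups and $N$ observations in total. A minimal sketch that reproduces `f_oneway`:

```python
import numpy as np
from scipy import stats

groups = [np.array([1, 2, 3, 4, 1, 1, 1]),
          np.array([3, 4, 2, 4]),
          np.array([5, 4, 3, 2, 1])]

k = len(groups)                        # number of groups
n_total = sum(len(g) for g in groups)  # total number of observations
grand_mean = np.concatenate(groups).mean()

# between groups: spread of the group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# within groups: spread of the observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within
f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
# upper tail of the F distribution with (k - 1, N - k) degrees of freedom
p_value = stats.f.sf(f_stat, k - 1, n_total - k)

print(f_stat, p_value)
print(stats.f_oneway(*groups))
```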