# Statistical Hypothesis Testing I

In this notebook, we will implement and apply **statistical hypothesis tests** to make inferences about populations based on sample data.

At the start, we clarify common misconceptions in statistical hypothesis testing.

Subsequently, we will implement the one-sample $z$-test and the one-sample $t$-test.

Finally, we will apply one of the tests to a concrete example.

### **Table of Contents**
1. [Clarification of Misconceptions](#misconceptions)
2. [One-sample Tests](#one-sample-tests)
3. [Example](#example)

In [1]:
%load_ext autoreload
%autoreload 2

import matplotlib.pyplot as plt
import numpy as np

from scipy import stats



### **1. Clarification of Misconceptions** <a class="anchor" id="misconceptions"></a>
Statistical hypothesis testing often causes confusion and misconceptions, which we clarify below.

#### **Questions:**
1. (a) Is the $p$-value the probability that the null hypothesis $H_0$ is true given the data?
   
   No. The $p$-value is the probability of obtaining values at least as extreme as the observed ones (i.e., values deviating at least as much from the distribution assumed under $H_0$), given that $H_0$ is true.
   
   (b) Are hypothesis tests carried out to decide if $H_0$ is true or false?

   No. There is never certainty, only evidence.
   If the $p$-value is below the significance level $\alpha$, $H_0$ can be rejected.
   Rejecting $H_0$ can be justified with more confidence than claiming that $H_0$ is true.
   Given $\alpha$, the test assesses whether the evidence speaks against $H_0$ or not.
   The conjecture one actually wants to examine is the alternative hypothesis $H_1$.
   
   (c) Are hypothesis tests carried out to establish the test statistic?
   
   No. The test statistic describes how well the observations match the distribution assumed under $H_0$.
   Hypothesis tests are not carried out (only) to establish test statistics.
   Their purpose is to enable inferences.
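
To make the definition in (a) concrete, the following minimal sketch estimates a two-sided $p$-value by simulation: it draws test statistics under $H_0$ and counts how often they are at least as extreme as an observed value. The observed statistic `z_observed = 2.0` and the standard normal sampling distribution are assumptions chosen for illustration only.

```python
import numpy as np
from scipy import stats

# Hypothetical observed z-statistic (not taken from the notebook's data).
z_observed = 2.0

# Simulate the test statistic under H_0, where it is assumed to follow N(0, 1).
rng = np.random.default_rng(0)
z_under_h0 = rng.standard_normal(200_000)

# Two-sided p-value: fraction of simulated statistics at least as extreme
# as the observed one, computed under the assumption that H_0 is true.
p_simulated = np.mean(np.abs(z_under_h0) >= abs(z_observed))

# Analytic counterpart based on the standard normal survival function.
p_analytic = 2 * stats.norm.sf(abs(z_observed))

print(f"simulated p-value: {p_simulated:.4f}, analytic p-value: {p_analytic:.4f}")
```

Both numbers describe the probability of the data (or more extreme data) given $H_0$; neither is the probability of $H_0$ given the data.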




### **2. One-sample Tests** <a class="anchor" id="one-sample-tests"></a>

We implement the function [`z_test_one_sample`](../e2ml/evaluation/_one_sample_tests.py) in the [`e2ml.evaluation`](../e2ml/evaluation) subpackage. Once the implementation has been completed, we check it for the different test types.

In [2]:
from e2ml.evaluation import z_test_one_sample
sigma = 0.5
mu_0 = 2
sample_data = np.round(stats.norm.rvs(loc=2, scale=sigma, size=10, random_state=50), 1)
z_statistic, p = z_test_one_sample(sample_data=sample_data, mu_0=mu_0, sigma=sigma, test_type="right-tail")
assert np.round(z_statistic, 4) == -1.5811, 'The z-test statistic must be ca. -1.581.'
assert np.round(p, 4) == 0.9431, 'The p-value must be ca. 0.9431 for the one-sided right-tail test.'
z_statistic, p = z_test_one_sample(sample_data=sample_data, mu_0=mu_0, sigma=sigma, test_type="left-tail")
assert np.round(z_statistic, 4) == -1.5811, 'The z-test statistic must be ca. -1.581.'
assert np.round(p, 4) == 0.0569, 'The p-value must be ca. 0.0569 for the one-sided left-tail test.'
z_statistic, p = z_test_one_sample(sample_data=sample_data, mu_0=mu_0, sigma=sigma, test_type="two-sided")
assert np.round(z_statistic, 4) == -1.5811, 'The z-test statistic must be ca. -1.581.'
assert np.round(p, 4) == 0.1138, 'The p-value must be ca. 0.1138 for the two-sided test.'
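
The actual `z_test_one_sample` lives in the `e2ml` package and is not shown here. As a rough orientation, a one-sample $z$-test could be sketched as follows. The function name `z_test_sketch` and the toy sample are hypothetical; only the argument names mirror the call in the cell above.

```python
import numpy as np
from scipy import stats


def z_test_sketch(sample_data, mu_0, sigma, test_type="two-sided"):
    """Hypothetical one-sample z-test; not the e2ml implementation."""
    sample_data = np.asarray(sample_data, dtype=float)
    n = len(sample_data)
    # Standardize the sample mean using the known population std.
    z = (np.mean(sample_data) - mu_0) / (sigma / np.sqrt(n))
    if test_type == "right-tail":
        p = stats.norm.sf(z)            # P(Z >= z) under H_0
    elif test_type == "left-tail":
        p = stats.norm.cdf(z)           # P(Z <= z) under H_0
    elif test_type == "two-sided":
        p = 2 * stats.norm.sf(abs(z))   # P(|Z| >= |z|) under H_0
    else:
        raise ValueError("test_type must be 'right-tail', 'left-tail', or 'two-sided'.")
    return z, p


# Hypothetical sample of size 10 with mean 1.75 (made up for illustration).
toy_sample = [1.25, 1.5, 1.5, 1.75, 1.75, 1.75, 1.75, 2.0, 2.0, 2.25]
z, p_two = z_test_sketch(toy_sample, mu_0=2, sigma=0.5, test_type="two-sided")
_, p_left = z_test_sketch(toy_sample, mu_0=2, sigma=0.5, test_type="left-tail")
_, p_right = z_test_sketch(toy_sample, mu_0=2, sigma=0.5, test_type="right-tail")
print(z, p_left, p_right, p_two)
```

Because the statistic is negative here, the left-tail $p$-value is small and the right-tail $p$-value is large; the two one-sided $p$-values always sum to one.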

We implement the function [`t_test_one_sample`](../e2ml/evaluation/_one_sample_tests.py) in the [`e2ml.evaluation`](../e2ml/evaluation) subpackage. Once the implementation has been completed, we check it for the different test types.

In [3]:
from e2ml.evaluation import t_test_one_sample
sample_data = np.round(stats.norm.rvs(loc=13.5, scale=0.25, size=10, random_state=1), 1)
t_statistic, p = t_test_one_sample(sample_data=sample_data, mu_0=13, test_type="right-tail")
assert np.round(t_statistic, 4) == 4.5898 , 'The t-test statistic must be ca. 4.590.' 
assert np.round(p, 4) == 0.0007, 'The p-value must be ca. 0.0007 for the one-sided right-tail test.' 
t_statistic, p = t_test_one_sample(sample_data=sample_data, mu_0=13, test_type="left-tail")
assert np.round(t_statistic, 4) == 4.5898 , 'The t-test statistic must be ca. 4.590.' 
assert np.round(p, 4) == 0.9993, 'The p-value must be ca. 0.9993 for the one-sided left-tail test.' 
t_statistic, p = t_test_one_sample(sample_data=sample_data, mu_0=13, test_type="two-sided")
assert np.round(t_statistic, 4) == 4.5898 , 'The t-test statistic must be ca. 4.590.' 
assert np.round(p, 4) == 0.0013, 'The p-value must be ca. 0.0013 for the two-sided test.'
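
Analogously, a one-sample $t$-test could be sketched as below. This is an assumption about how such a function might look, not the `e2ml` code: the name `t_test_sketch` and the toy sample are hypothetical. The two-sided result is compared against `scipy.stats.ttest_1samp` as a sanity check.

```python
import numpy as np
from scipy import stats


def t_test_sketch(sample_data, mu_0, test_type="two-sided"):
    """Hypothetical one-sample t-test; not the e2ml implementation."""
    sample_data = np.asarray(sample_data, dtype=float)
    n = len(sample_data)
    # The population std is unknown, so use the corrected sample std (ddof=1).
    s = np.std(sample_data, ddof=1)
    t = (np.mean(sample_data) - mu_0) / (s / np.sqrt(n))
    df = n - 1  # degrees of freedom of the Student's t-distribution
    if test_type == "right-tail":
        p = stats.t.sf(t, df)
    elif test_type == "left-tail":
        p = stats.t.cdf(t, df)
    elif test_type == "two-sided":
        p = 2 * stats.t.sf(abs(t), df)
    else:
        raise ValueError("test_type must be 'right-tail', 'left-tail', or 'two-sided'.")
    return t, p


# Hypothetical sample (made up for illustration).
toy_sample = [13.4, 13.6, 13.5, 13.9, 13.3, 13.7, 13.2, 13.6, 13.5, 13.8]
t, p_two = t_test_sketch(toy_sample, mu_0=13, test_type="two-sided")
_, p_left = t_test_sketch(toy_sample, mu_0=13, test_type="left-tail")
_, p_right = t_test_sketch(toy_sample, mu_0=13, test_type="right-tail")

# Reference values from SciPy (two-sided by default).
reference = stats.ttest_1samp(toy_sample, 13)
print(t, p_two, reference.statistic, reference.pvalue)
```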

### **3. Example** <a class="anchor" id="example"></a>

Let us assume we have access to the following *independent and identically distributed* (i.i.d.) heart rate measurements $[\mathrm{beats/min}]$ of 40 patients in an *intensive care unit* (ICU):

$124, 111,  96, 104,  89, 106,  94,  48, 117,  61, 117, 104,  72,
86, 126, 103,  97,  49,  78,  52, 119, 107, 131, 112,  78, 132,
80, 139,  87,  44,  40,  60,  40,  80,  41, 103, 102,  44, 115,
103.$

#### **Questions:**
3. (a) Are the heart rates of ICU patients unusual, given that the normal heart rate has a mean of 72 beats/min? Use a significance level of $\alpha = 0.01$. Perform a statistical hypothesis test by following the steps presented in the lecture and by using Python.

   See the code cell below.

   Notes on $p$-hacking:
      - Defining $\alpha$ only after the $p$-value has been computed.
      - Drawing a subset of the observed samples and testing on it: even if the data were actually generated by the distribution assumed under $H_0$, some subsets will not look that way. A chance effect is then attributed to the hypothesis. Repeatedly drawing subsets until a test turns out significant exploits exactly this.
      - Relevant for the exam (cf. YouTube).

   In general, one cannot say whether it is better to test once on the entire sample or repeatedly on subsamples.
   (For large data sets, repeated subsampling might be more informative.)
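
The effect of repeated subset testing can be illustrated with a small simulation. The sketch below generates data for which $H_0$ is true by construction and compares a single test on the full sample with reporting only the smallest $p$-value across many random subsets. All parameters (`sigma`, `alpha`, subset size, number of subsets and repetitions) are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
mu_0, sigma = 72, 30          # hypothetical population; H_0 (mu = 72) is true
alpha = 0.05                  # fixed before any p-value is computed
n, subset_size = 40, 20
n_subsets, n_repetitions = 20, 500

honest_rejections = 0
hacked_rejections = 0
for _ in range(n_repetitions):
    sample = rng.normal(loc=mu_0, scale=sigma, size=n)
    # Honest analysis: one test on the full sample.
    honest_rejections += stats.ttest_1samp(sample, mu_0).pvalue < alpha
    # p-hacking: test many random subsets and keep only the smallest p-value.
    subsets = np.array([rng.choice(sample, size=subset_size, replace=False)
                        for _ in range(n_subsets)])
    hacked_rejections += stats.ttest_1samp(subsets, mu_0, axis=1).pvalue.min() < alpha

honest_rate = honest_rejections / n_repetitions
hacked_rate = hacked_rejections / n_repetitions
print(f"false positive rate, single test:    {honest_rate:.3f}")
print(f"false positive rate, best of subsets: {hacked_rate:.3f}")
```

The single test should reject a true $H_0$ in roughly a fraction $\alpha$ of the repetitions, whereas picking the best subset rejects noticeably more often.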


In [4]:
data = [124, 111,  96, 104,  89, 106,  94,  48, 117,  61, 117, 104,  72,
86, 126, 103,  97,  49,  78,  52, 119, 107, 131, 112,  78, 132,
80, 139,  87,  44,  40,  60,  40,  80,  41, 103, 102,  44, 115,
103]

# (1) define hypotheses
# H_0: mu = 72
# H_1: mu != 72
mu_0 = 72   # hypothesized population mean under H_0

# (2) select test statistic: sample mean
s_n = (1/len(data))*sum(data)

# (3) find sampling distribution of test statistic under H_0
# z-statistic: requires known std & independent & normally distributed & enough (>30) samples -> NOT usable (std is unknown)
# t-statistic: std is unknown & independent & normally distributed & enough (>30) samples -> usable
# applying the t-transformation == assuming that the Student's t-distribution is the underlying sampling distribution (justified by the fulfilled requirements)

# (4) define significance level: always define alpha before computing p-value
alpha = 0.01

# (5) evaluate test statistic for observed data & (6) compute p-value
t_statistic, p = t_test_one_sample(sample_data=data, mu_0=mu_0, test_type="two-sided")

# (7) make decision
if p < alpha:
    print("Reject H_0 with significance level alpha={}, and pvalue={}".format(np.round(alpha,4),np.round(p,4)))
else:
    print("Do not reject H_0")



Reject H_0 with significance level alpha=0.01, and pvalue=0.0004
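
As an independent cross-check, the same two-sided test can be run with `scipy.stats.ttest_1samp` and compared against the $t$-statistic computed by hand. This sketch only assumes SciPy's standard one-sample $t$-test; the variable names are chosen for illustration.

```python
import numpy as np
from scipy import stats

heart_rates = np.array([
    124, 111, 96, 104, 89, 106, 94, 48, 117, 61, 117, 104, 72,
    86, 126, 103, 97, 49, 78, 52, 119, 107, 131, 112, 78, 132,
    80, 139, 87, 44, 40, 60, 40, 80, 41, 103, 102, 44, 115,
    103,
])
mu_0 = 72      # hypothesized mean under H_0
alpha = 0.01   # significance level, fixed in advance

# Two-sided one-sample t-test from SciPy.
result = stats.ttest_1samp(heart_rates, popmean=mu_0)

# Manual t-statistic: (sample mean - mu_0) / (sample std / sqrt(n)).
n = len(heart_rates)
t_manual = (heart_rates.mean() - mu_0) / (heart_rates.std(ddof=1) / np.sqrt(n))

print(f"sample mean = {heart_rates.mean():.3f}")
print(f"t (SciPy) = {result.statistic:.4f}, t (manual) = {t_manual:.4f}")
print(f"p-value = {result.pvalue:.4f}, reject H_0: {result.pvalue < alpha}")
```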
