# Statistical Hypothesis Testing I

In this notebook, we will implement and apply **statistical hypothesis tests** to make inferences about populations based on sample data.

At the start, we clarify common misconceptions in statistical hypothesis testing.

Subsequently, we will implement the one-sample $z$-test and the one-sample $t$-test.

Finally, we will apply one of the tests to a concrete example.

### **Table of Contents**
1. [Clarification of Misconceptions](#misconceptions)
2. [One-sample Tests](#one-sample-tests)
3. [Example](#example)

In [1]:
%load_ext autoreload
%autoreload 2

import matplotlib.pyplot as plt
import numpy as np

from scipy import stats

### **1. Clarification of Misconceptions** <a class="anchor" id="misconceptions"></a>
Statistical hypothesis testing is a frequent source of confusion. Below, we clarify some of the most common misconceptions.

#### **Questions:**
1. (a) Is the $p$-value the probability that the null hypothesis $H_0$ is true given the data?
   
   _Answer:_<br/>
      No, the $p$-value is the probability of observing a test statistic at least as extreme as the one observed, given that the null hypothesis $H_0$ is true.

   (b) Are hypothesis tests carried out to decide if the null hypothesis is true or false?

   _Answer:_<br/>
      No, the null hypothesis is assumed to be true. The hypothesis test is carried out to decide if the data is consistent with the null hypothesis or not.
   
   (c) Are hypothesis tests carried out to establish the test statistic?
   
   _Answer:_<br/>
      No, the test statistic is calculated from the data. The hypothesis test is carried out to determine the $p$-value.
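
To make the definition from answer (a) concrete, the following minimal sketch compares an analytic right-tail $p$-value with a Monte Carlo estimate obtained by simulating data under $H_0$. The values `mu_0`, `sigma`, `n`, and `z_obs` are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical setting: H_0 states that the population mean is mu_0.
mu_0, sigma, n = 2.0, 0.5, 10

# Hypothetical observed z-statistic (not taken from any real data).
z_obs = 1.5

# Analytic right-tail p-value: P(Z >= z_obs | H_0) for Z ~ N(0, 1).
p_analytic = stats.norm.sf(z_obs)

# Monte Carlo estimate: draw many samples under H_0 and count how often
# the simulated z-statistic is at least as extreme as the observed one.
samples = rng.normal(loc=mu_0, scale=sigma, size=(20_000, n))
z_sim = (samples.mean(axis=1) - mu_0) / (sigma / np.sqrt(n))
p_simulated = np.mean(z_sim >= z_obs)

print(f"analytic p-value:  {p_analytic:.4f}")
print(f"simulated p-value: {p_simulated:.4f}")
```

Both numbers describe how often data generated under $H_0$ produce a statistic this extreme. Neither is the probability that $H_0$ itself is true.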


### **2. One-sample Tests** <a class="anchor" id="one-sample-tests"></a>

We implement the function [`z_test_one_sample`](../e2ml/evaluation/_one_sample_tests.py) in the [`e2ml.evaluation`](../e2ml/evaluation) subpackage. Once the implementation has been completed, we check it for the different test types.

In [10]:
from e2ml.evaluation import z_test_one_sample
sigma = 0.5
mu_0 = 2
sample_data = np.round(stats.norm.rvs(loc=2, scale=sigma, size=10, random_state=50), 1)
z_statistic, p = z_test_one_sample(sample_data=sample_data, mu_0=mu_0, sigma=sigma, test_type="right-tail")
assert np.round(z_statistic, 4) == -1.5811, 'The z-test statistic must be ca. -1.581.'
assert np.round(p, 4) == 0.9431, 'The p-value must be ca. 0.9431 for the one-sided right-tail test.'
z_statistic, p = z_test_one_sample(sample_data=sample_data, mu_0=mu_0, sigma=sigma, test_type="left-tail")
assert np.round(z_statistic, 4) == -1.5811, 'The z-test statistic must be ca. -1.581.'
assert np.round(p, 4) == 0.0569, 'The p-value must be ca. 0.0569 for the one-sided left-tail test.'
z_statistic, p = z_test_one_sample(sample_data=sample_data, mu_0=mu_0, sigma=sigma, test_type="two-sided")
assert np.round(z_statistic, 4) == -1.5811, 'The z-test statistic must be ca. -1.581.'
assert np.round(p, 4) == 0.1138, 'The p-value must be ca. 0.1138 for the two-sided test.'
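
For orientation, a one-sample $z$-test with the call signature used above could be sketched as follows. This is not the actual `e2ml` implementation. The name `z_test_one_sample_sketch` and the demo data are hypothetical, and the sketch assumes that the population standard deviation `sigma` is known.

```python
import numpy as np
from scipy import stats


def z_test_one_sample_sketch(sample_data, mu_0, sigma, test_type="two-sided"):
    """Hypothetical sketch of a one-sample z-test (sigma assumed known)."""
    sample_data = np.asarray(sample_data, dtype=float)
    n = sample_data.size
    # Standardize the sample mean under H_0: mu = mu_0.
    z = (sample_data.mean() - mu_0) / (sigma / np.sqrt(n))
    if test_type == "right-tail":
        p = stats.norm.sf(z)            # P(Z >= z)
    elif test_type == "left-tail":
        p = stats.norm.cdf(z)           # P(Z <= z)
    elif test_type == "two-sided":
        p = 2 * stats.norm.sf(abs(z))   # P(|Z| >= |z|)
    else:
        raise ValueError("test_type must be 'right-tail', 'left-tail', or 'two-sided'.")
    return z, p


# Hypothetical demo data: ten observations with a sample mean of 1.75.
demo_sample = np.full(10, 1.75)
z_left, p_left = z_test_one_sample_sketch(demo_sample, mu_0=2, sigma=0.5, test_type="left-tail")
z_right, p_right = z_test_one_sample_sketch(demo_sample, mu_0=2, sigma=0.5, test_type="right-tail")
print(f"z = {z_left:.4f}, left-tail p = {p_left:.4f}, right-tail p = {p_right:.4f}")
```

The left-tail and right-tail $p$-values of the same statistic sum to one, which is a quick sanity check for any implementation.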

We implement the function [`t_test_one_sample`](../e2ml/evaluation/_one_sample_tests.py) in the [`e2ml.evaluation`](../e2ml/evaluation) subpackage. Once the implementation has been completed, we check it for the different test types.

In [13]:
from e2ml.evaluation import t_test_one_sample
sample_data = np.round(stats.norm.rvs(loc=13.5, scale=0.25, size=10, random_state=1), 1)
t_statistic, p = t_test_one_sample(sample_data=sample_data, mu_0=13, test_type="right-tail")
assert np.round(t_statistic, 4) == 4.5898 , 'The t-test statistic must be ca. 4.590.' 
assert np.round(p, 4) == 0.0007, 'The p-value must be ca. 0.0007 for the one-sided right-tail test.' 
t_statistic, p = t_test_one_sample(sample_data=sample_data, mu_0=13, test_type="left-tail")
assert np.round(t_statistic, 4) == 4.5898 , 'The t-test statistic must be ca. 4.590.' 
assert np.round(p, 4) == 0.9993, 'The p-value must be ca. 0.9993 for the one-sided left-tail test.' 
t_statistic, p = t_test_one_sample(sample_data=sample_data, mu_0=13, test_type="two-sided")
assert np.round(t_statistic, 4) == 4.5898 , 'The t-test statistic must be ca. 4.590.' 
assert np.round(p, 4) == 0.0013, 'The p-value must be ca. 0.0013 for the two-sided test.'
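
A one-sample $t$-test with the same interface could be sketched in a similar way. Again, this is not the actual `e2ml` implementation. The name `t_test_one_sample_sketch` and the demo data are hypothetical. The sketch estimates the standard deviation from the sample (with `ddof=1`) and compares the two-sided result against `scipy.stats.ttest_1samp`.

```python
import numpy as np
from scipy import stats


def t_test_one_sample_sketch(sample_data, mu_0, test_type="two-sided"):
    """Hypothetical sketch of a one-sample t-test (sigma unknown)."""
    x = np.asarray(sample_data, dtype=float)
    n = x.size
    # Unbiased sample standard deviation replaces the unknown sigma.
    s = x.std(ddof=1)
    t = (x.mean() - mu_0) / (s / np.sqrt(n))
    df = n - 1  # degrees of freedom of the t-distribution under H_0
    if test_type == "right-tail":
        p = stats.t.sf(t, df)
    elif test_type == "left-tail":
        p = stats.t.cdf(t, df)
    elif test_type == "two-sided":
        p = 2 * stats.t.sf(abs(t), df)
    else:
        raise ValueError("test_type must be 'right-tail', 'left-tail', or 'two-sided'.")
    return t, p


# Hypothetical demo data, used only to compare against SciPy.
rng = np.random.default_rng(1)
demo_sample = rng.normal(loc=13.5, scale=0.25, size=10)
t_sketch, p_sketch = t_test_one_sample_sketch(demo_sample, mu_0=13)
reference = stats.ttest_1samp(demo_sample, 13)
print(f"sketch: t = {t_sketch:.4f}, p = {p_sketch:.6f}")
print(f"scipy:  t = {reference.statistic:.4f}, p = {reference.pvalue:.6f}")
```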

### **3. Example** <a class="anchor" id="example"></a>

Let us assume we have access to the following *independent and identically distributed* (i.i.d.) heart rate measurements $[\mathrm{beats/min}]$ of 40 patients in an *intensive care unit* (ICU):

$124, 111,  96, 104,  89, 106,  94,  48, 117,  61, 117, 104,  72,
86, 126, 103,  97,  49,  78,  52, 119, 107, 131, 112,  78, 132,
80, 139,  87,  44,  40,  60,  40,  80,  41, 103, 102,  44, 115,
103.$

#### **Questions:**
3. (a) Are the heart rates of ICU patients unusual, given that the normal heart rate has a mean of 72 beats/min? Use a significance level of 0.01. Perform a statistical hypothesis test by following the steps presented in the lecture and by using Python.

   _Answer:_<br/>

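   Step 1: Define null and alternative hypothesis
      $H_0$: $\mu = 72$ (the mean heart rate of ICU patients equals the normal mean)
      $H_1$: $\mu \neq 72$ (the mean heart rate of ICU patients differs from the normal mean)

   Step 2: Define test statistic
      Since the population standard deviation is unknown, we use the $t$-statistic $t = (\bar{x} - \mu_0) / (s / \sqrt{n})$ with sample mean $\bar{x}$ and sample standard deviation $s$.

   Step 3: Find sampling distribution of test statistic under H0
      Under $H_0$, the $t$-statistic follows a Student's $t$-distribution with $n - 1 = 39$ degrees of freedom.

   Step 4: Choose significance level
      alpha = 0.01

   Step 5: Evaluate test statistic
      The sample mean is ca. 89.8 beats/min, from which the $t$-statistic is computed.

   Step 6: Calculate p-value
      t_statistic, p-value = t_test_one_sample(sample_data=sample, mu_0=72, test_type='two-sided')

   Step 7: Decide
      if p-value < alpha: reject H0
      else: do not reject H0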


In [22]:
# Perform hypothesis test
X_bar = np.array([124, 111,  96, 104,  89, 106,  94,  48, 117,  61, 117, 104,  72,
86, 126, 103,  97,  49,  78,  52, 119, 107, 131, 112,  78, 132,
80, 139,  87,  44,  40,  60,  40,  80,  41, 103, 102,  44, 115,
103])

# 1. Define null and alternative hypotheses
# H_0: mu = 72
# H_1: mu != 72
mu_0 = 72

# 2. Select appropriate test statistic s (sample mean; the t-test below standardizes it)
s_n = lambda x_n: np.mean(x_n)

# 3. Find sampling distribution for s under H_0
# s ~ N(mu_0, sigma/sqrt(n)) approximately; since sigma is unknown, the standardized
# statistic (s - mu_0) / (std/sqrt(n)) follows a t-distribution with n - 1 degrees of freedom

# 4. Choose significance level alpha
alpha = 0.01

# 5. Evaluate the test statistic s for observed data
s_bar = s_n(X_bar)

# 6. Compute p-value
t_statistic, p = t_test_one_sample(sample_data=X_bar, mu_0=mu_0, test_type="two-sided")

# 7. Make decision
if p < alpha:
    print(f"Reject the null hypothesis at the {alpha*100}% significance level. With a p-value of {np.round(p, 4)} the data provides enough evidence to reject the null hypothesis.")
else:
    print(f"Do not reject the null hypothesis at the {alpha*100}% significance level. With a p-value of {np.round(p, 4)} the data does not provide enough evidence to reject the null hypothesis.")

Reject the null hypothesis at the 1.0% significance level. With a p-value of 0.0004 the data provides enough evidence to reject the null hypothesis.
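
As an independent cross-check of the result above, the same two-sided test can be run with `scipy.stats.ttest_1samp`. The sketch below assumes only SciPy and NumPy. The variable names `heart_rates`, `t_ref`, and `p_ref` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

# Heart rate measurements [beats/min] of the 40 ICU patients from above.
heart_rates = np.array([
    124, 111,  96, 104,  89, 106,  94,  48, 117,  61, 117, 104,  72,
     86, 126, 103,  97,  49,  78,  52, 119, 107, 131, 112,  78, 132,
     80, 139,  87,  44,  40,  60,  40,  80,  41, 103, 102,  44, 115,
    103,
])

# Two-sided one-sample t-test of H_0: mu = 72.
result = stats.ttest_1samp(heart_rates, popmean=72)
t_ref, p_ref = result.statistic, result.pvalue

print(f"sample mean = {heart_rates.mean():.3f}")
print(f"t = {t_ref:.4f}, p = {p_ref:.4f}")
print("reject H_0" if p_ref < 0.01 else "do not reject H_0")
```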


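P-Hacking:
    When the significance level is not fixed in advance, it is tempting to choose it after seeing the data so that the desired result is obtained. Alternatively, one can perform multiple tests and report only the one that gives the desired result. Both practices invalidate the test, so the significance level and the test must be chosen before looking at the data.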
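
The effect of performing multiple tests and keeping only the favorable one can be illustrated with a small simulation. In this sketch, every null hypothesis is true by construction. All parameter values (`alpha`, `n_tests`, `n_repetitions`, `n`) are hypothetical choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

alpha = 0.05          # significance level of each individual test
n_tests = 20          # number of tests an analyst tries per study
n_repetitions = 2000  # number of simulated studies
n = 30                # sample size per test

# All data are drawn from N(0, 1), so H_0: mu = 0 is true for every test.
data = rng.normal(size=(n_repetitions, n_tests, n))
p_values = stats.ttest_1samp(data, popmean=0.0, axis=2).pvalue

# Fraction of studies in which at least one test is "significant".
family_wise_rate = (p_values < alpha).any(axis=1).mean()

# Theoretical value for independent tests: 1 - (1 - alpha)^n_tests.
expected_rate = 1 - (1 - alpha) ** n_tests

print(f"simulated rate of at least one false positive: {family_wise_rate:.3f}")
print(f"theoretical rate:                              {expected_rate:.3f}")
```

Although each single test keeps its false positive rate at `alpha`, the chance of finding at least one "significant" result among many tests is far higher, which is why selectively reporting that result is misleading.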