In [1]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns
import math
from scipy import stats
from scipy.stats import norm
from scipy.stats import chi2
from scipy.stats import t
from scipy.stats import f
from scipy.stats import bernoulli
from scipy.stats import binom
from scipy.stats import nbinom
from scipy.stats import geom
from scipy.stats import poisson
from scipy.stats import uniform
from scipy.stats import randint
from scipy.stats import expon
from scipy.stats import gamma
from scipy.stats import beta
from scipy.stats import weibull_min
from scipy.stats import hypergeom
from scipy.stats import shapiro
from scipy.stats import pearsonr
from scipy.stats import normaltest
from scipy.stats import anderson
from scipy.stats import spearmanr
from scipy.stats import kendalltau
from scipy.stats import chi2_contingency
from scipy.stats import ttest_ind
from scipy.stats import ttest_rel
from scipy.stats import mannwhitneyu
from scipy.stats import wilcoxon
from scipy.stats import kruskal
from scipy.stats import friedmanchisquare
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.stattools import kpss
from statsmodels.stats.weightstats import ztest
from scipy.integrate import quad
from IPython.display import display, Latex

**Statistical Hypothesis:** A statistical hypothesis is usually a statement about a set of parameters of a population distribution.

**$H_0$ (Null Hypothesis):** The null hypothesis is the statistical hypothesis being tested; it is either accepted or rejected in favor of an alternative.

**$H_1$ (Alternative Hypothesis):** The alternative hypothesis is the statement that competes with the null hypothesis; it is the one adopted when $H_0$ is rejected.

**Type $\mathrm{I}$ Error:** A type $\mathrm{I}$ error results if the test incorrectly calls for rejecting $H_0$ when it is in fact true.

$\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true})$

**Type $\mathrm{II}$ Error:** A type $\mathrm{II}$ error results if the test calls for accepting $H_0$ when it is false.

$\beta = P(\text{accept } H_0 \mid H_0 \text{ is not true})$

**Significance Level:** Whenever $H_0$ is true, its probability of being rejected is never greater than $\alpha$. The value $\alpha$, called the level of significance of the test, is usually set in advance, with commonly chosen values being $\alpha = 0.1, 0.05, 0.005$.

**P-value:** The p-value is the probability, computed assuming the null hypothesis ($H_0$) is true, of obtaining results at least as extreme as those actually observed. What counts as "extreme" depends on how the hypothesis is being tested (two-tailed, right-tailed, or left-tailed).

If the p-value is less than the chosen significance level, reject the null hypothesis; that is, the sample gives reasonable evidence in support of the alternative hypothesis. Otherwise, fail to reject $H_0$.
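
As a small illustration of this decision rule, the sketch below computes a two-sided p-value for a test statistic that is standard normal under $H_0$ and compares it with $\alpha$. The values `z_obs = 2.1` and `alpha = 0.05` are made up for the example.

```python
from scipy.stats import norm

alpha = 0.05  # significance level, chosen in advance (assumed value)
z_obs = 2.1   # hypothetical observed statistic, standard normal under H0

# Two-sided p-value: probability under H0 of a statistic at least as
# extreme (in absolute value) as the one observed
p_value = 2 * float(norm.sf(abs(z_obs)))

reject_h0 = p_value < alpha
print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Here `norm.sf` is the survival function $P(Z > z)$, which is numerically safer than `1 - norm.cdf(z)` in the far tail.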

<table style="display: inline-block">
<tr><th>Decision</th><th> 

$H_0$ is actually true

</th><th>

$H_0$ is not true

</th></tr>
<tr><td>

Accept $H_0$ 

</td><td> 

$1-\alpha$

</td><td> 

$\beta$

</td></tr>
<tr><td>

Reject $H_0$ 

</td><td> 

$ \alpha $

</td><td> 

$1- \beta$

</td></tr>
</table>
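
The entries of this table can be checked by simulation. The following is a minimal sketch, assuming a two-sided test on the mean of a normal population with known standard deviation (the $Z_0$ statistic used later in this notebook). Every numeric setting, including the alternative mean `mu_alt`, is an assumption chosen for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# All values below are assumptions for the illustration
alpha, mu0, sigma, n = 0.05, 0.0, 1.0, 25
mu_alt = 0.5      # true mean used for the "H0 is not true" column
n_sim = 20_000    # number of simulated samples per scenario

z_crit = norm.ppf(1 - alpha / 2)  # two-sided critical value

def rejection_rate(true_mu):
    """Fraction of simulated samples for which the z-test rejects H0."""
    samples = rng.normal(true_mu, sigma, size=(n_sim, n))
    z0 = (samples.mean(axis=1) - mu0) / (sigma / np.sqrt(n))
    return float(np.mean(np.abs(z0) > z_crit))

type_1_rate = rejection_rate(mu0)         # estimates alpha
type_2_rate = 1 - rejection_rate(mu_alt)  # estimates beta at mu = mu_alt

print(f"estimated alpha = {type_1_rate:.3f}")
print(f"estimated beta  = {type_2_rate:.3f}, power = {1 - type_2_rate:.3f}")
```

Note that $\beta$ is not a single number: it depends on how far the true mean lies from $\mu_0$, so a different `mu_alt` gives a different estimate.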

**Steps of a hypothesis test:**

1. Specify the null and alternative hypotheses.
2. Collect a random sample from the population.
3. Calculate the test statistic and the corresponding p-value.
4. Decide whether to reject or fail to reject the null hypothesis.
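
The four steps can be sketched in code as follows. This example uses the known-variance $Z_0$ statistic introduced in the next section; the hypothesized mean, the standard deviation, and the simulated sample (drawn with a true mean of 52) are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm

# Step 1: specify the hypotheses, H0: mu = 50 versus H1: mu != 50
mu0, sigma, alpha = 50.0, 4.0, 0.05  # assumed values

# Step 2: collect a random sample (simulated here; true mean is an assumption)
rng = np.random.default_rng(42)
sample = rng.normal(loc=52.0, scale=sigma, size=30)

# Step 3: calculate the test statistic and the two-sided p-value
z0 = float((sample.mean() - mu0) / (sigma / np.sqrt(sample.size)))
p_value = 2 * float(norm.sf(abs(z0)))

# Step 4: decide
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z0 = {z0:.3f}, p-value = {p_value:.4f} -> {decision}")
```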

# **Test Concerning the Mean of a Normal Population with Known Standard Deviation:**

**A. Two Tailed Test:**

Suppose that $X_1, X_2, ..., X_n$ is a sample of size $n$ from a normal distribution having an unknown mean $\mu$ and a known variance $\sigma^2$.

$X_1, X_2, ..., X_n \sim N( \mu, \sigma^2)$

$H_0:\ \mu = \mu_0$

$H_1:\ \mu  \neq \mu_0$

$\\ $

Test statistic:

$Z_0  \equiv \frac{\overline{X}-\mu_0}{\frac{\sigma}{\sqrt{n}}}$

$\\ $

Significance level = $\alpha$

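We accept (that is, fail to reject) $H_0$ if any one of the following equivalent conditions holds, where $Z_{\frac{\alpha}{2}}$ is the value satisfying $P(Z > Z_{\frac{\alpha}{2}}) = \frac{\alpha}{2}$ for a standard normal $Z$: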

1. $-\ Z_{\frac{\alpha}{2}}\ <\ Z_0 = \frac{\overline{X}-\mu_0}{\frac{\sigma}{\sqrt{n}}}\ <\ Z_{\frac{\alpha}{2}}$

2. $\mu_0\ -\ Z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\ <\ \overline{X}\ <\ \mu_0\ +\ Z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$

3. $|\overline{X}-\mu_0| < Z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$

4. p-value $ = 2 \times P(Z \geq |Z_0|) > \alpha$
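
The four conditions above are different ways of writing the same acceptance region, so they should always agree. The sketch below evaluates each one for given summary statistics; the helper name `two_tailed_accept_checks` and the example numbers are hypothetical.

```python
import math
from scipy.stats import norm

def two_tailed_accept_checks(xbar, mu0, sigma, n, alpha=0.05):
    """Evaluate the four equivalent acceptance conditions of the two-tailed z-test."""
    se = sigma / math.sqrt(n)                 # standard error of the sample mean
    z0 = (xbar - mu0) / se                    # test statistic Z_0
    z_crit = float(norm.ppf(1 - alpha / 2))   # Z_{alpha/2}
    p_value = 2 * float(norm.sf(abs(z0)))
    return (
        -z_crit < z0 < z_crit,                          # condition 1
        mu0 - z_crit * se < xbar < mu0 + z_crit * se,   # condition 2
        abs(xbar - mu0) < z_crit * se,                  # condition 3
        p_value > alpha,                                # condition 4
    )

# Z_0 = 1.5: inside the acceptance region, so every condition holds
print(two_tailed_accept_checks(xbar=10.3, mu0=10.0, sigma=1.0, n=25))
# Z_0 = 2.5: outside the acceptance region, so every condition fails
print(two_tailed_accept_checks(xbar=10.5, mu0=10.0, sigma=1.0, n=25))
```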

<br>
<br>

**B. One-sided tests:**

**Right-tailed test:**

Suppose that $X_1, X_2, ..., X_n$ is a sample of size $n$ from a normal distribution having an unknown mean $\mu$ and a known variance $\sigma^2$.

$X_1, X_2, ..., X_n \sim N( \mu, \sigma^2)$

$H_0:\ \mu = \mu_0 \quad \text{or} \quad H_0:\ \mu \leq \mu_0$

$H_1:\ \mu  > \mu_0$

$\\ $

Test statistic:

$Z_0  \equiv \frac{\overline{X}-\mu_0}{\frac{\sigma}{\sqrt{n}}}$

$\\ $

Significance level = $\alpha$

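We accept (that is, fail to reject) $H_0$ if any one of the following equivalent conditions holds, where $Z_{\alpha}$ is the value satisfying $P(Z > Z_{\alpha}) = \alpha$ for a standard normal $Z$: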

1. $ -\infty <\ Z_0 = \frac{\overline{X}-\mu_0}{\frac{\sigma}{\sqrt{n}}}\ <\ Z_{\alpha} $

2. $- \infty \ <\ \overline{X}\ <\ \mu_0\ +\ Z_{\alpha} \frac{\sigma}{\sqrt{n}}$

3. p-value $ = P(Z \geq Z_0) > \alpha$
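
A minimal sketch of the right-tailed test from summary statistics follows. The function name `right_tailed_z_test` and the example values are hypothetical; only the upper tail counts as evidence against $H_0$.

```python
import math
from scipy.stats import norm

def right_tailed_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Return (z0, p_value, reject_h0) for H1: mu > mu0 with known sigma."""
    z0 = (xbar - mu0) / (sigma / math.sqrt(n))
    p_value = float(norm.sf(z0))          # P(Z >= z0), upper tail only
    z_crit = float(norm.ppf(1 - alpha))   # Z_alpha
    reject_h0 = z0 >= z_crit              # complement of acceptance condition 1
    return z0, p_value, reject_h0

# Hypothetical data: sample mean 52 from n = 36 observations, sigma = 6
z0, p_value, reject_h0 = right_tailed_z_test(xbar=52.0, mu0=50.0, sigma=6.0, n=36)
print(f"z0 = {z0:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

A sample mean below $\mu_0$ gives a negative $Z_0$ and a p-value above 0.5, so this test can never reject $H_0$ in that direction.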


<br>
<br>

**Left-tailed test:**

Suppose that $X_1, X_2, ..., X_n$ is a sample of size $n$ from a normal distribution having an unknown mean $\mu$ and a known variance $\sigma^2$.

$X_1, X_2, ..., X_n \sim N( \mu, \sigma^2)$

$H_0:\ \mu = \mu_0 \quad \text{or} \quad H_0:\ \mu \geq \mu_0$

$H_1:\ \mu  < \mu_0$

$\\ $

Test statistic:

$Z_0  \equiv \frac{\overline{X}-\mu_0}{\frac{\sigma}{\sqrt{n}}}$

$\\ $

Significance level = $\alpha$

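We accept (that is, fail to reject) $H_0$ if any one of the following equivalent conditions holds, where $Z_{\alpha}$ is the value satisfying $P(Z > Z_{\alpha}) = \alpha$ for a standard normal $Z$: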

1. $-\ Z_{\alpha}\ <\ Z_0 = \frac{\overline{X}-\mu_0}{\frac{\sigma}{\sqrt{n}}}\ <\ \infty $

2. $\ \mu_0\ -\ Z_{\alpha} \frac{\sigma}{\sqrt{n}}\ <\ \overline{X}\ <\ \infty$

3. p-value $ = P(Z \leq Z_0) > \alpha$
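
The left-tailed case can be sketched the same way. This version also returns the sample-mean threshold from condition 2, which shows how small $\overline{X}$ must be before $H_0$ is rejected. The function name `left_tailed_z_test` and the example values are hypothetical.

```python
import math
from scipy.stats import norm

def left_tailed_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Return (z0, p_value, xbar_threshold, reject_h0) for H1: mu < mu0 with known sigma."""
    se = sigma / math.sqrt(n)
    z0 = (xbar - mu0) / se
    p_value = float(norm.cdf(z0))           # P(Z <= z0), lower tail only
    z_crit = float(norm.ppf(1 - alpha))     # Z_alpha
    xbar_threshold = mu0 - z_crit * se      # accept H0 while xbar stays above this
    reject_h0 = xbar <= xbar_threshold
    return z0, p_value, xbar_threshold, reject_h0

# Hypothetical data: sample mean 48.5 from n = 36 observations, sigma = 6
z0, p_value, xbar_threshold, reject_h0 = left_tailed_z_test(
    xbar=48.5, mu0=50.0, sigma=6.0, n=36
)
print(f"z0 = {z0:.2f}, p-value = {p_value:.4f}")
print(f"reject H0 only if the sample mean is at most {xbar_threshold:.3f}: {reject_h0}")
```

In this example $Z_0 = -1.5$ lies above $-Z_{0.05} \approx -1.645$, so $H_0$ is not rejected at the 5% level even though the sample mean is below $\mu_0$.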