In [1]:
from scipy.stats import binom
from IPython.display import display, Latex

# [Hypothesis Testing](https://en.wikipedia.org/wiki/Statistical_hypothesis_testing)

A **statistical hypothesis test** is a method of [statistical inference](https://en.wikipedia.org/wiki/Statistical_inference "Statistical inference") used to determine a possible conclusion from two different, and likely conflicting, hypotheses.

In a statistical hypothesis test, a [null hypothesis](https://en.wikipedia.org/wiki/Null_hypothesis "Null hypothesis") and an [alternative hypothesis](https://en.wikipedia.org/wiki/Alternative_hypothesis "Alternative hypothesis") are proposed for the probability distribution of the data. If the sample obtained has a probability of occurrence less than the pre-specified threshold probability, the [significance level](https://en.wikipedia.org/wiki/Significance_level "Significance level"), given the null hypothesis is true, the difference between the sample and the null hypothesis is deemed [_statistically significant_](https://en.wikipedia.org/wiki/Statistically_significant "Statistically significant"). The hypothesis test may then lead to the rejection of the null hypothesis and acceptance of the alternative hypothesis.

The process of distinguishing between the null hypothesis and the alternative hypothesis is aided by considering [Type I error](https://en.wikipedia.org/wiki/Type_I_and_type_II_errors "Type I and type II errors") and [Type II error](https://en.wikipedia.org/wiki/Type_I_and_type_II_errors "Type I and type II errors") which are controlled by the pre-specified significance level.

Hypothesis tests based on statistical significance are another way of expressing [confidence intervals](https://en.wikipedia.org/wiki/Confidence_interval "Confidence interval") (more precisely, confidence sets). In other words, every hypothesis test based on significance can be obtained via a confidence interval, and every confidence interval can be obtained via a hypothesis test based on significance.[[1]](https://en.wikipedia.org/wiki/Statistical_hypothesis_testing#cite_note-1)

---

## [p-value](https://en.wikipedia.org/wiki/P-value)

In [null-hypothesis significance testing](https://en.wikipedia.org/wiki/Statistical_hypothesis_testing "Statistical hypothesis testing"), the **_p_-value**[[note 1]](https://en.wikipedia.org/wiki/P-value#cite_note-2) is the probability of obtaining test results at least as extreme as the [results actually observed](https://en.wikipedia.org/wiki/Realization_(probability) "Realization (probability)"), under the assumption that the [null hypothesis](https://en.wikipedia.org/wiki/Null_hypothesis "Null hypothesis") is correct.[[2]](https://en.wikipedia.org/wiki/P-value#cite_note-3)[[3]](https://en.wikipedia.org/wiki/P-value#cite_note-ASA-4) A very small _p_-value means that such an extreme observed [outcome](https://en.wikipedia.org/wiki/Outcome_(probability) "Outcome (probability)") would be very unlikely under the null hypothesis. Reporting _p_-values of statistical tests is common practice in [academic publications](https://en.wikipedia.org/wiki/Academic_publishing "Academic publishing") of many quantitative fields. Since the precise meaning of _p_-value is hard to grasp, [misuse is widespread](https://en.wikipedia.org/wiki/Misuse_of_p-values "Misuse of p-values") and has been a major topic in [Metascience](https://en.wikipedia.org/wiki/Metascience "Metascience").[[4]](https://en.wikipedia.org/wiki/P-value#cite_note-5)[[5]](https://en.wikipedia.org/wiki/P-value#cite_note-6)

### Example 1: One-tailed

It is commonly said that $10\%$ of people are left-handed, but Lilianna suspected that a higher proportion of art students at her university are left-handed. To test this theory, she took a sample of $150$ art students and found that $\hat p=14\%$ of the sample was left-handed.

To see how likely a sample like this was to happen by random chance alone, Lilianna performed a simulation. She took a sample of $n=150$ students from a population where $10\%$ of the students were left-handed, and she recorded what proportion of the sample was left-handed. She repeated this process for a total of $50$ samples. Here are the sample proportions from her $50$ samples:

![](https://raw.githubusercontent.com/ZacksAmber/PicGo/master/img/20220221023934.png)

She wants to test $H_0: p=10\%$ vs. $H_\text{a}: p>10\%$ where $p$ is the proportion of art students at her university who are left-handed.

**Based on these simulated results, what is the approximate $p$-value of the test?**  
_Note: The sample result was $\hat p=14\%$._

In [2]:
# 4 of the 50 simulated sample proportions are >= 0.14 (counted from the plot above)
trials = 50
p_value = 4 / trials
p_value

0.08
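
The simulation described in this example can be sketched with NumPy. This is only an illustration under stated assumptions: the seed, the variable names, and the use of `numpy.random.default_rng` are not part of the original exercise, and because the draws are random the simulated value will generally differ from the 4 out of 50 counted above. The exact binomial tail is included for comparison.

```python
import numpy as np
from scipy.stats import binom

n, p0 = 150, 0.10          # sample size and proportion under H0
observed_count = 21        # 14% of 150 students
n_sims = 50                # number of simulated samples

rng = np.random.default_rng(seed=0)  # arbitrary seed (assumption)
# Each draw is the number of left-handed students in one simulated sample.
counts = rng.binomial(n, p0, size=n_sims)

# One-tailed p-value: share of simulated samples at least as extreme as observed.
sim_p_value = np.mean(counts >= observed_count)

# Exact counterpart: P(X >= 21) for X ~ Binomial(150, 0.10).
exact_p_value = binom.sf(observed_count - 1, n, p0)
print(sim_p_value, exact_p_value)
```

Comparing counts rather than proportions avoids floating-point comparisons against `0.14`.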

### Example 2: Two-tailed

A large school district knows that $75\%$ of students in previous years rode the bus to school. Administrators wondered if that figure was still accurate, so they took a random sample of $n=80$ students and found that $\hat p=65\%$ of those sampled rode the bus to school.

To see how likely a sample like this was to happen by random chance alone, the school district performed a simulation. They simulated $120$ samples of $n=80$ students from a large population where $75\%$ of the students rode the bus to school. They recorded the proportion of students who rode the bus in each sample. Here are the sample proportions from their $120$ samples:

![](https://raw.githubusercontent.com/ZacksAmber/PicGo/master/img/20220221024540.png)

They want to test $H_0: p=75\%$ vs. $H_\text{a}: p \neq 75\%$ where $p$ is the true proportion of students in this district that ride the bus to school.

**Based on these simulated results, what is the approximate $p$-value of the test?**  
_Note: The sample result was $\hat p=65\%$._

In [3]:
# results at least as extreme as 0.65 in either direction:
# 4 samples with p_hat <= 0.65 plus 3 samples with p_hat >= 0.85
trials = 120
p_value = 7 / trials
p_value

0.058333333333333334
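
A two-tailed version of the same idea can be sketched as follows. It counts simulated samples that land at least as far from the null value ($75\%$ of $80$ is $60$ students) as the observed $52$ students, in either direction. The seed and variable names are assumptions for illustration, and the simulated value will not match the $7$ out of $120$ counted above.

```python
import numpy as np
from scipy.stats import binom

n, p0 = 80, 0.75                   # sample size and proportion under H0
low_count, high_count = 52, 68     # 65% and 85% of 80 students
n_sims = 120                       # number of simulated samples

rng = np.random.default_rng(seed=0)  # arbitrary seed (assumption)
counts = rng.binomial(n, p0, size=n_sims)

# Two-tailed p-value: both tails at least 8 students away from the expected 60.
sim_p_value = np.mean((counts <= low_count) | (counts >= high_count))

# Exact counterpart: P(X <= 52) + P(X >= 68) for X ~ Binomial(80, 0.75).
exact_p_value = binom.cdf(low_count, n, p0) + binom.sf(high_count - 1, n, p0)
print(sim_p_value, exact_p_value)
```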

## [Binomial Distribution](https://en.wikipedia.org/wiki/Binomial_distribution)

> [Probability mass function](https://en.wikipedia.org/wiki/Probability_mass_function)<br>
> In [probability](https://en.wikipedia.org/wiki/Probability_theory "Probability theory") and [statistics](https://en.wikipedia.org/wiki/Statistics "Statistics"), a **probability mass function** is a function that gives the probability that a [discrete random variable](https://en.wikipedia.org/wiki/Discrete_random_variable "Discrete random variable") is exactly equal to some value.[[1]](https://en.wikipedia.org/wiki/Probability_mass_function#cite_note-1) Sometimes it is also known as the discrete density function. The probability mass function is often the primary means of defining a [discrete probability distribution](https://en.wikipedia.org/wiki/Discrete_probability_distribution "Discrete probability distribution"), and such functions exist for either [scalar](https://en.wikipedia.org/wiki/Scalar_variable "Scalar variable") or [multivariate random variables](https://en.wikipedia.org/wiki/Multivariate_random_variable "Multivariate random variable") whose [domain](https://en.wikipedia.org/wiki/Domain_of_a_function "Domain of a function") is discrete.<br>
> A probability mass function differs from a [probability density function](https://en.wikipedia.org/wiki/Probability_density_function) (PDF) in that the latter is associated with continuous rather than discrete random variables. A PDF must be [integrated](https://en.wikipedia.org/wiki/Integration_(mathematics) "Integration (mathematics)") over an interval to yield a probability.[[2]](https://en.wikipedia.org/wiki/Probability_mass_function#cite_note-:0-2)<br>
> The value of the random variable having the largest probability mass is called the [mode](https://en.wikipedia.org/wiki/Mode_(statistics) "Mode (statistics)").
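
As a small illustration of these properties, the sketch below evaluates a binomial PMF with `scipy.stats.binom.pmf`. The parameters $n=4$ and $p=0.5$ are chosen only for illustration.

```python
import numpy as np
from scipy.stats import binom

n, p = 4, 0.5
k = np.arange(n + 1)            # possible values 0, 1, ..., 4
pmf = binom.pmf(k, n, p)        # P(X = k) for each k

total = pmf.sum()               # a PMF sums to 1 over its support
mode = int(k[np.argmax(pmf)])   # value with the largest probability mass

for value, prob in zip(k, pmf):
    print(f"P(X={value}) = {prob:.4f}")
print(total, mode)
```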

> [Cumulative distribution function](https://en.wikipedia.org/wiki/Cumulative_distribution_function)<br>
> In [probability theory](https://en.wikipedia.org/wiki/Probability_theory "Probability theory") and [statistics](https://en.wikipedia.org/wiki/Statistics "Statistics"), the **cumulative distribution function** (**CDF**) of a real-valued [random variable](https://en.wikipedia.org/wiki/Random_variable "Random variable") $X$, or just **distribution function** of $X$, evaluated at $x$, is the probability that $X$ will take a value less than or equal to $x$.[[1]](https://en.wikipedia.org/wiki/Cumulative_distribution_function#cite_note-1)<br>
> In the case of a scalar [continuous distribution](https://en.wikipedia.org/wiki/Continuous_distribution "Continuous distribution"), it gives the area under the [probability density function](https://en.wikipedia.org/wiki/Probability_density_function "Probability density function") from minus infinity to $x$. Cumulative distribution functions are also used to specify the distribution of [multivariate random variables](https://en.wikipedia.org/wiki/Multivariate_random_variable "Multivariate random variable").
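
For a discrete variable, the CDF is the running sum of the PMF. The sketch below checks this for a binomial distribution; the parameters $n=4$ and $p=0.5$ are illustrative assumptions.

```python
import numpy as np
from scipy.stats import binom

n, p = 4, 0.5
k = np.arange(n + 1)
pmf = binom.pmf(k, n, p)        # P(X = k)
cdf = binom.cdf(k, n, p)        # P(X <= k)

# The CDF at k equals the sum of the PMF over 0..k.
matches = np.allclose(np.cumsum(pmf), cdf)

left_tail = cdf[1]              # P(X <= 1) = 0.0625 + 0.25
print(matches, left_tail)
```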

### Example 1: PMF

Olga has a pair of dogs, and she noticed they usually breed more male puppies than females. To check that, she bred them again and obtained $4$ puppies, and _all_ of them were males.

Let's test the hypothesis that **each puppy has an equal chance of $50\%$ of being either male or female** versus the alternative that the chance of a male puppy is _greater_.

**Assuming the hypothesis is correct, what is the probability of having 4 male puppies out of $4$? Round your answer, if necessary, to the nearest tenth of a percent.**

Let's agree that if the observed outcome has a probability _less_ than $1\%$ under the tested hypothesis, we will reject the hypothesis.

In [4]:
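# Analytic (exact) calculation: P(X = 4) for X ~ Binomial(n=4, p=0.5)
k, n, p = 4, 4, 0.5
precision = 2
pmf = binom.pmf(k, n, p)
pmf = round(pmf * 100, precision)  # convert to a percentage
# raw f-string so that "\%" reaches LaTeX without an invalid-escape warning
display(Latex(rf'$P(X=4)={pmf}\%$'))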

<IPython.core.display.Latex object>

Conclusion: Since 6.25% is greater than 1%, we cannot reject $H_0$.
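
For reference, the same value follows by hand from the binomial probability mass function with $n=4$, $k=4$ and $p=0.5$:

$$
P(X=k)=\binom{n}{k}\,p^{k}(1-p)^{n-k},
\qquad
P(X=4)=\binom{4}{4}\,(0.5)^{4}\,(0.5)^{0}=0.0625=6.25\%.
$$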

### Example 2: CDF for left tail

Mateus’s bank issued an advertisement saying that $90\%$ of its customers are satisfied with the bank’s services. Since he himself wasn't very satisfied, he suspected the ad is false. He surveyed a random sample of $80$ of the bank’s customers, and found that only 80% were satisfied.

Let's test the hypothesis that **the actual percentage of satisfied customers is $90\%$** versus the alternative that the actual percentage is _lower_ than that.

The table below sums up the results of $1000$ simulations, each simulating a sample of $80$ customers, assuming there are $90\%$ satisfied customers.

**According to the simulations, what is the probability of getting a sample with $80\%$ satisfied customers or less?**

Let's agree that if the observed outcome has a probability _less_ than $1\%$ under the tested hypothesis, we will reject the hypothesis.

|Measured % of satisfied customers|Frequency|
|:-:|:-:|
|80|5|
|82.5|24|
|85|72|
|87.5|181|
|90|281|
|92.5|272|
|95|136|
|97.5|27|
|100|2|

In [5]:
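# Analytic (exact) calculation: P(X <= 64) for X ~ Binomial(n=80, p=0.9)
k, n, p = round(80 * 0.8), 80, 0.9  # 80% of 80 customers = 64 satisfied
precision = 2
cdf = binom.cdf(k, n, p)
cdf = round(cdf * 100, precision)  # convert to a percentage
display(Latex(rf'$P(X \le 80\%)={cdf}\%$'))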

<IPython.core.display.Latex object>

In [6]:
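# Simulation: share of the 1000 simulated samples with <= 80% satisfied customers
trials = 1000
c_freq = 5  # only the 80% row of the table is at or below 80%
c_prob = c_freq / trials * 100
c_prob = round(c_prob, precision)
display(Latex(rf'$P(X \le 80\%)={c_prob}\%$'))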

<IPython.core.display.Latex object>

Conclusion: Since 0.5% is less than 1%, we should reject $H_0$.
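
The left-tail count can also be taken from the whole table instead of a single hand-picked row. The sketch below copies the frequencies from the table above; the dictionary layout and the names `freq`, `alpha` and `reject_h0` are illustrative choices.

```python
# Frequencies copied from the table: measured % satisfied -> number of simulations
freq = {80: 5, 82.5: 24, 85: 72, 87.5: 181, 90: 281,
        92.5: 272, 95: 136, 97.5: 27, 100: 2}

trials = sum(freq.values())                 # total number of simulations
left_tail = sum(count for pct, count in freq.items() if pct <= 80)
sim_prob = left_tail / trials               # estimated P(sample % <= 80%)

alpha = 0.01                                # agreed rejection threshold (1%)
reject_h0 = sim_prob < alpha
print(trials, left_tail, sim_prob, reject_h0)
```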

### Example 3: CDF for right tail

Jan has two brothers: Jonas and Niklas. Every day Niklas draws a name out of a hat to randomly select one of the three brothers to wash the dishes. Jan suspected that Niklas is cheating, so he kept track of the draws, and found that out of 11 draws, Jonas got picked 6 times.

Let's test the hypothesis that **each brother has an equal chance of $\displaystyle \frac{1}{3}$ of getting picked in each draw** versus the alternative that Jonas's probability is _greater_.

The table below sums up the results of $1000$ simulations, each simulating 11 draws with a probability of $\displaystyle \frac{1}{3}$ of Jonas getting picked.

**According to the simulations, what is the probability of Jonas getting picked 6 times or more out of 11?**

Let's agree that if the observed outcome has a probability _less_ than $1\%$ under the tested hypothesis, we will reject the hypothesis.

|# of times Jonas got picked out of 11|Frequency|
|:-:|:-:|
|0|13|
|1|64|
|2|159|
|3|238|
|4|238|
|5|167|
|6|83|
|7|30|
|8|7|
|9|1|
|10|0|
|11|0|

In [7]:
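# Analytic (exact) calculation: P(X >= 6) = 1 - P(X <= 5) for X ~ Binomial(n=11, p=1/3)
k, n, p = 6-1, 11, 1/3
precision = 2
cdf = 1 - binom.cdf(k, n, p)  # right-tail probability
cdf = round(cdf * 100, precision)  # convert to a percentage
display(Latex(rf'$P(X \ge 6)={cdf}\%$'))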

<IPython.core.display.Latex object>

In [8]:
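# Simulation: share of the 1000 simulated runs in which Jonas was picked 6 or more times
trials = 1000
c_freq = 83 + 30 + 7 + 1 + 0 + 0  # table rows for 6, 7, 8, 9, 10 and 11 picks
c_prob = c_freq / trials * 100
c_prob = round(c_prob, precision)
display(Latex(rf'$P(X \ge 6)={c_prob}\%$'))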

<IPython.core.display.Latex object>

Conclusion: Since 12.1% is greater than 1%, we cannot reject $H_0$.
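
As a cross-check, the right tail can be computed from the full frequency table and compared with the exact binomial tail. `scipy.stats.binom.sf(k, n, p)` returns $P(X > k)$, so `sf(5, ...)` gives $P(X \ge 6)$ without the manual `1 - cdf` step. The dictionary and variable names below are illustrative.

```python
from scipy.stats import binom

# Frequencies copied from the table: times Jonas was picked -> number of simulations
freq = {0: 13, 1: 64, 2: 159, 3: 238, 4: 238, 5: 167,
        6: 83, 7: 30, 8: 7, 9: 1, 10: 0, 11: 0}

trials = sum(freq.values())                 # total number of simulations
right_count = sum(count for picks, count in freq.items() if picks >= 6)
sim_prob = right_count / trials             # simulated P(X >= 6)

exact_prob = binom.sf(5, 11, 1/3)           # exact P(X >= 6) = P(X > 5)

alpha = 0.01                                # agreed rejection threshold (1%)
reject_h0 = sim_prob < alpha
print(sim_prob, exact_prob, reject_h0)
```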