Hypothesis testing is crucial in inferential statistics for inferring population parameters from sample data. It involves two key components:

Null Hypothesis: a statement that there is no difference or effect in the population.
Alternative Hypothesis: a statement that contradicts the null hypothesis, asserting that there is a difference or effect.
This notebook covers several hypothesis tests:

1. One Population Proportion
2. Difference in Population Proportions
3. One Population Mean
4. Difference in Population Means
We'll also introduce functions from the statsmodels Python package, aiding in the calculation of t-statistics, z-statistics, and corresponding p-values for hypothesis testing.

In [11]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import statsmodels.api as sm

One Population Proportion
Research Question
In previous years, 52% of parents believed that electronics and social media were the cause of their teenager’s lack of sleep. Do more parents today believe that their teenager’s lack of sleep is caused by electronics and social media?

Population: Parents with a teenager (age 13-18)
Parameter of Interest: p

Null Hypothesis: p = 0.52
Alternative Hypothesis: p > 0.52 (note that this is a one-sided test)

Data: 1018 people were surveyed. 56% of those surveyed believe that their teenager’s lack of sleep is caused by electronics and social media.

In [None]:
n = 1018
pnull = .52 
phat = .56

In [None]:
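# Arguments: number of "yes" responses, sample size, null proportion; 'larger' makes the test one-sided
sm.stats.proportions_ztest(phat * n, n, pnull, alternative='larger')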

Based on the small p-value from the z-test, we reject the null hypothesis: a sample proportion of 56% would be unlikely if the true proportion of parents attributing their teenager's lack of sleep to electronics and social media were still 52%. While we don't formally accept the alternative hypothesis, the result supports the claim that this proportion now exceeds 52%.
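To make the calculation behind the test less of a black box, the z-statistic and one-sided p-value can be computed by hand. The sketch below uses only the Python standard library and the numbers from the survey. It shows two variants: the textbook version, which builds the standard error from the null proportion, and a version that uses the sample proportion instead. To our understanding of the statsmodels documentation, proportions_ztest uses the sample proportion by default and accepts a prop_var argument to switch to the null value, but treat that as an assumption to verify. The names se_null, se_sample, and so on are ours, for illustration only.

```python
from math import sqrt
from statistics import NormalDist

n = 1018       # sample size from the survey
pnull = 0.52   # proportion under the null hypothesis
phat = 0.56    # observed sample proportion

# Standard error built from the null proportion (textbook one-proportion z-test)
se_null = sqrt(pnull * (1 - pnull) / n)
z_null = (phat - pnull) / se_null

# Standard error built from the sample proportion instead
se_sample = sqrt(phat * (1 - phat) / n)
z_sample = (phat - pnull) / se_sample

# One-sided p-values: P(Z > z) under the standard normal distribution
p_null = 1 - NormalDist().cdf(z_null)
p_sample = 1 - NormalDist().cdf(z_sample)

print(f"null-based SE:   z = {z_null:.3f}, p = {p_null:.4f}")
print(f"sample-based SE: z = {z_sample:.3f}, p = {p_sample:.4f}")
```

Both variants give a z-statistic of roughly 2.5 and a p-value below 0.01, so the conclusion does not depend on which standard error is used here.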

Difference in Population Proportions
Research Question:
Is there a significant difference between the proportions of parents of black children and parents of Hispanic children who report that their child has had some swimming lessons?

Populations:
All parents of black children aged 6-18 and all parents of Hispanic children aged 6-18.

Parameter of Interest:
The difference between the proportions of parents (p1 - p2), where p1 represents parents of black children and p2 represents parents of Hispanic children.

Null Hypothesis:
The difference in proportions of parents who report their child has had swimming lessons between black and Hispanic children is zero.

Alternative Hypothesis:
There is a difference in proportions of parents who report their child has had swimming lessons between black and Hispanic children.

Data:

247 Parents of Black Children: 36.8% report their child has had some swimming lessons.
308 Parents of Hispanic Children: 38.9% report their child has had some swimming lessons.

In [None]:
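n1 = 247
p1 = .37  # 36.8% rounded

n2 = 308
p2 = .39  # 38.9% rounded

# Simulate one 0/1 response per parent; the draws are random, so the result varies between runs
population1 = np.random.binomial(1, p1, n1)
population2 = np.random.binomial(1, p2, n2)
# Two-sample t-test on the simulated responses; returns (t statistic, p-value, degrees of freedom)
sm.stats.ttest_ind(population1, population2)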

Given the relatively high p-value (approximately 0.768 in this run; the exact value changes between runs because the responses are randomly simulated), we fail to reject the null hypothesis. There is insufficient evidence to conclude that the population proportions of parents reporting that their child has had swimming lessons differ between parents of black children and parents of Hispanic children.
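Because the test above runs on simulated responses, its p-value is not reproducible. As a deterministic cross-check, here is a minimal sketch of a pooled two-proportion z-test computed directly from the reported figures, using only the standard library. This is a different technique from the t-test on simulated data used above, so its p-value will not match 0.768. The counts x1 and x2 are our reconstruction from the reported percentages, not values given in the data.

```python
from math import sqrt
from statistics import NormalDist

n1, p1 = 247, 0.368   # parents of black children
n2, p2 = 308, 0.389   # parents of Hispanic children

# Approximate "yes" counts, reconstructed from the reported percentages (an assumption)
x1 = round(p1 * n1)
x2 = round(p2 * n2)

# Sample proportions and the pooled proportion under H0: p1 == p2
phat1, phat2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)

# Standard error of the difference under the null hypothesis
se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (phat1 - phat2) / se

# Two-sided p-value: probability of a difference at least this extreme in either direction
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.3f}, p = {p_value:.3f}")
```

The p-value from this calculation is also well above 0.05, so it leads to the same decision: we fail to reject the null hypothesis.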


One Population Mean
Research Question:
Is the average cartwheel distance (in inches) for adults greater than 80 inches?

Population:
All adults.

Parameter of Interest:
The population mean cartwheel distance.

Null Hypothesis:
The population mean cartwheel distance is equal to 80 inches.

Alternative Hypothesis:
The population mean cartwheel distance is greater than 80 inches.

Data:
There are 25 adult participants.

In [None]:
cwdata = np.array([80.57, 98.96, 85.28, 83.83, 69.94, 89.59, 91.09, 66.25, 91.21, 82.7 , 73.54, 81.99, 54.01, 
                 82.89, 75.88, 98.32, 107.2 , 85.53, 79.08, 84.3 , 89.32, 86.35, 78.98, 92.26, 87.01])

In [None]:
n = len(cwdata)
mean = cwdata.mean()
sd = cwdata.std(ddof=1)  # sample standard deviation; NumPy's default ddof=0 gives the population formula
(n, mean, sd)

In [None]:
sm.stats.ztest(cwdata, value = 80, alternative = "larger")

Given that the p-value (0.0394) is less than the conventional significance level of 0.05, we reject the null hypothesis that the mean cartwheel distance for adults (a population parameter) is equal to 80 inches. The data provide evidence for the alternative hypothesis that the mean cartwheel distance is greater than 80 inches. Note that we passed alternative = "larger" to the z-test, which makes it a one-sided test.
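The z-test above can be reproduced by hand to show where the p-value comes from. The sketch below uses only the standard library: it divides the gap between the sample mean and 80 by the standard error, which is the sample standard deviation (dividing by n - 1) over the square root of n. We assume this matches the default behaviour of sm.stats.ztest. The variable names are ours. With only 25 observations, a t-test is a common alternative and would give a somewhat larger p-value, since the t distribution has heavier tails.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

cwdata = [80.57, 98.96, 85.28, 83.83, 69.94, 89.59, 91.09, 66.25, 91.21,
          82.7, 73.54, 81.99, 54.01, 82.89, 75.88, 98.32, 107.2, 85.53,
          79.08, 84.3, 89.32, 86.35, 78.98, 92.26, 87.01]

n = len(cwdata)
xbar = mean(cwdata)   # sample mean
s = stdev(cwdata)     # sample standard deviation (divides by n - 1)

# Standard error of the mean and z-statistic against the null value of 80 inches
se = s / sqrt(n)
z = (xbar - 80) / se

# One-sided p-value for the alternative "mean > 80"
p_value = 1 - NormalDist().cdf(z)

print(f"n = {n}, mean = {xbar:.2f}, sd = {s:.2f}")
print(f"z = {z:.3f}, p = {p_value:.4f}")
```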

Additionally, we can visualize the distribution of the data using a histogram to assess if it approximately conforms to a Normal distribution.

In [None]:
plt.hist(cwdata,bins=5,edgecolor='k')
plt.show()