In [6]:
from datascience import *
import numpy as np
from math import *
from scipy import stats
import matplotlib.pyplot as plt
%matplotlib inline

## Lesson 23: Hypothesis Testing, Continued

Recall that in Lesson 22 we covered hypothesis testing. The structure of a hypothesis test is largely the same regardless of the context of the problem: we state the hypotheses, decide on a test statistic, calculate the $p$-value, and reach a conclusion. To calculate a $p$-value, we need to find the distribution of the test statistic under the null hypothesis.

### Example 1: The Lady Tasting Tea

The "lady tasting tea" problem is a now-famous story: at a gathering one summer afternoon in Cambridge, some friends drank tea with milk. Among them, a woman claimed to be able to tell, by taste alone, whether the milk or the tea had been added to the cup first. The statistician Ronald Fisher was at the gathering, and he studied the claim. The woman was offered 8 cups of tea mixed with milk (4 with milk added first and 4 with tea added first), and she correctly identified 6 (3 of each). What can we say about her ability to discriminate between the teas?

Step 1:

**Null Hypothesis:** The lady tasting the tea is unable to discern between the two types of teas (i.e. a random guess would be just as accurate).

**Alternative Hypothesis:** The lady *is* able to discern between the two types of teas (i.e. this is a one-sided test).

Step 2: (Test Statistic)

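**X:** the number of tea cups that are correctly identified out of the 8 original cups. Note that due to the structure of the experiment (4 cups of each type, with 4 labeled as milk-first), every mislabeled milk-first cup forces a mislabeled tea-first cup, so X can only be an even number between 0 and 8.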
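
As a check on the claim that X can only be even, the following sketch enumerates every way of labeling 4 of the 8 cups as milk-first and tallies the number of correct cups. The cup numbering and the choice of which cups are truly milk-first are hypothetical; any 4 of the 8 would give the same tally.

```python
from itertools import combinations
from collections import Counter

cups = range(8)
milk_first = {0, 1, 2, 3}  # hypothetical: the cups that truly had milk added first

counts = Counter()
for guess in combinations(cups, 4):        # the 4 cups she labels "milk first"
    hits = len(milk_first & set(guess))    # milk-first cups labeled correctly
    # each milk-first hit leaves one fewer tea-first cup mislabeled,
    # so the total number of correct cups is twice the number of hits
    counts[2 * hits] += 1

null_dist = dict(sorted(counts.items()))
print(null_dist)  # {0: 1, 2: 16, 4: 36, 6: 16, 8: 1}
```

All 70 labelings are equally likely under the null hypothesis, so these counts divided by 70 give the null distribution of X.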

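Step 3: We find the p-value associated with 6 correct out of 8 if the Null Hypothesis is true. Under the null, the number of milk-first cups she identifies correctly follows a hypergeometric distribution (4 cups chosen from 8, of which 4 are milk-first). Its expected value is 2, which corresponds to an expected 4 correct cups out of 8.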

In [7]:
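# Simulation using stats.hypergeom: 8 cups, 4 of them milk-first, 4 chosen as milk-first
# Each simulated value is the number of milk-first cups identified correctly
Teas = stats.hypergeom.rvs(8,4,4,size=10**4)
# 6 of 8 correct overall means at least 3 of the 4 milk-first cups were correct
p_val = np.sum(Teas >= 3)/10**4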
print('The simulated p-value associated with 6/8 correct answers is:',p_val)

The simulated p-value associated with 6/8 correct answers is: 0.2533


Step 4: In conclusion, we fail to reject the Null Hypothesis with a simulated p-value of approximately 0.25 (the exact value is $17/70 \approx 0.243$). There is insufficient evidence to conclude that the lady is able to discriminate between the two types of teas.
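
The simulated p-value can be compared with the exact hypergeometric tail probability. The sketch below uses only the standard library; the variable and function names are illustrative and do not come from the lesson.

```python
from math import comb

total_cups, milk_first, chosen = 8, 4, 4

def hypergeom_pmf(k):
    """P(exactly k of the chosen cups are truly milk-first) under the null."""
    ways = comb(milk_first, k) * comb(total_cups - milk_first, chosen - k)
    return ways / comb(total_cups, chosen)

# 6 or more correct overall means 3 or 4 milk-first cups identified correctly
exact_p = hypergeom_pmf(3) + hypergeom_pmf(4)
print(round(exact_p, 4))  # 0.2429
```

The same tail probability should be available from SciPy as `stats.hypergeom.sf(2, 8, 4, 4)`, since `sf(2)` is $P(X > 2)$.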

### Example 2: iris dataset

The `iris` dataset is common in introductory statistics. It shows various characteristics of three different species of irises. Let's determine whether the virginica species has a larger mean sepal width than that of versicolor. 

In [8]:
iris = Table.read_table("iris.csv")
iris.group('species', np.mean)

species,sepal_length mean,sepal_width mean,petal_length mean,petal_width mean
setosa,5.006,3.418,1.464,0.244
versicolor,5.936,2.77,4.26,1.326
virginica,6.588,2.974,5.552,2.026


Step 1:

**Null Hypothesis:** The virginica species has an average sepal width that is less than or equal to that of versicolor.

**Alternative Hypothesis:** The virginica species has an average sepal width that is greater than that of versicolor.

Step 2: (Test Statistic)

**X:** the difference in average sepal width between the two sampled species (virginica mean minus versicolor mean)

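Step 3: We find the p-value of the observed difference in mean sepal widths. We simulate the distribution of X under the Null Hypothesis by repeatedly shuffling the sepal widths among the flowers, which breaks any link between species and sepal width, and recomputing the difference each time.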

In [9]:
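# Observed statistic: keep species and sepal width, drop setosa
iris_sub=iris.select(4,1).where(0,are.not_containing('setosa'))
# Groups sort alphabetically, so np.diff gives virginica mean minus versicolor mean
obs=np.diff(iris_sub.group(0,np.mean).column(1))[0]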
obs

0.20399999999999974

In [11]:
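# Permutation test using sample(): sampling every row without replacement
# shuffles the sepal widths, so species and width are paired at random

num_iris_sample = iris_sub.num_rows
diff_sw = [] # simulated differences under the null

for i in np.arange(10**4):
    # attach a shuffled copy of the sepal widths as a new column
    iris_sample = iris_sub.with_column('sample',iris_sub.sample(num_iris_sample,with_replacement=False).column(1))
    # difference in shuffled group means (virginica minus versicolor)
    diff_sw = np.append(diff_sw,np.diff(iris_sample.group(0,np.mean).column(2))[0])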

p_val = np.sum((diff_sw >= obs))/10**4
print('The simulated p-value associated with the iris samples is:',p_val)

The simulated p-value associated with the iris samples is: 0.0007


Step 4: In conclusion, we reject the Null Hypothesis with a simulated p-value of approximately $7 \times 10^{-4}$. The data provide strong evidence that the virginica iris species has an average sepal width that is greater than that of versicolor.
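
The shuffling idea in Step 3 can also be sketched without the `datascience` library. This is a minimal illustration under stated assumptions, not the method used in the cell above: the function name `permutation_p_value` is hypothetical, and the two short lists are made-up numbers for demonstration, not measurements from `iris.csv`.

```python
import random
from statistics import fmean

def permutation_p_value(group_a, group_b, reps=10_000, seed=0):
    """One-sided permutation test for mean(group_b) > mean(group_a)."""
    rng = random.Random(seed)
    observed = fmean(group_b) - fmean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # reassign values to groups at random
        diff = fmean(pooled[n_a:]) - fmean(pooled[:n_a])
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / reps

# Made-up illustrative values (NOT iris data)
a = [2.8, 2.7, 3.0, 2.5, 2.9]
b = [3.3, 3.0, 3.2, 2.9, 3.4]
observed, p_val = permutation_p_value(a, b)
print('observed difference:', observed)
print('simulated p-value:', p_val)
```

Shuffling the pooled values and splitting them back into groups of the original sizes plays the same role as sampling all rows without replacement in the loop above.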

Documentation: C2C Williams helped me with the syntax of coding step 3 of problem 2. I did not understand how to use sample to complete the simulation.