# Applied Statistics Problem Notebook

## Lady Tasting Tea Experiment

### The Experiment
https://jyyna.co.uk/lady-tasting-tea/
This notebook presents a variation on the classic Lady Tasting Tea experiment, originally devised by Ronald Fisher. Fisher wanted to test Muriel Bristol's claim that she could tell, by tasting a cup of tea, whether the milk or the tea had been poured first. If she succeeded, he also wanted to determine whether she genuinely possessed this ability or whether her success could be put down to chance.

In the original experiment, 8 cups of tea were prepared: 4 with milk added first and 4 with tea added first. With the cups presented in random order, the Lady had to distinguish the two preparation methods.

Lady Bristol did indeed correctly identify all 8 cups. Fisher determined that the probability of this occurring by chance was low (1 in 70). The findings supported a rejection of the null hypothesis that her success occurred due to chance.

In this updated version of the experiment, 12 cups of tea will be 'prepared' for our proverbial Lady to sample. In 8 cups, tea will be added first, with the remaining 4 being prepared with milk first. The aim is to examine how the probability of the Lady correctly identifying all 12 cups compares to the probability in the original experiment. 

For reference, the probability of the Lady selecting all 8 cups correctly in the original experiment was 1/70, or approximately 0.0143. By convention, a p-value below .05 is generally considered statistically significant (https://measuringu.com/setting-alpha/), which limits the risk of a Type I error to 5%. A more stringent threshold could require a p-value below .01 or even .001. Such thresholds are often used in medical research, where a false positive could mean attributing therapeutic effects to a drug that has none, with potentially devastating consequences. In many cases, however, a threshold of .05 is taken as a reasonable balance between the risks of Type I and Type II errors.
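As a quick check of the figures above, the following minimal sketch recomputes the original experiment's probability and compares it against the three thresholds mentioned. The names `original_ways`, `p_original`, and `significant_at` are illustrative and do not appear elsewhere in the notebook.

```python
import math

# Original design: 8 cups, 4 of which have milk added first.
# Under the null hypothesis, every choice of 4 cups is equally likely.
original_ways = math.comb(8, 4)

# Probability of picking exactly the right 4 cups by chance.
p_original = 1 / original_ways

# Compare against the conventional significance thresholds.
alpha_levels = (0.05, 0.01, 0.001)
significant_at = {alpha: p_original < alpha for alpha in alpha_levels}

print(f"Ways to choose 4 cups from 8: {original_ways}")
print(f"P(all correct by chance): {p_original:.4f}")
for alpha, is_significant in significant_at.items():
    print(f"Significant at alpha = {alpha}: {is_significant}")
```

The original result clears the .05 threshold but not the stricter .01 or .001 thresholds.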

In [1]:
# Mathematical functions from the standard library.
# https://docs.python.org/3/library/math.html
import math

# Permutations and combinations.
# https://docs.python.org/3/library/itertools.html
import itertools

# Random selections.
# https://docs.python.org/3/library/random.html
import random

# Numerical structures and operations.
# https://numpy.org/doc/stable/reference/index.html#reference
import numpy as np

# Dataframes for visualisation of the experiment.
# https://pandas.pydata.org/docs/
import pandas as pd

# Plotting.
# https://matplotlib.org/stable/contents.html
import matplotlib.pyplot as plt

In [None]:
# Number of cups of tea in total.
no_cups = 12

# Number of cups of tea with milk in first.
no_cups_milk_first = 4

# Number of cups of tea with tea in first.
no_cups_tea_first = 8

In [7]:
# Create a dataframe to store all 12 cups
tea_df = pd.DataFrame()

# Insert a column which specifies whether milk has been poured first.
# np.repeat() repeats 'yes' once per milk-first cup and 'no' once per tea-first cup.
# https://numpy.org/doc/2.3/reference/generated/numpy.repeat.html
tea_df['milk_first'] = np.repeat(['yes', 'no'], [no_cups_milk_first, no_cups_tea_first])

# Insert a column which specifies the Lady's guesses.
# The guesses mirror the true labels, i.e. every cup is identified correctly.
tea_df['guesses_milk_first'] = np.repeat(['yes', 'no'], [no_cups_milk_first, no_cups_tea_first])

# Use pd.DataFrame.sample to put the rows in a random order.
# https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sample.html
tea_df = tea_df.sample(no_cups, replace=False)

# Reset the index so original position is not shown
tea_df = tea_df.reset_index(drop=True)

In [8]:
tea_df

index,milk_first,guesses_milk_first
0,no,no
1,no,no
2,no,no
3,yes,yes
4,yes,yes
5,no,no
6,yes,yes
7,yes,yes
8,no,no
9,no,no


In [None]:
# Number of ways of selecting the four milk-first cups from twelve.
ways = math.comb(no_cups, no_cups_milk_first)

# Show.
ways
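To connect this count back to the aim of the notebook, the sketch below turns it into a probability and compares it with the original 8-cup design. It assumes, as in Fisher's setup, that the Lady knows 4 cups are milk-first and that a pure guess picks any 4 of the 12 cups with equal probability. The names `p_all_correct`, `p_original`, `selections`, and `ratio` are illustrative.

```python
import itertools
import math

no_cups = 12
no_cups_milk_first = 4

# Number of ways of choosing which 4 of the 12 cups are milk-first.
ways = math.comb(no_cups, no_cups_milk_first)

# Only one of those selections is entirely correct.
p_all_correct = 1 / ways

# Cross-check the count by enumerating every possible selection of cups.
selections = list(itertools.combinations(range(no_cups), no_cups_milk_first))
assert len(selections) == ways

# Probability for the original 8-cup design, for comparison.
p_original = 1 / math.comb(8, 4)

# How many times less likely a perfect score by chance is with 12 cups.
ratio = p_original / p_all_correct

print(f"Ways to choose 4 cups from 12: {ways}")
print(f"P(all correct by chance, 12 cups): {p_all_correct:.5f}")
print(f"P(all correct by chance, 8 cups):  {p_original:.5f}")
print(f"The 12-cup result is about {ratio:.1f} times less likely by chance.")
```

Under these assumptions, a perfect score in the 12-cup design has a probability of 1/495, which falls below the .01 threshold but not below .001.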