# Applied Statistics – Problems

## Problem 1: Extending the Lady Tasting Tea

In this problem I simulate an extended version of the Lady Tasting Tea experiment.  
The original version uses 8 cups (4 tea-first and 4 milk-first).  
The extended version uses 12 cups (8 tea-first and 4 milk-first).

The idea is simple:
1. create the true order of cups  
2. shuffle them many times using NumPy  
3. count how often the shuffle matches the real order  

The fraction of shuffles that match estimates the probability of labelling every cup correctly by pure guessing.
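
The three steps above can be wrapped in one reusable function. The sketch below is only an illustration, not the code used in the cells that follow: the function name `estimate_exact_match`, its parameters, and the choice of `np.random.default_rng` with seed 0 are assumptions introduced here, so its output will differ slightly from the seeded results in this notebook.

```python
import numpy as np


def estimate_exact_match(n_tea, n_milk, trials=100_000, seed=0):
    """Estimate the chance that a random shuffle reproduces the true cup order.

    Hypothetical helper: name, signature and seed are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)  # local generator, independent of np.random.seed
    truth = np.array(["T"] * n_tea + ["M"] * n_milk)  # step 1: true order
    hits = 0
    for _ in range(trials):
        guess = rng.permutation(truth)   # step 2: one random shuffle
        if np.array_equal(guess, truth):  # step 3: exact match with the true order?
            hits += 1
    return hits / trials


# Original design: 4 tea-first and 4 milk-first cups
print(estimate_exact_match(4, 4))
```

Calling `estimate_exact_match(8, 4)` would give the corresponding estimate for the 12-cup design.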

### Imports
Using NumPy for shuffling the cup labels.

In [12]:
import numpy as np  # NumPy docs: https://numpy.org/doc/
np.random.seed(42)  # keep results the same when running again

### Original experiment (8 cups)
There are 4 tea-first and 4 milk-first cups.  
I check how often a random shuffle matches the true order exactly.

In [13]:
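labels_8 = np.array(["T"]*4 + ["M"]*4)  # true order: 4 tea-first, then 4 milk-first

trials = 100000  # number of random shuffles
correct_8 = 0    # shuffles that reproduce the true order exactly

for _ in range(trials):
    guess = np.random.permutation(labels_8)  # one random guess at the order
    if np.array_equal(guess, labels_8):
        correct_8 += 1

prob_8 = correct_8 / trials  # estimated probability of a perfect guess
prob_8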

0.01459

### Extended experiment (12 cups)
Now there are 8 tea-first and 4 milk-first cups.  
Repeating the same simulation with the larger setup.

In [14]:
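labels_12 = np.array(["T"]*8 + ["M"]*4)  # true order: 8 tea-first, then 4 milk-first

correct_12 = 0  # shuffles that reproduce the true order exactly

for _ in range(trials):  # same number of shuffles as the 8-cup run
    guess = np.random.permutation(labels_12)  # one random guess at the order
    if np.array_equal(guess, labels_12):
        correct_12 += 1

prob_12 = correct_12 / trials  # estimated probability of a perfect guess
prob_12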

0.00201

### Showing both probabilities together
This makes the comparison easier.

In [15]:
print("Estimated probability (8 cups):", prob_8)
print("Estimated probability (12 cups):", prob_12)

Estimated probability (8 cups): 0.01459
Estimated probability (12 cups): 0.00201
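
Both estimates can be checked against exact values. A random shuffle is equally likely to place the 4 milk-first cups in any of the C(n, 4) possible sets of positions, and only one of those sets matches the true order, so the exact probabilities are 1/C(8, 4) = 1/70 ≈ 0.0143 and 1/C(12, 4) = 1/495 ≈ 0.0020. The sketch below compares these with the simulated values copied from the output above; the Monte Carlo standard error uses the usual binomial formula and is an addition made here, not part of the original analysis.

```python
from math import comb, sqrt

trials = 100_000  # same number of shuffles as in the simulation

# (total cups, milk-first cups, simulated estimate copied from the output above)
designs = [(8, 4, 0.01459), (12, 4, 0.00201)]

for n_cups, n_milk, simulated in designs:
    n_arrangements = comb(n_cups, n_milk)  # distinct placements of the milk-first cups
    exact = 1 / n_arrangements             # only one placement is fully correct
    std_err = sqrt(exact * (1 - exact) / trials)  # binomial standard error of the estimate
    print(f"{n_cups} cups: exact = 1/{n_arrangements} = {exact:.5f}, "
          f"simulated = {simulated}, std. error ~ {std_err:.5f}")
```

Both simulated values lie within one standard error of the exact ones, which suggests that 100000 shuffles are enough for this comparison.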


### Interpretation

- In the 8-cup experiment the chance of a perfect result by guessing is already small (about 0.015 in the simulation).  
- The 12-cup version is harder: the estimated chance drops to about 0.002.  
- Because of that, a perfect score on all 12 cups is stronger evidence against guessing than a perfect score on 8.  
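- A perfect score here has an estimated p-value of about 0.002, which passes the usual 0.05 threshold and even a stricter 0.01 threshold, so the larger design does not call for a tighter threshold; any change to the threshold should be fixed before the experiment is run.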

Overall, the 12-cup design makes a perfect score by chance very unlikely (roughly 1 in 500 in this simulation).
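
To see how much room each design leaves for mistakes, the p-value can also be computed for a less-than-perfect score. The sketch below assumes the taster knows that exactly 4 cups are milk-first and picks 4 cups at random, so the number of correct picks follows a hypergeometric distribution. The helper name `p_at_least` and this framing are assumptions added for illustration; the simulation above only checks for an exact match.

```python
from math import comb


def p_at_least(k, n_cups, n_milk=4):
    """P(a random pick of n_milk cups contains at least k true milk-first cups).

    Hypothetical helper based on the hypergeometric distribution.
    """
    total = comb(n_cups, n_milk)  # all ways to pick n_milk cups
    ways = sum(
        comb(n_milk, j) * comb(n_cups - n_milk, n_milk - j)  # j right, the rest wrong
        for j in range(k, n_milk + 1)
    )
    return ways / total


for n_cups in (8, 12):
    print(f"{n_cups} cups: all 4 correct = {p_at_least(4, n_cups):.4f}, "
          f"at least 3 of 4 correct = {p_at_least(3, n_cups):.4f}")
```

Under these assumptions, getting at least 3 of the 4 milk-first cups right has probability 17/70 ≈ 0.243 with 8 cups and 33/495 ≈ 0.067 with 12 cups. Both are above 0.05, so in either design only a perfect score would be significant at that level.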