# Answers to HDip Data Analytics: Applied Statistics Module Problems
---
Problems: https://github.com/ianmcloughlin/applied-statistics/blob/main/assessment/problems.md

## Problem 1: Extending the Lady Tasting Tea

This section demonstrates a simulation of an extended version of the famous 'Lady Tasting Tea' experiment. I will estimate, by simulation, the probability that a participant correctly identifies all cups purely by chance. Then I will compare the extended experiment (12 cups: 8 tea-first, 4 milk-first) with the original design (8 cups: 4 tea-first, 4 milk-first).

```python
# Import required libraries

# The numpy library provides support for efficient numerical computations and randomization.
# See: https://numpy.org/doc/stable/
import numpy as np
# The matplotlib.pyplot module offers plotting functions for data visualization.
# See: https://matplotlib.org/stable/api/pyplot_api.html
import matplotlib.pyplot as plt
# The seaborn library provides a high-level interface for statistical graphics.
# See: https://seaborn.pydata.org/api.html
import seaborn as sns
```

```python
# Set Seaborn style
sns.set_theme(style="whitegrid", context="notebook")
```

### Step 1: Understanding the experiment setup
---

**Original experiment:** 8 cups (4 tea-first, 4 milk-first)

**Extended experiment:** 12 cups (8 tea-first, 4 milk-first)

The participant tries to identify which cups are tea-first and which are milk-first. I simulate random guessing many times to estimate the probability of correctly identifying every cup.
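Because the simulated guess below is a shuffle of the true cup labels, it always marks the right number of cups as tea-first, so only the choice of *which* cups matters. Under that assumption, exactly one of the equally likely ways of choosing the tea-first cups is correct, giving an exact benchmark of 1/C(8, 4) = 1/70 for the original design and 1/C(12, 8) = 1/495 for the extended one. A minimal sketch using the standard library's `math.comb`; the helper name `exact_probability` is hypothetical and not part of the original notebook:

```python
from math import comb


def exact_probability(n_cups, n_tea_first):
    """Exact chance of labelling every cup correctly by pure guessing,
    assuming the participant knows how many cups are tea-first."""
    # Every choice of which cups are tea-first is equally likely,
    # and exactly one of those choices matches the true order.
    return 1 / comb(n_cups, n_tea_first)


for n_cups, n_tea_first in [(8, 4), (12, 8)]:
    ways = comb(n_cups, n_tea_first)
    print(f"{n_cups} cups, {n_tea_first} tea-first: "
          f"1/{ways} = {exact_probability(n_cups, n_tea_first):.6f}")
```

This prints `1/70 = 0.014286` for the original design and `1/495 = 0.002020` for the extended one.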


### Step 2: Define simulation function
---

```python
def simulate_lady_tasting_tea(n_cups, n_tea_first, n_trials=100000):
    """Simulate the Lady Tasting Tea experiment.

    n_cups: total number of cups
    n_tea_first: number of tea-first cups
    n_trials: number of simulated guesses

    Returns the estimated probability of correctly guessing all cups.
    """
    # Initialize a counter for the number of times a random guess matches the true order
    correct = 0
    # Create an array representing the actual cup order
    # 1 represents tea poured first, 0 represents milk poured first
    # See: https://numpy.org/doc/stable/reference/generated/numpy.array.html
    cups = np.array([1]*n_tea_first + [0]*(n_cups - n_tea_first))
    # Repeat the simulation for n_trials
    for _ in range(n_trials):
        # Shuffle the cups randomly to simulate a guess
        # See: https://numpy.org/doc/stable/reference/random/generated/numpy.random.permutation.html
        guess = np.random.permutation(cups)  # random guessing
        # Check if the guess exactly matches the actual cup order
        # See: https://numpy.org/doc/stable/reference/generated/numpy.array_equal.html
        if np.array_equal(guess, cups):
            # Increment the counter if the guess is completely correct
            correct += 1
    # Return the estimated probability of a correct guess
    return correct / n_trials
```
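The loop above calls `np.random.permutation` once per trial, which becomes slow as `n_trials` grows. A minimal vectorized sketch, assuming NumPy 1.20 or later for `Generator.permuted`, stacks all trials as rows of one array and shuffles each row independently. The function name `simulate_lady_tasting_tea_vectorized` and its `seed` parameter are hypothetical additions, not part of the original notebook:

```python
import numpy as np


def simulate_lady_tasting_tea_vectorized(n_cups, n_tea_first, n_trials=100_000, seed=None):
    """Vectorized variant: shuffle every trial at once instead of looping."""
    # A dedicated Generator, independent of the legacy global seed
    rng = np.random.default_rng(seed)
    # True cup order: 1 = tea poured first, 0 = milk poured first
    cups = np.array([1] * n_tea_first + [0] * (n_cups - n_tea_first))
    # One row per trial, each a copy of the true order
    trials = np.tile(cups, (n_trials, 1))
    # Shuffle each row independently of the others (axis=1)
    guesses = rng.permuted(trials, axis=1)
    # A trial succeeds only if every cup in its row matches the true order
    matches = np.all(guesses == cups, axis=1)
    # The fraction of successful trials estimates the probability
    return matches.mean()
```

For example, `simulate_lady_tasting_tea_vectorized(8, 4, seed=42)` should land near 1/70, though it will not reproduce the loop's exact numbers because it draws from a different generator.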


### Step 3: Run simulations
---

Setting a seed makes the random operations reproducible, so rerunning the notebook gives the same estimates.

```python
# Set random seed for reproducibility
# See: https://numpy.org/doc/stable/reference/random/generated/numpy.random.seed.html
np.random.seed(42)
```
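`np.random.seed` reseeds NumPy's legacy global generator; NumPy's documentation recommends the newer `Generator` interface (created with `np.random.default_rng`) for new code. A minimal sketch of the same reproducibility guarantee under both interfaces, where the same seed yields the same shuffle (the variable names are illustrative only):

```python
import numpy as np

cups = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # 4 tea-first, 4 milk-first

# Two independent Generators created from the same seed
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# Same seed means the same sequence of draws, so the shuffles match
guess_a = rng_a.permutation(cups)
guess_b = rng_b.permutation(cups)
print(guess_a, guess_b, np.array_equal(guess_a, guess_b))

# Reseeding the legacy global generator repeats its draws in the same way
np.random.seed(42)
first = np.random.permutation(cups)
np.random.seed(42)
second = np.random.permutation(cups)
print(np.array_equal(first, second))
```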

Next, I estimate the probability for both the original experiment and the extended version.

```python
# Run the simulation for both experiment designs.
prob_original = simulate_lady_tasting_tea(8, 4)
prob_extended = simulate_lady_tasting_tea(12, 8)
```

```python
# Print the estimated probabilities for the original 8-cup and extended 12-cup experiments
print(f"Estimated probability (Original 8-cup): {prob_original:.8f}")
print(f"Estimated probability (Extended 12-cup): {prob_extended:.12f}")
```

```
Estimated probability (Original 8-cup): 0.01459000
Estimated probability (Extended 12-cup): 0.002010000000
```
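As a sanity check, the two printed estimates can be compared with the exact values 1/C(8, 4) = 1/70 and 1/C(12, 8) = 1/495. A short sketch, assuming the 100,000 trials used above, that measures each difference in Monte Carlo standard errors of a proportion; with the estimates reported here, both fall within one standard error of the exact value:

```python
from math import comb, sqrt

n_trials = 100_000  # trials used in the simulation above
# Estimates copied from the printed output above
estimates = {(8, 4): 0.01459, (12, 8): 0.00201}

for (n_cups, n_tea_first), estimate in estimates.items():
    exact = 1 / comb(n_cups, n_tea_first)
    # Standard error of a proportion estimated from n_trials independent trials
    std_err = sqrt(exact * (1 - exact) / n_trials)
    z = (estimate - exact) / std_err
    print(f"{n_cups} cups: estimate {estimate:.5f}, exact {exact:.5f}, "
          f"difference {z:+.2f} standard errors")

# How many times less likely a perfect random guess is in the extended design
print(f"Ratio of exact probabilities: {comb(12, 8) / comb(8, 4):.2f}")
```

The ratio line prints `7.07`: a participant guessing at random is about seven times less likely to get every cup right in the 12-cup design than in the original 8-cup design.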
