# Problem 1 - Extending the Lady Tasting Tea Experiment

The aim of this notebook is to extend the classical *Lady Tasting Tea* experiment, as demonstrated in the module, to a larger number of cups.

To achieve this, I will:

- Set up the original 8-cup experiment
- Set up the extended 12-cup experiment
- Use NumPy to simulate both the original 8-cup and the extended 12-cup experiments
- Compare the simulated probabilities
- Interpret the results

# Importing Required Libraries

In [83]:
# Mathematical functions from the standard library.
# https://docs.python.org/3/library/math.html
import math

# Permutations and combinations.
# https://docs.python.org/3/library/itertools.html
import itertools

# Random selections.
# https://docs.python.org/3/library/random.html
import random

# Numerical structures and operations.
# https://numpy.org/doc/stable/reference/index.html#reference
import numpy as np

# Plotting.
# https://matplotlib.org/stable/contents.html
import matplotlib.pyplot as plt

# Data frames.
# https://pandas.pydata.org/docs/
import pandas as pd

# Experiment Set Up - Original and Extended

## Setup for Both Experiments

The designs of the two experiments are detailed below:

1. **Original experiment as performed by Fisher:**
   - 8 cups in total
   - 4 cups with milk poured first
   - 4 cups with tea poured first

2. **Extended experiment as carried out in this notebook:**
   - 12 cups in total
   - 4 cups with milk poured first
   - 8 cups with tea poured first

For both experiments, the participant is told there are exactly 4 milk-first cups and, in turn,
must try to identify which 4 cups these are.

Under the null hypothesis (the participant has no special ability to distinguish milk-first from tea-first cups), the participant is in essence
choosing 4 cups at random from the total. The number of possible
guesses in each case is given by the binomial coefficient:

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

The expression $\binom{n}{k}$ represents the number of ways to choose $k$ cups from $n$ cups.
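As a quick check on the counting argument, the binomial coefficient can be computed directly from factorials, $n!/(k!\,(n-k)!)$, and compared with `math.comb`. The sketch below is illustrative only; the helper name `n_choose_k` is an assumption introduced here and is not used elsewhere in the notebook.

```python
import math


def n_choose_k(n: int, k: int) -> int:
    """Number of ways to choose k items from n, using the factorial formula."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


# Both designs ask the participant to pick 4 cups; only the total differs.
for total_cups in (8, 12):
    ways = n_choose_k(total_cups, 4)
    # The hand-written formula should agree with the standard library.
    assert ways == math.comb(total_cups, 4)
    print(f"{total_cups} cups, 4 milk-first: {ways} possible guesses")
```

Moving from 8 to 12 cups increases the number of equally likely guesses from 70 to 495, which is why a perfect guess by chance becomes much less likely.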


In the next two sections, I will set up both experiments and compute the number of possible combinations and the probability of correctly identifying all four milk-first cups by chance.


## Original Experiment - Cups of Tea

In [84]:
# --- Original 8-cup setup ---

original_num_cups = 8 # Setting the number of cups in the original experiment
original_num_milk_first = 4 # Setting the number of milk-first cups in the original experiment

original_num_comb = math.comb(original_num_cups, original_num_milk_first) # Calculating the number of combinations from the original experiment of choosing 4 cups from 8

original_prob_all_correct = 1 / original_num_comb # Calculating the exact probability of getting all 4 correct by chance in the original experiment

original_num_comb, original_prob_all_correct # Displaying the number of combinations and the exact probability of getting all 4 correct by chance in the original experiment

(70, 0.014285714285714285)

## Extended Experiment - Cups of Tea

In [85]:
# --- Extended 12-cup setup ---

ext_num_cups = 12 # Setting the number of cups in the extended experiment
ext_num_milk_first = 4 # Setting the number of milk-first cups in the extended experiment

ext_num_comb = math.comb(ext_num_cups, ext_num_milk_first) # Calculating the number of combinations from the extended setup of choosing 4 cups from 12

extended_prob_all_correct = 1 / ext_num_comb # Calculating the exact probability of getting all 4 correct by chance in the extended experiment

ext_num_comb, extended_prob_all_correct # Displaying the number of combinations and the exact probability of getting all 4 correct by chance in the extended experiment


(495, 0.00202020202020202)
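The count of 495 can also be confirmed by listing every possible guess with `itertools.combinations` and counting how many milk-first cups each guess gets right. This is a minimal sketch rather than part of the original analysis; the names `total_cups`, `milk_first` and `overlap_counts` are assumptions chosen for illustration, and labelling the milk-first cups 0 to 3 is arbitrary.

```python
import itertools
from collections import Counter

total_cups = 12            # illustrative: size of the extended design
milk_first = {0, 1, 2, 3}  # illustrative: labels of the milk-first cups

# For every possible guess of 4 cups, count how many are truly milk-first.
overlap_counts = Counter(
    len(milk_first.intersection(guess))
    for guess in itertools.combinations(range(total_cups), len(milk_first))
)
total_guesses = sum(overlap_counts.values())

for correct in sorted(overlap_counts):
    count = overlap_counts[correct]
    print(f"{correct} correct: {count} guesses, probability {count / total_guesses:.4f}")
```

Only 1 of the 495 guesses is fully correct, while 32 get exactly three cups right, so the chance of at least three correct is 33/495, roughly 0.067.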

# Running the Extended Experiment

## Identifying the number of Simulations to run using Monte Carlo Simulation

In [86]:
num_of_iters = [1_000, 5_000, 10_000, 50_000, 100_000, 1_000_000, 2_000_000] # Defining different numbers of simulations to run for estimating the probability

results = [] # Creating an empty list to store the results

for n in num_of_iters: # Repeating the steps below for each number of simulations in num_of_iters, where n is the number of simulations being run
    # Generate n random values, each greater than or equal to 0 and less than 1, using np.random.rand.
    # Using n samples allows us to run n independent Monte Carlo trials in a single, efficient step.
    # A “success” occurs when a value is less than the true analytic probability (extended_prob_all_correct).
    # This comparison produces a Boolean array where True indicates a simulated correct full guess by chance.
    # Taking the mean of this Boolean array gives the estimated probability of getting all 4 correct by chance, based on these n simulated trials.
    estimate = (np.random.rand(n) < extended_prob_all_correct).mean()
    diff = estimate - extended_prob_all_correct # Calculating the difference between the estimate and the true probability
    results.append([n, estimate, diff]) # Appending the number of simulations, estimate, and difference to the results list

df_results = pd.DataFrame(results, columns=["Simulations", "Estimate", "Difference from Analytic Probability"]) # Creating a DataFrame from the results list with specified column names

df_results # Displaying the results DataFrame

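|   | Simulations | Estimate | Difference from Analytic Probability |
|---|-------------|----------|--------------------------------------|
| 0 | 1000        | 0.001    | -0.00102                             |
| 1 | 5000        | 0.002    | -2e-05                               |
| 2 | 10000       | 0.0023   | 0.00028                              |
| 3 | 50000       | 0.00234  | 0.00032                              |
| 4 | 100000      | 0.00212  | 0.0001                               |
| 5 | 1000000     | 0.002026 | 6e-06                                |
| 6 | 2000000     | 0.001983 | -3.7e-05                             |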
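One way to read the differences in the output above is through the standard error of a simulated proportion, $\sqrt{p(1-p)/n}$, which describes how much an estimate from $n$ independent trials typically varies around the true probability $p$. The sketch below is a rough guide under that assumption; the helper `standard_error` is introduced here for illustration and is not part of the original cells.

```python
import math

p = 1 / math.comb(12, 4)  # analytic probability for the 12-cup design (1/495)


def standard_error(p: float, n: int) -> float:
    """Standard error of a proportion estimated from n independent trials."""
    return math.sqrt(p * (1 - p) / n)


for n in (1_000, 10_000, 100_000, 1_000_000):
    se = standard_error(p, n)
    # Expressing the error relative to p shows how noisy the estimate is.
    print(f"n = {n:>9,}: standard error ~ {se:.6f} ({se / p:.1%} of p)")
```

With 100,000 trials the standard error is about 0.00014, roughly 7% of the true probability, which is in line with the difference of 0.0001 seen at that sample size.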


## Using NumPy to run the Extended Experiment with 12 Cups (4 Milk First)

Again, under the null hypothesis, the participant is identifying the 4 milk-first cups at random from an extended total of 12 cups.

To estimate the probability of all 4 milk-first cups being correctly identified in the extended 12-cup experiment, we will run the experiment 100,000 times using NumPy.

The analytic probability is $1/495$; the simulation below provides an estimate to compare against it.


### Running Extended Experiment

In [87]:
ext_milk_first_labels = set(range(ext_num_milk_first)) # Defining the Milk-First cup labels in the extended experiment

num_sims = 100000 # Setting the number of simulations to run, based on the Monte Carlo comparison above

rng = np.random.default_rng() # Creating a NumPy random number generator; no seed is passed, so results vary between runs (pass a seed to default_rng to make the simulation repeatable)

ext_num_of_successes = 0 # Setting the initial number of successes to zero

for i in range(num_sims): # Repeating the steps below once for each of the num_sims simulations
    # Randomly guessing which cups are milk-first without replacement.
    # Assigning the guessed milk-first cup labels to ext_guessed_labels.
    # Using rng.choice to randomly select ext_num_milk_first unique cup labels from the total ext_num_cups.
    # ext_num_cups: Total number of cups (12 in this case).
    # size=ext_num_milk_first: Number of milk-first cups to guess (4 in this case).
    # replace=False: Ensures that the same cup label is not chosen more than once (no replacement).
    ext_guessed_labels = rng.choice(ext_num_cups,
                                    size=ext_num_milk_first,
                                    replace=False)
    
    if set(ext_guessed_labels) == ext_milk_first_labels: # Checking if the guessed labels match the actual milk-first labels
        ext_num_of_successes += 1 # Incrementing the number of successes if the guess is correct

estimated_prob_all_correct_extended = ext_num_of_successes / num_sims # Calculating the estimated probability of getting all 4 correct by chance in the extended experiment based on the simulation

ext_num_of_successes, num_sims, estimated_prob_all_correct_extended # Displaying the number of successes, number of simulations, and the estimated probability from the simulation


(222, 100000, 0.00222)
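The loop above draws one guess per iteration. As an alternative, all guesses can be drawn at once with array operations. The sketch below is one possible vectorised version, not the approach used in this notebook: it ranks independent uniform random numbers within each row to obtain a random ordering of the cup labels, then keeps the first four as the guess. The function name `simulate_all_correct`, the generator `demo_rng` and the seed are assumptions made for illustration.

```python
import numpy as np


def simulate_all_correct(num_cups, num_milk_first, num_sims, rng):
    """Estimate the chance that a random guess picks every milk-first cup."""
    # Ranking independent uniform values in each row gives a random ordering
    # of the cup labels 0..num_cups-1 for every simulated participant.
    random_order = rng.random((num_sims, num_cups)).argsort(axis=1)
    # The first num_milk_first labels in each row form a guess without replacement.
    guesses = random_order[:, :num_milk_first]
    # Milk-first cups are labelled 0..num_milk_first-1, so a guess is fully
    # correct when every guessed label falls below num_milk_first.
    all_correct = (guesses < num_milk_first).all(axis=1)
    return all_correct.mean()


demo_rng = np.random.default_rng(12345)  # seeded so the sketch is repeatable
estimate = simulate_all_correct(12, 4, 200_000, demo_rng)
print(f"estimated: {estimate:.5f}, analytic: {1 / 495:.5f}")
```

This trades some memory (one row of 12 random numbers per simulation) for avoiding a Python-level loop.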

### Running Simulation of Original Experiment

In [None]:
orig_milk_first_labels = set(range(original_num_milk_first)) # Defining the Milk-First cup labels in the original experiment


orig_num_of_successes = 0 # Setting the initial number of successes to zero

for i in range(num_sims): # Repeating the steps below once for each of the num_sims simulations
    # Randomly guessing which cups are milk-first without replacement.
    # Assigning the guessed milk-first cup labels to orig_guessed_labels.
    # Using rng.choice to randomly select original_num_milk_first unique cup labels from the total original_num_cups.
    # original_num_cups: Total number of cups (8 in this case).
    # size=original_num_milk_first: Number of milk-first cups to guess (4 in this case).
    # replace=False: Ensures that the same cup label is not chosen more than once (no replacement).
    orig_guessed_labels = rng.choice(original_num_cups,
                                     size=original_num_milk_first,
                                     replace=False)
    
    if set(orig_guessed_labels) == orig_milk_first_labels: # Checking if the guessed labels match the actual milk-first labels
        orig_num_of_successes += 1 # Incrementing the number of successes if the guess is correct

estimated_prob_all_correct_orig = orig_num_of_successes / num_sims # Calculating the estimated probability of getting all 4 correct by chance in the original experiment based on the simulation

orig_num_of_successes, num_sims, estimated_prob_all_correct_orig # Displaying the number of successes, number of simulations, and the estimated probability from the simulation


(1423, 100000, 0.01423)
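Under the null hypothesis, the number of correctly identified milk-first cups follows a hypergeometric distribution, so both experiments can also be simulated with NumPy's `Generator.hypergeometric`. The sketch below is offered as a cross-check rather than as the method used in this notebook; `demo_rng`, `num_draws`, `designs` and `estimates` are names assumed for illustration, and the seed is arbitrary.

```python
import numpy as np

demo_rng = np.random.default_rng(2024)  # seeded so the sketch is repeatable
num_draws = 200_000

# Each design is (number of milk-first cups, number of tea-first cups).
designs = {"original (8 cups)": (4, 4), "extended (12 cups)": (4, 8)}

estimates = {}
for name, (milk_first, tea_first) in designs.items():
    # Each draw is the number of milk-first cups among 4 cups picked at random.
    correct = demo_rng.hypergeometric(
        ngood=milk_first, nbad=tea_first, nsample=milk_first, size=num_draws
    )
    # A perfect guess means all picked cups are milk-first.
    estimates[name] = (correct == milk_first).mean()
    print(f"{name}: estimated probability of all correct = {estimates[name]:.5f}")
```

The estimates should land close to the analytic values of 1/70 and 1/495, in the same way as the loop-based results above.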

# Comparison of Original and Extended Experiment Probabilities

Placeholder for analysis

# Interpretation of Results

Placeholder for analysis

# References
1. Number of Simulations - https://statisticsbyjim.com/probability/monte-carlo-simulation/
2. Number of Simulations - https://www.statology.org/how-to-perform-monte-carlo-simulations-in-python-with-example/
3. Number of Simulations - https://www.statology.org/the-concise-guide-to-monte-carlo-simulation/
4. Numpy.Random - https://realpython.com/numpy-random-number-generator/
5. Numpy.Random - https://www.geeksforgeeks.org/python/numpy-random-rand-python/
6. Numpy.Random - https://numpy.org/doc/stable/reference/random/generated/numpy.random.rand.html
7. Numpy.Random - https://sparkbyexamples.com/python/python-numpy-random-rand-function/
8. Numpy.Random - https://www.programiz.com/python-programming/numpy/random
9. For Loops - https://www.geeksforgeeks.org/python/loops-in-python/
10. Bernoulli Trials - https://www.geeksforgeeks.org/maths/bernoulli-trials-binomial-distribution/
11. Bernoulli Trials - https://github.com/uros-bojanic/bernoulli-trials
12. Monte Carlo Simulation and Bernoulli - https://medium.com/@polanitzer/monte-carlo-simulation-and-bernoulli-distribution-in-python-predict-next-years-tax-exemption-ec6f3d8b6ba9
13. Random Number Generator - https://realpython.com/numpy-random-number-generator/
14. Random.Choice - https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.choice.html
15. Random.Choice - https://www.geeksforgeeks.org/python/random-choices-method-in-python/