
# Applied Statistics
## Grainne Boyle


In [1]:
# I import the following libraries for various functionalities in my project:  

import math # Standard mathematical functions, including math.comb() and math.factorial().  
import itertools # Provides tools for creating iterators such as combinations() and permutations(). Helps you loop and combine things efficiently.  
import random # Used for generating random numbers and random selections 
import numpy as np # package for numerical computations and arrays
import matplotlib.pyplot as plt # Used for creating visualizations and plots

 

# Add some notes and links on your research of different import libraries and how you plan to use them in your project:


## Problem 1: Extending the Lady Tasting Tea

[Lady tasting tea](https://en.wikipedia.org/wiki/Lady_tasting_tea)
This project begins with a look at the lady tasting tea, a randomised experiment devised by Ronald Fisher. In the experiment, the lady is given eight cups of tea, four prepared by adding milk first and four by adding tea first. Her task is to correctly identify the four cups prepared by one of the two methods. The null hypothesis is that she has no real ability to distinguish between the two preparation methods. The test statistic is the number of cups she correctly selects out of the four prepared by a given method. Assuming the null hypothesis is true, the distribution of the possible number of successes can be computed by counting combinations.
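The claim that the null distribution can be computed from combinations can be checked by brute force. The sketch below is my own illustration rather than Fisher's procedure: it assumes the cups are labelled 0 to 7, that cups 0 to 3 are the milk-first cups, and that the lady selects exactly four cups. It uses `itertools.combinations()` to list every possible selection and counts how many true milk-first cups each selection contains. The helper name `null_distribution` is hypothetical.

```python
import itertools
import math

def null_distribution(n_cups=8, n_milk_first=4):
    """Count selections of n_milk_first cups by how many true milk-first cups they contain."""
    # Assumption for illustration: cups 0 .. n_milk_first-1 are the milk-first cups.
    milk_first = set(range(n_milk_first))
    counts = {}
    # Under the null hypothesis every selection of n_milk_first cups is equally likely.
    for selection in itertools.combinations(range(n_cups), n_milk_first):
        n_correct = len(milk_first.intersection(selection))
        counts[n_correct] = counts.get(n_correct, 0) + 1
    return dict(sorted(counts.items()))

counts = null_distribution()
total = sum(counts.values())
assert total == math.comb(8, 4)  # 70 possible selections in the original design

for n_correct, count in counts.items():
    print(f"{n_correct} correct: {count:2d} ways, probability {count / total:.4f}")
```

Only one of the 70 selections contains all four milk-first cups, which matches the 1 in 70 calculation for the original experiment.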

I used ChatGPT to help format and render the mathematical formulas in LaTeX so that they display clearly in the Markdown cells of this notebook. [LaTeX Math in Jupyter Documentation](https://jupyterbook.org/en/stable/content/math.html)

In the formula for combinations, 

$$
\binom{n}{k} = \frac{n!}{k!(n-k)!},
$$

we choose $k$ objects from $n$, where $n$ is the total number of cups and $k$ is the number of cups with milk poured first.

In the original experiment, the number of possible ways to choose 4 cups out of 8 is:

$$
\binom{8}{4} = \frac{8!}{4!(8-4)!} = 70
$$

Since only one of these combinations is completely correct, the probability of identifying all 4 correctly by chance is:

$$
P = \frac{1}{\binom{8}{4}} = \frac{1}{70} \approx 0.0143
$$

This means that, if she is only guessing, the lady has about a 1.4% chance of identifying all four cups correctly.
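The factorial formula and the probability above can be checked directly in Python. This short sketch writes the formula out with `math.factorial()` (the helper name `n_choose_k` is my own, for illustration) and compares it with the built-in `math.comb()`.

```python
import math

def n_choose_k(n, k):
    """Binomial coefficient written out as n! / (k! (n - k)!)."""
    # Integer division is exact because k! (n - k)! always divides n!.
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

ways = n_choose_k(8, 4)
print(ways, math.comb(8, 4))  # both give 70

# Only one of the 70 selections is completely correct.
probability = 1 / ways
print(f"{probability:.4f} ({probability:.1%})")  # 0.0143 (1.4%)
```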

## To do: add notes explaining the calculation above, factorials, and the null hypothesis (from research).



In [5]:
# In the continuation of this experiment, suppose we have 12 cups of tea in total: 8 poured tea first and 4 poured milk first.

# Number of cups of tea in total
no_cups = 12

# Number of cups of tea with milk in first 
no_cups_milk_first = 4

# Number of cups of tea with tea in first 
no_cups_tea_first = 8

# Number of ways to choose which 8 of the 12 cups are the tea-first cups
ways = math.comb(no_cups, no_cups_tea_first)

# Show the result
ways



495

In this extended experiment, $n = 12$ and $k = 8$.

The number of ways to choose 8 cups with the tea in first from 12 cups is calculated as:

$$
\binom{12}{8} = \frac{12!}{8!(12-8)!}
$$

Evaluating this gives:

$$
\binom{12}{8} = 495
$$

So there are 495 different ways to choose the 8 tea-first cups from the 12 cups in total.

As the total number of cups grows, the number of possible combinations increases, so a perfect score by chance becomes less likely.

The probability of correctly identifying all 8 tea-first cups by chance is then:

$$
P = \frac{1}{\binom{12}{8}} = \frac{1}{495} \approx 0.0020
$$

This means that, if she is only guessing, the lady has about a 0.20% chance of identifying all 8 tea-first cups correctly.
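To make the point that the number of combinations grows with the number of cups more concrete, the sketch below keeps the number of milk-first cups fixed at 4 (an assumption chosen to match the two designs discussed here) and increases the total number of cups, printing the number of possible selections and the probability of a perfect score by chance.

```python
import math

# Assumption for illustration: the number of milk-first cups stays at 4
# while the total number of cups increases.
n_milk_first = 4
for n_cups in range(8, 17, 2):
    ways = math.comb(n_cups, n_milk_first)
    print(f"{n_cups:2d} cups: {ways:5d} combinations, P(all correct) = {1 / ways:.5f}")
```

The 8-cup and 12-cup rows reproduce the 70 and 495 combinations computed above.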

Note that:

$$
\binom{12}{8} = \binom{12}{4} = 495
$$

because choosing which 8 cups are tea-first automatically determines
which 4 are milk-first (the remaining ones). The probabilities are the same either way.
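The symmetry argument can be checked by enumeration. In this sketch the cups are labelled 0 to 11 (an assumption for illustration); each choice of 8 tea-first cups is mapped to its complement, the 4 cups left over, and the resulting collection is compared with every direct choice of 4 milk-first cups.

```python
import itertools

cups = frozenset(range(12))

# Every way of choosing the 8 tea-first cups...
tea_first_choices = [frozenset(c) for c in itertools.combinations(range(12), 8)]
# ...determines the 4 milk-first cups as the cups that are left over.
complements = {cups - choice for choice in tea_first_choices}
# All direct ways of choosing the 4 milk-first cups.
milk_first_choices = {frozenset(c) for c in itertools.combinations(range(12), 4)}

print(len(tea_first_choices), len(milk_first_choices))  # 495 495
print(complements == milk_first_choices)                # True
```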

## Add more notes on the binomial coefficient and factorials


In [None]:
# Using NumPy's random number generation, I repeat the experiment many times

# Number of cups of tea with tea in first 
n_tea = 8

# Number of cups of tea with milk in first 
n_milk = 4

# Number of simulations
n_simulations = 100000

# Create an array of labels (1 = tea-first, 0 = milk-first): a NumPy array with a 1 for each tea-first cup followed by a 0 for each milk-first cup.
correct = np.array([1]*n_tea + [0]*n_milk)

# Create a counter for correct matches
correct_matches = 0

# Run the simulation, this may take some time depending on the number of simulations

for _ in range(n_simulations):
    # Randomly shuffle the cups so that you get a different order each time
    shuffled_correct = np.random.permutation(correct)
    
    # Lady randomly guesses which 8 cups are tea-first
    guess = np.zeros_like(correct)
    guess[np.random.choice(len(correct), size=n_tea, replace=False)] = 1

    # If her guesses exactly match the true shuffled order, count a success
    if np.array_equal(guess, shuffled_correct):
        correct_matches += 1

# Calculate the probability
simulated_probability = correct_matches / n_simulations
exact_probability = 1/math.comb(n_tea + n_milk, n_tea)  # == 1 / comb(12, 8) == 1 / comb(12, 4)


print(f"Simulated probability: {simulated_probability:.6f}")
print(f"Exact probability:     {exact_probability:.6f}")



# In this cell, I simulate the lady's random guessing many times to estimate the probability that she correctly identifies all 8 tea-first cups. NumPy handles the arrays and random permutations. After the simulations, I compare the simulated probability with the exact probability from the combinations formula.  
# The simulated probability should be close to the exact value of about 0.20%, and it is: over several runs of this cell, the simulated probability stayed between 0.0018 and 0.0021.  

Simulated probability: 0.001960
Exact probability:     0.002020
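The loop above runs one trial at a time. As an alternative sketch, not the method used above, the same experiment can be vectorised: NumPy's `default_rng()` generator produces a block of random numbers, `argsort` turns each row into a random ordering of the 12 cups, and the lady's guess in each trial is taken to be the first 8 cups of that ordering. Labelling cups 0 to 7 as the true tea-first cups is an assumption that does not change the probability, because under the null hypothesis her guess is random anyway. The function name `simulate_all_correct` and its `seed` argument are hypothetical additions so that results can be reproduced.

```python
import math
import numpy as np

def simulate_all_correct(n_tea=8, n_milk=4, n_simulations=100_000, seed=None):
    """Estimate P(all tea-first cups identified) when the lady guesses at random."""
    rng = np.random.default_rng(seed)
    n_cups = n_tea + n_milk
    # Each row is an independent random ordering (permutation) of cup indices 0..n_cups-1.
    orderings = np.argsort(rng.random((n_simulations, n_cups)), axis=1)
    # Her guess in each trial: the first n_tea cups of that random ordering.
    guesses = orderings[:, :n_tea]
    # Cups 0..n_tea-1 are taken to be the true tea-first cups, so a trial is a
    # success only if every guessed index is below n_tea.
    successes = np.all(guesses < n_tea, axis=1)
    return successes.mean()

simulated = simulate_all_correct(seed=1)
exact = 1 / math.comb(12, 8)
print(f"Simulated probability: {simulated:.6f}")
print(f"Exact probability:     {exact:.6f}")
```

Generating all trials at once trades memory (a 100,000 by 12 array) for speed, since NumPy avoids the Python-level loop.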


## Problem 2: Normal Distribution

## Problem 3: t-Tests

## Problem 4: ANOVA

![abacus](img/statistics.jpg)

-----------------------------------
# END

