# Problems Jupyter Notebook

## Problem 1: Extending the Lady Tasting Tea
![The Lady Tasting Tea](https://upload.wikimedia.org/wikipedia/en/2/2d/The_Lady_Tasting_Tea_-_David_Salsburg.jpg)

### Objective: 
_Let's extend the Lady Tasting Tea experiment as follows. The original experiment has 8 cups: 4 tea-first and 4 milk-first. Suppose we prepare 12 cups: 8 tea-first and 4 milk-first. A participant claims they can tell which was poured first._

_Simulate this experiment using `numpy` by randomly shuffling the cups many times and calculating the probability of the participant correctly identifying all cups by chance. Compare your result with the original 8-cup experiment._

_In your notebook, explain your simulation process clearly, report and interpret the estimated probability, and discuss whether, based on this probability, you would consider extending or relaxing the p-value threshold compared to the original design._

In [67]:
# Import libraries that will be used:
import math
import itertools
import numpy as np
import matplotlib.pyplot as plt

In [68]:
# Set up our variables:
# Total number of cups
num_cups = 12
# Number of tea first
num_tea = 8
# Number of milk first
num_milk = 4

We use `math.comb(n, k)` to calculate combinations.
Interestingly, if the participant correctly identifies the milk-first cups, she has automatically identified the tea-first cups as well, since they are simply the remaining cups. Therefore we only need to count one side of the split. Selecting the tea-first cups would be less natural because there are more of them, but we will explore that alternative to see whether it makes a difference.


In [69]:
# math.comb(n, k) calculates the number of combinations of n items taken k at a time.
ways = math.comb(num_cups, num_milk)
ways


495

Therefore the total number of ways to choose which 4 of the 12 cups had milk added first is 495, and only one of these choices is correct. This is an increase from 70 in Fisher's original 8-cup experiment.
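As a cross-check on these counts, a short sketch with `math.comb` turns them into the probability of a perfect score by pure guessing in each design. The helper name `perfect_guess_probability` is hypothetical.

```python
import math

def perfect_guess_probability(num_cups: int, num_milk: int) -> float:
    """Chance of picking exactly the right milk-first cups by pure guessing."""
    # Only one of the math.comb(num_cups, num_milk) unordered choices is correct
    return 1 / math.comb(num_cups, num_milk)

original = perfect_guess_probability(8, 4)    # 1/70
extended = perfect_guess_probability(12, 4)   # 1/495
print(f"8 cups:  {original:.5f}")                  # 8 cups:  0.01429
print(f"12 cups: {extended:.5f}")                  # 12 cups: 0.00202
print(f"Ratio:   {original / extended:.2f}")       # Ratio:   7.07
```

Guessing perfectly by chance is roughly seven times less likely in the 12-cup design.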
Using a labelling and placeholder system to compare with the original problem, we see that the number of placeholders does not change, only the number of labelled cups:

Original problem (8 cups of tea: 4 milk-first, 4 tea-first)  
Cup labels: `1` `2` `3` `4` `5` `6` `7` `8`  
Placeholders: `_` `_` `_` `_`  

New problem (12 cups of tea: 4 milk-first, 8 tea-first)  
Cup labels: `1` `2` `3` `4` `5` `6` `7` `8` `9` `10` `11` `12`  
Placeholders: `_` `_` `_` `_`  
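The placeholder picture can be checked directly with `itertools` (already imported above) by listing every way to fill the 4 placeholders. This is a minimal sketch that assumes, purely for illustration, that cups 1 to 4 are the milk-first cups.

```python
import itertools

cup_labels = range(1, 13)       # cup labels 1 to 12, as in the placeholder diagram
true_milk = {1, 2, 3, 4}        # assumption: cups 1-4 are milk-first, for illustration only

# Every unordered way to fill the 4 placeholders with cup labels
selections = list(itertools.combinations(cup_labels, 4))
# Count how many of those selections are exactly the milk-first cups
num_correct = sum(set(s) == true_milk for s in selections)

print(len(selections))  # 495
print(num_correct)      # 1
```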

In [70]:
# If the order of the cups mattered it would result in:
order_ways = 12*11*10*9
order_ways

11880

If the order in which the 4 cups are picked mattered, there would be 11880 ordered selections of 4 cups from 12, so a random guess of the exact sequence would have a 1/11880 chance of being correct. In this experiment we do not care about the order, so we need to remove the repeated orderings of the same 4 cups.
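The standard library can produce the same counts: `math.perm` gives ordered selections and `math.factorial` the number of orderings of the chosen cups (both available from Python 3.8). A minimal sketch:

```python
import math

ordered = math.perm(12, 4)        # 12*11*10*9 ordered selections of 4 cups
orderings = math.factorial(4)     # 4! ways to arrange the same 4 cups
unordered = ordered // orderings  # divide out the repeated orderings

print(ordered, orderings, unordered)  # 11880 24 495
```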

In [71]:
# Ways to shuffle our placeholders:
shuffles = 4*3*2*1
shuffles


24

In [72]:
# Therefore the number of combinations is:
num_combs = order_ways // shuffles
num_combs

495

This confirms that `num_combs` is equal to `ways`.
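In formula form, the cells above compute the binomial coefficient for choosing 4 cups out of 12:

$$\binom{12}{4} = \frac{12!}{4!\,8!} = \frac{12 \times 11 \times 10 \times 9}{4 \times 3 \times 2 \times 1} = \frac{11880}{24} = 495$$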

What if we look at the alternative: does selecting the tea-first cups change the number of combinations?
New problem (12 cups of tea: 4 milk-first, 8 tea-first)   
Cup labels: `1` `2` `3` `4` `5` `6` `7` `8` `9` `10` `11` `12`  
Placeholders: `_` `_` `_` `_` `_` `_` `_` `_`  


In [73]:
way_alt = math.comb(num_cups, num_tea)
way_alt

495

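Whether the lady identifies the tea-first or the milk-first cups, the number of unordered combinations is the same. This is the symmetry $\binom{n}{k} = \binom{n}{n-k}$: choosing the 4 milk-first cups is equivalent to choosing the 8 cups that are left over as tea-first.  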
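The symmetry can be made concrete by pairing each 4-cup choice with its complement. This sketch labels the cups 0 to 11 and checks that taking the complement of every milk-first choice produces exactly the set of tea-first choices.

```python
import itertools

cup_labels = range(12)  # cup labels 0 to 11

milk_choices = set(itertools.combinations(cup_labels, 4))  # pick the 4 milk-first cups
tea_choices = set(itertools.combinations(cup_labels, 8))   # pick the 8 tea-first cups

# The complement of each 4-cup choice (kept in ascending order) is an 8-cup choice
complements = {tuple(c for c in cup_labels if c not in m) for m in milk_choices}

print(len(milk_choices), len(tea_choices))  # 495 495
print(complements == tea_choices)           # True
```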
For completeness, let's do the maths with the larger number of placeholders, as if the lady were identifying the tea-first cups.

In [74]:
# Using the same logic as before, calculate the ordered selections and compare
order_ways_alt = 12*11*10*9*8*7*6*5
order_ways_alt 

19958400

There are far more ordered selections because there are 8 placeholders to fill instead of 4. There will also be many more ways to shuffle those placeholders.

In [75]:
shuffles_alt = 8*7*6*5*4*3*2*1    
shuffles_alt

40320

This number is also much larger, because there are 8 placeholders to rearrange rather than 4. Dividing the ordered selections by the shuffles should give the same number of combinations as before.
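In formula form, dividing the ordered selections of 8 cups by the $8!$ shuffles gives the same binomial coefficient as the milk-first calculation:

$$\frac{12 \times 11 \times \cdots \times 5}{8!} = \frac{12!}{4!\,8!} = \frac{19958400}{40320} = 495 = \binom{12}{8} = \binom{12}{4}$$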


In [76]:
num_combs_alt = order_ways_alt // shuffles_alt
num_combs_alt

495

Now that we know there are 495 possible combinations across the 12 cups of tea, let's use `numpy` to perform some simulations, look at the probabilities and compare them with the original problem.

In [77]:
# Make a NumPy array of the 12 cups: cups 0 to 3 are milk-first and cups 4 to 11 are tea-first
cups = np.arange(12)
milk_first = cups[:4]
tea_first = cups[4:]
milk_first, tea_first, cups

(array([0, 1, 2, 3]),
 array([ 4,  5,  6,  7,  8,  9, 10, 11]),
 array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11]))

Now that we have our cups in an array, we can use `np.random.shuffle(cups)` to shuffle them into a random order in place.

In [78]:
np.random.shuffle(cups)
cups

array([10,  9,  8,  0,  5,  2,  4,  1,  7, 11,  6,  3])

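However, `milk_first` and `tea_first` were created by slicing `cups`, so they are views of the same array and the in-place shuffle has changed them too. Instead, we identify the cups by their labels: cups 0 to 3 are milk-first and cups 4 to 11 are tea-first.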
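A small sketch of why slicing caused this: basic slicing in NumPy returns a view that shares memory with the original array, while `.copy()` makes an independent array. It uses a seeded `np.random.default_rng` generator (an assumption for reproducibility) and new variable names so the notebook's `cups` is not overwritten.

```python
import numpy as np

labels = np.arange(12)
milk_view = labels[:4]          # basic slicing returns a view, not a copy
milk_copy = labels[:4].copy()   # an independent copy of the first 4 labels

rng = np.random.default_rng(0)  # seeded generator, chosen only for reproducibility
rng.shuffle(labels)             # shuffles `labels` in place

print(np.shares_memory(labels, milk_view))  # True: the view follows the shuffle
print(np.shares_memory(labels, milk_copy))  # False
print(milk_copy)                            # [0 1 2 3]
```

Taking a copy before shuffling would have kept the original milk-first labels intact.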

In [79]:
milk_first = cups[cups <= 3]
tea_first = cups[cups > 3]
milk_first.sort()
tea_first.sort()
milk_first, tea_first

(array([0, 1, 2, 3]), array([ 4,  5,  6,  7,  8,  9, 10, 11]))

If we imagine the cups in a line, they are no longer 4 milk-first cups followed by 8 tea-first cups. Next, we can use `np.random.choice()` to simulate the lady selecting the 4 milk-first cups. We have already seen that it does not affect the number of combinations whether she selects the tea-first or the milk-first cups, so for simplicity we assume that she selects the milk-first cups, as there are fewer of them.
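Before relying on random selection, it is worth checking that every cup is equally likely to be picked. This sketch uses the newer `Generator.choice` method rather than the legacy `np.random.choice` used in the notebook; both sample without replacement when `replace=False`. With 4 cups chosen out of 12, each cup should be picked in about 4/12 of trials. The seed and trial count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)  # seeded only so the sketch is reproducible
n_trials = 20_000
pick_counts = np.zeros(12, dtype=int)

for _ in range(n_trials):
    picked = rng.choice(12, size=4, replace=False)  # 4 distinct labels from 0..11
    pick_counts[picked] += 1                        # indices are distinct, so this is safe

pick_freq = pick_counts / n_trials
print(pick_freq.round(3))  # each value should be close to 4/12 ≈ 0.333
```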

In [None]:
# We are not replacing the cups so we set replace=False
selection = np.random.choice(cups, size=4, replace=False)
# Sort the selection to make it easier to read and to compare
selection.sort()
selection

array([ 4,  5, 10, 11])

This works for one run of the experiment, and we could keep re-running the cell above to see whether we get the lady's desired result of `[0, 1, 2, 3]`. Instead, let's scale this up to a full experiment and track the results.

In [81]:
# Run the experiment 100,000 times to see how often the lady selects all 4 milk-first cups
attempts = 100000
successes = 0
for _ in range(attempts):
    selection = np.random.choice(cups, size=4, replace=False)
    selection.sort()
    if np.array_equal(selection, milk_first):
        successes += 1
success_rate = successes / attempts
success_rate

0.00206
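The loop above calls `np.random.choice` once per trial in pure Python. As an alternative sketch (a swapped-in vectorised technique, not the method used above), we can generate many random permutations at once by argsorting rows of uniform random numbers, and run the original 8-cup design alongside the 12-cup one for comparison. The function name `estimate_perfect_rate` and the seed are hypothetical choices, and the standard error uses the usual binomial formula $\sqrt{p(1-p)/n}$.

```python
import numpy as np

def estimate_perfect_rate(num_cups, num_milk, trials, rng):
    """Monte Carlo estimate of P(all milk-first cups guessed) for one design.

    Cups labelled 0..num_milk-1 are treated as milk-first; the labels are arbitrary.
    """
    # Argsorting independent uniform keys gives one random permutation per row
    perms = rng.random((trials, num_cups)).argsort(axis=1)
    guesses = perms[:, :num_milk]              # first num_milk cups = the guess
    hits = (guesses < num_milk).all(axis=1)    # True when every guess is milk-first
    return hits.mean()

rng = np.random.default_rng(2024)  # seed chosen arbitrarily for reproducibility
trials = 200_000
estimates = {}
for n_cups, exact in [(8, 1 / 70), (12, 1 / 495)]:
    estimates[n_cups] = estimate_perfect_rate(n_cups, 4, trials, rng)
    se = np.sqrt(exact * (1 - exact) / trials)  # binomial standard error
    print(f"{n_cups} cups: estimate {estimates[n_cups]:.5f}, "
          f"exact {exact:.5f}, standard error {se:.5f}")
```

By the same formula, the loop's estimate of 0.00206 from 100,000 attempts lies within one standard error (about 0.00014) of the exact value 1/495 ≈ 0.00202.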

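With 100,000 attempts, the estimated success rate of 0.00206 is close to the exact probability of 1/495 ≈ 0.00202. Because a correct selection only happens about once in every 495 attempts, shorter runs can easily return a success rate of 0.0. Next, let's modify the simulation to run until we get a correct selection and record how many attempts that takes.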


In [82]:
cups, milk_first

(array([10,  9,  8,  0,  5,  2,  4,  1,  7, 11,  6,  3]), array([0, 1, 2, 3]))

In [89]:
# Repeat the selection until the lady picks exactly the 4 milk-first cups
attempts = 0
while True:
    attempts += 1
    selection = np.random.choice(cups, size=4, replace=False)
    selection.sort()
    if np.array_equal(selection, milk_first):
        break
attempts_needed = attempts
# A single run gives only a rough estimate of the success rate
success_rate_experimental = 1 / attempts_needed
attempts_needed, success_rate_experimental

(321, 0.003115264797507788)
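If attempts are independent, the number of attempts until the first success follows a geometric distribution with mean 1/p = 495, so needing 321 attempts is not unusual and a single run gives only a rough estimate of the success rate. For an exact comparison with the original design, a minimal sketch uses the hypergeometric distribution, assuming the participant always labels exactly 4 cups as milk-first. It also shows the p-value if one mistake were allowed, which bears on whether the threshold could be relaxed. `p_value_at_least` is a hypothetical helper name.

```python
import math

def p_value_at_least(correct, num_cups, num_milk=4):
    """P(at least `correct` milk-first cups identified) under pure guessing.

    Hypergeometric tail: the guess is num_milk cups drawn from num_milk
    milk-first and (num_cups - num_milk) tea-first cups.
    """
    num_tea = num_cups - num_milk
    total = math.comb(num_cups, num_milk)
    return sum(
        math.comb(num_milk, k) * math.comb(num_tea, num_milk - k)
        for k in range(correct, num_milk + 1)
    ) / total

for n_cups in (8, 12):
    for correct in (4, 3):
        p = p_value_at_least(correct, n_cups)
        print(f"{n_cups} cups, >= {correct} of 4 correct: p = {p:.4f}")
```

Under these assumptions, a perfect score gives p ≈ 0.0143 in the original design and p ≈ 0.0020 in the 12-cup design, while allowing one mistake gives p ≈ 0.2429 and p ≈ 0.0667 respectively.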

## Problem 2: Normal Distribution

## Problem 3: t-Tests

## Problem 4: ANOVA

### References:
- A very helpful GitHub guide to `markdown`, used throughout for all markdown formatting: [here](https://github.com/adam-p/markdown-here/wiki/markdown-cheatsheet)

## END