# Exercise 1

***

## Part 1

***

### Calculate the minimum number of cups of tea required to ensure the probability of randomly selecting the correct cups is less than or equal to 1%.
<br>

***

The lady tasting tea test counts the number of successes in selecting the 4 cups (that is, how many cups of the given type are correctly selected). The distribution of the possible numbers of successes can be computed by counting combinations. Using the combination formula, with n=8 total cups and k=4 cups chosen, there are:

<br>
$${8 \choose 4} = \frac{8!}{4!(8-4)!} = 70$$<br>
<br>
possible combinations.
Here, the first part of the equation means 8 choose 4.[02] 
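
As a quick cross-check, the factorial formula above can be evaluated directly. This is only a sketch; the helper name `n_choose_k` is made up for illustration and is not part of any library.

```python
import math

def n_choose_k(n: int, k: int) -> int:
    """Number of ways to choose k items from n, via n! / (k! (n - k)!)."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

ways = n_choose_k(8, 4)
print(ways)  # 70
```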

![lady_tea](img/lady_tea.png)

**Fig 1:** Possible outcomes of the "lady tasting tea" experiment.

The following is adapted from https://stackoverflow.com/a/4941932

We must first import Python's math module because we are going to need the math.comb method.

In [1]:
# Math module.
import math

The math.comb() method, also known as combinations, returns the number of ways to choose k unordered outcomes from n possibilities without repetition.[01]

**Parameters**

**n**: A non-negative integer, the number of items from which to choose

**k**: A non-negative integer, the number of items to choose

In [2]:
# Number of ways of selecting 4 cups from 8.
math.comb(8, 4)

70

This is another way of writing the mathematical formula above, but this time using Python.<br>
The probability of randomly selecting the correct 4 cups is:

In [3]:
1.0 / math.comb(8, 4)

0.014285714285714285

We can say that if the experiment was done with 8 cups in total, four with milk in first and four with tea in first, then the chance of selecting correctly at random is about 1.4%, which is still above the 1% threshold. <br>
Now let's have a look at the number of ways of selecting 5 cups from 10.

In [4]:
math.comb(10, 5)

252

The probability of randomly selecting the correct 5 cups is:

In [5]:
1.0 / math.comb(10, 5)

0.003968253968253968

So, if the experiment was done with 10 cups in total, five with milk in first and five with tea in first, then the chance of selecting correctly at random is much less than 1%.

Of course, we could design the experiment to have 9 cups in total, with 4 with milk in first and 5 with tea in first - or vice versa.

In [6]:
# Number of ways of selecting 4 cups from 9.
math.comb(9, 4)

126

In [7]:
# The probability is then:
1.0 / math.comb(9, 4)

0.007936507936507936

This is less than 1% too.
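
The trial-and-error steps above can also be written as a small search. This is a sketch under one assumption: the cups are split as evenly as possible, so the taster selects `total_cups // 2` of them. The helper name `p_all_correct` is hypothetical.

```python
import math

def p_all_correct(total_cups: int) -> float:
    """Chance of picking every cup of one type purely by guessing.

    Assumes the cups are split as evenly as possible, so the taster
    selects total_cups // 2 cups.
    """
    return 1.0 / math.comb(total_cups, total_cups // 2)

# Smallest total number of cups with a guessing probability of at most 1%.
total = 2
while p_all_correct(total) > 0.01:
    total += 1

print(total, p_all_correct(total))
```

Under that assumption the loop stops at 9 cups, which agrees with the 8-cup and 9-cup results computed above.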

Note that the number of ways of selecting 5 cups from 9 is:

In [8]:
math.comb(9, 5)

126

This is because with 9 cups, we are splitting them into two groups of 4 and 5: 4 with the milk in first and 5 with the tea in first, or vice versa. Choosing which 4 cups go in one group automatically fixes the 5 cups in the other, so 9 choose 4 is the same as 9 choose 5.
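
This symmetry holds in general, not just for 9 cups. The short check below confirms it for small values of n; the range used is an arbitrary choice.

```python
import math

# Choosing which k cups go in one group also fixes the n - k cups in the other,
# so the two counts must always agree.
for n in range(1, 13):
    for k in range(n + 1):
        assert math.comb(n, k) == math.comb(n, n - k)

print(math.comb(9, 4), math.comb(9, 5))  # 126 126
```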


### *Bonus Question:* How many would be required if you were to let the taster get one cup wrong while maintaining the 1% threshold?

As per above, the number of ways of selecting 4 cups from 8 is:

In [9]:
math.comb(8, 4)

70

For the lady to get all 4 cups correct, there is clearly only one set of four choices (namely, choosing all four correct cups). This is where the 1 comes from in the numerator:

In [10]:
1.0/70.0

0.014285714285714285

But to allow the lady to get one cup wrong, she must choose 3 of the 4 correct cups (4 choose 3) and 1 of the 4 incorrect cups (4 choose 1).

In [11]:
math.comb(4, 3)

4

In [12]:
math.comb(4, 1)

4

Thus a selection of any three correct cups and any one incorrect cup can occur in any of 4×4 = 16 ways. [03]

In [13]:
16.0/70.0

0.22857142857142856

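Therefore, the lady has a greater than 20% chance of getting exactly 3 of the 4 cups correct by guessing. Adding the single way of getting all 4 correct gives 17/70, or about 24%, for at most one mistake. For the lady to get one cup wrong while maintaining the 1% threshold, the number of cups in the experiment would have to be increased. Let's try with 10 cups overall.[03]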

In [14]:
math.comb(10, 5)

252

To allow the lady to get one cup wrong, she must choose 4 of the 5 correct cups (5 choose 4) and 1 of the 5 incorrect cups (5 choose 1).

In [15]:
math.comb(5, 4)

5

In [16]:
math.comb(5, 1)

5

Thus a selection of any four correct cups and any one incorrect cup can occur in any of 5×5 = 25 ways. [03]

In [17]:
25.0/252.0

0.0992063492063492

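The lady has about a 9.9% chance of getting exactly 4 of the 5 cups correct by guessing. This is still well above the 1% threshold, so 10 cups are not enough and the number of cups would have to be increased further.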
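
To see how far the number of cups has to grow, the counting argument above can be wrapped in a function and searched. This is a sketch, not a definitive answer to the bonus question: the helper names `p_at_most_one_wrong` and `min_cups` are hypothetical, and it assumes "one cup wrong" means at most one mistake among the selected cups, with every split of the cups into two non-empty groups allowed.

```python
import math

def p_at_most_one_wrong(total_cups: int, selected: int) -> float:
    """Chance of guessing with zero or one mistake among the selected cups."""
    others = total_cups - selected
    # One way to get every cup right, plus the ways to get exactly one wrong:
    # choose selected - 1 of the correct cups and 1 of the other cups.
    favourable = 1 + math.comb(selected, selected - 1) * math.comb(others, 1)
    return favourable / math.comb(total_cups, selected)

def min_cups(threshold: float = 0.01) -> int:
    """Smallest total for which some split keeps the probability within threshold."""
    total = 2
    while True:
        # Try every split with at least one cup of each type.
        best = min(p_at_most_one_wrong(total, k) for k in range(1, total))
        if best <= threshold:
            return total
        total += 1

print(p_at_most_one_wrong(8, 4))   # 17/70
print(p_at_most_one_wrong(10, 5))  # 26/252
print(min_cups())
```

Under these assumptions the search stops at 15 cups (split 7 and 8, probability 57/6435, about 0.9%). If the two groups must be equal in size, 14 cups gives 50/3432 (about 1.5%), so 16 cups would be needed (65/12870, about 0.5%).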

## Part 2

***

### Use <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.fisher_exact.html" style="color: #ff791e">scipy's version of Fisher's exact test</a> to simulate the Lady Tasting Tea problem.
***

Scipy's version of Fisher's exact test is a scipy function that performs a Fisher exact test on a 2x2 contingency table. A contingency table displays frequencies for combinations of two categorical variables. Contingency tables are also referred to as crosstabulations or two-way tables. They classify outcomes for one variable in rows and the other in columns. The values at the row and column intersections are frequencies for each unique combination of the two variables. [06]<br>
So, for the lady tasting tea experiment, the contingency table below (Table 1) shows the case where the lady guesses all four cups correctly. This is the table that is passed to `scipy.stats.fisher_exact`.

| | Actual Tea | Actual Milk |
| :- | -: | :-: |
| Selected Tea | 4 | 0|
| Selected Milk | 0 | 4|

<div align="center"><b>Table 1:</b> contingency table for the lady to select 4/4 correct cups.[04]</div>

'Actual Tea' refers to the cups that really had the tea poured in first. 'Actual Milk' refers to the cups that really had the milk poured in first. 'Selected Tea' refers to the cups that the lady identifies as having the tea in first. 'Selected Milk' refers to the cups that the lady identifies as having the milk in first. <br>


`scipy.stats.fisher_exact` takes 2 parameters: `table`, which is a 2x2 contingency table, and `alternative`, which is explored below. Let's start by importing scipy's statistical methods.

In [18]:
# Statistical methods.
import scipy.stats as ss

In [19]:
ss.fisher_exact

<function scipy.stats.stats.fisher_exact(table, alternative='two-sided')>

We can see from the above output that the default value of the parameter 'alternative' is 'two-sided'. Let's pass the contingency table for the lady guessing all 4 cups correctly as the first parameter of `scipy.stats.fisher_exact`.

In [20]:
ss.fisher_exact([[4, 0], [0, 4]])

(inf, 0.028571428571428536)

The answer looks to be 2 times the value that we got from using `math.comb()` from the first section of this notebook.

In [21]:
# Answer from section one multiplied by 2.
0.014285714285714285 * 2

0.02857142857142857

To explore this, I changed the 'alternative' parameter from its default, as seen below:

In [22]:
ss.fisher_exact([[4, 0], [0, 4]], alternative = 'less')

(inf, 1.0)

In [23]:
ss.fisher_exact([[4, 0], [0, 4]], alternative = 'greater')

(inf, 0.014285714285714268)

It seems that when the 'alternative' parameter is changed to 'greater', we are getting the target answer that was confirmed in the first section.

The first value returned is called the odds ratio and the second value returned is the p-value, which is the probability of seeing a table at least as extreme as the observed one if the lady were only guessing. <br>
Let's first define the oddsratio to better understand this.

|Column 1| Column 2 |
| :- | -: | 
| a | b |
| c | d | 

<div align="center"><b>Table 2:</b> Generic contingency table.</div>

The odds ratio is calculated directly from the table (table 2):
<br>
$$\text{Odds ratio} = \frac{\frac{a}{b}}{\frac{c}{d}} = \frac{ad}{bc}$$<br>
<br>

This explains why we see inf in this case. It happens when b or c is 0 while a and d are non-zero, because the denominator bc is then 0.[07]

In [24]:
ss.fisher_exact([[3, 1], [1, 3]], alternative = 'greater')

(9.0, 0.24285714285714263)

We can see above that when all the numbers in the table are non-zero, the odds ratio is finite: here it is (3×3)/(1×1) = 9.
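
The odds ratio formula can be reproduced by hand to check both results. This is a minimal sketch: `sample_odds_ratio` is a hypothetical helper that mirrors the formula ad/bc, and returning `inf` or `nan` for a zero denominator is a choice made here, not a description of scipy's internals.

```python
import math

def sample_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio ad / bc for the 2x2 table [[a, b], [c, d]]."""
    numerator = a * d
    denominator = b * c
    if denominator == 0:
        # Division by zero: report infinity, or nan for the 0 / 0 case.
        return math.inf if numerator > 0 else math.nan
    return numerator / denominator

print(sample_odds_ratio(4, 0, 0, 4))  # inf
print(sample_odds_ratio(3, 1, 1, 3))  # 9.0
```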

Going back to the 'alternative' parameter, there are two types of alternative: two-sided and one-sided. The two types of one-sided alternative are 'less' and 'greater'. [07] They are also sometimes known as left-tailed and right-tailed. In a 'two-sided' Fisher’s Exact Test, the null hypothesis is that the odds ratio is equal to 1 and the alternative hypothesis is that it is not equal to 1. In a 'less' Fisher’s Exact Test, the null hypothesis is that the odds ratio is greater than or equal to 1 and the alternative hypothesis is that it is less than 1. In a 'greater' Fisher’s Exact Test, the null hypothesis is that the odds ratio is less than or equal to 1 and the alternative hypothesis is that it is greater than 1. See the below summary of alternatives[07]:

**Two-sided Fisher’s Exact Test:**<br>
H0: The odds ratio is equal to 1<br>
Ha: The odds ratio is not equal to 1<br>
**“Less” Fisher’s Exact Test:**<br>
H0: The odds ratio is ≥ 1<br>
Ha: The odds ratio is < 1<br>
**“Greater” Fisher’s Exact Test:**<br>
H0: The odds ratio is ≤ 1<br>
Ha: The odds ratio is > 1<br>

This means that for the lady to get all cups correct and for the test to give the same result as using `math.comb()` in Part 1, the alternative hypothesis must be that the odds ratio is greater than 1.
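
The three p-values seen earlier can be reproduced with `math.comb` alone, which shows where the factor of 2 in the two-sided result comes from. This is a sketch under stated assumptions: `table_probability` is a hypothetical helper, and the two-sided rule used here (summing every table that is no more likely than the observed one, with a small tolerance picked for this example) is the usual convention rather than a copy of scipy's code.

```python
import math

def table_probability(correct: int, per_type: int = 4) -> float:
    """Hypergeometric chance of exactly `correct` right picks when guessing."""
    wrong = per_type - correct
    ways = math.comb(per_type, correct) * math.comb(per_type, wrong)
    return ways / math.comb(2 * per_type, per_type)

observed = 4  # the lady picks all four cups correctly
probs = [table_probability(k) for k in range(5)]

# 'greater': tables with at least as many correct picks as observed.
p_greater = sum(probs[observed:])
# 'less': tables with at most as many correct picks as observed.
p_less = sum(probs[: observed + 1])
# 'two-sided': tables that are no more likely than the observed one.
p_two_sided = sum(p for p in probs if p <= probs[observed] * (1 + 1e-7))

print(p_greater, p_less, p_two_sided)
```

With all four cups correct, the only other table that is equally unlikely is the one where all four are wrong, so the two-sided value is 1/70 + 1/70 = 2/70. Summing `probs[3:]` instead gives 17/70, which lines up with the 'greater' p-value for the `[[3, 1], [1, 3]]` table.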

***

## References

***

[01][python-programs.com - Python math.comb() Method with Examples](https://python-programs.com/python-math-comb-method-with-examples/)<br>
[02][handwiki.org - Lady Tasting Tea](https://handwiki.org/wiki/Lady_tasting_tea)<br>
[03][Wikipedia - Lady tasting tea](https://en.wikipedia.org/wiki/Lady_tasting_tea)<br>
[04][Tables in Markdown (in Jupyter)](https://stackoverflow.com/questions/48655801/tables-in-markdown-in-jupyter)<br>
[05][docs.scipy.org - scipy.stats.fisher_exact](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.fisher_exact.html)<br>
[06][statisticsbyjim.com - Contingency Table: Definition, Examples & Interpreting](https://statisticsbyjim.com/basics/contingency-table/)<br>
[07][towardsdatascience.com - Fully Mastering Fisher’s Exact Test for A/B Testing](https://towardsdatascience.com/fishers-exact-fb49432e55b5)<br>