# Applied Statistics Tasks

**Francesco Troja**

***

## Task 1: **Permutations and Combinations**

<figure style="text-align:center;">
    <img src="T1-fisher.png" alt="Lady Testing Tea" width="400"/>
    <figcaption>Photo credit<sup>1</sup> </figcaption>
</figure>

### Table of Contents
1. [Problem Statement](#problem_statement)
2. [Introduction to the problem](#2)
3. [Import Python libraries](#3)
4. [Understand the Problem](#4)
   -   [4.1 Permutations](#4_1)
   -   [4.2 Combinations](#4_2)
5. [Visualize the experiment](#5)
6. [References](#references)

### 1. Problem Statement <a class="anchor" id="problem_statement"></a>
> Suppose we alter the Lady Tasting Tea experiment to involve twelve cups of tea. Six have the milk in first and the other six having tea in first. A person claims they have the special power of being able to tell whether the tea or the milk went into a cup first upon tasting it. You agree to accept their claim if they can tell which of the six cups in your experiment had the milk in first.
>
>Calculate, using Python, the probability that they select the correct six cups. Here you should assume that they have no special powers in figuring it out, that they are just guessing. Remember to show and justify your workings in code and MarkDown cells.
>
>Suppose, now, you are willing to accept one error. Once they select the six cups they think had the milk in first, you will give them the benefit of the doubt should they have selected at least five of the correct cups. Calculate the probability, assuming they have no special powers, that the person makes at most one error. 
>
>Would you accept two errors? Explain.

### 2. Introduction to the problem <a class="anchor" id="2"></a>


Before working through the problem, some context on the origins of the experiment helps clarify how to approach the task. The "**Lady Tasting Tea**" experiment is an essential example in the field of *statistics* and *hypothesis testing*, introduced by **Ronald A. Fisher** in the 1920s. The experiment was inspired by a claim made during a social gathering at Cambridge, where **Muriel Bristol**, a biologist, confidently stated that she could *distinguish* whether *milk or tea was poured first into a cup of tea*. Fisher, intrigued by her assertion, saw this as an opportunity to design a simple yet robust experiment to test her claim and demonstrate key principles of hypothesis testing **$^2$**.

In Fisher's own words, from his book *The Design of Experiments* (1935, p. 13) **$^3$**:
> Our experiment consists in mixing eight cups of tea, four in one way and four in the other, and presenting them to the subject for judgment in a random order. The subject has been told in advance of what the test will consist, namely that she will be asked to taste eight cups, that these shall be four of each kind, and that they shall be presented to her in a random order, that is in an order not determined arbitrarily by human choice, but by the actual manipulation of the physical apparatus used in games of chance cards, dice, roulettes, etc., or, more expeditiously, from a published collection of random sampling numbers purporting to give the actual results of such manipulation. Her task is to divide the 8 cups into two sets of 4, agreeing, if possible, with the treatments received.

### 3. Import Python libraries <a class="anchor" id="3"></a>

This task uses several Python libraries, each chosen for a specific role:

- `math`: Provides mathematical functions such as trigonometry, logarithms, and factorials. It is used here for the standard combinatorial calculations **$^4$**.
- `itertools`: Offers functions that create iterators for efficient looping, including tools for permutations, combinations, products, and other iterator algebra operations **$^5$**.
- `random`: Generates pseudo-random numbers for various distributions. It is commonly used for simulations, random sampling, and other tasks that require randomization, such as shuffling data or making random choices **$^6$**.
- `numpy`: Provides support for large multi-dimensional arrays and matrices, along with a collection of mathematical functions that operate on these arrays **$^7$**.
- `matplotlib.pyplot`: A widely used Python library for data visualization. It can create bar charts, line plots, scatter plots, histograms, and more, making it useful for exploratory data analysis and for presenting findings **$^8$**.

In [1]:
# Import the Python libraries
import math
import itertools
import random
import numpy as np
import matplotlib.pyplot as plt

### 4. Understand the Problem <a class="anchor" id="4"></a>

In this variation of the experiment, the task is to determine the probability that the participant correctly selects the **six cups** where *milk was poured first* (or tea, depending on the scenario), assuming random guessing without any special knowledge. This is a combinatorial problem: we need to count the ways of *selecting six cups out of twelve* and find how many of them are correct. Referring again to *The Design of Experiments*, it is noteworthy that Fisher himself relied on **permutations** and **combinations**, *two essential concepts in combinatorics*, the branch of mathematics concerned with counting and arrangement:

> [...] There are 70 ways of choosing a group of 4 objects out of 8. This may be demonstrated by an argument familiar to students of "permutations and combinations," namely, that if we were to choose the 4 objects in succession we should have successively 8, 7, 6, 5 objects to choose from, and could make our succession of choices in 8 × 7 × 6 × 5, or 1680 ways.

(Fisher, 1935, p. 14)

By utilizing these combinatorial methods, the number of possible outcomes can be systematically analyzed, allowing for more precise experimental design and interpretation. In the following section, the concepts of **permutations** and **combinations** will be explored further to clarify their differences and their application in this type of problem **$^9$**.

In [2]:
# Total number of cups
n = 12
# Number of cups with milk first (or tea first, depending on the scenario)
k = 6

#### 4.1 Permutations <a class="anchor" id="4_1"></a>

**Permutations** in probability theory refer to the *various ways a set of items can be arranged in a specific order*. A *key characteristic* of permutations is that the **order of the items matters**. For instance, consider a four-digit PIN: each digit must occupy the correct position for the PIN to be valid. If we take the digits 1, 2, 3, and 4, the arrangement "1234" is distinct from "4321." Although both sequences use the same digits, they represent different outcomes because their order differs **$^{10}$**.

There are three main types of permutations:

- **Permutations of distinct items**: Where all items are unique **$^{10}$**.
- **Permutations with repetition**: Where some items may be repeated **$^{10}$**.
- **Circular permutations**: Where the arrangement is in a circle, changing how we calculate the order **$^{11}$**.

For the scope of this task, the study will focus on the first type of permutation: **permutations of distinct items**. 

The mathematical formula for calculating permutations is:

$$P(n, k) = \frac{n!}{(n - k)!}$$

where:
- $n$ = total number of items available to choose from
- $k$ = number of items to arrange
- $n!$ ($n$ factorial) = product of all positive integers up to $n$
- $P(n,k)$ = the number of ways to arrange $k$ items out of $n$
- $(n-k)!$ = factorial of the difference between the total items and the items chosen **$^{12}$**
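As a minimal sketch of the formula above, the number of ordered selections of 6 cups from 12 can be computed both from the factorial definition and with `math.perm` (available from Python 3.8). The variable names `n_items` and `k_items` are illustrative and are not part of the notebook's own code.

```python
import math

n_items = 12  # total number of cups (illustrative name)
k_items = 6   # number of cups to arrange (illustrative name)

# P(n, k) = n! / (n - k)!
perm_formula = math.factorial(n_items) // math.factorial(n_items - k_items)

# The same quantity from the standard library helper (Python 3.8+).
perm_builtin = math.perm(n_items, k_items)

print(perm_formula, perm_builtin)
```

Both values equal 665280, which is the product 12 × 11 × 10 × 9 × 8 × 7.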

  




#### 4.2 Combinations <a class="anchor" id="4_2"></a>

Combinations, on the other hand, are used to determine how many ways a subset of items can be selected from a larger group, where the order of selection does not matter. For example, suppose you need to choose three letters—A, B, and C—from a set. The arrangements ACB, ABC, or BAC would all be considered the same combination since the order doesn't matter. Unlike permutations, where order is important, combinations focus solely on the selection of items, disregarding how they are arranged **$^{13}$**. 

The mathematical formula for calculating combinations is:

$$C(n, k) = \frac{n!}{k!(n - k)!}$$

where:
- $n$ = total number of items
- $k$ = the number of items to choose
- $C(n,k)$ = the number of ways to choose $k$ items out of $n$.

The main difference in the permutation and combination mathematical formula is that combinations include a division by $k!$ to account for the fact that the order of the selected items doesn't matter. In other words, combinations count only unique groupings, ignoring different arrangements of the same items. On the other hand, permutations do not include this division by $k!$, meaning every possible arrangement (or order) of the selected items is counted, making order important in permutations **$^{13}$**.
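The division by $k!$ can be checked on a small case. The sketch below uses a hypothetical set of four letters and counts ordered and unordered selections of three with `itertools`; the set and the names `letters` and `r` are assumptions made only for illustration.

```python
import itertools
import math

letters = ["A", "B", "C", "D"]  # hypothetical example set
r = 3                           # number of letters to select

ordered = list(itertools.permutations(letters, r))    # order matters
unordered = list(itertools.combinations(letters, r))  # order ignored

# Each combination corresponds to r! different orderings.
ratio = len(ordered) // len(unordered)

print(len(ordered), len(unordered), ratio, math.factorial(r))
```

Here there are 24 permutations but only 4 combinations, and their ratio is $3! = 6$, as the formulas predict.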

Now that we have a clear understanding of permutations and combinations, we can analyze the problem to determine which concept applies. As discussed earlier, the key distinction between permutations and combinations is that in permutations, the order matters, whereas in combinations, it does not. This distinction provides a helpful starting point for understanding the Lady Tasting Tea experiment. Since the task involves selecting a specific set of cups without regard to the order in which they are chosen, it is evident that we need to consider combinations, not permutations, to solve the problem effectively.

A straightforward way to calculate combinations in Python is the `math.comb(n, k)` function from the `math` module, available since Python 3.8. It takes two parameters: $n$, the total number of items, and $k$, the number of items to be selected. It returns the number of ways to choose $k$ items from $n$ items without regard to the order of selection **$^{14}$**.

In [3]:
# Use math.comb to count the combinations
def calc_combinations(n, k):
    """Return the number of ways to choose k items from n, ignoring order."""
    return math.comb(n, k)


print(f"The number of ways to choose {k} cups from {n} is: {calc_combinations(n, k)}")


The number of ways to choose 6 cups from 12 is: 924


To see how `math.comb(n, k)` arrives at this result, and to verify it independently, we can carry out the calculation step by step. First, consider the number of ways to select six cups from twelve when the order of selection matters. There are 12 choices for the first cup, 11 for the second, and so on down to 7 for the sixth, giving $12 \times 11 \times 10 \times 9 \times 8 \times 7 = \frac{12!}{6!}$ ordered selections. Here $12!$ (12 factorial) is a factorial: the product of a number $n$ and every positive integer below it, down to one. The formula is expressed as **$^{15}$**:

$$n! = n \times (n-1) \times (n-2) \times \cdots \times 1$$

Next, we account for the number of ways to arrange the six selected cups, which is $6! = 720$. Dividing the number of ordered selections by this value removes the duplicates that differ only in order, which is exactly what the combination formula does:

$$C(12, 6) = \frac{12!}{6!(12 - 6)!}$$

This gives the number of unique sets of six cups that can be selected from a total of twelve.


In [4]:
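# Ordered selections of 6 cups from 12: 12!/6!
ways_ordered = 12 * 11 * 10 * 9 * 8 * 7
# Ways to arrange the 6 selected cups: 6!
no_shuffles = 6 * 5 * 4 * 3 * 2 * 1
# Unordered selections: divide out the orderings
no_combs = ways_ordered // no_shuffles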

print(f"Total number of ordered selections: {ways_ordered}")
print(f"Number of ways to arrange 6 selected cups: {no_shuffles}")
print(f"The total number of combinations of selecting 6 cups from 12 is: {no_combs}")

Total number of ordered selections: 665280
Number of ways to arrange 6 selected cups: 720
The total number of combinations of selecting 6 cups from 12 is: 924
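With 924 equally likely selections under pure guessing, and only one of them matching all six milk-first cups, the first probability asked for in the problem statement follows directly. The sketch below is one way to express it; the names `total_selections` and `prob_all_correct` are illustrative.

```python
import math

total_selections = math.comb(12, 6)     # number of possible sets of six cups
prob_all_correct = 1 / total_selections  # exactly one set is fully correct

print(f"P(all six correct) = 1/{total_selections} = {prob_all_correct:.6f}")
```

This is roughly 0.11%, so a fully correct selection would be very unlikely to arise from guessing alone.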


### 5. Visualize the experiment <a class="anchor" id="5"></a>

Having established that there are 924 ways to select 6 cups from 12, we can move on to the next part of the task. In this step, we examine how many of those selections contain between 0 and 6 of the correct cups. This part of the Lady Tasting Tea experiment is important because it yields the probability distribution of the number of correct cups under pure guessing, which lets us judge how likely a given level of success is to arise by chance alone.

Furthermore, this visualization plays a vital role in the discussion of the null hypothesis, which will be elaborated upon in the final section of the task.

Let's begin by generating a list of labels, one for each cup, using the `range` function. By default, `range` starts from 0, but numbering the cups from 1 is more natural, so we start the range at 1 and stop at `n + 1`. Since there are 12 cups, this produces the sequence of integers from 1 to 12.

In [5]:
labels = list(range(1, n + 1))
print(f"The cup labels are: {labels}")

The cup labels are: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]


Now that we have a list of cup labels, we can generate every possible selection of 6 cups that could be named as having milk added first (or vice versa). To do this, we use the `itertools.combinations(iterable, r)` function. It generates all combinations of a specified length from an input iterable, never repeating an element within a combination, and emits them in lexicographic order according to the order of the input. The `itertools.combinations()` function takes two arguments:

- `iterable`: The input sequence (e.g., a list of cups) from which the combinations are generated.
- `r`: The number of items in each combination (in this case, 6 cups) **$^{16}$**.

**Note**: This function returns an iterator that generates the combinations on demand, rather than precomputing and storing them all at once. For the purposes of this task, however, the result will be stored in a list, making it easy to access all combinations at once in a straightforward manner **$^{17}$**.
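To illustrate the on-demand behaviour described in the note, the sketch below takes only the first three combinations from the iterator with `itertools.islice`, without building the full list. The name `cup_labels` is an illustrative stand-in for the list of labels 1 to 12.

```python
import itertools

cup_labels = list(range(1, 13))  # illustrative stand-in for the labels 1..12
comb_iter = itertools.combinations(cup_labels, 6)

# Take only the first three combinations; the remaining ones are not generated.
first_three = list(itertools.islice(comb_iter, 3))

for comb in first_three:
    print(comb)
```

This can be handy for inspecting the output format before committing to storing all 924 tuples.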

In [6]:
combs = list(itertools.combinations(labels, k))
print(f"Generated {len(combs)} combinations of {k} elements from the total set of cups:\n ")
for comb in combs:
    print(comb)

Generated 924 combinations of 6 elements from the total set of cups:
 
(1, 2, 3, 4, 5, 6)
(1, 2, 3, 4, 5, 7)
(1, 2, 3, 4, 5, 8)
(1, 2, 3, 4, 5, 9)
(1, 2, 3, 4, 5, 10)
(1, 2, 3, 4, 5, 11)
(1, 2, 3, 4, 5, 12)
(1, 2, 3, 4, 6, 7)
(1, 2, 3, 4, 6, 8)
(1, 2, 3, 4, 6, 9)
(1, 2, 3, 4, 6, 10)
(1, 2, 3, 4, 6, 11)
(1, 2, 3, 4, 6, 12)
(1, 2, 3, 4, 7, 8)
(1, 2, 3, 4, 7, 9)
(1, 2, 3, 4, 7, 10)
(1, 2, 3, 4, 7, 11)
(1, 2, 3, 4, 7, 12)
(1, 2, 3, 4, 8, 9)
(1, 2, 3, 4, 8, 10)
(1, 2, 3, 4, 8, 11)
(1, 2, 3, 4, 8, 12)
(1, 2, 3, 4, 9, 10)
(1, 2, 3, 4, 9, 11)
(1, 2, 3, 4, 9, 12)
(1, 2, 3, 4, 10, 11)
(1, 2, 3, 4, 10, 12)
(1, 2, 3, 4, 11, 12)
(1, 2, 3, 5, 6, 7)
(1, 2, 3, 5, 6, 8)
(1, 2, 3, 5, 6, 9)
(1, 2, 3, 5, 6, 10)
(1, 2, 3, 5, 6, 11)
(1, 2, 3, 5, 6, 12)
(1, 2, 3, 5, 7, 8)
(1, 2, 3, 5, 7, 9)
(1, 2, 3, 5, 7, 10)
(1, 2, 3, 5, 7, 11)
(1, 2, 3, 5, 7, 12)
(1, 2, 3, 5, 8, 9)
(1, 2, 3, 5, 8, 10)
(1, 2, 3, 5, 8, 11)
(1, 2, 3, 5, 8, 12)
(1, 2, 3, 5, 9, 10)
(1, 2, 3, 5, 9, 11)
(1, 2, 3, 5, 9, 12)
(1, 2, 3, 5, 10, 11)
(

Now that we have a list of all possible selections of 6 cups, we turn to the actual setup of the experiment. We randomly choose which 6 of the 12 cups have milk added first using simple random sampling, mirroring how the cups would be assigned without bias in the real experiment. We implement this with `random.sample(labels, k)`, which draws `k` distinct labels so that every cup has an equal probability of being chosen. This preserves the integrity of the experiment by ensuring that no prior assumptions or patterns influence the selection of cups. The random selection will also be key to evaluating the null hypothesis in the later part of the task **$^{18}$** **$^{19}$**.

In [17]:
cups_milk_first = random.sample(labels, k)
print("The randomly selected cups containing milk added first are:\n", cups_milk_first)

The randomly selected cups containing milk added first are:
 [8, 5, 11, 7, 4, 1]
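Because `random.sample` draws a fresh selection on every run, the cups shown above will generally differ between executions. If a reproducible selection is wanted, one option (an assumption, not something the notebook itself does) is a seeded local generator; the seed value and the names `rng`, `cup_labels`, and `milk_first` are hypothetical.

```python
import random

cup_labels = list(range(1, 13))  # illustrative stand-in for the labels 1..12
rng = random.Random(42)          # hypothetical seed, chosen arbitrarily

# Draw six distinct cups; the same seed yields the same draw on every run.
milk_first = rng.sample(cup_labels, 6)

# A second generator with the same seed reproduces the selection.
milk_first_again = random.Random(42).sample(cup_labels, 6)

print(sorted(milk_first), milk_first == milk_first_again)
```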


We previously generated all 924 possible selections. With the randomly chosen milk-first cups in hand, we now calculate the overlap between each possible selection and the true milk-first set, that is, the number of cups a given selection gets right. This calculation is central to evaluating the lady's guesses: each of the 924 selections is a guess she could make, and its overlap is the number of correct cups in that guess. If the number of correct cups she identifies is far higher than random guessing would typically produce, we can start to gather evidence supporting her claim of possessing a special skill. The same analysis also indicates whether her result is consistent with random selection alone **$^{20}$**.

In [18]:
# Calculate the overlap between each element of combs and cups_milk_first.
no_overlaps = []

# Turn cups_milk_first into a set once, outside the loop.
s2 = set(cups_milk_first)

for comb in combs:
    # Turn comb into a set.
    s1 = set(comb)
    # Figure out where they overlap.
    overlap = s1.intersection(s2)
    # Show the combination and the overlap.
    print(comb, overlap, len(overlap))
    # Append the size of the overlap to no_overlaps.
    no_overlaps.append(len(overlap))

(1, 2, 3, 4, 5, 6) {1, 4, 5} 3
(1, 2, 3, 4, 5, 7) {1, 4, 5, 7} 4
(1, 2, 3, 4, 5, 8) {8, 1, 4, 5} 4
(1, 2, 3, 4, 5, 9) {1, 4, 5} 3
(1, 2, 3, 4, 5, 10) {1, 4, 5} 3
(1, 2, 3, 4, 5, 11) {1, 11, 4, 5} 4
(1, 2, 3, 4, 5, 12) {1, 4, 5} 3
(1, 2, 3, 4, 6, 7) {1, 4, 7} 3
(1, 2, 3, 4, 6, 8) {8, 1, 4} 3
(1, 2, 3, 4, 6, 9) {1, 4} 2
(1, 2, 3, 4, 6, 10) {1, 4} 2
(1, 2, 3, 4, 6, 11) {1, 11, 4} 3
(1, 2, 3, 4, 6, 12) {1, 4} 2
(1, 2, 3, 4, 7, 8) {8, 1, 4, 7} 4
(1, 2, 3, 4, 7, 9) {1, 4, 7} 3
(1, 2, 3, 4, 7, 10) {1, 4, 7} 3
(1, 2, 3, 4, 7, 11) {1, 11, 4, 7} 4
(1, 2, 3, 4, 7, 12) {1, 4, 7} 3
(1, 2, 3, 4, 8, 9) {8, 1, 4} 3
(1, 2, 3, 4, 8, 10) {8, 1, 4} 3
(1, 2, 3, 4, 8, 11) {8, 1, 11, 4} 4
(1, 2, 3, 4, 8, 12) {8, 1, 4} 3
(1, 2, 3, 4, 9, 10) {1, 4} 2
(1, 2, 3, 4, 9, 11) {1, 11, 4} 3
(1, 2, 3, 4, 9, 12) {1, 4} 2
(1, 2, 3, 4, 10, 11) {1, 11, 4} 3
(1, 2, 3, 4, 10, 12) {1, 4} 2
(1, 2, 3, 4, 11, 12) {1, 11, 4} 3
(1, 2, 3, 5, 6, 7) {1, 5, 7} 3
(1, 2, 3, 5, 6, 8) {8, 1, 5} 3
(1, 2, 3, 5, 6, 9) {1, 5} 2
(1, 2, 3, 5, 6
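The overlap sizes can be tallied to obtain the distribution of correct cups under guessing. The sketch below is a self-contained version of that tally, checked against the closed-form count $C(6, i) \times C(6, 6 - i)$. It fixes the milk-first cups as labels 1 to 6 purely for illustration (the counts do not depend on which six are chosen), and all variable names are assumptions rather than the notebook's own.

```python
import itertools
import math
from collections import Counter

cup_labels = list(range(1, 13))
milk_first = set(cup_labels[:6])  # illustrative choice; any six cups give the same counts

# Brute force: tally how many of the 924 selections share i cups with milk_first.
counts = Counter(
    len(milk_first.intersection(comb))
    for comb in itertools.combinations(cup_labels, 6)
)

total = math.comb(12, 6)
for i in range(7):
    # Closed form: choose i of the 6 correct cups and 6 - i of the 6 incorrect ones.
    closed_form = math.comb(6, i) * math.comb(6, 6 - i)
    assert counts[i] == closed_form
    print(f"{i} correct: {counts[i]:3d} selections, p = {counts[i] / total:.4f}")

p_at_least_five = (counts[6] + counts[5]) / total
p_at_least_four = (counts[6] + counts[5] + counts[4]) / total
print(f"P(at least 5 correct) = {p_at_least_five:.4f}")
print(f"P(at least 4 correct) = {p_at_least_four:.4f}")
```

Under guessing, allowing one error gives a probability of 37/924, about 0.04, while allowing two errors gives 262/924, about 0.28. Judged against the commonly used 0.05 threshold, this suggests that accepting two errors would make the test too easy to pass by chance.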

### References <a class="anchor" id="references"></a>

**$^1$** Zoltan Dienes (2008). "*Understanding Psychology as a Science, An introduction to scientific and statistical inference*". Palgrave Macmillan.

**$^2$** Learn Statistics Easily (2023). "*The Statistical Significance of the ‘Lady Tasting Tea’ Experiment*". [Learn Statistics Easily](https://statisticseasily.com/lady-tasting-tea/)

**$^3$** Fisher R. A. (1935). "*The Design of Experiments*". Chapter II: *The principles of experimentation illustrated by a psycho-physical experiment*, p. 13. Oliver and Boyd.

**$^4$** Python Documentation (n.d.). "*math — Mathematical functions*". [Python Documentation](https://docs.python.org/3/library/math.html)


**$^5$**  Python Documentation (n.d.). "*itertools — Functions creating iterators for efficient looping*" [Python Documentation](https://docs.python.org/3/library/itertools.html).
 
**$^6$**  Python Documentation (n.d.). "*random — Generate pseudo-random numbers*". [Python Documentation](https://docs.python.org/3/library/random.html).

**$^7$**  Numpy (n.d.). "*NumPy reference*". [Numpy](https://numpy.org/doc/stable/reference/index.html#reference).

**$^8$**  Matplotlib (n.d.). "*Using Matplotlib*". [Matplotlib](https://matplotlib.org/stable/users/index.html).

**$^9$** Hayes A., Zhi V. H., Tarigan C. (n.d.). "*Combinatorics*". [Brilliant](https://brilliant.org/wiki/combinatorics/#permutations-and-combinations)

**$^{10}$** Frost J. (n.d.). "*Using Permutations to Calculate Probabilities*". [Statistics By Jim](https://statisticsbyjim.com/probability/permutations-probabilities/#:~:text=Permutations%20in%20probability%20theory%20and,order%20of%20numbers%20is%20crucial.)

**$^{11}$** GeeksforGeeks (n.d.). "*Circular Permutation*". [GeeksforGeeks](https://www.geeksforgeeks.org/circular-permutation/)

**$^{12}$** Taylor S. (n.d.). "*Permutation*". [Corporate Finance Institute](https://corporatefinanceinstitute.com/resources/data-science/permutation/)

**$^{13}$** Library and Learning Center (n.d.). "*Statistics*". [Library and Learning Center](https://libraryguides.centennialcollege.ca/c.php?g=717168&p=5128089)

**$^{14}$** GeeksforGeeks (2020). "*Python – math.comb() method*". [GeeksforGeeks](https://www.geeksforgeeks.org/python-math-comb-method/)

**$^{15}$** GeeksforGeeks (2024). "*Factorial in Maths*". [GeeksforGeeks](https://www.geeksforgeeks.org/factorial/)

**$^{16}$** LabEx (n.d.). "*How to use itertools.combinations in Python?*". [LabEx](https://labex.io/tutorials/python-how-to-use-itertools-combinations-in-python-398083)

**$^{17}$** Stack Overflow (2011). "*Python returning <itertools.combinations object at 0x10049b470> - How can I access this?*". [Stack Overflow](https://stackoverflow.com/questions/5176232/python-returning-itertools-combinations-object-at-0x10049b470-how-can-i-ac)

**$^{18}$** Hayes A. (2024). "*Simple Random Sampling: 6 Basic Steps With Examples*". [Investopedia](https://www.investopedia.com/terms/s/simple-random-sample.asp)

**$^{19}$** Python Documentation (n.d.). "*random — Generate pseudo-random numbers*". [Python Documentation](https://docs.python.org/3/library/random.html#random.sample)

**$^{20}$** Richardson J. T. E. (2021). "*Closer Look at the Lady Tasting Tea*". [Significance](https://doi.org/10.1111/1740-9713.01572), Volume 18, Pages 34–37.

## Task 2: ****


### References

## Task 3: ****


### References

## Task 4: ****


### References

***
End