### Module: Applied Statistics
### Author: Stefania Verduga

#### Task 1: Permutations and Combinations
Suppose we alter the Lady Tasting Tea experiment to involve twelve cups of tea. Six have the milk in first and the other six have the tea in first. A person claims they have the special power of being able to tell whether the tea or the milk went into a cup first upon tasting it. You agree to accept their claim if they can tell which of the six cups in your experiment had the milk in first.

Calculate, using Python, the probability that they select the correct six cups. Here you should assume that they have no special powers in figuring it out, that they are just guessing. Remember to show and justify your workings in code and MarkDown cells.

Suppose, now, you are willing to accept one error. Once they select the six cups they think had the milk in first, you will give them the benefit of the doubt should they have selected at least five of the correct cups. Calculate the probability, assuming they have no special powers, that the person makes at most one error.

Would you accept two errors? Explain.

In [1]:
# Imports
import math
import itertools
import random

1. ### Permutations
A permutation is an arrangement of items in a specific order, so the order matters.
Suppose we have three letters: A, B and C. How many ways can we arrange two of these letters?
There are 3 × 2 = 6 possible arrangements: AB, BA, AC, CA, BC, CB.
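
As a quick check, the ordered arrangements of two letters can be listed with `itertools.permutations` from the standard library. This is a minimal sketch; the names `letters` and `arrangements` are chosen here purely for illustration.

```python
import itertools

# The three letters from the example above.
letters = ["A", "B", "C"]

# Every ordered arrangement of two letters (order matters, so AB and BA both appear).
arrangements = ["".join(p) for p in itertools.permutations(letters, 2)]

print(arrangements)
print(len(arrangements))
```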

2. ### Combinations
A combination is a selection of items where the order does not matter. Only the items chosen count, not the sequence in which they appear.
Using the same three letters, A, B and C, how many ways can we select two of them?
There are 3 possible selections: AB, AC, BC. AB and BA count as the same combination because order does not matter.
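
The same letters can be passed to `itertools.combinations` to list the unordered selections, and `math.comb` should agree with the count. This is a minimal sketch; the names `letters` and `selections` are illustrative.

```python
import itertools
import math

# The three letters from the example above.
letters = ["A", "B", "C"]

# Every unordered selection of two letters (AB and BA are the same selection).
selections = ["".join(c) for c in itertools.combinations(letters, 2)]

print(selections)

# The number of selections matches "3 choose 2".
print(len(selections) == math.comb(3, 2))
```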

In the "Lady Tasting Tea" experiment the taster selects six cups out of twelve and the order of selection does not matter, so this is a combination problem. We count how many ways six cups can be chosen and use that count to find the probability of selecting the correct six by chance.

### Cups of Tea

In [2]:
# Number of cups of tea in total.
no_cups = 12

# Number of cups of tea with milk in first.
no_cups_milk_first = 6

# Number of cups of tea with tea in first.
no_cups_tea_first = 6

In [3]:
# Number of ways of selecting six cups from twelve.
ways = math.comb(no_cups, no_cups_milk_first)

# Show.
ways

924

### math.comb

To calculate the probability of selecting the correct six cups, I use `math.comb` from Python's standard `math` module. It returns the number of ways to choose k items from n items without repetition and without order.
https://docs.python.org/3/library/math.html#math.comb

In [4]:
# Number of cups of tea in total.
n = 12

# Number of cups of tea with milk in first.
k = 6

### math.factorial
https://docs.python.org/3/library/math.html#math.factorial

In [5]:
math.factorial(n)

479001600

In [6]:
# Twelve factorial.
math.factorial(n)

479001600

In [7]:
# Six factorial.
math.factorial(k)

720

In [8]:
# (n - k) factorial, which is also six factorial here.
math.factorial(n - k)

720

In [9]:
# Number of ways of selecting k objects from n without replacement and without order.
math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

924

### itertools.combinations
https://docs.python.org/3/library/itertools.html#itertools.combinations

In [10]:
# The cup labels.
labels = list(range(no_cups))

# Show.
labels

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

In [11]:
# Different ways of selecting no_cups_milk_first out of no_cups of tea.
combs = list(itertools.combinations(labels, no_cups_milk_first))

# Show.
combs

[(0, 1, 2, 3, 4, 5),
 (0, 1, 2, 3, 4, 6),
 (0, 1, 2, 3, 4, 7),
 (0, 1, 2, 3, 4, 8),
 (0, 1, 2, 3, 4, 9),
 (0, 1, 2, 3, 4, 10),
 (0, 1, 2, 3, 4, 11),
 (0, 1, 2, 3, 5, 6),
 (0, 1, 2, 3, 5, 7),
 (0, 1, 2, 3, 5, 8),
 (0, 1, 2, 3, 5, 9),
 (0, 1, 2, 3, 5, 10),
 (0, 1, 2, 3, 5, 11),
 (0, 1, 2, 3, 6, 7),
 (0, 1, 2, 3, 6, 8),
 (0, 1, 2, 3, 6, 9),
 (0, 1, 2, 3, 6, 10),
 (0, 1, 2, 3, 6, 11),
 (0, 1, 2, 3, 7, 8),
 (0, 1, 2, 3, 7, 9),
 (0, 1, 2, 3, 7, 10),
 (0, 1, 2, 3, 7, 11),
 (0, 1, 2, 3, 8, 9),
 (0, 1, 2, 3, 8, 10),
 (0, 1, 2, 3, 8, 11),
 (0, 1, 2, 3, 9, 10),
 (0, 1, 2, 3, 9, 11),
 (0, 1, 2, 3, 10, 11),
 (0, 1, 2, 4, 5, 6),
 (0, 1, 2, 4, 5, 7),
 (0, 1, 2, 4, 5, 8),
 (0, 1, 2, 4, 5, 9),
 (0, 1, 2, 4, 5, 10),
 (0, 1, 2, 4, 5, 11),
 (0, 1, 2, 4, 6, 7),
 (0, 1, 2, 4, 6, 8),
 (0, 1, 2, 4, 6, 9),
 (0, 1, 2, 4, 6, 10),
 (0, 1, 2, 4, 6, 11),
 (0, 1, 2, 4, 7, 8),
 (0, 1, 2, 4, 7, 9),
 (0, 1, 2, 4, 7, 10),
 (0, 1, 2, 4, 7, 11),
 (0, 1, 2, 4, 8, 9),
 (0, 1, 2, 4, 8, 10),
 (0, 1, 2, 4, 8, 11),
 (0, 1, 2, 4

### Combination Formula

The combination formula is:

$$
C(n, k) = \frac{n!}{k!(n - k)!}
$$

This combination function calculates the number of ways to choose k items from n items, where the order does not matter.
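
For the twelve-cup experiment, substituting n = 12 and k = 6 and using the factorial values computed earlier gives:

$$
C(12, 6) = \frac{12!}{6!\,(12 - 6)!} = \frac{479001600}{720 \times 720} = \frac{479001600}{518400} = 924
$$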


In [12]:
# Function to calculate combinations.

def combinations(n, k):
    return math.comb(n, k)

### Probability of guessing all 6 cups.

- There are 12 cups in total, 6 of which have milk in first. We need to calculate how many ways we can select 6 cups from 12, which is given by combinations(12, 6).
- Since there is only 1 correct combination (the exact 6 cups with milk first), the probability of guessing all 6 cups correctly is:
$$
P(all\ correct) = \frac{1}{C(12,6)}
$$


In [13]:
# Probability of guessing all 6 cups correctly.

total_combinations = combinations(12, 6)
prob_6_correct = 1 / total_combinations

# Show
prob_6_correct

0.0010822510822510823
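
The exact value can also be checked by simulation. The sketch below is a rough Monte Carlo estimate using the standard `random` module. The trial count, the seed, and the choice of cups 0 to 5 as the milk-first cups are all assumptions made for illustration.

```python
import random

# Assumption for illustration: cups 0-5 are the ones with milk in first.
milk_first = set(range(6))

# Assumed simulation settings; a seeded generator keeps the run repeatable.
trials = 200_000
rng = random.Random(12345)

# Count how often a random guess of six cups matches the milk-first cups exactly.
hits = 0
for _ in range(trials):
    guess = set(rng.sample(range(12), 6))
    if guess == milk_first:
        hits += 1

estimate = hits / trials

# The estimate should land close to the exact value 1/924.
print(estimate, 1 / 924)
```

Because the event is rare, the estimate is noisy. It should only be expected to agree with the exact value to roughly the first significant figure at this trial count.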

### Probability of guessing exactly 5 correct.

The person guesses 5 cups correctly and 1 incorrectly. This happens by:
- Choosing 5 correct cups from the 6 actual cups with milk first (combinations(6, 5)).
- Choosing 1 incorrect cup from the 6 cups where tea was poured first (combinations(6, 1)).

The total probability of guessing exactly 5 correct is:
$$
P(5\ correct) = \frac{C(6,5) \times C(6,1)}{C(12,6)}
$$

In [14]:
# Probability of guessing exactly 5 correctly.

prob_5_correct = (combinations(6, 5) * combinations(6, 1)) / total_combinations

# Show.
prob_5_correct

0.03896103896103896

### Probability of guessing exactly 4 correct.

The person guesses 4 cups correctly and 2 incorrectly. This happens by:
- Choosing 4 correct cups from the 6 actual cups with milk first (combinations(6, 4)).
- Choosing 2 incorrect cups from the 6 cups where tea was poured first (combinations(6, 2)).

The total probability of guessing exactly 4 correct is:
$$
P(4\ correct) = \frac{C(6,4) \times C(6,2)}{C(12,6)}
$$

In [15]:
# Probability of guessing exactly 4 correctly.

prob_4_correct = (combinations(6, 4) * combinations(6, 2)) / total_combinations

# Show.
prob_4_correct

0.2435064935064935
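
The counting arguments above can be cross-checked by brute force: enumerate every possible six-cup selection and tally how many milk-first cups each one contains. This is a sketch under the assumption that cups 0 to 5 are the milk-first cups. The labelling is arbitrary and does not affect the counts.

```python
import itertools
from collections import Counter

# Assumption for illustration: cups 0-5 are the ones with milk in first.
milk_first = set(range(6))

# For each possible selection of six cups, count how many are milk-first cups.
tally = Counter(
    len(milk_first.intersection(selection))
    for selection in itertools.combinations(range(12), 6)
)

total = sum(tally.values())

# Show the number of selections and the probability for each number correct.
for correct in sorted(tally, reverse=True):
    print(correct, tally[correct], tally[correct] / total)
```

The tallies for 6, 5 and 4 correct should match C(6,6) × C(6,0), C(6,5) × C(6,1) and C(6,4) × C(6,2) respectively.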

### Probability of guessing at least 5 correct.

This is the probability of guessing either all 6 cups correctly or exactly 5 cups correctly (at most 1 error). It's the sum of both probabilities:
$$
P(at\ least\ 5\ correct) = P(6\ correct) + P(5\ correct)
$$

In [16]:
# Probability of guessing at least 5 correctly (at most one error).

prob_at_least_5_correct = prob_6_correct + prob_5_correct

# Show.
prob_at_least_5_correct

0.04004329004329004
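
The task also asks whether two errors would be acceptable. One way to inform that decision is to add P(4 correct) to the sum above. The sketch below recomputes the terms with exact fractions. The helper `prob_exactly` is a hypothetical name introduced here, and the 5% threshold is a common convention assumed for illustration, not something fixed by the task.

```python
import math
from fractions import Fraction

# Number of ways of selecting six cups from twelve.
total = math.comb(12, 6)

def prob_exactly(correct):
    """Chance of picking exactly `correct` of the six milk-first cups by guessing."""
    return Fraction(math.comb(6, correct) * math.comb(6, 6 - correct), total)

# At most one error: 6 or 5 correct. At most two errors: also allow 4 correct.
prob_at_most_1_error = prob_exactly(6) + prob_exactly(5)
prob_at_most_2_errors = prob_at_most_1_error + prob_exactly(4)

# Assumed threshold for illustration (a conventional 5% level).
threshold = 0.05

print(float(prob_at_most_1_error))
print(float(prob_at_most_2_errors))
print(prob_at_most_2_errors < threshold)
```

Under these assumptions, a person who is only guessing would pass a two-error rule about 28% of the time, compared with about 4% for the one-error rule. On that basis, allowing two errors would give little evidence of any special ability.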

#### Task 2: Numpy's Normal Distribution
In this task you will assess whether numpy.random.normal() properly generates normal values. To begin, generate a sample of one hundred thousand values using the function with mean 10.0 and standard deviation 3.0.

Use the scipy.stats.shapiro() function to test whether your sample came from a normal distribution. Explain the results and output.

Plot a histogram of your values and plot the corresponding normal distribution probability density function on top of it.

In [17]:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

### Generate a sample.

The first step is to generate one hundred thousand values from a normal distribution with a mean of 10.0 and a standard deviation of 3.0, using numpy.random.normal().

In [18]:
# Generate a sample.

mean = 10.0
std_dev = 3.0
sample_size = 100000

# Generate 100k random values from a normal distribution.

sample = np.random.normal(loc=mean, scale=std_dev, size=sample_size)

# Show.
sample

array([ 5.2594661 ,  7.00644981, 12.08468448, ..., 11.84457896,
       10.26985393,  8.77414602])
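
Before any formal test, a quick sanity check is to confirm that the sample mean and standard deviation are close to the requested values. The sketch below is not the cell above: it swaps in NumPy's seeded `default_rng` generator so that the run is repeatable. The seed and the name `check_sample` are assumptions for illustration.

```python
import numpy as np

# A seeded generator so the run is repeatable; the seed value is arbitrary.
rng = np.random.default_rng(seed=0)

# Same parameters as the sample above: mean 10.0, standard deviation 3.0.
check_sample = rng.normal(loc=10.0, scale=3.0, size=100_000)

# Sample mean and sample standard deviation (ddof=1 uses the n - 1 divisor).
sample_mean = check_sample.mean()
sample_std = check_sample.std(ddof=1)

print(sample_mean, sample_std)
```

With one hundred thousand values, both statistics should sit very close to 10.0 and 3.0. A large discrepancy would point to a mistake in the parameters rather than to the generator.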