# Advanced Statistics Problems Notebook

This notebook contains the problems and solutions from problems.md. As per the guidelines, the code follows Python coding standards such as PEP 8. The only packages that can be used are those imported below, as per requirements.txt in the problems brief.

## Problem 1: Extending the Lady Tasting Tea

Let's extend the Lady Tasting Tea experiment as follows. The original experiment has 8 cups: 4 tea-first and 4 milk-first. Suppose we prepare 12 cups: 8 tea-first and 4 milk-first. A participant claims they can tell which was poured first.

Simulate this experiment using numpy by randomly shuffling the cups many times and calculating the probability of the participant correctly identifying all cups by chance. Compare your result with the original 8-cup experiment.

In your notebook, explain your simulation process clearly, report and interpret the estimated probability, and discuss whether, based on this probability, you would consider extending or relaxing the p-value threshold compared to the original design.

(applied-statistics/assessment/problems.md, Ian McLoughlan)
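The simulation described in the brief could be sketched as follows. This is a minimal sketch, not a definitive solution: it assumes the participant knows how many cups are milk-first and, under the null hypothesis, guesses by pointing at a fixed set of positions while the cups are shuffled at random. The function name `simulate_all_correct`, the trial count, and the seed are illustrative choices.

```python
import numpy as np


def simulate_all_correct(n_cups, n_milk_first, n_trials=200_000, seed=0):
    """Estimate the chance of identifying every cup correctly by guessing."""
    rng = np.random.default_rng(seed)

    # 1 marks a milk-first cup, 0 marks a tea-first cup.
    cups = np.zeros(n_cups, dtype=np.int8)
    cups[:n_milk_first] = 1

    # One independently shuffled row of cups per simulated experiment.
    shuffled = rng.permuted(np.tile(cups, (n_trials, 1)), axis=1)

    # Assumption: the guesser always names the first n_milk_first positions.
    # Because the cups are shuffled, this is equivalent to a random guess.
    hits = shuffled[:, :n_milk_first].sum(axis=1)

    # The guess is fully correct only if every named cup is milk-first.
    return np.mean(hits == n_milk_first)


p_original = simulate_all_correct(n_cups=8, n_milk_first=4)
p_extended = simulate_all_correct(n_cups=12, n_milk_first=4)

print(f"8 cups  (4 milk-first): {p_original:.5f}")
print(f"12 cups (4 milk-first): {p_extended:.5f}")
```

The estimates should land close to the exact values $1/70 \approx 0.0143$ and $1/495 \approx 0.0020$, so a perfect score is much harder to obtain by chance in the 12-cup design.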

In [None]:
# Mathematical functions from the standard library
# https://docs.python.org/3/library/math.html
import math

# Permutations and combinations.
# https://docs.python.org/3/library/itertools.html
from itertools import permutations, combinations

# Random Selections.
# https://docs.python.org/3/library/random.html
import random

# Numerical structures and operations.
# https://numpy.org/doc/stable/user/absolute_beginners.html
import numpy as np

# Plotting.
# https://matplotlib.org/stable/contents.html
import matplotlib.pyplot as plt

# Initial Experiment: Lady Tasting Tea

In the design of experiments in statistics, the lady tasting tea is a randomized experiment devised by Ronald Fisher and reported in his book The Design of Experiments (1935).[1] The experiment is the original exposition of Fisher's notion of a null hypothesis, which is "never proved or established, but is possibly disproved, in the course of experimentation".[2][3]

The example is loosely based on an event in Fisher's life. The woman in question, phycologist Muriel Bristol, claimed to be able to tell whether the tea or the milk was added first to a cup. Her future husband, William Roach, suggested that Fisher give her eight cups, four of each variety, in random order.[4] One could then ask what the probability was for her getting the specific number of cups she identified correct (in fact all eight), but just by chance.

Fisher's description is less than 10 pages in length and is notable for its simplicity and completeness regarding terminology, calculations and design of the experiment.[5] The test used was Fisher's exact test.

The experiment provides a subject with eight randomly ordered cups of tea – four prepared by pouring milk and then tea, four by pouring tea and then milk. The subject attempts to select the four cups prepared by one method or the other, and may compare cups directly against each other as desired. The method employed in the experiment is fully disclosed to the subject.

The null hypothesis is that the subject has no ability to distinguish the teas. In Fisher's approach, there was no alternative hypothesis,[2] unlike in the Neyman–Pearson approach.

The test statistic is a simple count of the number of successful attempts to select the four cups prepared by a given method. The distribution of possible numbers of successes, assuming the null hypothesis is true, can be computed using the number of combinations. Using the combination formula, with $n=8$ total cups and $k=4$ cups chosen, there are:

$$
{\binom {8}{4}}={\frac {8!}{4!(8-4)!}}=70
$$

possible combinations.

https://en.wikipedia.org/wiki/Lady_tasting_tea
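The count above can be checked with `math.comb` from the standard library. The sketch below also tabulates how many of the 70 selections give each number of correct cups, and repeats the count for the 12-cup design from Problem 1 (assuming the participant selects the 4 milk-first cups). The variable names are illustrative.

```python
import math

# Number of ways to choose 4 cups out of 8 (original design).
n_original = math.comb(8, 4)

# Number of ways to choose 4 cups out of 12 (extended design).
n_extended = math.comb(12, 4)

# Ways to get exactly k of the 4 chosen cups right in the original design:
# choose k from the 4 correct cups and 4 - k from the 4 incorrect cups.
counts = [math.comb(4, k) * math.comb(4, 4 - k) for k in range(5)]

print(f"Original design: {n_original} selections, P(all correct) = {1 / n_original:.5f}")
print(f"Extended design: {n_extended} selections, P(all correct) = {1 / n_extended:.5f}")
for k, count in enumerate(counts):
    print(f"{k} correct: {count} selections")
```

Only one selection out of 70 gets all four cups right, which is where the chance probability of $1/70$ comes from.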

# Permutations and Combinations
In mathematics, a permutation of a set can mean one of two different things:

- An arrangement of its members in a sequence or linear order, or

- The act or process of changing the linear order of an ordered set.

https://en.wikipedia.org/wiki/Permutation

In mathematics, a combination is a selection of items from a set that has distinct members, such that the order of selection does not matter (unlike permutations).

https://en.wikipedia.org/wiki/Combination

In other words, a permutation is an ordered selection, whereas a combination ignores the order of the selected items.
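The distinction can be illustrated with `itertools`. In this small sketch the cup labels are made up for illustration; choosing 2 of 4 cups gives 12 ordered selections but only 6 unordered ones, because each pair can be arranged in $2! = 2$ ways.

```python
import math
from itertools import combinations, permutations

# Hypothetical cup labels, used only for illustration.
cups = ["A", "B", "C", "D"]

# Ordered selections of 2 cups: ("A", "B") and ("B", "A") are different.
ordered = list(permutations(cups, 2))

# Unordered selections of 2 cups: ("A", "B") appears only once.
unordered = list(combinations(cups, 2))

print(len(ordered), "permutations")
print(len(unordered), "combinations")

# Each combination of 2 items corresponds to 2! permutations.
print(len(ordered) == len(unordered) * math.factorial(2))
```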

## Problem 2: Normal Distribution

Generate 100,000 samples of size 10 from the standard normal distribution. For each sample, compute the standard deviation with ddof=1 (sample SD) and with ddof=0 (population SD). Plot histograms of both sets of values on the same axes with transparency. Describe the differences you see. Explain how you expect these differences to change if the sample size is increased.

(applied-statistics/assessment/problems.md, Ian McLoughlan)
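A minimal sketch of the sampling step is shown below, using only numpy. The seed and variable names are illustrative. For a fixed sample of size $n$, the two estimates differ only by the constant factor $\sqrt{(n-1)/n}$, so the `ddof=0` histogram is a slightly compressed copy of the `ddof=1` histogram, and the gap shrinks as $n$ grows because the factor approaches 1.

```python
import numpy as np

n_samples = 100_000
sample_size = 10

rng = np.random.default_rng(0)

# Each row is one sample of size 10 from the standard normal distribution.
samples = rng.standard_normal((n_samples, sample_size))

# Sample SD (divides by n - 1) and population SD (divides by n), per row.
sd_sample = samples.std(axis=1, ddof=1)
sd_population = samples.std(axis=1, ddof=0)

# The two estimates differ by a constant factor for every sample.
ratio = sd_population / sd_sample

print(f"Mean SD with ddof=1: {sd_sample.mean():.4f}")
print(f"Mean SD with ddof=0: {sd_population.mean():.4f}")
print(f"Ratio ddof=0 / ddof=1: {ratio.mean():.4f}")
print(f"sqrt((n - 1) / n):     {np.sqrt((sample_size - 1) / sample_size):.4f}")
```

To draw the histograms on the same axes, both arrays can be passed to `plt.hist` in turn with a shared `bins` setting and `alpha=0.5` for transparency.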

## Problem 3: t-Tests

A type II error occurs when a test fails to reject the null hypothesis even though it is false. For each mean difference $d = 0, 0.1, 0.2, \dots, 1.0$, repeat the following simulation 1,000 times:

1. Draw two samples of size 100, one from the standard normal distribution and one from the normal distribution with mean $d$ and standard deviation 1.
2. Run an independent samples t-test on the two samples, rejecting the null hypothesis if the p-value is less than 0.05.
3. Record the proportion of times the null hypothesis is not rejected.

Plot this proportion against $d$, and explain how the type II error rate changes as the difference in means increases.
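The simulation could be sketched as below. This assumes `scipy` is among the permitted packages in requirements.txt, which the notebook does not confirm; `scipy.stats.ttest_ind` performs the independent samples t-test. The helper name `non_rejection_rate` and the seed are illustrative.

```python
import numpy as np
from scipy import stats  # Assumption: scipy is listed in requirements.txt.


def non_rejection_rate(d, rng, n=100, n_sims=1_000, alpha=0.05):
    """Proportion of simulated t-tests that fail to reject the null."""
    # Each row is one simulated pair of samples.
    a = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n))
    b = rng.normal(loc=d, scale=1.0, size=(n_sims, n))

    # One t-test per row; pvalue is an array of length n_sims.
    p_values = stats.ttest_ind(a, b, axis=1).pvalue

    # The null is not rejected when the p-value is at least alpha.
    return np.mean(p_values >= alpha)


rng = np.random.default_rng(0)
differences = np.linspace(0.0, 1.0, 11)
rates = [non_rejection_rate(d, rng) for d in differences]

for d, rate in zip(differences, rates):
    print(f"d = {d:.1f}: not rejected in {rate:.3f} of simulations")
```

At $d = 0$ the null hypothesis is true, so the proportion there (about 0.95) is the rate of correct non-rejections rather than a type II error rate. For $d > 0$ the proportion estimates the type II error rate, which falls towards zero as $d$ increases. Plotting `rates` against `differences` with `plt.plot` gives the requested curve.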

## Problem 4: ANOVA

Generate three independent samples, each of size 30, from normal distributions with means 0, 0.5, and 1, each with standard deviation 1.

1. Perform a one-way ANOVA to test whether all three means are equal.
2. Perform three independent two-sample t-tests: samples 1 vs 2, 1 vs 3, and 2 vs 3.
3. Compare the conclusions.
4. Write a short note on why ANOVA is preferred over running several t-tests.
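A minimal sketch of these steps follows, again assuming `scipy` is a permitted package (not confirmed by the notebook). It uses `scipy.stats.f_oneway` for the ANOVA and `scipy.stats.ttest_ind` for the pairwise tests. The seed and variable names are illustrative, and the results will vary with the random draw.

```python
import numpy as np
from scipy import stats  # Assumption: scipy is listed in requirements.txt.

rng = np.random.default_rng(0)

# Three independent samples of size 30 with means 0, 0.5 and 1 (SD 1).
sample_1 = rng.normal(loc=0.0, scale=1.0, size=30)
sample_2 = rng.normal(loc=0.5, scale=1.0, size=30)
sample_3 = rng.normal(loc=1.0, scale=1.0, size=30)

# One-way ANOVA: a single test of whether all three means are equal.
f_stat, p_anova = stats.f_oneway(sample_1, sample_2, sample_3)
print(f"ANOVA: F = {f_stat:.3f}, p = {p_anova:.4f}")

# Three pairwise independent samples t-tests.
pairs = {
    "1 vs 2": (sample_1, sample_2),
    "1 vs 3": (sample_1, sample_3),
    "2 vs 3": (sample_2, sample_3),
}
p_pairwise = {name: stats.ttest_ind(a, b).pvalue for name, (a, b) in pairs.items()}
for name, p_value in p_pairwise.items():
    print(f"t-test {name}: p = {p_value:.4f}")

# Chance of at least one false positive across three tests at alpha = 0.05,
# treating the tests as independent (an approximation, since they share data).
alpha = 0.05
familywise_error = 1 - (1 - alpha) ** len(pairs)
print(f"Approximate familywise error rate: {familywise_error:.4f}")
```

The last figure is the main argument for ANOVA: three separate tests at the 0.05 level give roughly a 14% chance of at least one false positive when all means are in fact equal, whereas the single ANOVA test keeps that chance at 5%.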

## END