# Simulation and Statistical Inference Project

This notebook contains four simulation-based problems designed to explore key concepts in statistical inference, using Python (NumPy, SciPy, Matplotlib). Each problem focuses on understanding core ideas through Monte Carlo simulation and visualization.

---
**Next Step:**  
Begin implementing **Problem 1** — set up your imports, initialize the random generator, and outline the simulation logic for the Lady Tasting Tea extension.


## Problem 1 — Extending the Lady Tasting Tea

**Objective:**  
Extend the classic Lady Tasting Tea experiment by increasing the number of cups. Simulate random guessing to estimate the probability of correctly identifying all cups by chance in both the original and extended setups.

**Key Concepts:**  
- Combinatorics and probability by chance  
- Simulation using random sampling  
- Interpretation of p-values and significance levels

**Steps:**  
1. Describe the original experiment (8 cups: 4 tea-first, 4 milk-first).  
2. Extend it to 12 cups (8 tea-first, 4 milk-first).  
3. Simulate many random guesses.  
4. Estimate and compare the probabilities.  
5. Discuss implications for statistical significance.

Reference: Python `math.comb` documentation, https://docs.python.org/3/library/math.html#math.comb

In [45]:
# Simulation and Statistical Inference Project
# Problem 1 — Extending the Lady Tasting Tea

# Import libraries
import numpy as np
import math
import matplotlib.pyplot as plt

# Initialize the random generator with a fixed seed so that every run of the
# notebook produces exactly the same random results (reproducibility).
rng = np.random.default_rng(seed=42)




In the original *Lady Tasting Tea* experiment:

- There are **8 cups** of tea: **4 prepared with milk poured first** and **4 with tea poured first**.  
- The lady’s task is to **correctly identify the 4 cups with milk poured first**.  
- If she is **just guessing randomly**, there are  
  \[
  C(8, 4) = 70
  \]
  possible ways to choose 4 cups out of 8.  

This means that, by chance alone, the probability of her selecting all four correct cups is  
\[
\frac{1}{70} \approx 0.0143.
\]
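
The 1/70 figure can be checked by simulation. The following is a minimal sketch, not necessarily the approach this notebook goes on to use: the function name `simulate_all_correct`, the trial count, and the convention that the first cups are the milk-first ones are all assumptions made for illustration. The same function covers a 12-cup design with 4 milk-first cups, where the exact chance probability is 1 / C(12, 4) = 1/495.

```python
import math
import numpy as np


def simulate_all_correct(n_cups, n_milk_first, n_trials, rng):
    """Estimate the probability that a random guess picks exactly the milk-first cups."""
    # Give every cup in every trial a random score; the cups with the smallest
    # scores form a uniformly random selection of n_milk_first cups.
    scores = rng.random((n_trials, n_cups))
    guesses = np.argsort(scores, axis=1)[:, :n_milk_first]
    # Assumption of this sketch: cups 0..n_milk_first-1 are the true milk-first
    # cups, so a guess is fully correct when all of its labels are below
    # n_milk_first.
    all_correct = np.all(guesses < n_milk_first, axis=1)
    return all_correct.mean()


rng = np.random.default_rng(seed=42)
n_trials = 100_000  # hypothetical choice; more trials give a tighter estimate

estimate_8 = simulate_all_correct(8, 4, n_trials, rng)
estimate_12 = simulate_all_correct(12, 4, n_trials, rng)

print(f"8 cups:  simulated {estimate_8:.5f}, exact {1 / math.comb(8, 4):.5f}")
print(f"12 cups: simulated {estimate_12:.5f}, exact {1 / math.comb(12, 4):.5f}")
```

The simulated proportions should land close to the exact values, with the gap shrinking as `n_trials` grows.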


In [46]:
# Cups of Tea

# Total number of cups
no_cups = 8

# Number of cups with milk poured first
no_cups_milk_first = 4

# Number of cups with tea poured first
no_cups_tea_first = 4

In [47]:
# math.comb(n, k) gives the number of ways to choose k elements from a set of
# n elements, where order does not matter and no element is chosen twice.

# Number of ways of selecting four cups from eight.


ways = math.comb(no_cups, no_cups_milk_first)

print(ways)


70


In [48]:
# The same count from the factorial formula n! / (k! * (n - k)!), where the
# order of selection does not matter. Counting ordered selections instead
# would give math.perm(8, 4) = 8 * 7 * 6 * 5 = 1680.
ways_factorial = 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1 // (4 * 3 * 2 * 1 * 4 * 3 * 2 * 1)

print(ways_factorial)

70


In [49]:
# Number of ways of shuffling (ordering) four cups.

no_shuffles = 4 * 3 * 2 * 1

print(no_shuffles)

24
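
The counts in the cells above are linked: every unordered selection of four cups can be arranged in 4! = 24 orders, so dividing the number of ordered selections by 24 gives the number of combinations. A short sketch of that relationship, using `math.perm` and `math.factorial` from the standard library (the variable names are illustrative only):

```python
import math

n_cups, n_choose = 8, 4

ordered = math.perm(n_cups, n_choose)   # 8 * 7 * 6 * 5 ordered selections
shuffles = math.factorial(n_choose)     # 4! orderings of the same four cups
unordered = ordered // shuffles         # each combination was counted 4! times

# The result agrees with the direct combination count.
assert unordered == math.comb(n_cups, n_choose)

print(ordered, shuffles, unordered)  # prints: 1680 24 70
```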


---

## Problem 2 — Normal Distribution and Sample vs Population SD

**Objective:**  
Compare the behavior of the sample standard deviation (`ddof=1`) and population standard deviation (`ddof=0`) when sampling from a standard normal distribution.

**Key Concepts:**  
- Sampling variation  
- Bias in estimators  
- The effect of sample size on variability

**Steps:**  
1. Generate multiple samples from a standard normal distribution.  
2. Compute SDs using both `ddof=1` and `ddof=0`.  
3. Visualize the distributions with overlaid histograms.  
4. Interpret the differences and how they change with larger sample sizes.
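
The steps above can be sketched as follows. This is one possible outline, not a finished solution: the number of samples, the sample size of 10, and the helper `plot_sd_histograms` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_samples, sample_size = 10_000, 10  # hypothetical settings

# Each row is one sample drawn from the standard normal distribution.
samples = rng.standard_normal((n_samples, sample_size))

sd_population = samples.std(axis=1, ddof=0)  # divides by n
sd_sample = samples.std(axis=1, ddof=1)      # divides by n - 1

print(f"mean SD with ddof=0: {sd_population.mean():.3f}")
print(f"mean SD with ddof=1: {sd_sample.mean():.3f}")


def plot_sd_histograms(sd0, sd1):
    """Overlay histograms of the two SD estimates (hypothetical helper)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.hist(sd0, bins=50, alpha=0.5, label="ddof=0")
    ax.hist(sd1, bins=50, alpha=0.5, label="ddof=1")
    ax.axvline(1.0, color="black", linestyle="--", label="true SD = 1")
    ax.set_xlabel("standard deviation")
    ax.set_ylabel("count")
    ax.legend()
    return fig
```

Calling `plot_sd_histograms(sd_population, sd_sample)` draws the overlaid histograms. Two things are worth noting:

- For any single sample, the `ddof=1` value is the `ddof=0` value scaled by sqrt(n / (n - 1)), so the gap between the two histograms narrows as `sample_size` grows.
- Both averages fall below the true value of 1. `ddof=1` makes the variance estimate unbiased, but its square root is still slightly biased.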


## Problem 3 — Type II Error and t-Tests

**Objective:**  
Simulate the behavior of Type II errors in t-tests as the difference between population means increases.

**Key Concepts:**  
- Type I and Type II errors  
- Power of a statistical test  
- The relationship between effect size and error rates

**Steps:**  
1. Define a range of true mean differences (d = 0 to 1).  
2. Generate two samples for each d.  
3. Perform an independent t-test.  
4. Record how often the null hypothesis is not rejected.  
5. Plot the Type II error rate as a function of d.  
6. Discuss how power increases with effect size.
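
A minimal sketch of these steps is shown below. The significance level, sample size, number of trials, and the helper `plot_type_ii` are assumptions for illustration. Both populations are assumed to have standard deviation 1, so d is also the standardized effect size.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
alpha, sample_size, n_trials = 0.05, 100, 1_000  # hypothetical settings
diffs = np.linspace(0.0, 1.0, 11)                # d = 0.0, 0.1, ..., 1.0

non_rejection_rates = []
for d in diffs:
    # Each row is one simulated experiment: two samples whose means differ by d.
    a = rng.normal(0.0, 1.0, size=(n_trials, sample_size))
    b = rng.normal(d, 1.0, size=(n_trials, sample_size))
    result = stats.ttest_ind(a, b, axis=1)  # one independent t-test per row
    # Proportion of experiments in which the null hypothesis is not rejected.
    non_rejection_rates.append(np.mean(result.pvalue >= alpha))


def plot_type_ii(diffs, rates):
    """Plot the non-rejection rate against d (hypothetical helper)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(diffs, rates, marker="o")
    ax.set_xlabel("true mean difference d")
    ax.set_ylabel("proportion not rejected")
    return fig
```

At d = 0 the null hypothesis is true, so failing to reject it is the correct decision rather than a Type II error; the rate there should be close to 1 - alpha. For d > 0 the non-rejection rate is the Type II error rate, and power is one minus that rate.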

## Problem 4 — ANOVA vs Multiple t-Tests

**Objective:**  
Compare results from one-way ANOVA with results from multiple two-sample t-tests when analyzing group mean differences.

**Key Concepts:**  
- ANOVA as an omnibus test  
- Multiple comparisons problem  
- Controlling Type I error (family-wise error rate)

**Steps:**  
1. Generate three independent samples from normal distributions with different means.  
2. Run a one-way ANOVA.  
3. Perform pairwise t-tests (with and without correction).  
4. Compare results and conclusions.  
5. Explain why ANOVA is preferred before multiple t-tests.
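
The comparison can be outlined as below. This is a sketch under stated assumptions: the group means (0, 0.5, 1), the group size of 30, and the choice of a Bonferroni correction are illustrative, and other corrections would also fit step 3.

```python
import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
alpha = 0.05

# Three independent samples with different means (hypothetical values).
groups = {
    "A": rng.normal(0.0, 1.0, size=30),
    "B": rng.normal(0.5, 1.0, size=30),
    "C": rng.normal(1.0, 1.0, size=30),
}

# Omnibus test: are all three group means equal?
f_stat, p_anova = stats.f_oneway(*groups.values())
print(f"ANOVA: F = {f_stat:.3f}, p = {p_anova:.4f}")

# Pairwise t-tests, judged against both the raw and the Bonferroni threshold.
pairs = list(itertools.combinations(groups, 2))
alpha_bonferroni = alpha / len(pairs)
for g1, g2 in pairs:
    p = stats.ttest_ind(groups[g1], groups[g2]).pvalue
    print(f"{g1} vs {g2}: p = {p:.4f}, "
          f"reject at alpha: {p < alpha}, "
          f"reject after Bonferroni: {p < alpha_bonferroni}")

# If the three tests were independent and every null hypothesis were true,
# the chance of at least one false rejection would be about 0.14, not 0.05.
fwer_uncorrected = 1 - (1 - alpha) ** len(pairs)
print(f"uncorrected family-wise error rate: {fwer_uncorrected:.4f}")
```

The pairwise tests reuse the same data, so they are not truly independent and the family-wise figure is only a rough guide. It still shows why a single omnibus test, or a corrected threshold, is used before reading individual comparisons.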


In [50]:
import yfinance as yf