# Hypothesis Testing

## What Is Hypothesis Testing?

**Hypothesis testing** is a statistical method for deciding, based on sample **data**, whether there is enough evidence to support a claim about a population.

It answers the question:

> “Is what I’m seeing just random chance, or is there really something going on?”

---

### Everyday Analogy

Imagine you have a coin and you think it might be unfair (maybe it lands on heads more often).

You flip it 100 times and get 70 heads.

Now you wonder:
**“Is this coin really unfair, or did I just get lucky?”**

Hypothesis testing helps you answer that question — using math!
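
The coin question can be made concrete. The sketch below assumes a fair coin (probability of heads 0.5) as the starting assumption and adds up the exact binomial probabilities of seeing 70 or more heads in 100 flips. The variable names are illustrative and do not come from the text above.

```python
from math import comb

n_flips = 100          # number of coin flips in the analogy
observed_heads = 70    # heads actually observed
p_heads = 0.5          # assumption: the coin is fair

# P(X >= 70) for X ~ Binomial(100, 0.5): sum the exact probabilities
# of every outcome at least as extreme as the one observed.
p_tail = sum(
    comb(n_flips, k) * p_heads**k * (1 - p_heads)**(n_flips - k)
    for k in range(observed_heads, n_flips + 1)
)

print(f"P(at least {observed_heads} heads in {n_flips} fair flips) = {p_tail:.6f}")
```

A very small probability here means that 70 heads would be a surprising result for a fair coin, which is evidence against the "just lucky" explanation.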

---

## Key Terms Explained

| Term                            | What It Means                                                            |
| ------------------------------- | ------------------------------------------------------------------------ |
| **Hypothesis**                  | A statement or claim you're testing                                      |
| **Null Hypothesis (H₀)**        | The starting assumption — usually "nothing's going on"                   |
| **Alternative Hypothesis (H₁)** | What you're trying to prove — "something’s going on"                     |
| **P-value**                     | Probability of results at least this extreme, assuming H₀ is true        |
| **Significance Level (α)**      | The cutoff (usually 0.05) for deciding if something is surprising enough |
| **Reject H₀**                   | Say the result is significant (unusual!)                                 |
| **Fail to Reject H₀**           | Say the result could have happened by chance (not enough evidence)       |

---

## Step-by-Step Example

**Scenario:** A school says their average test score is **75**.
You think students in your class score **lower** than that.

You take a **sample of 10 students** and get these scores:

$$
65, 70, 68, 72, 66, 69, 67, 71, 64, 70
$$

### Step 1: Set up the hypotheses

* **H₀ (null hypothesis):** The average score = 75
* **H₁ (alternative hypothesis):** The average score < 75

### Step 2: Collect data

* Sample size $n = 10$
* Sample mean $\bar{x} = 68.2$
* Sample standard deviation $s \approx 2.66$

### Step 3: Do the math (simplified)

Use a **t-test** to calculate the **t-value**:

$$
t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{68.2 - 75}{2.66 / \sqrt{10}} \approx \frac{-6.8}{0.841} \approx -8.09
$$

(In practice, a calculator or software does this arithmetic for you.)

### Step 4: Find the **p-value**

With a **t-value of about -8.09** and **9 degrees of freedom** ($n - 1$ for 10 students), the **p-value is extremely small** (about 0.00001 for this one-tailed test, far below 0.05).

### Step 5: Conclusion

Since the **p-value is very small**, we say:

👉 “There’s strong evidence that our class scores lower than 75.”
✅ **We reject the null hypothesis.**
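
Steps 2 and 3 can be reproduced with Python's standard `statistics` module. This is a sketch of the arithmetic only. The variable names are illustrative, and `statistics.stdev` is used because it divides by $n - 1$, which is the sample standard deviation the t-test needs.

```python
from math import sqrt
from statistics import mean, stdev

scores = [65, 70, 68, 72, 66, 69, 67, 71, 64, 70]
mu_0 = 75                      # the school's claimed average (H0)

n = len(scores)
x_bar = mean(scores)           # sample mean
s = stdev(scores)              # sample standard deviation (n - 1 in the denominator)
standard_error = s / sqrt(n)   # estimated spread of the sample mean
t_value = (x_bar - mu_0) / standard_error

print(f"n = {n}, mean = {x_bar:.1f}, s = {s:.2f}")
print(f"standard error = {standard_error:.3f}, t = {t_value:.2f}")
```

A t-value far below zero means the sample mean sits many standard errors under the claimed average, which is what the one-tailed alternative (mean < 75) looks for.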

---

## Summary

| Step                 | What You Do                                           |
| -------------------- | ----------------------------------------------------- |
| 1. Ask a question    | Is the average score really 75?                       |
| 2. Make hypotheses   | H₀: nothing’s wrong, H₁: something’s going on         |
| 3. Collect data      | Get your sample and do the math                       |
| 4. Calculate p-value | How likely is your result if H₀ is true?              |
| 5. Decide            | If p < 0.05, reject H₀ (your result is *significant*) |
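
The decision in step 5 is a single comparison. A minimal sketch, where the function name `decide` and the example p-values are hypothetical and chosen only for illustration:

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the test decision for a given p-value and significance level."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    return "Reject H0" if p_value < alpha else "Fail to reject H0"

# Illustrative p-values (not taken from any real data set)
for p in (0.003, 0.05, 0.20):
    print(f"p = {p}: {decide(p)}")
```

Note that a p-value exactly equal to α does not fall below the cutoff, so under this rule it leads to "Fail to reject H0".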

---

## Quick Quiz

1. What is the **null hypothesis** usually saying?
   → **Nothing’s going on / No difference**

2. What does a **small p-value** mean?
   → Your result would be **unlikely if the null hypothesis were true**

3. What does it mean to **reject the null hypothesis**?
   → You have **evidence** something is going on

In [1]:
from scipy import stats  # Import the 'stats' module from SciPy for statistical functions

# Sample data: This represents the observed values in the sample
sample = [65, 70, 68, 72, 66, 69, 67, 71, 64, 70]

# Hypothesized population mean (mu_0): The value we're testing the sample mean against
mu_0 = 75

# Perform a one-sample t-test:
# This test compares the sample mean to the hypothesized mean (mu_0)
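# H0 (null hypothesis): The true population mean = mu_0
# H1 (alternative hypothesis): The true population mean ≠ mu_0 (two-sided, SciPy's default)
# For the one-sided H1 (mean < 75) used in the worked example, pass alternative="less"
# (SciPy >= 1.6); because t is negative here, that p-value is half the two-sided one.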
t_stat, p_value = stats.ttest_1samp(sample, mu_0)

# Output the calculated t-statistic, which measures how many standard errors
# the sample mean is away from the hypothesized mean
print("t-statistic:", t_stat)

# Output the p-value, which indicates the probability of obtaining a sample
# at least as extreme as the observed one, assuming the null hypothesis is true
print("p-value:", p_value)

t-statistic: -8.089126174325067
p-value: 2.0257552964461817e-05


## What is a One-Sample Z-Test?

A **one-sample Z-test** is a way to check whether the mean of a sample (a small group) is **significantly different** from the known mean of a population (the whole group).

Think of it like this:

> You know the **average score** for all students in your school is 75.
> You take a **sample** of students from your class and see if their average score is *different enough* from 75 to matter — or if it’s just random luck.

---

## When Do We Use It?

* When we **know** the **population mean** (μ) and **population standard deviation** (σ).
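* When our data is **normally distributed** (bell curve shape), or the sample is large enough (a common rule of thumb is $n \geq 30$) for the sample mean to be approximately normal.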
* When we have a **sample mean** and want to test if it’s different from μ.

---

## Formula for the Z-Test

$$
Z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}
$$

Where:

* $\bar{x}$ = **sample mean** (average of our sample)
* $\mu$ = **population mean**
* $\sigma$ = **population standard deviation**
* $n$ = **sample size**
* $Z$ = number of **standard errors** our sample mean is away from the population mean
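
The denominator $\sigma / \sqrt{n}$ is the standard error. It shrinks as the sample grows, so the same gap between $\bar{x}$ and $\mu$ yields a larger $Z$. A small sketch with made-up numbers (the helper `z_statistic` and all values below are hypothetical):

```python
from math import sqrt

def z_statistic(sample_mean: float, mu: float, sigma: float, n: int) -> float:
    """Z = (sample_mean - mu) / (sigma / sqrt(n))."""
    if n <= 0 or sigma <= 0:
        raise ValueError("n and sigma must be positive")
    return (sample_mean - mu) / (sigma / sqrt(n))

# Hypothetical values: same 3-point gap, same sigma, growing sample size
mu, sigma, sample_mean = 50.0, 12.0, 53.0
for n in (9, 36, 144):
    se = sigma / sqrt(n)
    z = z_statistic(sample_mean, mu, sigma, n)
    print(f"n = {n:3d}  standard error = {se:.1f}  Z = {z:.2f}")
```

Quadrupling the sample size halves the standard error, which doubles $Z$ for the same observed difference.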

---

## Example:

**Situation:**
The average SAT math score at your school is **μ = 500** with a population standard deviation **σ = 100**.
You want to see if your math club’s 25 members have a **different** average SAT math score.

**Step 1 – Gather Data**
You test the math club and find their **average score is $\bar{x} = 540$**.

**Step 2 – Apply Formula**

$$
Z = \frac{540 - 500}{\frac{100}{\sqrt{25}}}
$$

First, find the **standard error**:

$$
\frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{25}} = \frac{100}{5} = 20
$$

Now calculate Z:

$$
Z = \frac{540 - 500}{20} = \frac{40}{20} = 2
$$

**Step 3 – Interpret**
In statistics, a **Z-score of 2** means your sample mean is **2 standard errors above the population mean**.

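If we check a **Z-table** (or compute a p-value), a Z-score of 2 corresponds to a tail probability of about **0.0228** for one tail (or **0.0455** for two tails).

The decision rule at a significance level of **α = 0.05** is:

* **p < 0.05** → Reject H₀: the difference from the school average is statistically significant.
* **p ≥ 0.05** → Fail to reject H₀: no significant difference; it could be just random chance.

Here the question is whether the club's average is **different** (a two-tailed test), so p ≈ 0.0455. Since 0.0455 < 0.05, we reject H₀: the math club's average differs significantly from 500, and the sample mean of 540 shows the difference is in the higher direction.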
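
The SAT example can be checked without a Z-table. This sketch uses `statistics.NormalDist` from the Python standard library for the normal CDF. The variable names are illustrative, and the numbers are the ones given in the example.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 500, 100       # school-wide mean and known standard deviation
n, x_bar = 25, 540         # math club sample size and sample mean
alpha = 0.05

z = (x_bar - mu) / (sigma / sqrt(n))

standard_normal = NormalDist()                 # mean 0, standard deviation 1
p_one_tail = 1 - standard_normal.cdf(abs(z))   # P(Z >= |z|)
p_two_tail = 2 * p_one_tail                    # both tails, for a "different from" question

print(f"Z = {z:.2f}")
print(f"one-tailed p = {p_one_tail:.4f}, two-tailed p = {p_two_tail:.4f}")
print("Reject H0" if p_two_tail < alpha else "Fail to reject H0")
```

The two-tailed p-value is the one to compare with α here, because the question asks whether the club's average is different, not specifically higher.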

---

## Summary

* **One-Sample Z-Test** checks if your sample mean is *different enough* from a known population mean.
* The formula uses the **sample mean**, the **population mean** and **standard deviation**, and the **sample size**.
* You compare your **Z-score** to a **Z-table** (or p-value) to decide if the difference is real or just chance.


In [2]:
# One-sample Z-test (two-tailed) using a known population standard deviation

import numpy as np
from scipy.stats import norm  # Standard normal distribution (norm.cdf, norm.sf, ...)

# ----------------------------
# Example setup / assumptions:
# - Population standard deviation (sigma) is known and equals 10
# - Null hypothesis H0: population mean μ = 100 (mu_0)
# - Alternative hypothesis H1: μ ≠ 100 (two-tailed)
# ----------------------------

np.random.seed(42)  # For reproducibility of the random sample

# Simulate a sample drawn from a normal population with true mean 102 and sd 10
sample = np.random.normal(loc=102, scale=10, size=50)

mu_0 = 100          # Hypothesized population mean under H0
sigma = 10          # Known population standard deviation
n = len(sample)     # Sample size
sample_mean = np.mean(sample)  # Sample mean (point estimate of μ)

# Compute Z statistic:
#   Z = (x̄ - μ0) / (σ / √n)
# Under H0 and with known σ, Z ~ N(0, 1)
z_score = (sample_mean - mu_0) / (sigma / np.sqrt(n))

# Two-tailed p-value:
#   p = 2 * P(Z ≥ |z|)
# norm.sf is the survival function, sf(x) = 1 - Φ(x); it avoids the
# rounding loss of computing 1 - cdf when |z| is large
p_value = 2 * norm.sf(abs(z_score))

alpha = 0.05  # Significance level

# Display the calculated values and the decision
print(f"Sample Mean: {sample_mean:.2f}")
print(f"Z-Score: {z_score:.2f}")
print(f"P-Value: {p_value:.4f}")

# Decision rule:
#   If p < α, reject H0 (evidence against μ = μ0)
#   Otherwise, fail to reject H0
if p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")

Sample Mean: 99.75
Z-Score: -0.18
P-Value: 0.8571
Fail to reject the null hypothesis.


## T-Tests

T-tests are used when the population standard deviation is unknown and must be estimated from the sample. They typically assume the underlying data are approximately normally distributed, which matters most for small $n$.

## What is a One-Sample T-Test?

A **One-Sample T-Test** is a way to check if the **average (mean)** of a group of numbers is **significantly different** from a value we expect.

Think of it as:

> "Is my sample’s average really different from some known or claimed value, or could the difference just be from random chance?"

---

### When to Use It

* You have **one sample** of data.
* You **know the population mean** (or a value you want to compare against), but **not the population standard deviation**.
* Your sample is relatively small (but it works for larger samples too).
* Data should be roughly **normally distributed**.

---

### Real-Life Example

**Situation:**
A school says the **average height** of their 10th-grade students is **165 cm**.
You think it’s higher, so you measure the height of **10 randomly chosen students**.

| Student | Height (cm) |
| ------- | ----------- |
| 1       | 166         |
| 2       | 170         |
| 3       | 172         |
| 4       | 169         |
| 5       | 168         |
| 6       | 171         |
| 7       | 167         |
| 8       | 170         |
| 9       | 169         |
| 10      | 172         |

---

### Step-by-Step

#### Step 1: Write the hypotheses

* **Null hypothesis (H₀):** The true mean height is **165 cm** (no difference).
* **Alternative hypothesis (H₁):** The true mean height is **greater than 165 cm** (your suspicion).

---

#### Step 2: Find the sample mean ($\bar{x}$) and sample standard deviation (s)

$$
\bar{x} = \frac{\text{sum of all heights}}{n} = \frac{1694}{10} = 169.4
$$

Using the sample standard deviation formula (dividing by $n - 1$):

$$
s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{36.4}{9}} \approx 2.01 \text{ cm}
$$

---

#### Step 3: Calculate the t-statistic

Formula:

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
$$

Where:

* $\bar{x}$ = sample mean = 169.4
* $\mu_0$ = population mean (claimed value) = 165
* $s$ = sample standard deviation = 2.01
* $n$ = number of students = 10

$$
t = \frac{169.4 - 165}{2.01 / \sqrt{10}} = \frac{4.4}{0.636} \approx 6.92
$$

---

#### Step 4: Compare with critical value

From a **t-table** with $df = n - 1 = 9$ and a **significance level** of $\alpha = 0.05$, the **critical t-value** is ≈ 1.833 (one-tailed test).

Since **6.92 > 1.833**, we **reject the null hypothesis**.

---

### Conclusion:

There’s **strong evidence** that the average height is **greater than 165 cm**.
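
The critical-value comparison can be sketched in a few lines of standard-library Python. The critical value 1.833 is the t-table figure quoted in Step 4 and is hard-coded here as an assumption rather than computed. The variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

heights = [166, 170, 172, 169, 168, 171, 167, 170, 169, 172]
mu_0 = 165            # claimed mean height (H0)
t_critical = 1.833    # t-table value: df = 9, alpha = 0.05, one-tailed

n = len(heights)
t_value = (mean(heights) - mu_0) / (stdev(heights) / sqrt(n))

# H1 says the mean is GREATER than 165, so only a large positive t counts as evidence
reject_h0 = t_value > t_critical

print(f"df = {n - 1}, t = {t_value:.2f}, critical value = {t_critical}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

For a "less than" alternative the comparison would flip to `t_value < -t_critical`, and a two-tailed test would need the table value for α/2 in each tail.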

---

### Summary for a High School Student

* **One-sample t-test** checks if your sample average is really different from some number.
* If your **t-value is big** (compared to the table value), it means your difference is probably **real**, not just due to random chance.
* It’s like saying: *"Are these students really taller than average, or did I just happen to pick a tall group by luck?"*