# Part 5: Hypothesis Testing Fundamentals

**Goal:** Master the foundation of hypothesis testing - the framework, p-values, and errors

**What you'll learn:**
- What hypothesis testing is
- Null and alternative hypotheses
- Type I and Type II errors
- Significance levels (alpha)
- P-values and what they really mean
- The complete 7-step framework

---

## Setup: Import Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import norm, binom, poisson

sns.set_style("whitegrid")
print("✓ Libraries loaded successfully!")

✓ Libraries loaded successfully!


---

## SECTION 1: What is Hypothesis Testing?

**The Big Question:**

You observe something. Is it real or just luck?

**Examples:**
- New ad gets 5% click rate vs old 3%. Is it better or just luck?
- Patient improves on drug. Did drug work or just coincidence?
- Sales up 10% this month. Real trend or random variation?
- Manufacturing defects: 3 vs 5 last week. Improvement or luck?

**Hypothesis Testing Answers:** How likely is this if nothing changed?

In [2]:
print("HYPOTHESIS TESTING FRAMEWORK")
print("="*70)
print("\nCore Concept:")
print("Hypothesis Testing = Assuming nothing changed,")
print("                    how surprised should I be by this observation?")
print("\nThe Process:")
print("1. Assume null (nothing changed)")
print("2. Calculate how weird your observation is (p-value)")
print("3. If weird enough, reject null")
print("4. Conclude: Real effect likely exists")

print("\n" + "="*70)
print("\nKey Insight:")
print("This is statistical DETECTIVE WORK")
print("Goal: Distinguish real effects from random luck")
print("Method: Assume nothing changed, then test")

HYPOTHESIS TESTING FRAMEWORK

Core Concept:
Hypothesis Testing = Assuming nothing changed,
                    how surprised should I be by this observation?

The Process:
1. Assume null (nothing changed)
2. Calculate how weird your observation is (p-value)
3. If weird enough, reject null
4. Conclude: Real effect likely exists


Key Insight:
This is statistical DETECTIVE WORK
Goal: Distinguish real effects from random luck
Method: Assume nothing changed, then test
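
The "how surprised should I be?" question can be answered by brute force. The sketch below assumes a hypothetical test of 200 visitors, of whom 10 clicked (the 5% rate from the ad example; the examples above give no sample size). It simulates many experiments in which the true click rate is still 3% and compares the result with the exact binomial tail.

```python
import numpy as np
from scipy.stats import binom

# Hypothetical numbers: the sample size is an assumption for illustration.
n_visitors = 200
observed_clicks = 10      # 5% of 200
null_rate = 0.03          # old ad's click rate (H0)

rng = np.random.default_rng(42)
# Simulate 100,000 experiments in a world where nothing changed.
simulated_clicks = rng.binomial(n_visitors, null_rate, size=100_000)

# Share of simulated experiments at least as extreme as the observation.
p_simulated = np.mean(simulated_clicks >= observed_clicks)
# Exact tail probability for comparison: P(X >= 10) = sf(9).
p_exact = binom.sf(observed_clicks - 1, n_visitors, null_rate)

print(f"Simulated p-value: {p_simulated:.4f}")
print(f"Exact p-value:     {p_exact:.4f}")
```

The two numbers should agree closely, which shows that a p-value is simply the share of "nothing changed" worlds that look at least as extreme as the data.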


---

## SECTION 2: Null and Alternative Hypotheses

In [3]:
print("NULL vs ALTERNATIVE HYPOTHESES")
print("="*70)
print("\nH₀ (Null Hypothesis):")
print("  - Represents 'no change' or 'no effect'")
print("  - The boring scenario")
print("  - What you assume is true at start")
print("  - What you try to reject")

print("\nH₁ (Alternative Hypothesis):")
print("  - Represents 'there IS a change' or 'there IS an effect'")
print("  - The interesting scenario")
print("  - What you're looking for evidence of")
print("  - What you accept if null is rejected")

print("\n" + "="*70)
print("\nKEY RULE: H₀ and H₁ are OPPOSITES")
print("You always TEST the null (H₀)")
print("If H₀ is rejected → H₁ is accepted")

NULL vs ALTERNATIVE HYPOTHESES

H₀ (Null Hypothesis):
  - Represents 'no change' or 'no effect'
  - The boring scenario
  - What you assume is true at start
  - What you try to reject

H₁ (Alternative Hypothesis):
  - Represents 'there IS a change' or 'there IS an effect'
  - The interesting scenario
  - What you're looking for evidence of
  - What you accept if null is rejected


KEY RULE: H₀ and H₁ are OPPOSITES
You always TEST the null (H₀)
If H₀ is rejected → H₁ is accepted


In [4]:
print("\nEXAMPLE 1: Drug Testing")
print("-"*70)
print("H₀: Drug has no effect (improvement = placebo)")
print("H₁: Drug has an effect (improvement ≠ placebo)")

print("\nEXAMPLE 2: Ad Performance")
print("-"*70)
print("H₀: New ad = old ad (click rate = 3%)")
print("H₁: New ad ≠ old ad (click rate ≠ 3%)")

print("\nEXAMPLE 3: Manufacturing Quality")
print("-"*70)
print("H₀: Production quality same (defect rate = 5%)")
print("H₁: Production quality different (defect rate ≠ 5%)")

print("\nEXAMPLE 4: Marketing Campaign")
print("-"*70)
print("H₀: Campaign doesn't work (no sales increase)")
print("H₁: Campaign works (sales increase)")


EXAMPLE 1: Drug Testing
----------------------------------------------------------------------
H₀: Drug has no effect (improvement = placebo)
H₁: Drug has an effect (improvement ≠ placebo)

EXAMPLE 2: Ad Performance
----------------------------------------------------------------------
H₀: New ad = old ad (click rate = 3%)
H₁: New ad ≠ old ad (click rate ≠ 3%)

EXAMPLE 3: Manufacturing Quality
----------------------------------------------------------------------
H₀: Production quality same (defect rate = 5%)
H₁: Production quality different (defect rate ≠ 5%)

EXAMPLE 4: Marketing Campaign
----------------------------------------------------------------------
H₀: Campaign doesn't work (no sales increase)
H₁: Campaign works (sales increase)


In [5]:
print("\nONE-TAILED vs TWO-TAILED TESTS")
print("="*70)
print("\nONE-TAILED: H₁ specifies direction")
print("  Example 1: 'New ad is BETTER' (click rate > 3%)")
print("  Example 2: 'New ad is WORSE' (click rate < 3%)")
print("\nTWO-TAILED: H₁ doesn't specify direction")
print("  Example: 'New ad is DIFFERENT' (click rate ≠ 3%)")
print("\nKey: Two-tailed is more conservative (harder to reject H₀)")


ONE-TAILED vs TWO-TAILED TESTS

ONE-TAILED: H₁ specifies direction
  Example 1: 'New ad is BETTER' (click rate > 3%)
  Example 2: 'New ad is WORSE' (click rate < 3%)

TWO-TAILED: H₁ doesn't specify direction
  Example: 'New ad is DIFFERENT' (click rate ≠ 3%)

Key: Two-tailed is more conservative (harder to reject H₀)
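
To see why two-tailed tests are harder to pass, here is a minimal sketch using a hypothetical z-statistic of 1.8 (chosen for illustration, not taken from any example above). The same statistic is significant one-tailed but not two-tailed at α = 0.05.

```python
from scipy.stats import norm

z = 1.8  # hypothetical test statistic, chosen for illustration

p_one_tailed = norm.sf(z)            # P(Z >= z): H1 says "greater"
p_two_tailed = 2 * norm.sf(abs(z))   # P(|Z| >= |z|): H1 says "different"

alpha = 0.05
print(f"one-tailed p = {p_one_tailed:.4f} -> reject H0: {p_one_tailed < alpha}")
print(f"two-tailed p = {p_two_tailed:.4f} -> reject H0: {p_two_tailed < alpha}")
```

For a symmetric distribution the two-tailed p-value is exactly double the one-tailed one, so the direction of H₁ should be fixed before looking at the data.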


---

## SECTION 3: Type I and Type II Errors

In [6]:
print("TYPE I AND TYPE II ERRORS")
print("="*70)
print("\nType I Error (α - Alpha):")
print("  - You REJECT H₀ when it's ACTUALLY TRUE")
print("  - False positive (crying wolf)")
print("  - Probability = α (significance level)")
print("  - Example: Say 'drug works' when it actually doesn't")

print("\nType II Error (β - Beta):")
print("  - You FAIL TO REJECT H₀ when it's ACTUALLY FALSE")
print("  - False negative (missing real effect)")
print("  - Probability = β")
print("  - Example: Say 'drug doesn't work' when it actually does")

print("\n" + "="*70)
print("\nPOWER = 1 - β")
print("  - Probability of correctly detecting real effect")
print("  - Aim for power ≥ 80% (80-90% is a common target)")

TYPE I AND TYPE II ERRORS

Type I Error (α - Alpha):
  - You REJECT H₀ when it's ACTUALLY TRUE
  - False positive (crying wolf)
  - Probability = α (significance level)
  - Example: Say 'drug works' when it actually doesn't

Type II Error (β - Beta):
  - You FAIL TO REJECT H₀ when it's ACTUALLY FALSE
  - False negative (missing real effect)
  - Probability = β
  - Example: Say 'drug doesn't work' when it actually does


POWER = 1 - β
  - Probability of correctly detecting real effect
  - Aim for power ≥ 80% (80-90% is a common target)


In [7]:
# Create error table visualization
import pandas as pd

error_table = pd.DataFrame({
    'H₀ Actually TRUE': ['Type I Error (α)\n"False Positive"', 'Correct\n(1-α)'],
    'H₀ Actually FALSE': ['Correct\n(Power = 1-β)', 'Type II Error (β)\n"False Negative"']
}, index=['Reject H₀', 'Fail to Reject H₀'])

print("\nERROR TABLE (MEMORIZE THIS!)")
print("="*70)
print(error_table.to_string())

print("\n" + "="*70)
print("\nKey Insight: You can't eliminate both errors!")
print("Lower α (less Type I) → Higher β (more Type II)")
print("Lower β (less Type II) → Higher α (more Type I)")
print("Solution: Increase sample size to lower both!")


ERROR TABLE (MEMORIZE THIS!)
                                     H₀ Actually TRUE                    H₀ Actually FALSE
Reject H₀          Type I Error (α)\n"False Positive"               Correct\n(Power = 1-β)
Fail to Reject H₀                      Correct\n(1-α)  Type II Error (β)\n"False Negative"


Key Insight: You can't eliminate both errors!
Lower α (less Type I) → Higher β (more Type II)
Lower β (less Type II) → Higher α (more Type I)
Solution: Increase sample size to lower both!
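
The trade-off can be made concrete with a small power calculation. This is a sketch under stated assumptions: a one-sided z-test with known standard deviation and a hypothetical true effect of 0.5 standard deviations. The helper name `power_one_sided_z` is made up for this illustration.

```python
import numpy as np
from scipy.stats import norm

def power_one_sided_z(effect, sigma, n, alpha):
    """Power of a one-sided z-test for a mean shift of `effect` (known sigma)."""
    z_crit = norm.ppf(1 - alpha)          # rejection cutoff under H0
    shift = effect * np.sqrt(n) / sigma   # how far H1 moves the test statistic
    return norm.sf(z_crit - shift)        # P(reject H0 | H1 is true)

for n in (25, 100):
    for alpha in (0.05, 0.01):
        power = power_one_sided_z(effect=0.5, sigma=1.0, n=n, alpha=alpha)
        print(f"n={n:>3}, α={alpha:.2f}: power={power:.3f}, β={1 - power:.3f}")
```

At a fixed n, lowering α from 0.05 to 0.01 raises β. Moving from n = 25 to n = 100 shrinks β at both α levels.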


In [8]:
print("\nEXAMPLE: Medical Testing")
print("="*70)
print("\nH₀: You don't have disease")
print("H₁: You have disease")

print("\nType I Error: Test says 'sick' but you're healthy")
print("  → Unnecessary treatment, anxiety, more tests")
print("  → Annoying but not dangerous")

print("\nType II Error: Test says 'healthy' but you're sick")
print("  → No treatment when you need it")
print("  → DANGEROUS!")

print("\nConclusion: In medical testing, we worry more about Type II!")
print("Solution: Use a higher α (e.g., 0.10) or a larger sample to reduce Type II")


EXAMPLE: Medical Testing

H₀: You don't have disease
H₁: You have disease

Type I Error: Test says 'sick' but you're healthy
  → Unnecessary treatment, anxiety, more tests
  → Annoying but not dangerous

Type II Error: Test says 'healthy' but you're sick
  → No treatment when you need it
  → DANGEROUS!

Conclusion: In medical testing, we worry more about Type II!
Solution: Use a higher α (e.g., 0.10) or a larger sample to reduce Type II


---

## SECTION 4: Significance Level (Alpha)

In [9]:
print("SIGNIFICANCE LEVEL (ALPHA)")
print("="*70)
print("\nWhat is α?")
print("  - Probability of Type I error you're willing to accept")
print("  - Your threshold for 'unusual'")
print("  - Decision rule: If p < α, reject H₀")
print("  - If p ≥ α, fail to reject H₀")

print("\nCommon values:")
print("  α = 0.05 (5%) - Most common in research and business")
print("  α = 0.01 (1%) - More conservative (medical, drug testing)")
print("  α = 0.10 (10%) - More lenient (exploratory research)")

print("\n" + "="*70)
print("\nWhat does α = 0.05 mean?")
print("\nOut of 100 hypothesis tests where H₀ is ACTUALLY TRUE, on average:")
print("  - about 5 will incorrectly show 'significant' (false positive)")
print("  - about 95 will correctly show 'not significant'")
print("\nTranslation: 'When H₀ is true, I accept a 5% chance of a false alarm'")

print("\nWHAT IT DOES NOT MEAN:")
print("  ✗ 95% sure about my result")
print("  ✗ 95% chance effect is real")
print("  ✗ 5% chance of getting this data randomly")

SIGNIFICANCE LEVEL (ALPHA)

What is α?
  - Probability of Type I error you're willing to accept
  - Your threshold for 'unusual'
  - Decision rule: If p < α, reject H₀
  - If p ≥ α, fail to reject H₀

Common values:
  α = 0.05 (5%) - Most common in research and business
  α = 0.01 (1%) - More conservative (medical, drug testing)
  α = 0.10 (10%) - More lenient (exploratory research)


What does α = 0.05 mean?

Out of 100 hypothesis tests where H₀ is ACTUALLY TRUE, on average:
  - about 5 will incorrectly show 'significant' (false positive)
  - about 95 will correctly show 'not significant'

Translation: 'When H₀ is true, I accept a 5% chance of a false alarm'

WHAT IT DOES NOT MEAN:
  ✗ 95% sure about my result
  ✗ 95% chance effect is real
  ✗ 5% chance of getting this data randomly
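
The long-run meaning of α can be checked by simulation. The sketch below runs 10,000 one-sample t-tests on data for which H₀ is true by construction; the sample size of 30 and the normal population are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_tests, n_per_sample = 10_000, 30

# Every sample is drawn from a population whose true mean is 0, so H0 is true.
samples = rng.normal(loc=0.0, scale=1.0, size=(n_tests, n_per_sample))

# One t-test per row against the (correct) null mean of 0.
p_values = stats.ttest_1samp(samples, popmean=0.0, axis=1).pvalue

false_positive_rate = np.mean(p_values < alpha)
print(f"Share of tests that rejected a true H0: {false_positive_rate:.3f}")
```

The share should land close to 0.05, which is exactly the false-positive rate that α controls.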


In [10]:
print("\nDECISION RULE")
print("="*70)
print("\nIF p-value < α: REJECT H₀")
print("  → Result is SIGNIFICANT")
print("  → Evidence supports H₁")
print("  → Conclude: Real effect likely exists")

print("\nIF p-value ≥ α: FAIL TO REJECT H₀")
print("  → Result is NOT SIGNIFICANT")
print("  → No strong evidence against H₀")
print("  → Conclude: Effect unclear or doesn't exist")

print("\n" + "="*70)
print("\nPractice Examples (α = 0.05):")
print("\np = 0.032 < 0.05 → REJECT H₀ → Significant!")
print("p = 0.051 ≥ 0.05 → FAIL TO REJECT → Not significant")
print("p = 0.008 < 0.05 → REJECT H₀ → Significant!")
print("p = 0.150 ≥ 0.05 → FAIL TO REJECT → Not significant")


DECISION RULE

IF p-value < α: REJECT H₀
  → Result is SIGNIFICANT
  → Evidence supports H₁
  → Conclude: Real effect likely exists

IF p-value ≥ α: FAIL TO REJECT H₀
  → Result is NOT SIGNIFICANT
  → No strong evidence against H₀
  → Conclude: Effect unclear or doesn't exist


Practice Examples (α = 0.05):

p = 0.032 < 0.05 → REJECT H₀ → Significant!
p = 0.051 ≥ 0.05 → FAIL TO REJECT → Not significant
p = 0.008 < 0.05 → REJECT H₀ → Significant!
p = 0.150 ≥ 0.05 → FAIL TO REJECT → Not significant


---

## SECTION 5: P-Values (THE CRITICAL CONCEPT!)

In [11]:
print("WHAT IS A P-VALUE?")
print("="*70)
print("\nDefinition (MEMORIZE THIS!):")
print("\nP-value = Probability of observing data at least as extreme")
print("          as yours IF the null hypothesis is TRUE")
print("\nFormula: P(data this extreme or more | H₀ is true)")
print("\nNOT: P(H₀ | data)  ← WRONG!")
print("\nKey difference: Direction of conditional probability matters!")

WHAT IS A P-VALUE?

Definition (MEMORIZE THIS!):

P-value = Probability of observing data at least as extreme
          as yours IF the null hypothesis is TRUE

Formula: P(data this extreme or more | H₀ is true)

NOT: P(H₀ | data)  ← WRONG!

Key difference: Direction of conditional probability matters!


In [12]:
print("\nP-VALUE EXAMPLE 1: Coin Flipping")
print("="*70)
print("\nScenario: Flip coin 10 times, get 9 heads")
print("\nH₀: Coin is fair (p=0.5 for heads)")
print("H₁: Coin is biased toward heads (one-tailed)")

# Calculate probability of 9+ heads with fair coin
p_val = sum([binom.pmf(k, 10, 0.5) for k in range(9, 11)])
print(f"\np-value = P(9+ heads | coin is fair) = {p_val:.4f}")
print(f"\nInterpretation:")
print(f"'If coin is truly fair, only {p_val*100:.1f}% chance of seeing 9+ heads'")
print(f"\nConclusion:")
print(f"Either:")
print(f"  1. Coin is fair and we got super lucky ({p_val*100:.1f}% chance)")
print(f"  2. Coin is biased")
print(f"\nSuch luck is rare → evidence points to a BIASED coin!")


P-VALUE EXAMPLE 1: Coin Flipping

Scenario: Flip coin 10 times, get 9 heads

H₀: Coin is fair (p=0.5 for heads)
H₁: Coin is biased toward heads (one-tailed)

p-value = P(9+ heads | coin is fair) = 0.0107

Interpretation:
'If coin is truly fair, only 1.1% chance of seeing 9+ heads'

Conclusion:
Either:
  1. Coin is fair and we got super lucky (1.1% chance)
  2. Coin is biased

Such luck is rare → evidence points to a BIASED coin!
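
The tail used for the p-value has to match H₁. A minimal sketch comparing one tail (9 or more heads) with both tails (a result at least as lopsided in either direction), using only `scipy.stats.binom`:

```python
from scipy.stats import binom

n_flips, n_heads, p_fair = 10, 9, 0.5

# One-tailed: P(9 or more heads) = P(X > 8)
p_upper = binom.sf(n_heads - 1, n_flips, p_fair)
# Mirror-image tail: P(1 or fewer heads)
p_lower = binom.cdf(n_flips - n_heads, n_flips, p_fair)
# Two-tailed: results at least as lopsided in either direction
p_two_tailed = p_upper + p_lower

print(f"one-tailed p = {p_upper:.4f}")       # 0.0107
print(f"two-tailed p = {p_two_tailed:.4f}")  # 0.0215
```

Simply adding the mirror-image tail works here because a fair coin's distribution is symmetric; for other null values the two tails are not equal.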


In [13]:
print("\nP-VALUE EXAMPLE 2: Ad Performance")
print("="*70)
print("\nScenario:")
print("  Old ad click rate: 3%")
print("  New ad tested on 1000 people: 4% click rate")
print("\nH₀: New ad = 3% (no difference)")
print("H₁: New ad ≠ 3% (different)")
print("\nSuppose p-value = 0.03")
print("\nInterpretation:")
print("'If new ad actually clicks at 3%, only 3% chance of a result this far from 3%'")
print("\nDecision:")
print("α = 0.05")
print("0.03 < 0.05 → REJECT H₀")
print("\nConclusion:")
print("'New ad's click rate appears different (higher). Evidence supports H₁'")


P-VALUE EXAMPLE 2: Ad Performance

Scenario:
  Old ad click rate: 3%
  New ad tested on 1000 people: 4% click rate

H₀: New ad = 3% (no difference)
H₁: New ad ≠ 3% (different)

Suppose p-value = 0.03

Interpretation:
'If new ad actually clicks at 3%, only 3% chance of a result this far from 3%'

Decision:
α = 0.05
0.03 < 0.05 → REJECT H₀

Conclusion:
'New ad's click rate appears different (higher). Evidence supports H₁'
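
The p-value in this example is a supposed value. For comparison, a normal-approximation z-test on the scenario's numbers might look like the sketch below (40 clicks out of 1000 is inferred from the stated 4%; the choice of a z-test is an assumption, not something the example specifies).

```python
import numpy as np
from scipy.stats import norm

n, clicks, null_rate = 1000, 40, 0.03   # 40 clicks = the 4% from the scenario

observed_rate = clicks / n
# Standard error of the sample proportion, computed under H0.
se_null = np.sqrt(null_rate * (1 - null_rate) / n)
z = (observed_rate - null_rate) / se_null

p_one_tailed = norm.sf(z)             # H1: rate > 3%
p_two_tailed = 2 * norm.sf(abs(z))    # H1: rate ≠ 3%

print(f"z = {z:.2f}")
print(f"one-tailed p = {p_one_tailed:.3f}")
print(f"two-tailed p = {p_two_tailed:.3f}")
```

Under this approximation the one-tailed p-value is roughly 0.03, while the two-tailed p-value that matches H₁: ≠ 3% is roughly 0.06. With these numbers the choice of tails decides the outcome at α = 0.05.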


In [14]:
print("\nCRITICAL MISCONCEPTIONS ABOUT P-VALUES")
print("="*70)
print("\n❌ MISCONCEPTION 1:")
print("'p = 0.03 means 97% chance effect is real'")
print("\n✓ REALITY:")
print("'If H₀ true, only 3% chance of data this extreme'")
print("These are completely different!")

print("\n❌ MISCONCEPTION 2:")
print("'p-value = probability H₀ is true'")
print("\n✓ REALITY:")
print("'p-value = probability of data this extreme IF H₀ true'")
print("Direction of conditional probability is crucial!")

print("\n❌ MISCONCEPTION 3:")
print("'Small p-value = big effect'")
print("\n✓ REALITY:")
print("'Small p-value = rare data if H₀ true'")
print("Effect size is separate from p-value!")

print("\n❌ MISCONCEPTION 4:")
print("'p = 0.051 is almost significant'")
print("\n✓ REALITY:")
print("'0.051 > 0.05, so NOT significant'")
print("Binary threshold: either p < α or p ≥ α")

print("\n❌ MISCONCEPTION 5:")
print("'If p > 0.05, H₀ is true'")
print("\n✓ REALITY:")
print("'Data is consistent with H₀'")
print("Doesn't prove H₀ is true, just compatible with it")


CRITICAL MISCONCEPTIONS ABOUT P-VALUES

❌ MISCONCEPTION 1:
'p = 0.03 means 97% chance effect is real'

✓ REALITY:
'If H₀ true, only 3% chance of data this extreme'
These are completely different!

❌ MISCONCEPTION 2:
'p-value = probability H₀ is true'

✓ REALITY:
'p-value = probability of data this extreme IF H₀ true'
Direction of conditional probability is crucial!

❌ MISCONCEPTION 3:
'Small p-value = big effect'

✓ REALITY:
'Small p-value = rare data if H₀ true'
Effect size is separate from p-value!

❌ MISCONCEPTION 4:
'p = 0.051 is almost significant'

✓ REALITY:
'0.051 > 0.05, so NOT significant'
Binary threshold: either p < α or p ≥ α

❌ MISCONCEPTION 5:
'If p > 0.05, H₀ is true'

✓ REALITY:
'Data is consistent with H₀'
Doesn't prove H₀ is true, just compatible with it


In [15]:
print("\nP-VALUE INTERPRETATION GUIDE")
print("="*70)
print("\np < 0.001:  Extremely strong evidence against H₀")
print("p < 0.01:   Very strong evidence against H₀")
print("p < 0.05:   Strong evidence against H₀ (conventional threshold)")
print("p < 0.10:   Modest evidence against H₀")
print("p ≥ 0.10:   Weak or no evidence against H₀")
print("\n" + "="*70)
print("\nSmall p-value = data is SURPRISING if H₀ true")
print("Large p-value = data is COMPATIBLE with H₀")


P-VALUE INTERPRETATION GUIDE

p < 0.001:  Extremely strong evidence against H₀
p < 0.01:   Very strong evidence against H₀
p < 0.05:   Strong evidence against H₀ (conventional threshold)
p < 0.10:   Modest evidence against H₀
p ≥ 0.10:   Weak or no evidence against H₀


Small p-value = data is SURPRISING if H₀ true
Large p-value = data is COMPATIBLE with H₀


---

## SECTION 6: The Complete 7-Step Framework

In [16]:
print("THE 7-STEP HYPOTHESIS TESTING FRAMEWORK")
print("="*70)
print("\nSTEP 1: DEFINE HYPOTHESES")
print("  ├─ State H₀: [null scenario]")
print("  ├─ State H₁: [alternative scenario]")
print("  └─ Determine: One-tailed or two-tailed?")
print("\nSTEP 2: CHOOSE SIGNIFICANCE LEVEL (α)")
print("  ├─ Typically α = 0.05")
print("  ├─ Medical? → α = 0.01 (very conservative)")
print("  └─ Exploratory? → α = 0.10 (more lenient)")
print("\nSTEP 3: COLLECT DATA")
print("  ├─ Random sample")
print("  ├─ Adequate sample size")
print("  └─ Proper methodology")
print("\nSTEP 4: CALCULATE TEST STATISTIC")
print("  ├─ Depends on test type (t, chi-square, etc.)")
print("  └─ Measures how far from H₀ prediction")
print("\nSTEP 5: CALCULATE P-VALUE")
print("  ├─ Probability of statistic this extreme if H₀ true")
print("  └─ Uses test statistic + appropriate distribution")
print("\nSTEP 6: MAKE DECISION")
print("  ├─ If p < α: REJECT H₀ (Significant!)")
print("  └─ If p ≥ α: FAIL TO REJECT H₀ (Not significant)")
print("\nSTEP 7: INTERPRET RESULTS")
print("  ├─ What does this mean in context?")
print("  ├─ Effect size (separate from p-value!)")
print("  └─ Practical importance + limitations")

THE 7-STEP HYPOTHESIS TESTING FRAMEWORK

STEP 1: DEFINE HYPOTHESES
  ├─ State H₀: [null scenario]
  ├─ State H₁: [alternative scenario]
  └─ Determine: One-tailed or two-tailed?

STEP 2: CHOOSE SIGNIFICANCE LEVEL (α)
  ├─ Typically α = 0.05
  ├─ Medical? → α = 0.01 (very conservative)
  └─ Exploratory? → α = 0.10 (more lenient)

STEP 3: COLLECT DATA
  ├─ Random sample
  ├─ Adequate sample size
  └─ Proper methodology

STEP 4: CALCULATE TEST STATISTIC
  ├─ Depends on test type (t, chi-square, etc.)
  └─ Measures how far from H₀ prediction

STEP 5: CALCULATE P-VALUE
  ├─ Probability of statistic this extreme if H₀ true
  └─ Uses test statistic + appropriate distribution

STEP 6: MAKE DECISION
  ├─ If p < α: REJECT H₀ (Significant!)
  └─ If p ≥ α: FAIL TO REJECT H₀ (Not significant)

STEP 7: INTERPRET RESULTS
  ├─ What does this mean in context?
  ├─ Effect size (separate from p-value!)
  └─ Practical importance + limitations
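
Steps 4 to 6 can be wrapped in a small function. The sketch below is one way to do it for a two-tailed one-sample t-test; `one_sample_t_test` is a hypothetical helper and the ratings are made-up data for illustration. The result is cross-checked against SciPy's `ttest_1samp`.

```python
import numpy as np
from scipy import stats

def one_sample_t_test(data, null_mean, alpha=0.05):
    """Steps 4-6 for a two-tailed one-sample t-test (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    # Step 4: test statistic = distance from H0 in standard-error units
    se = data.std(ddof=1) / np.sqrt(len(data))
    t_stat = (data.mean() - null_mean) / se
    # Step 5: two-tailed p-value from the t-distribution
    p_value = 2 * stats.t.sf(abs(t_stat), df=len(data) - 1)
    # Step 6: decision
    return {"t": t_stat, "p": p_value, "reject_h0": bool(p_value < alpha)}

# Step 3: made-up satisfaction ratings, for illustration only
ratings = [4.1, 4.5, 3.9, 4.4, 4.6, 4.2, 4.3, 4.7, 4.0, 4.3]
# Steps 1-2: H0: μ = 4.0, H1: μ ≠ 4.0, α = 0.05
result = one_sample_t_test(ratings, null_mean=4.0, alpha=0.05)
print(result)

# Cross-check against SciPy's built-in test
check = stats.ttest_1samp(ratings, popmean=4.0)
print(f"scipy: t = {check.statistic:.3f}, p = {check.pvalue:.4f}")
```

Step 7 stays outside the function on purpose: interpretation depends on context and cannot be automated.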


---

## SECTION 7: Complete Worked Example

In [17]:
print("COMPLETE EXAMPLE: Customer Satisfaction")
print("="*70)
print("\nQUESTION: Is customer satisfaction different from 4.0?")
print("\nSTEP 1: DEFINE HYPOTHESES")
print("-"*70)
print("H₀: μ = 4.0 (satisfaction is 4.0)")
print("H₁: μ ≠ 4.0 (satisfaction is different)")
print("Two-tailed test (no direction specified)")

print("\nSTEP 2: CHOOSE SIGNIFICANCE LEVEL")
print("-"*70)
print("α = 0.05 (standard level)")

print("\nSTEP 3: COLLECT DATA")
print("-"*70)
print("Survey 100 random customers")
print("Sample mean = 4.3")
print("Sample std dev = 0.8")

print("\nSTEP 4: CALCULATE TEST STATISTIC")
print("-"*70)
n = 100
sample_mean = 4.3
sample_std = 0.8
null_mean = 4.0

# t-statistic
t_stat = (sample_mean - null_mean) / (sample_std / np.sqrt(n))
print(f"t = (4.3 - 4.0) / (0.8 / √100)")
print(f"t = 0.3 / 0.08")
print(f"t = {t_stat:.2f}")

print("\nSTEP 5: CALCULATE P-VALUE")
print("-"*70)
# Two-tailed p-value (sf = 1 - cdf, but more accurate for tiny tail areas)
p_val = 2 * stats.t.sf(abs(t_stat), df=n-1)
print(f"(Using t-distribution with df = {n-1})")
print(f"p-value = {p_val:.4f}")

print("\nSTEP 6: MAKE DECISION")
print("-"*70)
if p_val < 0.05:
    decision = "REJECT H₀"
    interpretation = "Significant"
    comparison = "<"
else:
    decision = "FAIL TO REJECT H₀"
    interpretation = "Not significant"
    comparison = "≥"

print(f"p-value = {p_val:.4f}")
print(f"α = 0.05")
# Print the comparison that actually held, not a hard-coded "<"
print(f"{p_val:.4f} {comparison} 0.05 → {decision}")
print(f"Result is {interpretation}")

print("\nSTEP 7: INTERPRET RESULTS")
print("-"*70)
# Note: 1 - p is not a confidence level, so report the test outcome instead
print(f"At the 5% significance level, the data give evidence that")
print(f"true customer satisfaction ≠ 4.0")
print(f"\nSpecifically:")
print(f"Sample mean = 4.3")
print(f"This is statistically significantly higher than 4.0")
print(f"\nEffect size: 0.3 points (practical significance?)")

COMPLETE EXAMPLE: Customer Satisfaction

QUESTION: Is customer satisfaction different from 4.0?

STEP 1: DEFINE HYPOTHESES
----------------------------------------------------------------------
H₀: μ = 4.0 (satisfaction is 4.0)
H₁: μ ≠ 4.0 (satisfaction is different)
Two-tailed test (no direction specified)

STEP 2: CHOOSE SIGNIFICANCE LEVEL
----------------------------------------------------------------------
α = 0.05 (standard level)

STEP 3: COLLECT DATA
----------------------------------------------------------------------
Survey 100 random customers
Sample mean = 4.3
Sample std dev = 0.8

STEP 4: CALCULATE TEST STATISTIC
----------------------------------------------------------------------
t = (4.3 - 4.0) / (0.8 / √100)
t = 0.3 / 0.08
t = 3.75

STEP 5: CALCULATE P-VALUE
----------------------------------------------------------------------
(Using t-distribution with df = 99)
p-value = 0.0003

STEP 6: MAKE DECISION
-----------------------------------------------------------------
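
Step 7 asks for effect size as well as significance. One way to quantify it from the summary statistics of this example is sketched below. Cohen's d and the 95% confidence interval are common choices added here for illustration; the worked example itself does not specify them.

```python
import numpy as np
from scipy import stats

# Summary statistics from the worked example
n, sample_mean, sample_std, null_mean = 100, 4.3, 0.8, 4.0

# Standardized effect size (Cohen's d): difference in standard-deviation units
cohens_d = (sample_mean - null_mean) / sample_std

# 95% confidence interval for the true mean, using the t-distribution
se = sample_std / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)
ci_low, ci_high = sample_mean - t_crit * se, sample_mean + t_crit * se

print(f"Cohen's d = {cohens_d:.3f}")
print(f"95% CI for μ: ({ci_low:.2f}, {ci_high:.2f})")
```

The interval excludes 4.0, which agrees with rejecting H₀. Whether a gap of about 0.375 standard deviations matters in practice is a business question rather than a statistical one.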

---

## SECTION 8: Practice Problems

In [18]:
print("PRACTICE PROBLEM 1: Manufacturing Quality")
print("="*70)
print("\nScenario:")
print("  Old process: 5% defect rate")
print("  New process: 100 units tested, 3 defects (3%)")
print("\nH₀: New process = 5% (no improvement)")
print("H₁: New process ≠ 5% (different)")
print("α = 0.05")
print("\nQuestion: Is new process significantly different?")
print("\nAnswer:")
print("-"*70)
print("Observed: 3% defect rate")
print("Expected (H₀): 5% defect rate")
print("\nDifference seems small relative to variation")
print("p-value ≈ 0.12 (hypothetical)")
print("\nDecision: 0.12 > 0.05 → FAIL TO REJECT H₀")
print("Conclusion: NO significant improvement detected")
print("Why: 3% is consistent with 5% rate (could be random)")

PRACTICE PROBLEM 1: Manufacturing Quality

Scenario:
  Old process: 5% defect rate
  New process: 100 units tested, 3 defects (3%)

H₀: New process = 5% (no improvement)
H₁: New process ≠ 5% (different)
α = 0.05

Question: Is new process significantly different?

Answer:
----------------------------------------------------------------------
Observed: 3% defect rate
Expected (H₀): 5% defect rate

Difference seems small relative to variation
p-value ≈ 0.12 (hypothetical)

Decision: 0.12 > 0.05 → FAIL TO REJECT H₀
Conclusion: NO significant improvement detected
Why: 3% is consistent with 5% rate (could be random)


In [19]:
print("\nPRACTICE PROBLEM 2: Drug Effectiveness")
print("="*70)
print("\nScenario:")
print("  Standard treatment: 60% improvement rate")
print("  New drug: 200 patients, 140 improved (70%)")
print("\nH₀: New drug = 60% (no better than standard)")
print("H₁: New drug > 60% (better than standard)")
print("α = 0.05")
print("One-tailed test (direction specified)")
print("\nQuestion: Is new drug significantly better?")
print("\nAnswer:")
print("-"*70)
print("Observed: 70% improvement rate")
print("Expected (H₀): 60% improvement rate")
print("\nLarge difference with large sample size")
print("p-value ≈ 0.008 (hypothetical)")
print("\nDecision: 0.008 < 0.05 → REJECT H₀")
print("Conclusion: Strong evidence new drug is more effective!")
print("Why: 70% is unlikely if drug really 60%")


PRACTICE PROBLEM 2: Drug Effectiveness

Scenario:
  Standard treatment: 60% improvement rate
  New drug: 200 patients, 140 improved (70%)

H₀: New drug = 60% (no better than standard)
H₁: New drug > 60% (better than standard)
α = 0.05
One-tailed test (direction specified)

Question: Is new drug significantly better?

Answer:
----------------------------------------------------------------------
Observed: 70% improvement rate
Expected (H₀): 60% improvement rate

Large difference with large sample size
p-value ≈ 0.008 (hypothetical)

Decision: 0.008 < 0.05 → REJECT H₀
Conclusion: Strong evidence new drug is more effective!
Why: 70% is unlikely if drug really 60%


In [20]:
print("\nPRACTICE PROBLEM 3: Website Redesign")
print("="*70)
print("\nScenario:")
print("  Old site: 2.5% conversion rate")
print("  New site: 5000 visitors, 140 conversions (2.8%)")
print("\nH₀: New site = 2.5% (no improvement)")
print("H₁: New site ≠ 2.5% (different)")
print("α = 0.05")
print("\nQuestion: Is new site significantly different?")
print("\nAnswer:")
print("-"*70)
print("Observed: 2.8% conversion rate")
print("Expected (H₀): 2.5% conversion rate")
print("\nSmall difference with large sample")
print("p-value ≈ 0.31 (hypothetical)")
print("\nDecision: 0.31 > 0.05 → FAIL TO REJECT H₀")
print("Conclusion: No significant improvement detected")
print("Why: 2.8% is compatible with 2.5% (could be variation)")
print("\nNote: Effect size is small (0.3 percentage points)")
print("      Even if significant, practical value questionable")


PRACTICE PROBLEM 3: Website Redesign

Scenario:
  Old site: 2.5% conversion rate
  New site: 5000 visitors, 140 conversions (2.8%)

H₀: New site = 2.5% (no improvement)
H₁: New site ≠ 2.5% (different)
α = 0.05

Question: Is new site significantly different?

Answer:
----------------------------------------------------------------------
Observed: 2.8% conversion rate
Expected (H₀): 2.5% conversion rate

Small difference with large sample
p-value ≈ 0.31 (hypothetical)

Decision: 0.31 > 0.05 → FAIL TO REJECT H₀
Conclusion: No significant improvement detected
Why: 2.8% is compatible with 2.5% (could be variation)

Note: Effect size is small (0.3 percentage points)
      Even if significant, practical value questionable
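
The p-values quoted in the three practice problems are labelled hypothetical. A one-sample z-test for a proportion on the stated counts can be sketched as below. The normal approximation is an assumption (it is rough for Problem 1, where only 5 defects are expected under H₀), and `proportion_z_test` is a hypothetical helper.

```python
import numpy as np
from scipy.stats import norm

def proportion_z_test(successes, n, null_rate, alternative="two-sided"):
    """One-sample z-test for a proportion (normal approximation)."""
    se_null = np.sqrt(null_rate * (1 - null_rate) / n)
    z = (successes / n - null_rate) / se_null
    if alternative == "greater":
        p_value = norm.sf(z)
    elif alternative == "less":
        p_value = norm.cdf(z)
    else:
        p_value = 2 * norm.sf(abs(z))
    return z, p_value

# (name, successes, sample size, rate under H0, direction of H1)
problems = [
    ("Manufacturing", 3, 100, 0.05, "two-sided"),
    ("Drug", 140, 200, 0.60, "greater"),
    ("Website", 140, 5000, 0.025, "two-sided"),
]
alpha = 0.05
results = {}
for name, successes, n, null_rate, alternative in problems:
    z, p = proportion_z_test(successes, n, null_rate, alternative)
    results[name] = (z, p)
    decision = "REJECT H₀" if p < alpha else "FAIL TO REJECT H₀"
    print(f"{name:<13} z = {z:+.2f}, p = {p:.4f} → {decision}")
```

The computed p-values differ from the hypothetical ones, but each decision at α = 0.05 comes out the same: fail to reject, reject, fail to reject.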


---

## SECTION 9: Key Takeaways

In [21]:
print("\n" + "="*70)
print("KEY CONCEPTS - PART 5 COMPLETE")
print("="*70)

print("\n1. HYPOTHESIS TESTING")
print("   └─ Statistical detective work to detect real effects")
print("\n2. H₀ vs H₁")
print("   └─ H₀ = boring (test this), H₁ = interesting (accept if H₀ rejected)")
print("\n3. ERRORS")
print("   └─ Type I (α) = false positive, Type II (β) = false negative")
print("   └─ Can't eliminate both - trade-off exists")
print("\n4. SIGNIFICANCE LEVEL (α)")
print("   └─ Threshold for decision: p < α → reject H₀")
print("   └─ Standard: α = 0.05 (5% false-positive rate when H₀ is true)")
print("\n5. P-VALUE")
print("   └─ P(data this extreme | H₀ true) NOT P(H₀ | data)")
print("   └─ Small p = data is surprising if H₀ true")
print("\n6. 7-STEP FRAMEWORK")
print("   └─ Define → α → Collect → Calculate → p-value → Decide → Interpret")
print("   └─ Same framework for ALL statistical tests")
print("="*70)


KEY CONCEPTS - PART 5 COMPLETE

1. HYPOTHESIS TESTING
   └─ Statistical detective work to detect real effects

2. H₀ vs H₁
   └─ H₀ = boring (test this), H₁ = interesting (accept if H₀ rejected)

3. ERRORS
   └─ Type I (α) = false positive, Type II (β) = false negative
   └─ Can't eliminate both - trade-off exists

4. SIGNIFICANCE LEVEL (α)
   └─ Threshold for decision: p < α → reject H₀
   └─ Standard: α = 0.05 (5% false-positive rate when H₀ is true)

5. P-VALUE
   └─ P(data this extreme | H₀ true) NOT P(H₀ | data)
   └─ Small p = data is surprising if H₀ true

6. 7-STEP FRAMEWORK
   └─ Define → α → Collect → Calculate → p-value → Decide → Interpret
   └─ Same framework for ALL statistical tests
