In [7]:
import numpy as np
import scipy.stats as stats

---

### **Introduction to Hypothesis Testing & Problem Types**
**What is the problem?**  
In real-world data analysis, we often need to make decisions based on incomplete information. For example:  
- Does a new drug lower blood pressure more effectively than the existing one?  
- Does a redesigned website increase sales?  
- Are customer preferences for a product linked to their geographic region?  

**Hypothesis tests** are statistical tools that help answer these questions objectively. They assess whether an observed difference or relationship is statistically significant, meaning it would be unlikely to arise from random chance alone if no real effect existed.  

---

### **1. t-Test**  
**Problem Example**:  
A coffee chain wants to know if a new espresso blend increases average daily sales compared to the old blend. They test the new blend in 5 stores and compare sales to 5 stores using the old blend.  

**What does it solve?**  
Compares the means of **two groups** (e.g., old vs. new) when the population variance is unknown and must be estimated from the data, which matters most when sample sizes are small (<30).  

**Step-by-Step**:  
1. **Define hypotheses**:  
   - Null (H₀): No difference in sales (mean_old = mean_new).  
   - Alternative (H₁): New blend increases sales (mean_new > mean_old).  
2. **Collect data**: Record daily sales for both groups.  
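3. **Calculate t-statistic**:  
   t = (Difference in means) / (Standard error of the difference).  
4. **Compare to critical value**: If the t-statistic exceeds the critical value from a t-table (for the chosen significance level and degrees of freedom), reject H₀. Equivalently, reject H₀ when the p-value is below the significance level (e.g., 0.05).  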
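
The steps above can be sketched by hand as a sanity check. This is a minimal illustration, assuming equal variances in both groups (the pooled form, which is also the default behaviour of `stats.ttest_ind`) and a one-sided 5% significance level to match H₁; the variable names are illustrative only.

```python
import numpy as np
from scipy import stats

# Sales figures from this section's example
old_sales = np.array([100, 105, 98, 90, 110])
new_sales = np.array([115, 120, 112, 108, 118])

n_old, n_new = len(old_sales), len(new_sales)
diff = new_sales.mean() - old_sales.mean()

# Pooled variance (assumes both groups share a common variance)
pooled_var = ((n_old - 1) * old_sales.var(ddof=1)
              + (n_new - 1) * new_sales.var(ddof=1)) / (n_old + n_new - 2)
std_err = np.sqrt(pooled_var * (1 / n_old + 1 / n_new))

t_manual = diff / std_err
dof = n_old + n_new - 2

# One-sided critical value at alpha = 0.05, matching H1: mean_new > mean_old
t_critical = stats.t.ppf(0.95, dof)
reject_h0 = t_manual > t_critical

print(f"t = {t_manual:.3f}, critical value = {t_critical:.3f}, reject H0: {reject_h0}")
```

The hand-computed statistic should agree with the `t_stat` returned by `stats.ttest_ind(new_sales, old_sales)`.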

In [8]:

# Sample sales data (old vs. new blend)
old_sales = [100, 105, 98, 90, 110]
new_sales = [115, 120, 112, 108, 118]

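# Independent two-sample t-test. ttest_ind is two-sided by default;
# for the one-sided H₁ (mean_new > mean_old), pass alternative="greater" (SciPy >= 1.6).
t_stat, p_value = stats.ttest_ind(new_sales, old_sales)
print(f"p-value: {p_value:.4f}")  

# Decision at the 5% significance level: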
if p_value < 0.05:
    print("Adopt the new espresso blend, it works!")
else:
    print("No significant improvement. Keep the old blend.")

p-value: 0.0080
Adopt the new espresso blend, it works!


---

### **2. z-Test**  
**Problem Example**:  
A battery manufacturer claims their batteries last 10 hours. A sample of 50 batteries has an average lifespan of 10.3 hours with a known population standard deviation of 0.5 hours. Is this difference significant?  

**What does it solve?**  
Compares a sample mean to a population mean **when sample sizes are large (≥30) and population variance is known**.  

**Step-by-Step**:  
1. **Define hypotheses**:  
   - H₀: Battery life = 10 hours.  
   - H₁: Battery life ≠ 10 hours.  
2. **Calculate z-score**:  
   z = (Sample mean - Population mean) / (Population SD / √n).  
3. **Compare to z-table**: If |z-score| > 1.96 (two-tailed test at the 5% significance level), reject H₀.  
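
To show where the 1.96 threshold comes from, the critical value can be read off the standard normal distribution instead of a printed table. This is a small sketch using the figures from the battery example; `alpha` and the variable names are illustrative.

```python
from math import sqrt
from scipy import stats

alpha = 0.05
# Two-tailed critical value: the point that leaves alpha/2 in each tail
z_critical = stats.norm.ppf(1 - alpha / 2)

# Figures from the battery example
z_score = (10.3 - 10) / (0.5 / sqrt(50))
reject_h0 = abs(z_score) > z_critical

print(f"z = {z_score:.3f}, critical value = {z_critical:.3f}, reject H0: {reject_h0}")
```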


In [6]:

population_mean = 10
sample_mean = 10.3
population_std = 0.5
n = 50

z_score = (sample_mean - population_mean) / (population_std / (n**0.5))
p_value = 2 * stats.norm.sf(abs(z_score))  # Two-tailed test; sf(x) = 1 - cdf(x), more accurate in the tails


print(f"p-value: {p_value:.4f}")  
if p_value < 0.05:
    print("Battery life is significantly different. Update marketing claims.")
else:
    print("No significant difference. Maintain the original claim.")

p-value: 0.0000
Battery life is significantly different. Update marketing claims.


---

### **3. A/B Test**  
**Problem Example**:  
An e-commerce company tests two email subject lines (A: "Sale Ends Soon!" vs. B: "Last Chance for 50% Off!") to see which drives more clicks.  

**What does it solve?**  
Compares **two versions** (A/B) of a product, webpage, or campaign to determine which performs better. The comparison itself is an ordinary significance test, typically a t-test or z-test applied to the chosen metric.  

**Step-by-Step**:  
1. **Randomly assign users** to Group A or B.  
2. **Measure metric** (e.g., click-through rate).  
3. **Run t-test or z-test** on the metric.  
4. **Conclude**: If Group B’s CTR is significantly higher, adopt it.  
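
Step 1 is what makes the comparison fair, so here is one possible way to do the random assignment. It is only a sketch: the user IDs, the group size, and the fixed seed are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed only so the sketch is reproducible

user_ids = np.arange(1000)            # hypothetical user IDs
shuffled = rng.permutation(user_ids)

# Split the shuffled users into two equally sized, non-overlapping groups
group_a, group_b = shuffled[:500], shuffled[500:]

print(len(group_a), len(group_b))
```

Shuffling before splitting avoids systematic differences between the groups (for example, older accounts all landing in Group A).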

In [9]:
# Using t-test for A/B testing (small sample)
clicks_A = [120, 115, 125, 110, 130]  # 500 emails each
clicks_B = [140, 135, 145, 130, 150]


t_stat, p_value = stats.ttest_ind(clicks_B, clicks_A)

print(f"p-value: {p_value:.4f}")  
if p_value < 0.05:
    print("Launch Version B—it drives more clicks!")
else:
    print("No significant difference. Keep testing.")

p-value: 0.0039
Launch Version B—it drives more clicks!


---

### **4. Chi-Square Test**  
**Problem Example**:  
A retailer wants to know if customer preference for product categories (Electronics, Apparel, Home) differs across regions (North, South).  

**What does it solve?**  
Tests relationships between **categorical variables** (e.g., region vs. product preference).  

**Step-by-Step**:  
1. **Create contingency table**:  
   | Region  | Electronics | Apparel | Home |  
   |---------|-------------|---------|------|  
   | North   | 50          | 30      | 20   |  
   | South   | 40          | 45      | 15   |  
2. **Calculate expected counts** (assuming no relationship).  
3. **Chi-square statistic**: Σ[(Observed - Expected)² / Expected].  
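4. **Compare to critical value**: If the chi-square statistic exceeds the critical value for (rows - 1) × (columns - 1) degrees of freedom, reject H₀ (no association between the variables).  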
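
Steps 2 to 4 can be worked through by hand on the table above. This is a minimal sketch with illustrative variable names; it should reproduce what `chi2_contingency` reports for the same table.

```python
import numpy as np
from scipy import stats

# Observed counts: rows = North, South; columns = Electronics, Apparel, Home
observed = np.array([[50, 30, 20],
                     [40, 45, 15]])

# Expected counts under independence: (row total * column total) / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals @ col_totals / observed.sum()

# Chi-square statistic: sum of (Observed - Expected)^2 / Expected
chi2_manual = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# p-value from the upper tail of the chi-square distribution
p_manual = stats.chi2.sf(chi2_manual, dof)

print(f"chi2 = {chi2_manual:.4f}, dof = {dof}, p-value = {p_manual:.4f}")
```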

In [10]:
import numpy as np
from scipy.stats import chi2_contingency

# Contingency table
data = np.array([[50, 30, 20], [40, 45, 15]])

chi2_stat, p_value, dof, expected = chi2_contingency(data)

print(f"p-value: {p_value:.4f}")  
if p_value < 0.05:
    print("Preferences differ by region. Regionalize product offerings.")
else:
    print("No significant regional differences detected. Use a national strategy.")

p-value: 0.0896
No significant regional differences detected. Use a national strategy.


---

### **5. ANOVA (Analysis of Variance)**  
**Problem Example**:  
A hospital tests three painkillers (A, B, C) to see if recovery times differ.  

**What does it solve?**  
Compares means of **three or more groups** (e.g., multiple treatments).  

**Step-by-Step**:  
1. **Define hypotheses**:  
   - H₀: All painkillers have the same mean recovery time.  
   - H₁: At least one mean differs.  
2. **Calculate F-statistic**:  
   (Variance between groups) / (Variance within groups).  
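3. **Compare to F-distribution**: If the F-value exceeds the critical value of the F-distribution (equivalently, if the p-value is below the significance level), reject H₀.  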
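
The F-statistic in step 2 can be computed directly from the between-group and within-group sums of squares. This is a rough sketch using the recovery times from this section's example; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Recovery times (days) for drugs A, B, C
groups = [np.array([3, 4, 5, 4, 3]),
          np.array([5, 6, 7, 6, 5]),
          np.array([8, 9, 7, 8, 9])]

k = len(groups)                               # number of groups
n_total = sum(len(g) for g in groups)         # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group variation: how far each group mean sits from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group variation: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_manual = ms_between / ms_within

# p-value from the upper tail of the F-distribution
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

print(f"F = {f_manual:.3f}, p-value = {p_manual:.6f}")
```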


In [12]:
from scipy.stats import f_oneway

# Recovery times (days) for three drugs
drug_A = [3, 4, 5, 4, 3]
drug_B = [5, 6, 7, 6, 5]
drug_C = [8, 9, 7, 8, 9]

f_stat, p_value = f_oneway(drug_A, drug_B, drug_C)

print(f"p-value: {p_value:.4f}")  
if p_value < 0.05:
    print("Recovery times differ. Investigate which drug is best.")
else:
    print("No significant difference detected. Optimize for cost/safety.")

p-value: 0.0000
Recovery times differ. Investigate which drug is best.


---

### **Choosing the Right Test**  
1. **Comparing means**:  
   - **Two groups**: t-test (small samples or unknown population variance) or z-test (large samples with known population variance).  
   - **Three+ groups**: ANOVA.  
2. **Categorical data**: Chi-square test.  
3. **A/B testing**: Use t-test/z-test for continuous metrics (e.g., revenue), chi-square for proportions (e.g., conversion rates).  
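
As a rough illustration of the chi-square route for proportions, the click counts from the A/B example can be pooled into a 2×2 table of clicked vs. not clicked. This assumes each of the five values per version is the number of clicks out of a batch of 500 emails (2,500 emails per version), which is one reading of the comment in that example rather than something the text confirms.

```python
import numpy as np
from scipy.stats import chi2_contingency

clicks_A = [120, 115, 125, 110, 130]
clicks_B = [140, 135, 145, 130, 150]
emails_per_batch = 500  # assumption: each value is clicks out of 500 emails

sent_A = emails_per_batch * len(clicks_A)
sent_B = emails_per_batch * len(clicks_B)

# Rows: version A, version B; columns: clicked, did not click
table = np.array([
    [sum(clicks_A), sent_A - sum(clicks_A)],
    [sum(clicks_B), sent_B - sum(clicks_B)],
])

# For 2x2 tables, chi2_contingency applies Yates' continuity correction by default
chi2_stat, p_value, dof, expected = chi2_contingency(table)

print(f"CTR A: {sum(clicks_A) / sent_A:.3f}, CTR B: {sum(clicks_B) / sent_B:.3f}")
print(f"p-value: {p_value:.4f}")
```

Pooling treats every email as an independent trial, which differs from the t-test on per-batch counts; which framing fits better depends on how the data were collected.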

