# Statistical Inference

In this notebook:
1. **Hypothesis Testing**: Null and Alternative Hypotheses, p-value, Significance Levels (α)
2. **Confidence Intervals**: Estimating population parameters
3. **Z-test, t-test, Chi-Square Test, ANOVA**: Parametric and Non-parametric tests
4. **Type I and Type II Errors**: False positives and false negatives
5. **Power of a Test**: Understanding how likely a test is to detect an effect

**Requirements:**

`pip install numpy scipy pandas statsmodels`

## 1. Hypothesis Testing

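Hypothesis testing is a way to draw conclusions about a population from sample data.

You start with two competing ideas:
1. A default (the null hypothesis)
2. An alternative

Then you use your data to decide whether there is enough evidence to reject the default in favor of the alternative.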

### Key Concepts:

- **Null Hypothesis ($H₀$)**: This is the starting assumption that there is no effect or no difference between the groups or things you're testing. It's like saying, "Nothing is happening here".
- **Alternative Hypothesis ($H₁$)**: This challenges the null hypothesis, suggesting that there is an effect or difference. It's like saying, "Something is happening here".
- **p-value**: This is a number that tells you how likely it is to see your data (or something even more unusual) if the null hypothesis were true. A small p-value means your data is strange enough that it makes you doubt the null hypothesis.
- **Significance Level ($α$)**: This is a cut-off point you choose beforehand (like 0.05 or 0.01). It helps you decide whether the p-value is small enough to reject the null hypothesis. If the p-value is less than $α$, you can say, "Data like this would be unlikely if the null hypothesis were true, so I'll reject it."
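
To make the p-value and the decision rule concrete, here is a minimal sketch that computes an exact one-sided p-value by hand. The coin-flip data (60 heads in 100 flips) and the helper name `binomial_upper_tail` are made up for illustration and are not part of the examples in this notebook.

```python
from math import comb

def binomial_upper_tail(n_flips, n_heads, p_null=0.5):
    """P(X >= n_heads) when X ~ Binomial(n_flips, p_null)."""
    return sum(
        comb(n_flips, k) * p_null**k * (1 - p_null)**(n_flips - k)
        for k in range(n_heads, n_flips + 1)
    )

# Hypothetical data: 60 heads in 100 flips; H0 says the coin is fair
alpha = 0.05
p_value = binomial_upper_tail(n_flips=100, n_heads=60)

print(f"p-value: {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The p-value here is the probability, computed under H₀, of seeing 60 or more heads. It is compared against α only after α has been fixed.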

**Real-World Example:**

- A/B Testing: A company tests whether a new landing page (B) leads to higher conversions than the old page (A).
  - H₀: Conversion rates for page A and page B are the same.
  - H₁: Conversion rates for page B are higher than for page A.

In [5]:
from scipy import stats
import numpy as np

# Simulate conversion data for two groups
np.random.seed(55)
group_a = np.random.binomial(1, 0.1, size=1000)  # 10% conversion for A
group_b = np.random.binomial(1, 0.12, size=1000) # 12% conversion for B

# Perform a two-sample t-test (two-sided by default, so it tests for any difference)
t_stat, p_value = stats.ttest_ind(group_a, group_b)

print(f"t-statistic: {t_stat}, p-value: {p_value}")

# Check if your p-value is less than your significance level (alpha)
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis (significant difference).")
else:
    print("Fail to reject the null hypothesis (no significant difference).")


t-statistic: -2.239574581444182, p-value: 0.025228165339669376
Reject the null hypothesis (significant difference).
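
The alternative hypothesis in this example is one-sided (page B converts better), and conversions are yes/no outcomes, so a one-sided two-proportion z-test is a common alternative to the t-test. The sketch below is one possible way to write it with the standard library only. The conversion counts are hypothetical round numbers, not the simulated data above, and `two_proportion_ztest` is a helper defined here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """One-sided z-test of H1: rate_b > rate_a, using the pooled rate under H0."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 1 - NormalDist().cdf(z)  # upper tail only, because H1 is one-sided
    return z, p_value

# Hypothetical counts: 100/1000 conversions on page A, 120/1000 on page B
z_stat, p_one_sided = two_proportion_ztest(conv_a=100, n_a=1000, conv_b=120, n_b=1000)
print(f"z = {z_stat:.3f}, one-sided p-value = {p_one_sided:.4f}")
```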


## 2. Confidence Intervals

A confidence interval provides a range of values, derived from sample data, that is likely to contain the population parameter. A 95% confidence interval means that if we were to take 100 different samples and compute a confidence interval for each sample, we expect about 95 of the intervals to contain the population mean.

### Real-World Example

- **Poll Results:** You take a sample of 500 people to estimate the average height of the population. A 95% confidence interval gives you a range within which the true average height lies with 95% confidence.

In [6]:
import scipy.stats as stats
import numpy as np

# Simulate sample data
np.random.seed(42)
sample = np.random.normal(170, 10, 100)  # 100 heights from a population with mean 170 cm, std dev 10 cm

# Calculate 95% confidence interval
mean = np.mean(sample)
confidence_interval = stats.t.interval(confidence=0.95, df=len(sample)-1, loc=mean, scale=stats.sem(sample))

print(f"Mean height: {mean.round(2)}")
print(f"95% Confidence Interval: ({confidence_interval[0].round(2)}, {confidence_interval[1].round(2)})")

Mean height: 168.96
95% Confidence Interval: (167.16, 170.76)
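
The "repeat the sampling many times" interpretation can be checked by simulation. The following sketch assumes a population with mean 170 and standard deviation 10, as in the cell above. The number of repetitions and the random generator seed are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_mean, true_sd, n, n_repeats = 170, 10, 100, 2000

# Each row is one independent sample of n heights
samples = rng.normal(true_mean, true_sd, size=(n_repeats, n))
means = samples.mean(axis=1)
sems = samples.std(axis=1, ddof=1) / np.sqrt(n)

# 95% t-interval for every sample
t_crit = stats.t.ppf(0.975, df=n - 1)
lower = means - t_crit * sems
upper = means + t_crit * sems

# Fraction of intervals that contain the true population mean
coverage = np.mean((lower <= true_mean) & (true_mean <= upper))
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The share should land close to 0.95. Any single interval either contains the true mean or it does not.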


## 3. Z-test, t-test, Chi-Square Test, ANOVA

**Key Concepts**:

- **Z-test**: Used when the sample size is large ($n > 30$) and the population variance is known.
- **t-test**: Used when the sample size is small ($n < 30$) or the population variance is unknown and has to be estimated from the sample.
- **Chi-Square Test**: Used to test for independence between categorical variables (e.g., customer preferences across different age groups).
- **ANOVA**: Compares the means of three or more groups to see if at least one mean is different.
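
The first three tests in this list can be sketched in a few lines. The sample, the "known" standard deviation of 10, and the contingency table are all hypothetical values chosen for illustration. The z-test is computed by hand from the normal distribution, while the t-test and chi-square test use `scipy.stats`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical sample: 40 measurements, testing H0: mean = 100
sample = rng.normal(102, 10, size=40)
null_mean, known_sigma = 100, 10

# Z-test: relies on the known population standard deviation
z_stat = (sample.mean() - null_mean) / (known_sigma / np.sqrt(len(sample)))
z_p = 2 * stats.norm.sf(abs(z_stat))  # two-sided p-value

# t-test: estimates the standard deviation from the sample instead
t_stat, t_p = stats.ttest_1samp(sample, null_mean)

# Chi-square test of independence on a hypothetical contingency table
# rows: three age groups, columns: preferred product (A, B)
observed = np.array([[30, 20],
                     [25, 25],
                     [15, 35]])
chi2, chi2_p, dof, expected = stats.chi2_contingency(observed)

print(f"z-test:     z = {z_stat:.3f}, p = {z_p:.4f}")
print(f"t-test:     t = {t_stat:.3f}, p = {t_p:.4f}")
print(f"chi-square: chi2 = {chi2:.3f}, dof = {dof}, p = {chi2_p:.4f}")
```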

### Real-World Example

- **ANOVA**: A pharmaceutical company wants to test the effectiveness of three different drugs on patients' blood pressure. ANOVA can determine if there is a significant difference in effectiveness between the three drugs.

In [7]:
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Simulate data
np.random.seed(42)
data = {
    'drug': np.repeat(['DrugA', 'DrugB', 'DrugC'], 30),
    'bp_reduction': np.concatenate([
        np.random.normal(5, 1, 30),  # Drug A
        np.random.normal(7, 1.2, 30), # Drug B
        np.random.normal(6.5, 1.5, 30) # Drug C
    ])
}

df = pd.DataFrame(data)

# Perform ANOVA
model = ols('bp_reduction ~ C(drug)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

print(anova_table)


              sum_sq    df          F        PR(>F)
C(drug)    72.007144   2.0  25.280509  2.210744e-09
Residual  123.902203  87.0        NaN           NaN
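
As a cross-check, a one-way ANOVA can also be run with `scipy.stats.f_oneway`. This sketch assumes the three groups are regenerated with the same seed and draw order as in the cell above. Under that assumption the F statistic and p-value should agree with the `C(drug)` row of the table.

```python
import numpy as np
from scipy import stats

# Recreate the three simulated groups (same seed and draw order as above)
np.random.seed(42)
drug_a = np.random.normal(5, 1, 30)
drug_b = np.random.normal(7, 1.2, 30)
drug_c = np.random.normal(6.5, 1.5, 30)

# One-way ANOVA: H0 says all three group means are equal
f_stat, p_value = stats.f_oneway(drug_a, drug_b, drug_c)
print(f"F = {f_stat:.3f}, p = {p_value:.3g}")
```

A significant result only says that at least one mean differs. Finding out which one requires a follow-up comparison.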


## 4. Type I and Type II Errors

### Key Concepts:
- **Type I Error (False Positive)**: Rejecting the null hypothesis when it is true. You conclude there's an effect when there isn't one.
  - Ex: Thinking a drug works when it doesn't.
- **Type II Error (False Negative)**: Failing to reject the null hypothesis when it is false. You conclude there is no effect when there is one.
  - Ex: Thinking a drug doesn't work when it does.

### Real-World Example

- **Drug Testing**: In clinical trials, a Type I error would mean falsely claiming a new drug is effective, while a Type II error would mean failing to detect that the drug is effective when it actually is.

In [8]:
# Simulate hypothesis testing with known population parameters
def hypothesis_test(n=100, alpha=0.05, true_effect=False):
    np.random.seed(50)
    population_mean = 100
    if true_effect:
        sample = np.random.normal(population_mean + 2, 10, n)  # Adding a true effect
    else:
        sample = np.random.normal(population_mean, 10, n)  # No effect

    # Perform one-sample t-test
    t_stat, p_value = stats.ttest_1samp(sample, population_mean)
    
    print(f"P-value: {p_value}")
    if p_value < alpha:
        print("Reject H0 (possible Type I error if true_effect=False)")
    else:
        print("Fail to reject H0 (possible Type II error if true_effect=True)")

# No true effect: rejecting H0 here would be a Type I error
print("Testing for Type I Error:")
hypothesis_test(n=100, true_effect=False)

# True effect present: failing to reject H0 here would be a Type II error
print("\nTesting for Type II Error:")
hypothesis_test(n=100, true_effect=True)


Testing for Type I Error:
P-value: 0.7120474714276352
Fail to reject H0 (possible Type II error if true_effect=True)

Testing for Type II Error:
P-value: 0.02876240594352344
Reject H0 (possible Type I error if true_effect=False)
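
A single run shows only one outcome, so neither error type necessarily appears. Repeating the experiment many times gives an estimate of how often each error occurs. The sketch below reuses the same setup (population mean 100, standard deviation 10, true shift of 2). The number of trials, the seed, and the helper name `rejection_rate` are illustrative choices.

```python
import numpy as np
from scipy import stats

def rejection_rate(true_shift, n=100, alpha=0.05, n_trials=5000, seed=0):
    """Fraction of simulated one-sample t-tests that reject H0: mean = 100."""
    rng = np.random.default_rng(seed)
    population_mean = 100
    # Each row is one simulated experiment with n observations
    samples = rng.normal(population_mean + true_shift, 10, size=(n_trials, n))
    _, p_values = stats.ttest_1samp(samples, population_mean, axis=1)
    return np.mean(p_values < alpha)

# H0 true: every rejection is a Type I error, so the rate should be near alpha
type_i_rate = rejection_rate(true_shift=0)

# H0 false: every non-rejection is a Type II error
type_ii_rate = 1 - rejection_rate(true_shift=2)

print(f"Estimated Type I error rate:  {type_i_rate:.3f}")
print(f"Estimated Type II error rate: {type_ii_rate:.3f}")
```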


## 5. Power of a Test

**Key Concepts**:
- **Power**: 
  - The probability that a test will correctly reject the null hypothesis when it is false
  - Ex: Detecting an effect if there is one
  - A higher power means the test is more likely to detect true effects.

Power is influenced by:
  - Sample size
  - Effect size
  - Significance level ($α$).
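
The effect of these three factors can be illustrated with a normal approximation to the power of a one-sided two-sample test, power ≈ Φ(d·√(n/2) − z₁₋α), where d is the standardized effect size and n the sample size per group. This approximation and the helper `approx_power` are assumptions made for this sketch. They are a rough stand-in for an exact power calculation, not a replacement for one.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Normal approximation to the power of a one-sided two-sample test."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(effect_size * sqrt(n_per_group / 2) - z_crit)

# Vary one factor at a time relative to a baseline scenario
scenarios = {
    "baseline (d=0.5, n=50, alpha=0.05)": approx_power(0.5, 50, 0.05),
    "larger sample (n=100)": approx_power(0.5, 100, 0.05),
    "larger effect (d=0.8)": approx_power(0.8, 50, 0.05),
    "stricter alpha (alpha=0.01)": approx_power(0.5, 50, 0.01),
}
for label, value in scenarios.items():
    print(f"{label}: power ≈ {value:.3f}")
```

Larger samples and larger effects raise power, while a stricter significance level lowers it.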

### Real-World Example

- **Medical Trials:**
  - In designing a clinical trial, the power of the test helps researchers determine how large a sample they need to detect a meaningful difference in patient outcomes between treatment groups.

In [12]:
from statsmodels.stats.power import TTestIndPower

# Parameters for power analysis
effect_size = 0.5 # Medium effect size (standardized difference in means)
alpha = 0.05 # Significance level
power = 0.8 # Desired power

power_analysis = TTestIndPower()
sample_sizes_needed = power_analysis.solve_power(
    effect_size=effect_size,
    alpha=alpha,
    power=power,
    alternative='larger'
)

print(f'Sample size needed per group for 80% power: {round(sample_sizes_needed)}')

Sample size needed per group for 80% power: 50
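
The result can be sanity-checked by simulation: draw many pairs of groups of 50 with a true standardized difference of 0.5, run a one-sided t-test on each pair, and count how often H₀ is rejected. The sketch assumes normally distributed data with unit variance. The trial count and seed are arbitrary, and the `alternative` argument of `scipy.stats.ttest_ind` requires SciPy 1.6 or later.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_per_group, effect_size, alpha, n_trials = 50, 0.5, 0.05, 4000

# Each row is one simulated trial with two groups of n_per_group observations
control = rng.normal(0, 1, size=(n_trials, n_per_group))
treatment = rng.normal(effect_size, 1, size=(n_trials, n_per_group))

# One-sided test of H1: treatment mean > control mean
_, p_values = stats.ttest_ind(treatment, control, axis=1, alternative='greater')

simulated_power = np.mean(p_values < alpha)
print(f"Simulated power with {n_per_group} per group: {simulated_power:.3f}")
```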


## Summary

| **Concept**                 | **When to Use**                                         | **Real-World Example**                               |
|-----------------------------|--------------------------------------------------------|------------------------------------------------------|
| **Hypothesis Testing**       | To test a claim or assumption about a population       | A/B test for website conversion rates                |
| **Confidence Intervals**     | To estimate a range for a population parameter         | Estimating average height from a sample              |
| **Z-test**                  | Comparing means when the sample is large (>30) and the population variance is known | Testing if the average salary in a city has changed  |
| **t-test**                  | Comparing means when the sample is small (<30) or the population variance is unknown | Comparing average test scores between two small classes |
| **Chi-Square Test**          | Testing relationships between categorical variables    | Testing if customer preferences vary by region       |
| **ANOVA**                   | Comparing means of 3+ groups to check for differences  | Comparing the effectiveness of 3 different drugs     |
| **Type I Error** (False Pos) | Mistakenly detecting an effect when none exists        | Concluding a drug works when it doesn't              |
| **Type II Error** (False Neg)| Missing an effect when one exists                      | Failing to detect a working drug                     |
| **Power of a Test**          | Checking the test’s ability to detect a true effect    | Ensuring a medical trial has enough participants     |
