### **Week 9: Statistical Data Analysis (Part II)**
**Objective**: Deepen the understanding of hypothesis testing through methods such as ANOVA and the interpretation of confidence intervals. Students will learn to compare data across multiple groups and interpret their findings statistically.

### **1. Recap of t-tests**
#### **Concept**: Reinforce the understanding of the different types of t-tests used to compare means in data.

#### **Topics & Key Concepts**:
- **One-sample t-test**: Compare sample mean to a known or theoretical mean.
- **Two-sample t-test**: Compare means of two independent groups.
- **Paired t-test**: Compare means of two related measurements on the same subjects (e.g., before and after treatment); the test is run on the within-subject differences.
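As a quick reference, the one-sample and two-sample tests map onto the SciPy functions `stats.ttest_1samp` and `stats.ttest_ind`. The sketch below uses simulated data; the reference value of 120, the group means, and the group sizes are arbitrary assumptions chosen only to show the calls. Passing `equal_var=False` selects Welch's t-test, which does not assume the two groups have equal variances.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seeded so the run is reproducible

# One-sample t-test: does the sample mean differ from a reference value of 120?
sample = rng.normal(loc=125, scale=10, size=40)
one_t, one_p = stats.ttest_1samp(sample, popmean=120)

# Two-sample t-test: compare two independent groups.
# equal_var=False runs Welch's t-test (no equal-variance assumption).
group_a = rng.normal(loc=200, scale=20, size=25)
group_b = rng.normal(loc=190, scale=25, size=30)
two_t, two_p = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"One-sample: t = {one_t:.2f}, p = {one_p:.4f}")
print(f"Two-sample (Welch): t = {two_t:.2f}, p = {two_p:.4f}")
```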

#### **Examples**:
1. **Paired t-test with Blood Pressure Data**:
   - **Scenario**: Measuring blood pressure in a sample of patients before and after taking a medication.

```python
from scipy import stats
import numpy as np

# Simulated blood pressure data (before and after medication).
# For simplicity the two arrays are drawn independently; real paired
# measurements are correlated within each patient.
before = np.random.normal(loc=140, scale=12, size=30)
after = np.random.normal(loc=130, scale=12, size=30)

# Perform paired t-test
t_stat, p_value = stats.ttest_rel(before, after)

print(f"Paired t-test results: T-statistic = {t_stat:.2f}, P-value = {p_value:.4f}")
```

Example output (one run; values vary because no random seed is set):

```
Paired t-test results: T-statistic = 3.98, P-value = 0.0004
```


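**Explanation**: The test asks whether the mean of the within-patient differences (before minus after) differs significantly from zero. In the run shown, the positive t-statistic means readings were higher before medication, and p = 0.0004 < 0.05 indicates the drop is statistically significant. `ttest_rel` is two-sided by default; pass `alternative='greater'` to test specifically whether the "before" readings exceed the "after" readings.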
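A paired t-test is equivalent to a one-sample t-test on the differences against zero. The sketch below checks that equivalence and shows the one-sided variant. Unlike the example above, it assumes each patient's "after" reading is their own "before" reading minus a treatment effect (mean 10 mmHg, an arbitrary choice) plus noise, so the two measurements are correlated as real paired data would be. The `alternative` argument requires SciPy 1.6 or later.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Assumed simulation: "after" = own "before" reading minus a per-patient
# effect (mean 10 mmHg, sd 5), so readings are correlated within patients.
before = rng.normal(loc=140, scale=12, size=30)
after = before - rng.normal(loc=10, scale=5, size=30)

# Paired t-test (two-sided by default)
t_paired, p_paired = stats.ttest_rel(before, after)

# Equivalent: one-sample t-test of the differences against 0
diff = before - after
t_diff, p_diff = stats.ttest_1samp(diff, popmean=0)

# One-sided version: is "before" greater than "after"?
t_one, p_one = stats.ttest_rel(before, after, alternative="greater")

print(f"Paired:      t = {t_paired:.2f}, p = {p_paired:.2e}")
print(f"Differences: t = {t_diff:.2f}, p = {p_diff:.2e}")
print(f"One-sided:   p = {p_one:.2e}")
```

When the t-statistic is positive, the one-sided p-value is half the two-sided one.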

#### **Hands-On Exercise**:
- Perform a two-sample t-test to compare cholesterol levels between two patient groups: one on a new diet and another on a traditional diet.
- Use a paired t-test to evaluate the effect of a treatment on glucose levels in a diabetic group before and after intervention.

---


### **2. ANOVA for Comparing Multiple Groups**
#### **Concept**: ANOVA (Analysis of Variance) allows comparison of means across three or more groups to determine if there are any statistically significant differences.


#### **Topics & Key Concepts**:
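- **What is ANOVA**: A test of whether any group means differ, based on the ratio of between-group variance to within-group variance (the F-statistic).
- **When to Use ANOVA**: When comparing three or more groups. Running every pairwise t-test instead inflates the chance of a false positive: with three groups and three tests at α = 0.05, the probability of at least one false positive is roughly 1 − 0.95³ ≈ 14% if the tests were independent.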
- **Types of ANOVA**:
  - **One-way ANOVA**: One independent variable (e.g., comparing drug effects).
  - **Two-way ANOVA**: Two independent variables (e.g., drug effect and gender).
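To make the F-statistic concrete, the sketch below computes a one-way ANOVA by hand with NumPy and checks it against `scipy.stats.f_oneway`. The three small groups are made-up numbers for illustration only.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for three groups (illustration only)
groups = [
    np.array([24.0, 26.0, 25.0, 23.0, 27.0]),
    np.array([30.0, 29.0, 31.0, 32.0, 28.0]),
    np.array([27.0, 28.0, 26.0, 29.0, 30.0]),
]

k = len(groups)                        # number of groups
n_total = sum(len(g) for g in groups)  # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: spread of group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)      # df_between = k - 1
ms_within = ss_within / (n_total - k)  # df_within = N - k
f_manual = ms_between / ms_within
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)  # upper tail of F

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f"Manual: F = {f_manual:.3f}, p = {p_manual:.5f}")
print(f"SciPy:  F = {f_scipy:.3f}, p = {p_scipy:.5f}")
```

Both lines report F ≈ 12.667: the group means (25, 30, 28) vary far more than the observations within each group do.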

#### **Examples**:
1. **One-way ANOVA with Plant Growth Data**:
   - **Scenario**: Comparing the effects of three different fertilizers on plant growth.

```python
import pandas as pd
import numpy as np
from scipy import stats

# Simulated dataset of plant height with three different fertilizers
data = {
    'Fertilizer': ['A'] * 10 + ['B'] * 10 + ['C'] * 10,
    'Plant_Height': np.random.normal(25, 3, 10).tolist() +
                    np.random.normal(30, 3, 10).tolist() +
                    np.random.normal(28, 3, 10).tolist()
}
df = pd.DataFrame(data)

# Perform one-way ANOVA
f_stat, p_value = stats.f_oneway(df[df['Fertilizer'] == 'A']['Plant_Height'],
                                 df[df['Fertilizer'] == 'B']['Plant_Height'],
                                 df[df['Fertilizer'] == 'C']['Plant_Height'])

print(f"ANOVA results: F-statistic = {f_stat:.2f}, P-value = {p_value:.4f}")
```

Example output (one run; values vary because no random seed is set):

```
ANOVA results: F-statistic = 8.34, P-value = 0.0015
```


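**Explanation**: A p-value below the chosen significance level (commonly 0.05; here p = 0.0015) indicates that at least one fertilizer's mean plant height differs from the others. ANOVA does not say which groups differ; that requires a post-hoc comparison such as Tukey's HSD.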
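A minimal post-hoc sketch using `scipy.stats.tukey_hsd` (available in SciPy 1.8 and later), which compares every pair of groups while controlling the family-wise error rate. The simulated heights follow the same setup as the ANOVA example above but use a seeded generator, so the numbers will not match that run.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated plant heights, same group means and spread as the ANOVA example
heights = {
    "A": rng.normal(25, 3, 10),
    "B": rng.normal(30, 3, 10),
    "C": rng.normal(28, 3, 10),
}
labels = list(heights)

# Tukey's HSD: all pairwise comparisons with family-wise error control
result = stats.tukey_hsd(*heights.values())

for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        diff = result.statistic[i, j]  # mean of group i minus mean of group j
        p = result.pvalue[i, j]
        print(f"{labels[i]} vs {labels[j]}: mean diff = {diff:.2f}, p = {p:.4f}")
```

Pairs with p below 0.05 are the ones driving a significant ANOVA result.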

#### **Hands-On Exercise**:
- Perform a one-way ANOVA to compare glucose levels in three different patient groups receiving different treatments.
- Compare the growth of bacteria in different chemical environments using ANOVA.

---

### **3. Confidence Intervals and Error Margins**
#### **Concept**: Confidence intervals (CIs) provide a range of values that are likely to contain the true population parameter. They quantify the uncertainty of an estimate.


#### **Topics & Key Concepts**:
- **Definition of Confidence Interval**: Interpretation of a 95% confidence interval.
- **Calculating CIs**: Use of mean, standard deviation, and sample size.
- **Error Margins**: Understanding how error margins reflect data precision.
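For a mean with unknown population standard deviation, the 95% interval computed with `stats.t.interval` in the example below combines the mean, standard deviation, and sample size as follows (assuming roughly normal data or a reasonably large sample), where $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n$ the sample size, and $t_{0.975,\,n-1}$ the critical value of the t distribution with $n-1$ degrees of freedom:

$$
\bar{x} \pm E, \qquad E = t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
$$

The margin of error $E$ shrinks as $n$ grows or as the data become less variable, which is what a more precise estimate means here.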

#### **Examples**:
1. **Confidence Interval for Average Blood Sugar Level**:
   - **Scenario**: Estimating the population's average blood sugar level from a sample of 50 measurements.

```python
import numpy as np
import scipy.stats as stats

# Sample blood sugar levels
blood_sugar = np.random.normal(110, 15, size=50)

# Calculate the 95% confidence interval
mean = np.mean(blood_sugar)
std_err = stats.sem(blood_sugar)  # Standard error
ci = stats.t.interval(0.95, len(blood_sugar)-1, loc=mean, scale=std_err)

print(f"Mean Blood Sugar: {mean:.2f} mg/dL")
print(f"95% Confidence Interval: ({ci[0]:.2f}, {ci[1]:.2f}) mg/dL")
```

Example output (one run; values vary because no random seed is set):

```
Mean Blood Sugar: 112.91 mg/dL
95% Confidence Interval: (108.21, 117.62) mg/dL
```


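**Explanation**: The interval is the sample mean plus or minus a margin of error (the t critical value times the standard error). The 95% describes the procedure, not this particular interval: if sampling were repeated many times, about 95% of intervals built this way would contain the true population mean.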
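The sketch below rebuilds the interval by hand (mean ± t* × SE) to confirm it matches `stats.t.interval`, then simulates repeated sampling to show what "95%" refers to. The population values (mean 110, SD 15, n = 50) are assumptions matching the simulation above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
true_mean, sd, n = 110, 15, 50  # assumed population, as in the example above

# 1) Manual CI for one sample: mean +/- t* x SE
sample = rng.normal(true_mean, sd, size=n)
mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(n)   # same value as stats.sem(sample)
t_crit = stats.t.ppf(0.975, df=n - 1)  # two-sided 95% critical value
margin = t_crit * se                   # margin of error
manual_ci = (mean - margin, mean + margin)
scipy_ci = stats.t.interval(0.95, n - 1, loc=mean, scale=stats.sem(sample))
print(f"Manual CI: ({manual_ci[0]:.2f}, {manual_ci[1]:.2f}), margin = {margin:.2f}")
print(f"SciPy CI:  ({scipy_ci[0]:.2f}, {scipy_ci[1]:.2f})")

# 2) Coverage: draw many samples and count how often the interval
#    contains the true mean (should be close to 0.95)
n_sims = 10_000
samples = rng.normal(true_mean, sd, size=(n_sims, n))
means = samples.mean(axis=1)
margins = t_crit * samples.std(axis=1, ddof=1) / np.sqrt(n)
covered = (means - margins <= true_mean) & (true_mean <= means + margins)
print(f"Coverage over {n_sims} simulated samples: {covered.mean():.3f}")
```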


#### **Hands-On Exercise**:
- Calculate the 95% confidence interval for the mean cholesterol levels of a patient sample.
- Estimate the average protein concentration in a chemical experiment with a given dataset and find the confidence interval.

---

### **4. Practical Assignment**
1. **Dataset Overview**:
   - **Dataset**: Laboratory results comparing effects of three different supplements on heart health (cholesterol, blood pressure, glucose).
   
2. **Statistical Analysis**:
   - *Perform a one-way ANOVA to compare the impact of supplements on cholesterol levels.*
   - *Calculate and interpret the 95% confidence interval for each group's average cholesterol level.*

3. **Report Findings**:
   - *Interpret the results of the ANOVA. Are the differences significant?*
   - *Explain what the confidence intervals tell you about the group means.*

4. **Presentation**:
   - *Visualize the data using box plots and bar charts to summarize findings.*
   - *Discuss which supplements show statistically significant differences and how the confidence intervals support the conclusions.*
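A possible starting point for steps 2 and 3 is sketched below. The real laboratory dataset is not provided here, so the column names `Supplement` and `Cholesterol`, the group labels, and all values are hypothetical placeholders generated at random; replace them with the actual data.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in for the lab dataset (made-up columns and values)
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "Supplement": np.repeat(["S1", "S2", "S3"], 20),
    "Cholesterol": np.concatenate([
        rng.normal(200, 20, 20),
        rng.normal(185, 20, 20),
        rng.normal(195, 20, 20),
    ]),
})

# One-way ANOVA across supplement groups
groups = [g.to_numpy() for _, g in df.groupby("Supplement")["Cholesterol"]]
f_stat, p_value = stats.f_oneway(*groups)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

def mean_ci(x, confidence=0.95):
    """Sample mean and t-based confidence interval for one group."""
    x = np.asarray(x)
    m = x.mean()
    lo, hi = stats.t.interval(confidence, len(x) - 1, loc=m, scale=stats.sem(x))
    return pd.Series({"mean": m, "ci_low": lo, "ci_high": hi})

# One row per supplement: mean and 95% CI bounds
rows = {name: mean_ci(g) for name, g in df.groupby("Supplement")["Cholesterol"]}
summary = pd.DataFrame(rows).T
print(summary.round(2))
```

The `summary` table holds the values needed for error bars on a bar chart of group means.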


### **Week Recap**
- **Concepts Mastered**: ANOVA for comparing multiple groups, confidence intervals, p-values, statistical significance.
- **Skills Gained**: Ability to perform ANOVA, interpret confidence intervals, and apply statistical methods to real-world biology and chemistry datasets.