In [None]:
'''
### ❖ **Basics of Hypothesis Testing**

**Q: What is hypothesis testing in statistics?**
**A:** Hypothesis testing is a statistical method used to make inferences or decisions about population parameters based on sample data. It involves testing an assumption (the hypothesis) using probability theory.

---

**Q: What is the null hypothesis, and how does it differ from the alternative hypothesis?**
**A:** The **null hypothesis (H₀)** states that there is no effect or difference. The **alternative hypothesis (H₁ or Ha)** states that there is an effect or difference, and is usually the claim the researcher hopes to support. Hypothesis testing evaluates whether the data provide enough evidence to reject H₀ in favor of H₁.

---

**Q: What is the significance level in hypothesis testing, and why is it important?**
**A:** The significance level (α) is the probability of rejecting the null hypothesis when it is actually true (a Type I error). Common values are 0.05 and 0.01. It is chosen before the test is run and sets the threshold for calling a result statistically significant.
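
A small simulation can make this definition concrete. The sketch below assumes a two-tailed Z-test with a known standard deviation of 1; the seed, sample size, and number of simulations are arbitrary choices for illustration. Because every sample is drawn with H0 true, the share of rejections should land close to the chosen significance level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
alpha = 0.05
n_sims, n = 5000, 30            # illustrative sizes, not prescribed values

# Draw every sample from N(0, 1), so H0: mu = 0 is true in each simulation
samples = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n))
z = samples.mean(axis=1) / (1.0 / np.sqrt(n))  # sigma = 1 is treated as known
p_values = 2 * stats.norm.sf(np.abs(z))        # two-tailed p-values

# Fraction of simulations that wrongly reject a true H0
type1_rate = np.mean(p_values <= alpha)
print(f"Observed Type I error rate: {type1_rate:.3f} (alpha = {alpha})")
```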

---

### ❖ **Understanding P-values and Errors**

**Q: What does a P-value represent in hypothesis testing?**
**A:** A **P-value** is the probability of obtaining test results at least as extreme as the observed results, assuming the null hypothesis is true. It helps assess the strength of the evidence against H₀.

---

**Q: How do you interpret the P-value in hypothesis testing?**
**A:**
- If **P ≤ α**, reject the null hypothesis (significant result).
- If **P > α**, fail to reject the null hypothesis (not significant).
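
The decision rule above can be sketched in a few lines. The helper name `decide` and the example p-values are hypothetical, chosen only to show both outcomes.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when p <= alpha."""
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"

# Hypothetical p-values, one on each side of the default threshold
print(decide(0.03))
print(decide(0.20))
```

Note that the same p-value can lead to different decisions under different thresholds: `decide(0.03, alpha=0.01)` fails to reject.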

---

**Q: What are Type I and Type II errors in hypothesis testing?**
**A:**
- **Type I error (α):** Rejecting H₀ when it is true.
- **Type II error (β):** Failing to reject H₀ when H₁ is true.
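
Both error probabilities can be computed directly for a simple case. This is a minimal sketch assuming a right-tailed Z-test with known standard deviation; the means, standard deviation, and sample size are hypothetical values picked for illustration.

```python
import numpy as np
from scipy import stats

mu0, mu1 = 50.0, 52.0            # hypothetical means under H0 and H1
sigma, n, alpha = 5.0, 30, 0.05  # hypothetical known sigma and sample size

se = sigma / np.sqrt(n)  # standard error of the sample mean
# Right-tailed test: reject H0 when the sample mean exceeds this cutoff
cutoff = stats.norm.ppf(1 - alpha, loc=mu0, scale=se)

type1 = stats.norm.sf(cutoff, loc=mu0, scale=se)  # P(reject H0 | H0 true)
beta = stats.norm.cdf(cutoff, loc=mu1, scale=se)  # P(fail to reject H0 | H1 true)
power = 1 - beta                                  # P(reject H0 | H1 true)

print(f"Type I error: {type1:.3f}, Type II error: {beta:.3f}, power: {power:.3f}")
```

Lowering alpha pushes the cutoff to the right, which reduces Type I errors but increases beta for the same sample size.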

---

### ❖ **Test Types and Distributions**

**Q: What is the difference between a one-tailed and a two-tailed test in hypothesis testing?**
**A:**
- **One-tailed test:** Tests for an effect in one direction (greater than or less than).
- **Two-tailed test:** Tests for any significant difference (in either direction).
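
The choice of tails changes the p-value for the same test statistic. The sketch below uses a hypothetical Z statistic of 1.8 and the standard normal distribution to show all three variants.

```python
from scipy import stats

z = 1.8  # hypothetical test statistic

p_right = stats.norm.sf(z)         # one-tailed, H1: mean is greater
p_left = stats.norm.cdf(z)         # one-tailed, H1: mean is less
p_two = 2 * stats.norm.sf(abs(z))  # two-tailed, H1: mean is different

print(f"Right-tailed: {p_right:.4f}")
print(f"Left-tailed:  {p_left:.4f}")
print(f"Two-tailed:   {p_two:.4f}")
```

With these numbers the right-tailed test is significant at 0.05 while the two-tailed test is not, which is why the direction of the alternative should be fixed before looking at the data.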

---

**Q: What is the Z-test, and when is it used in hypothesis testing?**
**A:** A **Z-test** is used when the population variance is known and the sample size is large (n ≥ 30 is a common rule of thumb). It tests whether the sample mean differs significantly from a known population mean.

---

**Q: How do you calculate the Z-score, and what does it represent in hypothesis testing?**
**A:**
\[
Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}
\]
Where:
- \( \bar{X} \): sample mean
- \( \mu \): population mean
- \( \sigma \): population standard deviation
- \( n \): sample size

The Z-score tells how many standard errors the sample mean lies from the hypothesized population mean.
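
As a worked instance of the formula, the sketch below plugs in hypothetical values (sample mean 52, population mean 50, population standard deviation 5, sample size 25) and converts the result to a two-tailed p-value.

```python
import math
from scipy import stats

x_bar, mu, sigma, n = 52.0, 50.0, 5.0, 25  # hypothetical values

standard_error = sigma / math.sqrt(n)     # 5 / 5 = 1.0
z = (x_bar - mu) / standard_error         # (52 - 50) / 1.0 = 2.0
p_two_tailed = 2 * stats.norm.sf(abs(z))  # area in both tails beyond |z|

print(f"Z = {z:.2f}, two-tailed p = {p_two_tailed:.4f}")
```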

---

**Q: What is the T-distribution, and when should it be used instead of the normal distribution?**
**A:** The **T-distribution** is used when the population standard deviation is unknown and must be estimated from the sample, which matters most when the sample size is small (n < 30). It has heavier tails than the normal distribution and approaches it as the degrees of freedom increase.
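
One way to see the heavier tails is to compare two-tailed 95% critical values. The degrees of freedom below are arbitrary illustrative choices; the t critical value shrinks toward the normal one as the degrees of freedom grow.

```python
from scipy import stats

z_crit = stats.norm.ppf(0.975)  # two-tailed 95% critical value, standard normal

# Same quantile for t-distributions with increasing degrees of freedom
t_crits = {df: stats.t.ppf(0.975, df) for df in (5, 30, 1000)}

for df, t_crit in t_crits.items():
    print(f"df = {df:>4}: t critical = {t_crit:.4f}")
print(f"normal:    z critical = {z_crit:.4f}")
```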

---

**Q: What is the T-test, and how is it used in hypothesis testing?**
**A:** A **T-test** compares the means of two groups or a sample mean with a population mean. It helps determine if differences are statistically significant.

---

**Q: What is the difference between a Z-test and a T-test?**
**A:**
- **Z-test:** Known population variance, large sample size.
- **T-test:** Unknown population variance (estimated from the sample); most relevant for small samples.

---

**Q: What is the relationship between Z-test and T-test in hypothesis testing?**
**A:** Both are used to compare means. The **T-test** converges to the **Z-test** as the sample size increases. The key difference lies in whether population variance is known and the sample size.

---

### ❖ **Confidence Intervals and Bayes' Theorem**

**Q: What is a confidence interval, and how is it used to interpret statistical results?**
**A:** A **confidence interval (CI)** is a range of plausible values for the true population parameter, built so that a stated share of such intervals (e.g., 95%) would contain the parameter under repeated sampling. Its width reflects the precision of the estimate.

---

**Q: What is the margin of error, and how does it affect the confidence interval?**
**A:** The **margin of error** is the half-width of the confidence interval: the interval is the point estimate plus or minus the margin of error. A larger margin of error gives a wider interval and therefore a less precise estimate.
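
The effect on interval width can be sketched for the simplest case. This example assumes a known population standard deviation, so it uses a normal critical value; the helper name `margin_of_error_z` and all input values are hypothetical.

```python
import math
from scipy import stats

def margin_of_error_z(sigma, n, confidence=0.95):
    """Margin of error for a mean when the population sigma is known."""
    z_star = stats.norm.ppf((1 + confidence) / 2)  # two-tailed critical value
    return z_star * sigma / math.sqrt(n)

x_bar, sigma = 52.0, 5.0  # hypothetical sample mean and known sigma
for n in (25, 100):
    moe = margin_of_error_z(sigma, n)
    print(f"n = {n:>3}: {x_bar} +/- {moe:.2f} -> ({x_bar - moe:.2f}, {x_bar + moe:.2f})")
```

Quadrupling the sample size halves the margin of error, while raising the confidence level widens it.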

---

**Q: How is Bayes' Theorem used in statistics, and what is its significance?**
**A:** **Bayes' Theorem** calculates the probability of an event based on prior knowledge and new evidence. It's fundamental in Bayesian inference and decision-making under uncertainty.
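
Written out for an event \( A \) and evidence \( B \), the theorem takes the form below. The numbers in the second line are hypothetical, chosen to resemble a screening test: a prior of 0.01, a true-positive rate of 0.95, and a false-positive rate of 0.05.

\[
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B \mid A)\, P(A) + P(B \mid \neg A)\, P(\neg A)}
\]

\[
P(A \mid B) = \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.05 \times 0.99} = \frac{0.0095}{0.059} \approx 0.161
\]

Even with a fairly accurate test, the low prior keeps the posterior at roughly 16%.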

---

### ❖ **Chi-Square and F-Tests**

**Q: What is the Chi-square distribution, and when is it used?**
**A:** The **Chi-square distribution** is used for categorical data to test relationships between variables or goodness of fit. It's skewed and depends on degrees of freedom.

---

**Q: What is the Chi-square goodness of fit test, and how is it applied?**
**A:** It tests whether observed frequencies differ significantly from expected frequencies in categorical data. It helps determine if a sample matches a known distribution.
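
The test statistic is the sum of squared differences between observed and expected counts, each divided by the expected count. The sketch below computes it by hand for hypothetical counts from 120 rolls of a die and cross-checks the result against `scipy.stats.chisquare`.

```python
import numpy as np
from scipy import stats

observed = np.array([18, 22, 20, 25, 15, 20])  # hypothetical counts from 120 rolls
expected = np.full(6, observed.sum() / 6)      # fair die: 20 per face

# Chi-square statistic: sum of (O - E)^2 / E over all categories
chi2_stat = np.sum((observed - expected) ** 2 / expected)
df = len(observed) - 1                   # categories minus one
p_value = stats.chi2.sf(chi2_stat, df)   # right-tail probability

# Cross-check against SciPy's built-in goodness-of-fit test
scipy_stat, scipy_p = stats.chisquare(f_obs=observed, f_exp=expected)

print(f"Manual: chi2 = {chi2_stat:.2f}, p = {p_value:.4f}")
print(f"SciPy:  chi2 = {scipy_stat:.2f}, p = {scipy_p:.4f}")
```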

---

**Q: What is the F-distribution, and when is it used in hypothesis testing?**
**A:** The **F-distribution** is the distribution of a ratio of two scaled variances. It is used in F-tests that compare two population variances and in ANOVA, which compares multiple group means. It is positively skewed and depends on two degrees-of-freedom parameters.

---

### ❖ **ANOVA and F-tests**

**Q: What is an ANOVA test, and what are its assumptions?**
**A:** **ANOVA (Analysis of Variance)** tests if there are significant differences among group means.
**Assumptions:**
- Independence of observations
- Normally distributed populations
- Equal variances (homoscedasticity)
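
To show what the test actually computes, the sketch below builds the one-way ANOVA F statistic by hand from the between-group and within-group sums of squares, then compares it with `scipy.stats.f_oneway`. The group means, spread, sizes, and seed are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed so the run is repeatable
# Three hypothetical groups with different true means and equal spread
groups = [rng.normal(loc=m, scale=5, size=30) for m in (60, 65, 70)]

k = len(groups)                        # number of groups
n_total = sum(len(g) for g in groups)  # total number of observations
grand_mean = np.concatenate(groups).mean()

# Variation of group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f"Manual: F = {f_manual:.3f}, p = {p_manual:.3g}")
print(f"SciPy:  F = {f_scipy:.3f}, p = {p_scipy:.3g}")
```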

---

**Q: What are the different types of ANOVA tests?**
**A:**
- **One-way ANOVA:** One independent variable
- **Two-way ANOVA:** Two independent variables
- **Repeated measures ANOVA:** Same subjects measured under different conditions

---

**Q: What is the F-test, and how does it relate to hypothesis testing?**
**A:** An **F-test** compares two variances or multiple group means (in ANOVA). It's used to test whether group differences are statistically significant or could plausibly be due to chance.

---
**Practical**

The tasks below cover Python-based hypothesis testing. Each task comes with a starter Python implementation.

---

## 🔷 **Z-Test Related Tasks**

### 1. **Z-Test for Comparing Sample Mean to Population Mean**
```python
from scipy import stats
import numpy as np

def z_test(sample_data, population_mean, population_std):
    sample_mean = np.mean(sample_data)
    n = len(sample_data)
    z = (sample_mean - population_mean) / (population_std / np.sqrt(n))
    p = 2 * stats.norm.sf(abs(z))  # two-tailed p-value; sf avoids the precision loss of 1 - cdf
    return z, p

# Example
data = np.random.normal(52, 5, 30)
z_score, p_value = z_test(data, population_mean=50, population_std=5)
print(f"Z-score: {z_score}, P-value: {p_value}")
```

---

### 2. **Simulate Data & Calculate P-Value**
```python
sample = np.random.normal(100, 10, 100)
z, p = z_test(sample, 98, 10)
print(f"P-value: {p}")
```

---

### 3. **One-Sample Z-Test Implementation**
Same as task 1, reuse the `z_test()` function.

---

### 4. **Two-Tailed Z-Test Visualization**
```python
import matplotlib.pyplot as plt

def plot_z_test(z_score, alpha=0.05):
    x = np.linspace(-4, 4, 1000)
    y = stats.norm.pdf(x)

    plt.plot(x, y)
    plt.fill_between(x, y, where=(x <= stats.norm.ppf(alpha/2)), color='red', alpha=0.5)
    plt.fill_between(x, y, where=(x >= stats.norm.ppf(1 - alpha/2)), color='red', alpha=0.5)
    plt.axvline(z_score, color='blue', linestyle='--', label=f"Z = {z_score:.2f}")
    plt.legend()
    plt.title("Two-tailed Z-test Decision Region")
    plt.show()

plot_z_test(z_score)
```

---

## 🔷 **Error Visualization**

### 5. **Visualize Type 1 and Type 2 Errors**
```python
def plot_type1_type2(mu0=0, mu1=1, sigma=1, alpha=0.05):
    x = np.linspace(-4, 6, 1000)
    y0 = stats.norm.pdf(x, mu0, sigma)
    y1 = stats.norm.pdf(x, mu1, sigma)
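    # Right-tailed cutoff on the scale of the H0 distribution, so non-default mu0 and sigma are handled
    z_critical = stats.norm.ppf(1 - alpha, loc=mu0, scale=sigma)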

    plt.plot(x, y0, label='H0 Distribution')
    plt.plot(x, y1, label='H1 Distribution')
    plt.axvline(z_critical, color='black', linestyle='--', label='Critical Value')

    plt.fill_between(x, y0, where=(x >= z_critical), color='red', alpha=0.3, label='Type I Error')
    plt.fill_between(x, y1, where=(x < z_critical), color='blue', alpha=0.3, label='Type II Error')
    plt.legend()
    plt.title("Type I and Type II Errors")
    plt.show()

plot_type1_type2()
```

---

## 🔷 **T-Test Related Tasks**

### 6. **Independent T-Test**
```python
from scipy.stats import ttest_ind

group1 = np.random.normal(60, 5, 30)
group2 = np.random.normal(62, 5, 30)

t_stat, p_val = ttest_ind(group1, group2)
print(f"T-statistic: {t_stat}, P-value: {p_val}")
```

---

### 7. **Paired Sample T-Test**
```python
from scipy.stats import ttest_rel

before = np.random.normal(100, 10, 30)
after = before + np.random.normal(1, 5, 30)

t_stat, p_val = ttest_rel(before, after)
print(f"Paired T-statistic: {t_stat}, P-value: {p_val}")
```

---

### 8. **Compare Z-Test and T-Test**
```python
# Run both on the same sample
t_stat, p_t = stats.ttest_1samp(data, 50)
z_stat, p_z = z_test(data, 50, 5)

print(f"T-test P: {p_t}, Z-test P: {p_z}")
```

---

### 9. **Confidence Interval Function**
```python
def confidence_interval(data, confidence=0.95):
    mean = np.mean(data)
    sem = stats.sem(data)
    margin = sem * stats.t.ppf((1 + confidence) / 2, len(data) - 1)
    return (mean - margin, mean + margin)

print("Confidence Interval:", confidence_interval(data))
```

---

### 10. **Margin of Error**
```python
def margin_of_error(data, confidence=0.95):
    sem = stats.sem(data)
    moe = sem * stats.t.ppf((1 + confidence) / 2, len(data) - 1)
    return moe

print("Margin of Error:", margin_of_error(data))
```

---

## 🔷 **Bayesian Inference**

### 11. **Bayes' Theorem in Python**
```python
def bayes_theorem(prior_A, likelihood_B_given_A, prior_not_A, likelihood_B_given_not_A):
    numerator = likelihood_B_given_A * prior_A
    denominator = numerator + likelihood_B_given_not_A * prior_not_A
    return numerator / denominator

posterior = bayes_theorem(0.01, 0.95, 0.99, 0.05)
print(f"Posterior Probability: {posterior}")
```

---

## 🔷 **Chi-Square Tests**

### 12. **Chi-square Test for Independence**
```python
import pandas as pd
from scipy.stats import chi2_contingency

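# Contingency table of observed counts; named `table` so it does not overwrite the earlier `data` sample
table = pd.DataFrame([[10, 20], [20, 30]])
chi2, p, dof, expected = chi2_contingency(table)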
print(f"Chi2: {chi2}, P-value: {p}")
```

---

### 13. **Expected Frequencies Calculation**
```python
print("Expected Frequencies:\n", expected)
```

---

### 14. **Chi-square Goodness of Fit**
```python
observed = [30, 14, 56]
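# Equal split across categories; the expected total must match the observed total
expected = [sum(observed) / len(observed)] * len(observed)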

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"Chi2: {chi2}, P-value: {p}")
```

---

### 15. **Visualize Chi-square Distribution**
```python
x = np.linspace(0, 20, 1000)
y = stats.chi2.pdf(x, df=4)

plt.plot(x, y)
plt.title('Chi-square Distribution (df=4)')
plt.show()
```

---

## 🔷 **F-Test & ANOVA**

### 16. **F-test for Variances**
```python
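def f_test(sample1, sample2):
    # Ratio of sample variances (ddof=1 gives the unbiased estimate)
    f = np.var(sample1, ddof=1) / np.var(sample2, ddof=1)
    df1, df2 = len(sample1) - 1, len(sample2) - 1
    # Two-sided p-value: double the smaller tail so the result does not depend on group order
    p = 2 * min(stats.f.cdf(f, df1, df2), stats.f.sf(f, df1, df2))
    return f, p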

f_val, p_val = f_test(group1, group2)
print(f"F-value: {f_val}, P-value: {p_val}")
```

---

### 17. **One-Way ANOVA**
```python
from scipy.stats import f_oneway

groupA = np.random.normal(60, 5, 30)
groupB = np.random.normal(65, 5, 30)
groupC = np.random.normal(70, 5, 30)

f_stat, p = f_oneway(groupA, groupB, groupC)
print(f"F-statistic: {f_stat}, P-value: {p}")
```

---

### 18. **Assumption Check Function**
```python
from scipy.stats import shapiro, levene

def check_anova_assumptions(*groups):
    for i, group in enumerate(groups):
        stat, p = shapiro(group)
        print(f"Group {i+1} normality p-value: {p}")
    stat, p = levene(*groups)
    print(f"Levene's test for equal variances p-value: {p}")

check_anova_assumptions(groupA, groupB, groupC)
```

---

### 19. **Two-Way ANOVA + Visualization**
```python
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    'score': np.random.normal(70, 10, 90),
    'group1': np.repeat(['A', 'B', 'C'], 30),
    'group2': ['X']*15 + ['Y']*15 + ['X']*15 + ['Y']*15 + ['X']*15 + ['Y']*15
})

model = ols('score ~ C(group1) + C(group2) + C(group1):C(group2)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)
```

---

### 20. **Visualize F-Distribution**
```python
x = np.linspace(0, 5, 1000)
y = stats.f.pdf(x, dfn=3, dfd=20)

plt.plot(x, y)
plt.title('F-distribution (dfn=3, dfd=20)')
plt.show()
```

---

### 21. **One-Way ANOVA with Boxplots**
```python
import seaborn as sns

sns.boxplot(x='group1', y='score', data=df)
plt.title("Boxplot of Groups")
plt.show()
```

---

### 22. **Simulate Normal Data and Test Mean**
```python
data = np.random.normal(50, 10, 100)
t_stat, p_val = stats.ttest_1samp(data, popmean=50)
print(f"T-statistic: {t_stat}, P-value: {p_val}")
```

---

### 23. **Test for Population Variance (Chi-square)**
```python
sample_var = np.var(data, ddof=1)
n = len(data)
chi2 = (n - 1) * sample_var / 100  # Assuming σ² = 100
p_val = 1 - stats.chi2.cdf(chi2, df=n-1)
print(f"Chi2: {chi2}, P-value: {p_val}")
```

---

### 24. **Z-Test for Comparing Proportions**
```python
from statsmodels.stats.proportion import proportions_ztest

success = [30, 40]
nobs = [100, 100]

z_stat, p_val = proportions_ztest(success, nobs)
print(f"Z-statistic: {z_stat}, P-value: {p_val}")
```

---

### 25. **F-Test for Two Datasets**
Same as Task #16: use the `f_test()` function and visualize if needed.

---

### 26. **Chi-square Goodness-of-Fit with Simulation**
```python
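# Simulate 120 draws over 4 equally likely categories so the observed total matches the expected total
observed = np.random.multinomial(120, [0.25] * 4)
expected = [30, 30, 30, 30]

chi2_stat, p_val = stats.chisquare(observed, expected)
print(f"Chi2: {chi2_stat}, P-value: {p_val}")
```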

'''