

## 1. What is hypothesis testing in statistics?
Hypothesis testing is a statistical method used to make decisions about a population based on sample data. It involves testing an assumption (hypothesis) using statistical techniques to determine whether there is enough evidence to reject it.

## 2. What is the null hypothesis, and how does it differ from the alternative hypothesis?
- The **null hypothesis (H0)** states that there is no effect or no difference in a population parameter.
- The **alternative hypothesis (H1)** is the opposite of the null hypothesis and suggests that there is a statistically significant effect or difference.

## 3. What is the significance level in hypothesis testing, and why is it important?
The significance level (denoted as **alpha, α**) is the probability of rejecting the null hypothesis when it is actually true. It is chosen before the test is run and serves as the threshold against which the P-value is compared. A common value is 0.05, meaning a 5% risk of making a Type I error.

## 4. What does a P-value represent in hypothesis testing?
The **P-value** is the probability of obtaining the observed results, or more extreme results, if the null hypothesis is true. A smaller P-value indicates stronger evidence against H0.

## 5. How do you interpret the P-value in hypothesis testing?
- If P-value < α: Reject the null hypothesis (significant result).
- If P-value ≥ α: Fail to reject the null hypothesis (insufficient evidence).
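
The decision rule above can be written as a small helper. This is a minimal sketch; the function name `decide` and the example P-values are illustrative assumptions, not part of the assignment.

```python
def decide(p_value, alpha=0.05):
    """Return the test decision for a P-value at significance level alpha."""
    if not 0 <= p_value <= 1 or not 0 < alpha < 1:
        raise ValueError("p_value must be in [0, 1] and alpha in (0, 1)")
    # Strict inequality: a P-value exactly equal to alpha does not reject H0
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(0.03))              # compared against the default alpha = 0.05
print(decide(0.20))
print(decide(0.03, alpha=0.01))  # the same P-value under a stricter alpha
```

Note that the same P-value can lead to different decisions under different significance levels, which is why α should be fixed before looking at the data.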

## 6. What are Type 1 and Type 2 errors in hypothesis testing?
- **Type I error**: Rejecting H0 when it is actually true (false positive).
- **Type II error**: Failing to reject H0 when it is actually false (false negative).
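
Both error rates can be estimated by simulation. The sketch below assumes normally distributed data and a one-sample t-test; the means, standard deviation, sample size, and number of trials are hypothetical values chosen only for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_trials, n = 2000, 30  # hypothetical number of simulated studies and sample size

def rejection_rate(true_mean, null_mean=50.0, sigma=5.0):
    """Fraction of simulated samples for which the t-test rejects H0: mean = null_mean."""
    samples = rng.normal(true_mean, sigma, size=(n_trials, n))
    p_values = stats.ttest_1samp(samples, null_mean, axis=1).pvalue
    return np.mean(p_values < alpha)

# H0 is true here, so every rejection is a Type I error
type1_rate = rejection_rate(true_mean=50.0)
# H0 is false here, so every non-rejection is a Type II error
type2_rate = 1 - rejection_rate(true_mean=52.0)

print(f"Estimated Type I error rate:  {type1_rate:.3f} (alpha = {alpha})")
print(f"Estimated Type II error rate: {type2_rate:.3f}")
```

The estimated Type I error rate should land close to α, while the Type II error rate depends on how far the true mean is from the null value, the variability, and the sample size.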

## 7. What is the difference between a one-tailed and a two-tailed test in hypothesis testing?
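- **One-tailed test**: Tests for an effect in one pre-specified direction only (either greater than or less than the hypothesized value).
- **Two-tailed test**: Tests for an effect in either direction (any difference from the hypothesized value, whether greater or smaller).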
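
To see how the choice of tails changes the P-value, the following sketch computes all three variants for the same test statistic. The value `z = 1.8` is a hypothetical observed statistic, not taken from the assignment.

```python
from scipy.stats import norm

z = 1.8  # hypothetical observed Z statistic

p_two_tailed = 2 * norm.sf(abs(z))  # H1: mean != hypothesized value
p_right_tailed = norm.sf(z)         # H1: mean >  hypothesized value
p_left_tailed = norm.cdf(z)         # H1: mean <  hypothesized value

print(f"Two-tailed:   {p_two_tailed:.4f}")
print(f"Right-tailed: {p_right_tailed:.4f}")
print(f"Left-tailed:  {p_left_tailed:.4f}")
```

For a positive statistic, the right-tailed P-value is half the two-tailed one, so a result can be significant at α = 0.05 in a one-tailed test and not in a two-tailed test. The direction must therefore be chosen before seeing the data.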

## 8. What is the Z-test, and when is it used in hypothesis testing?
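A **Z-test** is used to compare a sample mean with a population mean when the population variance is known. It is also commonly applied when the sample is large (n ≥ 30), because the test statistic is then approximately normally distributed.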

## 9. How do you calculate the Z-score, and what does it represent in hypothesis testing?
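Z-score formula for a single observation:
\[ Z = \frac{X - \mu}{\sigma} \]
where X = observed value, μ = population mean, σ = population standard deviation.
It measures how many standard deviations an observation is from the mean. When the quantity being tested is the mean \(\bar{X}\) of a sample of n observations, σ is replaced by the standard error of the mean:
\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]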
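
A short worked example may help. The numbers below are hypothetical; the second calculation shows the test-statistic form for a sample mean, where the standard deviation is divided by the square root of the sample size.

```python
import math

mu, sigma = 50.0, 2.0  # hypothetical population mean and standard deviation

# Z-score of a single observation
x = 53.0
z_observation = (x - mu) / sigma

# Z statistic of a sample mean based on n observations
sample_mean, n = 51.0, 16
z_sample_mean = (sample_mean - mu) / (sigma / math.sqrt(n))

print(f"Z for the observation: {z_observation}")
print(f"Z for the sample mean: {z_sample_mean}")
```

Although the sample mean is closer to μ than the single observation, its Z statistic is larger, because averaging 16 observations shrinks the standard error to a quarter of σ.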

## 10. What is the T-distribution, and when should it be used instead of the normal distribution?
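The **T-distribution** is used instead of the normal distribution when the population variance is unknown and must be estimated from the sample, which matters most when the sample size is small (n < 30). It has heavier tails than the normal distribution and approaches it as the sample size grows.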
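
The practical difference shows up in the critical values. The sketch below compares two-tailed 5% critical values of the t-distribution with the normal one; the degrees of freedom listed are arbitrary illustrative choices.

```python
from scipy.stats import norm, t

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)

# Two-tailed critical values of the t-distribution for several degrees of freedom
t_crits = {df: t.ppf(1 - alpha / 2, df) for df in (4, 9, 29, 99)}

print(f"Normal critical value: {z_crit:.3f}")
for df, t_crit in t_crits.items():
    print(f"t critical value (df={df}): {t_crit:.3f}")
```

The t critical values are always larger than the normal one and shrink toward it as the degrees of freedom increase, so small samples need stronger evidence to reject H0.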

## 11. What is the difference between a Z-test and a T-test?
- **Z-test** is used when the population variance is known and the sample size is large.
- **T-test** is used when the population variance is unknown and the sample size is small.

## 12. What is the T-test, and how is it used in hypothesis testing?
A **T-test** compares sample means to determine if they are significantly different from each other.

## 13. What is the relationship between Z-test and T-test in hypothesis testing?
Both tests compare means, but the T-test is used when population variance is unknown, while the Z-test is used when it is known.

## 14. What is a confidence interval, and how is it used to interpret statistical results?
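A **confidence interval (CI)** is a range of values, computed from sample data, that is likely to contain the true population parameter. The confidence level (e.g., 95%) describes the procedure: if sampling were repeated many times, about 95% of the intervals built this way would contain the parameter.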
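
The meaning of the confidence level can be checked by simulation. This sketch assumes normally distributed data with a hypothetical true mean of 50 and builds a t-based 95% interval for each of many simulated samples.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_mean, sigma, n, n_trials = 50.0, 5.0, 25, 2000  # hypothetical settings

samples = rng.normal(true_mean, sigma, size=(n_trials, n))
means = samples.mean(axis=1)
sems = samples.std(axis=1, ddof=1) / np.sqrt(n)       # standard error of each sample mean
margins = stats.t.ppf(0.975, df=n - 1) * sems         # half-width of each 95% interval

# An interval "covers" when the true mean lies between its two endpoints
covered = (means - margins <= true_mean) & (true_mean <= means + margins)
coverage = covered.mean()
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

The fraction should come out close to 0.95. Any single interval either contains the true mean or does not; the 95% refers to how often the method succeeds across repeated samples.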

## 15. What is the margin of error, and how does it affect the confidence interval?
The **margin of error** is the half-width of a confidence interval: the critical value multiplied by the standard error of the estimate. The interval is the estimate ± the margin of error, so a larger margin results in a wider confidence interval.
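
As a rough illustration, the sketch below computes a z-based margin of error for a mean, assuming the population standard deviation is known. The helper name `z_margin_of_error` and the value σ = 5 are assumptions made for this example.

```python
import math
from scipy.stats import norm

def z_margin_of_error(sigma, n, confidence=0.95):
    """Margin of error for a mean when the population standard deviation is known."""
    z_crit = norm.ppf((1 + confidence) / 2)
    return z_crit * sigma / math.sqrt(n)

for n in (25, 100, 400):
    print(f"n = {n:3d}: margin of error = {z_margin_of_error(5.0, n):.3f}")
```

Because the margin scales with 1/√n, quadrupling the sample size only halves the margin of error. Raising the confidence level increases the critical value and therefore widens the interval.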

## 16. How is Bayes' Theorem used in statistics, and what is its significance?
**Bayes' Theorem** updates probabilities based on prior knowledge and new evidence. It is used in Bayesian inference for updating beliefs.
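
In symbols, P(H | E) = P(E | H) · P(H) / P(E), where P(E) can be expanded with the law of total probability. The sketch below applies this to a hypothetical diagnostic test; the prevalence, sensitivity, and false-positive rate are made-up numbers for illustration only.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H | E) via Bayes' Theorem, with P(E) from the law of total probability."""
    evidence = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / evidence

# Hypothetical test: 1% prevalence, 95% sensitivity, 5% false-positive rate
p_disease_given_positive = posterior(prior=0.01, p_e_given_h=0.95, p_e_given_not_h=0.05)
print(f"P(disease | positive test) = {p_disease_given_positive:.3f}")
```

Even with an accurate test, the posterior stays well below 50% here because the prior is so small. This is the sense in which the theorem combines prior knowledge with new evidence.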

## 17. What is the Chi-square distribution, and when is it used?
The **Chi-square distribution** is used for hypothesis tests involving categorical data, such as goodness-of-fit and independence tests.

## 18. What is the Chi-square goodness-of-fit test, and how is it applied?
It determines if an observed categorical data distribution matches an expected distribution.
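
The test statistic is the sum over categories of (observed − expected)² / expected, compared against a Chi-square distribution with (number of categories − 1) degrees of freedom. The sketch below computes it by hand and checks it against `scipy.stats.chisquare`, reusing the observed and expected counts from the goodness-of-fit program later in this assignment.

```python
import numpy as np
from scipy import stats

observed = np.array([50, 30, 20])
expected = np.array([40, 40, 20])  # must have the same total as the observed counts

# Manual computation of the statistic and its upper-tail P-value
chi2_manual = np.sum((observed - expected) ** 2 / expected)
df = len(observed) - 1
p_manual = stats.chi2.sf(chi2_manual, df)

# Library computation for comparison
chi2_scipy, p_scipy = stats.chisquare(observed, expected)

print(f"Manual: chi2 = {chi2_manual:.3f}, p = {p_manual:.4f}")
print(f"SciPy:  chi2 = {chi2_scipy:.3f}, p = {p_scipy:.4f}")
```

A small P-value would indicate that the observed counts deviate from the expected distribution by more than chance alone would explain.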

## 19. What is the F-distribution, and when is it used in hypothesis testing?
The **F-distribution** is used in tests comparing variances, such as ANOVA and F-tests for variance equality.

## 20. What is an ANOVA test, and what are its assumptions?
**ANOVA (Analysis of Variance)** tests for differences in means across multiple groups. Assumptions include normality, independence, and equal variance.
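
Under the hood, one-way ANOVA compares the variation between group means with the variation within groups. The following sketch computes the F statistic by hand on three tiny made-up groups and compares it with `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

# Three small hypothetical groups
groups = [np.array([1.0, 2.0, 3.0]),
          np.array([2.0, 3.0, 4.0]),
          np.array([5.0, 6.0, 7.0])]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k, n_total = len(groups), len(all_values)

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f"Manual: F = {f_manual:.3f}, p = {p_manual:.4f}")
print(f"SciPy:  F = {f_scipy:.3f}, p = {p_scipy:.4f}")
```

A large F means the group means differ by more than the within-group scatter would suggest. The P-value is only trustworthy if the normality, independence, and equal-variance assumptions are reasonably met.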

## 21. What are the different types of ANOVA tests?
- **One-way ANOVA**: Tests for mean differences in one factor.
- **Two-way ANOVA**: Tests for mean differences in two factors.

## 22. What is the F-test, and how does it relate to hypothesis testing?
An **F-test** is used to compare the variances of two populations and is crucial in ANOVA.



## 1. Python program to perform a Z-test
```python
import scipy.stats as stats
import numpy as np

sample = [50, 52, 53, 49, 51]
pop_mean = 50
pop_std = 2

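# Z = (sample mean - population mean) / (sigma / sqrt(n)), using the known pop_std
z_score = (np.mean(sample) - pop_mean) / (pop_std / np.sqrt(len(sample)))
p_value = 2 * stats.norm.sf(abs(z_score))  # two-tailed P-value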
print(f"Z-score: {z_score}, P-value: {p_value}")
```

## 2. Simulate random data and calculate P-value
```python
np.random.seed(42)
data = np.random.normal(50, 10, 100)
# One-sample t-test of H0: mean = 50 (the statistic returned is a t-statistic, not a Z-score)
t_stat, p_value = stats.ttest_1samp(data, 50)
print(f"P-value: {p_value}")
```

---


## **3. Implement a one-sample Z-test using Python to compare the sample mean with the population mean**

### **Solution:**
A one-sample Z-test is used when we want to compare the mean of a sample to a known population mean, assuming that the population standard deviation is known.

```python
import numpy as np
from scipy.stats import norm

def one_sample_z_test(sample, pop_mean, pop_std):
    sample_mean = np.mean(sample)
    n = len(sample)
    z_score = (sample_mean - pop_mean) / (pop_std / np.sqrt(n))
    p_value = 2 * norm.sf(abs(z_score))  # two-tailed; sf(x) = 1 - cdf(x), more accurate in the far tails

    return z_score, p_value

# Example
sample = [50, 52, 53, 48, 47, 51, 49, 52]
pop_mean = 50
pop_std = 3

z_score, p_value = one_sample_z_test(sample, pop_mean, pop_std)
print(f"Z-score: {z_score}, P-value: {p_value}")
```

Interpretation:
- If p-value < significance level (e.g., 0.05), we reject the null hypothesis that the population mean underlying the sample equals `pop_mean`.

---

## **4. Perform a two-tailed Z-test using Python and visualize the decision region on a plot**

### **Solution:**
A two-tailed Z-test checks if the sample mean is significantly different from the population mean in both directions.

```python
import matplotlib.pyplot as plt
import seaborn as sns

def plot_z_test(alpha=0.05):
    x = np.linspace(-4, 4, 1000)
    y = norm.pdf(x, 0, 1)

    critical = norm.ppf(1 - alpha / 2)
    plt.figure(figsize=(8,5))
    sns.lineplot(x=x, y=y)  # newer seaborn releases require x and y as keyword arguments

    plt.fill_between(x, y, where=(x <= -critical) | (x >= critical), color='red', alpha=0.5, label="Rejection Region")
    plt.fill_between(x, y, where=(x > -critical) & (x < critical), color='blue', alpha=0.5, label="Fail-to-Reject Region")

    plt.axvline(-critical, linestyle='--', color='black')
    plt.axvline(critical, linestyle='--', color='black')
    plt.legend()
    plt.title("Two-Tailed Z-Test Decision Region")
    plt.xlabel("Z-score")
    plt.ylabel("Probability Density")
    plt.show()

plot_z_test()
```

Interpretation:
- The red areas represent the rejection regions where we reject the null hypothesis.
- If the computed Z-score falls in these regions, the sample mean differs significantly from the population mean at the chosen α.

---

## **5. Create a Python function that calculates and visualizes Type 1 and Type 2 errors during hypothesis testing**

### **Solution:**
Type 1 error occurs when we reject a true null hypothesis, and Type 2 error occurs when we fail to reject a false null hypothesis.

```python
def plot_type1_type2(mu0=0, mu1=2, sigma=1, alpha=0.05):
    x = np.linspace(-4, 6, 1000)
    y_H0 = norm.pdf(x, mu0, sigma)
    y_H1 = norm.pdf(x, mu1, sigma)

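    # Upper-tail critical value under H0; passing mu0 and sigma keeps it correct for non-default arguments
    critical_value = norm.ppf(1 - alpha, mu0, sigma)

    plt.figure(figsize=(8,5))
    sns.lineplot(x=x, y=y_H0, label="H0 (Null Distribution)")
    sns.lineplot(x=x, y=y_H1, label="H1 (Alternative Distribution)")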

    plt.fill_between(x, y_H0, where=(x > critical_value), color='red', alpha=0.5, label="Type I Error (α)")
    plt.fill_between(x, y_H1, where=(x < critical_value), color='blue', alpha=0.5, label="Type II Error (β)")

    plt.axvline(critical_value, linestyle='--', color='black')
    plt.legend()
    plt.title("Type 1 and Type 2 Errors")
    plt.xlabel("Test Statistic")
    plt.ylabel("Probability Density")
    plt.show()

plot_type1_type2()
```

Interpretation:
- The red area represents **Type 1 error** (rejecting a true null hypothesis).
- The blue area represents **Type 2 error** (failing to reject a false null hypothesis).

---

## **6. Write a Python program to perform an independent T-test and interpret the results**

### **Solution:**
An independent T-test compares the means of two independent groups.

```python
from scipy.stats import ttest_ind

group1 = [50, 52, 53, 48, 47, 51, 49, 52]
group2 = [55, 57, 58, 54, 53, 56, 55, 58]

t_stat, p_value = ttest_ind(group1, group2)
print(f"T-statistic: {t_stat}, P-value: {p_value}")
```

Interpretation:
- If p-value < 0.05, we reject the null hypothesis that the two groups have the same mean.

---

## **7. Perform a paired sample T-test using Python and visualize the comparison results**

### **Solution:**
A paired T-test is used for dependent samples (e.g., before and after measurements).

```python
from scipy.stats import ttest_rel

before = [50, 52, 53, 48, 47, 51, 49, 52]
after = [55, 57, 58, 54, 53, 56, 55, 58]

t_stat, p_value = ttest_rel(before, after)
print(f"Paired T-test: T-statistic: {t_stat}, P-value: {p_value}")
```

Interpretation:
- If p-value < 0.05, the mean difference between the before and after measurements is statistically significant.
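
One possible way to visualize the paired comparison is sketched below, reusing the `before` and `after` values from the example. The plot layout is an illustrative choice, not a required format.

```python
import numpy as np
import matplotlib.pyplot as plt

before = np.array([50, 52, 53, 48, 47, 51, 49, 52])
after = np.array([55, 57, 58, 54, 53, 56, 55, 58])
differences = after - before  # the paired t-test analyzes these differences

fig, (ax_pairs, ax_diff) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: one line per subject connecting its before and after values
for b, a in zip(before, after):
    ax_pairs.plot([0, 1], [b, a], marker="o", color="gray")
ax_pairs.set_xticks([0, 1])
ax_pairs.set_xticklabels(["Before", "After"])
ax_pairs.set_ylabel("Value")
ax_pairs.set_title("Paired observations")

# Right panel: per-subject differences with their mean
ax_diff.bar(np.arange(1, len(differences) + 1), differences)
ax_diff.axhline(differences.mean(), linestyle="--", color="black", label="Mean difference")
ax_diff.set_xlabel("Subject")
ax_diff.set_ylabel("After - Before")
ax_diff.set_title("Differences")
ax_diff.legend()

plt.tight_layout()
plt.show()
```

If every line slopes the same way and all differences sit well away from zero, the plot agrees with a small P-value from the paired T-test.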

---

## **8. Simulate data and perform both Z-test and T-test, then compare the results using Python**

```python
import scipy.stats as stats

np.random.seed(42)
sample = np.random.normal(loc=50, scale=5, size=30)
pop_mean = 50
pop_std = 5

z_stat = (np.mean(sample) - pop_mean) / (pop_std / np.sqrt(len(sample)))
z_p_value = 2 * (1 - norm.cdf(abs(z_stat)))

t_stat, t_p_value = stats.ttest_1samp(sample, pop_mean)

print(f"Z-test: Z-stat={z_stat}, p-value={z_p_value}")
print(f"T-test: T-stat={t_stat}, p-value={t_p_value}")
```

---

## **9. Write a Python function to calculate the confidence interval for a sample mean and explain its significance.**

```python
import scipy.stats as stats

def confidence_interval(sample, confidence=0.95):
    mean = np.mean(sample)
    sem = stats.sem(sample)
    margin = sem * stats.t.ppf((1 + confidence) / 2., len(sample)-1)

    return mean - margin, mean + margin

sample = np.random.normal(50, 5, 30)
ci = confidence_interval(sample)
print(f"95% Confidence Interval: {ci}")
```

## **10. Calculate margin of error for a given confidence level**

```python
def margin_of_error(sample, confidence=0.95):
    sem = stats.sem(sample)
    margin = sem * stats.t.ppf((1 + confidence) / 2., len(sample)-1)
    return margin

sample = np.random.normal(50, 5, 30)
print("Margin of Error:", margin_of_error(sample))
```

## **11. Implement Bayesian inference using Bayes' Theorem**

```python
def bayes_theorem(prior, likelihood, evidence):
    posterior = (likelihood * prior) / evidence
    return posterior

prior = 0.5
likelihood = 0.8
evidence = 0.6
print("Posterior Probability:", bayes_theorem(prior, likelihood, evidence))
```

## **12. Chi-square test for independence**

```python
def chi_square_test(data):
    chi2, p, dof, expected = stats.chi2_contingency(data)
    return chi2, p

data = np.array([[10, 20, 30], [6, 9, 17]])
print("Chi-Square Test:", chi_square_test(data))
```

## **13. Calculate expected frequencies for Chi-square test**

```python
def expected_frequencies(data):
    _, _, _, expected = stats.chi2_contingency(data)
    return expected

print("Expected Frequencies:", expected_frequencies(data))
```

## **14. Goodness-of-fit test**

```python
def goodness_of_fit_test(observed, expected):
    chi2, p = stats.chisquare(observed, expected)
    return chi2, p

observed = np.array([50, 30, 20])
expected = np.array([40, 40, 20])
print("Goodness of Fit Test:", goodness_of_fit_test(observed, expected))
```

## **15. Visualize Chi-square distribution**

```python
def plot_chi_square(df):
    x = np.linspace(0, 10, 100)
    y = stats.chi2.pdf(x, df)
    plt.plot(x, y, label=f'df={df}')
    plt.title("Chi-Square Distribution")
    plt.legend()
    plt.show()

plot_chi_square(3)
```

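## **16. F-test for variance comparison**

```python
def f_test(sample1, sample2):
    # Ratio of the sample variances (ddof=1 gives the unbiased estimate)
    f_stat = np.var(sample1, ddof=1) / np.var(sample2, ddof=1)
    df1, df2 = len(sample1) - 1, len(sample2) - 1
    # Two-sided P-value: double the smaller tail so the result does not
    # depend on which sample is placed in the numerator
    p_value = 2 * min(stats.f.cdf(f_stat, df1, df2), stats.f.sf(f_stat, df1, df2))
    return f_stat, p_value

sample1 = np.random.normal(50, 5, 30)
sample2 = np.random.normal(55, 7, 30)
print("F-Test:", f_test(sample1, sample2))
```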

## **17. ANOVA test**

```python
def anova_test(*groups):
    f_stat, p_value = stats.f_oneway(*groups)
    return f_stat, p_value

group1 = np.random.normal(50, 5, 30)
group2 = np.random.normal(52, 5, 30)
group3 = np.random.normal(48, 5, 30)
print("ANOVA Test:", anova_test(group1, group2, group3))
```

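## **18. One-way ANOVA with plot**

```python
def plot_oneway_anova(groups):
    plt.boxplot(groups)
    # Label the boxes by position so the function works for any number of groups
    plt.xticks(range(1, len(groups) + 1), [f"Group{i + 1}" for i in range(len(groups))])
    plt.title("One-way ANOVA")
    plt.show()

plot_oneway_anova([group1, group2, group3])
```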

## **19. Check ANOVA assumptions**

```python
def check_anova_assumptions(*groups):
    normality = [stats.shapiro(group)[1] > 0.05 for group in groups]
    homogeneity = stats.levene(*groups)[1] > 0.05
    return normality, homogeneity

print("ANOVA Assumptions:", check_anova_assumptions(group1, group2, group3))
```

## **20. Two-way ANOVA (using statsmodels)**

```python
import statsmodels.api as sm
import statsmodels.formula.api as smf
import pandas as pd

data = pd.DataFrame({
    "Factor1": np.repeat(["A", "B"], 30),
    "Factor2": np.tile(["X", "Y"], 30),
    "Value": np.random.normal(50, 5, 60)
})

model = smf.ols("Value ~ Factor1 + Factor2 + Factor1:Factor2", data=data).fit()
anova_table = sm.stats.anova_lm(model)
print("Two-way ANOVA:")
print(anova_table)
```

## **21. Visualize F-distribution**

```python
def plot_f_distribution(df1, df2):
    x = np.linspace(0, 5, 1000)
    y = stats.f.pdf(x, df1, df2)
    plt.plot(x, y)
    plt.title("F-Distribution")
    plt.show()

plot_f_distribution(5, 10)
```

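## **22. One-way ANOVA with boxplots**

```python
def plot_anova_boxplot(*groups):
    plt.boxplot(groups)
    # Set tick labels by position rather than through the `labels` argument,
    # which newer Matplotlib releases deprecate
    plt.xticks(range(1, len(groups) + 1), [f'Group {i+1}' for i in range(len(groups))])
    plt.title("One-way ANOVA Boxplot")
    plt.show()

plot_anova_boxplot(group1, group2, group3)
```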

## **23. Simulate normal data and perform hypothesis testing**

```python
simulated_data = np.random.normal(50, 5, 100)
t_stat, p_value = stats.ttest_1samp(simulated_data, 50)
print("T-test on simulated data:", t_stat, p_value)
```

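## **24. Chi-square test for population variance**

```python
def chi_square_variance_test(sample, variance):
    # Test statistic (n - 1) * s^2 / sigma0^2, Chi-square with n - 1 degrees of freedom under H0
    chi2 = (len(sample)-1) * np.var(sample, ddof=1) / variance
    # Upper-tailed P-value (H1: the population variance is greater than `variance`)
    p_value = stats.chi2.sf(chi2, len(sample)-1)
    return chi2, p_value

print("Chi-square Variance Test:", chi_square_variance_test(sample1, 25))
```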

## **25. Z-test for proportions**

```python
def z_test_proportions(p1, n1, p2, n2):
    p_combined = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = np.sqrt(p_combined * (1 - p_combined) * (1/n1 + 1/n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - stats.norm.cdf(abs(z)))
    return z, p_value

print("Z-test for proportions:", z_test_proportions(0.4, 100, 0.5, 100))
```

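## **26. F-test for variances with visualization**

```python
# Run the F-test from program 16, then show the spread of the two samples side by side
print("F-Test:", f_test(sample1, sample2))
plot_anova_boxplot(sample1, sample2)
```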

## **27. Chi-square goodness of fit test**

```python
print("Chi-square Goodness of Fit:", goodness_of_fit_test(observed, expected))
```

