### 1. **What is hypothesis testing in statistics?**  
Hypothesis testing is a statistical method to determine if there is enough evidence in a sample to infer that a certain condition is true for the entire population.

---

### 2. **What is the null hypothesis, and how does it differ from the alternative hypothesis?**  
- **Null Hypothesis (H₀):** Assumes no effect, difference, or relationship (e.g., "The treatment has no effect").  
- **Alternative Hypothesis (H₁):** Opposes H₀, suggesting an effect, difference, or relationship exists.  

---

### 3. **What is the significance level in hypothesis testing, and why is it important?**  
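The **significance level (α)** is the threshold for rejecting the null hypothesis, typically set at 0.05 (5%). It is the probability of making a Type 1 error (rejecting a true null hypothesis) that the analyst is willing to accept. It matters because a smaller α makes false positives rarer but also makes real effects harder to detect.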
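
One way to see what α means in practice is to simulate many experiments in which the null hypothesis really is true and count how often it is rejected. The sketch below is illustrative only: the sample size, the number of simulated experiments, and the choice of a one-sample T-test are assumptions, not part of the definition.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed for reproducibility
alpha = 0.05
n_experiments, n = 5000, 20     # assumed values, chosen for illustration

# Each row is one experiment in which H0 is true (the population mean is 0).
samples = rng.normal(loc=0.0, scale=1.0, size=(n_experiments, n))
_, p_values = stats.ttest_1samp(samples, popmean=0.0, axis=1)

# Every rejection here is a false positive, because H0 is true by construction.
false_positive_rate = np.mean(p_values <= alpha)
print(f"Observed Type 1 error rate: {false_positive_rate:.3f}")
```

The observed rate should land close to α, up to simulation noise.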

---

### 4. **What does a P-value represent in hypothesis testing?**  
The **P-value** is the probability of observing the test results, or more extreme results, under the assumption that the null hypothesis is true. It is not the probability that the null hypothesis itself is true.

---

### 5. **How do you interpret the P-value in hypothesis testing?**  
- If **P-value ≤ α**, reject the null hypothesis (evidence supports H₁).  
- If **P-value > α**, fail to reject the null hypothesis (insufficient evidence to support H₁).

---

### 6. **What are Type 1 and Type 2 errors in hypothesis testing?**  
- **Type 1 Error:** Rejecting a true null hypothesis (false positive).  
- **Type 2 Error:** Failing to reject a false null hypothesis (false negative).
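
The two error rates are linked: once α and the true effect are fixed, the Type 2 error rate (β) follows from them. The sketch below computes β and power for a one-tailed Z-test. The `effect` value is an assumption chosen for illustration; it is the true shift of the test statistic, in standard-error units.

```python
from scipy import stats

alpha = 0.05
effect = 2.0  # assumed true shift of the test statistic, in standard-error units

# One-tailed Z-test: reject H0 when the statistic exceeds this critical value.
critical_value = stats.norm.ppf(1 - alpha)

# Under H1 the statistic is centered at `effect`; a Type 2 error occurs
# when it still falls below the critical value.
beta = stats.norm.cdf(critical_value - effect)
power = 1 - beta

print(f"Critical value: {critical_value:.3f}")
print(f"Type 2 error rate (beta): {beta:.3f}, power: {power:.3f}")
```

Lowering α moves the critical value to the right, which increases β for the same effect.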

---

### 7. **What is the difference between a one-tailed and a two-tailed test?**  
- **One-tailed test:** Tests for an effect in one direction (e.g., "greater than").  
- **Two-tailed test:** Tests for an effect in both directions (e.g., "not equal to").
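
The choice of tails changes the P-value for the same test statistic. As a minimal sketch, assuming a Z-statistic of 1.8 (a value picked only for illustration):

```python
from scipy import stats

z = 1.8        # assumed test statistic, for illustration
alpha = 0.05

# One-tailed (H1: "greater than"): area in the upper tail only.
p_one_tailed = stats.norm.sf(z)
# Two-tailed (H1: "not equal to"): area in both tails.
p_two_tailed = 2 * stats.norm.sf(abs(z))

print(f"One-tailed P-value: {p_one_tailed:.4f}, reject H0: {p_one_tailed <= alpha}")
print(f"Two-tailed P-value: {p_two_tailed:.4f}, reject H0: {p_two_tailed <= alpha}")
```

With this statistic the one-tailed test rejects H₀ at α = 0.05 while the two-tailed test does not, which is why the direction of the test should be fixed before seeing the data.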

---

### 8. **What is the Z-test, and when is it used?**  
A **Z-test** is used to test hypotheses about population means when the population standard deviation is known, or when the sample is large enough (commonly n ≥ 30) for the sample standard deviation to stand in for it.

---

### 9. **How do you calculate the Z-score, and what does it represent?**  
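**Z-score = (Sample Mean - Population Mean) / (Standard Error)**, where Standard Error = σ / √n.  
It measures how many standard errors the sample mean lies from the hypothesized population mean. For a single data point x, the analogous score is (x - μ) / σ, the number of standard deviations from the mean.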

---

### 10. **What is the T-distribution, and when should it be used instead of the Z-distribution?**  
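The **T-distribution** is used when the population standard deviation is unknown and has to be estimated from the sample, which matters most for small samples (n < 30). It has heavier tails than the Z-distribution and approaches it as the degrees of freedom grow.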
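
The practical difference between the two distributions shows up in the critical values. The sketch below compares two-tailed 95% critical values; the degrees of freedom listed are arbitrary choices for illustration.

```python
from scipy import stats

confidence = 0.95
tail = (1 + confidence) / 2  # upper quantile for a two-tailed critical value

z_crit = stats.norm.ppf(tail)
print(f"Z critical value: {z_crit:.3f}")

t_crits = {}
for df in (4, 9, 29, 99):  # degrees of freedom, chosen for illustration
    t_crits[df] = stats.t.ppf(tail, df)
    print(f"T critical value (df={df}): {t_crits[df]:.3f}")
```

The T critical values are larger than the Z value for every sample size and shrink toward it as the degrees of freedom increase.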

---

### 11. **What is the difference between a Z-test and a T-test?**  
- **Z-test:** Requires a known population standard deviation; used for large samples.  
- **T-test:** Does not require the population standard deviation; suitable for small samples.

---

### 12. **What is the T-test, and how is it used?**  
The **T-test** determines whether a sample mean differs significantly from a hypothesized value (one-sample), or whether two sample means differ from each other (independent or paired samples). It is commonly used for small sample sizes.

---

### 13. **What is the relationship between the Z-test and T-test?**  
Both are used to test hypotheses about means. The Z-test assumes a known population standard deviation, while the T-test uses the sample standard deviation.

---

### 14. **What is a confidence interval, and how is it used?**  
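A **confidence interval (CI)** is a range of values, computed from the sample, that is likely to contain the true population parameter. The confidence level (e.g., 95%) is the proportion of such intervals that would contain the parameter if the sampling were repeated many times.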
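
The meaning of the confidence level can be checked by simulation: build many intervals from repeated samples and count how many contain the true mean. This is a sketch under assumed values (a normal population with mean 50 and standard deviation 10, samples of size 25), not a general proof.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_mean, true_std = 50, 10        # assumed population, for illustration
n_intervals, n, confidence = 2000, 25, 0.95

samples = rng.normal(true_mean, true_std, size=(n_intervals, n))
means = samples.mean(axis=1)
std_errs = stats.sem(samples, axis=1)
margins = std_errs * stats.t.ppf((1 + confidence) / 2, df=n - 1)

# Fraction of intervals [mean - margin, mean + margin] that contain the true mean.
covered = (means - margins <= true_mean) & (true_mean <= means + margins)
coverage = covered.mean()
print(f"Coverage: {coverage:.3f}")
```

The coverage should be close to the chosen confidence level, up to simulation noise.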

---

### 15. **What is the margin of error, and how does it affect the confidence interval?**  
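The **margin of error** is the half-width of a confidence interval: the interval is the point estimate ± the margin of error. A smaller margin of error gives a narrower confidence interval. It shrinks with larger samples or lower variability and grows with higher confidence levels.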

---

### 16. **How is Bayes' Theorem used in statistics, and what is its significance?**  
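Bayes' Theorem calculates the probability of an event given observed evidence: P(A|B) = P(B|A) · P(A) / P(B). It is significant because it gives a rule for updating a prior probability P(A) into a posterior probability P(A|B) as new evidence arrives.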
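
In practice the marginal probability of the evidence is rarely given directly and is obtained from the law of total probability. The sketch below uses hypothetical diagnostic-test numbers (prevalence, sensitivity, false positive rate), chosen only to illustrate the update.

```python
# Hypothetical diagnostic-test numbers, chosen only for illustration.
prior = 0.01                # P(disease)
sensitivity = 0.95          # P(positive | disease)
false_positive_rate = 0.05  # P(positive | no disease)

# Law of total probability gives the marginal P(positive).
marginal = sensitivity * prior + false_positive_rate * (1 - prior)

posterior = sensitivity * prior / marginal  # P(disease | positive)
print(f"P(positive) = {marginal:.4f}")
print(f"P(disease | positive) = {posterior:.4f}")
```

Even with a sensitive test, the posterior stays well below the sensitivity when the prior is small, because most positive results come from the much larger healthy group.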

---

### 17. **What is the Chi-square distribution, and when is it used?**  
The **Chi-square distribution** is used in tests for categorical data, such as goodness of fit or independence tests.

---

### 18. **What is the Chi-square goodness-of-fit test, and how is it applied?**  
This test evaluates whether observed frequencies match expected frequencies in one categorical variable.
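
The test statistic is the sum over categories of (observed - expected)² / expected, compared against a Chi-square distribution with (number of categories - 1) degrees of freedom. A minimal sketch, assuming five categories with equal expected counts, computes it by hand and checks it against `scipy.stats.chisquare`:

```python
import numpy as np
from scipy import stats

observed = np.array([10, 15, 20, 25, 30])
expected = np.array([20, 20, 20, 20, 20])  # equal counts under H0

# Test statistic: sum over categories of (observed - expected)^2 / expected.
chi2_manual = np.sum((observed - expected) ** 2 / expected)
dof = len(observed) - 1
p_manual = stats.chi2.sf(chi2_manual, dof)

chi2_scipy, p_scipy = stats.chisquare(observed, f_exp=expected)
print(f"Manual: Chi2={chi2_manual:.2f}, P-value={p_manual:.4f}")
print(f"SciPy:  Chi2={chi2_scipy:.2f}, P-value={p_scipy:.4f}")
```

The observed and expected counts must have the same total, as they do here.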

---

### 19. **What is the F-distribution, and when is it used in hypothesis testing?**  
The **F-distribution** is used in tests comparing variances (e.g., ANOVA, F-tests). It is right-skewed, takes only non-negative values, and is defined by two degrees-of-freedom parameters (numerator and denominator).

---

### 20. **What is an ANOVA test, and what are its assumptions?**  
**ANOVA (Analysis of Variance)** compares means across multiple groups.  
**Assumptions:**  
1. Independence of observations.  
2. Approximately normal distribution of the data within each group (equivalently, of the residuals).  
3. Homogeneity of variances.

---

### 21. **What are the different types of ANOVA tests?**  
1. **One-way ANOVA:** Compares group means across the levels of one independent variable (factor).  
2. **Two-way ANOVA:** Tests the effects of two independent variables and, optionally, their interaction.  
3. **Repeated measures ANOVA:** Tests the same subjects under different conditions.

---

### 22. **What is the F-test, and how does it relate to hypothesis testing?**  
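The **F-test** compares two variances by taking their ratio. In ANOVA and regression, the F-statistic is the ratio of the variance explained by the model (between groups) to the unexplained variance (within groups); a large ratio is evidence against the null hypothesis. An F-test for equal variances is also sometimes run as a preliminary check before ANOVA.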
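
In one-way ANOVA, the F-statistic is the between-group mean square divided by the within-group mean square. The sketch below computes it by hand for three example groups and compares the result with `scipy.stats.f_oneway`; the helper variable names are illustrative.

```python
import numpy as np
from scipy import stats

groups = [np.array([12, 14, 16, 18, 20]),
          np.array([22, 24, 26, 28, 30]),
          np.array([32, 34, 36, 38, 40])]

k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F is the ratio of the two mean squares.
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f"Manual: F={f_manual:.2f}, P-value={p_manual:.2e}")
print(f"SciPy:  F={f_scipy:.2f}, P-value={p_scipy:.2e}")
```

The numerator has k - 1 degrees of freedom and the denominator has N - k, where k is the number of groups and N the total number of observations.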

In [None]:
import scipy.stats as stats
import numpy as np

# Sample data
sample = [2.1, 2.5, 2.9, 3.0, 2.7, 2.6]
population_mean = 2.5
population_std = 0.5
n = len(sample)

# Calculate the Z-statistic
sample_mean = np.mean(sample)
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(n))

# Calculate P-value
p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))

# Results
print(f"Z-score: {z_score}")
print(f"P-value: {p_value}")

# Interpretation
alpha = 0.05
if p_value <= alpha:
    print("Reject the null hypothesis: Significant difference between sample and population mean.")
else:
    print("Fail to reject the null hypothesis: No significant difference.")


In [None]:
np.random.seed(42)  # For reproducibility
data = np.random.normal(50, 10, 100)  # Simulate data with mean=50, std=10, size=100
sample_mean = np.mean(data)
population_mean = 50
population_std = 10

z_score = (sample_mean - population_mean) / (population_std / np.sqrt(len(data)))
p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))

print(f"Sample Mean: {sample_mean}, Z-score: {z_score}, P-value: {p_value}")


In [None]:
def one_sample_z_test(sample, population_mean, population_std):
    n = len(sample)
    sample_mean = np.mean(sample)
    z_score = (sample_mean - population_mean) / (population_std / np.sqrt(n))
    p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))
    return z_score, p_value

sample = [10, 12, 11, 14, 13, 15]
z, p = one_sample_z_test(sample, 12, 2)

print(f"Z-score: {z}, P-value: {p}")


In [None]:
import matplotlib.pyplot as plt

# Decision regions for a two-tailed Z-test
x = np.linspace(-4, 4, 1000)
critical_value = stats.norm.ppf(1 - 0.05/2)  # 95% confidence level
plt.plot(x, stats.norm.pdf(x), label="Standard Normal Curve")

# Highlight critical regions
plt.fill_between(x, 0, stats.norm.pdf(x), where=(x <= -critical_value) | (x >= critical_value), color="red", alpha=0.3)

plt.axvline(-critical_value, color="red", linestyle="--", label=f"Critical value ({-critical_value:.2f})")
plt.axvline(critical_value, color="red", linestyle="--", label=f"Critical value ({critical_value:.2f})")
plt.title("Two-tailed Z-test Decision Regions")
plt.legend()
plt.show()


In [None]:
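def visualize_errors(alpha, effect_size=2.0):
    # H1 is modeled as H0 shifted right by effect_size (one-tailed test)
    x = np.linspace(-4, 4 + effect_size, 1000)
    critical_value = stats.norm.ppf(1 - alpha)
    # beta is not a free input: it follows from alpha and the effect size
    beta = stats.norm.cdf(critical_value - effect_size)

    plt.plot(x, stats.norm.pdf(x), label="Null Distribution (H0)")
    plt.plot(x, stats.norm.pdf(x - effect_size), label="Alternative Distribution (H1)")

    # Type I error (alpha): area under H0 beyond the critical value
    plt.fill_between(x, 0, stats.norm.pdf(x), where=(x > critical_value), color="red", alpha=0.3, label=f"Type I Error (α={alpha})")

    # Type II error (beta): area under H1 below the critical value
    plt.fill_between(x, 0, stats.norm.pdf(x - effect_size), where=(x < critical_value), color="blue", alpha=0.3, label=f"Type II Error (β={beta:.2f})")

    plt.axvline(critical_value, color="black", linestyle="--")
    plt.legend()
    plt.title("Type I and Type II Errors")
    plt.show()

visualize_errors(alpha=0.05, effect_size=2.0)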


In [None]:
# Independent T-test
group1 = [10, 12, 14, 16, 18]
group2 = [8, 11, 13, 15, 17]

t_stat, p_value = stats.ttest_ind(group1, group2)

print(f"T-statistic: {t_stat}, P-value: {p_value}")


In [None]:
# Paired T-test
before = [20, 25, 30, 35, 40]
after = [22, 27, 29, 36, 39]

t_stat, p_value = stats.ttest_rel(before, after)
print(f"T-statistic: {t_stat}, P-value: {p_value}")


In [None]:
np.random.seed(42)
sample_data = np.random.normal(50, 10, 15)

# Z-test
z_score = (np.mean(sample_data) - 50) / (10 / np.sqrt(len(sample_data)))
z_p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))

# T-test
t_stat, t_p_value = stats.ttest_1samp(sample_data, 50)

print(f"Z-test: Z-score={z_score}, P-value={z_p_value}")
print(f"T-test: T-stat={t_stat}, P-value={t_p_value}")


In [None]:
def confidence_interval(data, confidence=0.95):
    mean = np.mean(data)
    std_err = stats.sem(data)
    margin = std_err * stats.t.ppf((1 + confidence) / 2, df=len(data) - 1)
    return mean - margin, mean + margin

sample_data = [12, 14, 16, 18, 20]
ci = confidence_interval(sample_data)

print(f"Confidence Interval: {ci}")


In [None]:
import scipy.stats as stats
import numpy as np

def calculate_margin_of_error(data, confidence_level=0.95):
    n = len(data)
    std_err = stats.sem(data)  # Standard error of the mean
    # Use the T-distribution: the population SD is unknown and the sample is small
    t_crit = stats.t.ppf((1 + confidence_level) / 2, df=n - 1)
    margin_of_error = t_crit * std_err
    return margin_of_error

sample_data = [12, 14, 15, 16, 17, 18, 19]
moe = calculate_margin_of_error(sample_data)
print(f"Margin of Error: {moe}")


In [None]:
def bayes_theorem(prior, likelihood, marginal):
    posterior = (likelihood * prior) / marginal
    return posterior

# Example
prior = 0.2  # P(A)
likelihood = 0.8  # P(B|A)
marginal = 0.5  # P(B)

posterior = bayes_theorem(prior, likelihood, marginal)
print(f"Posterior Probability: {posterior}")


In [None]:
from scipy.stats import chi2_contingency

# Contingency table
data = [[10, 20, 30], [6, 9, 17]]
chi2, p, dof, expected = chi2_contingency(data)

print(f"Chi2 Statistic: {chi2}, P-value: {p}")
print(f"Degrees of Freedom: {dof}")
print(f"Expected Frequencies:\n{expected}")


In [None]:
def calculate_expected_frequencies(observed):
    row_sums = np.sum(observed, axis=1)
    col_sums = np.sum(observed, axis=0)
    total = np.sum(observed)
    expected = np.outer(row_sums, col_sums) / total
    return expected

observed_data = np.array([[10, 20], [30, 40]])
expected_freqs = calculate_expected_frequencies(observed_data)
print(f"Expected Frequencies:\n{expected_freqs}")


In [None]:
from scipy import stats

# One-sample T-test: does the sample mean differ from the hypothesized mean?
observed = [16, 18, 16, 14, 12, 12]
expected_mean = 15

t_stat, p_value = stats.ttest_1samp(observed, popmean=expected_mean)
print(f"t-statistic: {t_stat}, P-value: {p_value}")

In [None]:
import matplotlib.pyplot as plt

df = 5  # Degrees of freedom
x = np.linspace(0, 20, 500)
y = stats.chi2.pdf(x, df)

plt.plot(x, y, label=f"Chi-square (df={df})")
plt.title("Chi-square Distribution")
plt.xlabel("Value")
plt.ylabel("Density")
plt.legend()
plt.show()


In [None]:
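def f_test(sample1, sample2):
    # Ratio of sample variances (ddof=1 gives the unbiased estimate)
    f_stat = np.var(sample1, ddof=1) / np.var(sample2, ddof=1)
    dfn = len(sample1) - 1
    dfd = len(sample2) - 1
    # Two-tailed P-value for H1: the variances differ in either direction
    p_value = 2 * min(stats.f.cdf(f_stat, dfn, dfd), stats.f.sf(f_stat, dfn, dfd))
    return f_stat, p_value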

sample1 = [10, 12, 14, 16, 18]
sample2 = [9, 11, 13, 15, 17]
f_stat, p = f_test(sample1, sample2)

print(f"F-statistic: {f_stat}, P-value: {p}")


In [None]:
from scipy.stats import f_oneway

group1 = [12, 14, 16, 18, 20]
group2 = [22, 24, 26, 28, 30]
group3 = [32, 34, 36, 38, 40]

f_stat, p_value = f_oneway(group1, group2, group3)
print(f"F-statistic: {f_stat}, P-value: {p_value}")


In [None]:
import seaborn as sns
import pandas as pd

data = {'Group': ['A'] * 5 + ['B'] * 5 + ['C'] * 5,
        'Value': [12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40]}
df = pd.DataFrame(data)

sns.boxplot(x='Group', y='Value', data=df)
plt.title("One-way ANOVA Visualization")
plt.show()


In [None]:
from scipy.stats import shapiro, levene

def check_anova_assumptions(groups):
    # Normality
    for i, group in enumerate(groups):
        stat, p = shapiro(group)
        print(f"Group {i+1} Normality Test: P-value={p}")

    # Equal Variance
    stat, p = levene(*groups)
    print(f"Levene's Test for Equal Variance: P-value={p}")

groups = [[12, 14, 16], [22, 24, 26], [32, 34, 36]]
check_anova_assumptions(groups)


In [None]:
import statsmodels.api as sm
from statsmodels.formula.api import ols
import pandas as pd

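# Example data: two factors ('Factor1', 'Factor2') and a dependent variable 'Value'.
# Two observations per cell, so residual degrees of freedom remain to test the interaction.
data = {'Factor1': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
        'Factor2': ['X', 'X', 'Y', 'Y'] * 3,
        'Value': [12, 13, 14, 16, 16, 17, 18, 19, 20, 22, 22, 23]}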

df = pd.DataFrame(data)

# Two-way ANOVA model
model = ols('Value ~ C(Factor1) + C(Factor2) + C(Factor1):C(Factor2)', data=df).fit()
anova_table = sm.stats.anova_lm(model)
print(anova_table)


In [None]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Degrees of freedom
dfn = 5  # numerator
dfd = 2  # denominator

# Generate x values
x = np.linspace(0, 5, 500)
y = stats.f.pdf(x, dfn, dfd)

# Plot the F-distribution
plt.plot(x, y, label=f"F-distribution (dfn={dfn}, dfd={dfd})")
plt.title("F-distribution")
plt.xlabel("Value")
plt.ylabel("Density")
plt.legend()
plt.show()


In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample Data for One-way ANOVA
group1 = [12, 14, 16, 18, 20]
group2 = [22, 24, 26, 28, 30]
group3 = [32, 34, 36, 38, 40]

data = {'Group': ['A']*5 + ['B']*5 + ['C']*5, 'Value': group1 + group2 + group3}
df = pd.DataFrame(data)

# Boxplot of the groups that a one-way ANOVA would compare
sns.boxplot(x='Group', y='Value', data=df)
plt.title("One-way ANOVA Test Visualization")
plt.show()


In [None]:
import numpy as np
from scipy import stats

# Simulate random data
np.random.seed(0)
sample1 = np.random.normal(50, 10, 100)  # Mean 50, SD 10, 100 samples
sample2 = np.random.normal(55, 10, 100)  # Mean 55, SD 10, 100 samples

# Perform Two-sample T-test
t_stat, p_value = stats.ttest_ind(sample1, sample2)
print(f"T-statistic: {t_stat}, P-value: {p_value}")


In [None]:
from scipy.stats import chi2

# Sample data and variance
sample_data = [10, 12, 14, 16, 18]
sample_variance = np.var(sample_data, ddof=1)
n = len(sample_data)
alpha = 0.05

# Chi-square test statistic
chi2_stat = (n - 1) * sample_variance / 25  # Assume population variance = 25
df = n - 1

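# Critical values for a two-tailed test from the Chi-square distribution
chi2_lower = chi2.ppf(alpha / 2, df)
chi2_upper = chi2.ppf(1 - alpha / 2, df)

print(f"Chi2 Statistic: {chi2_stat}, Critical values: ({chi2_lower}, {chi2_upper})")
print("Reject H0" if chi2_stat < chi2_lower or chi2_stat > chi2_upper else "Fail to reject H0")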


In [None]:
import numpy as np
from scipy import stats

# Data for two proportions
success1 = 60
total1 = 100
success2 = 50
total2 = 100

# Proportion 1 and 2
p1 = success1 / total1
p2 = success2 / total2

# Calculate Z-score for comparing proportions (unpooled standard error)
z_stat = (p1 - p2) / np.sqrt(p1 * (1 - p1) / total1 + p2 * (1 - p2) / total2)
p_value = 2 * stats.norm.sf(abs(z_stat))  # two-tailed P-value
print(f"Z-statistic: {z_stat}, P-value: {p_value}")


In [None]:
from scipy.stats import f

# Sample data
sample1 = [10, 12, 14, 16, 18]
sample2 = [9, 11, 13, 15, 17]

# Calculate F-statistic
f_stat = np.var(sample1, ddof=1) / np.var(sample2, ddof=1)
dfn = len(sample1) - 1
dfd = len(sample2) - 1

# Two-tailed P-value for the F-test (H1: the variances differ in either direction)
p_value = 2 * min(f.cdf(f_stat, dfn, dfd), f.sf(f_stat, dfn, dfd))
print(f"F-statistic: {f_stat}, P-value: {p_value}")


In [None]:
from scipy.stats import chisquare

# Simulated observed data (e.g., counts in five categories)
observed = [10, 15, 20, 25, 30]
expected = [20, 20, 20, 20, 20]  # equal counts under H0; totals must match

# Perform Chi-square goodness of fit test
chi2_stat, p_value = chisquare(observed, f_exp=expected)
print(f"Chi2 Statistic: {chi2_stat}, P-value: {p_value}")
