📘 Theory Questions
1. What is hypothesis testing in statistics?

Hypothesis testing is a statistical method used to make decisions or inferences about population parameters based on sample data.

2. What is the null hypothesis, and how does it differ from the alternative hypothesis?

The null hypothesis (H₀) assumes no effect or difference; the alternative hypothesis (H₁) contradicts H₀ and suggests an effect or difference exists.

3. What is the significance level in hypothesis testing, and why is it important?

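The significance level (α) is the probability of rejecting the null hypothesis when it is actually true (a Type I error). It is chosen before the test, typically 0.05, and sets the threshold the p-value must fall below for a result to be called statistically significant.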

4. What does a P-value represent in hypothesis testing?

It is the probability, assuming H₀ is true, of obtaining a test statistic at least as extreme as the one actually observed. It is not the probability that H₀ is true.
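
To make the definition concrete, here is a minimal sketch assuming an observed Z-statistic of 2.0 (an illustrative value, not taken from any example in this notebook). It computes the two-tailed p-value analytically with `scipy.stats.norm.sf`, then approximates the same tail probability by simulating Z-statistics under H₀.

```python
import numpy as np
from scipy.stats import norm

# Illustrative observed Z-statistic (assumed value)
z_observed = 2.0

# Analytic two-tailed p-value: P(|Z| >= |z_observed|) when H0 is true
p_analytic = 2 * norm.sf(abs(z_observed))

# Monte Carlo check: draw many Z-statistics under H0 (standard normal)
# and count how often they are at least as extreme as the observed one
rng = np.random.default_rng(0)
z_null = rng.standard_normal(200_000)
p_simulated = np.mean(np.abs(z_null) >= abs(z_observed))

print(f"Analytic p-value:  {p_analytic:.4f}")
print(f"Simulated p-value: {p_simulated:.4f}")
```

Both numbers should be close to 0.0455; the simulated value varies slightly with the seed and the number of draws.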

5. How do you interpret the P-value in hypothesis testing?

If P-value < α, reject H₀; otherwise, fail to reject H₀.

6. What are Type 1 and Type 2 errors in hypothesis testing?

Type I error: Rejecting H₀ when it is true.

Type II error: Failing to reject H₀ when it is false.
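
A quick way to see what α controls is to simulate many experiments in which H₀ is true by construction and count how often a two-tailed Z-test rejects it. Every rejection is then a Type I error, so the rejection rate should land near α. This sketch assumes normal data with a known σ; the helper name `type1_error_rate` and all parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def type1_error_rate(n_trials=10_000, n=25, mu0=100.0, sigma=15.0, alpha=0.05, seed=1):
    """Fraction of simulated two-tailed Z-tests that reject a true H0."""
    rng = np.random.default_rng(seed)
    # H0 is true by construction: every sample is drawn with mean mu0
    samples = rng.normal(loc=mu0, scale=sigma, size=(n_trials, n))
    z = (samples.mean(axis=1) - mu0) / (sigma / np.sqrt(n))
    p_values = 2 * norm.sf(np.abs(z))
    # Every rejection here is a Type I error
    return np.mean(p_values < alpha)

rate = type1_error_rate()
print(f"Observed Type I error rate: {rate:.4f} (alpha = 0.05)")
```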

7. What is the difference between a one-tailed and a two-tailed test?

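One-tailed: H₁ specifies a direction (e.g., μ > μ₀), so the whole rejection region (α) lies in one tail.

Two-tailed: H₁ is non-directional (μ ≠ μ₀), so α is split between both tails (α/2 in each).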
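
The practical difference shows up in the p-value and the critical value. In this sketch the statistic z = 1.8 is an assumed illustrative value; the same statistic is significant at α = 0.05 under a right-tailed test but not under a two-tailed one.

```python
from scipy.stats import norm

z = 1.8  # illustrative test statistic (assumed value)

p_right_tailed = norm.sf(z)          # H1: mu > mu0
p_two_tailed = 2 * norm.sf(abs(z))   # H1: mu != mu0

print(f"One-tailed p-value: {p_right_tailed:.4f}")
print(f"Two-tailed p-value: {p_two_tailed:.4f}")

# Critical values at alpha = 0.05
print(f"One-tailed critical z: {norm.ppf(0.95):.3f}")
print(f"Two-tailed critical z: ±{norm.ppf(0.975):.3f}")
```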

8. What is the Z-test, and when is it used?

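The Z-test compares a sample mean or proportion with a hypothesized value (or compares two means or proportions) using the standard normal distribution. It is used when the population variance is known, or when the sample is large enough (commonly n > 30) for the sample variance to be a reliable substitute.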

9. How do you calculate the Z-score and what does it represent?

$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

It represents how many standard errors (σ/√n) the sample mean lies above or below the hypothesized population mean.
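
A worked example of the formula, with illustrative numbers (x̄ = 52, μ = 50, σ = 10, n = 25) chosen so the arithmetic is easy to follow: the standard error is 10/√25 = 2, so Z = (52 − 50)/2 = 1.0. The function name `z_score` is just a label for this sketch.

```python
import math

def z_score(sample_mean, mu, sigma, n):
    """Z = (x_bar - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu) / standard_error

# Illustrative numbers: standard error = 10 / 5 = 2, so Z = (52 - 50) / 2
print(z_score(52, 50, 10, 25))  # 1.0
```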

10. What is the T-distribution, and when should it be used?

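The T-distribution is symmetric and bell-shaped like the normal distribution but has heavier tails; its shape depends on the degrees of freedom (n − 1 for a one-sample test) and approaches the standard normal as n grows. It is used when the sample size is small and the population standard deviation is unknown and must be estimated from the sample.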
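
One way to see why the t-distribution matters for small samples is to compare its two-tailed critical values (α = 0.05) with the normal value of about 1.96. This sketch uses `scipy.stats.t.ppf` and `scipy.stats.norm.ppf`; the degrees of freedom listed are arbitrary examples.

```python
from scipy.stats import norm, t

z_crit = norm.ppf(0.975)  # two-tailed normal critical value at alpha = 0.05
t_crits = {df: t.ppf(0.975, df) for df in (5, 10, 30, 100, 1000)}

for df, t_crit in t_crits.items():
    print(f"df = {df:4d}: t critical = {t_crit:.4f}  (z critical = {z_crit:.4f})")
```

Small samples need a larger critical value (about 2.57 at df = 5), which is how the t-test accounts for the extra uncertainty of estimating σ.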

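11. What is the difference between a Z-test and a T-test?

The Z-test uses a known population variance and the standard normal distribution; the T-test estimates the variance from the sample and uses the t-distribution (with n − 1 degrees of freedom for a one-sample test).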

12. What is the T-test, and how is it used?

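A T-test checks whether a mean differs significantly from a hypothesized value (one-sample test) or whether two means differ (independent two-sample or paired test) when the population standard deviation is unknown.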
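
The one-sample t-statistic is t = (x̄ − μ₀)/(s/√n) with n − 1 degrees of freedom. This sketch computes it by hand and checks it against `scipy.stats.ttest_1samp`; the data and the hypothesized mean μ₀ = 99 are illustrative assumptions.

```python
import numpy as np
from scipy import stats

sample = np.array([102, 100, 98, 101, 99, 97, 100, 103, 98, 100], dtype=float)
mu0 = 99.0  # hypothesized population mean (illustrative)

n = sample.size
s = sample.std(ddof=1)  # sample standard deviation (n - 1 denominator)
t_manual = (sample.mean() - mu0) / (s / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)  # two-tailed p-value

t_scipy, p_scipy = stats.ttest_1samp(sample, mu0)

print(f"Manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"SciPy:  t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```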

13. Z-test vs T-test in hypothesis testing:

Both compare means, but the Z-test requires a known population variance (or a large sample), whereas the T-test estimates the variance from the sample, which makes it suitable for small samples with unknown variance.

14. What is a confidence interval?

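A range of values computed from sample data by a procedure that, over repeated samples, captures the true population parameter a specified proportion of the time (e.g., 95%). It does not mean there is a 95% probability that one particular interval contains the parameter.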
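
The repeated-sampling interpretation can be checked by simulation: draw many samples from a population with a known mean, build a 95% t-interval from each, and count how many intervals contain the true mean. This is a sketch under assumed values (normal population, μ = 50, σ = 10, n = 20); the helper `ci_coverage` is hypothetical.

```python
import numpy as np
from scipy import stats

def ci_coverage(true_mean=50.0, sigma=10.0, n=20, confidence=0.95, n_reps=5000, seed=2):
    """Fraction of t-based confidence intervals that contain the true mean."""
    rng = np.random.default_rng(seed)
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    # Each row is one simulated sample of size n
    samples = rng.normal(loc=true_mean, scale=sigma, size=(n_reps, n))
    means = samples.mean(axis=1)
    sems = samples.std(axis=1, ddof=1) / np.sqrt(n)
    # An interval covers the truth when |mean - true_mean| <= t_crit * SEM
    covered = np.abs(means - true_mean) <= t_crit * sems
    return covered.mean()

coverage = ci_coverage()
print(f"Empirical coverage of 95% intervals: {coverage:.3f}")
```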

15. What is the margin of error?

The amount added to and subtracted from a point estimate to form a confidence interval (critical value × standard error). It increases with the confidence level and decreases as the sample size grows.
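
When σ is known, the margin of error is z* · σ/√n, where z* is the normal critical value for the chosen confidence level. The sketch below (σ = 10 is an assumed value, and `margin_of_error_z` is an illustrative name) shows both effects: ME grows with the confidence level, and quadrupling n halves it.

```python
import math
from scipy.stats import norm

def margin_of_error_z(sigma, n, confidence=0.95):
    """Margin of error z* * sigma / sqrt(n), assuming a known population sigma."""
    z_crit = norm.ppf((1 + confidence) / 2)
    return z_crit * sigma / math.sqrt(n)

sigma = 10.0  # illustrative population standard deviation
for conf in (0.90, 0.95, 0.99):
    print(f"n = 100, confidence = {conf:.0%}: ME = {margin_of_error_z(sigma, 100, conf):.3f}")
for n in (25, 100, 400):
    print(f"n = {n}, confidence = 95%: ME = {margin_of_error_z(sigma, n):.3f}")
```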

16. Bayes’ Theorem in statistics:

Updates the probability of a hypothesis based on new evidence.

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$


17. What is the Chi-square distribution?

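The chi-square distribution with k degrees of freedom is the distribution of a sum of k squared independent standard normal variables. It is non-negative and right-skewed, and it is used in tests on categorical data (goodness-of-fit, independence) and in tests on a single population variance.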

18. Chi-square goodness-of-fit test:

Compares observed vs expected frequencies to test distribution fit.
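
The test statistic is χ² = Σ (O − E)² / E, with k − 1 degrees of freedom for k categories when no parameters are estimated from the data. The sketch below computes it by hand for illustrative counts and checks it against `scipy.stats.chisquare`.

```python
import numpy as np
from scipy.stats import chi2, chisquare

observed = np.array([50, 30, 20])
expected = np.array([40.0, 40.0, 20.0])  # same total (100) as observed

# Sum of squared deviations scaled by the expected counts
chi2_manual = np.sum((observed - expected) ** 2 / expected)
dof = len(observed) - 1
p_manual = chi2.sf(chi2_manual, dof)  # right-tail probability

chi2_scipy, p_scipy = chisquare(f_obs=observed, f_exp=expected)

print(f"Manual: chi2 = {chi2_manual:.4f}, p = {p_manual:.4f}")
print(f"SciPy:  chi2 = {chi2_scipy:.4f}, p = {p_scipy:.4f}")
```

Here χ² = 2.5 + 2.5 + 0 = 5.0 with 2 degrees of freedom, giving p ≈ 0.082.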

19. What is the F-distribution?

A right-skewed distribution used in ANOVA and variance comparison.

20. What is ANOVA and its assumptions?

ANOVA tests mean differences across groups. Assumptions: independence, normality, and equal variance.
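
One-way ANOVA splits variability into between-group and within-group sums of squares, and F = MS_between / MS_within with (k − 1, N − k) degrees of freedom. This sketch computes the statistic by hand for three small illustrative groups and compares the result with `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy.stats import f, f_oneway

groups = [np.array([23, 20, 22, 21, 24], dtype=float),
          np.array([30, 29, 31, 28, 32], dtype=float),
          np.array([22, 23, 21, 20, 19], dtype=float)]

k = len(groups)                     # number of groups
N = sum(g.size for g in groups)     # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group and within-group sums of squares
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (N - k)
F_manual = ms_between / ms_within
p_manual = f.sf(F_manual, k - 1, N - k)

F_scipy, p_scipy = f_oneway(*groups)
print(f"Manual: F = {F_manual:.4f}, p = {p_manual:.3g}")
print(f"SciPy:  F = {F_scipy:.4f}, p = {p_scipy:.3g}")
```

For these groups SS_between ≈ 243.33 and SS_within = 30, so F ≈ 48.67 on (2, 12) degrees of freedom.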

21. Types of ANOVA tests:

One-way ANOVA

Two-way ANOVA

Repeated measures ANOVA

22. What is the F-test in hypothesis testing?

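The F-test compares two variances through their ratio, which follows an F-distribution under H₀. In ANOVA, the F-statistic is the ratio of between-group variance to within-group variance.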



# 1. Write a Python program to perform a Z-test for comparing a sample mean to a known population mean and
# interpret the results.

import scipy.stats as stats
import numpy as np

def z_test(sample_data, population_mean, population_std, alpha=0.05):
    # Convert sample data to a NumPy array
    sample = np.array(sample_data)
    sample_mean = np.mean(sample)
    sample_size = len(sample)

    # Calculate the standard error
    standard_error = population_std / np.sqrt(sample_size)

    # Calculate the Z-score
    z_score = (sample_mean - population_mean) / standard_error

    # Calculate the p-value (two-tailed test)
    p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))

    # Output
    print("Sample Mean:", sample_mean)
    print("Z-score:", z_score)
    print("P-value:", p_value)

    # Interpretation
    if p_value < alpha:
        print(f"Reject the null hypothesis at alpha = {alpha}")
        print("There is a statistically significant difference between the sample mean and the population mean.")
    else:
        print(f"Fail to reject the null hypothesis at alpha = {alpha}")
        print("There is no statistically significant difference between the sample mean and the population mean.")

# Example usage
sample_data = [50, 52, 47, 49, 53, 48, 50, 51, 49, 50]  # Example sample data
population_mean = 48                                   # Known population mean
population_std = 2.5                                    # Known population standard deviation

z_test(sample_data, population_mean, population_std)


# 2. Simulate random data to perform hypothesis testing and calculate the corresponding P-value using Python.

import numpy as np
import scipy.stats as stats

def simulate_z_test(sample_size=30, population_mean=100, population_std=15, sample_mean_shift=0, alpha=0.05):
    # Simulate random sample from a normal distribution
    np.random.seed(42)  # For reproducibility
    sample = np.random.normal(loc=population_mean + sample_mean_shift, scale=population_std, size=sample_size)

    # Sample statistics
    sample_mean = np.mean(sample)
    standard_error = population_std / np.sqrt(sample_size)

    # Z-test calculation
    z_score = (sample_mean - population_mean) / standard_error
    p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))  # Two-tailed test

    # Results
    print("Sample Mean:", round(sample_mean, 2))
    print("Population Mean:", population_mean)
    print("Z-score:", round(z_score, 4))
    print("P-value:", round(p_value, 4))

    # Interpretation
    if p_value < alpha:
        print("Reject the null hypothesis.")
        print("The sample mean is significantly different from the population mean.")
    else:
        print("Fail to reject the null hypothesis.")
        print("No significant difference between the sample and population mean.")

# Example run
simulate_z_test(sample_size=50, population_mean=100, population_std=15, sample_mean_shift=5)


# 3. Implement a one-sample Z-test using Python to compare the sample mean with the population mean.

import numpy as np
from scipy.stats import norm

def one_sample_z_test(sample_data, population_mean, population_std, alpha=0.05):
    """
    Perform a one-sample Z-test.

    Parameters:
    - sample_data (list or array): The sample observations.
    - population_mean (float): The known population mean.
    - population_std (float): The known population standard deviation.
    - alpha (float): Significance level (default 0.05).

    Returns:
    - z_score (float)
    - p_value (float)
    - test_result (str)
    """
    sample_data = np.array(sample_data)
    sample_mean = np.mean(sample_data)
    sample_size = len(sample_data)

    # Standard error
    standard_error = population_std / np.sqrt(sample_size)

    # Z-score calculation
    z_score = (sample_mean - population_mean) / standard_error

    # Two-tailed p-value
    p_value = 2 * (1 - norm.cdf(abs(z_score)))

    # Interpretation
    if p_value < alpha:
        test_result = "Reject the null hypothesis: sample mean is significantly different from the population mean."
    else:
        test_result = "Fail to reject the null hypothesis: no significant difference between sample and population mean."

    # Output
    print(f"Sample Mean: {sample_mean:.4f}")
    print(f"Z-Score: {z_score:.4f}")
    print(f"P-Value: {p_value:.4f}")
    print("Result:", test_result)

    return z_score, p_value, test_result

# Example usage
sample_data = [102, 100, 98, 101, 99, 97, 100, 103, 98, 100]
population_mean = 100
population_std = 2  # Known population standard deviation

one_sample_z_test(sample_data, population_mean, population_std)


# 4. Perform a two-tailed Z-test using Python and visualize the decision region on a plot.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def two_tailed_z_test_with_plot(sample_data, population_mean, population_std, alpha=0.05):
    # Step 1: Calculate sample statistics
    sample_data = np.array(sample_data)
    sample_mean = np.mean(sample_data)
    n = len(sample_data)
    standard_error = population_std / np.sqrt(n)

    # Step 2: Calculate Z-score and p-value
    z_score = (sample_mean - population_mean) / standard_error
    p_value = 2 * (1 - norm.cdf(abs(z_score)))  # Two-tailed test

    # Step 3: Print results
    print(f"Sample Mean: {sample_mean:.2f}")
    print(f"Z-score: {z_score:.4f}")
    print(f"P-value: {p_value:.4f}")

    if p_value < alpha:
        print("Reject the null hypothesis.")
    else:
        print("Fail to reject the null hypothesis.")

    # Step 4: Plot the decision region
    x = np.linspace(-4, 4, 1000)
    y = norm.pdf(x, 0, 1)

    z_critical = norm.ppf(1 - alpha / 2)

    plt.figure(figsize=(10, 6))
    plt.plot(x, y, label='Standard Normal Distribution', color='blue')
    plt.fill_between(x, y, where=(x < -z_critical) | (x > z_critical), color='red', alpha=0.3, label='Rejection Region')
    plt.axvline(z_score, color='black', linestyle='--', label=f'Z-score = {z_score:.2f}')
    plt.axvline(-z_critical, color='red', linestyle='--', label=f'Critical Z = ±{z_critical:.2f}')
    plt.axvline(z_critical, color='red', linestyle='--')
    plt.title('Two-Tailed Z-Test')
    plt.xlabel('Z')
    plt.ylabel('Probability Density')
    plt.legend()
    plt.grid(True)
    plt.show()

# Example usage
sample_data = [102, 100, 98, 101, 99, 97, 100, 103, 98, 100]
population_mean = 100
population_std = 2  # Known population standard deviation

two_tailed_z_test_with_plot(sample_data, population_mean, population_std)


# 5. Create a Python function that calculates and visualizes Type 1 and Type 2 errors during hypothesis testing.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def visualize_type1_type2_errors(mu0=100, mu1=105, sigma=10, n=30, alpha=0.05):
    """
    Visualizes Type I and Type II errors for a right-tailed one-sample Z-test (H1: mu > mu0).

    Parameters:
    - mu0: Mean under null hypothesis (H0)
    - mu1: Mean under alternative hypothesis (H1)
    - sigma: Population standard deviation
    - n: Sample size
    - alpha: Significance level (Type I error rate)
    """
    # Standard error of the mean
    se = sigma / np.sqrt(n)

    # Critical value for right-tailed test
    z_crit = norm.ppf(1 - alpha)
    x_crit = mu0 + z_crit * se  # Critical sample mean under H0

    # Beta (Type II error): P(fail to reject H0 when H1 is true)
    z_beta = (x_crit - mu1) / se
    beta = norm.cdf(z_beta)

    # Plot range
    x = np.linspace(mu0 - 4 * se, mu1 + 4 * se, 1000)
    h0_pdf = norm.pdf(x, mu0, se)
    h1_pdf = norm.pdf(x, mu1, se)

    plt.figure(figsize=(10, 6))
    plt.plot(x, h0_pdf, label='H₀ Distribution (μ₀)', color='blue')
    plt.plot(x, h1_pdf, label='H₁ Distribution (μ₁)', color='green')

    # Shade Type I error region (alpha)
    x_alpha = np.linspace(x_crit, mu0 + 4 * se, 1000)
    plt.fill_between(x_alpha, norm.pdf(x_alpha, mu0, se), color='red', alpha=0.3, label='Type I Error (α)')

    # Shade Type II error region (beta)
    x_beta = np.linspace(mu0 - 4 * se, x_crit, 1000)
    plt.fill_between(x_beta, norm.pdf(x_beta, mu1, se), color='orange', alpha=0.4, label='Type II Error (β)')

    # Mark critical value
    plt.axvline(x_crit, color='black', linestyle='--', label=f'Critical Value = {x_crit:.2f}')

    # Labels and legend
    plt.title('Visualization of Type I and Type II Errors')
    plt.xlabel('Sample Mean')
    plt.ylabel('Probability Density')
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
    plt.show()

    print(f"Critical value (sample mean cutoff): {x_crit:.2f}")
    print(f"Type II Error (β): {beta:.4f}")
    print(f"Power of the test (1 - β): {1 - beta:.4f}")

# Example usage
visualize_type1_type2_errors(mu0=100, mu1=105, sigma=10, n=30, alpha=0.05)


# 6. Write a Python program to perform an independent T-test and interpret the results.

import numpy as np
from scipy import stats

def independent_t_test(sample1, sample2, alpha=0.05, equal_var=False):
    """
    Perform an independent two-sample t-test and interpret the results.

    Parameters:
    - sample1: List or array of values for group 1
    - sample2: List or array of values for group 2
    - alpha: Significance level (default = 0.05)
    - equal_var: Assume equal population variances (default = False)

    Returns:
    - t_stat: T-statistic value
    - p_value: P-value from the test
    - conclusion: Interpretation of the hypothesis test
    """

    # Convert to NumPy arrays
    sample1 = np.array(sample1)
    sample2 = np.array(sample2)

    # Perform the t-test
    t_stat, p_value = stats.ttest_ind(sample1, sample2, equal_var=equal_var)

    # Output results
    print(f"Sample 1 Mean: {np.mean(sample1):.2f}")
    print(f"Sample 2 Mean: {np.mean(sample2):.2f}")
    print(f"T-statistic: {t_stat:.4f}")
    print(f"P-value: {p_value:.4f}")

    # Interpretation
    if p_value < alpha:
        conclusion = "Reject the null hypothesis: There is a significant difference between the two groups."
    else:
        conclusion = "Fail to reject the null hypothesis: No significant difference between the two groups."

    print("Conclusion:", conclusion)

    return t_stat, p_value, conclusion

# Example usage
group1 = [22, 25, 19, 23, 20, 18, 24]
group2 = [28, 30, 27, 29, 31, 26, 32]

independent_t_test(group1, group2)


# 7. Perform a paired sample T-test using Python and visualize the comparison results.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import ttest_rel

def paired_t_test_with_plot(before, after, alpha=0.05):
    """
    Performs a paired sample t-test and visualizes the results.

    Parameters:
    - before: List or array of 'before' measurements.
    - after: List or array of 'after' measurements.
    - alpha: Significance level (default = 0.05).
    """
    before = np.array(before)
    after = np.array(after)
    differences = after - before

    # Paired t-test on (after - before), so the sign of t matches the mean difference
    t_stat, p_value = ttest_rel(after, before)

    # Output results
    print(f"Mean (Before): {np.mean(before):.2f}")
    print(f"Mean (After): {np.mean(after):.2f}")
    print(f"Mean Difference: {np.mean(differences):.2f}")
    print(f"T-statistic: {t_stat:.4f}")
    print(f"P-value: {p_value:.4f}")

    if p_value < alpha:
        print("Reject the null hypothesis: There is a significant difference between 'before' and 'after'.")
    else:
        print("Fail to reject the null hypothesis: No significant difference found.")

    # Visualization: Before vs After
    plt.figure(figsize=(12, 5))

    plt.subplot(1, 2, 1)
    plt.plot(before, label='Before', marker='o')
    plt.plot(after, label='After', marker='o')
    plt.title('Before vs After')
    plt.xlabel('Subject Index')
    plt.ylabel('Measurement')
    plt.legend()
    plt.grid(True)

    # Visualization: Differences
    plt.subplot(1, 2, 2)
    plt.hist(differences, bins=10, color='skyblue', edgecolor='black')
    plt.axvline(np.mean(differences), color='red', linestyle='--', label='Mean Difference')
    plt.title('Distribution of Paired Differences (After - Before)')
    plt.xlabel('Difference')
    plt.ylabel('Frequency')
    plt.legend()
    plt.grid(True)

    plt.tight_layout()
    plt.show()

# Example usage
before_scores = [72, 75, 78, 71, 69, 74, 70, 68, 73, 76]
after_scores  = [75, 78, 80, 74, 72, 76, 73, 70, 76, 79]

paired_t_test_with_plot(before_scores, after_scores)


# 8. Simulate data and perform both Z-test and T-test, then compare the results using Python.

import numpy as np
from scipy import stats

# Simulate data
np.random.seed(42)
sample_size = 40
population_mean = 50
population_std = 10  # Known population std dev for Z-test

# Generate a sample with a true mean around 52 (slightly different from 50)
sample = np.random.normal(loc=52, scale=population_std, size=sample_size)

# 1. Z-test (one sample)
# Z = (sample_mean - population_mean) / (population_std / sqrt(n))
sample_mean = np.mean(sample)
z_stat = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
# two-tailed p-value for Z-test
z_p_value = 2 * (1 - stats.norm.cdf(abs(z_stat)))

# 2. T-test (one sample)
# Use scipy ttest_1samp, population mean = 50
t_stat, t_p_value = stats.ttest_1samp(sample, population_mean)

# Print results
print(f"Sample Mean: {sample_mean:.4f}")
print(f"Z-test statistic: {z_stat:.4f}, p-value: {z_p_value:.4f}")
print(f"T-test statistic: {t_stat:.4f}, p-value: {t_p_value:.4f}")


# 9. Write a Python function to calculate the confidence interval for a sample mean and explain its significance.

import numpy as np
from scipy import stats

def confidence_interval(data, confidence=0.95):
    """
    Calculate the confidence interval for the mean of a sample.

    Parameters:
    - data: list or numpy array of sample observations
    - confidence: confidence level (default 0.95 for 95% confidence)

    Returns:
    - (lower_bound, upper_bound): tuple with confidence interval limits
    """
    n = len(data)
    mean = np.mean(data)
    sem = stats.sem(data)  # standard error of the mean

    # t critical value for two-tailed test
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n-1)

    margin_of_error = t_crit * sem
    lower_bound = mean - margin_of_error
    upper_bound = mean + margin_of_error

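    return lower_bound, upper_bound

# Example usage
sample_data = [12, 15, 14, 10, 13, 15, 16, 14, 13, 12]
lower, upper = confidence_interval(sample_data, confidence=0.95)
print(f"95% confidence interval for the mean: ({lower:.2f}, {upper:.2f})")
# Significance: if sampling were repeated many times, about 95% of intervals
# built this way would contain the true population mean.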


# 10. Write a Python program to calculate the margin of error for a given confidence level using sample data.

import numpy as np
from scipy import stats

def margin_of_error(data, confidence=0.95):
    """
    Calculate the margin of error for the sample mean at a specified confidence level.

    Parameters:
    - data: list or numpy array of sample observations
    - confidence: confidence level (default is 0.95 for 95%)

    Returns:
    - margin_of_error: the margin of error for the mean
    """
    n = len(data)
    sem = stats.sem(data)  # standard error of the mean

    # Get t critical value for two-tailed test
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n-1)

    return t_crit * sem

# Example usage:
if __name__ == "__main__":
    # Sample data
    sample_data = [12, 15, 14, 10, 13, 15, 16, 14, 13, 12]
    conf_level = 0.95  # 95% confidence

    me = margin_of_error(sample_data, conf_level)
    print(f"Margin of Error at {conf_level*100}% confidence: {me:.4f}")


# 11. Implement a Bayesian inference method using Bayes' Theorem in Python and explain the process.

def bayes_theorem(P_disease, P_pos_given_disease, P_pos_given_no_disease):
    """
    Calculate posterior probability P(Disease | Positive) using Bayes' Theorem.

    Parameters:
    - P_disease: Prior probability of disease
    - P_pos_given_disease: Probability of positive test given disease (True Positive rate)
    - P_pos_given_no_disease: Probability of positive test given no disease (False Positive rate)

    Returns:
    - Posterior probability P(Disease | Positive)
    """
    P_no_disease = 1 - P_disease

    # Total probability of positive test
    P_positive = (P_pos_given_disease * P_disease) + (P_pos_given_no_disease * P_no_disease)

    # Bayes theorem
    P_disease_given_positive = (P_pos_given_disease * P_disease) / P_positive

    return P_disease_given_positive


# Parameters:
P_disease = 0.005  # 0.5% prevalence
P_pos_given_disease = 0.99  # True positive rate
P_pos_given_no_disease = 0.01  # False positive rate (1 - specificity)

posterior_prob = bayes_theorem(P_disease, P_pos_given_disease, P_pos_given_no_disease)

print(f"Probability of having the disease given a positive test: {posterior_prob:.4f} ({posterior_prob*100:.2f}%)")


# 12. Perform a Chi-square test for independence between two categorical variables in Python.

import numpy as np
from scipy.stats import chi2_contingency

# Example contingency table
# Rows: Gender (Male, Female)
# Columns: Preference (Product A, Product B)
# Values: Counts of observations
contingency_table = np.array([[30, 10],   # Male
                              [20, 40]])  # Female

# Perform Chi-square test (for 2x2 tables SciPy applies Yates' continuity correction by default)
chi2, p, dof, expected = chi2_contingency(contingency_table)

print(f"Chi-square statistic: {chi2:.4f}")
print(f"Degrees of freedom: {dof}")
print(f"P-value: {p:.4f}")
print("Expected frequencies if independent:")
print(expected)

# Interpretation
alpha = 0.05
if p < alpha:
    print("Reject null hypothesis: Variables are dependent (not independent).")
else:
    print("Fail to reject null hypothesis: no significant evidence of an association between the variables.")


# 13. Write a Python program to calculate the expected frequencies for a Chi-square test based on observed data.

import numpy as np

def expected_frequencies(observed):
    """
    Calculate expected frequencies for a contingency table.

    Parameters:
    - observed: 2D numpy array of observed frequencies

    Returns:
    - expected: 2D numpy array of expected frequencies
    """
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    grand_total = observed.sum()

    expected = row_totals @ col_totals / grand_total
    return expected

# Example observed data
observed_data = np.array([[30, 10],
                          [20, 40]])

expected = expected_frequencies(observed_data)
print("Observed frequencies:")
print(observed_data)
print("\nExpected frequencies:")
print(expected)


# 14. Perform a goodness-of-fit test using Python to compare the observed data to an expected distribution.

from scipy.stats import chisquare
import numpy as np

# Observed counts (e.g., counts of categories)
observed = np.array([50, 30, 20])

# Expected proportions or counts
# If proportions, multiply by total to get expected counts
expected_proportions = np.array([0.4, 0.4, 0.2])
expected = expected_proportions * observed.sum()

# Perform Chi-square goodness-of-fit test
chi2_stat, p_value = chisquare(f_obs=observed, f_exp=expected)

print(f"Chi-square statistic: {chi2_stat:.4f}")
print(f"P-value: {p_value:.4f}")

alpha = 0.05
if p_value < alpha:
    print("Reject null hypothesis: Observed data does NOT fit the expected distribution.")
else:
    print("Fail to reject null hypothesis: no significant evidence that the observed data deviate from the expected distribution.")



# 15. Create a Python script to simulate and visualize the Chi-square distribution and discuss its characteristics.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2

# Parameters
df = 4  # degrees of freedom
sample_size = 10000

# Simulate data from Chi-square distribution
data = np.random.chisquare(df, size=sample_size)

# Plot histogram of simulated data
plt.hist(data, bins=50, density=True, alpha=0.6, color='skyblue', label='Simulated data')

# Plot theoretical Chi-square PDF
x = np.linspace(0, np.max(data), 1000)
pdf = chi2.pdf(x, df)
plt.plot(x, pdf, 'r-', lw=2, label=f'Chi-square PDF (df={df})')

# Labels and title
plt.title('Chi-square Distribution Simulation')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()
plt.grid(True)
plt.show()
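
Some characteristics of the chi-square distribution with k degrees of freedom: it is non-negative and right-skewed, its mean is k, its variance is 2k, and its skewness √(8/k) shrinks as k grows, so the shape approaches a normal curve. This sketch checks the mean and variance against a larger simulated sample, using NumPy's `Generator.chisquare` and the moments from SciPy's `chi2`; the seed and sample size are arbitrary choices.

```python
import numpy as np
from scipy.stats import chi2

df = 4  # degrees of freedom, same as the plot above
rng = np.random.default_rng(0)
data = rng.chisquare(df, size=100_000)

# Theoretical skewness sqrt(8 / df)
skew_theory = float(chi2.stats(df, moments='s'))

print(f"Simulated mean:     {data.mean():.3f} (theory: {chi2.mean(df):.1f})")
print(f"Simulated variance: {data.var():.3f} (theory: {chi2.var(df):.1f})")
print(f"Theoretical skewness: {skew_theory:.3f}")
```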


# 16. Implement an F-test using Python to compare the variances of two random samples.

import numpy as np
from scipy.stats import f

def f_test(sample1, sample2, alpha=0.05):
    """
    Perform an F-test to compare variances of two samples.

    Parameters:
    - sample1, sample2: arrays or lists of sample data
    - alpha: significance level (default 0.05)

    Returns:
    - F statistic, p-value, conclusion string
    """
    var1 = np.var(sample1, ddof=1)
    var2 = np.var(sample2, ddof=1)

    # Arrange so that var1 >= var2
    if var1 < var2:
        var1, var2 = var2, var1
        dfn, dfd = len(sample2) - 1, len(sample1) - 1
    else:
        dfn, dfd = len(sample1) - 1, len(sample2) - 1

    F = var1 / var2

    # Two-tailed p-value calculation
    p_value = 2 * min(f.cdf(F, dfn, dfd), 1 - f.cdf(F, dfn, dfd))

    conclusion = "Fail to reject null hypothesis: no significant evidence that the variances differ."
    if p_value < alpha:
        conclusion = "Reject null hypothesis: variances are different."

    return F, p_value, conclusion

# Example usage:
np.random.seed(42)
sample1 = np.random.normal(0, 5, 30)  # std dev = 5
sample2 = np.random.normal(0, 7, 35)  # std dev = 7

F_stat, p_val, result = f_test(sample1, sample2)
print(f"F-statistic: {F_stat:.4f}")
print(f"P-value: {p_val:.4f}")
print(result)


# 17. Write a Python program to perform an ANOVA test to compare means between multiple groups and
# interpret the results.

import numpy as np
from scipy.stats import f_oneway

# Sample data: Suppose we have three groups with observed values
group1 = [23, 20, 22, 21, 24]
group2 = [30, 29, 31, 28, 32]
group3 = [22, 23, 21, 20, 19]

# Perform one-way ANOVA
F_statistic, p_value = f_oneway(group1, group2, group3)

print(f"ANOVA F-statistic: {F_statistic:.4f}")
print(f"P-value: {p_value:.4f}")

# Interpret results
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between group means.")
else:
    print("Fail to reject the null hypothesis: No significant difference between group means.")


# 18. Perform a one-way ANOVA test using Python to compare the means of different groups and plot the results.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import f_oneway

# Sample data for three groups
group1 = [23, 20, 22, 21, 24]
group2 = [30, 29, 31, 28, 32]
group3 = [22, 23, 21, 20, 19]

# Perform one-way ANOVA
F_statistic, p_value = f_oneway(group1, group2, group3)

print(f"ANOVA F-statistic: {F_statistic:.4f}")
print(f"P-value: {p_value:.4f}")

alpha = 0.05
if p_value < alpha:
    print("Reject null hypothesis: At least one group mean is different.")
else:
    print("Fail to reject null hypothesis: No significant difference between means.")

# Plot boxplots of the groups
plt.boxplot([group1, group2, group3])
plt.xticks([1, 2, 3], ['Group 1', 'Group 2', 'Group 3'])  # set tick labels in a version-independent way
plt.title('Group Comparison using Boxplots')
plt.ylabel('Values')
plt.grid(True)
plt.show()


# 19. Write a Python function to check the assumptions (normality, independence, and equal variance) for ANOVA.

import numpy as np
from scipy.stats import shapiro, levene

def check_anova_assumptions(*groups, alpha=0.05):
    """
    Check ANOVA assumptions:
    - Normality (Shapiro-Wilk test) for each group
    - Homogeneity of variances (Levene's test)
    - Independence (not testable here, just a reminder)

    Parameters:
    - groups: lists or arrays of group data
    - alpha: significance level (default 0.05)

    Returns:
    - dict with results and interpretations
    """
    results = {}

    # Normality test per group
    normality = []
    for i, group in enumerate(groups, start=1):
        stat, p = shapiro(group)
        normality.append((stat, p))
        print(f"Group {i} Shapiro-Wilk p-value: {p:.4f} -> {'No evidence against normality' if p > alpha else 'Normality rejected'}")

    results['normality'] = normality

    # Homogeneity of variances (Levene's test)
    stat, p = levene(*groups)
    results['levene'] = (stat, p)
    print(f"Levene's test p-value: {p:.4f} -> {'No evidence of unequal variances' if p > alpha else 'Unequal variances'}")

    # Independence reminder
    print("Independence assumption: Must be ensured by study design (random sampling, no related samples).")

    return results

# Example usage:
group1 = [23, 20, 22, 21, 24]
group2 = [30, 29, 31, 28, 32]
group3 = [22, 23, 21, 20, 19]

check_anova_assumptions(group1, group2, group3)


# 20. Perform a two-way ANOVA test using Python to study the interaction between two factors and visualize the results.

import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt
import seaborn as sns

# Simulate example data
np.random.seed(0)
factor_A = ['Low', 'High']
factor_B = ['Control', 'Treatment']

# Create balanced dataset
data = []
for a in factor_A:
    for b in factor_B:
        # simulate 10 observations per group with some interaction effect
        mean = 10 + (5 if a == 'High' else 0) + (3 if b == 'Treatment' else 0) + (2 if a == 'High' and b == 'Treatment' else 0)
        observations = np.random.normal(loc=mean, scale=2, size=10)
        for obs in observations:
            data.append([a, b, obs])

df = pd.DataFrame(data, columns=['FactorA', 'FactorB', 'Value'])

# Fit two-way ANOVA model with interaction
model = ols('Value ~ C(FactorA) * C(FactorB)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)

# Visualization: Interaction plot
plt.figure(figsize=(8,6))
sns.pointplot(data=df, x='FactorA', y='Value', hue='FactorB', dodge=True, markers=['o', 's'], capsize=.1, errwidth=1)
plt.title('Interaction Plot between FactorA and FactorB')
plt.ylabel('Mean Value')
plt.show()


# 21. Write a Python program to visualize the F-distribution and discuss its use in hypothesis testing.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import f

# Parameters for the F-distribution
dfn = 5  # degrees of freedom numerator
dfd = 10 # degrees of freedom denominator

# Generate x values (F-distribution is only defined for x >= 0)
x = np.linspace(0, 5, 500)

# Compute the PDF of the F-distribution
pdf = f.pdf(x, dfn, dfd)

# Plot the F-distribution PDF
plt.figure(figsize=(8, 5))
plt.plot(x, pdf, 'b-', lw=2, label=f'F-distribution\n(dfnum={dfn}, dfden={dfd})')
plt.title('F-distribution Probability Density Function')
plt.xlabel('F value')
plt.ylabel('Probability Density')
plt.grid(True)
plt.legend()
plt.show()
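
In hypothesis testing, the F-distribution supplies the critical value and p-value for F-statistics such as the ANOVA ratio or a variance ratio. Large values of F count as evidence against H₀, so the test is usually right-tailed. This sketch uses the same degrees of freedom as the plot (5, 10) and an assumed observed statistic F = 3.5.

```python
from scipy.stats import f

dfn, dfd = 5, 10
alpha = 0.05

f_crit = f.ppf(1 - alpha, dfn, dfd)   # right-tail critical value
F_observed = 3.5                      # illustrative statistic (assumed)
p_value = f.sf(F_observed, dfn, dfd)  # P(F >= F_observed) under H0

print(f"Critical F (alpha = {alpha}): {f_crit:.4f}")
print(f"Observed F = {F_observed}, right-tail p-value = {p_value:.4f}")
print("Reject H0" if F_observed > f_crit else "Fail to reject H0")
```

The critical value is about 3.33, so F = 3.5 falls in the rejection region at α = 0.05; the decision from the critical value and the decision from the p-value always agree.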


# 22. Perform a one-way ANOVA test in Python and visualize the results with boxplots to compare group means.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Generate sample data with 3 groups
np.random.seed(42)
group1 = np.random.normal(loc=50, scale=10, size=30)
group2 = np.random.normal(loc=55, scale=10, size=30)
group3 = np.random.normal(loc=60, scale=10, size=30)

# Combine into a DataFrame for easier handling
data = pd.DataFrame({
    'values': np.concatenate([group1, group2, group3]),
    'group': ['Group1'] * 30 + ['Group2'] * 30 + ['Group3'] * 30
})

# Perform one-way ANOVA
f_stat, p_value = stats.f_oneway(group1, group2, group3)

print("One-Way ANOVA Results:")
print(f"F-statistic: {f_stat:.4f}")
print(f"P-value: {p_value:.4f}")

# Interpret the p-value
alpha = 0.05
if p_value < alpha:
    print(f"\nThe p-value is less than {alpha}. We reject the null hypothesis.")
    print("At least one group mean differs significantly from the others.")
else:
    print(f"\nThe p-value is not less than {alpha}. We fail to reject the null hypothesis.")
    print("There is no statistically significant evidence of a difference between group means.")

# Create boxplot to visualize the results
plt.figure(figsize=(8, 6))
sns.boxplot(x='group', y='values', data=data)
plt.title('Comparison of Group Means with Boxplots', fontsize=14)
plt.xlabel('Groups', fontsize=12)
plt.ylabel('Values', fontsize=12)

# Add annotation with ANOVA results (axes coordinates keep it centered inside the plot)
plt.text(0.5, 0.95,
         f'One-Way ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}',
         ha='center', va='top', fontsize=12, transform=plt.gca().transAxes)

plt.tight_layout()
plt.show()

# 23. Simulate random data from a normal distribution, then perform hypothesis testing to evaluate the means.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Set random seed for reproducibility
np.random.seed(42)

# Parameters for the normal distributions
mean1, std1 = 50, 10
mean2, std2 = 55, 10
sample_size = 30

# Generate two samples from normal distributions
sample1 = np.random.normal(loc=mean1, scale=std1, size=sample_size)
sample2 = np.random.normal(loc=mean2, scale=std2, size=sample_size)

# Perform independent two-sample t-test (assuming equal variances)
t_stat, p_value = stats.ttest_ind(sample1, sample2)

print("Simulation Parameters:")
print(f"Sample 1: Mean = {mean1}, Std = {std1}")
print(f"Sample 2: Mean = {mean2}, Std = {std2}")
print(f"Sample size for each group: {sample_size}\n")

print("Hypothesis Test Results (Two-Sample t-test):")
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")

# Interpret the results
alpha = 0.05
print("\nConclusion:")
if p_value < alpha:
    print(f"We reject the null hypothesis (p < {alpha}). There is significant evidence that the means are different.")
else:
    print(f"We fail to reject the null hypothesis (p ≥ {alpha}). No significant difference in means detected.")
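`stats.ttest_ind` assumes equal population variances by default. The sketch below computes that pooled t-statistic by hand and, for comparison, Welch's t-test, which drops the equal-variance assumption and uses the Welch-Satterthwaite approximation for the degrees of freedom. It regenerates the same two samples as the cell above; the variable names are illustrative only.

```python
import numpy as np
from scipy import stats

# Regenerate the same two samples as above
np.random.seed(42)
sample1 = np.random.normal(loc=50, scale=10, size=30)
sample2 = np.random.normal(loc=55, scale=10, size=30)

n1, n2 = len(sample1), len(sample2)
var1, var2 = sample1.var(ddof=1), sample2.var(ddof=1)
mean_diff = sample1.mean() - sample2.mean()

# Pooled (Student) t-test: assumes both populations share one variance
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
t_pooled = mean_diff / np.sqrt(pooled_var * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2
p_pooled = 2 * stats.t.sf(abs(t_pooled), df_pooled)

# Welch's t-test: each sample keeps its own variance
t_welch = mean_diff / np.sqrt(var1 / n1 + var2 / n2)
# Welch-Satterthwaite approximation of the degrees of freedom
df_welch = (var1 / n1 + var2 / n2) ** 2 / (
    (var1 / n1) ** 2 / (n1 - 1) + (var2 / n2) ** 2 / (n2 - 1)
)
p_welch = 2 * stats.t.sf(abs(t_welch), df_welch)

print(f"Pooled: t = {t_pooled:.4f}, df = {df_pooled}, p = {p_pooled:.4f}")
print(f"Welch:  t = {t_welch:.4f}, df = {df_welch:.2f}, p = {p_welch:.4f}")
```

With equal sample sizes the two t-statistics are identical and only the degrees of freedom differ; the choice matters more when group sizes and variances are unequal. In SciPy the Welch version is `stats.ttest_ind(sample1, sample2, equal_var=False)`.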



'''24. Perform a hypothesis test for population variance using a chi-square distribution and interpret the results.'''

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Set random seed for reproducibility
np.random.seed(42)

## 1. Generate sample data from a normal distribution
true_mean = 50
true_variance = 25  # σ² = 25 → σ = 5
sample_size = 30
sample = np.random.normal(loc=true_mean, scale=np.sqrt(true_variance), size=sample_size)

## 2. State the hypotheses
# H₀: σ² = 25 (null hypothesis - population variance equals 25)
# H₁: σ² ≠ 25 (alternative hypothesis - two-tailed test)
hypothesized_variance = 25
alpha = 0.05  # significance level

## 3. Calculate test statistic
sample_variance = np.var(sample, ddof=1)  # sample variance with Bessel's correction (n-1)
chi_square_stat = (sample_size - 1) * sample_variance / hypothesized_variance

## 4. Calculate critical values and p-value
df = sample_size - 1  # degrees of freedom
lower_critical = stats.chi2.ppf(alpha/2, df)
upper_critical = stats.chi2.ppf(1 - alpha/2, df)
p_value = 2 * min(stats.chi2.cdf(chi_square_stat, df), 1 - stats.chi2.cdf(chi_square_stat, df))

## 5. Print results
print(f"Sample variance: {sample_variance:.4f}")
print(f"Chi-square test statistic: {chi_square_stat:.4f}")
print(f"Critical values: ({lower_critical:.4f}, {upper_critical:.4f})")
print(f"P-value: {p_value:.4f}")

## 6. Make decision
if chi_square_stat < lower_critical or chi_square_stat > upper_critical:
    print(f"Reject H₀ at α={alpha} (test statistic falls in the rejection region)")
else:
    print(f"Fail to reject H₀ at α={alpha}")

## 7. Interpretation
print("\nInterpretation:")
print(f"We are testing whether the population variance differs from {hypothesized_variance}.")
if p_value < alpha:
    print(f"With p-value {p_value:.4f} < {alpha}, we have significant evidence to conclude")
    print("that the population variance is different from the hypothesized value.")
else:
    print(f"With p-value {p_value:.4f} ≥ {alpha}, we don't have sufficient evidence")
    print("to conclude that the population variance differs from the hypothesized value.")

## 8. Visualization
plt.figure(figsize=(10, 6))
x = np.linspace(0, stats.chi2.ppf(0.999, df), 1000)
y = stats.chi2.pdf(x, df)

plt.plot(x, y, label=f'Chi-square distribution (df={df})')
plt.axvline(lower_critical, color='red', linestyle='--', label=f'Lower critical value ({lower_critical:.2f})')
plt.axvline(upper_critical, color='red', linestyle='--', label=f'Upper critical value ({upper_critical:.2f})')
plt.axvline(chi_square_stat, color='blue', linestyle='-', label=f'Test statistic ({chi_square_stat:.2f})')

plt.fill_between(x, y, where=(x < lower_critical) | (x > upper_critical),
                color='red', alpha=0.2, label='Rejection region')

plt.title('Chi-Square Test for Population Variance', fontsize=14)
plt.xlabel('Chi-square statistic', fontsize=12)
plt.ylabel('Probability density', fontsize=12)
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
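The same pivot, (n - 1)s² / σ², also yields a confidence interval for the population variance, and the two-sided test rejects H₀ at level α exactly when the hypothesized variance lies outside the (1 - α) interval. The sketch below packages the steps above into a reusable function; `chi2_variance_test` is a hypothetical name, and the sample is regenerated with the same seed and parameters as the cell above.

```python
import numpy as np
from scipy import stats

def chi2_variance_test(sample, sigma2_0, alpha=0.05):
    """Hypothetical helper: two-sided test of H0: sigma^2 = sigma2_0 plus a (1 - alpha) CI."""
    sample = np.asarray(sample, dtype=float)
    df = sample.size - 1
    s2 = sample.var(ddof=1)                 # sample variance with Bessel's correction
    chi2_stat = df * s2 / sigma2_0
    # Two-sided p-value: double the smaller tail probability
    p_value = 2 * min(stats.chi2.cdf(chi2_stat, df), stats.chi2.sf(chi2_stat, df))
    # Invert the pivot (n - 1) s^2 / sigma^2 to get the confidence interval
    ci = (df * s2 / stats.chi2.ppf(1 - alpha / 2, df),
          df * s2 / stats.chi2.ppf(alpha / 2, df))
    return chi2_stat, p_value, ci

# Same sample as above: mean 50, variance 25 (sigma = 5), n = 30
np.random.seed(42)
sample = np.random.normal(loc=50, scale=5, size=30)
stat, p, (lo, hi) = chi2_variance_test(sample, sigma2_0=25)
print(f"chi2 = {stat:.4f}, p = {p:.4f}, 95% CI for variance: ({lo:.2f}, {hi:.2f})")
# H0 is rejected at alpha exactly when sigma2_0 falls outside the CI
print("Reject H0:", p < 0.05, "| 25 outside CI:", not (lo <= 25 <= hi))
```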

'''25. Write a Python script to perform a Z-test for comparing proportions between two datasets or groups.'''

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

def ztest_proportions_two_samples(successes_a, sample_size_a, successes_b, sample_size_b):
    """
    Perform a two-sample Z-test for proportions.

    Parameters:
    successes_a, successes_b - Number of successes in each sample
    sample_size_a, sample_size_b - Size of each sample

    Returns:
    z_score - The computed Z-statistic
    p_value - The two-tailed p-value
    """
    # Calculate proportions
    p_a = successes_a / sample_size_a
    p_b = successes_b / sample_size_b
    p_pooled = (successes_a + successes_b) / (sample_size_a + sample_size_b)

    # Calculate standard error
    se = np.sqrt(p_pooled * (1 - p_pooled) * (1/sample_size_a + 1/sample_size_b))

    # Calculate Z-score
    z_score = (p_a - p_b) / se

    # Calculate two-tailed p-value
    p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))

    return z_score, p_value

# Example usage:
# Group A: 120 successes out of 500 trials
# Group B: 150 successes out of 500 trials
successes_a = 120
sample_size_a = 500
successes_b = 150
sample_size_b = 500

# Perform the test
z_score, p_value = ztest_proportions_two_samples(successes_a, sample_size_a, successes_b, sample_size_b)

# Print results
print(f"Group A proportion: {successes_a/sample_size_a:.4f}")
print(f"Group B proportion: {successes_b/sample_size_b:.4f}")
print("\nZ-test results:")
print(f"Z-score: {z_score:.4f}")
print(f"P-value: {p_value:.4f}")

# Interpretation
alpha = 0.05
print("\nConclusion:")
if p_value < alpha:
    print(f"We reject the null hypothesis (p < {alpha}). There is significant evidence that the proportions are different.")
else:
    print(f"We fail to reject the null hypothesis (p ≥ {alpha}). No significant difference in proportions detected.")

# Visualization
plt.figure(figsize=(8, 6))
plt.bar(['Group A', 'Group B'],
        [successes_a/sample_size_a, successes_b/sample_size_b],
        color=['blue', 'orange'])
# Error bars: approximate 95% Wald confidence intervals (p ± 1.96·SE) for each proportion
plt.errorbar(['Group A', 'Group B'],
             [successes_a/sample_size_a, successes_b/sample_size_b],
             yerr=[1.96*np.sqrt((successes_a/sample_size_a)*(1-successes_a/sample_size_a)/sample_size_a,
                   1.96*np.sqrt((successes_b/sample_size_b)*(1-successes_b/sample_size_b)/sample_size_b)],
             fmt='none', color='black', capsize=10)
plt.title('Proportion Comparison Between Groups', fontsize=14)
plt.xlabel('Groups', fontsize=12)
plt.ylabel('Proportion', fontsize=12)
plt.grid(True, alpha=0.3)
plt.text(0.5, max(successes_a/sample_size_a, successes_b/sample_size_b)*0.9,
         f'Z = {z_score:.2f}, p = {p_value:.4f}',
         ha='center', va='center', fontsize=12)
plt.tight_layout()
plt.show()
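For two groups, the pooled two-proportion Z-test is equivalent to Pearson's chi-square test of independence on the 2×2 table of successes and failures without Yates' continuity correction: z² equals the chi-square statistic and the two-sided p-values match. The sketch below uses the same counts as the example above and `scipy.stats.chi2_contingency` with `correction=False` as an independent check.

```python
import numpy as np
from scipy import stats

successes_a, n_a = 120, 500
successes_b, n_b = 150, 500

# Pooled two-proportion z-test (same formula as ztest_proportions_two_samples)
p_a, p_b = successes_a / n_a, successes_b / n_b
p_pool = (successes_a + successes_b) / (n_a + n_b)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_a - p_b) / se
p_z = 2 * stats.norm.sf(abs(z))

# Same data as a 2x2 contingency table: rows = groups, columns = successes/failures
table = np.array([[successes_a, n_a - successes_a],
                  [successes_b, n_b - successes_b]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table, correction=False)

print(f"z^2 = {z ** 2:.4f}, chi-square = {chi2:.4f}")
print(f"p (z-test) = {p_z:.4f}, p (chi-square) = {p_chi2:.4f}")
```

The chi-square form generalizes to more than two groups, while the Z-test form also supports one-tailed alternatives.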

'''26. Implement an F-test for comparing the variances of two datasets, then interpret and visualize the results.'''

