# THEORY QUESTIONS

1. What is hypothesis testing in statistics?
   - Hypothesis testing is a statistical method used to make decisions or inferences about a population based on a sample. It involves testing an assumption (hypothesis) about a population parameter.



2. What is the null hypothesis, and how does it differ from the alternative hypothesis?
   - Null hypothesis (H₀): Assumes no effect or no difference in the population.
   - Alternative hypothesis (H₁ or Ha): States that there is an effect or a difference.
   - The two are mutually exclusive. A test either rejects H₀ in favor of H₁ or fails to reject H₀.



3. What is the significance level in hypothesis testing, and why is it important?
   - The significance level (α) is the probability of rejecting the null hypothesis when it is actually true. Common values are 0.05 or 0.01. It sets the threshold for how strong the evidence must be to reject H₀.
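
One way to see what α means in practice is to simulate many experiments in which H₀ is true and count how often a test at α = 0.05 still rejects it. The sketch below assumes a two-tailed one-sample Z-test with known σ; the function name, population values, sample size, and trial count are illustrative choices, not part of the original notes.

```python
import numpy as np
from scipy.stats import norm

def simulated_type1_rate(alpha=0.05, n=30, trials=20_000, seed=0):
    """Fraction of simulated experiments that reject a true H0 (illustrative)."""
    rng = np.random.default_rng(seed)
    mu0, sigma = 100.0, 15.0  # assumed population values; H0 is true by construction
    samples = rng.normal(mu0, sigma, size=(trials, n))
    z = (samples.mean(axis=1) - mu0) / (sigma / np.sqrt(n))
    p_values = 2 * norm.sf(np.abs(z))  # two-tailed p-values
    return float(np.mean(p_values <= alpha))

rate = simulated_type1_rate()
print(f"Rejection rate under a true H0: {rate:.3f}")
```

The printed rate should land close to α, which is what "probability of rejecting H₀ when it is true" means operationally.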

4. What does a P-value represent in hypothesis testing?
   - The P-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the observed one, assuming H₀ is true.

5. How do you interpret the P-value in hypothesis testing?
   - If P ≤ α, reject the null hypothesis (evidence supports H₁).
   - If P > α, fail to reject the null hypothesis (insufficient evidence to support H₁).

6. What are Type 1 and Type 2 errors in hypothesis testing?
   - Type I error (probability α): Rejecting H₀ when it is true (a false positive).
   - Type II error (probability β): Failing to reject H₀ when it is false (a false negative).



7. What is the difference between a one-tailed and a two-tailed test in hypothesis testing?
   - One-tailed test: Tests for an effect in one direction (e.g., μ > μ₀).
   - Two-tailed test: Tests for an effect in either direction (e.g., μ ≠ μ₀).
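
The choice of tail changes the P-value for the same test statistic. The following minimal sketch, assuming a Z-test and a hypothetical observed statistic of z = 1.80, computes the three variants side by side; the helper name `z_test_p_values` is made up for illustration.

```python
from scipy.stats import norm

def z_test_p_values(z):
    """Return P-values for right-tailed, left-tailed, and two-tailed Z-tests."""
    return {
        "right-tailed (H1: mu > mu0)": norm.sf(z),          # area to the right of z
        "left-tailed (H1: mu < mu0)": norm.cdf(z),          # area to the left of z
        "two-tailed (H1: mu != mu0)": 2 * norm.sf(abs(z)),  # both tails beyond |z|
    }

z_observed = 1.80  # hypothetical test statistic
for name, p in z_test_p_values(z_observed).items():
    print(f"{name}: p = {p:.4f}")
```

With z = 1.80, the right-tailed P-value falls below 0.05 while the two-tailed one does not, so the direction of the alternative should be fixed before looking at the data.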

8. What is the Z-test, and when is it used in hypothesis testing?
   - A Z-test is used when the population variance is known and the sample size is large (n > 30). It assesses whether a sample mean differs significantly from a hypothesized population mean.

9. How do you calculate the Z-score, and what does it represent in hypothesis testing?
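   - For a sample mean, the Z-score is calculated as z = (x̄ − μ) / (σ / √n), where x̄ is the sample mean, μ the population mean, σ the population standard deviation, and n the sample size. It measures how many standard errors (standard deviations of the sampling distribution) the sample mean is from the population mean (μ).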

10. What is the T-distribution, and when should it be used instead of the normal distribution?
   - The T-distribution is used when the population standard deviation is unknown and the sample size is small (n < 30). It has heavier tails than the normal distribution.
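
The effect of the heavier tails can be seen by comparing critical values. This short sketch, with an arbitrarily chosen set of degrees of freedom, prints the two-sided 95% critical value of the T-distribution next to the normal one.

```python
from scipy.stats import norm, t

confidence = 0.95
z_crit = norm.ppf((1 + confidence) / 2)  # two-sided normal critical value
print(f"Normal critical value: {z_crit:.3f}")

# Degrees of freedom chosen only for illustration
for df in (2, 5, 10, 30, 100, 1000):
    t_crit = t.ppf((1 + confidence) / 2, df)
    print(f"df={df:>4}: t critical = {t_crit:.3f} (excess over z = {t_crit - z_crit:.3f})")
```

The T critical value is always larger than the normal one and converges to it as the degrees of freedom grow, which is why the distinction matters most for small samples.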



11. What is the difference between a Z-test and a T-test?
   - Z-test: Known population variance, large sample.
   - T-test: Unknown population variance, small sample.

12. What is the T-test, and how is it used in hypothesis testing?
   - A T-test assesses whether the means of two groups (or a sample and a population) are significantly different, using the sample standard deviation.

13. What is the relationship between Z-test and T-test in hypothesis testing?
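   - Both tests compare a sample statistic to a hypothesized population parameter using a standardized test statistic. The T-test replaces the known population standard deviation of the Z-test with the sample estimate, so its statistic follows a T-distribution rather than the normal. As the sample size grows, the T-distribution approaches the normal and the two tests give nearly identical results.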

14. What is a confidence interval, and how is it used to interpret statistical results?
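   - A confidence interval (CI) is a range of values, computed from the sample, that is likely to contain the true population parameter. The confidence level (e.g., 95%) describes the procedure: if sampling were repeated many times, about 95% of the intervals built this way would contain the true parameter. A narrow interval indicates a precise estimate, and a hypothesized value that falls outside the interval is rejected at the corresponding significance level.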
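
The repeated-sampling meaning of a confidence level can be checked by simulation. The sketch below assumes normally distributed data and a t-based interval for the mean; the function name and all parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.stats import t

def ci_coverage(true_mean=50.0, sigma=8.0, n=20, confidence=0.95, trials=5_000, seed=1):
    """Fraction of simulated t-intervals that contain the true mean (illustrative)."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(true_mean, sigma, size=(trials, n))
    means = samples.mean(axis=1)
    std_errs = samples.std(axis=1, ddof=1) / np.sqrt(n)  # sample standard errors
    t_crit = t.ppf((1 + confidence) / 2, df=n - 1)
    lower = means - t_crit * std_errs
    upper = means + t_crit * std_errs
    return float(np.mean((lower <= true_mean) & (true_mean <= upper)))

coverage = ci_coverage()
print(f"Share of 95% intervals containing the true mean: {coverage:.3f}")
```

The share should come out near 0.95. Any single interval either contains the true mean or does not; the 95% refers to the long-run behavior of the procedure.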

15. What is the margin of error, and how does it affect the confidence interval?
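   - The margin of error is the amount added to and subtracted from the point estimate to form the CI (CI = estimate ± margin of error). A larger margin gives a wider interval, indicating more uncertainty. It grows with the confidence level and the variability of the data, and shrinks as the sample size increases.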
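
To make the dependence on sample size concrete, here is a small sketch assuming a known population standard deviation, so the margin is z* · σ / √n. The helper name `margin_of_error` and the value σ = 12 are assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm

def margin_of_error(sigma, n, confidence=0.95):
    """Margin of error for a mean when sigma is known: z* * sigma / sqrt(n)."""
    z_star = norm.ppf((1 + confidence) / 2)
    return z_star * sigma / np.sqrt(n)

sigma = 12.0  # assumed population standard deviation
for n in (25, 100, 400):
    print(f"n={n:>3}: margin of error = ±{margin_of_error(sigma, n):.2f}")
```

Because the margin scales with 1/√n, quadrupling the sample size halves it; raising the confidence level has the opposite effect and widens the interval.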

16. How is Bayes' Theorem used in statistics, and what is its significance?
   - Bayes’ Theorem calculates the probability of a hypothesis given new evidence. It is fundamental in Bayesian statistics and used in areas like diagnostic testing and machine learning.

17. What is the Chi-square distribution, and when is it used?
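   - The Chi-square (χ²) distribution is used in tests on categorical data (e.g., goodness-of-fit, independence) and in tests on a population variance. It is right-skewed, takes only non-negative values, and arises as the distribution of a sum of squared independent standard normal variables; its shape is set by the degrees of freedom.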

18. What is the Chi-square goodness of fit test, and how is it applied?
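   - It tests whether an observed frequency distribution differs from an expected (theoretical) distribution. The statistic χ² = Σ (O − E)² / E is compared with a Chi-square distribution with k − 1 degrees of freedom, where k is the number of categories (assuming no parameters are estimated from the data). A small P-value indicates that the sample does not match the expected distribution.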
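
The goodness-of-fit statistic is simple enough to compute by hand. The sketch below does so for the die-roll counts used in the practical section (60 rolls, fair die expected) and cross-checks the result against `scipy.stats.chisquare`.

```python
import numpy as np
from scipy.stats import chi2, chisquare

observed = np.array([8, 9, 10, 11, 12, 10])   # counts for faces 1 to 6
expected = np.full(6, observed.sum() / 6)     # fair die: equal expected counts

# Test statistic: sum over categories of (O - E)^2 / E
chi2_stat = np.sum((observed - expected) ** 2 / expected)
dof = len(observed) - 1                       # number of categories minus one
p_value = chi2.sf(chi2_stat, dof)             # upper-tail probability

# Cross-check against SciPy's built-in implementation
stat_scipy, p_scipy = chisquare(f_obs=observed, f_exp=expected)

print(f"Manual: chi2 = {chi2_stat:.4f}, p = {p_value:.4f}")
print(f"SciPy:  chi2 = {stat_scipy:.4f}, p = {p_scipy:.4f}")
```

Only the upper tail is used for the P-value, because large values of the statistic are the ones that signal a poor fit.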

19. What is the F-distribution, and when is it used in hypothesis testing?
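   - The F-distribution is used to compare two variances and, in ANOVA, to test whether the means of multiple groups are equal. It is right-skewed, takes only non-negative values, and describes the ratio of two scaled variance estimates, so it has separate numerator and denominator degrees of freedom.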

20. What is an ANOVA test, and what are its assumptions?
   - ANOVA (Analysis of Variance) tests for significant differences between group means. Assumptions:
     - Independence of observations
     - Normality of the data within each group
     - Equal variances across groups (homoscedasticity)



21. What are the different types of ANOVA tests?
   - One-way ANOVA: One independent variable (factor)
   - Two-way ANOVA: Two independent variables, optionally with their interaction
   - Repeated measures ANOVA: Same subjects measured multiple times

22. What is the F-test, and how does it relate to hypothesis testing?
   - The F-test compares variances to determine if they are significantly different. It is the basis of the ANOVA test and is used to test overall model significance.
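
To show how the F-test underlies ANOVA, the following sketch builds the F-statistic by hand as the ratio of the between-group to the within-group mean square and compares it with `scipy.stats.f_oneway`. The three groups are the illustrative scores used in the practical section; the variable names are chosen here for readability.

```python
import numpy as np
from scipy.stats import f, f_oneway

groups = [
    np.array([88, 92, 85, 91, 87]),
    np.array([78, 74, 80, 76, 79]),
    np.array([94, 96, 92, 95, 97]),
]

k = len(groups)                         # number of groups
n_total = sum(len(g) for g in groups)   # total number of observations
grand_mean = np.concatenate(groups).mean()

# Sums of squares: variation between group means and variation within groups
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Mean squares are variance estimates; their ratio is the F-statistic
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_manual = ms_between / ms_within
p_manual = f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = f_oneway(*groups)
print(f"Manual: F = {f_manual:.4f}, p = {p_manual:.6f}")
print(f"SciPy:  F = {f_scipy:.4f}, p = {p_scipy:.6f}")
```

A large F means the group means vary more than the within-group scatter would explain, which is the evidence ANOVA uses against equal means.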




# PRACTICAL QUESTIONS

In [None]:
# Write a Python program to perform a Z-test for comparing a sample mean to a known population mean and interpret the results

import numpy as np
from scipy import stats

# Sample data
sample = [52, 55, 53, 54, 56, 58, 52, 53, 57, 54]  # example sample data
sample_mean = np.mean(sample)
sample_size = len(sample)

# Known population parameters
population_mean = 50
population_std = 3  # Known population standard deviation

# Z-test calculation
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))  # Two-tailed test

# Output the results
print(f"Sample Mean: {sample_mean:.2f}")
print(f"Z-score: {z_score:.4f}")
print(f"P-value: {p_value:.4f}")

# Interpret the result
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The sample mean is significantly different from the population mean.")
else:
    print("Fail to reject the null hypothesis: The sample mean is not significantly different from the population mean.")


In [None]:
# Simulate random data to perform hypothesis testing and calculate the corresponding P-value using Python

import numpy as np
from scipy import stats

# Simulate random data for the sample (e.g., heights in cm)
np.random.seed(42)  # for reproducibility
sample_size = 30
population_mean = 170
population_std = 10  # known population standard deviation

# Generate sample data assuming it's drawn from a population with mean = 172
sample_data = np.random.normal(loc=172, scale=10, size=sample_size)
sample_mean = np.mean(sample_data)

# Perform a one-sample Z-test
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))  # two-tailed

# Display results
print("Sample Data:", np.round(sample_data, 2))
print(f"\nSample Mean = {sample_mean:.2f}")
print(f"Z-score = {z_score:.4f}")
print(f"P-value = {p_value:.4f}")

# Interpret results
alpha = 0.05
if p_value < alpha:
    print("Conclusion: Reject the null hypothesis (significant difference).")
else:
    print("Conclusion: Fail to reject the null hypothesis (no significant difference).")



In [None]:
# Implement a one-sample Z-test using Python to compare the sample mean with the population mean

import numpy as np
from scipy.stats import norm

# Step 1: Define the sample data
sample_data = [102, 100, 98, 101, 99, 100, 97, 103, 99, 98]  # Example sample
sample_mean = np.mean(sample_data)
sample_size = len(sample_data)

# Step 2: Define known population parameters
population_mean = 100
population_std = 2  # Known standard deviation

# Step 3: Calculate the Z-score
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))

# Step 4: Calculate the two-tailed P-value
p_value = 2 * (1 - norm.cdf(abs(z_score)))

# Step 5: Output results
print(f"Sample Mean: {sample_mean:.2f}")
print(f"Z-score: {z_score:.4f}")
print(f"P-value: {p_value:.4f}")

# Step 6: Interpretation
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: Sample mean is significantly different from the population mean.")
else:
    print("Fail to reject the null hypothesis: No significant difference between sample and population mean.")


In [None]:
# Perform a two-tailed Z-test using Python and visualize the decision region on a plot

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Step 1: Define sample and population parameters
sample_data = [102, 100, 98, 101, 99, 100, 97, 103, 99, 98]
sample_mean = np.mean(sample_data)
sample_size = len(sample_data)

population_mean = 100
population_std = 2  # Known population standard deviation
alpha = 0.05  # Significance level

# Step 2: Calculate Z-score
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
p_value = 2 * (1 - norm.cdf(abs(z_score)))

# Step 3: Print results
print(f"Sample Mean: {sample_mean:.2f}")
print(f"Z-score: {z_score:.4f}")
print(f"P-value: {p_value:.4f}")

if p_value < alpha:
    print("Reject the null hypothesis (significant difference).")
else:
    print("Fail to reject the null hypothesis (no significant difference).")

# Step 4: Visualization
z_critical = norm.ppf(1 - alpha/2)

# Create range for x-axis
x = np.linspace(-4, 4, 1000)
y = norm.pdf(x, 0, 1)  # standard normal curve

plt.figure(figsize=(10, 5))
plt.plot(x, y, label='Standard Normal Distribution', color='blue')

# Fill critical regions
plt.fill_between(x, y, where=(x <= -z_critical), color='red', alpha=0.5, label='Rejection Region (Left)')
plt.fill_between(x, y, where=(x >= z_critical), color='red', alpha=0.5, label='Rejection Region (Right)')

# Mark observed Z-score
plt.axvline(z_score, color='green', linestyle='--', linewidth=2, label=f'Observed Z = {z_score:.2f}')

# Add annotations
plt.title('Two-Tailed Z-Test with Rejection Regions')
plt.xlabel('Z-score')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()


In [None]:
# Create a Python function that calculates and visualizes Type 1 and Type 2 errors during hypothesis testing

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def visualize_type1_type2_errors(mu0=100, mu1=103, sigma=10, n=30, alpha=0.05):
    """
    Visualizes Type I and Type II errors for hypothesis testing.

    Parameters:
        mu0 (float): Mean under null hypothesis (H0)
        mu1 (float): Mean under alternative hypothesis (H1)
        sigma (float): Population standard deviation
        n (int): Sample size
        alpha (float): Significance level
    """
    # Standard error
    se = sigma / np.sqrt(n)

    # Critical value (two-tailed test)
    z_critical = norm.ppf(1 - alpha / 2)
    x_crit_low = mu0 - z_critical * se
    x_crit_high = mu0 + z_critical * se

    # Plotting range
    x = np.linspace(mu0 - 4*se, mu1 + 4*se, 1000)

    # Distributions under H0 and H1
    y0 = norm.pdf(x, mu0, se)
    y1 = norm.pdf(x, mu1, se)

    plt.figure(figsize=(10, 6))
    plt.plot(x, y0, label='H₀: μ = {}'.format(mu0), color='blue')
    plt.plot(x, y1, label='H₁: μ = {}'.format(mu1), color='orange')

    # Type I Error: Reject H₀ when it's true (Red)
    plt.fill_between(x, y0, where=(x < x_crit_low) | (x > x_crit_high), color='red', alpha=0.3, label='Type I Error (α)')

    # Type II Error: Fail to reject H₀ when H₁ is true (Purple)
    plt.fill_between(x, y1, where=(x > x_crit_low) & (x < x_crit_high), color='purple', alpha=0.3, label='Type II Error (β)')

    # Decision boundary
    plt.axvline(x_crit_low, color='black', linestyle='--')
    plt.axvline(x_crit_high, color='black', linestyle='--')

    plt.title('Type I and Type II Errors Visualization')
    plt.xlabel('Sample Mean')
    plt.ylabel('Probability Density')
    plt.legend()
    plt.grid(True)
    plt.show()

    # Calculating beta (Type II error probability)
    beta = norm.cdf(x_crit_high, mu1, se) - norm.cdf(x_crit_low, mu1, se)
    power = 1 - beta

    print(f"Critical values: {x_crit_low:.2f}, {x_crit_high:.2f}")
    print(f"Type I error rate (α): {alpha}")
    print(f"Type II error rate (β): {beta:.4f}")
    print(f"Power of the test (1 - β): {power:.4f}")

# Example usage:
visualize_type1_type2_errors()


In [None]:
# Write a Python program to perform an independent T-test and interpret the results

import numpy as np
from scipy.stats import ttest_ind

# Step 1: Define two independent sample datasets
group1 = [88, 92, 85, 91, 87, 90, 86]
group2 = [84, 83, 89, 81, 85, 87, 82]

# Step 2: Perform the independent t-test
t_stat, p_value = ttest_ind(group1, group2, equal_var=True)  # assume equal variances

# Step 3: Output results
print("Group 1 Mean:", np.mean(group1))
print("Group 2 Mean:", np.mean(group2))
print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")

# Step 4: Interpret the result
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between the two groups.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the two groups.")


In [None]:
# Perform a paired sample T-test using Python and visualize the comparison results

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import ttest_rel

# Step 1: Define paired sample data (before and after treatment, for example)
before = [88, 90, 85, 87, 86, 91, 89, 90, 88, 84]
after =  [91, 93, 86, 89, 88, 92, 90, 91, 90, 86]

# Step 2: Perform paired sample t-test
t_stat, p_value = ttest_rel(before, after)

# Step 3: Output results
print("Mean (Before):", np.mean(before))
print("Mean (After):", np.mean(after))
print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")

# Step 4: Interpret results
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between the paired samples.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the paired samples.")

# Step 5: Visualization - Line plot for individual changes
plt.figure(figsize=(12, 5))

# Line plot
plt.subplot(1, 2, 1)
for i in range(len(before)):
    plt.plot(['Before', 'After'], [before[i], after[i]], marker='o', linestyle='-', color='gray')
plt.title('Paired Sample Line Plot')
plt.ylabel('Scores')
plt.grid(True)

# Boxplot
plt.subplot(1, 2, 2)
plt.boxplot([before, after])
plt.xticks([1, 2], ['Before', 'After'])  # label via xticks; the `labels` keyword is deprecated in Matplotlib 3.9+
plt.title('Before vs After Comparison')
plt.ylabel('Scores')
plt.grid(True)

plt.tight_layout()
plt.show()


In [None]:
# Simulate data and perform both Z-test and T-test, then compare the results using Python

import numpy as np
from scipy.stats import norm, t

# Step 1: Simulate sample data
np.random.seed(42)
true_mean = 100
sample_size = 25
population_std = 10  # Known for Z-test
sample_data = np.random.normal(loc=102, scale=10, size=sample_size)

# Sample statistics
sample_mean = np.mean(sample_data)
sample_std = np.std(sample_data, ddof=1)  # ddof=1 for sample std dev

# Step 2: Perform Z-test
z_score = (sample_mean - true_mean) / (population_std / np.sqrt(sample_size))
z_p_value = 2 * (1 - norm.cdf(abs(z_score)))

# Step 3: Perform one-sample T-test
t_score = (sample_mean - true_mean) / (sample_std / np.sqrt(sample_size))
t_p_value = 2 * (1 - t.cdf(abs(t_score), df=sample_size - 1))

# Step 4: Output comparison
print("===== Sample Info =====")
print(f"Sample Mean     = {sample_mean:.2f}")
print(f"Sample Std Dev  = {sample_std:.2f}")
print(f"Sample Size     = {sample_size}\n")

print("===== Z-test Results (σ known) =====")
print(f"Z-score         = {z_score:.4f}")
print(f"P-value         = {z_p_value:.4f}")
print("Assumes known population std deviation.\n")

print("===== T-test Results (σ unknown) =====")
print(f"T-score         = {t_score:.4f}")
print(f"P-value         = {t_p_value:.4f}")
print("Uses sample std deviation.\n")

# Step 5: Interpretation
alpha = 0.05
z_result = "Reject H0" if z_p_value < alpha else "Fail to reject H0"
t_result = "Reject H0" if t_p_value < alpha else "Fail to reject H0"

print("===== Conclusion =====")
print(f"Z-test decision: {z_result}")
print(f"T-test decision: {t_result}")


In [None]:
# Write a Python function to calculate the confidence interval for a sample mean and explain its significance

import numpy as np
from scipy.stats import t

def confidence_interval(data, confidence=0.95):
    """
    Calculate the confidence interval for the sample mean.

    Parameters:
        data (list or array): Sample data
        confidence (float): Confidence level (default is 0.95 for 95%)

    Returns:
        (mean, lower_bound, upper_bound)
    """
    n = len(data)
    mean = np.mean(data)
    std_err = np.std(data, ddof=1) / np.sqrt(n)
    t_crit = t.ppf((1 + confidence) / 2, df=n - 1)

    margin_of_error = t_crit * std_err
    lower = mean - margin_of_error
    upper = mean + margin_of_error

    return mean, lower, upper

# Example usage
sample_data = [88, 92, 85, 91, 87, 90, 86, 89, 93, 88]
mean, lower, upper = confidence_interval(sample_data)

print(f"Sample Mean: {mean:.2f}")
print(f"95% Confidence Interval: ({lower:.2f}, {upper:.2f})")


In [None]:
# Write a Python program to calculate the margin of error for a given confidence level using sample data

import numpy as np
from scipy.stats import t

def calculate_margin_of_error(data, confidence=0.95):
    """
    Calculate the margin of error for the sample mean at a given confidence level.

    Parameters:
        data (list or array): Sample data
        confidence (float): Confidence level (default = 0.95)

    Returns:
        margin_of_error (float)
    """
    n = len(data)
    sample_std = np.std(data, ddof=1)  # sample standard deviation
    std_error = sample_std / np.sqrt(n)  # standard error of the mean
    t_critical = t.ppf((1 + confidence) / 2, df=n - 1)  # t-critical value
    margin_of_error = t_critical * std_error
    return margin_of_error

# Example usage
sample_data = [45, 50, 55, 52, 48, 49, 53, 51, 47, 54]
confidence_level = 0.95

moe = calculate_margin_of_error(sample_data, confidence_level)
print(f"Margin of Error at {int(confidence_level * 100)}% confidence: ±{moe:.2f}")


In [None]:
# Implement a Bayesian inference method using Bayes' Theorem in Python and explain the process

def bayes_theorem(prior_H, likelihood_E_given_H, likelihood_E_given_not_H):
    """
    Compute the posterior probability using Bayes' Theorem.

    Parameters:
        prior_H: Prior probability of hypothesis H
        likelihood_E_given_H: Likelihood of evidence given H is true
        likelihood_E_given_not_H: Likelihood of evidence given H is false

    Returns:
        posterior_H: Updated probability of H given E
    """
    prior_not_H = 1 - prior_H

    # Total probability of evidence
    marginal_E = (likelihood_E_given_H * prior_H) + (likelihood_E_given_not_H * prior_not_H)

    # Bayes' theorem
    posterior_H = (likelihood_E_given_H * prior_H) / marginal_E

    return posterior_H

# Example scenario:
# A medical test has 99% sensitivity (true positive rate) and a 5% false positive rate,
# but the disease prevalence in the population is only 1%.

prior = 0.01                        # P(Disease)
likelihood = 0.99                  # P(Positive Test | Disease)
false_positive = 0.05              # P(Positive Test | No Disease)

posterior = bayes_theorem(prior, likelihood, false_positive)

print(f"Probability of having the disease given a positive test result: {posterior:.4f}")


In [None]:
# Perform a Chi-square test for independence between two categorical variables in Python

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Step 1: Create a contingency table
# Example: Relationship between Gender and Preference
#          Preference
# Gender     A   B
#   M       20  30
#   F       25  25

data = [[20, 30],
        [25, 25]]
labels = ['Male', 'Female']
columns = ['Pref A', 'Pref B']

# Convert to DataFrame (optional for clarity)
table = pd.DataFrame(data, index=labels, columns=columns)
print("Contingency Table:")
print(table)

# Step 2: Perform the Chi-square test
chi2, p, dof, expected = chi2_contingency(data)

# Step 3: Output results
print("\nChi-square Test Results:")
print(f"Chi2 Statistic = {chi2:.4f}")
print(f"Degrees of Freedom = {dof}")
print(f"P-value = {p:.4f}")
print("\nExpected Frequencies:")
print(pd.DataFrame(expected, index=labels, columns=columns))

# Step 4: Interpretation
alpha = 0.05
if p < alpha:
    print("\nConclusion: Reject the null hypothesis (the variables appear to be associated).")
else:
    print("\nConclusion: Fail to reject the null hypothesis (no significant evidence of association).")


In [None]:
# Write a Python program to calculate the expected frequencies for a Chi-square test based on observed data

import numpy as np
import pandas as pd

# Step 1: Define observed data (contingency table)
# Example: Product preference by gender
observed = np.array([
    [20, 30],  # Male
    [25, 25]   # Female
])

row_labels = ['Male', 'Female']
col_labels = ['Preference A', 'Preference B']

# Step 2: Convert to DataFrame for clarity
observed_df = pd.DataFrame(observed, index=row_labels, columns=col_labels)

# Step 3: Calculate row totals, column totals, and grand total
row_totals = observed.sum(axis=1).reshape(-1, 1)
col_totals = observed.sum(axis=0).reshape(1, -1)
grand_total = observed.sum()

# Step 4: Calculate expected frequencies
expected = (row_totals @ col_totals) / grand_total

# Step 5: Display results
expected_df = pd.DataFrame(expected, index=row_labels, columns=col_labels)

print("Observed Frequencies:\n", observed_df)
print("\nExpected Frequencies:\n", expected_df.round(2))



In [None]:
# Perform a goodness-of-fit test using Python to compare the observed data to an expected distribution

import numpy as np
from scipy.stats import chisquare

# Step 1: Define observed and expected frequencies
# Example: Rolling a die 60 times — does it appear fair?
observed = np.array([8, 9, 10, 11, 12, 10])  # observed counts for sides 1 to 6
expected = np.array([10, 10, 10, 10, 10, 10])  # expected if die is fair

# Step 2: Perform Chi-square goodness-of-fit test
chi2_stat, p_value = chisquare(f_obs=observed, f_exp=expected)

# Step 3: Output results
print("Observed Frequencies:", observed)
print("Expected Frequencies:", expected)
print(f"\nChi-square Statistic: {chi2_stat:.4f}")
print(f"P-value: {p_value:.4f}")

# Step 4: Interpret the result
alpha = 0.05
if p_value < alpha:
    print("\nConclusion: Reject the null hypothesis (observed data does not fit the expected distribution).")
else:
    print("\nConclusion: Fail to reject the null hypothesis (observed data is consistent with the expected distribution).")


In [None]:
# Implement an F-test using Python to compare the variances of two random samples

import numpy as np
from scipy.stats import f

def f_test(sample1, sample2, alpha=0.05):
    """
    Perform a two-tailed F-test to compare variances of two samples.

    Parameters:
        sample1, sample2: Input arrays of sample data
        alpha: Significance level (default = 0.05)

    Returns:
        f_statistic, p_value, conclusion
    """
    # Sample variances
    var1 = np.var(sample1, ddof=1)
    var2 = np.var(sample2, ddof=1)

    # Degrees of freedom
    df1 = len(sample1) - 1
    df2 = len(sample2) - 1

    # Ensure the larger variance is numerator (F ≥ 1)
    if var1 > var2:
        f_stat = var1 / var2
        dfn, dfd = df1, df2
    else:
        f_stat = var2 / var1
        dfn, dfd = df2, df1

    # Calculate two-tailed p-value
    p_value = 2 * min(f.cdf(f_stat, dfn, dfd), 1 - f.cdf(f_stat, dfn, dfd))

    # Conclusion
    conclusion = "Reject null hypothesis: variances are significantly different." \
        if p_value < alpha else "Fail to reject null hypothesis: no significant difference in variances."

    return f_stat, p_value, conclusion

# Example usage
np.random.seed(42)
sample1 = np.random.normal(50, 10, 30)  # mean=50, std=10
sample2 = np.random.normal(50, 15, 30)  # mean=50, std=15

f_statistic, p_val, decision = f_test(sample1, sample2)

print(f"F-statistic: {f_statistic:.4f}")
print(f"P-value: {p_val:.4f}")
print(f"Conclusion: {decision}")


In [None]:
# Create a Python script to simulate and visualize the Chi-square distribution and discuss its characteristics

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2

# Step 1: Set parameters
df = 5  # degrees of freedom
x = np.linspace(0, 30, 500)
pdf_values = chi2.pdf(x, df)

# Step 2: Plot the Chi-square distribution
plt.figure(figsize=(10, 6))
plt.plot(x, pdf_values, label=f'Chi-square (df={df})', color='blue')
plt.fill_between(x, pdf_values, alpha=0.2)
plt.title(f'Chi-square Distribution with {df} Degrees of Freedom')
plt.xlabel('Chi-square Value')
plt.ylabel('Probability Density')
plt.grid(True)
plt.legend()
plt.show()


In [None]:
# Write a Python program to perform an ANOVA test to compare means between multiple groups and interpret the results

import numpy as np
from scipy.stats import f_oneway

# Step 1: Define sample data for 3 groups
group1 = [88, 92, 85, 91, 87]
group2 = [78, 74, 80, 76, 79]
group3 = [94, 96, 92, 95, 97]

# Step 2: Perform one-way ANOVA
f_stat, p_value = f_oneway(group1, group2, group3)

# Step 3: Output results
print("Means:")
print(f"Group 1 Mean: {np.mean(group1):.2f}")
print(f"Group 2 Mean: {np.mean(group2):.2f}")
print(f"Group 3 Mean: {np.mean(group3):.2f}\n")

print("ANOVA Test Result:")
print(f"F-statistic: {f_stat:.4f}")
print(f"P-value: {p_value:.4f}")

# Step 4: Interpret the result
alpha = 0.05
if p_value < alpha:
    print("\nConclusion: Reject the null hypothesis — at least one group mean is significantly different.")
else:
    print("\nConclusion: Fail to reject the null hypothesis — no significant difference in group means.")


In [None]:
# Perform a one-way ANOVA test using Python to compare the means of different groups and plot the results

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import f_oneway

# Step 1: Define sample data for 3 groups
group_A = [88, 92, 85, 91, 87]
group_B = [78, 74, 80, 76, 79]
group_C = [94, 96, 92, 95, 97]

groups = [group_A, group_B, group_C]
labels = ['Group A', 'Group B', 'Group C']

# Step 2: Perform one-way ANOVA
f_stat, p_value = f_oneway(group_A, group_B, group_C)

# Step 3: Print results
print("Group Means:")
for label, group in zip(labels, groups):
    print(f"{label}: {np.mean(group):.2f}")

print("\nANOVA Results:")
print(f"F-statistic: {f_stat:.4f}")
print(f"P-value: {p_value:.4f}")

alpha = 0.05
if p_value < alpha:
    print("\nConclusion: Reject H₀ — At least one group mean is significantly different.")
else:
    print("\nConclusion: Fail to reject H₀ — No significant difference in group means.")

# Step 4: Visualization - Boxplot
plt.figure(figsize=(8, 6))
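plt.boxplot(groups)
plt.xticks(range(1, len(labels) + 1), labels)  # label via xticks; the `labels` keyword is deprecated in Matplotlib 3.9+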
plt.title('Group Comparison Using One-Way ANOVA')
plt.ylabel('Values')
plt.grid(True)
plt.show()


In [None]:
# Write a Python function to check the assumptions (normality, independence, and equal variance) for ANOVA

import numpy as np
from scipy.stats import shapiro, levene
import pandas as pd

def check_anova_assumptions(*groups):
    """
    Checks ANOVA assumptions: normality and equal variance.

    Parameters:
        groups: Variable number of group arrays (lists or numpy arrays)

    Prints:
        Shapiro-Wilk test results for normality
        Levene's test result for equal variances
    """
    print("=== Checking Normality (Shapiro-Wilk Test) ===")
    for i, group in enumerate(groups, 1):
        stat, p = shapiro(group)
        print(f"Group {i}: W={stat:.4f}, p={p:.4f} -> {'Normal' if p > 0.05 else 'Not Normal'}")

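    print("\n=== Checking Equal Variance (Levene's Test) ===")
    levene_stat, levene_p = levene(*groups)
    print(f"Levene's Test: Stat={levene_stat:.4f}, p={levene_p:.4f} -> {'Equal variances' if levene_p > 0.05 else 'Unequal variances'}")

    # Independence cannot be verified by a statistical test on the values alone;
    # it has to follow from the study design (e.g., random sampling or assignment).
    print("\nIndependence: not testable here; confirm it from the study design.")

# Example usage (same illustrative groups as the one-way ANOVA cells)
check_anova_assumptions(
    [88, 92, 85, 91, 87],
    [78, 74, 80, 76, 79],
    [94, 96, 92, 95, 97],
)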


In [None]:
# Perform a two-way ANOVA test using Python to study the interaction between two factors and visualize the results
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Step 1: Simulate data for two factors (e.g., Drug and Gender)
np.random.seed(42)

# Factors
gender = ['Male', 'Female']
drug = ['A', 'B']

# Create a DataFrame
data = []
for g in gender:
    for d in drug:
        for _ in range(10):  # 10 observations per group
            effect = 5 if g == 'Female' else 0
            treatment = 3 if d == 'B' else 0
            interaction = 2 if (g == 'Female' and d == 'B') else 0
            score = 50 + effect + treatment + interaction + np.random.normal(0, 2)
            data.append([g, d, score])

df = pd.DataFrame(data, columns=['Gender', 'Drug', 'Score'])

# Step 2: Perform two-way ANOVA
model = ols('Score ~ C(Gender) + C(Drug) + C(Gender):C(Drug)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

print("=== Two-Way ANOVA Results ===")
print(anova_table)

# Step 3: Visualization
plt.figure(figsize=(8, 6))
sns.boxplot(x='Drug', y='Score', hue='Gender', data=df)
plt.title('Two-Way ANOVA: Drug and Gender Interaction')
plt.grid(True)
plt.show()


In [None]:
# Write a Python program to visualize the F-distribution and discuss its use in hypothesis testing

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import f

# Step 1: Define degrees of freedom for numerator and denominator
dfn = 5   # Degrees of freedom for the numerator
dfd = 20  # Degrees of freedom for the denominator

# Step 2: Generate F-distribution values
x = np.linspace(0, 5, 500)
pdf = f.pdf(x, dfn, dfd)

# Step 3: Plot the F-distribution
plt.figure(figsize=(10, 6))
plt.plot(x, pdf, 'b-', label=f'F-distribution (dfn={dfn}, dfd={dfd})')
plt.fill_between(x, pdf, color='skyblue', alpha=0.4)
plt.title('F-Distribution')
plt.xlabel('F-value')
plt.ylabel('Probability Density')
plt.grid(True)

# Step 4: Add critical region for alpha = 0.05
alpha = 0.05
f_critical = f.ppf(1 - alpha, dfn, dfd)
plt.axvline(f_critical, color='red', linestyle='--', label=f'Critical F-value (α={alpha}) ≈ {f_critical:.2f}')
plt.fill_between(x[x > f_critical], pdf[x > f_critical], color='red', alpha=0.3)

plt.legend()
plt.show()


In [None]:
# Perform a one-way ANOVA test in Python and visualize the results with boxplots to compare group means

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import f_oneway
import pandas as pd

# Step 1: Create sample data for three groups
group1 = [88, 92, 85, 91, 87]
group2 = [78, 74, 80, 76, 79]
group3 = [94, 96, 92, 95, 97]

# Combine data into a DataFrame for visualization
data = {
    'Score': group1 + group2 + group3,
    'Group': ['Group 1'] * len(group1) + ['Group 2'] * len(group2) + ['Group 3'] * len(group3)
}
df = pd.DataFrame(data)

# Step 2: Perform one-way ANOVA
f_stat, p_value = f_oneway(group1, group2, group3)

# Step 3: Print results
print("=== One-Way ANOVA Test ===")
print(f"F-statistic: {f_stat:.4f}")
print(f"P-value: {p_value:.4f}")
if p_value < 0.05:
    print("Conclusion: Reject H₀ — At least one group mean is significantly different.")
else:
    print("Conclusion: Fail to reject H₀ — No significant difference in group means.")

# Step 4: Visualize with boxplots
plt.figure(figsize=(8, 6))
sns.boxplot(x='Group', y='Score', data=df, palette="Set2")
plt.title('Group Comparison using One-Way ANOVA')
plt.xlabel('Group')
plt.ylabel('Score')
plt.grid(True)
plt.show()


In [None]:
# Simulate random data from a normal distribution, then perform hypothesis testing to evaluate the means

import numpy as np
from scipy.stats import ttest_ind
import matplotlib.pyplot as plt
import seaborn as sns

# Step 1: Simulate random data from normal distributions
np.random.seed(42)
sample1 = np.random.normal(loc=50, scale=5, size=100)  # Mean=50, SD=5
sample2 = np.random.normal(loc=52, scale=5, size=100)  # Mean=52, SD=5

# Step 2: Perform independent t-test
t_stat, p_value = ttest_ind(sample1, sample2)

# Step 3: Print results
print("=== Independent Two-Sample T-Test ===")
print(f"Sample 1 Mean: {np.mean(sample1):.2f}")
print(f"Sample 2 Mean: {np.mean(sample2):.2f}")
print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")

alpha = 0.05
if p_value < alpha:
    print("Conclusion: Reject H₀ — Means are significantly different.")
else:
    print("Conclusion: Fail to reject H₀ — No significant difference in means.")

# Step 4: Visualize distributions
plt.figure(figsize=(8, 5))
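# Labels give the population means used in the simulation; dashed lines mark the sample means
sns.histplot(sample1, kde=True, label='Sample 1 (population mean=50)', color='blue', stat="density", bins=20)
sns.histplot(sample2, kde=True, label='Sample 2 (population mean=52)', color='orange', stat="density", bins=20)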
plt.axvline(np.mean(sample1), color='blue', linestyle='--')
plt.axvline(np.mean(sample2), color='orange', linestyle='--')
plt.title('Comparison of Two Normal Distributions')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()
plt.grid(True)
plt.show()


In [None]:
# Perform a hypothesis test for population variance using a Chi-square distribution and interpret the results

import numpy as np
from scipy.stats import chi2

# Step 1: Sample data and parameters
np.random.seed(42)
sample = np.random.normal(loc=100, scale=10, size=30)  # Population std=10

# Hypothesized population variance
sigma0_squared = 100  # i.e., std = 10

# Step 2: Calculate sample variance
n = len(sample)
sample_var = np.var(sample, ddof=1)

# Step 3: Compute Chi-square statistic
chi2_stat = (n - 1) * sample_var / sigma0_squared

# Step 4: Compute critical values for two-tailed test
alpha = 0.05
chi2_lower = chi2.ppf(alpha / 2, df=n - 1)
chi2_upper = chi2.ppf(1 - alpha / 2, df=n - 1)

# Step 5: Output results
print("=== Chi-square Test for Population Variance ===")
print(f"Sample Variance: {sample_var:.4f}")
print(f"Chi-square Statistic: {chi2_stat:.4f}")
print(f"Degrees of Freedom: {n - 1}")
print(f"Critical Values: Lower={chi2_lower:.4f}, Upper={chi2_upper:.4f}")

# Step 6: Interpret the result
if chi2_stat < chi2_lower or chi2_stat > chi2_upper:
    print("Conclusion: Reject H₀ — Variance is significantly different from hypothesized.")
else:
    print("Conclusion: Fail to reject H₀ — No significant difference in variance.")
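
The critical-value decision above can also be expressed as a P-value. The sketch below is one common way to do this for a two-tailed variance test: take the smaller tail probability and double it. The function name `chi2_variance_test` and the simulated sample are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import chi2

def chi2_variance_test(sample, sigma0_squared):
    """Two-tailed chi-square test for a population variance (illustrative helper)."""
    sample = np.asarray(sample, dtype=float)
    df = sample.size - 1
    stat = df * sample.var(ddof=1) / sigma0_squared
    # Equal-tailed P-value: double the smaller of the two tail probabilities
    p_value = 2 * min(chi2.cdf(stat, df), chi2.sf(stat, df))
    return stat, p_value

rng = np.random.default_rng(0)
sample = rng.normal(loc=100, scale=10, size=30)

stat, p_value = chi2_variance_test(sample, sigma0_squared=100)
print(f"Chi-square Statistic: {stat:.4f}")
print(f"P-value: {p_value:.4f}")
```

With this definition, `p_value < alpha` gives the same decision as comparing the statistic with the lower and upper critical values at `alpha / 2` and `1 - alpha / 2`.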


In [None]:
# Write a Python script to perform a Z-test for comparing proportions between two datasets or groups

import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Step 1: Define successes and observations for two groups
# Example: Group A: 40 successes out of 100
#          Group B: 55 successes out of 120

successes = np.array([40, 55])
n_obs = np.array([100, 120])

# Step 2: Perform the Z-test
z_stat, p_value = proportions_ztest(count=successes, nobs=n_obs)

# Step 3: Output the results
print("=== Z-Test for Two Proportions ===")
print(f"Group A: {successes[0]}/{n_obs[0]} = {successes[0]/n_obs[0]:.3f}")
print(f"Group B: {successes[1]}/{n_obs[1]} = {successes[1]/n_obs[1]:.3f}")
print(f"\nZ-statistic: {z_stat:.4f}")
print(f"P-value: {p_value:.4f}")

# Step 4: Interpret the result
alpha = 0.05
if p_value < alpha:
    print("Conclusion: Reject H₀ — The proportions are significantly different.")
else:
    print("Conclusion: Fail to reject H₀ — No significant difference in proportions.")
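
For reference, the Z-statistic for two proportions can be computed by hand with a pooled proportion. The sketch below uses only NumPy and SciPy. The function name `two_proportion_z` is hypothetical, and the result should match `proportions_ztest` only under the assumption that it is called with its default pooled-variance, two-sided settings.

```python
import numpy as np
from scipy.stats import norm

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion Z-test, two-sided (illustrative helper)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Under H0 both groups share one proportion, estimated from the pooled data
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * norm.sf(abs(z))  # two-sided P-value
    return z, p_value

# Same counts as above: Group A 40/100, Group B 55/120
z_manual, p_manual = two_proportion_z(40, 100, 55, 120)
print(f"Z-statistic: {z_manual:.4f}")
print(f"P-value: {p_manual:.4f}")
```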


In [None]:
# Implement an F-test for comparing the variances of two datasets, then interpret and visualize the results

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import f

# Step 1: Generate two random samples
np.random.seed(42)
sample1 = np.random.normal(loc=50, scale=10, size=30)  # std=10
sample2 = np.random.normal(loc=50, scale=15, size=30)  # std=15

# Step 2: Calculate sample variances
var1 = np.var(sample1, ddof=1)
var2 = np.var(sample2, ddof=1)

# Ensure F >= 1 by setting larger variance in numerator
if var1 > var2:
    f_stat = var1 / var2
    dfn, dfd = len(sample1) - 1, len(sample2) - 1
else:
    f_stat = var2 / var1
    dfn, dfd = len(sample2) - 1, len(sample1) - 1

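# Step 3: Set significance level and compute critical values (two-tailed test)
# Because the larger variance is in the numerator, F >= 1, so in practice only
# the upper critical value can be exceeded; the lower one is kept for the plot.
alpha = 0.05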
f_critical_low = f.ppf(alpha / 2, dfn, dfd)
f_critical_high = f.ppf(1 - alpha / 2, dfn, dfd)

# Step 4: Output test result
print("=== F-Test for Equality of Variances ===")
print(f"Sample 1 Variance: {var1:.4f}")
print(f"Sample 2 Variance: {var2:.4f}")
print(f"F-statistic: {f_stat:.4f}")
print(f"Degrees of Freedom: df1={dfn}, df2={dfd}")
print(f"Critical Values: Lower={f_critical_low:.4f}, Upper={f_critical_high:.4f}")

if f_stat < f_critical_low or f_stat > f_critical_high:
    print("Conclusion: Reject H₀ — The variances are significantly different.")
else:
    print("Conclusion: Fail to reject H₀ — No significant difference in variances.")

# Step 5: Visualization of F-distribution
x = np.linspace(0.1, 5, 500)
y = f.pdf(x, dfn, dfd)

plt.figure(figsize=(10, 6))
plt.plot(x, y, 'b-', label=f'F-distribution (df1={dfn}, df2={dfd})')
plt.fill_between(x, y, where=(x < f_critical_low) | (x > f_critical_high), color='red', alpha=0.3, label='Rejection region')
plt.axvline(f_stat, color='green', linestyle='--', label=f'F-statistic = {f_stat:.2f}')
plt.title("F-Test for Comparing Two Variances")
plt.xlabel("F-value")
plt.ylabel("Probability Density")
plt.legend()
plt.grid(True)
plt.show()
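
As an alternative to critical values, the F-test can report a two-sided P-value without first forcing the larger variance into the numerator. The sketch below doubles the smaller tail probability. The name `f_test_two_sided` and the simulated samples are assumptions for illustration, and the test itself assumes both samples come from normal populations.

```python
import numpy as np
from scipy.stats import f

def f_test_two_sided(a, b):
    """Two-sided F-test for equality of variances (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    f_stat = a.var(ddof=1) / b.var(ddof=1)
    dfn, dfd = a.size - 1, b.size - 1
    # Double the smaller tail so the result does not depend on sample order
    p_value = 2 * min(f.cdf(f_stat, dfn, dfd), f.sf(f_stat, dfn, dfd))
    return f_stat, p_value

rng = np.random.default_rng(0)
x1 = rng.normal(loc=50, scale=10, size=30)
x2 = rng.normal(loc=50, scale=15, size=30)

f_ab, p_ab = f_test_two_sided(x1, x2)
f_ba, p_ba = f_test_two_sided(x2, x1)
print(f"F (x1/x2): {f_ab:.4f}, P-value: {p_ab:.4f}")
print(f"F (x2/x1): {f_ba:.4f}, P-value: {p_ba:.4f}")
```

Swapping the samples inverts the F-statistic but leaves the P-value unchanged, which is why the numerator choice does not affect the conclusion.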


In [None]:
# Perform a Chi-square test for goodness of fit with simulated data and analyze the results

import numpy as np
from scipy.stats import chisquare
import matplotlib.pyplot as plt

# Step 1: Simulate observed data
# Example: 6 categories (like dice faces), not equally likely
np.random.seed(42)
observed = np.random.multinomial(120, [0.15, 0.20, 0.20, 0.10, 0.15, 0.20])

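# Step 2: Define expected frequencies (uniform or custom)
# H₀: all categories are equally likely, so each expected count is total / k.
# np.full returns a float array; np.full_like would inherit the integer dtype of
# `observed` and silently truncate a non-integer expected count.
expected = np.full(len(observed), observed.sum() / len(observed))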

# Step 3: Perform the Chi-square goodness-of-fit test
chi2_stat, p_value = chisquare(f_obs=observed, f_exp=expected)

# Step 4: Output the results
print("=== Chi-square Goodness-of-Fit Test ===")
print(f"Observed: {observed}")
print(f"Expected: {expected}")
print(f"Chi-square Statistic: {chi2_stat:.4f}")
print(f"P-value: {p_value:.4f}")

alpha = 0.05
if p_value < alpha:
    print("Conclusion: Reject H₀ — The observed distribution significantly differs from expected.")
else:
    print("Conclusion: Fail to reject H₀ — No significant difference from the expected distribution.")

# Step 5: Visualize the observed vs. expected frequencies
categories = [f'Cat {i+1}' for i in range(len(observed))]
x = np.arange(len(categories))

plt.figure(figsize=(8, 6))
plt.bar(x - 0.2, observed, width=0.4, label='Observed', color='skyblue')
plt.bar(x + 0.2, expected, width=0.4, label='Expected', color='orange')
plt.xticks(x, categories)
plt.ylabel("Frequency")
plt.title("Chi-square Goodness-of-Fit: Observed vs Expected")
plt.legend()
plt.grid(True, axis='y')
plt.tight_layout()
plt.show()
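
The goodness-of-fit statistic is the sum of (observed − expected)² / expected over all categories. The sketch below computes it by hand and compares it with `chisquare`. The observed counts are hypothetical values chosen for illustration, not output from the simulation above.

```python
import numpy as np
from scipy.stats import chi2, chisquare

# Hypothetical observed counts for 6 categories (total = 120)
observed = np.array([18, 24, 25, 12, 17, 24])
# Uniform expectation under H0: 120 / 6 = 20 per category
expected = np.full(observed.size, observed.sum() / observed.size)

# Manual statistic and upper-tail P-value with k - 1 degrees of freedom
chi2_manual = ((observed - expected) ** 2 / expected).sum()
df = observed.size - 1
p_manual = chi2.sf(chi2_manual, df)

chi2_scipy, p_scipy = chisquare(f_obs=observed, f_exp=expected)
print(f"Manual: chi2={chi2_manual:.4f}, p={p_manual:.4f}")
print(f"SciPy:  chi2={chi2_scipy:.4f}, p={p_scipy:.4f}")
```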
