1. Types of Data (Qualitative & Quantitative)
Qualitative (Categorical): Describes qualities or categories.

Examples: Eye color, type of car, gender.

Quantitative (Numerical): Measurable quantities.

Examples: Height, weight, temperature.

Scales:

Nominal: Categories without order (e.g., colors).

Ordinal: Categories with order (e.g., ranks).

Interval: Ordered, equal intervals, no true zero (e.g., temperature in °C).

Ratio: Like interval, but with a true zero (e.g., weight, height).

2. Measures of Central Tendency
Mean: Arithmetic average; best for roughly symmetric data without extreme outliers, because every value (including outliers) pulls on it.

Median: Middle value; useful for skewed data.

Mode: Most frequent value; good for categorical data.
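
A small sketch of how the three measures react to a skewed sample. The income values are made up for illustration, and only the standard-library statistics module is used.

```python
import statistics

# Hypothetical right-skewed sample: one large value pulls the mean upward.
incomes = [30, 32, 35, 35, 38, 40, 200]

mean_value = statistics.mean(incomes)      # sensitive to the outlier
median_value = statistics.median(incomes)  # middle value, robust to the outlier
mode_value = statistics.mode(incomes)      # most frequent value

print(f"Mean: {mean_value:.2f}")    # 58.57
print(f"Median: {median_value}")    # 35
print(f"Mode: {mode_value}")        # 35
```

The single value of 200 moves the mean well above the median, which is why the median is usually preferred for skewed data.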

3. Dispersion
Describes data spread.

Variance: Average of squared deviations from the mean (the sample variance divides by n - 1 instead of n).

Standard Deviation (SD): Square root of variance; more interpretable.
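
A minimal sketch of the population versus sample versions of these measures, using the standard-library statistics module. The data values are illustrative.

```python
import statistics

data = [2, 4, 6, 8]

pop_var = statistics.pvariance(data)    # divides by n
sample_var = statistics.variance(data)  # divides by n - 1
pop_sd = statistics.pstdev(data)        # square root of pop_var
sample_sd = statistics.stdev(data)      # square root of sample_var

print(f"Population variance: {pop_var:.4f}, SD: {pop_sd:.4f}")
print(f"Sample variance:     {sample_var:.4f}, SD: {sample_sd:.4f}")
```

The sample versions are always slightly larger, and the gap shrinks as n grows.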

4. Box Plot
Visual representation showing:

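Median, quartiles (Q1 and Q3), whiskers, and outliers. Whiskers usually extend to the most extreme points within 1.5×IQR of the box; some tools draw them to the min and max instead.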

Helps detect skewness and outliers.

5. Random Sampling
Ensures each individual has an equal chance of selection, helping generalize results to the entire population.
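
A quick sketch of simple random sampling without replacement. The population of IDs 1 to 100 and the sample size are assumptions made for illustration.

```python
import random

random.seed(1)  # fixed seed so the draw is repeatable

population = list(range(1, 101))           # hypothetical IDs 1..100
sample = random.sample(population, k=10)   # without replacement; every ID equally likely

print("Sampled IDs:", sorted(sample))
```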

6. Skewness
Indicates data asymmetry.

Positive skew: Tail on the right.

Negative skew: Tail on the left.
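Skewness pulls the mean toward the tail, so in a right-skewed distribution the mean is typically larger than the median (and smaller in a left-skewed one).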
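
A rough sketch of how the sign of skewness can be computed. It uses the moment-based coefficient (third central moment divided by the 1.5 power of the second), which is one common definition; the helper name and the data are illustrative.

```python
def sample_skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 3, 10]        # long tail on the right
left_skewed = [-v for v in right_skewed]     # mirror image: tail on the left

print(f"Right-skewed sample: {sample_skewness(right_skewed):.3f}")  # positive
print(f"Left-skewed sample:  {sample_skewness(left_skewed):.3f}")   # negative
```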

7. Interquartile Range (IQR)
IQR = Q3 - Q1

Detects outliers (data points < Q1 - 1.5×IQR or > Q3 + 1.5×IQR)
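
The rule above can be sketched as follows. Quartile conventions differ between tools; this example assumes the "inclusive" method of statistics.quantiles (linear interpolation), and the data are made up.

```python
import statistics

data = [7, 8, 9, 10, 11, 12, 13, 14, 15, 40]

# Quartiles via linear interpolation ("inclusive" method)
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fences at 1.5 x IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"Fences: [{lower_fence}, {upper_fence}]")
print("Outliers:", outliers)  # [40]
```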

8. Binomial Distribution Conditions
Fixed number of trials

Two outcomes (success/failure)

Constant probability

Independent trials
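
When these four conditions hold, the probability of k successes in n trials is C(n, k) × p^k × (1 - p)^(n - k). A minimal sketch, with n = 10 and p = 0.5 chosen only as an example (e.g., fair coin flips):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.5  # illustrative values
prob_3 = binomial_pmf(3, n, p)
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))

print(f"P(X = 3) = {prob_3:.4f}")       # 0.1172
print(f"Sum over all k = {total:.4f}")  # 1.0000
```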

9. Normal Distribution & Empirical Rule
Bell-shaped, symmetric curve

68-95-99.7 rule:

68% within 1 SD

95% within 2 SDs

99.7% within 3 SDs
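
The three percentages can be checked against the standard normal CDF. A short sketch using statistics.NormalDist from the standard library:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Probability mass within k standard deviations of the mean
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, prob in coverage.items():
    print(f"Within {k} SD: {prob:.4f}")
```

The exact values are about 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.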

10. Poisson Process Example
Call center receiving calls per minute
If λ = 5 calls/min:
P(X=3) = (e^(-5) × 5^3) / 3! ≈ 0.1404
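
The same calculation as a small sketch. The helper name poisson_pmf is illustrative; scipy.stats.poisson.pmf would be a library alternative.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with rate lam."""
    return exp(-lam) * lam ** k / factorial(k)

p3 = poisson_pmf(3, 5)  # 3 calls in a minute when the average is 5
print(f"P(X = 3) = {p3:.4f}")  # 0.1404
```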

11. Random Variables
Discrete: Takes countable values (e.g., number of children)

Continuous: Can take any value in an interval (e.g., height)

12. Variance & Correlation Example
import numpy as np

x = [2, 4, 6, 8]
y = [1, 3, 5, 7]

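# np.var defaults to the population variance (ddof=0); np.cov uses the sample formula (ddof=1)
print("Population variance of x:", np.var(x))
print("Sample variance of x:", np.var(x, ddof=1))
print("Covariance matrix:\n", np.cov(x, y))
print("Correlation coefficient:", np.corrcoef(x, y)[0, 1])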
Theoretical Questions on Hypothesis Testing

1. What is hypothesis testing in statistics?
It is a method to make inferences about a population using sample data. It involves testing an assumption (hypothesis) to determine if there's enough evidence to reject it.

2. What is the null hypothesis, and how does it differ from the alternative hypothesis?
Null Hypothesis (H₀): Assumes no effect or difference.

Alternative Hypothesis (H₁): Assumes an effect or difference exists.

3. What is the significance level in hypothesis testing, and why is it important?
Significance Level (α): Probability of rejecting H₀ when it's true (Type I error).

Common values: 0.05, 0.01.

4–5. What does a P-value represent, and how do you interpret it?
P-value: Probability of observing results at least as extreme as those in the sample, assuming H₀ is true.

Interpretation:

If P ≤ α → reject H₀

If P > α → fail to reject H₀

6. Type I and Type II Errors
Type I Error (False Positive): Rejecting true H₀.

Type II Error (False Negative): Failing to reject false H₀.
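
Both error rates can be estimated by simulation. The sketch below assumes a two-tailed Z-test with made-up parameters (H₀ mean 100, σ = 15, n = 30, α = 0.05) and a hypothetical true mean of 105 for the Type II case.

```python
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is repeatable

MU0, SIGMA, N, ALPHA = 100, 15, 30, 0.05  # hypothetical test setup
TRIALS = 5000
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-tailed critical value

def rejects_h0(true_mean):
    """Draw one sample from N(true_mean, SIGMA) and run the Z-test against MU0."""
    sample = [random.gauss(true_mean, SIGMA) for _ in range(N)]
    z = (sum(sample) / N - MU0) / (SIGMA / N ** 0.5)
    return abs(z) > z_crit

# H0 true: every rejection is a Type I error
type1_rate = sum(rejects_h0(MU0) for _ in range(TRIALS)) / TRIALS
# H0 false (true mean is MU0 + 5): every non-rejection is a Type II error
type2_rate = sum(not rejects_h0(MU0 + 5) for _ in range(TRIALS)) / TRIALS

print(f"Estimated Type I error rate:  {type1_rate:.3f}")
print(f"Estimated Type II error rate: {type2_rate:.3f}")
```

The Type I rate should land close to α, while the Type II rate depends on how far the true mean is from the H₀ value.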

7. One-tailed vs. Two-tailed Tests
One-tailed: Tests for direction (e.g., greater than).

Two-tailed: Tests for any difference (≠).
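
The choice of tails changes the p-value for the same statistic. A short sketch with an assumed Z statistic of 1.8 and α = 0.05:

```python
from statistics import NormalDist

z = 1.8        # illustrative test statistic
alpha = 0.05
std_normal = NormalDist()

p_one_tailed = 1 - std_normal.cdf(z)             # H1: mean is greater
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))  # H1: mean is different

print(f"One-tailed p-value: {p_one_tailed:.4f}")
print(f"Two-tailed p-value: {p_two_tailed:.4f}")
print("One-tailed decision:", "reject H0" if p_one_tailed <= alpha else "fail to reject H0")
print("Two-tailed decision:", "reject H0" if p_two_tailed <= alpha else "fail to reject H0")
```

Here the one-tailed test rejects H₀ and the two-tailed test does not, so the direction of the test should be fixed before looking at the data.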

8. What is the Z-test, and when is it used?
Used when:

Sample size ≥ 30

Population variance known

Data is approximately normal

9. How to calculate the Z-score?
Z = (X̄ – μ) / (σ / √n)

Measures how far a sample mean is from population mean.

10. What is the T-distribution and when should it be used?
Used when:

Sample size < 30

Population standard deviation unknown
It has heavier tails than the normal distribution and approaches it as the sample size grows.
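
The heavier tails show up as larger critical values. A small sketch comparing two-sided 95% critical values from scipy.stats; the degrees of freedom listed are arbitrary examples.

```python
from scipy.stats import norm, t

confidence = 0.95
tail = 1 - (1 - confidence) / 2  # 0.975 for a two-sided 95% level

z_crit = norm.ppf(tail)
t_crits = {df: t.ppf(tail, df) for df in (5, 10, 30, 100)}

print(f"Normal critical value: {z_crit:.4f}")
for df, crit in t_crits.items():
    print(f"t critical value (df={df}): {crit:.4f}")
```

The t critical values are always above the normal one and shrink toward it as the degrees of freedom increase.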

11. Difference between Z-test and T-test
Z-test: Known variance, large sample.

T-test: Unknown variance, small sample.

12. What is the T-test, and how is it used?
Used to compare means:

One-sample: Sample vs population

Two-sample: Between two independent samples

Paired: Before and after same subjects

13. Relationship between Z-test and T-test
Both compare means, but T-test adjusts for small sample uncertainty using T-distribution.

14. Confidence Interval (CI)
Range of values likely to contain the true parameter.
E.g., "We are 95% confident that the true mean lies between X and Y."

15. Margin of Error
Half-width of confidence interval.
Affected by:

Sample size

Variability

Confidence level

16. Bayes' Theorem
Used to update probability based on new evidence.
In statistics, it enables Bayesian inference:
P(A|B) = [P(B|A) × P(A)] / P(B)

17. Chi-square Distribution
Used for:

Goodness-of-fit

Independence test in contingency tables

18. Chi-square Goodness-of-fit Test
Tests if sample data fits a specific distribution.

19. What is the F-distribution?
Used to compare variances of two populations. Basis of ANOVA.

20. What is ANOVA?
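Analysis of Variance: compares means across two or more groups (typically three or more; with exactly two groups it gives the same result as a pooled two-sample t-test).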

One-way: One factor

Two-way: Two factors + interaction

21. Different Types of ANOVA
One-way ANOVA

Two-way ANOVA

Repeated Measures ANOVA

22. F-test vs ANOVA
F-test compares two variances. ANOVA uses F-statistic to compare means across groups.
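
One way to see how ANOVA's F-statistic relates to the t-test: with only two groups, the one-way ANOVA F equals the square of the pooled two-sample t statistic, and the p-values match. A sketch with made-up group values:

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind

# Illustrative data for two independent groups
group_a = np.array([23, 21, 19, 24, 22, 20, 25])
group_b = np.array([30, 29, 28, 35, 32, 31, 33])

t_stat, p_t = ttest_ind(group_a, group_b)  # pooled variance (equal_var=True by default)
f_stat, p_f = f_oneway(group_a, group_b)   # one-way ANOVA on the same two groups

print(f"t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
print(f"t-test p-value = {p_t:.6g}, ANOVA p-value = {p_f:.6g}")
```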

In [1]:
# Practical questions

#1
import math
from scipy.stats import norm

def z_test(sample_mean, population_mean, population_std, sample_size, alpha=0.05):
    """
    Perform a one-sample Z-test.

    Parameters:
    - sample_mean: Mean of the sample
    - population_mean: Known population mean to compare against
    - population_std: Known population standard deviation
    - sample_size: Number of observations in the sample
    - alpha: Significance level (default 0.05)

    Returns:
    - z_score: Computed Z statistic
    - p_value: Two-tailed p-value for the test
    - interpretation: Conclusion based on the p-value and alpha
    """
    # Calculate standard error
    standard_error = population_std / math.sqrt(sample_size)

    # Calculate Z-score
    z_score = (sample_mean - population_mean) / standard_error

    # Calculate two-tailed p-value
    p_value = 2 * (1 - norm.cdf(abs(z_score)))

    # Interpret the result
    if p_value < alpha:
        interpretation = f"Reject the null hypothesis at α={alpha}. There is a significant difference."
    else:
        interpretation = f"Fail to reject the null hypothesis at α={alpha}. No significant difference found."

    return z_score, p_value, interpretation

# Example usage
sample_mean = 105
population_mean = 100
population_std = 15
sample_size = 50
alpha = 0.05

z, p, result = z_test(sample_mean, population_mean, population_std, sample_size, alpha)
print(f"Z-score: {z:.4f}")
print(f"P-value: {p:.4f}")
print(result)


Z-score: 2.3570
P-value: 0.0184
Reject the null hypothesis at α=0.05. There is a significant difference.


In [2]:
#2
import numpy as np
import math
from scipy.stats import norm

# Simulate random data
np.random.seed(42)  # For reproducibility
sample_size = 100
population_mean = 50
population_std = 10

# Generate sample data with a true mean slightly different than population_mean
true_sample_mean = 52
sample_data = np.random.normal(loc=true_sample_mean, scale=population_std, size=sample_size)

# Calculate sample statistics
sample_mean = np.mean(sample_data)

# Function to perform Z-test
def z_test(sample_mean, population_mean, population_std, sample_size, alpha=0.05):
    standard_error = population_std / math.sqrt(sample_size)
    z_score = (sample_mean - population_mean) / standard_error
    p_value = 2 * (1 - norm.cdf(abs(z_score)))  # two-tailed test
    return z_score, p_value

z_score, p_value = z_test(sample_mean, population_mean, population_std, sample_size)

print(f"Sample Mean: {sample_mean:.2f}")
print(f"Z-score: {z_score:.4f}")
print(f"P-value: {p_value:.4f}")

if p_value < 0.05:
    print("Reject the null hypothesis: sample mean is significantly different from the population mean.")
else:
    print("Fail to reject the null hypothesis: no significant difference found.")


Sample Mean: 50.96
Z-score: 0.9615
P-value: 0.3363
Fail to reject the null hypothesis: no significant difference found.


In [None]:
#3
import math
from scipy.stats import norm

def one_sample_z_test(sample_mean, population_mean, population_std, sample_size, alpha=0.05):
    """
    Perform a one-sample Z-test.

    Parameters:
    - sample_mean: Mean of the sample data
    - population_mean: Known population mean to compare with
    - population_std: Known population standard deviation
    - sample_size: Number of observations in the sample
    - alpha: Significance level (default 0.05)

    Returns:
    - z_stat: Z statistic
    - p_value: Two-tailed p-value
    - conclusion: Test conclusion string
    """
    # Calculate the standard error
    standard_error = population_std / math.sqrt(sample_size)

    # Calculate the Z statistic
    z_stat = (sample_mean - population_mean) / standard_error

    # Calculate the two-tailed p-value
    p_value = 2 * (1 - norm.cdf(abs(z_stat)))

    # Interpret the result
    if p_value < alpha:
        conclusion = f"Reject the null hypothesis (p = {p_value:.4f} < {alpha}). Significant difference found."
    else:
        conclusion = f"Fail to reject the null hypothesis (p = {p_value:.4f} >= {alpha}). No significant difference found."

    return z_stat, p_value, conclusion

# Example values
sample_mean = 103
population_mean = 100
population_std = 15
sample_size = 30

z_stat, p_val, conclusion = one_sample_z_test(sample_mean, population_mean, population_std, sample_size)

print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_val:.4f}")
print(conclusion)


In [3]:
#4
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import math

def two_tailed_z_test(sample_mean, population_mean, population_std, sample_size, alpha=0.05):
    # Calculate standard error
    standard_error = population_std / math.sqrt(sample_size)

    # Calculate Z-score
    z_score = (sample_mean - population_mean) / standard_error

    # Calculate two-tailed p-value
    p_value = 2 * (1 - norm.cdf(abs(z_score)))

    # Critical z-value for two-tailed test
    z_critical = norm.ppf(1 - alpha / 2)

    # Determine if we reject null hypothesis
    reject_null = abs(z_score) > z_critical

    return z_score, p_value, z_critical, reject_null

# Example data
sample_mean = 105
population_mean = 100
population_std = 15
sample_size = 40
alpha = 0.05

z_score, p_value, z_critical, reject_null = two_tailed_z_test(sample_mean, population_mean, population_std, sample_size, alpha)

print(f"Z-score: {z_score:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Critical Z-value: ±{z_critical:.4f}")
print("Reject Null Hypothesis" if reject_null else "Fail to Reject Null Hypothesis")

# Plotting
x = np.linspace(-4, 4, 1000)
y = norm.pdf(x)

plt.figure(figsize=(10,6))
plt.plot(x, y, label='Standard Normal Distribution')

# Shade rejection regions
plt.fill_between(x, 0, y, where=(x <= -z_critical), color='red', alpha=0.3, label='Rejection Region')
plt.fill_between(x, 0, y, where=(x >= z_critical), color='red', alpha=0.3)

# Mark Z-score on the plot
plt.axvline(z_score, color='blue', linestyle='--', label=f'Z-score = {z_score:.2f}')

plt.title('Two-tailed Z-test Decision Regions')
plt.xlabel('Z value')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()

#5
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import math

def plot_type1_type2_errors(pop_mean, pop_std, sample_size, alpha, true_mean):
    """
    Visualize Type 1 and Type 2 errors in a hypothesis test.

    Parameters:
    - pop_mean: Mean under null hypothesis (H0)
    - pop_std: Population standard deviation (assumed known)
    - sample_size: Number of samples
    - alpha: Significance level (Type 1 error rate)
    - true_mean: Actual mean under alternative hypothesis (H1)
    """

    se = pop_std / math.sqrt(sample_size)  # Standard error
    z_critical = norm.ppf(1 - alpha)  # Critical value for one-tailed test

    # Critical value cutoff in sample mean scale
    cutoff = pop_mean + z_critical * se

    # Define x range for plotting
    x = np.linspace(pop_mean - 4*se, true_mean + 4*se, 1000)

    # Null distribution (H0)
    h0_pdf = norm.pdf(x, pop_mean, se)

    # Alternative distribution (H1)
    h1_pdf = norm.pdf(x, true_mean, se)

    # Calculate Type 2 error (β): Probability of not rejecting H0 when H1 is true
    beta = norm.cdf(cutoff, true_mean, se)

    # Plot distributions
    plt.figure(figsize=(12, 6))
    plt.plot(x, h0_pdf, label='Null Hypothesis (H0)', color='blue')
    plt.plot(x, h1_pdf, label='Alternative Hypothesis (H1)', color='green')

    # Shade Type 1 error region (reject H0 when H0 true)
    x_type1 = np.linspace(cutoff, pop_mean + 4*se, 1000)
    plt.fill_between(x_type1, 0, norm.pdf(x_type1, pop_mean, se), color='red', alpha=0.3, label='Type 1 Error (α)')

    # Shade Type 2 error region (fail to reject H0 when H1 true)
    x_type2 = np.linspace(pop_mean - 4*se, cutoff, 1000)
    plt.fill_between(x_type2, 0, norm.pdf(x_type2, true_mean, se), color='orange', alpha=0.3, label='Type 2 Error (β)')

    plt.axvline(cutoff, color='black', linestyle='--', label=f'Critical value = {cutoff:.2f}')
    plt.title('Type 1 and Type 2 Errors in Hypothesis Testing')
    plt.xlabel('Sample Mean')
    plt.ylabel('Probability Density')
    plt.legend()
    plt.grid(True)
    plt.show()

    print(f"Significance level (Type 1 error, α): {alpha}")
    print(f"Type 2 error (β) given true mean {true_mean}: {beta:.4f}")
    print(f"Power of the test (1 - β): {1 - beta:.4f}")

# Example usage
plot_type1_type2_errors(pop_mean=100, pop_std=15, sample_size=30, alpha=0.05, true_mean=110)

#6
import numpy as np
from scipy.stats import ttest_ind

# Example sample data (two independent groups)
group1 = np.array([23, 21, 19, 24, 22, 20, 25])
group2 = np.array([30, 29, 28, 35, 32, 31, 33])

# Perform independent two-sample t-test
t_stat, p_value = ttest_ind(group1, group2)

print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")

# Interpret the result
alpha = 0.05
if p_value < alpha:
    print(f"Reject the null hypothesis at α={alpha}. The means are significantly different.")
else:
    print(f"Fail to reject the null hypothesis at α={alpha}. No significant difference between means.")
#7
import numpy as np
from scipy.stats import ttest_rel
import matplotlib.pyplot as plt

# Example paired sample data (e.g., before and after treatment)
before = np.array([85, 88, 90, 86, 87, 89, 91])
after = np.array([88, 90, 92, 89, 90, 91, 93])

# Perform paired t-test
t_stat, p_value = ttest_rel(before, after)

print(f"Paired T-test statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")

alpha = 0.05
if p_value < alpha:
    print(f"Reject the null hypothesis at α={alpha}. Significant difference between paired samples.")
else:
    print(f"Fail to reject the null hypothesis at α={alpha}. No significant difference between paired samples.")

# Visualization
plt.figure(figsize=(8,5))

# Plot paired points
plt.plot(range(len(before)), before, 'o-', label='Before')
plt.plot(range(len(after)), after, 's-', label='After')

# Connect pairs with lines
for i in range(len(before)):
    plt.plot([i, i], [before[i], after[i]], 'gray', linestyle='--', alpha=0.7)

plt.xticks(range(len(before)), [f'Subject {i+1}' for i in range(len(before))])
plt.ylabel('Measurement')
plt.title('Paired Sample Comparison: Before vs After')
plt.legend()
plt.grid(True)
plt.show()
#8
import numpy as np
import math
from scipy.stats import norm, ttest_1samp

# Set seed for reproducibility
np.random.seed(123)

# Population parameters (true)
population_mean = 50
population_std = 10

# Sample parameters
sample_size = 30

# Simulate sample data with a true mean slightly different from population_mean
true_sample_mean = 52
sample_data = np.random.normal(loc=true_sample_mean, scale=population_std, size=sample_size)

# Calculate sample mean and sample std dev
sample_mean = np.mean(sample_data)
sample_std = np.std(sample_data, ddof=1)

print(f"Sample mean: {sample_mean:.3f}")
print(f"Sample standard deviation: {sample_std:.3f}")

# 1. One-sample Z-test (assumes population std dev known)
def one_sample_z_test(sample_mean, population_mean, population_std, sample_size):
    se = population_std / math.sqrt(sample_size)
    z_stat = (sample_mean - population_mean) / se
    p_value = 2 * (1 - norm.cdf(abs(z_stat)))  # two-tailed
    return z_stat, p_value

z_stat, z_p_value = one_sample_z_test(sample_mean, population_mean, population_std, sample_size)

print(f"\nZ-test results:")
print(f"Z-statistic = {z_stat:.4f}")
print(f"P-value = {z_p_value:.4f}")

# 2. One-sample T-test (population std dev unknown)
t_stat, t_p_value = ttest_1samp(sample_data, population_mean)

print(f"\nT-test results:")
print(f"T-statistic = {t_stat:.4f}")
print(f"P-value = {t_p_value:.4f}")

# Interpretation
alpha = 0.05
print("\nInterpretation:")
for test_name, p_val in [('Z-test', z_p_value), ('T-test', t_p_value)]:
    if p_val < alpha:
        print(f"{test_name}: Reject the null hypothesis (significant difference).")
    else:
        print(f"{test_name}: Fail to reject the null hypothesis (no significant difference).")
#9
import math
from scipy.stats import norm, t

def confidence_interval(sample_mean, sample_std, sample_size, confidence=0.95, population_std_known=False, population_std=None):
    """
    Calculate the confidence interval for a sample mean.

    Parameters:
    - sample_mean: Mean of the sample
    - sample_std: Standard deviation of the sample
    - sample_size: Number of observations in the sample
    - confidence: Confidence level (default 0.95)
    - population_std_known: Boolean, True if population std dev known
    - population_std: Population std dev if known

    Returns:
    - (lower_bound, upper_bound): Tuple representing confidence interval
    """
    alpha = 1 - confidence
    se = (population_std / math.sqrt(sample_size)) if population_std_known else (sample_std / math.sqrt(sample_size))

    if population_std_known:
        # Use Z-distribution
        z_critical = norm.ppf(1 - alpha/2)
        margin_of_error = z_critical * se
    else:
        # Use T-distribution
        t_critical = t.ppf(1 - alpha/2, df=sample_size - 1)
        margin_of_error = t_critical * se

    lower_bound = sample_mean - margin_of_error
    upper_bound = sample_mean + margin_of_error

    return lower_bound, upper_bound

# Example usage
sample_mean = 100
sample_std = 15
sample_size = 30
confidence_level = 0.95

ci_lower, ci_upper = confidence_interval(sample_mean, sample_std, sample_size, confidence=confidence_level)
print(f"{int(confidence_level*100)}% Confidence Interval: ({ci_lower:.2f}, {ci_upper:.2f})")
#10
import math
from scipy.stats import norm, t

def margin_of_error(sample_std, sample_size, confidence=0.95, population_std_known=False, population_std=None):
    """
    Calculate the margin of error for a confidence interval.

    Parameters:
    - sample_std: Sample standard deviation
    - sample_size: Number of samples
    - confidence: Confidence level (default 0.95)
    - population_std_known: Boolean, True if population std dev known
    - population_std: Population std dev if known

    Returns:
    - margin_of_error: The margin of error value
    """
    alpha = 1 - confidence
    se = (population_std / math.sqrt(sample_size)) if population_std_known else (sample_std / math.sqrt(sample_size))

    if population_std_known:
        # Use Z-distribution
        z_critical = norm.ppf(1 - alpha/2)
        moe = z_critical * se
    else:
        # Use T-distribution
        t_critical = t.ppf(1 - alpha/2, df=sample_size - 1)
        moe = t_critical * se

    return moe

# Example usage:
sample_std = 12.5
sample_size = 40
confidence_level = 0.95

moe = margin_of_error(sample_std, sample_size, confidence=confidence_level)
print(f"Margin of Error at {int(confidence_level*100)}% confidence: ±{moe:.3f}")
#11
def bayesian_inference(prior_disease, sensitivity, false_positive_rate):
    """
    Calculate posterior probability P(Disease | Positive Test) using Bayes' Theorem.

    Parameters:
    - prior_disease: P(Disease)
    - sensitivity: P(Positive | Disease)
    - false_positive_rate: P(Positive | No Disease)

    Returns:
    - posterior: P(Disease | Positive)
    """
    prior_no_disease = 1 - prior_disease

    # Total probability of testing positive
    p_positive = sensitivity * prior_disease + false_positive_rate * prior_no_disease

    # Posterior probability using Bayes' Theorem
    posterior = (sensitivity * prior_disease) / p_positive

    return posterior

# Given data
prior_disease = 0.01
sensitivity = 0.99
false_positive_rate = 0.05

posterior_prob = bayesian_inference(prior_disease, sensitivity, false_positive_rate)
print(f"Posterior probability of disease given positive test: {posterior_prob:.4f} ({posterior_prob*100:.2f}%)")
#12
import numpy as np
from scipy.stats import chi2_contingency

# Example contingency table (observed frequencies)
# Rows: Gender (Male, Female)
# Columns: Preference (Like, Dislike)
observed = np.array([[30, 10],  # Male
                     [20, 40]]) # Female

# Perform Chi-square test of independence
chi2_stat, p_value, dof, expected = chi2_contingency(observed)

print(f"Chi-square Statistic: {chi2_stat:.4f}")
print(f"Degrees of Freedom: {dof}")
print(f"P-value: {p_value:.4f}")
print("\nExpected Frequencies:")
print(expected)

# Interpretation
alpha = 0.05
if p_value < alpha:
    print(f"\nReject the null hypothesis at α={alpha}: Variables are dependent.")
else:
    print(f"\nFail to reject the null hypothesis at α={alpha}: no evidence that the variables are dependent.")
#13
import numpy as np

def calculate_expected_frequencies(observed):
    """
    Calculate expected frequencies for a Chi-square test of independence.

    Parameters:
    - observed: 2D numpy array of observed frequencies

    Returns:
    - expected: 2D numpy array of expected frequencies
    """
    # Row sums and column sums
    row_sums = observed.sum(axis=1).reshape(-1, 1)  # column vector
    col_sums = observed.sum(axis=0).reshape(1, -1)  # row vector
    total = observed.sum()

    # Expected frequencies formula: (row_sum * col_sum) / total
    expected = (row_sums @ col_sums) / total

    return expected

# Example observed data (contingency table)
observed = np.array([[30, 10],
                     [20, 40]])

expected = calculate_expected_frequencies(observed)
print("Observed Frequencies:")
print(observed)
print("\nExpected Frequencies:")
print(expected)
#14
import numpy as np
from scipy.stats import chisquare

# Observed counts for categories
observed = np.array([50, 30, 20])

# Expected counts (must sum to same total as observed)
expected = np.array([40, 40, 20])

# Perform goodness-of-fit Chi-square test
chi2_stat, p_value = chisquare(f_obs=observed, f_exp=expected)

print(f"Chi-square Statistic: {chi2_stat:.4f}")
print(f"P-value: {p_value:.4f}")

alpha = 0.05
if p_value < alpha:
    print("Reject null hypothesis: observed distribution differs significantly from expected.")
else:
    print("Fail to reject null hypothesis: no significant departure from the expected distribution.")
#15
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2

df = 5  # degrees of freedom
x = np.linspace(0, 20, 500)
y = chi2.pdf(x, df)

plt.plot(x, y, label=f'Chi-square PDF (df={df})')
plt.title('Chi-square Distribution')
plt.xlabel('Value')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()
#16
import numpy as np
from scipy.stats import f

np.random.seed(0)
sample1 = np.random.normal(0, 10, 30)
sample2 = np.random.normal(0, 15, 30)

var1 = np.var(sample1, ddof=1)
var2 = np.var(sample2, ddof=1)

# Put the larger variance in the numerator and keep its degrees of freedom with it
if var1 >= var2:
    f_stat, df1, df2 = var1 / var2, len(sample1) - 1, len(sample2) - 1
else:
    f_stat, df1, df2 = var2 / var1, len(sample2) - 1, len(sample1) - 1

# Two-tailed p-value, capped at 1
p_value = min(1.0, 2 * f.sf(f_stat, df1, df2))

print(f"F-statistic: {f_stat:.4f}")
print(f"P-value: {p_value:.4f}")

if p_value < 0.05:
    print("Reject null hypothesis: variances are significantly different.")
else:
    print("Fail to reject null hypothesis: no significant difference in variances.")
#17
from scipy.stats import f_oneway

group1 = np.random.normal(20, 5, 30)
group2 = np.random.normal(22, 5, 30)
group3 = np.random.normal(19, 5, 30)

f_stat, p_value = f_oneway(group1, group2, group3)

print(f"ANOVA F-statistic: {f_stat:.4f}")
print(f"P-value: {p_value:.4f}")

if p_value < 0.05:
    print("Reject null hypothesis: at least one group mean is different.")
else:
    print("Fail to reject null hypothesis: no significant difference among group means.")
#18
import matplotlib.pyplot as plt

groups = [group1, group2, group3]
plt.boxplot(groups, labels=['Group1', 'Group2', 'Group3'])
plt.title('One-way ANOVA Group Comparison')
plt.ylabel('Values')
plt.show()
#19
from scipy.stats import shapiro, levene

def check_anova_assumptions(groups):
    # Normality
    normality = [shapiro(g).pvalue > 0.05 for g in groups]
    # Equal variances
    equal_var = levene(*groups).pvalue > 0.05
    return normality, equal_var

normality_results, equal_variance = check_anova_assumptions(groups)
print("Normality per group:", normality_results)
print("Equal variances assumption:", equal_variance)
#20
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Simulated dataset with two factors and a response
data = pd.DataFrame({
    'FactorA': np.repeat(['A1', 'A2'], 30),
    'FactorB': np.tile(np.repeat(['B1', 'B2'], 15), 2),
    'Response': np.random.randn(60) + np.repeat([0, 1], 30)
})

model = ols('Response ~ C(FactorA) * C(FactorB)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)

# Visualization: interaction plot
from statsmodels.graphics.factorplots import interaction_plot
interaction_plot(data['FactorA'], data['FactorB'], data['Response'])
plt.show()
#21
from scipy.stats import f
import matplotlib.pyplot as plt
import numpy as np

df1, df2 = 5, 10
x = np.linspace(0, 5, 500)
y = f.pdf(x, df1, df2)

plt.plot(x, y)
plt.title(f'F-Distribution PDF (df1={df1}, df2={df2})')
plt.xlabel('F-value')
plt.ylabel('Density')
plt.grid(True)
plt.show()

#22
plt.boxplot(groups, labels=['Group1', 'Group2', 'Group3'])
plt.title('Group Means Comparison')
plt.ylabel('Values')
plt.show()
#23
from scipy.stats import ttest_1samp

np.random.seed(0)
sample = np.random.normal(loc=50, scale=10, size=40)
t_stat, p_val = ttest_1samp(sample, popmean=50)
print(f"T-statistic: {t_stat:.4f}, P-value: {p_val:.4f}")
#24
from scipy.stats import chi2

sample_var = np.var(sample, ddof=1)
n = len(sample)
pop_var = 100  # hypothesized variance

chi_stat = (n - 1) * sample_var / pop_var
p_val_left = chi2.cdf(chi_stat, df=n - 1)
p_val_right = 1 - p_val_left

print(f"Chi-square stat: {chi_stat:.4f}")
print(f"P-value (two-tailed): {2 * min(p_val_left, p_val_right):.4f}")
#25
from statsmodels.stats.proportion import proportions_ztest

count = np.array([45, 30])   # successes in groups
nobs = np.array([100, 80])   # total observations in groups

stat, pval = proportions_ztest(count, nobs)
print(f"Z-statistic: {stat:.4f}, P-value: {pval:.4f}")
#26
import matplotlib.pyplot as plt
from scipy.stats import f

# Use previous samples sample1, sample2
f_stat = np.var(sample1, ddof=1) / np.var(sample2, ddof=1)
df1, df2 = len(sample1) - 1, len(sample2) - 1

x = np.linspace(0, 5, 500)
y = f.pdf(x, df1, df2)

plt.plot(x, y, label='F-distribution')
plt.axvline(f_stat, color='r', linestyle='--', label='F statistic')
plt.legend()
plt.title('F-distribution with F statistic')
plt.show()
#27
# Simulate observed data from categories
np.random.seed(0)
observed = np.random.multinomial(100, [0.2, 0.5, 0.3])
expected = np.array([20, 50, 30])

chi2_stat, p_val = chisquare(f_obs=observed, f_exp=expected)
print(f"Chi-square stat: {chi2_stat:.4f}, P-value: {p_val:.4f}")


