ASSIGNMENT
----------------------------------------------------------
1)What is hypothesis testing in statistics?
Hypothesis testing is a statistical method used to make decisions about a population parameter based on sample data.

2)What is the null hypothesis, and how does it differ from the alternative hypothesis?
The null hypothesis is a statement that there is no effect or difference, while the alternative hypothesis suggests there is an effect or difference.

3)What is the significance level in hypothesis testing, and why is it important?
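The significance level (α) is the probability of rejecting the null hypothesis when it is actually true, often set at 0.05. It matters because it is fixed before the test and caps the Type 1 error rate the analyst is willing to accept.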

4)What does a P-value represent in hypothesis testing?
A P-value represents the probability of obtaining test results at least as extreme as the observed results, assuming the null hypothesis is true.

5)How do you interpret the P-value in hypothesis testing?
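A P-value at or below the chosen significance level (typically 0.05) counts as evidence against the null hypothesis, so it is rejected; a larger P-value means we fail to reject it. The P-value is not the probability that the null hypothesis is true.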

6)What are Type 1 and Type 2 errors in hypothesis testing?
A Type 1 error (false positive) occurs when a true null hypothesis is rejected, while a Type 2 error (false negative) occurs when a false null hypothesis is not rejected.
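To make the two error rates concrete, the sketch below computes them for an upper-tailed Z-test. The numbers (null mean 100, true mean 105, standard deviation 10, n = 30) are illustrative assumptions, not data from a real study.

```python
import math
from statistics import NormalDist

# Illustrative assumptions: H0 mean, one specific alternative mean, known sd
mu0, mu1, sd, n, alpha = 100, 105, 10, 30, 0.05
se = sd / math.sqrt(n)  # standard error of the sample mean

# Reject H0 when the sample mean exceeds this cutoff (upper-tailed test)
critical = NormalDist(mu0, se).inv_cdf(1 - alpha)

type1 = 1 - NormalDist(mu0, se).cdf(critical)  # P(reject H0 | H0 true)
type2 = NormalDist(mu1, se).cdf(critical)      # P(keep H0 | true mean is mu1)
power = 1 - type2

print(f"cutoff={critical:.2f}, alpha={type1:.3f}, beta={type2:.3f}, power={power:.3f}")
```

Lowering alpha moves the cutoff to the right, which reduces Type 1 errors but increases Type 2 errors for the same sample size.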

7)What is the difference between a one-tailed and a two-tailed test in hypothesis testing?
A one-tailed test tests for an effect in one direction, while a two-tailed test tests for an effect in both directions.
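The difference shows up directly in how the P-value is computed from the same test statistic. A minimal sketch, assuming a Z statistic has already been calculated; the helper name `z_p_values` is hypothetical.

```python
from statistics import NormalDist


def z_p_values(z):
    """Return one-tailed and two-tailed P-values for a Z statistic."""
    std_normal = NormalDist()           # mean 0, standard deviation 1
    upper = 1 - std_normal.cdf(z)       # H1: parameter is greater than the null value
    lower = std_normal.cdf(z)           # H1: parameter is less than the null value
    two_sided = 2 * min(upper, lower)   # H1: parameter differs in either direction
    return {"greater": upper, "less": lower, "two-sided": two_sided}


p = z_p_values(1.96)
print(p)
```

For z = 1.96 the upper-tailed P-value is about 0.025 and the two-tailed P-value is about 0.05, so the same statistic can be significant in a one-tailed test at a stricter level than in a two-tailed test.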

8)What is the Z-test, and when is it used in hypothesis testing?
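The Z-test determines whether a sample mean differs significantly from a population mean. It is used when the population standard deviation is known, or when the sample is large enough (commonly n ≥ 30) for the sample standard deviation to stand in for it.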

9)How do you calculate the Z-score, and what does it represent in hypothesis testing?
The Z-score is calculated as (sample mean - population mean) / (population standard deviation / √n). It represents how many standard errors the sample mean lies from the hypothesized population mean.
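The formula can be worked through with a few lines of code. This is a sketch with made-up inputs (sample mean 105, population mean 100, population standard deviation 10, n = 25); the function name `z_statistic` is hypothetical.

```python
import math


def z_statistic(sample_mean, population_mean, population_sd, n):
    """Z = (sample mean - population mean) / (population sd / sqrt(n))."""
    standard_error = population_sd / math.sqrt(n)
    return (sample_mean - population_mean) / standard_error


z = z_statistic(sample_mean=105, population_mean=100, population_sd=10, n=25)
print(z)  # 2.5
```

Here the standard error is 10 / 5 = 2, so a sample mean 5 units above the population mean sits 2.5 standard errors away.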

10)What is the T-distribution, and when should it be used instead of the normal distribution?
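The T-distribution is used when the population standard deviation is unknown and must be estimated from the sample, which matters most for small samples. Its heavier tails give wider intervals and larger critical values than the normal distribution, and it approaches the normal distribution as the degrees of freedom increase.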

11)What is the difference between a Z-test and a T-test?
A Z-test assumes the population variance is known (or the sample is large), while a T-test estimates the variance from the sample and is the usual choice for small samples.

12)What is the T-test, and how is it used in hypothesis testing?
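The T-test checks whether a mean differs from a reference value (one-sample), whether two independent groups have different means (independent), or whether paired measurements differ on average (paired), using a sample estimate of the standard deviation.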
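A one-sample T statistic can be computed by hand to see what the test measures. The sketch below reuses the ten-value sample from the code section; the critical value 2.262 is the standard two-tailed 5% table value for 9 degrees of freedom and is hard-coded here as an assumption rather than computed.

```python
import math
from statistics import mean, stdev

sample = [102, 104, 103, 105, 106, 102, 101, 107, 108, 105]
mu0 = 100  # hypothesized population mean

n = len(sample)
se = stdev(sample) / math.sqrt(n)   # stdev divides by n - 1 (sample standard deviation)
t_stat = (mean(sample) - mu0) / se
df = n - 1

T_CRIT_95_DF9 = 2.262  # two-tailed 5% critical value for df = 9, taken from a t-table
reject = abs(t_stat) > T_CRIT_95_DF9
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject}")
```

The only change from the Z statistic is that the sample standard deviation replaces the population value, which is why the T-distribution with n - 1 degrees of freedom supplies the critical value.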

13)What is the relationship between Z-test and T-test in hypothesis testing?
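Both tests compare means using the same form of statistic. The Z-test uses a known population variance and the normal distribution; the T-test uses the sample variance and the T-distribution, and the two give nearly identical results for large samples.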

14)What is a confidence interval, and how is it used to interpret statistical results?
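A confidence interval is a range of values, computed from the sample, that is expected to contain the true population parameter. A 95% confidence level means that about 95% of intervals built this way from repeated samples would contain the parameter.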

15)What is the margin of error, and how does it affect the confidence interval?
The margin of error is half the width of the confidence interval: critical value × standard error. A larger margin of error gives a wider interval; it shrinks as the sample size grows and grows as the confidence level rises.
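The link between margin of error and interval width can be sketched for the known-variance case. The population standard deviation of 2.5 is an assumed value chosen for illustration, and `z_confidence_interval` is a hypothetical helper name.

```python
import math
from statistics import NormalDist, mean


def z_confidence_interval(data, population_sd, confidence=0.95):
    """Return (lower, upper, margin) for the mean with a known population sd."""
    n = len(data)
    z_crit = NormalDist().inv_cdf((1 + confidence) / 2)  # about 1.96 for 95%
    margin = z_crit * population_sd / math.sqrt(n)        # critical value x standard error
    centre = mean(data)
    return centre - margin, centre + margin, margin


data = [102, 104, 103, 105, 106, 102, 101, 107, 108, 105]
low, high, margin = z_confidence_interval(data, population_sd=2.5)  # sd is an assumption
print(f"95% CI: ({low:.2f}, {high:.2f}), margin of error: {margin:.2f}")
```

Raising `confidence` to 0.99 increases the critical value and therefore the margin, while a larger sample shrinks the margin through the square root of n.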

16)How is Bayes' Theorem used in statistics, and what is its significance?
Bayes' Theorem, P(A|B) = P(B|A) × P(A) / P(B), updates the probability of a hypothesis as more evidence becomes available. It is significant because it combines prior knowledge with observed data.
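The updating step can be illustrated with a diagnostic-test scenario. All rates below (1% prevalence, 95% sensitivity, 5% false positive rate) are invented for the example, and the function name `posterior` is hypothetical.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) from P(H), P(E | H) and P(E | not H) via Bayes' Theorem."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)  # P(E)
    return likelihood * prior / evidence


# Assumed rates: 1% prevalence, 95% sensitivity, 5% false positives
after_one = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)

# A second independent positive result uses the first posterior as the new prior
after_two = posterior(prior=after_one, likelihood=0.95, false_positive_rate=0.05)

print(f"after one positive: {after_one:.3f}, after two: {after_two:.3f}")
```

With a low prior, one positive result leaves the probability at roughly 16%, and it takes a second positive to push it to roughly 78%, which is the sense in which evidence accumulates.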

17)What is the Chi-square distribution, and when is it used?
The Chi-square distribution is used in tests of independence and goodness of fit, particularly for categorical data.

18)What is the Chi-square goodness of fit test, and how is it applied?
The Chi-square goodness of fit test compares observed category counts with the counts expected under a specified distribution to determine whether the sample is consistent with that distribution.
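The statistic itself is simple to compute by hand. The sketch below tests whether a die is fair using invented counts from 120 rolls; the critical value 11.070 is the standard 5% table value for 5 degrees of freedom and is hard-coded as an assumption.

```python
# Hypothetical counts for faces 1 to 6 from 120 rolls of a die
observed = [18, 22, 16, 25, 19, 20]
expected = [sum(observed) / len(observed)] * len(observed)  # 20 per face if fair

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1

CHI2_CRIT_95_DF5 = 11.070  # 5% critical value for df = 5, taken from a chi-square table
reject = chi2_stat > CHI2_CRIT_95_DF5
print(f"chi-square = {chi2_stat:.2f}, df = {dof}, reject fairness: {reject}")
```

Because the statistic is well below the critical value, these counts give no reason to doubt that the die is fair.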

19)What is the F-distribution, and when is it used in hypothesis testing?
The F-distribution is used in ANOVA and regression analysis to compare variances and test overall model significance.

20)What is an ANOVA test, and what are its assumptions?
ANOVA tests differences among group means, assuming normality, homogeneity of variance, and independent samples.

21)What are the different types of ANOVA tests?
Types include one-way ANOVA, two-way ANOVA, and repeated measures ANOVA, each suited for different experimental designs.

22)What is the F-test, and how does it relate to hypothesis testing?
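The F-test uses the ratio of two variances as its test statistic. In ANOVA that ratio is between-group variance over within-group variance, and a large value indicates that the group means differ significantly.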
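The variance ratio behind one-way ANOVA can be computed directly. This sketch uses the same three groups as the code section and plain Python, so the intermediate sums of squares are visible; it is an illustration of the calculation, not a replacement for a library routine.

```python
from statistics import mean

groups = [
    [23, 25, 28, 30, 27],
    [20, 22, 25, 24, 23],
    [19, 20, 22, 21, 20],
]

k = len(groups)                                   # number of groups
n_total = sum(len(g) for g in groups)             # total observations
grand_mean = mean([x for g in groups for x in g])

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

ms_between = ss_between / (k - 1)        # variance explained by group membership
ms_within = ss_within / (n_total - k)    # pooled variance inside the groups
f_stat = ms_between / ms_within

print(f"F = {f_stat:.3f} with ({k - 1}, {n_total - k}) degrees of freedom")
```

The resulting F is compared against the F-distribution with (k - 1, N - k) degrees of freedom to obtain a P-value.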



In [None]:
'''
1. Z-test for comparing a sample mean to a known population mean
import numpy as np
from statsmodels.stats.weightstats import ztest  # scipy.stats has no ztest

# Sample data
sample_data = np.array([102, 104, 103, 105, 106, 102, 101, 107, 108, 105])
population_mean = 100

# Perform Z-test
z_score, p_value = ztest(sample_data, value=population_mean)

print(f"Z-score: {z_score}, P-value: {p_value}")
if p_value < 0.05:
    print("Reject the null hypothesis: The sample mean is significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference.")
2. Simulate random data and calculate P-value
import numpy as np
from scipy.stats import norm

# Simulate random data
np.random.seed(42)
sample_data = np.random.normal(loc=105, scale=10, size=30)
population_mean = 100

# Calculate P-value
z_score = (np.mean(sample_data) - population_mean) / (np.std(sample_data, ddof=1) / np.sqrt(len(sample_data)))
p_value = 2 * (1 - norm.cdf(abs(z_score)))

print(f"P-value: {p_value}")

3. One-sample Z-test
from statsmodels.stats.weightstats import ztest  # scipy.stats has no ztest

# Sample data
sample_data = [102, 104, 103, 105, 106, 102, 101, 107, 108, 105]
population_mean = 100

# Perform one-sample Z-test
z_score, p_value = ztest(sample_data, value=population_mean)

print(f"Z-score: {z_score}, P-value: {p_value}")

4. Two-tailed Z-test with visualization
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Sample data
sample_data = np.array([102, 104, 103, 105, 106, 102, 101, 107, 108, 105])
population_mean = 100
sample_mean = np.mean(sample_data)
std_error = np.std(sample_data, ddof=1) / np.sqrt(len(sample_data))  # sample standard deviation

# Z-score calculation
z_score = (sample_mean - population_mean) / std_error

# Visualization
x = np.linspace(-4, 4, 1000)
y = norm.pdf(x)
plt.plot(x, y, label="Standard Normal Distribution")
plt.axvline(abs(z_score), color='red', linestyle='--', label="±|Z-score|")
plt.axvline(-abs(z_score), color='red', linestyle='--')
# The area beyond the observed statistic in both tails is the two-tailed P-value
plt.fill_between(x, y, where=(x >= abs(z_score)) | (x <= -abs(z_score)), color='red', alpha=0.5, label="P-value area")
plt.legend()
plt.title("Two-tailed Z-test")
plt.show()

5. Calculate and visualize Type 1 and Type 2 errors
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Parameters
population_mean = 100
sample_mean = 105
std_dev = 10
sample_size = 30
alpha = 0.05

# Critical value for Type 1 error
critical_value = norm.ppf(1 - alpha, loc=population_mean, scale=std_dev/np.sqrt(sample_size))

# Type 2 error calculation
beta = norm.cdf(critical_value, loc=sample_mean, scale=std_dev/np.sqrt(sample_size))

# Visualization
x = np.linspace(90, 120, 1000)
y_null = norm.pdf(x, loc=population_mean, scale=std_dev/np.sqrt(sample_size))
y_alt = norm.pdf(x, loc=sample_mean, scale=std_dev/np.sqrt(sample_size))

plt.plot(x, y_null, label="Null Hypothesis")
plt.plot(x, y_alt, label="Alternative Hypothesis")
plt.axvline(critical_value, color='red', linestyle='--', label="Critical Value")
plt.fill_between(x, y_null, where=(x >= critical_value), color='blue', alpha=0.5, label="Type 1 Error")
plt.fill_between(x, y_alt, where=(x <= critical_value), color='green', alpha=0.5, label="Type 2 Error")
plt.legend()
plt.title("Type 1 and Type 2 Errors")
plt.show()
6. Independent T-test
from scipy.stats import ttest_ind

# Sample data
group1 = [23, 25, 28, 30, 27]
group2 = [20, 22, 25, 24, 23]

# Perform independent T-test
t_stat, p_value = ttest_ind(group1, group2)

print(f"T-statistic: {t_stat}, P-value: {p_value}")
7. Paired sample T-test with visualization
import matplotlib.pyplot as plt
from scipy.stats import ttest_rel

# Sample data (before and after)
before = [23, 25, 28, 30, 27]
after = [20, 22, 25, 24, 23]

# Perform paired T-test
t_stat, p_value = ttest_rel(before, after)

# Visualization
plt.plot(before, label="Before")
plt.plot(after, label="After")
plt.legend()
plt.title("Paired Sample T-test")
plt.show()

print(f"T-statistic: {t_stat}, P-value: {p_value}")
8. Simulate data and compare Z-test and T-test
import numpy as np
from scipy.stats import ttest_1samp
from statsmodels.stats.weightstats import ztest  # scipy.stats has no ztest

# Simulate data
np.random.seed(42)
sample_data = np.random.normal(loc=105, scale=10, size=30)
population_mean = 100

# Z-test
z_score, p_value_z = ztest(sample_data, value=population_mean)

# T-test
t_stat, p_value_t = ttest_1samp(sample_data, popmean=population_mean)

print(f"Z-test: Z-score = {z_score}, P-value = {p_value_z}")
print(f"T-test: T-statistic = {t_stat}, P-value = {p_value_t}")
9. Confidence interval for a sample mean
import numpy as np
from scipy.stats import t

# Sample data
sample_data = [102, 104, 103, 105, 106, 102, 101, 107, 108, 105]
confidence_level = 0.95
sample_mean = np.mean(sample_data)
sample_std = np.std(sample_data, ddof=1)
n = len(sample_data)

# Calculate confidence interval
t_critical = t.ppf((1 + confidence_level) / 2, df=n-1)
margin_of_error = t_critical * (sample_std / np.sqrt(n))
ci_lower = sample_mean - margin_of_error
ci_upper = sample_mean + margin_of_error

print(f"Confidence Interval: ({ci_lower}, {ci_upper})")

10. Calculate the margin of error for a given confidence level
import numpy as np
from scipy.stats import t

# Sample data
sample_data = [102, 104, 103, 105, 106, 102, 101, 107, 108, 105]
confidence_level = 0.95
sample_mean = np.mean(sample_data)
sample_std = np.std(sample_data, ddof=1)
n = len(sample_data)

# Calculate margin of error
t_critical = t.ppf((1 + confidence_level) / 2, df=n-1)
margin_of_error = t_critical * (sample_std / np.sqrt(n))

print(f"Margin of Error: {margin_of_error}")
11. Bayesian inference using Bayes’ Theorem
# Prior probabilities
P_A = 0.5  # Prior probability of event A
P_B_given_A = 0.7  # Likelihood of event B given A
P_B_given_not_A = 0.2  # Likelihood of event B given not A

# Calculate posterior probability using Bayes' Theorem
P_B = P_B_given_A * P_A + P_B_given_not_A * (1 - P_A)
P_A_given_B = (P_B_given_A * P_A) / P_B

print(f"Posterior Probability P(A|B): {P_A_given_B}")
12. Chi-square test for independence
import numpy as np
from scipy.stats import chi2_contingency

# Contingency table
data = np.array([[10, 20, 30], [6, 9, 17]])

# Perform Chi-square test
chi2, p_value, dof, expected = chi2_contingency(data)

print(f"Chi-square statistic: {chi2}, P-value: {p_value}")
13. Calculate expected frequencies for a Chi-square test
import numpy as np
from scipy.stats import chi2_contingency

# Observed data
observed = np.array([[10, 20, 30], [6, 9, 17]])

# Calculate expected frequencies
chi2, p_value, dof, expected = chi2_contingency(observed)

print("Expected Frequencies:")
print(expected)
14. Goodness-of-fit test
from scipy.stats import chisquare

# Observed and expected frequencies
observed = [10, 20, 30, 40]
expected = [15, 15, 30, 40]

# Perform Chi-square goodness-of-fit test
chi2, p_value = chisquare(observed, f_exp=expected)

print(f"Chi-square statistic: {chi2}, P-value: {p_value}")
15. Simulate and visualize the Chi-square distribution
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2

# Degrees of freedom
df = 5

# Generate Chi-square distribution data
x = np.linspace(0, 20, 1000)
y = chi2.pdf(x, df)

# Plot
plt.plot(x, y, label=f"Chi-square (df={df})")
plt.title("Chi-square Distribution")
plt.legend()
plt.show()
16. F-test via one-way ANOVA (compares group means through a variance ratio)
from scipy.stats import f_oneway

# Sample data
group1 = [23, 25, 28, 30, 27]
group2 = [20, 22, 25, 24, 23]
group3 = [19, 20, 22, 21, 20]

# Perform F-test (ANOVA)
f_stat, p_value = f_oneway(group1, group2, group3)

print(f"F-statistic: {f_stat}, P-value: {p_value}")
17. ANOVA test to compare means
from scipy.stats import f_oneway

# Sample data
group1 = [23, 25, 28, 30, 27]
group2 = [20, 22, 25, 24, 23]
group3 = [19, 20, 22, 21, 20]

# Perform ANOVA
f_stat, p_value = f_oneway(group1, group2, group3)

print(f"F-statistic: {f_stat}, P-value: {p_value}")
18. One-way ANOVA with visualization
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Sample data
data = pd.DataFrame({
    'Group': ['A']*5 + ['B']*5 + ['C']*5,
    'Values': [23, 25, 28, 30, 27, 20, 22, 25, 24, 23, 19, 20, 22, 21, 20]
})

# Boxplot
sns.boxplot(x='Group', y='Values', data=data)
plt.title("One-way ANOVA")
plt.show()
19. Check ANOVA assumptions
from scipy.stats import shapiro, levene

# Sample data
group1 = [23, 25, 28, 30, 27]
group2 = [20, 22, 25, 24, 23]
group3 = [19, 20, 22, 21, 20]

# Normality test (Shapiro-Wilk)
print("Normality Test:")
for i, group in enumerate([group1, group2, group3]):
    stat, p = shapiro(group)
    print(f"Group {i+1}: Statistic={stat}, P-value={p}")

# Equal variance test (Levene's test)
stat, p = levene(group1, group2, group3)
print(f"Levene's Test: Statistic={stat}, P-value={p}")
20. Two-way ANOVA with visualization
import statsmodels.api as sm
from statsmodels.formula.api import ols
import pandas as pd

# Sample data
data = pd.DataFrame({
    'Factor1': ['A', 'A', 'B', 'B', 'A', 'A', 'B', 'B'],
    'Factor2': ['X', 'X', 'X', 'X', 'Y', 'Y', 'Y', 'Y'],
    'Values': [23, 25, 28, 30, 27, 20, 22, 25]
})

# Perform two-way ANOVA
model = ols('Values ~ C(Factor1) + C(Factor2) + C(Factor1):C(Factor2)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)
21. Visualize the F-distribution
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import f

# Degrees of freedom
dfn, dfd = 5, 50

# Generate F-distribution data
x = np.linspace(0, 5, 1000)
y = f.pdf(x, dfn, dfd)

# Plot
plt.plot(x, y, label=f"F-distribution (dfn={dfn}, dfd={dfd})")
plt.title("F-distribution")
plt.legend()
plt.show()
22. One-way ANOVA with boxplots
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Sample data
data = pd.DataFrame({
    'Group': ['A']*5 + ['B']*5 + ['C']*5,
    'Values': [23, 25, 28, 30, 27, 20, 22, 25, 24, 23, 19, 20, 22, 21, 20]
})

# Boxplot
sns.boxplot(x='Group', y='Values', data=data)
plt.title("One-way ANOVA with Boxplots")
plt.show()
23. Simulate data and perform hypothesis testing
import numpy as np
from scipy.stats import ttest_1samp

# Simulate data
np.random.seed(42)
sample_data = np.random.normal(loc=105, scale=10, size=30)
population_mean = 100

# Perform t-test
t_stat, p_value = ttest_1samp(sample_data, population_mean)

print(f"T-statistic: {t_stat}, P-value: {p_value}")
24. Hypothesis test for population variance
import numpy as np
from scipy.stats import chi2

# Sample data
sample_data = [102, 104, 103, 105, 106, 102, 101, 107, 108, 105]
sample_variance = np.var(sample_data, ddof=1)
n = len(sample_data)
population_variance = 10  # Hypothesized population variance

# Chi-square test for variance (upper-tailed: H1 is variance > population_variance)
chi2_stat = (n - 1) * sample_variance / population_variance
p_value = chi2.sf(chi2_stat, df=n-1)  # survival function, equal to 1 - cdf

print(f"Chi-square statistic: {chi2_stat}, P-value: {p_value}")
25. Z-test for comparing proportions
from statsmodels.stats.proportion import proportions_ztest

# Sample data
count = [40, 50]  # Number of successes
nobs = [100, 100]  # Total number of trials

# Perform Z-test for proportions
z_stat, p_value = proportions_ztest(count, nobs)

print(f"Z-statistic: {z_stat}, P-value: {p_value}")
26. F-test for comparing variances
import numpy as np
from scipy.stats import f

# Sample data
sample1 = [23, 25, 28, 30, 27]
sample2 = [20, 22, 25, 24, 23]

# Calculate variances
var1 = np.var(sample1, ddof=1)
var2 = np.var(sample2, ddof=1)

# F-test
f_stat = var1 / var2
df1 = len(sample1) - 1
df2 = len(sample2) - 1
p_value = 2 * min(f.cdf(f_stat, df1, df2), 1 - f.cdf(f_stat, df1, df2))

print(f"F-statistic: {f_stat}, P-value: {p_value}")
27. Chi-square goodness-of-fit test with simulated data
from scipy.stats import chisquare

# Simulated observed and expected frequencies
observed = [10, 20, 30, 40]
expected = [15, 15, 30, 40]

# Perform Chi-square goodness-of-fit test
chi2, p_value = chisquare(observed, f_exp=expected)

print(f"Chi-square statistic: {chi2}, P-value: {p_value}")
'''