**Question**
A company claims their energy drinks contain an average of 80 mg of caffeine. A researcher tests 10 drinks and measures the following caffeine contents (in mg): [78, 82, 85, 79, 81, 83, 80, 77, 84, 81]. Is there evidence to suggest the true average differs from 80 mg?


In [2]:
import numpy as np
from scipy import stats

# Example 1: One-sample t-test (two-tailed)
print("Example 1: One-sample t-test (two-tailed)")

caffeine_content = [78, 82, 85, 79, 81, 83, 80, 77, 84, 81]
t_statistic, p_value = stats.ttest_1samp(caffeine_content, popmean=80)
print(f"T-statistic: {t_statistic:.4f}, p-value: {p_value:.4f}")
print(f"Conclusion: {'Reject' if p_value < 0.05 else 'Fail to reject'} the null hypothesis.")

Example 1: One-sample t-test (two-tailed)
T-statistic: 1.2247, p-value: 0.2518
Conclusion: Fail to reject the null hypothesis.
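The statistic SciPy reports is t = (x̄ - μ₀) / (s / √n), where s is the sample standard deviation (ddof=1), and the p-value comes from a t distribution with n - 1 degrees of freedom. A minimal sketch that recomputes both by hand as a cross-check on `stats.ttest_1samp`:

```python
import numpy as np
from scipy import stats

caffeine_content = np.array([78, 82, 85, 79, 81, 83, 80, 77, 84, 81])
mu0 = 80  # hypothesized mean from the company's claim

n = caffeine_content.size
x_bar = caffeine_content.mean()
s = caffeine_content.std(ddof=1)              # sample standard deviation
t_manual = (x_bar - mu0) / (s / np.sqrt(n))   # standardized distance from mu0

# Two-tailed p-value: probability of |T| at least this large with n - 1 df
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

print(f"mean={x_bar:.2f}, s={s:.4f}, t={t_manual:.4f}, p={p_manual:.4f}")
```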


Question: A researcher wants to know if there's a difference in the average number of hours spent on social media between two age groups: 18-25 and 26-35. The data collected is as follows:
18-25 age group: [3.5, 4.2, 3.8, 4.5, 3.9, 4.1, 4.3]
26-35 age group: [2.8, 3.1, 3.5, 3.2, 2.9, 3.4, 3.0]
Is there a significant difference in social media usage between these age groups?

In [3]:
# Example 2: Independent two-sample t-test (two-tailed)
print("\nExample 2: Independent two-sample t-test (two-tailed)")


group_18_25 = [3.5, 4.2, 3.8, 4.5, 3.9, 4.1, 4.3]
group_26_35 = [2.8, 3.1, 3.5, 3.2, 2.9, 3.4, 3.0]
t_statistic, p_value = stats.ttest_ind(group_18_25, group_26_35)
print(f"T-statistic: {t_statistic:.4f}, p-value: {p_value:.4f}")
print(f"Conclusion: {'Reject' if p_value < 0.05 else 'Fail to reject'} the null hypothesis.")


Example 2: Independent two-sample t-test (two-tailed)
T-statistic: 5.7243, p-value: 0.0001
Conclusion: Reject the null hypothesis.
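By default `stats.ttest_ind` assumes equal population variances (Student's pooled-variance t-test). If that assumption is doubtful, Welch's t-test via `equal_var=False` is a common alternative. A sketch comparing the two on the same data; with equal group sizes the t-statistic is identical and only the degrees of freedom, and therefore the p-value, differ:

```python
from scipy import stats

group_18_25 = [3.5, 4.2, 3.8, 4.5, 3.9, 4.1, 4.3]
group_26_35 = [2.8, 3.1, 3.5, 3.2, 2.9, 3.4, 3.0]

# Student's t-test (pooled variance) vs Welch's t-test (separate variances)
student = stats.ttest_ind(group_18_25, group_26_35)
welch = stats.ttest_ind(group_18_25, group_26_35, equal_var=False)

print(f"Student: t={student.statistic:.4f}, p={student.pvalue:.6f}")
print(f"Welch:   t={welch.statistic:.4f}, p={welch.pvalue:.6f}")
```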


Question: A fitness instructor wants to know if her new training program significantly improves her clients' 5km run times. She records the run times (in minutes) of 6 clients before and after the 8-week program:
Before: [25.5, 23.8, 26.2, 24.9, 25.1, 24.5]
After:  [24.2, 22.9, 25.1, 23.7, 24.0, 23.2]
Is there a significant improvement in run times?

In [4]:
# Example 3: Paired t-test (two-tailed)
print("\nExample 3: Paired t-test (two-tailed)")

before = [25.5, 23.8, 26.2, 24.9, 25.1, 24.5]
after = [24.2, 22.9, 25.1, 23.7, 24.0, 23.2]
t_statistic, p_value = stats.ttest_rel(before, after)
print(f"T-statistic: {t_statistic:.4f}, p-value: {p_value:.4f}")
print(f"Conclusion: {'Reject' if p_value < 0.05 else 'Fail to reject'} the null hypothesis.")



Example 3: Paired t-test (two-tailed)
T-statistic: 18.5742, p-value: 0.0000
Conclusion: Reject the null hypothesis.
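The question asks specifically about an improvement (shorter times after the program), which is a one-sided hypothesis. Assuming SciPy 1.6 or later, `ttest_rel` accepts `alternative`; `'greater'` tests whether `before - after` has a positive mean. The sketch also shows that a paired t-test is the same as a one-sample t-test on the differences:

```python
import numpy as np
from scipy import stats

before = np.array([25.5, 23.8, 26.2, 24.9, 25.1, 24.5])
after = np.array([24.2, 22.9, 25.1, 23.7, 24.0, 23.2])

# H1: mean(before - after) > 0, i.e. run times went down
paired = stats.ttest_rel(before, after, alternative='greater')

# Equivalent formulation: one-sample t-test of the differences against 0
diffs = before - after
one_sample = stats.ttest_1samp(diffs, popmean=0, alternative='greater')

print(f"Paired:     t={paired.statistic:.4f}, one-tailed p={paired.pvalue:.2e}")
print(f"One-sample: t={one_sample.statistic:.4f}, one-tailed p={one_sample.pvalue:.2e}")
```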


Question: A school district claims that their students score above the national average (500) on a standardized test. A sample of 8 students from this district scored: [510, 525, 495, 530, 515, 520, 505, 500]. Is there evidence to support the district's claim?


In [5]:
# Example 4: One-sample t-test (one-tailed, greater)

scores = [510, 525, 495, 530, 515, 520, 505, 500]
# alternative='greater' tests H1: mean > 500 directly (requires SciPy >= 1.6)
t_statistic, p_value_one_tailed = stats.ttest_1samp(scores, popmean=500, alternative='greater')
print(f"T-statistic: {t_statistic:.4f}, one-tailed p-value: {p_value_one_tailed:.4f}")
print(f"Conclusion: {'Reject' if p_value_one_tailed < 0.05 else 'Fail to reject'} the null hypothesis.")


T-statistic: 2.8868, one-tailed p-value: 0.0117
Conclusion: Reject the null hypothesis.
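Halving a two-tailed p-value gives the one-tailed p-value only when the t-statistic points in the direction of H1. A sketch that computes both one-sided p-values for the same scores; since t > 0 here, halving matches `'greater'`, and the two one-sided p-values sum to 1:

```python
from scipy import stats

scores = [510, 525, 495, 530, 515, 520, 505, 500]

two_sided = stats.ttest_1samp(scores, popmean=500)
greater = stats.ttest_1samp(scores, popmean=500, alternative='greater')
less = stats.ttest_1samp(scores, popmean=500, alternative='less')

# t > 0, so halving the two-sided p-value is valid for 'greater' but not for 'less'
print(f"t = {two_sided.statistic:.4f}")
print(f"two-sided p / 2 = {two_sided.pvalue / 2:.4f}")
print(f"'greater' p     = {greater.pvalue:.4f}")
print(f"'less' p        = {less.pvalue:.4f}")
```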


**Question**
A company has developed a new typing course and wants to know if it reduces typing errors compared to their standard course. They measure the number of errors made in a 5-minute typing test for both groups:
Standard course: [12, 10, 14, 15, 11, 13, 9]
New course: [8, 11, 9, 10, 7, 12, 9]
Is there evidence that the new course results in fewer typing errors?

In [6]:
# Example 5: Independent two-sample t-test (one-tailed: new course has fewer errors)

standard_course = [12, 10, 14, 15, 11, 13, 9]
new_course = [8, 11, 9, 10, 7, 12, 9]
# H1: mean(standard) > mean(new); with this argument order that is alternative='greater'
t_statistic, p_value_one_tailed = stats.ttest_ind(standard_course, new_course, alternative='greater')
print(f"T-statistic: {t_statistic:.4f}, one-tailed p-value: {p_value_one_tailed:.4f}")
print(f"Conclusion: {'Reject' if p_value_one_tailed < 0.05 else 'Fail to reject'} the null hypothesis.")

T-statistic: 2.4648, one-tailed p-value: 0.0149
Conclusion: Reject the null hypothesis.
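With `alternative`, the direction is relative to the order of the arguments: `'greater'` means the first sample's mean exceeds the second's. A sketch showing that swapping both the samples and the direction gives the same p-value, with the sign of t flipped:

```python
from scipy import stats

standard_course = [12, 10, 14, 15, 11, 13, 9]
new_course = [8, 11, 9, 10, 7, 12, 9]

# H1: the new course has fewer errors, written two equivalent ways
a = stats.ttest_ind(standard_course, new_course, alternative='greater')
b = stats.ttest_ind(new_course, standard_course, alternative='less')

print(f"standard vs new, 'greater': t={a.statistic:.4f}, p={a.pvalue:.4f}")
print(f"new vs standard, 'less':    t={b.statistic:.4f}, p={b.pvalue:.4f}")
```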


In [8]:
import numpy as np
from scipy import stats

# Function to perform a one-sample z-test (population standard deviation known)
def z_test(sample, population_mean, population_std, alternative='two-sided'):
    n = len(sample)
    sample_mean = np.mean(sample)
    z_stat = (sample_mean - population_mean) / (population_std / np.sqrt(n))

    # Determine the p-value based on the alternative hypothesis
    if alternative == 'two-sided':
        p_value = 2 * stats.norm.sf(abs(z_stat))
    elif alternative == 'less':
        p_value = stats.norm.cdf(z_stat)
    elif alternative == 'greater':
        p_value = stats.norm.sf(z_stat)
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")

    return z_stat, p_value
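Several of the questions below give only summary statistics (n, sample mean, known σ). A hypothetical variant of `z_test` that takes those directly avoids building a stand-in sample; `z_test_from_summary` is an illustrative name, not a library function:

```python
import numpy as np
from scipy import stats

def z_test_from_summary(sample_mean, n, population_mean, population_std,
                        alternative='two-sided'):
    """One-sample z-test from summary statistics (hypothetical helper)."""
    z_stat = (sample_mean - population_mean) / (population_std / np.sqrt(n))
    if alternative == 'two-sided':
        p_value = 2 * stats.norm.sf(abs(z_stat))
    elif alternative == 'less':
        p_value = stats.norm.cdf(z_stat)
    elif alternative == 'greater':
        p_value = stats.norm.sf(z_stat)
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return z_stat, p_value

# Light-bulb figures: n=100, mean 9,800 h, claimed 10,000 h, sigma 500 h
z, p = z_test_from_summary(9800, 100, 10000, 500)
print(f"z = {z:.4f}, two-sided p = {p:.6f}")
```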

Question: A company claims their new energy drink makes reaction times faster (shorter) than the population average of 250 milliseconds. They test 50 individuals and find a mean reaction time of 240 ms; the population standard deviation is known to be 40 ms. Is there evidence to support the company's claim?


In [9]:
# Example 1: One-sample z-test (one-tailed, less)
print("Example 1: One-sample z-test (one-tailed, less)")
print("Question: A company claims their new energy drink makes reaction times faster (shorter) than the population average of 250 milliseconds. They test 50 individuals and find a mean reaction time of 240 ms; the population standard deviation is known to be 40 ms. Is there evidence to support the company's claim?")

# "Faster" means a smaller mean reaction time, so H1: mean < 250 and alternative='less'.
# Only summary statistics are given; z_test uses just the mean and length of its input,
# so pass n copies of the observed mean rather than simulating a random sample.
sample_mean = 240
n = 50
population_mean = 250
population_std = 40

z_stat, p_value = z_test([sample_mean] * n, population_mean, population_std, alternative='less')
print(f"Z-statistic: {z_stat:.4f}, one-tailed p-value: {p_value:.4f}")
print(f"Conclusion: {'Reject' if p_value < 0.05 else 'Fail to reject'} the null hypothesis.")
print(f"Interpretation: {'There is' if p_value < 0.05 else 'There is not'} significant evidence to support the company's claim.")

Example 1: One-sample z-test (one-tailed, less)
Question: A company claims their new energy drink makes reaction times faster (shorter) than the population average of 250 milliseconds. They test 50 individuals and find a mean reaction time of 240 ms; the population standard deviation is known to be 40 ms. Is there evidence to support the company's claim?
Z-statistic: -1.7678, one-tailed p-value: 0.0385
Conclusion: Reject the null hypothesis.
Interpretation: There is significant evidence to support the company's claim.
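The same decision can be made with a critical value: because "faster" means a smaller mean, the rejection region is the lower tail, and at α = 0.05 H0 is rejected when z falls below `stats.norm.ppf(0.05)` (about -1.645). A sketch using the summary statistics stated in the question:

```python
import numpy as np
from scipy import stats

alpha = 0.05
sample_mean, n = 240, 50            # from the question
population_mean, population_std = 250, 40

z_stat = (sample_mean - population_mean) / (population_std / np.sqrt(n))
z_critical = stats.norm.ppf(alpha)  # lower-tail cutoff

print(f"z = {z_stat:.4f}, critical value = {z_critical:.4f}")
print("Reject H0" if z_stat < z_critical else "Fail to reject H0")
```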


Question: A factory claims their light bulbs last an average of 10,000 hours with a standard deviation of 500 hours. A consumer group tests 100 bulbs and finds they last an average of 9,800 hours. Is there evidence to suggest the true average differs from the claim?


In [10]:
# Example 2: One-sample z-test (two-tailed)
print("\nExample 2: One-sample z-test (two-tailed)")

sample_mean = 9800
n = 100
population_mean = 10000
population_std = 500

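# Only summary statistics are given; z_test uses just the mean and length of its input,
# so n copies of the sample mean reproduce the summary exactly
z_stat, p_value = z_test([sample_mean] * n, population_mean, population_std)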
print(f"Z-statistic: {z_stat:.4f}, two-tailed p-value: {p_value:.4f}")
print(f"Conclusion: {'Reject' if p_value < 0.05 else 'Fail to reject'} the null hypothesis.")
print(f"Interpretation: There {'is' if p_value < 0.05 else 'is not'} significant evidence to suggest the true average differs from the claim.")


Example 2: One-sample z-test (two-tailed)
Z-statistic: -4.0000, two-tailed p-value: 0.0001
Conclusion: Reject the null hypothesis.
Interpretation: There is significant evidence to suggest the true average differs from the claim.
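A two-sided z-test at α = 0.05 corresponds to a 95% confidence interval: the claimed mean is rejected exactly when 10,000 lies outside x̄ ± z₀.₉₇₅ · σ/√n. A sketch with the light-bulb figures:

```python
import numpy as np
from scipy import stats

sample_mean, n = 9800, 100
population_mean, population_std = 10000, 500

z_crit = stats.norm.ppf(0.975)                 # about 1.96
margin = z_crit * population_std / np.sqrt(n)  # about 98 hours
lower, upper = sample_mean - margin, sample_mean + margin

print(f"95% CI: ({lower:.1f}, {upper:.1f}) hours")
print(f"Claimed mean inside CI: {lower <= population_mean <= upper}")
```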


Question: A diet program claims to help people lose more than 10 pounds on average over a month. A study of 75 participants showed an average weight loss of 9.5 pounds with a known population standard deviation of 3.5 pounds. Is there evidence against the program's claim?


In [11]:
# Example 3: One-sample z-test (one-tailed, less)
print("\nExample 3: One-sample z-test (one-tailed, less)")
print("Question: A diet program claims to help people lose more than 10 pounds on average over a month. A study of 75 participants showed an average weight loss of 9.5 pounds with a known population standard deviation of 3.5 pounds. Is there evidence against the program's claim?")

# Use the reported summary statistics: a random sample drawn with mean 9.5 would not have a sample mean of exactly 9.5.
# z_test uses only the mean and length of its input, so n copies of the observed mean reproduce the summary.
sample_mean = 9.5
n = 75
population_mean = 10
population_std = 3.5

z_stat, p_value = z_test([sample_mean] * n, population_mean, population_std, alternative='less')
print(f"Z-statistic: {z_stat:.4f}, one-tailed p-value: {p_value:.4f}")
print(f"Conclusion: {'Reject' if p_value < 0.05 else 'Fail to reject'} the null hypothesis.")
print(f"Interpretation: There {'is' if p_value < 0.05 else 'is not'} significant evidence against the program's claim.")


Example 3: One-sample z-test (one-tailed, less)
Question: A diet program claims to help people lose more than 10 pounds on average over a month. A study of 75 participants showed an average weight loss of 9.5 pounds with a known population standard deviation of 3.5 pounds. Is there evidence against the program's claim?
Z-statistic: -1.2372, one-tailed p-value: 0.1080
Conclusion: Fail to reject the null hypothesis.
Interpretation: There is not significant evidence against the program's claim.
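Working from the summary statistics stated in the question (n = 75, x̄ = 9.5 lb, σ = 3.5 lb), one can also ask how small the sample mean would have to be before the claim is rejected at α = 0.05. A minimal sketch of that cutoff for a lower-tailed test:

```python
import numpy as np
from scipy import stats

alpha = 0.05
sample_mean, n = 9.5, 75
population_mean, population_std = 10, 3.5

standard_error = population_std / np.sqrt(n)
# Lower-tailed test: reject when the sample mean falls below this cutoff
mean_cutoff = population_mean + stats.norm.ppf(alpha) * standard_error

print(f"Standard error: {standard_error:.4f} lb")
print(f"Reject H0 if the sample mean < {mean_cutoff:.3f} lb")
print(f"Observed mean {sample_mean} lb -> {'reject' if sample_mean < mean_cutoff else 'fail to reject'}")
```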


In [12]:
import numpy as np
from scipy import stats

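# Two-sample F-test for equality of variances (assumes both samples come from normal populations)
def f_test(sample1, sample2, alternative='two-sided'):
    # F is the ratio of the unbiased sample variances (ddof=1)
    f = np.var(sample1, ddof=1) / np.var(sample2, ddof=1)
    df1 = len(sample1) - 1
    df2 = len(sample2) - 1

    if alternative == 'two-sided':
        p_value = 2 * min(stats.f.cdf(f, df1, df2), stats.f.sf(f, df1, df2))
    elif alternative == 'less':
        # H1: variance of sample1 < variance of sample2
        p_value = stats.f.cdf(f, df1, df2)
    elif alternative == 'greater':
        # H1: variance of sample1 > variance of sample2
        p_value = stats.f.sf(f, df1, df2)
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
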
    return f, p_value
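The questions in this section state sample variances or standard deviations directly. A hypothetical helper, `f_test_from_summary` (an illustrative name, not a SciPy function), computes the same F statistic and p-values from those summaries instead of from simulated data:

```python
from scipy import stats

def f_test_from_summary(var1, n1, var2, n2, alternative='two-sided'):
    """Variance-ratio F-test from sample variances and sizes (hypothetical helper)."""
    f = var1 / var2
    df1, df2 = n1 - 1, n2 - 1
    if alternative == 'two-sided':
        p_value = 2 * min(stats.f.cdf(f, df1, df2), stats.f.sf(f, df1, df2))
    elif alternative == 'less':
        p_value = stats.f.cdf(f, df1, df2)
    elif alternative == 'greater':
        p_value = stats.f.sf(f, df1, df2)
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return f, p_value

# Production lines: SD 100 h vs 120 h, 50 bulbs each
f, p = f_test_from_summary(100**2, 50, 120**2, 50)
print(f"F = {f:.4f}, two-sided p = {p:.4f}")
```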

Question: A company has two production lines for manufacturing light bulbs. They want to know if there's a difference in the variability of the lifespans between the two lines. They test 50 bulbs from each line with the following results:
Line A: mean = 1000 hours, standard deviation = 100 hours
Line B: mean = 990 hours, standard deviation = 120 hours
Is there a significant difference in the variability of lifespans between the two production lines?

In [13]:
# Example 1: Two-sample F-test (two-tailed)
print("Example 1: Two-sample F-test (two-tailed)")

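# Simulated samples drawn with the stated means and SDs; their sample SDs will not be exactly 100 and 120
np.random.seed(42)  # For reproducibility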
line_a = np.random.normal(1000, 100, 50)
line_b = np.random.normal(990, 120, 50)

f_stat, p_value = f_test(line_a, line_b)
print(f"F-statistic: {f_stat:.4f}, two-tailed p-value: {p_value:.4f}")
print(f"Conclusion: {'Reject' if p_value < 0.05 else 'Fail to reject'} the null hypothesis.")
print(f"Interpretation: There {'is' if p_value < 0.05 else 'is not'} significant evidence of a difference in variability between the two production lines.")

Example 1: Two-sample F-test (two-tailed)
F-statistic: 0.7919, two-tailed p-value: 0.4171
Conclusion: Fail to reject the null hypothesis.
Interpretation: There is not significant evidence of a difference in variability between the two production lines.
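The variance-ratio F-test is known to be sensitive to departures from normality. Levene's test (`scipy.stats.levene`) is a commonly used, more robust way to compare variances; it is a different test, swapped in here only for comparison, and is run on the same simulated data as the example above:

```python
import numpy as np
from scipy import stats

np.random.seed(42)  # same seed and draws as the F-test example above
line_a = np.random.normal(1000, 100, 50)
line_b = np.random.normal(990, 120, 50)

# center='median' (the SciPy default) gives the Brown-Forsythe variant of Levene's test
levene_stat, levene_p = stats.levene(line_a, line_b, center='median')
print(f"Levene statistic: {levene_stat:.4f}, p-value: {levene_p:.4f}")
```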


Question: A teacher believes that the variability in test scores is greater for the morning class compared to the afternoon class. The morning class (n=30) has a variance of 25, while the afternoon class (n=28) has a variance of 16. Is there evidence to support the teacher's belief?


In [14]:
# Example 2: Two-sample F-test (one-tailed, greater)
print("\nExample 2: Two-sample F-test (one-tailed, greater)")
print("Question: A teacher believes that the variability in test scores is greater for the morning class compared to the afternoon class. The morning class (n=30) has a variance of 25, while the afternoon class (n=28) has a variance of 16. Is there evidence to support the teacher's belief?")

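# Simulated samples with the stated variances; their sample variances will not be exactly 25 and 16
np.random.seed(42)  # For reproducibility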
morning_class = np.random.normal(0, np.sqrt(25), 30)
afternoon_class = np.random.normal(0, np.sqrt(16), 28)

f_stat, p_value = f_test(morning_class, afternoon_class, alternative='greater')
print(f"F-statistic: {f_stat:.4f}, one-tailed p-value: {p_value:.4f}")
print(f"Conclusion: {'Reject' if p_value < 0.05 else 'Fail to reject'} the null hypothesis.")
print(f"Interpretation: There {'is' if p_value < 0.05 else 'is not'} significant evidence that the variability in test scores is greater for the morning class.")


Example 2: Two-sample F-test (one-tailed, greater)
Question: A teacher believes that the variability in test scores is greater for the morning class compared to the afternoon class. The morning class (n=30) has a variance of 25, while the afternoon class (n=28) has a variance of 16. Is there evidence to support the teacher's belief?
F-statistic: 1.4450, one-tailed p-value: 0.1697
Conclusion: Fail to reject the null hypothesis.
Interpretation: There is not significant evidence that the variability in test scores is greater for the morning class.
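Using the variances stated in the question rather than simulated samples gives F = 25/16 = 1.5625 with (29, 27) degrees of freedom. A sketch comparing it with the upper 5% critical value from `stats.f.ppf`; the critical-value decision and the p-value decision always agree:

```python
from scipy import stats

var_morning, n_morning = 25, 30
var_afternoon, n_afternoon = 16, 28

f_stat = var_morning / var_afternoon
df1, df2 = n_morning - 1, n_afternoon - 1
f_critical = stats.f.ppf(0.95, df1, df2)   # upper-tail cutoff at alpha = 0.05
p_value = stats.f.sf(f_stat, df1, df2)     # one-tailed (greater) p-value

print(f"F = {f_stat:.4f}, critical value = {f_critical:.4f}, one-tailed p = {p_value:.4f}")
print("Reject H0" if f_stat > f_critical else "Fail to reject H0")
```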


Question: A pharmaceutical company has developed a new process for manufacturing a drug. They believe this process will reduce the variability in the active ingredient concentration compared to the old process. They test 40 batches from each process with the following results:
Old process: variance = 0.25
New process: variance = 0.20
Is there evidence to support the company's belief that the new process reduces variability?

In [16]:
# Example 3: Two-sample F-test (one-tailed, greater)
print("\nExample 3: Two-sample F-test (one-tailed, greater)")

# Simulated samples with the stated variances; their sample variances will not be exactly 0.25 and 0.20
np.random.seed(42)  # For reproducibility
old_process = np.random.normal(0, np.sqrt(0.25), 40)
new_process = np.random.normal(0, np.sqrt(0.20), 40)

# F = var(old) / var(new); "the new process reduces variability" means F > 1 under H1, so alternative='greater'
f_stat, p_value = f_test(old_process, new_process, alternative='greater')
print(f"F-statistic: {f_stat:.4f}, one-tailed p-value: {p_value:.4f}")
print(f"Conclusion: {'Reject' if p_value < 0.05 else 'Fail to reject'} the null hypothesis.")
print(f"Interpretation: There {'is' if p_value < 0.05 else 'is not'} significant evidence that the new process reduces variability in the active ingredient concentration.")


Example 3: Two-sample F-test (one-tailed, greater)
F-statistic: 1.2192, one-tailed p-value: 0.2695
Conclusion: Fail to reject the null hypothesis.
Interpretation: There is not significant evidence that the new process reduces variability in the active ingredient concentration.
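Because a one-sided F-test depends on which variance sits in the numerator, the same hypothesis can be written two ways. A sketch, with a compact copy of the `f_test` logic so it runs on its own (`variance_ratio_p` is an illustrative name), showing that `var(old)/var(new)` with `'greater'` and `var(new)/var(old)` with `'less'` give the same p-value:

```python
import numpy as np
from scipy import stats

def variance_ratio_p(sample1, sample2, alternative):
    """One-sided variance-ratio F-test p-value (compact version of f_test)."""
    f = np.var(sample1, ddof=1) / np.var(sample2, ddof=1)
    df1, df2 = len(sample1) - 1, len(sample2) - 1
    return stats.f.sf(f, df1, df2) if alternative == 'greater' else stats.f.cdf(f, df1, df2)

np.random.seed(42)
old_process = np.random.normal(0, np.sqrt(0.25), 40)
new_process = np.random.normal(0, np.sqrt(0.20), 40)

p1 = variance_ratio_p(old_process, new_process, 'greater')  # H1: var(old) > var(new)
p2 = variance_ratio_p(new_process, old_process, 'less')     # H1: var(new) < var(old)
print(f"p (old/new, greater) = {p1:.4f}")
print(f"p (new/old, less)    = {p2:.4f}")
```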


Question: A researcher wants to know if there's a relationship between gender and preference for different types of movies. They surveyed 200 people and got the following results:
             Action  Comedy  Drama
Male           30      25     20
Female         20      40     65
Is there a significant relationship between gender and movie preference?

In [18]:
import numpy as np
import pandas as pd
from scipy import stats

# Example 1: Chi-square test of independence
print("Example 1: Chi-square test of independence")

# Create the contingency table
observed = pd.DataFrame({
    'Action': [30, 20],
    'Comedy': [25, 40],
    'Drama': [20, 65]
}, index=['Male', 'Female'])

chi2, p_value, dof, expected = stats.chi2_contingency(observed)

print(f"Chi-square statistic: {chi2:.4f}")
print(f"p-value: {p_value:.4f}")
print(f"Degrees of freedom: {dof}")
print(f"Conclusion: {'Reject' if p_value < 0.05 else 'Fail to reject'} the null hypothesis.")
print(f"Interpretation: There {'is' if p_value < 0.05 else 'is not'} significant evidence of a relationship between gender and movie preference.")

Example 1: Chi-square test of independence
Chi-square statistic: 17.9041
p-value: 0.0001
Degrees of freedom: 2
Conclusion: Reject the null hypothesis.
Interpretation: There is significant evidence of a relationship between gender and movie preference.
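A significant chi-square test says the variables are associated, but not how strongly or which cells drive the result. A sketch computing Cramér's V, V = sqrt(χ² / (n · (min(r, c) - 1))), and the Pearson residuals (O - E)/√E from the expected counts returned by `chi2_contingency`:

```python
import numpy as np
import pandas as pd
from scipy import stats

observed = pd.DataFrame({
    'Action': [30, 20],
    'Comedy': [25, 40],
    'Drama': [20, 65]
}, index=['Male', 'Female'])

chi2, p_value, dof, expected = stats.chi2_contingency(observed)

n = observed.to_numpy().sum()
r, c = observed.shape
cramers_v = np.sqrt(chi2 / (n * (min(r, c) - 1)))

# Pearson residuals: positive where a cell has more people than independence predicts
residuals = (observed - expected) / np.sqrt(expected)
expected_df = pd.DataFrame(expected, index=observed.index, columns=observed.columns)

print(f"Cramér's V: {cramers_v:.4f}")
print("Expected counts:\n", expected_df.round(3))
print("Pearson residuals:\n", residuals.round(2))
```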


In [None]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import statsmodels.api as sm
from statsmodels.formula.api import ols

# One-Way ANOVA Example
print("Simple One-Way ANOVA Example")
print("Question: Does the type of fertilizer affect plant growth?")

# Sample data: Plant growth (in cm) for three fertilizer types
fertilizer_A = [20, 22, 21, 23, 25]
fertilizer_B = [18, 19, 20, 21, 17]
fertilizer_C = [15, 16, 14, 15, 17]

# Perform one-way ANOVA using scipy.stats
f_statistic, p_value = stats.f_oneway(fertilizer_A, fertilizer_B, fertilizer_C)

print(f"F-statistic: {f_statistic:.4f}")
print(f"p-value: {p_value:.4f}")
print(f"Conclusion: {'Reject' if p_value < 0.05 else 'Fail to reject'} the null hypothesis.")
print(f"Interpretation: There {'is' if p_value < 0.05 else 'is not'} significant evidence that fertilizer type affects plant growth.\n")

Simple One-Way ANOVA Example
Question: Does the type of fertilizer affect plant growth?
F-statistic: 23.1467
p-value: 0.0001
Conclusion: Reject the null hypothesis.
Interpretation: There is significant evidence that fertilizer type affects plant growth.
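The one-way ANOVA F-statistic is the ratio of between-group to within-group mean squares, F = (SSB / (k - 1)) / (SSW / (N - k)). A sketch that recomputes it by hand for the fertilizer data as a check on `stats.f_oneway`:

```python
import numpy as np
from scipy import stats

groups = {
    'A': np.array([20, 22, 21, 23, 25]),
    'B': np.array([18, 19, 20, 21, 17]),
    'C': np.array([15, 16, 14, 15, 17]),
}

all_values = np.concatenate(list(groups.values()))
grand_mean = all_values.mean()
k, N = len(groups), all_values.size

# Between-group and within-group sums of squares
ssb = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups.values())
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

f_manual = (ssb / (k - 1)) / (ssw / (N - k))
p_manual = stats.f.sf(f_manual, k - 1, N - k)

print(f"SSB={ssb:.4f}, SSW={ssw:.4f}, F={f_manual:.4f}, p={p_manual:.6f}")
```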



In [None]:
# Two-Way ANOVA Example
print("Simple Two-Way ANOVA Example")
print("Question: Do fertilizer type and sunlight exposure affect plant growth?")

# Sample data: Plant growth (in cm) for combinations of fertilizer and sunlight
data = {
    'Fertilizer': ['A', 'A', 'B', 'B'] * 3,
    'Sunlight': ['Low', 'High'] * 6,
    'Growth': [20, 25, 18, 22, 22, 27, 19, 24, 21, 26, 17, 23]
}
df = pd.DataFrame(data)

# Perform two-way ANOVA with interaction effects using statsmodels
model = ols('Growth ~ C(Fertilizer) + C(Sunlight) + C(Fertilizer):C(Sunlight)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

# Print the ANOVA table
print("\nTwo-Way ANOVA Results (effects of fertilizer and sunlight on growth, including interaction):")
print(anova_table)

# Conclusion based on p-values
print("\nConclusion:")
print(f"1. Effect of Fertilizer: {'Reject' if anova_table.loc['C(Fertilizer)', 'PR(>F)'] < 0.05 else 'Fail to reject'} the null hypothesis.")
print(f"2. Effect of Sunlight: {'Reject' if anova_table.loc['C(Sunlight)', 'PR(>F)'] < 0.05 else 'Fail to reject'} the null hypothesis.")
print(f"3. Interaction Effect: {'Reject' if anova_table.loc['C(Fertilizer):C(Sunlight)', 'PR(>F)'] < 0.05 else 'Fail to reject'} the null hypothesis.")
print(f"   Interpretation: There {'is' if anova_table.loc['C(Fertilizer):C(Sunlight)', 'PR(>F)'] < 0.05 else 'is not'} significant evidence of an interaction between fertilizer type and sunlight exposure on plant growth.")


Simple Two-Way ANOVA Example
Question: Do fertilizer type and sunlight exposure affect plant growth?

Two-Way ANOVA Results (effects of fertilizer and sunlight on growth, including interaction):
                                 sum_sq   df             F    PR(>F)
C(Fertilizer)              2.700000e+01  1.0  2.700000e+01  0.000826
C(Sunlight)                7.500000e+01  1.0  7.500000e+01  0.000025
C(Fertilizer):C(Sunlight)  1.789728e-29  1.0  1.789728e-29  1.000000
Residual                   8.000000e+00  8.0           NaN       NaN

Conclusion:
1. Effect of Fertilizer: Reject the null hypothesis.
2. Effect of Sunlight: Reject the null hypothesis.
3. Interaction Effect: Fail to reject the null hypothesis.
   Interpretation: There is not significant evidence of an interaction between fertilizer type and sunlight exposure on plant growth.
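The interaction row in the table is effectively zero (1.789728e-29 is floating-point noise). The cell means show why: high sunlight raises growth by the same amount for both fertilizers. A sketch using pandas `groupby` on the same data:

```python
import pandas as pd

data = {
    'Fertilizer': ['A', 'A', 'B', 'B'] * 3,
    'Sunlight': ['Low', 'High'] * 6,
    'Growth': [20, 25, 18, 22, 22, 27, 19, 24, 21, 26, 17, 23]
}
df = pd.DataFrame(data)

# Mean growth for each fertilizer x sunlight combination
cell_means = df.groupby(['Fertilizer', 'Sunlight'])['Growth'].mean().unstack()
print(cell_means)

# Sunlight effect (High - Low) within each fertilizer; equal values mean no interaction
sunlight_effect = cell_means['High'] - cell_means['Low']
print(sunlight_effect)
```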
