You are a market researcher conducting a survey to estimate the average monthly spending on a
particular brand of smartphones among smartphone users in a city. You randomly select a sample of
100 smartphone users and ask them about their monthly spending on the brand. [Assume a critical
value of Z = 1.96 for 95% confidence.]
From your sample data, the sample mean spending is $250 with a sample standard deviation of $30.
a. Calculate a 95% confidence interval for the true population mean monthly spending on this
brand of smartphones.
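As a quick hand check, plugging the stated values into the large-sample z-interval, with the Z = 1.96 given in the problem, yields:

```latex
\bar{x} \pm Z \frac{s}{\sqrt{n}}
  = 250 \pm 1.96 \times \frac{30}{\sqrt{100}}
  = 250 \pm 5.88
  \quad\Longrightarrow\quad (244.12,\ 255.88)
```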

In [2]:
from scipy import stats
import math

xbar = 250  # Sample mean spending
s = 30  # Sample standard deviation
n = 100  # Sample size

# Confidence level (95%)
confidence_level = 0.95
alpha = 1 - confidence_level

# Critical value for a 95% confidence level
Z = stats.norm.ppf(1 - alpha / 2)

# Calculate the margin of error
ME = Z * (s / math.sqrt(n))

# Calculate the confidence interval
ciLow = xbar - ME
ciHigh = xbar + ME

print(ciLow)
print(ciHigh)

244.12010804637984
255.87989195362016
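The same interval can be obtained from scipy's built-in `interval` method. The sketch below also computes a t-interval with n - 1 degrees of freedom for comparison; that second interval is an addition, not something the problem asks for, shown because sigma is estimated by s here.

```python
from scipy import stats

xbar, s, n = 250, 30, 100
se = s / n ** 0.5  # standard error of the mean

# z-interval, matching the problem's assumption of a normal critical value
z_low, z_high = stats.norm.interval(0.95, loc=xbar, scale=se)

# t-interval with n - 1 degrees of freedom (uses s in place of the unknown sigma)
t_low, t_high = stats.t.interval(0.95, n - 1, loc=xbar, scale=se)

print(f"z-interval: ({z_low:.2f}, {z_high:.2f})")
print(f"t-interval: ({t_low:.2f}, {t_high:.2f})")
```

The t-interval comes out slightly wider because the t quantile for 99 degrees of freedom (about 1.984) exceeds 1.96; with n = 100 the difference is small.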


A food manufacturer claims that the shelf life of its product is 100 days. To verify this claim, a
quality control manager randomly selects a sample of 25 product items and records their shelf life
in days. The sample summary statistics are:
- Sample mean shelf life (x̄) = 98 days
- Sample standard deviation (s) = 7 days
The quality control manager wants to perform a one-sample hypothesis test to determine whether
there is enough evidence to conclude that the actual mean shelf life of the product differs from the
claimed 100 days.
The hypotheses are:
- Null hypothesis (H0): the actual mean shelf life is 100 days (μ = 100).
- Alternative hypothesis (Ha): the actual mean shelf life differs from 100 days (μ ≠ 100).
Using a significance level (alpha) of 0.05, conduct a one-sample hypothesis test to determine
whether there is evidence that the actual mean shelf life differs from 100 days.
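For reference, the one-sample t statistic and the two-tailed decision rule work out by hand as follows:

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
  = \frac{98 - 100}{7 / \sqrt{25}}
  = \frac{-2}{1.4}
  \approx -1.4286,
\qquad df = n - 1 = 24,
\qquad |t| < t_{0.975,\,24} \approx 2.064
```

Since |t| is below the critical value, H0 is not rejected at alpha = 0.05.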

In [5]:
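from scipy import stats

xbar = 98  # sample mean shelf life (days)
s = 7      # sample standard deviation (days)
n = 25     # sample size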

# Population mean under the null hypothesis
pop_mean_H0 = 100  # Null hypothesis mean

# Significance level (alpha)
alpha = 0.05

# t-statistic
t_stats = (xbar - pop_mean_H0) / (s / (n ** 0.5))

# degrees of freedom
dof = n - 1

# critical t-value for a two-tailed test
criticalT = stats.t.ppf(1 - alpha / 2, dof)

pvalue = 2 * (1 - stats.t.cdf(abs(t_stats), dof))

print(f"Sample Mean: {xbar}")
print(f"Sample Standard Deviation: {s}")
print(f"Sample Size: {n}")
print(f"Null Hypothesis Mean: {pop_mean_H0}")
print(f"Significance Level (alpha): {alpha}")
print(f"Degrees of Freedom: {dof}")
print(f"Calculated t-statistic: {t_stats}")
print(f"Critical t-value: {criticalT}")
print(f"P-value: {pvalue}")

if abs(t_stats) > criticalT:
    print("Reject H0")
else:
    print("Fail to reject H0")

Sample Mean: 98
Sample Standard Deviation: 7
Sample Size: 25
Null Hypothesis Mean: 100
Significance Level (alpha): 0.05
Degrees of Freedom: 24
Calculated t-statistic: -1.4285714285714286
Critical t-value: 2.0638985616280205
P-value: 0.16601358583680903
Fail to reject H0
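An equivalent way to reach the same decision is through the 95% t confidence interval for the mean: a two-sided t-test at alpha = 0.05 rejects H0 exactly when the interval excludes mu0. The sketch below also uses `stats.t.sf` (the survival function, 1 - cdf) for the p-value, which avoids the subtraction in `1 - cdf` for large |t|.

```python
from scipy import stats

xbar, s, n, mu0 = 98, 7, 25, 100
dof = n - 1
se = s / n ** 0.5  # standard error of the mean

t_stat = (xbar - mu0) / se
# two-tailed p-value via the survival function
p_value = 2 * stats.t.sf(abs(t_stat), dof)

# 95% t confidence interval for the mean shelf life
ci_low, ci_high = stats.t.interval(0.95, dof, loc=xbar, scale=se)

print(f"t = {t_stat:.4f}, p = {p_value:.4f}")
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
print("100 days inside CI:", ci_low <= mu0 <= ci_high)
```

The interval (roughly 95.1 to 100.9 days) contains 100, which agrees with failing to reject H0.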


In [6]:
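import scipy.stats as stats

# Two-sample t-test with pooled variance:
# H0: mean score of Class A == mean score of Class B (two-tailed)

# Class A
n1 = 30   # sample size
x1 = 85   # sample mean score
s1 = 10   # sample standard deviation

# Class B
n2 = 35   # sample size
x2 = 90   # sample mean score
s2 = 12   # sample standard deviation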

alpha = 0.05

# degrees of freedom for the pooled two-sample t-test
dof = n1 + n2 - 2

# pooled variance, then pooled standard deviation (assumes equal population variances)
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / dof
pooled_stdev = pooled_var**0.5

# t-statistic for the difference in means
t_stats = (x1 - x2) / (pooled_stdev * ((1 / n1) + (1 / n2))**0.5)

# critical t-value for a two-tailed test
criticalT = stats.t.ppf(1 - alpha / 2, dof)

# p-value for the two-tailed test
pvalue = 2 * (1 - stats.t.cdf(abs(t_stats), dof))

print(f"Sample Size (Class A): {n1}")
print(f"Sample Mean Score (Class A): {x1}")
print(f"Sample Standard Deviation (Class A): {s1}")
print(f"Sample Size (Class B): {n2}")
print(f"Sample Mean Score (Class B): {x2}")
print(f"Sample Standard Deviation (Class B): {s2}")
print(f"Significance Level (alpha): {alpha}")
print(f"Degrees of Freedom: {dof}")
print(f"Pooled Standard Deviation: {pooled_stdev}")
print(f"Calculated t-statistic: {t_stats}")
print(f"Critical t-value: {criticalT}")
print(f"P-value: {pvalue}")

# Determine whether to reject the null hypothesis
if abs(t_stats) > criticalT:
    print("Reject H0")
else:
    print("Fail to reject H0")


Sample Size (Class A): 30
Sample Mean Score (Class A): 85
Sample Standard Deviation (Class A): 10
Sample Size (Class B): 35
Sample Mean Score (Class B): 90
Sample Standard Deviation (Class B): 12
Significance Level (alpha): 0.05
Degrees of Freedom: 63
Pooled Standard Deviation: 11.124119369461646
Calculated t-statistic: -1.8065181740930074
Critical t-value: 1.9983405417721956
P-value: 0.07561378878097358
Fail to reject H0
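scipy can run this test directly from summary statistics with `stats.ttest_ind_from_stats`. The sketch below reproduces the pooled result and, as an extra comparison not in the original analysis, runs Welch's t-test (`equal_var=False`), which drops the equal-variance assumption.

```python
from scipy import stats

# Pooled-variance (Student) t-test from summary statistics
pooled = stats.ttest_ind_from_stats(mean1=85, std1=10, nobs1=30,
                                    mean2=90, std2=12, nobs2=35,
                                    equal_var=True)

# Welch's t-test: same data, no equal-variance assumption
welch = stats.ttest_ind_from_stats(mean1=85, std1=10, nobs1=30,
                                   mean2=90, std2=12, nobs2=35,
                                   equal_var=False)

print(f"Pooled: t = {pooled.statistic:.4f}, p = {pooled.pvalue:.4f}")
print(f"Welch:  t = {welch.statistic:.4f}, p = {welch.pvalue:.4f}")
```

Both p-values stay above 0.05, so the decision does not hinge on the equal-variance assumption here.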


A teacher wants to determine whether there is a significant difference in the average exam scores
among three classes: Class A, Class B, and Class C. The teacher collects the following data:
Class A (Sample 1):
- Sample size (n1) = 25
- Sample mean score (x̄1) = 85
- Sample variance (s1^2) = 64
Class B (Sample 2):
- Sample size (n2) = 30
- Sample mean score (x̄2) = 88
- Sample variance (s2^2) = 72
Class C (Sample 3):
- Sample size (n3) = 28
- Sample mean score (x̄3) = 82
- Sample variance (s3^2) = 68
You want to perform an Analysis of Variance (ANOVA) to determine whether there is enough
evidence to conclude that the average exam scores differ among the three classes.
The hypotheses are:
- Null hypothesis (H0): the average exam scores of the three classes are equal (μ1 = μ2 = μ3).
- Alternative hypothesis (Ha): at least one class has a different average exam score.
Using a significance level (alpha) of 0.05, conduct an ANOVA to determine whether there is
evidence of a significant difference in average exam scores among the three classes.
Calculate the test statistic (F) and the critical value (if applicable), and make a decision regarding
the null hypothesis. Finally, state a conclusion in the context of the exam scores for Classes A, B,
and C.
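Because the problem supplies only sizes, means, and variances, the one-way ANOVA quantities can be written directly in terms of those summary statistics, with k = 3 groups and N = 83 observations:

```latex
\begin{aligned}
\bar{x}_{\text{grand}} &= \frac{\sum_i n_i \bar{x}_i}{N}
  = \frac{25(85) + 30(88) + 28(82)}{83} = \frac{7061}{83} \approx 85.07 \\
SS_B &= \sum_i n_i \left(\bar{x}_i - \bar{x}_{\text{grand}}\right)^2 \approx 521.57,
  \qquad df_B = k - 1 = 2 \\
SS_W &= \sum_i (n_i - 1)\, s_i^2 = 24(64) + 29(72) + 27(68) = 5460,
  \qquad df_W = N - k = 80 \\
F &= \frac{SS_B / df_B}{SS_W / df_W} = \frac{260.78}{68.25} \approx 3.82
\end{aligned}
```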

In [2]:
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

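# NOTE: repeating each class mean n times reproduces the group means but not the
# reported variances (64, 72, 68). The within-group variance is therefore zero,
# and the resulting F statistic is degenerate (effectively infinite).
data = pd.DataFrame({
    'score': [85] * 25 + [88] * 30 + [82] * 28,
    'class_name': ['A'] * 25 + ['B'] * 30 + ['C'] * 28
})
model = ols('score ~ C(class_name)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=1)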
anova_table


                 df        sum_sq       mean_sq             F  PR(>F)
C(class_name)   2.0      521.5663      260.7831  1.787316e+29     0.0
Residual       80.0  1.167262e-25  1.459077e-27           NaN     NaN


In [6]:
model.summary()

Dep. Variable:               score   R-squared:                  1.0
Model:                         OLS   Adj. R-squared:             1.0
Method:              Least Squares   F-statistic:          1.787e+29
Date:             Sun, 17 Sep 2023   Prob (F-statistic):         0.0
Time:                     16:10:08   Log-Likelihood:          2448.1
No. Observations:               83   AIC:                    -4890.0
Df Residuals:                   80   BIC:                    -4883.0
Df Model:                        2
Covariance Type:         nonrobust

                        coef    std err          t   P>|t|   [0.025   0.975]
Intercept            85.0000   7.64e-15   1.11e+16   0.000   85.000   85.000
C(class_name)[T.B]    3.0000   1.03e-14    2.9e+14   0.000    3.000    3.000
C(class_name)[T.C]   -3.0000   1.05e-14  -2.85e+14   0.000   -3.000   -3.000

Omnibus:                80.279   Durbin-Watson:         0.031
Prob(Omnibus):             0.0   Jarque-Bera (JB):      7.111
Skew:                   -0.065   Prob(JB):             0.0286
Kurtosis:                1.572   Cond. No.                3.9
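The near-zero residual sum of squares and astronomically large F above come from the constant-valued groups, not from the data in the problem. A minimal sketch of the one-way ANOVA computed directly from the given sizes, means, and variances follows; the `groups` dictionary layout is just an illustrative choice.

```python
from scipy import stats

# Summary statistics from the problem: (sample size, mean, variance)
groups = {
    "A": (25, 85, 64),
    "B": (30, 88, 72),
    "C": (28, 82, 68),
}
alpha = 0.05

k = len(groups)
N = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / N

# Between-group and within-group sums of squares
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
ss_within = sum((n - 1) * v for n, _, v in groups.values())

df_between, df_within = k - 1, N - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within

F = ms_between / ms_within
F_crit = stats.f.ppf(1 - alpha, df_between, df_within)  # upper 5% point of F(2, 80)
p_value = stats.f.sf(F, df_between, df_within)

print(f"SSB = {ss_between:.4f}, SSW = {ss_within:.1f}")
print(f"F({df_between}, {df_within}) = {F:.4f}, critical F = {F_crit:.4f}, p = {p_value:.4f}")
print("Reject H0" if F > F_crit else "Fail to reject H0")
```

The between-group sum of squares matches the 521.5663 in the statsmodels table; the difference is the within-group term. With F of about 3.82 against a critical value of about 3.11 (p of about 0.026), H0 is rejected at alpha = 0.05: at least one class has a different average exam score.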


In [1]:
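import scipy.stats as stats

# NOTE: as in the statsmodels cell, each group holds a single repeated value,
# so the within-group variance is zero and f_oneway returns an infinite F.
data = {
    'Class A': [85] * 25,
    'Class B': [88] * 30,
    'Class C': [82] * 28
}
f_statistic, p_value = stats.f_oneway(*data.values())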
alpha = 0.05

print(f'F-stat: {f_statistic}')
print(f'p-value: {p_value}')

if p_value < alpha:
    print("Reject H0")
else:
    print("Fail to reject H0")


F-stat: inf
p-value: 0.0
Reject H0
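If `f_oneway` is preferred, one workaround is to build synthetic samples that reproduce each class's size, mean, and sample variance exactly; one-way ANOVA depends only on those quantities, so the F statistic does not depend on the random draw. The helper `synthetic_sample` below is hypothetical, and the generated values are not real exam scores.

```python
import numpy as np
from scipy import stats


def synthetic_sample(n, mean, var, rng):
    """Hypothetical helper: draw n values, then rescale them so the sample
    mean and the sample variance (ddof=1) equal the targets exactly."""
    z = rng.standard_normal(n)
    z = (z - z.mean()) / z.std(ddof=1)
    return mean + np.sqrt(var) * z


rng = np.random.default_rng(0)  # arbitrary seed; F is the same for any seed
data = {
    "Class A": synthetic_sample(25, 85, 64, rng),
    "Class B": synthetic_sample(30, 88, 72, rng),
    "Class C": synthetic_sample(28, 82, 68, rng),
}
f_statistic, p_value = stats.f_oneway(*data.values())

print(f"F-stat: {f_statistic:.4f}")
print(f"p-value: {p_value:.4f}")
```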




A marketing research firm wants to determine whether there is an association between gender
(male or female) and product preference (Product A, Product B, or Product C). It surveys
300 individuals and records the following data:
- Gender:
  - Male: 120 respondents
  - Female: 180 respondents
- Product preferences:
  - Product A: 80 males, 120 females
  - Product B: 30 males, 40 females
  - Product C: 10 males, 20 females
The firm wants to test whether there is a significant association between gender and product
preference using the chi-square test for independence.
State the null hypothesis (H0) and the alternative hypothesis (Ha) for this test.
Perform the chi-square test for independence and calculate the chi-square statistic (χ²) and the
degrees of freedom, using a significance level (alpha) of 0.05.
Based on your calculations, make a decision regarding the null hypothesis, and state a conclusion
about the association between gender and product preference.

In [24]:
import scipy.stats as stats

# Contingency table: rows = gender, columns = product preference (A, B, C)
observed_data = [
    [80, 30, 10],   # Male:   Product A, Product B, Product C
    [120, 40, 20]   # Female: Product A, Product B, Product C
]

# Perform the chi-square test for independence
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed_data)

# Set significance level (alpha)
alpha = 0.05

# Print the null and alternative hypotheses
print("Null Hypothesis (H0): Gender and product preferences are independent.")
print("Alternative Hypothesis (Ha): Gender and product preferences are not independent (there is an association).")

# Print the results
print(f'Chi-square statistic: {chi2_stat:.2f}')
print(f'Degrees of freedom: {dof}')
print(f'p-value: {p_value:.4f}')

# Decide whether to reject the null hypothesis
if p_value < alpha:
    print("\nReject the null hypothesis")
    print("There is enough evidence to conclude that there is a significant association between gender and product preferences.")
else:
    print("\nFail to reject the null hypothesis")
    print("There is not enough evidence to conclude that gender and product preferences are associated.")


Null Hypothesis (H0): Gender and product preferences are independent.
Alternative Hypothesis (Ha): Gender and product preferences are not independent (there is an association).
Chi-square statistic: 0.79
Degrees of freedom: 2
p-value: 0.6725

Fail to reject the null hypothesis
There is not enough evidence to conclude that gender and product preferences are associated.
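To see where the 0.79 comes from, the sketch below computes the expected counts under independence (row total times column total divided by the grand total) and the chi-square statistic by hand. `chi2_contingency` applies the Yates continuity correction only when the degrees of freedom equal 1, so for this 2 x 3 table the uncorrected statistic should match its result.

```python
import numpy as np
from scipy import stats

observed = np.array([
    [80, 30, 10],   # Male:   Product A, Product B, Product C
    [120, 40, 20],  # Female: Product A, Product B, Product C
])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 3)
n = observed.sum()

# Expected counts under independence: row total * column total / grand total
expected = row_totals * col_totals / n

chi2_stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = stats.chi2.sf(chi2_stat, dof)

print(expected)
print(f"chi2 = {chi2_stat:.4f}, dof = {dof}, p = {p_value:.4f}")
```

The smallest expected count is 12, so every cell exceeds the usual threshold of 5 for the chi-square approximation.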
