Q1: Difference between a t-test and a z-test

In [2]:
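#T test
# When to use: population standard deviation (σ) is unknown and is estimated by the sample standard deviation s
# Sample size: usually small (rule of thumb n < 30), but it remains appropriate for larger n when σ is unknown
# Distribution: Student's t distribution (n - 1 degrees of freedom for a one-sample test)
# Example: comparing the means of two small groups (n = 25)

#Z test
# When to use: population standard deviation (σ) is known, or n is large enough for the normal approximation
# Sample size: usually large (rule of thumb n > 30)
# Distribution: standard normal distribution
# Example: testing whether a sample mean differs from a population mean with known σ and n = 100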
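The practical difference between the two tests shows up in their critical values. A minimal sketch, assuming SciPy is available (it is imported later in this notebook), that compares the two-sided 95% critical value of the t distribution (df = n - 1, a one-sample test) with the standard normal value for a few sample sizes:

```python
from scipy import stats

# Two-sided 95% critical value from the standard normal distribution (about 1.96)
z_crit = stats.norm.ppf(0.975)

# Matching t critical values for a one-sample test with df = n - 1.
# Heavier tails make t_crit larger than z_crit; the gap shrinks as n grows.
t_crits = {}
for n in (10, 25, 30, 100, 1000):
    t_crits[n] = stats.t.ppf(0.975, df=n - 1)
    print(f"n = {n:4d}: t_crit = {t_crits[n]:.3f}, z_crit = {z_crit:.3f}")
```

The shrinking gap is why the z-test is an acceptable approximation for large samples even when σ has to be estimated.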

Q2: One-tailed vs Two-tailed tests

In [3]:
# One-tailed test: the alternative hypothesis has one direction (H1: μ > μ0 or H1: μ < μ0); all of α is in that tail

# Two-tailed test: the alternative hypothesis is any difference (H1: μ ≠ μ0); α is split between both tails

# Example:

# One-tailed: testing whether a new teaching method gives higher scores than the traditional method

# Two-tailed: testing whether a new drug has any effect (increase or decrease)
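A minimal sketch of how the choice of tails changes the p-value for the same test statistic. The values t = 1.8 and df = 28 are hypothetical, chosen only for illustration:

```python
from scipy import stats

# Hypothetical result: t = 1.8 from a one-sample test with df = 28
t_stat, df = 1.8, 28
alpha = 0.05

p_one_tailed = stats.t.sf(t_stat, df)            # H1: mean > mu0 (upper tail only)
p_two_tailed = 2 * stats.t.sf(abs(t_stat), df)   # H1: mean != mu0 (both tails)

print(f"One-tailed p = {p_one_tailed:.4f} -> reject H0: {p_one_tailed < alpha}")
print(f"Two-tailed p = {p_two_tailed:.4f} -> reject H0: {p_two_tailed < alpha}")
```

Here the same statistic is significant one-tailed but not two-tailed, which is why the direction of the test should be fixed before looking at the data.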

Q3: Type 1 and Type 2 Errors

In [4]:
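#Type I error (false positive)
# Rejecting a true null hypothesis; its probability is the significance level α
# Example: saying a drug works when it actually doesn't

#Type II error (false negative)
# Failing to reject a false null hypothesis; its probability is β, and power = 1 - β
# Example: saying a drug doesn’t work when it actually does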
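Both error rates can be estimated by simulation. A minimal sketch, assuming normally distributed data with SD 1, n = 25, and one-sample t-tests of H0: mean = 0; the true mean of 0.5 used for the Type II case is an arbitrary illustrative choice:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # fixed seed so the run is reproducible
alpha = 0.05
n, trials = 25, 5000

def rejection_rate(true_mean):
    """Fraction of one-sample t-tests of H0: mean = 0 that reject at level alpha."""
    samples = rng.normal(loc=true_mean, scale=1.0, size=(trials, n))
    p_values = stats.ttest_1samp(samples, popmean=0.0, axis=1).pvalue
    return np.mean(p_values < alpha)

# H0 true (mean really is 0): every rejection is a Type I error
type1_rate = rejection_rate(0.0)
# H0 false (mean really is 0.5): every non-rejection is a Type II error
type2_rate = 1 - rejection_rate(0.5)

print(f"Estimated Type I error rate:  {type1_rate:.3f} (alpha = {alpha})")
print(f"Estimated Type II error rate: {type2_rate:.3f} (power = {1 - type2_rate:.3f})")
```

The Type I rate lands close to α by construction, while the Type II rate depends on the true effect size and on n.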

Q4: Bayes’ Theorem + Example

In [5]:
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Example:

# 1% of people have a disease.
# The test is "99% accurate": it is positive for 99% of people with the disease and for 1% of people without it.
# What is the chance that someone who tests positive actually has the disease?

P_disease = 0.01
P_no_disease = 0.99
P_pos_given_disease = 0.99
P_pos_given_no_disease = 0.01

P_pos = (P_pos_given_disease * P_disease) + (P_pos_given_no_disease * P_no_disease)

P_disease_given_pos = (P_pos_given_disease * P_disease) / P_pos
print(f"Probability of disease given positive test: {P_disease_given_pos:.4f}")


Probability of disease given positive test: 0.5000
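The 50% result is driven by the low base rate (1% prevalence). A minimal sketch with a hypothetical helper `posterior`, reusing the same sensitivity (0.99) and false-positive rate (0.01), shows how the answer changes with prevalence:

```python
def posterior(prior, sensitivity=0.99, false_positive_rate=0.01):
    """P(disease | positive test) from Bayes' theorem."""
    p_pos = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_pos

priors = (0.001, 0.01, 0.1, 0.5)
for prior in priors:
    print(f"prevalence = {prior:<5}  P(disease | positive) = {posterior(prior):.4f}")
```

The rarer the disease, the larger the share of positives that come from the much bigger healthy group.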


Q5: What is a confidence interval?
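It is a range of values, computed from a sample, that is likely to contain the true population parameter at a stated confidence level. A 95% confidence level means that if the sampling were repeated many times, about 95% of the intervals built this way would contain the true value.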

In [7]:
# Formula (σ known): CI = mean ± z * (σ / sqrt(n))
# Example:

# Mean = 100, σ = 15, n = 36, Confidence = 95%
import scipy.stats as stats
import numpy as np

mean = 100
std_dev = 15
n = 36
z = stats.norm.ppf(0.975)   # two-sided 95%: 2.5% in each tail, z ≈ 1.96

margin = z * (std_dev / np.sqrt(n))
CI = (mean - margin, mean + margin)
print(f"95% Confidence Interval: {CI}")


95% Confidence Interval: (95.10009003864987, 104.89990996135013)
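The "95%" describes the procedure, not any single interval. A minimal simulation sketch, assuming a normal population whose true mean (100) is known only because we generate the data ourselves, with σ = 15 and n = 36 as in the example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_mean, sigma, n, trials = 100, 15, 36, 10_000   # true_mean is known only because we simulate
z = stats.norm.ppf(0.975)

# Draw many samples and build a 95% z-interval around each sample mean
sample_means = rng.normal(true_mean, sigma, size=(trials, n)).mean(axis=1)
margin = z * sigma / np.sqrt(n)
covers = (sample_means - margin <= true_mean) & (true_mean <= sample_means + margin)

coverage = covers.mean()
print(f"Share of intervals that contain the true mean: {coverage:.3f}")
```

About 95% of the simulated intervals should contain the true mean.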


Q6: Bayes' Theorem Example

In [8]:
# 30% of customers are premium.

# 90% of premium customers renew.

# 40% of standard customers renew.

# What’s the chance a customer who renewed is premium?
P_premium = 0.3
P_standard = 0.7
P_renew_given_premium = 0.9
P_renew_given_standard = 0.4

P_renew = (P_renew_given_premium * P_premium) + (P_renew_given_standard * P_standard)
P_premium_given_renew = (P_renew_given_premium * P_premium) / P_renew
print(f"Probability customer is premium given renewal: {P_premium_given_renew:.4f}")


Probability customer is premium given renewal: 0.4909
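The same answer can be checked with natural frequencies. A minimal sketch that imagines a hypothetical group of 1000 customers (any group size gives the same ratio):

```python
customers = 1000                       # hypothetical group size; any number works
premium = 0.3 * customers              # about 300 premium customers
standard = 0.7 * customers             # about 700 standard customers
renewed_premium = 0.9 * premium        # about 270 premium renewals
renewed_standard = 0.4 * standard      # about 280 standard renewals

# Premium renewals as a share of all renewals: 270 / 550
share_premium = renewed_premium / (renewed_premium + renewed_standard)
print(f"Premium share of renewals: {share_premium:.4f}")
```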


Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation
of 5. Interpret the results.

In [9]:
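mean = 50
std = 5
n = 1  # the question gives no sample size, so a single observation is assumed

z = stats.norm.ppf(0.975)
margin = z * std / np.sqrt(n)   # equals z * std when n = 1
CI = (mean - margin, mean + margin)
print(f"95% Confidence Interval: {CI}")
# Interpretation: with n = 1 this range (about 50 ± 9.8) would contain roughly 95% of individual
# values if the data are normal. With a real sample of size n, the interval for the mean is
# roughly sqrt(n) times narrower.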


95% Confidence Interval: (40.200180077299734, 59.799819922700266)
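Because the question does not state n, a minimal sketch with hypothetical sample sizes shows how the interval for the mean would look. It treats 5 as the sample standard deviation and therefore uses t critical values via `scipy.stats.t.interval`:

```python
import numpy as np
from scipy import stats

mean, std = 50, 5   # from the question; std treated as the sample standard deviation

# Hypothetical sample sizes: the question does not state n
intervals = {}
for n in (10, 25, 100):
    se = std / np.sqrt(n)
    # t-based interval, because sigma is estimated from the sample
    lo, hi = stats.t.interval(0.95, n - 1, loc=mean, scale=se)
    intervals[n] = (lo, hi)
    print(f"n = {n:3d}: 95% CI = ({lo:.2f}, {hi:.2f})")
```

Each interval is read as: we are 95% confident that the true population mean lies within it, and it tightens around 50 as n grows.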


Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error?
Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

In [10]:
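# Margin of error (MoE): half the width of the confidence interval, the "±" part of estimate ± MoE
# MoE = z * (σ / sqrt(n))

# MoE is proportional to 1 / sqrt(n), so a larger sample gives a smaller MoE (quadrupling n halves it)

# Example:

# Surveying 1000 people instead of 100 cuts the MoE by a factor of sqrt(10) ≈ 3.16, giving a more precise estimate.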
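A minimal sketch of the survey example, assuming the quantity being estimated is a proportion and using p = 0.5 (the worst case, which gives the widest margin) at 95% confidence:

```python
import numpy as np
from scipy import stats

z = stats.norm.ppf(0.975)   # 95% confidence
p = 0.5                     # assumed proportion; 0.5 gives the widest (worst-case) margin

moe = {}
for n in (100, 1000):
    moe[n] = z * np.sqrt(p * (1 - p) / n)
    print(f"n = {n:4d}: margin of error = ±{moe[n]:.3f}")
```

This gives roughly ±9.8 percentage points for 100 respondents versus about ±3.1 for 1000.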

Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population
standard deviation of 5. Interpret the results.

In [11]:
# Formula: z = (X - μ) / σ
# z = (75 - 70) / 5 = 5 / 5 = 1.0
# Interpretation: the value is 1 standard deviation above the population mean; if the population is
# normal, about 84% of values fall below it.
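A minimal sketch of the same calculation in code. The percentile step assumes the population is normally distributed:

```python
from scipy import stats

x, mu, sigma = 75, 70, 5
z = (x - mu) / sigma            # parentheses matter: subtract first, then divide
percentile = stats.norm.cdf(z)  # share of a normal population below x
print(f"z = {z}, percentile = {percentile:.4f}")
```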

Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average
of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is
significantly effective at a 95% confidence level using a t-test.

In [12]:
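sample_mean = 6
mu = 0        # H0: mean weight loss is 0 (the drug has no effect)
std = 2.5     # sample standard deviation
n = 50
alpha = 0.05  # 95% confidence level

# One-sample t statistic and two-sided p-value with n - 1 degrees of freedom
t_stat = (sample_mean - mu) / (std / np.sqrt(n))
p_val = stats.t.sf(np.abs(t_stat), df=n-1) * 2
print("T-statistic:", t_stat)
print("P-value:", p_val)
# Conclusion: p < 0.05, so reject H0; the average weight loss is statistically significant.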


T-statistic: 16.970562748477143
P-value: 3.7168840835270203e-22
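The same decision can be made with critical values instead of a p-value. A minimal sketch that shows both the two-sided test used above and a one-sided test; reading "effective" as "causes weight loss" (H1: mean loss > 0) is an interpretation of the question, not something it states:

```python
import numpy as np
from scipy import stats

sample_mean, mu, std, n, alpha = 6, 0, 2.5, 50, 0.05
t_stat = (sample_mean - mu) / (std / np.sqrt(n))

# Two-sided test (H1: mean loss != 0): compare |t| with t_(1 - alpha/2, n - 1)
t_crit_two = stats.t.ppf(1 - alpha / 2, df=n - 1)
# One-sided test (H1: mean loss > 0): compare t with t_(1 - alpha, n - 1)
t_crit_one = stats.t.ppf(1 - alpha, df=n - 1)

print(f"t = {t_stat:.2f}, two-sided critical = {t_crit_two:.3f}, one-sided critical = {t_crit_one:.3f}")
print("Reject H0 (two-sided):", abs(t_stat) > t_crit_two)
print("Reject H0 (one-sided):", t_stat > t_crit_one)
```

Either way the statistic is far beyond the critical value, so the conclusion does not depend on the choice of tails here.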


Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95%
confidence interval for the true proportion of people who are satisfied with their job.

In [13]:
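p_hat = 0.65   # sample proportion
n = 500
z = stats.norm.ppf(0.975)   # 95% confidence

# Normal-approximation (Wald) interval: p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)
margin = z * np.sqrt((p_hat * (1 - p_hat)) / n)
CI = (p_hat - margin, p_hat + margin)
print(f"95% Confidence Interval: {CI}")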


95% Confidence Interval: (0.6081925393809212, 0.6918074606190788)


Q12. A researcher is testing the effectiveness of two different teaching methods on student performance.
Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82
with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a
significant difference in student performance using a t-test with a significance level of 0.01.

In [14]:
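# Sample sizes are not given in the question; n = 30 per group is assumed.
mean1, std1, n1 = 85, 6, 30
mean2, std2, n2 = 82, 5, 30
alpha = 0.01

# Unpooled standard error; df = min(n1 - 1, n2 - 1) is a conservative choice
se = np.sqrt((std1**2 / n1) + (std2**2 / n2))
t_stat = (mean1 - mean2) / se
p_val = stats.t.sf(np.abs(t_stat), df=min(n1-1, n2-1)) * 2

print("T-statistic:", t_stat)
print("P-value:", p_val)
# Conclusion: p ≈ 0.044 > 0.01, so we fail to reject H0; no significant difference at the 1% level.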


T-statistic: 2.1038606199548298
P-value: 0.0441779623620329
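SciPy can run Welch's t-test directly from summary statistics with `scipy.stats.ttest_ind_from_stats`, which uses the Welch-Satterthwaite degrees of freedom instead of the conservative min(n1 - 1, n2 - 1). A minimal sketch; n1 = n2 = 30 is still an assumption, because the question gives no sample sizes:

```python
from scipy import stats

# Summary statistics from the question; nobs1 = nobs2 = 30 is an assumption
result = stats.ttest_ind_from_stats(
    mean1=85, std1=6, nobs1=30,
    mean2=82, std2=5, nobs2=30,
    equal_var=False,  # Welch's t-test: no equal-variance assumption
)
alpha = 0.01
print(f"t = {result.statistic:.4f}, p = {result.pvalue:.4f}")
print("Reject H0" if result.pvalue < alpha else "Fail to reject H0")
```

The larger Welch degrees of freedom give a slightly smaller p-value, but it still exceeds 0.01, so the conclusion is unchanged.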


Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean
of 65. Calculate the 90% confidence interval for the true population mean.

In [15]:
mean = 65     # sample mean
std = 8       # known population standard deviation
n = 50
z = stats.norm.ppf(0.95)   # 90% two-sided: 5% in each tail

margin = z * (std / np.sqrt(n))
CI = (mean - margin, mean + margin)
print(f"90% Confidence Interval: {CI}")


90% Confidence Interval: (63.13906055411732, 66.86093944588268)


Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average
reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to
determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

In [16]:
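mean = 0.25
mu = 0        # placeholder: the question gives no baseline (no-caffeine) reaction time
std = 0.05
n = 30
alpha = 0.10  # 90% confidence level

t_stat = (mean - mu) / (std / np.sqrt(n))
p_val = stats.t.sf(np.abs(t_stat), df=n-1) * 2
print("T-statistic:", t_stat)
print("P-value:", p_val)
# Conclusion: p < 0.10, so reject H0. Testing against 0 is trivially significant for reaction times,
# so a meaningful test needs a baseline or control-group mean in place of mu.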


T-statistic: 27.386127875258307
P-value: 2.8325244885113353e-22
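A minimal sketch of the same one-sample test against a meaningful baseline. The baseline of 0.27 seconds is purely hypothetical, since the question does not provide one; replace it with a real no-caffeine mean:

```python
import numpy as np
from scipy import stats

mean, std, n, alpha = 0.25, 0.05, 30, 0.10
baseline = 0.27  # HYPOTHETICAL no-caffeine mean reaction time; not given in the question

t_stat = (mean - baseline) / (std / np.sqrt(n))
p_val = 2 * stats.t.sf(abs(t_stat), df=n - 1)   # two-sided: caffeine could speed up or slow down
print(f"t = {t_stat:.3f}, p = {p_val:.4f}")
print("Reject H0" if p_val < alpha else "Fail to reject H0")
```

With a realistic baseline the test statistic is modest, and the conclusion now depends on the data rather than on comparing reaction times to zero.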
