Understanding When to Use t-Test vs Z-Test for Confidence Intervals:

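***Z-Test***:
Use when the population standard deviation (𝜎) is known. The z-interval also needs either a normally distributed population or a sample large enough for the sampling distribution of the mean to be approximately normal (a common rule of thumb is n ≥ 30).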

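***t-Test***:
Use when the population standard deviation (𝜎) is unknown and has to be estimated by the sample standard deviation s, which is the usual situation in practice. The t-distribution has heavier tails than the standard normal to account for the added uncertainty in that estimate. The difference matters most for small samples (n < 30); as n grows, the t-distribution approaches the standard normal and the two intervals become nearly identical.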
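To see how much wider the t-interval is for small samples, the short sketch below (assuming SciPy is installed) compares two-sided 95% critical values. The z value is fixed, while the t value depends on the degrees of freedom and shrinks toward z as df grows; the particular df values are arbitrary choices for illustration.

```python
from scipy import stats

confidence = 0.95
tail = (1 - confidence) / 2  # probability in each tail

# z critical value: does not depend on the sample size
z_crit = stats.norm.ppf(1 - tail)

# t critical values for a few degrees of freedom (df = n - 1)
t_crits = {df: stats.t.ppf(1 - tail, df) for df in (4, 9, 29, 99, 999)}

for df, t_crit in t_crits.items():
    print(f"df = {df:>3}: t = {t_crit:.3f}  vs  z = {z_crit:.3f}")
```

By df = 29 the gap to z is already small, which is where the n ≥ 30 rule of thumb comes from.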

**z-test steps**:

1. Given data:
   - sample mean (x̄)
   - population standard deviation (σ)
   - sample size (n)
   - confidence level

2. Find the critical z-value for the given confidence level, using a z-table or the standard normal distribution:
   - 90% → 1.645
   - 95% → 1.96
   - 99% → 2.576

3. Calculate the standard error:
   SE = σ / sqrt(n)

4. Calculate the margin of error:
   ME = z * SE

5. Calculate the confidence interval:
   - lower bound = x̄ - ME
   - upper bound = x̄ + ME
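The five z-test steps can be collected into one function. This is a minimal sketch assuming SciPy; the function name `z_confidence_interval` and the example inputs (x̄ = 50, σ = 8, n = 64) are hypothetical, chosen only to illustrate the calculation.

```python
import math
from scipy import stats

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided z-interval for a mean when the population sigma is known."""
    # Step 2: critical value from the standard normal distribution
    alpha = 1 - confidence
    z = stats.norm.ppf(1 - alpha / 2)
    # Step 3: standard error
    se = sigma / math.sqrt(n)
    # Step 4: margin of error
    me = z * se
    # Step 5: lower and upper bounds
    return xbar - me, xbar + me

# Hypothetical inputs: xbar = 50, sigma = 8, n = 64, 95% confidence
lower, upper = z_confidence_interval(50, 8, 64)
print(round(lower, 2), round(upper, 2))  # prints 48.04 51.96
```

Using `stats.norm.ppf` in place of the table lookup means any confidence level works; `norm.ppf(0.95)`, `norm.ppf(0.975)` and `norm.ppf(0.995)` reproduce the 1.645, 1.96 and 2.576 values listed above.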


**t-test steps**:

1. Given data:
   - sample mean (x̄)
   - sample standard deviation (s)
   - sample size (n)
   - confidence level

2. Find the critical t-value, t_{α/2, df}, from a t-distribution table (or software) using degrees of freedom df = n - 1, where α = 1 - confidence level.

3. Calculate the standard error:
   SE = s / sqrt(n)

4. Calculate the margin of error:
   ME = t * SE

5. Calculate the confidence interval:
   CI = x̄ ± ME
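The t-test steps translate directly into code. The sketch below assumes SciPy; the function name `t_confidence_interval` is hypothetical. It computes the interval step by step and checks it against SciPy's built-in `stats.t.interval`, using the light-bulb numbers from the first example below (x̄ = 1500, s = 100, n = 100).

```python
import math
from scipy import stats

def t_confidence_interval(xbar, s, n, confidence=0.95):
    """Two-sided t-interval for a mean when sigma is unknown (uses sample s)."""
    df = n - 1                          # degrees of freedom
    alpha = 1 - confidence
    t = stats.t.ppf(1 - alpha / 2, df)  # critical value t_{alpha/2, df}
    se = s / math.sqrt(n)               # standard error
    me = t * se                         # margin of error
    return xbar - me, xbar + me

manual = t_confidence_interval(1500, 100, 100)
builtin = stats.t.interval(0.95, 99, loc=1500, scale=100 / math.sqrt(100))
print(manual)
print(builtin)
```

The two results agree up to floating-point rounding, so `stats.t.interval` is simply a shortcut for steps 2 to 5.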

**Confidence interval examples**

1. A random sample of 100 light bulbs is taken from a large shipment. The sample mean lifespan is found to be 1,500 hours with a standard deviation of 100 hours. Construct a 95% confidence interval for the population mean lifespan of all the light bulbs in the shipment.

import scipy.stats as stats
import math

n = 100
sample_mean = 1500
sample_std = 100
confidence = 0.95
df = n - 1
alpha = 0.05

# sigma is unknown (only the sample standard deviation is given), so use the t-distribution;
# with n = 100 a z-interval would give almost the same result
confidence_interval = stats.t.interval(1 - alpha, df, loc=sample_mean, scale=sample_std / math.sqrt(n))
confidence_interval

(1480.1578304849131, 1519.8421695150869)
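Because n = 100 here, the choice between t and z barely matters. A quick check, assuming SciPy, plugs the sample standard deviation s in for σ in a z-interval (a large-sample approximation, not the known-σ case) and compares it with the t-interval.

```python
import math
from scipy import stats

n, xbar, s = 100, 1500, 100
se = s / math.sqrt(n)

t_ci = stats.t.interval(0.95, n - 1, loc=xbar, scale=se)
# Large-sample approximation: s used in place of the unknown sigma
z_ci = stats.norm.interval(0.95, loc=xbar, scale=se)

width_diff = (t_ci[1] - t_ci[0]) - (z_ci[1] - z_ci[0])
print("t:", t_ci)
print("z:", z_ci)
print("t-interval is wider by:", width_diff)
```

The t-interval is wider by less than one hour out of a roughly 40-hour interval, which is why the z-interval is often accepted as an approximation for large samples.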

2. A survey of 500 adults found that 300 of them approve of the president's job performance. Construct a 90% confidence interval for the population proportion of adults who approve of the president's job performance.

import numpy as np

# 1. given data
n = 500   # sample size (number of adults surveyed)
x = 300   # number of successes (adults who approve)

# 2. sample proportion
p = x / n

# 3. z-critical value for a 90% confidence level
z = 1.645

# 4. standard error of the sample proportion (divide by the sample size n, not by x)
SE = np.sqrt(p * (1 - p) / n)

# 5. margin of error
ME = z * SE

# 6. confidence interval
CI_lower = p - ME
CI_upper = p + ME

print(f"90% CI: ({CI_lower:.4f}, {CI_upper:.4f})")

90% CI: (0.5640, 0.6360)

from statsmodels.stats.proportion import proportion_confint

# same interval using statsmodels' built-in function
x = 300
n = 500

confidence_interval = proportion_confint(count=x, nobs=n, alpha=0.10, method='normal')

print(confidence_interval)

# count  = number of successes (300 approvals)
# nobs   = total sample size (500)
# alpha  = significance level, 1 - 0.90 = 0.10
# method = 'normal' uses the normal approximation (z-distribution)

(0.5639630625889797, 0.6360369374110203)
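The same normal-approximation interval can be written as a small function that also checks whether the approximation is reasonable. This is a sketch assuming SciPy; the function name `proportion_z_interval` is hypothetical, and the "at least 10 successes and 10 failures" check is a common textbook rule of thumb (some texts use 5), not a requirement of statsmodels.

```python
import math
from scipy import stats

def proportion_z_interval(x, n, confidence=0.90):
    """Normal-approximation (Wald) interval for a population proportion."""
    # Rule of thumb for the normal approximation: enough successes and failures
    if x < 10 or n - x < 10:
        raise ValueError("too few successes or failures for the normal approximation")
    p_hat = x / n
    z = stats.norm.ppf(1 - (1 - confidence) / 2)  # exact critical value, about 1.645 for 90%
    se = math.sqrt(p_hat * (1 - p_hat) / n)       # divide by the total sample size n
    return p_hat - z * se, p_hat + z * se

ci = proportion_z_interval(300, 500)
print(ci)
```

Because it uses the exact critical value instead of the rounded 1.645, it matches the `proportion_confint(..., method='normal')` output to many decimal places.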


3. Two different brands of batteries are tested. A random sample of 25 batteries from Brand A has a mean lifespan of 20 hours with a standard deviation of 2 hours. A random sample of 30 batteries from Brand B has a mean lifespan of 22 hours with a standard deviation of 3 hours. Construct a 99% confidence interval for the difference in the population mean lifespans of the two brands of batteries.

import scipy.stats as stats

# Sample sizes
n1 = 25
n2 = 30

# Sample means
x1_bar = 20
x2_bar = 22

# Sample standard deviations
s1 = 2
s2 = 3

# Significance level for a 99% confidence level
alpha = 0.01

# Degrees of freedom for the pooled two-sample t-interval
df = n1 + n2 - 2

# Pooled variance (assumes both brands have equal population variances)
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df

# Pooled standard error of the difference in means
se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5

# Margin of error = t-critical value * standard error (the critical value appears once, not squared)
t_crit = stats.t.ppf(1 - alpha/2, df)
margin_of_error = t_crit * se

# Construct confidence interval
lower_bound = (x1_bar - x2_bar) - margin_of_error
upper_bound = (x1_bar - x2_bar) + margin_of_error

print("99% confidence interval for the difference in population means:", (round(lower_bound, 3), round(upper_bound, 3)))

99% confidence interval for the difference in population means: (-3.878, -0.122)
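A pooled two-sample t-interval can be wrapped in a reusable function so it can be applied to any pair of groups. This is a sketch assuming SciPy and equal population variances; the function name `pooled_t_interval` is hypothetical. The usage line reruns the battery data (Brand A minus Brand B).

```python
import math
from scipy import stats

def pooled_t_interval(mean1, sd1, n1, mean2, sd2, n2, confidence=0.99):
    """CI for mu1 - mu2 assuming equal population variances (pooled two-sample t)."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = stats.t.ppf(1 - (1 - confidence) / 2, df)
    diff = mean1 - mean2
    return diff - t * se, diff + t * se

# Batteries: Brand A (mean 20, sd 2, n 25) vs Brand B (mean 22, sd 3, n 30)
batteries = pooled_t_interval(20, 2, 25, 22, 3, 30)
print(batteries)
```

If the equal-variance assumption looks doubtful, Welch's interval (separate variances with Welch-Satterthwaite degrees of freedom) is the usual alternative.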


4. A random sample of 100 students is taken from a large university. The sample mean GPA is 3.2 with a standard deviation of 0.5. Construct a 95% confidence interval for the population mean GPA.

import scipy.stats as stats
import math

n = 100
sample_mean = 3.2
sample_std = 0.5
confidence = 0.95
alpha = 0.05
df = n - 1

# sigma is unknown, so use the t-distribution with df = n - 1
confidence_interval = stats.t.interval(1 - alpha, df, loc=sample_mean, scale=sample_std / math.sqrt(n))
confidence_interval

(np.float64(3.100789152424566), np.float64(3.2992108475754343))

5. A survey of 500 people finds that 300 of them prefer brand A. Construct a 90% confidence interval for the population proportion preferring brand A.

from statsmodels.stats.proportion import proportion_confint

n = 500
x = 300

confidence_interval = proportion_confint(count=x, nobs=n, alpha=0.10, method='normal')
confidence_interval


(0.5639630625889797, 0.6360369374110203)

6. Two groups of students are given different teaching methods. Group A has 25 students with a sample mean of 85 and a sample standard deviation of 5. Group B has 30 students with a sample mean of 82 and a sample standard deviation of 6. Construct a 99% confidence interval for the difference between the population means.

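import math
import scipy.stats as stats

n1 = 25
n1_mean = 85
n1_std = 5

n2 = 30
n2_mean = 82
n2_std = 6

alpha = 0.01
df = n1 + n2 - 2

# sigma is unknown for both groups, so use a t critical value instead of the z value 2.58
critical_value = stats.t.ppf(1 - alpha/2, df)

# pooled standard error (assumes equal population variances, matching df = n1 + n2 - 2)
sp2 = ((n1 - 1) * n1_std**2 + (n2 - 1) * n2_std**2) / df
se = math.sqrt(sp2 * (1/n1 + 1/n2))

margin_of_error = se * critical_value

lower_bound = (n1_mean - n2_mean) - margin_of_error
upper_bound = (n1_mean - n2_mean) + margin_of_error

print(round(lower_bound, 3), round(upper_bound, 3))

-1.03 7.03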
