# Hypothesis Testing

```python
from scipy import stats
from statsmodels.stats.weightstats import ztest
import math
import numpy as np
```

## Z-test

### **Note**


**Z-Test Quick Revision Note**

**Definition**:  

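A Z-test is a statistical test of whether a sample mean differs significantly from a hypothesized population mean, or whether the means of two groups differ. It applies when the population standard deviation \( \sigma \) is known, or when the sample is large (typically \( n > 30 \)) so that the sample standard deviation is a reliable estimate of \( \sigma \). Under the null hypothesis the test statistic follows (at least approximately) a standard normal distribution, which is what the p-value is computed from.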

**Equation**:

For a one-sample Z-test, the statistic is:

\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]

where:
- \( \bar{X} \) = sample mean,
- \( \mu \) = population mean (the value stated in \( H_0 \)),
- \( \sigma \) = population standard deviation,
- \( n \) = sample size.
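As a minimal sketch of the formula, assuming \( \sigma \) is known and the raw observations are available, the hypothetical helper `one_sample_z` below computes \( Z \) and its two-tailed p-value. The data values and \( \sigma = 2 \) are made up for illustration.

```python
import math
from scipy import stats

def one_sample_z(data, mu, sigma):
    """Hypothetical helper: one-sample Z-test with a known population SD sigma.

    Returns the Z statistic and its two-tailed p-value.
    """
    n = len(data)
    x_bar = sum(data) / n                      # sample mean
    z = (x_bar - mu) / (sigma / math.sqrt(n))  # Z = (x_bar - mu) / (sigma / sqrt(n))
    p = 2 * stats.norm.sf(abs(z))              # two-tailed p-value from the normal upper tail
    return z, p

# Made-up measurements; sigma = 2 is assumed known
data = [51.0, 49.5, 52.3, 50.8, 48.9, 51.7, 50.2, 52.0, 49.8, 51.1]
z, p = one_sample_z(data, mu=50.0, sigma=2.0)
print(f"Z = {z:.3f}, p = {p:.3f}")
```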


**Uses**:
1. **Testing Population Mean**: Used when comparing a sample mean to a known population mean.
2. **Comparing Two Means**: Used to test whether two independent population means differ when the population variances are known (or both samples are large).
3. **Large Samples**: Effective for large sample sizes, as it relies on the Central Limit Theorem for normal approximation.
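Item 3 leans on the Central Limit Theorem. A rough simulation can illustrate it: draw many samples of size 50 from a skewed exponential distribution whose true mean and standard deviation are both 1, run a Z-test of the true mean on each, and count how often \( H_0 \) is (wrongly) rejected at \( \alpha = 0.05 \). If the normal approximation holds, the rejection rate should land near 5%. The distribution, sample size, seed, and number of repetitions are arbitrary choices for this sketch.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n, reps, alpha = 50, 2000, 0.05
mu, sigma = 1.0, 1.0  # exponential(scale=1) has mean 1 and SD 1

# Each row is one sample of size n from a skewed, non-normal distribution
samples = rng.exponential(scale=1.0, size=(reps, n))
z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))
p = 2 * stats.norm.sf(np.abs(z))  # two-tailed p-value for every sample

# H0 is true here, so every rejection is a Type I error
rejection_rate = np.mean(p < alpha)
print(f"Type I error rate: {rejection_rate:.3f} (nominal {alpha})")
```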

### **Practical**

One-sample Z-test (two-tailed test)

**Problem** <br> Suppose a factory claims that the average weight of a pack of chips is 200 grams. A customer suspects that the actual average weight might be different. To test this, the customer randomly selects 50 packs and finds that their average weight is 198 grams with a standard deviation of 10 grams. Because \( n = 50 \) is large, this sample standard deviation is used in place of \( \sigma \). <br>


State the Hypotheses:

Null hypothesis (𝐻0): The average weight is 200 grams (𝜇=200). <br>
Alternative hypothesis (𝐻1): The average weight is not 200 grams (𝜇≠200).
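The same decision can also be made with a critical value instead of a p-value: for a two-tailed test at \( \alpha = 0.05 \), reject \( H_0 \) when \( |Z| > z_{\alpha/2} \), where \( z_{\alpha/2} \) comes from the inverse normal CDF (`stats.norm.ppf`). A minimal sketch using the numbers from this problem:

```python
import math
from scipy import stats

alpha = 0.05
z = (198 - 200) / (10 / math.sqrt(50))  # Z statistic from the problem statement
z_crit = stats.norm.ppf(1 - alpha / 2)  # upper critical value for a two-tailed test
reject = abs(z) > z_crit
print(f"|Z| = {abs(z):.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
# prints: |Z| = 1.414, critical value = 1.960, reject H0: False
```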

```python
sample_mean = 198
population_mean = 200
sample_std_dev = 10  # sample SD, used in place of sigma because n is large
sample_size = 50
alpha = 0.05

# Z = (x_bar - mu) / (sigma / sqrt(n))
z_value = (sample_mean - population_mean) / (sample_std_dev / math.sqrt(sample_size))
print(f"Z value is: {z_value}")
# Two-tailed p-value: probability of a |Z| at least this large under H0
p_value = 2 * (1 - stats.norm.cdf(abs(z_value)))
print(f"P value is: {p_value}")

if p_value < alpha:
    print("Reject the null hypothesis: the average weight differs significantly from 200 g.")
else:
    print("Fail to reject the null hypothesis: no significant evidence that the average weight differs from 200 g.")
```

```
Z value is: -1.4142135623730951
P value is: 0.157299207050285
Fail to reject the null hypothesis: no significant evidence that the average weight differs from 200 g.
```


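Using the statsmodels `ztest` function

`ztest` works on raw observations rather than summary statistics, so the cell below simulates 50 pack weights from a normal distribution with mean 198 g and standard deviation 10 g. Because the data are random, the statistic and p-value change on every run and need not match the hand calculation above.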

```python
# Simulate 50 pack weights (random, so results vary between runs)
random_values = np.random.normal(loc=sample_mean, scale=sample_std_dev, size=50)
z_statistic, p_value = ztest(random_values, value=population_mean)
z_statistic, p_value
```

```
(-4.985549487781714, 6.178596159878033e-07)
```
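`ztest` estimates the standard error from the data itself, using the sample standard deviation with `ddof=1`. A small check, with a fixed seed and simulated data chosen only for illustration, suggests the hand formula and `ztest` agree when both are given the same observations:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.weightstats import ztest

rng = np.random.default_rng(42)  # fixed seed for reproducibility
x = rng.normal(loc=198, scale=10, size=50)  # simulated pack weights
mu0 = 200

# Hand calculation: sample SD (ddof=1) in place of sigma
z_manual = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(len(x)))
p_manual = 2 * stats.norm.sf(abs(z_manual))

# statsmodels on the same data
z_sm, p_sm = ztest(x, value=mu0)
print(z_manual, z_sm)
print(p_manual, p_sm)
```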

Two-sample Z-test

**Problem**

 
Suppose you work for a company that manufactures two different types of light bulbs. The company wants to compare the average lifespans of the two light bulbs.
<br> You take two independent random samples:

Sample 1: 100 bulbs of type A, with an average lifespan of 1500 hours and a standard deviation of 300 hours. <br>
Sample 2: 120 bulbs of type B, with an average lifespan of 1450 hours and a standard deviation of 320 hours. <br>
You want to test at a 5% significance level (α=0.05) whether there is a significant difference between the lifespans of the two types of bulbs.

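H0: The two bulb types have the same mean lifespan (\( \mu_1 = \mu_2 \)). <br>
H1: The mean lifespans differ (\( \mu_1 \neq \mu_2 \)). <br>
The test statistic is \( Z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \), with the sample standard deviations standing in for \( \sigma_1 \) and \( \sigma_2 \) because both samples are large. It is a two-tailed test, so \( p = 2\,(1 - \Phi(|Z|)) \), where \( \Phi \) is the standard normal CDF.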
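The two-tailed formula \( p = 2\,(1 - \Phi(|Z|)) \), with \( \Phi \) the standard normal CDF, is correct, but `1 - stats.norm.cdf(...)` loses precision for large \( |Z| \) because the CDF rounds to exactly 1.0. SciPy's survival function `stats.norm.sf` computes the upper tail directly. A short comparison; the helper names `two_tailed_p_cdf` and `two_tailed_p_sf` are hypothetical:

```python
from scipy import stats

def two_tailed_p_cdf(z):
    # 1 - CDF: collapses to 0 once the CDF rounds to 1.0
    return 2 * (1 - stats.norm.cdf(abs(z)))

def two_tailed_p_sf(z):
    # Survival function computes the upper tail directly
    return 2 * stats.norm.sf(abs(z))

for z in (1.194, 10.0):
    print(f"z = {z}: via cdf = {two_tailed_p_cdf(z):.3g}, via sf = {two_tailed_p_sf(z):.3g}")
```

For moderate statistics such as the one in this problem, both give the same answer; the difference only matters far in the tail.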

```python
# Sample 1: type A bulbs
n1 = 100
mean1 = 1500
std1 = 300
# Sample 2: type B bulbs
n2 = 120
mean2 = 1450
std2 = 320
alpha = 0.05
```

```python
# Standard error of the difference uses each sample's own variance (unpooled)
z_value = (mean1 - mean2) / math.sqrt((std1 ** 2) / n1 + (std2 ** 2) / n2)
print(f"Z value is: {z_value}")
p_value = 2 * (1 - stats.norm.cdf(abs(z_value)))
print(f"P value is: {p_value}")
if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

```
Z value is: 1.1940919199575821
P value is: 0.2324420135643246
Fail to reject the null hypothesis
```


```python
# Simulated samples (random, so results vary between runs)
sample1 = np.random.normal(loc=mean1, scale=std1, size=n1)
sample2 = np.random.normal(loc=mean2, scale=std2, size=n2)
```

```python
z_score_two_sample, p_value_two_sample = ztest(sample1, sample2)
z_score_two_sample, p_value_two_sample
```

```
(-1.3296797476747753, 0.1836238103273865)
```
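One difference between the hand calculation and the `ztest` call is worth noting: the hand calculation uses the unpooled standard error \( \sqrt{s_1^2/n_1 + s_2^2/n_2} \), while `ztest(x1, x2)` defaults to `usevar='pooled'`, which combines both samples into a single variance estimate. The sketch below (fixed seed, simulated data for illustration only) reproduces `ztest`'s statistic with the pooled formula and prints the unpooled one alongside; with unequal sample sizes the two are close but generally not identical.

```python
import numpy as np
from statsmodels.stats.weightstats import ztest

rng = np.random.default_rng(1)  # fixed seed for reproducibility
x1 = rng.normal(loc=1500, scale=300, size=100)  # simulated type A lifespans
x2 = rng.normal(loc=1450, scale=320, size=120)  # simulated type B lifespans
n1, n2 = len(x1), len(x2)
s1_sq, s2_sq = x1.var(ddof=1), x2.var(ddof=1)
diff = x1.mean() - x2.mean()

# Pooled variance, matching ztest's default usevar='pooled'
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
z_pooled = diff / np.sqrt(sp_sq * (1 / n1 + 1 / n2))

# Unpooled standard error, as in the hand calculation
z_unpooled = diff / np.sqrt(s1_sq / n1 + s2_sq / n2)

z_sm, p_sm = ztest(x1, x2)
print(f"pooled: {z_pooled:.4f}, unpooled: {z_unpooled:.4f}, ztest: {z_sm:.4f}")
```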

**Paired Z-test**

A paired Z-test is used for comparing the means of two related groups, such as measurements taken on the same subjects before and after an intervention.

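Define the hypotheses, where \( \mu_d \) is the population mean of the differences (before - after): <br>
H0: \( \mu_d = 0 \) (the mean of the differences is zero) <br>
H1: \( \mu_d \neq 0 \) (the mean of the differences is not zero) <br>
The statistic is \( Z = \bar{d} / (s_d / \sqrt{n}) \), where \( \bar{d} \) and \( s_d \) are the mean and sample standard deviation of the \( n \) differences.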

```python
before = np.array([20, 18, 19, 22, 20, 23, 21, 22])
after = np.array([18, 17, 16, 20, 19, 21, 20, 19])

# Work with the per-subject differences (before - after)
differences = before - after
mean_diff = np.mean(differences)
std_diff = np.std(differences, ddof=1)  # sample SD of the differences
n = len(differences)

z_score_paired = mean_diff / (std_diff / np.sqrt(n))

# Calculate the p-value (two-tailed)
p_value_paired = 2 * (1 - stats.norm.cdf(abs(z_score_paired)))

print(f"Paired Z-test Z-score: {z_score_paired}")
print(f"Paired Z-test p-value: {p_value_paired}")
```

```
Paired Z-test Z-score: 6.354889093022426
Paired Z-test p-value: 2.085771555471183e-10
```
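With only 8 pairs and \( \sigma_d \) estimated from the data, the normal reference distribution is optimistic; the T-test section below covers the usual alternative. As a sketch, `stats.ttest_rel` computes the same statistic on the same arrays but takes its p-value from a t distribution with \( n - 1 = 7 \) degrees of freedom, which gives a noticeably larger p-value:

```python
import numpy as np
from scipy import stats

before = np.array([20, 18, 19, 22, 20, 23, 21, 22])
after = np.array([18, 17, 16, 20, 19, 21, 20, 19])

d = before - after
z_paired = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
p_z = 2 * stats.norm.sf(abs(z_paired))  # normal reference distribution

res = stats.ttest_rel(before, after)    # paired t-test, df = n - 1
print(f"statistic: z = {z_paired:.4f}, t = {res.statistic:.4f}")
print(f"p-value:   normal = {p_z:.3g}, t(7) = {res.pvalue:.3g}")
```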


## T-test

**T-Test Quick Revision**

A T-test is a statistical test used to compare the means of one or two groups, helping determine whether there is a statistically significant difference between them. It is commonly used when the sample size is small or the population standard deviation is unknown; the test statistic then follows Student's t distribution rather than the standard normal.

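For a one-sample test:
\[ t = \frac{\bar{X} - \mu}{s / \sqrt{n}} \]
For a two-sample test (pooled variance):
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}, \qquad s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} \]
For a paired-sample test:
\[ t = \frac{\bar{D}}{s_D / \sqrt{n}} \]
Here \( s \) is the sample standard deviation, \( \bar{D} \) and \( s_D \) are the mean and standard deviation of the \( n \) paired differences, and the degrees of freedom are \( n - 1 \), \( n_1 + n_2 - 2 \), and \( n - 1 \) respectively.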
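A minimal sketch checking the one-sample and pooled two-sample formulas against SciPy's `stats.ttest_1samp` and `stats.ttest_ind` (with `equal_var=True`, which selects the pooled-variance version). The data and \( \mu = 12 \) are made up for illustration.

```python
import numpy as np
from scipy import stats

# Made-up data for illustration
x = np.array([12.1, 11.6, 12.8, 12.4, 11.9, 12.6, 12.2])
y = np.array([11.4, 11.9, 11.2, 11.8, 11.5, 11.1])
mu0 = 12.0

# One-sample: t = (x_bar - mu) / (s / sqrt(n))
t_one = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(len(x)))

# Two-sample with pooled variance s_p^2
n1, n2 = len(x), len(y)
sp_sq = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
t_two = (x.mean() - y.mean()) / np.sqrt(sp_sq * (1 / n1 + 1 / n2))

print(t_one, stats.ttest_1samp(x, popmean=mu0).statistic)
print(t_two, stats.ttest_ind(x, y, equal_var=True).statistic)
```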

**Uses of T-Tests** <br>
Hypothesis Testing: Used to test hypotheses about population means.<br>
Small Samples: Suitable for small samples (generally \( n < 30 \)).<br>
Unknown Population Variance: Often used when the population variance is unknown. <br>
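The reason small samples call for the t distribution shows up in its critical values: with few degrees of freedom the tails are heavier, so a larger statistic is needed to reach significance, and as the degrees of freedom grow the t critical value approaches the normal one. A short comparison of two-tailed 5% critical values (the degrees of freedom are arbitrary examples):

```python
from scipy import stats

alpha = 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)  # normal critical value
for df in (5, 10, 29, 100):
    t_crit = stats.t.ppf(1 - alpha / 2, df)  # t critical value for this df
    print(f"df = {df:>3}: t critical = {t_crit:.3f} (normal: {z_crit:.3f})")
```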


**Assumptions**<br>
Normality: Data should follow a normal distribution (especially important for small samples).<br>
Independence: Observations should be independent.<br>
Equal Variances (for the pooled two-sample t-test): the two groups should have similar variances, known as homogeneity of variance; Welch's t-test does not require this assumption.
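The normality and equal-variance assumptions are often checked informally with SciPy's `stats.shapiro` (Shapiro-Wilk test) and `stats.levene` (Levene's test); a small p-value in either test suggests the corresponding assumption is doubtful. When variances look unequal, `stats.ttest_ind(..., equal_var=False)` runs Welch's t-test instead of the pooled version. A sketch on simulated groups, where the seed, group sizes, and parameters are arbitrary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)  # fixed seed for reproducibility
group_a = rng.normal(loc=50, scale=5, size=20)  # simulated group A
group_b = rng.normal(loc=53, scale=5, size=20)  # simulated group B

# Normality of each group (H0: the data come from a normal distribution)
shapiro_a = stats.shapiro(group_a)
shapiro_b = stats.shapiro(group_b)

# Homogeneity of variance (H0: the groups have equal variances)
levene = stats.levene(group_a, group_b)

print(f"Shapiro-Wilk p-values: A = {shapiro_a.pvalue:.3f}, B = {shapiro_b.pvalue:.3f}")
print(f"Levene p-value: {levene.pvalue:.3f}")
```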