
In [1]:

import numpy as np

from scipy import stats

**1. Introduction to the Z-Test**

The Z-test is a parametric statistical test used to determine whether there is a significant difference between:

- a sample mean and a population mean, or
- two sample means,

when the population variance is known (or the sample size is large enough, usually $n \geq 30$).

The test is based on the Z-distribution, which is the standard normal distribution:

$$
Z \sim N(0, 1)
$$

**2. General Idea**

We calculate a Z statistic:

$$
Z = \frac{\text{Observed Statistic} - \text{Expected Value}}{\text{Standard Error}}
$$

and compare it to the critical Z value from the standard normal table or use it to compute a p-value.
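The general recipe above can be sketched in a few lines. This is a minimal illustration using only Python's standard library; the function names and the numbers (52.0, 50.0, 1.0) are hypothetical and chosen only to show both routes, the critical value and the p-value.

```python
from statistics import NormalDist


def z_statistic(observed, expected, standard_error):
    """Standardized distance between an observed statistic and its expected value."""
    return (observed - expected) / standard_error


def two_sided_p_value(z):
    """Probability of a value at least as extreme as |z| under N(0, 1)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical numbers, for illustration only.
z = z_statistic(observed=52.0, expected=50.0, standard_error=1.0)
critical = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-sided test at alpha = 0.05
p_value = two_sided_p_value(z)

print(f"z = {z:.3f}, critical = {critical:.3f}, p = {p_value:.4f}")
print("Reject H0" if abs(z) > critical else "Fail to reject H0")
```

Both routes lead to the same decision: `abs(z) > critical` holds exactly when the two-sided p-value is below 0.05.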

**3. Assumptions**

- Data follows a normal distribution (or $n \geq 30$, so the Central Limit Theorem applies)

- Data is measured on an interval/ratio scale

- Independence of observations

- Population variance ($\sigma^2$) is known (or estimated from large samples)
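The first assumption can be checked with a small simulation. The sketch below is only an illustration: it assumes a skewed Exponential(1) population (mean 1, standard deviation 1), a sample size of 30, and an arbitrary seed, none of which come from this notebook. If the Central Limit Theorem approximation is reasonable, roughly 95% of the standardized sample means should fall within ±1.96.

```python
import random
from math import sqrt

random.seed(0)  # arbitrary seed, for reproducibility only

n, trials = 30, 5000
mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(rate=1)

inside = 0
for _ in range(trials):
    sample_mean = sum(random.expovariate(1.0) for _ in range(n)) / n
    z = (sample_mean - mu) / (sigma / sqrt(n))  # standardize the sample mean
    inside += abs(z) <= 1.96

coverage = inside / trials
print(f"Share of |z| <= 1.96: {coverage:.3f}")
```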

**4. Hypothesis**

$H_0$: No difference (for example, $\mu = \mu_0$)

$H_1$: A significant difference exists (for example, $\mu \neq \mu_0$ for a two-tailed test)

# **1 sample Z Test**

#### **Test Calculation**

$$
Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$

**Data**

In [3]:

pop_mean = 68
pop_std  = 5

data = np.random.normal( 65, 6, 1000)
sample_mean = np.mean(data)
n = len(data)

print( f'Sample Mean: {sample_mean}')
print( f'Sample Size: {n}')

Sample Mean: 64.88599415622453
Sample Size: 1000


**Test Statistic**

In [4]:

z = (sample_mean - pop_mean) / ( pop_std / np.sqrt( n))

print(f'Z Statistic: {z}')

Z Statistic: -19.694702226809927


**Critical Value**

In [7]:

critical = stats.norm.ppf( 1 - .05 / 2)  # two-tailed, to match the two-sided p-value below
print( f'Critical Value: {critical}')

Critical Value: 1.959963984540054


**P value**

In [9]:

p = 2 * ( 1 - stats.norm.cdf( abs(z)))
print( f'P Value: {p}')

P Value: 0.0
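The printed `0.0` is a rounding artifact, not a true zero: for a Z statistic this extreme the CDF rounds to exactly 1.0, so `1 - cdf` loses all precision. A sketch of one way around this, using only the standard library, is to compute the tail directly with the complementary error function, since $2(1 - \Phi(|z|)) = \operatorname{erfc}(|z| / \sqrt{2})$. With SciPy, `2 * stats.norm.sf(abs(z))` serves the same purpose.

```python
from math import erfc, sqrt
from statistics import NormalDist

z = -19.694702226809927  # Z statistic printed above

naive_p = 2 * (1 - NormalDist().cdf(abs(z)))  # the subtraction rounds to 0.0
tail_p = erfc(abs(z) / sqrt(2))               # two-sided tail computed directly

print(f"1 - cdf route: {naive_p}")
print(f"erfc route   : {tail_p:.3e}")
```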


- |z| > critical value
- p < LOS (level of significance, 0.05)
- Reject H0
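The critical value has to match the alternative hypothesis: a two-tailed test splits the significance level across both tails, while a one-tailed test puts all of it in one. The helper below is a hypothetical sketch (the function name and the `alternative` labels are not from this notebook) that makes the choice explicit, applied to the summary values printed above.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_std, n, alpha=0.05, alternative="two-sided"):
    """Return (z, critical value, reject H0?) for a one-sample Z-test."""
    std_normal = NormalDist()
    z = (sample_mean - pop_mean) / (pop_std / sqrt(n))
    if alternative == "two-sided":    # H1: mean != pop_mean
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    elif alternative == "less":       # H1: mean < pop_mean
        critical = std_normal.inv_cdf(alpha)
        reject = z < critical
    elif alternative == "greater":    # H1: mean > pop_mean
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return z, critical, reject


# Summary values printed in the cells above.
z, critical, reject = one_sample_z_test(64.88599415622453, 68, 5, 1000)
print(f"z = {z:.4f}, critical = {critical:.4f}, reject H0: {reject}")
```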

#### **Effect Size**

$$
d = \frac{\bar{X} - \mu_0}{\sigma}
$$

In [10]:

d = ( sample_mean - pop_mean) / pop_std
print( f'Effect Size: {d}')

Effect Size: -0.6228011687550946


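**Moderate** ($|d| \approx 0.62$, between Cohen's conventional thresholds of 0.5 for a medium effect and 0.8 for a large one)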
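Verbal labels for an effect size usually follow Cohen's conventional cut-offs of 0.2, 0.5 and 0.8. The helper below is a hypothetical sketch of that lookup. The cut-offs are rules of thumb rather than strict boundaries, and the function name and the "negligible" label are choices made here for illustration.

```python
def describe_effect_size(value):
    """Map |d| to a label using Cohen's conventional cut-offs (0.2, 0.5, 0.8)."""
    magnitude = abs(value)  # the sign only indicates direction
    if magnitude < 0.2:
        return "negligible"
    if magnitude < 0.5:
        return "small"
    if magnitude < 0.8:
        return "medium"
    return "large"


d = -0.6228011687550946  # effect size printed above
print(f"|d| = {abs(d):.2f} -> {describe_effect_size(d)}")
```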

# **2 Sample Z Test**

#### **Test Calculation**


$$
Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}
$$

**Data**

In [11]:

sample_1 = np.random.normal( 65, 7, 1000)
sample_2 = np.random.normal( 68, 6, 1000)

sample_mean_1 = np.mean(sample_1)
sample_mean_2 = np.mean(sample_2)

sample_var_1 = np.var(sample_1)
sample_var_2 = np.var(sample_2)

n_1 = len(sample_1)
n_2 = len(sample_2)

print( f'Sample 1 Mean: {sample_mean_1}')
print( f'Sample 2 Mean: {sample_mean_2}\n')

print( f'Sample 1 Variance: {sample_var_1}')
print( f'Sample 2 Variance: {sample_var_2}\n')

print( f'Sample 1 Size: {n_1}')
print( f'Sample 2 Size: {n_2}')





Sample 1 Mean: 65.01262314746077
Sample 2 Mean: 67.81066934344271

Sample 1 Variance: 51.84357560656173
Sample 2 Variance: 37.39132238691258

Sample 1 Size: 1000
Sample 2 Size: 1000
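One detail about the variances above: `np.var` divides by $n$ by default (`ddof=0`), whereas the usual sample variance divides by $n - 1$ (`np.var(x, ddof=1)`). With 1000 observations the two differ only by a factor of $n / (n - 1) \approx 1.001$, so the Z statistic is barely affected, but the gap matters for small samples. A minimal sketch with made-up measurements, using the standard library equivalents:

```python
from statistics import pvariance, variance

values = [61.0, 64.0, 66.0, 69.0, 70.0]  # hypothetical measurements
n = len(values)

population_style = pvariance(values)  # divides by n, like np.var(values)
sample_style = variance(values)       # divides by n - 1, like np.var(values, ddof=1)

print(f"Divide by n    : {population_style}")
print(f"Divide by n - 1: {sample_style}")
print(f"Ratio: {sample_style / population_style:.3f} (n / (n - 1) = {n / (n - 1):.3f})")
```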


**Test Statistic**

In [12]:

z = ( sample_mean_1 - sample_mean_2) / np.sqrt( sample_var_1 / n_1 + sample_var_2 / n_2)

print( f'Z Statistic: {z}')

Z Statistic: -9.366719496614918


**Critical Value**

In [30]:

critical = stats.norm.ppf( 1 - .05/2)
print( f'Critical Value: {critical}')

Critical Value: 1.959963984540054


**P value**

In [14]:

p = 2 * ( 1 - stats.norm.cdf( abs(z)))
print( f'P Value: {p}')

P Value: 0.0


- |z| > critical value
- p < LOS ( 0.05 )
- Reject H0

#### **Effect Size**

$$
d = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2 + \sigma_2^2}{2}}} $$

In [15]:

d = ( sample_mean_1 - sample_mean_2) / np.sqrt( ( sample_var_1 + sample_var_2) / 2)
print( f'Effect Size: {d}')


Effect Size: -0.4188924304120713


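**Small to moderate** ($|d| \approx 0.42$, just below Cohen's conventional threshold of 0.5 for a medium effect)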

# **Proportion 1 Sample Z Test**

#### **Test Calculation**

$$
z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}
$$

**Data**

In [18]:

sample_p = .65
pop_p    = .7
pop_n    = 1 - pop_p  # complement of the hypothesized proportion, q0 = 1 - p0
n        = 1000

print( f'Sample Proportion    : {sample_p}')
print( f'Population Proportion: {pop_p}')
print( f'Sample Size          : {n}')

Sample Proportion    : 0.65
Population Proportion: 0.7
Sample Size          : 1000


**Statistic**

In [19]:

z = ( sample_p - pop_p) / np.sqrt( ( pop_p * pop_n) / n)
print( f'Z Statistic: {z}')

Z Statistic: -3.4503277967117665


**Critical Value**

In [29]:

critical = stats.norm.ppf( 1 - .05 / 2)
print( f'Critical Value: {critical}')

Critical Value: 1.959963984540054


**P Value**

In [21]:

p = 2 * ( 1 - stats.norm.cdf( abs(z)))
print( f'P Value: {p}')

P Value: 0.0005599062480061701


- |z| > critical value
- p < LOS
- Reject H0

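**In-Built**

By default, `proportions_ztest` computes the standard error from the sample proportion rather than from the hypothesized proportion, so its Z statistic differs from the manual result above. Passing `prop_var=pop_p` makes it use the hypothesized proportion instead.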

In [25]:
from statsmodels.stats.proportion import proportions_ztest

z, p = proportions_ztest( n * sample_p, n, pop_p)
print( f'Z Statistic: {z}')
print( f'P Value: {p}')

Z Statistic: -3.314967720658975
P Value: 0.0009165370761145487
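The two Z statistics for this test (about -3.4503 from the manual calculation and about -3.3150 from `proportions_ztest`) can both be reproduced by changing which proportion goes into the standard error. The sketch below uses only the standard library and the values from the data cell; the variable names are chosen here for illustration.

```python
from math import sqrt

sample_p, pop_p, n = 0.65, 0.7, 1000  # values from the data cell above

# Standard error under H0: uses the hypothesized proportion p0.
se_null = sqrt(pop_p * (1 - pop_p) / n)
# Standard error from the data: uses the sample proportion.
se_sample = sqrt(sample_p * (1 - sample_p) / n)

z_null = (sample_p - pop_p) / se_null
z_sample = (sample_p - pop_p) / se_sample

print(f"Z with null-based SE  : {z_null}")
print(f"Z with sample-based SE: {z_sample}")
```

Either choice rejects H0 at the 0.05 level here, but the manual formula in this section corresponds to the null-based version.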


#### **Effect Size**

$$h = 2 \cdot \arcsin(\sqrt{\hat{p}}) - 2 \cdot \arcsin(\sqrt{p_0})$$

In [22]:

h = 2 * np.arcsin( np.sqrt( sample_p)) - 2 * np.arcsin( np.sqrt( pop_p))
print( f'Effect Size: {h}')

Effect Size: -0.10682419205209026


**Very small**
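Cohen's h measures the gap between two proportions on an arcsine-transformed scale, so the same raw difference counts for more near 0 or 1 than near 0.5. The function below is a small sketch (its name is hypothetical) that reproduces the value above and then compares two 5-point gaps at different positions.

```python
from math import asin, sqrt


def cohens_h(p1, p2):
    """Difference between two proportions on the arcsine-transformed scale."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


h = cohens_h(0.65, 0.70)  # proportions from this section
print(f"h = {h:.4f}")

# The same 0.05 gap, once near the middle and once near the boundary.
h_middle = cohens_h(0.45, 0.50)
h_edge = cohens_h(0.90, 0.95)
print(f"near 0.5: {h_middle:.4f}, near 1: {h_edge:.4f}")
```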

# **Proportion 2 Sample Z Test**

#### **Test Calculation**

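$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where $\hat{p}$ is the pooled proportion:

$$\hat{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$$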

**Data**

In [26]:

sample_1_p = .65
sample_2_p = .7

n_1 = 1000
n_2 = 1000

p = ( sample_1_p * n_1 + sample_2_p * n_2) / ( n_1 + n_2)
q = 1 - p

print( f'Sample 1 Proportion: {sample_1_p}')
print( f'Sample 2 Proportion: {sample_2_p}\n')

print( f'Sample 1 Size: {n_1}')
print( f'Sample 2 Size: {n_2}\n')

print( f'p: {p}')
print( f'q: {q}')


Sample 1 Proportion: 0.65
Sample 2 Proportion: 0.7

Sample 1 Size: 1000
Sample 2 Size: 1000

p: 0.675
q: 0.32499999999999996


**Statistic**

In [27]:

z = ( sample_1_p - sample_2_p) / np.sqrt( p * q * ( 1 / n_1 + 1 / n_2))
print( f'Z Statistic: {z}')


Z Statistic: -2.38704958013144


**Critical Value**

In [28]:

critical = stats.norm.ppf( 1 - .05/2)
print( f'Critical Value: {critical}')

Critical Value: 1.959963984540054


**P Value**

In [31]:

p = 2 * ( 1 - stats.norm.cdf( abs(z)))
print( f'P Value: {p}')

P Value: 0.016984200582130793


- |z| > critical value
- p < LOS
- Reject H0

**In-Built**

In [33]:

z, p = proportions_ztest( [n_1 * sample_1_p, n_2 * sample_2_p], [n_1, n_2])
print( f'Z Statistic: {z}')
print( f'P Value: {p}')

Z Statistic: -2.38704958013144
P Value: 0.01698420058213072


#### **Effect Size**

$$h = 2 \cdot \arcsin(\sqrt{\hat{p}_1}) - 2 \cdot \arcsin(\sqrt{\hat{p}_2})$$

In [32]:

h = 2 * np.arcsin( np.sqrt( sample_1_p)) - 2 * np.arcsin( np.sqrt( sample_2_p))
print( f'Effect Size: {h}')

Effect Size: -0.10682419205209026


**Very small**