In [1]:
import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
from scipy.stats import norm, shapiro
from statsmodels.stats.weightstats import ztest
from statsmodels.stats.proportion import proportions_ztest

In [3]:
import warnings
warnings.filterwarnings("ignore")

# z-score

**Hypothesis testing** with a normal distribution determines if a sample mean significantly differs from a known population mean, assuming the population is normally distributed with a known variance. 

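It involves defining null ($H_{0}$) and alternative ($H_{1}$) hypotheses, calculating a test statistic (the $z$-score of the sample mean $\bar{X}$), and comparing that statistic to a critical value set by the significance level ($\alpha$), or equivalently comparing its $p$-value to $\alpha$, to decide whether to reject or fail to reject the null hypothesis.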

## One-Sample Z-Test

The **one-sample** $z$-**test** (single-sample $z$-test) compares the sample mean $\bar{X}$ with a specific hypothesized value $\mu_0$ (the known mean of the population).

It checks whether the sample is consistent with a population whose mean ($\mu_0$) and standard deviation ($\sigma$) are both known.

The formula for a **one-sample** **$z$-score** is:

\begin{equation}\tag{z-test}
z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}
\end{equation}

where:
* $\bar{X}$ is the sample mean
* $\mu _{0}$ is the hypothesized population mean (from the null hypothesis)
* $\sigma$ is the known population standard deviation
* $n$ is the sample size
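
As a quick worked example of the formula, the sketch below plugs in made-up summary statistics. The values of `x_bar`, `mu_0`, `sigma` and `n` are hypothetical and chosen only so the arithmetic is easy to follow; they do not come from the notebook's data.

```python
import math
from scipy.stats import norm

# Hypothetical summary statistics (illustrative only)
x_bar = 52.0   # sample mean
mu_0 = 50.0    # hypothesized population mean under H0
sigma = 10.0   # known population standard deviation
n = 100        # sample size

standard_error = sigma / math.sqrt(n)   # sigma / sqrt(n)
z = (x_bar - mu_0) / standard_error     # one-sample z-score
p_two_sided = 2 * norm.sf(abs(z))       # two-tailed p-value

print(f"z = {z:.4f}, two-sided p-value = {p_two_sided:.4f}")
```

Here the sample mean lies two standard errors above $\mu_0$, which is just beyond the usual two-tailed cutoff of $\pm1.96$.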

The **null hypothesis** ($H_{0}$) in a **$Z$-test** is a statement of **no effect or no difference**. 
It typically asserts that the population mean ($\mu$) is equal to a specific hypothesized value ($\mu_{0}$). 
The purpose of the **$Z$-test** and its associated **$z$-score** is to determine if there is enough statistical evidence to reject this default assumption.

The **$z$-score** measures how far the sample mean ($\bar{X}$) deviates from the mean ($\mu_0$) specified in the **null hypothesis**, in units of standard errors. A very large positive or negative **$z$-score** suggests the observed data would be highly unlikely if the **null hypothesis** were true, which leads to its rejection.

The general procedure for a **Z-test** involves a few key steps: 

1. **State the Hypotheses**: Define the null hypothesis ($H_{0}$, e.g., $\mu =\mu_{0}$) and the alternative hypothesis ($H_{1}$, e.g., $\mu\ne\mu_{0}$, $\mu>\mu_{0}$, or $\mu<\mu_{0}$).

2. **Select a Significance Level** ($\alpha$): This is typically set to 0.05, representing a 5% risk of rejecting the null hypothesis when it is actually true (a Type I error).

3. **Calculate the $Z$-score**: Use the formula (z-test) to compute the test statistic from your sample data.

4. **Make a Decision**: Compare the calculated **$z$-score** to a **critical value** or use the **$p$-value method**.
    * **Critical Value Method**: If the calculated **$z$-score** falls into the "rejection region" (beyond the critical values, e.g., outside $\pm1.96$ for a two-tailed test at $\alpha=0.05$), you reject the null hypothesis.
    * **$p$-value Method**: The **$p$-value** is the probability of observing a **$z$-score** at least as extreme as the one calculated, assuming the null hypothesis is true. If the **$p$-value** is less than the significance level ($\alpha$), you reject the null hypothesis.
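
The decision step can be sketched as a small helper that applies both methods to a given $z$-score. The function name `z_test_decision` and the labels `"two-sided"`, `"larger"` and `"smaller"` are choices made for this illustration, not part of any library.

```python
from scipy.stats import norm

def z_test_decision(z, alpha=0.05, alternative="two-sided"):
    """Apply the critical-value and p-value methods to a z-score."""
    if alternative == "two-sided":
        critical = norm.ppf(1 - alpha / 2)      # e.g. 1.96 for alpha = 0.05
        p_value = 2 * norm.sf(abs(z))
        in_rejection_region = abs(z) > critical
    elif alternative == "larger":               # H1: mu > mu_0
        critical = norm.ppf(1 - alpha)
        p_value = norm.sf(z)
        in_rejection_region = z > critical
    elif alternative == "smaller":              # H1: mu < mu_0
        critical = norm.ppf(alpha)
        p_value = norm.cdf(z)
        in_rejection_region = z < critical
    else:
        raise ValueError("alternative must be 'two-sided', 'larger' or 'smaller'")
    return {
        "critical": critical,
        "p_value": p_value,
        "reject_by_critical": bool(in_rejection_region),
        "reject_by_p_value": bool(p_value < alpha),
    }

result = z_test_decision(2.0, alpha=0.05, alternative="two-sided")
print(result)
```

Both methods lead to the same decision for a given test, because the critical value and the $p$-value are derived from the same standard normal distribution.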

Assumptions:
- The dependent variable should be approximately normally distributed (this can be checked with the Shapiro-Wilk test)
- Population standard deviation $\sigma$ should be known
- Observations are independent of each other and randomly drawn from a population
- The sample size should be large ($n \geq 30$)

**$Z$-scores** transform normally distributed data into the standard normal distribution which is a special bell curve with $\mu=0$ and $\sigma=1$. 
This transformation unlocks some useful analytical capabilities: approximately 68% of values fall within one standard deviation of the mean (**$z$-scores** between -1 and +1), 95% fall within two standard deviations (-2 to +2), and 99.7% fall within three standard deviations (-3 to +3).
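
These percentages can be checked against the standard normal CDF. The snippet below is a minimal sketch using `scipy.stats.norm`; the `coverage` dictionary is just a convenience name for this example.

```python
from scipy.stats import norm

# Probability mass of N(0, 1) within k standard deviations of the mean
coverage = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}

for k, prob in coverage.items():
    print(f"P(-{k} <= Z <= {k}) = {prob:.4f}")
```

The exact 95% interval is slightly narrower than two standard deviations, which is why the two-tailed critical value at $\alpha=0.05$ is $\pm1.96$ rather than $\pm2$.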

In [4]:
random_variates = norm.rvs(loc=0, scale=1, size=10000)

In [5]:
statistic, p_value = shapiro(random_variates)

print(f'Statistic: {statistic:.4f}')
print(f'P-value: {p_value:.4f}')

Statistic: 0.9996
P-value: 0.0328


In [6]:
# Interpretation
alpha = 0.05
if p_value > alpha:
    print("Data looks Gaussian (fail to reject null hypothesis)")
else:
    print("Data does not look Gaussian (reject null hypothesis)")

Data does not look Gaussian (reject null hypothesis)

Note that these variates were drawn from $\mathcal{N}(0, 1)$, so this rejection is a false positive. At $\alpha = 0.05$ about 5% of truly normal samples are rejected, and the SciPy documentation cautions that the Shapiro-Wilk $p$-value may not be accurate for samples larger than 5000.


In [7]:
# Perform the z-test
z_statistic, p_value = ztest(random_variates, value=0)

print(f"Z-Statistic: {z_statistic:.4f}")
print(f"P-Value: {p_value:.4f}")

Z-Statistic: 0.0642
P-Value: 0.9488
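
One detail worth knowing: `ztest` takes no argument for a known $\sigma$, and it appears to estimate the standard deviation from the sample (with `ddof=1` by default). The sketch below, which uses its own seeded sample rather than the `random_variates` above, compares a hand-computed statistic with a known $\sigma$ against the sample-based one.

```python
import numpy as np
from statsmodels.stats.weightstats import ztest

rng = np.random.default_rng(0)                 # seeded for reproducibility
sample = rng.normal(loc=0.0, scale=1.0, size=1000)
mu_0 = 0.0                                     # hypothesized mean
sigma_known = 1.0                              # known population std (by construction)
n = sample.size

# z-score using the known population standard deviation
z_known_sigma = (sample.mean() - mu_0) / (sigma_known / np.sqrt(n))

# z-score using the sample standard deviation (ddof=1)
z_sample_std = (sample.mean() - mu_0) / (sample.std(ddof=1) / np.sqrt(n))

# statsmodels result, expected to agree with the sample-based version
z_statsmodels, _ = ztest(sample, value=mu_0)

print(f"known sigma:  {z_known_sigma:.4f}")
print(f"sample std:   {z_sample_std:.4f}")
print(f"statsmodels:  {z_statsmodels:.4f}")
```

For large samples the two versions are close, since the sample standard deviation is a good estimate of $\sigma$.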


## Two-Sample $Z$-Test

The **two-sample** (unpaired or independent) $z$-**test** calculates if the means of two independent groups are equal or significantly different from each other. 

\begin{equation}\tag{2-z-test}
Z = \frac{(\bar{X_1}-\bar{X_2})-(\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}
\end{equation}

**Null hypothesis**: the means are equal, $\mathrm{H}_{0}:\mu _{1}=\mu _{2}$, so the term $(\mu_1-\mu_2)$ in the formula is 0.

Unlike the $t$-test, the $z$-test is used when the population standard deviations ($\sigma_1$ and $\sigma_2$) are known.

> If the sample sizes are large ($n\geq30$) and the population standard deviations ($\sigma_1$ and $\sigma_2$) are unknown, they can be estimated from the sample standard deviations.

Assumptions:
- The dependent variable in each sample should be approximately normally distributed (this can be checked with the Shapiro-Wilk test)
- Population standard deviations ($\sigma_1$ and $\sigma_2$) should be known
- Observations are independent of each other and randomly drawn from a population
- The sample sizes should be large ($n\geq30$)
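
When $\sigma_1$ and $\sigma_2$ really are known, formula (2-z-test) can be evaluated directly. The following is a sketch under that assumption, using a seeded generator and the same population parameters as the simulation in this section; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)      # seeded for reproducibility
mu1, mu2 = 4, 8                      # true means (used only to simulate data)
sigma1, sigma2 = 2, 1                # known population standard deviations
n1, n2 = 1000, 1000

sample1 = rng.normal(mu1, sigma1, n1)
sample2 = rng.normal(mu2, sigma2, n2)

hypothesized_diff = 0.0              # mu1 - mu2 under H0
standard_error = np.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
z = ((sample1.mean() - sample2.mean()) - hypothesized_diff) / standard_error
p_value = 2 * norm.sf(abs(z))        # two-tailed p-value

print(f"Z-Statistic: {z:.4f}")
print(f"P-Value: {p_value:.4g}")
```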

In [8]:
mu1, mu2 = 4, 8
sigma1, sigma2 = 2, 1
random_variates1 = norm.rvs(loc=mu1, scale=sigma1, size=1000)
random_variates2 = norm.rvs(loc=mu2, scale=sigma2, size=1000)

In [9]:
# Perform Shapiro-Wilk Test
statistic, p_value = shapiro(random_variates1)

print(f'Statistic: {statistic:.4f}')
print(f'P-value: {p_value:.4f}')

Statistic: 0.9984
P-value: 0.4679


In [10]:
# Perform the two-sample z-test
z_statistic, p_value = ztest(x1=random_variates1, x2=random_variates2, 
                             value=0, alternative='two-sided',
                             usevar='unequal', ddof=1.0)

print(f"Z-Statistic: {z_statistic:.4f}")
print(f"P-Value: {p_value:.4f}")

Z-Statistic: -54.1155
P-Value: 0.0000


In [11]:
if p_value < alpha:
    print("Reject the null hypothesis: The means of the two groups are significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the means of the two groups.")

Reject the null hypothesis: The means of the two groups are significantly different.


## Paired $z$-test (dependent $z$-test)

The **paired** $z$-**test** checks whether the mean difference between two paired (dependent) samples, such as measurements taken on the same subjects before and after a treatment, differs from zero.

It is equivalent to a one-sample $z$-test applied to the pairwise differences $\delta$.

Assumptions:
- The differences $\delta$ between the paired measurements are approximately normally distributed (this can be checked with the Shapiro-Wilk test)
- Each subject (or unit) contributes one pair of measurements
- Pairs are sampled independently of each other
- The population standard deviation of the differences ($\sigma_{\delta}$) should be known
- The sample size should be large ($n\geq30$)
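
If $\sigma_{\delta}$ is known, the paired statistic is just the one-sample formula applied to the differences. The sketch below assumes, purely for illustration, that the two measurements are independent with standard deviation 2 each, so that $\sigma_{\delta}=\sqrt{2^2+2^2}$; real paired data are usually correlated, and $\sigma_{\delta}$ would have to come from elsewhere.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)           # seeded for reproducibility
n = 100
before = rng.normal(3, 2, n)             # simulated "before" measurements
after = rng.normal(5, 2, n)              # simulated "after" measurements
differences = after - before

# Assumed known std of the differences (independent draws, std 2 each)
sigma_delta = np.sqrt(2**2 + 2**2)

mean_diff_null = 0.0                     # mean difference under H0
z = (differences.mean() - mean_diff_null) / (sigma_delta / np.sqrt(n))
p_value = 2 * norm.sf(abs(z))            # two-tailed p-value

print(f"Z-statistic: {z:.4f}")
print(f"P-value: {p_value:.4g}")
```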

In [12]:
data = pd.DataFrame({
    'before': norm.rvs(loc=3, scale=2, size=100),
    'after': norm.rvs(loc=5, scale=2, size=100)
})

differences = data['after'] - data['before']

In [13]:
# Perform the Z-test
# value=0 is the mean difference under the null hypothesis
# The function returns the z-statistic and the p-value
z_statistic, p_value = ztest(differences, value=0) 

print(f"Z-statistic: {z_statistic}")
print(f"P-value: {p_value}")

Z-statistic: 6.8027271859429
P-value: 1.0265694257189186e-11


In [14]:
# Interpretation (for a significance level of 0.05)
if p_value < 0.05:
    print("Reject the null hypothesis (significant difference found).")
else:
    print("Fail to reject the null hypothesis (no significant difference found).")

Reject the null hypothesis (significant difference found).


## One-Proportion $z$-test

The normal distribution can be used for a $z$-test for a proportion because, under certain conditions, the sampling distribution of the sample proportion approximates a normal distribution. 

This is based on the **Central Limit Theorem**, which states that as the sample size becomes large, the distribution of the sample proportion becomes approximately normal, even if the population distribution is not.

\begin{equation}\tag{z-test-proportion}
z=\frac{\hat{p}-p}{\sqrt{\frac{pq}{n}}}
\end{equation}

where:
* $\hat{p}$ is the sample proportion
* $p$ is the hypothesized population proportion (from the null hypothesis)
* $q = 1 - p$
* $n$ is the sample size

Assumptions:
- **Independence**: Observations must be independent of each other.
- **Normality**: Sample size is large enough for the normal approximation to apply (a common rule of thumb is $np \geq 10$ and $n(1-p) \geq 10$).
- **Binary Data**: The outcome is categorical (e.g., yes/no, success/failure). 

In [15]:
# Sample data: 60 heads out of 100 coin tosses
count = 60      # number of successes
nobs = 100      # number of trials
value = 0.5     # hypothesized population proportion

# Perform the z-test (two-sided alternative hypothesis by default)
# Returns the z-statistic and the p-value
stat, pval = proportions_ztest(count, nobs, value)

print(f"Z-statistic: {stat:.3f}")
print(f"P-value: {pval:.3f}")

Z-statistic: 2.041
P-value: 0.041
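
The statistic above (2.041) differs slightly from what the (z-test-proportion) formula gives for these numbers, $(0.6-0.5)/\sqrt{0.5\cdot0.5/100}=2.0$. As far as I can tell, this is because `proportions_ztest` by default uses the sample proportion $\hat{p}$ in the standard error; its `prop_var` argument lets you supply the null proportion instead. A short sketch comparing the two:

```python
from statsmodels.stats.proportion import proportions_ztest

count, nobs, p_null = 60, 100, 0.5

# Default: variance based on the sample proportion (0.6)
z_default, p_default = proportions_ztest(count, nobs, value=p_null)

# Variance based on the hypothesized proportion (0.5), matching the formula
z_null_var, p_null_var = proportions_ztest(count, nobs, value=p_null,
                                           prop_var=p_null)

print(f"sample-proportion variance: z = {z_default:.3f}, p = {p_default:.3f}")
print(f"null-proportion variance:   z = {z_null_var:.3f}, p = {p_null_var:.3f}")
```

In this example both versions fall below $\alpha=0.05$, but for borderline cases the choice of variance can change the decision.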


In [16]:
# Conclusion based on significance level (e.g., alpha = 0.05)
alpha = 0.05
if pval < alpha:
    print("Reject the null hypothesis (there is a significant difference from 0.5)")
else:
    print("Fail to reject the null hypothesis (no significant evidence the proportion is different from 0.5)")

Reject the null hypothesis (there is a significant difference from 0.5)


## Two-Proportion Z-Test

The **two-proportion** $z$-**test** is used to test hypotheses about the difference between two population proportions, commonly in A/B testing scenarios. The null hypothesis is that the two proportions are equal, $H_{0}: p_1 = p_2$.

In [17]:
# Sample data for two groups
# Group A: 100 conversions out of 1000 visitors
count_A = 100
nobs_A = 1000

# Group B: 150 conversions out of 1100 visitors
count_B = 150
nobs_B = 1100

counts = np.array([count_A, count_B])
nobs = np.array([nobs_A, nobs_B])

In [18]:
# Perform the two-proportion z-test (null hypothesis is that proportions are equal)
stat, pval = proportions_ztest(counts, nobs)

print(f"Z-statistic: {stat:.3f}")
print(f"P-value: {pval:.3f}")

Z-statistic: -2.570
P-value: 0.010
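
The same statistic can be reproduced by hand. The sketch below assumes the usual pooled-proportion form of the test, in which both groups share one estimate of $p$ under the null hypothesis; this appears to be what `proportions_ztest` does by default, since it matches the output above.

```python
import numpy as np
from scipy.stats import norm

count_A, nobs_A = 100, 1000     # Group A: conversions, visitors
count_B, nobs_B = 150, 1100     # Group B: conversions, visitors

p_A = count_A / nobs_A
p_B = count_B / nobs_B

# Pooled proportion under H0: p_A == p_B
p_pooled = (count_A + count_B) / (nobs_A + nobs_B)
standard_error = np.sqrt(p_pooled * (1 - p_pooled) * (1 / nobs_A + 1 / nobs_B))

z = (p_A - p_B) / standard_error
p_value = 2 * norm.sf(abs(z))   # two-tailed p-value

print(f"Z-statistic: {z:.3f}")
print(f"P-value: {p_value:.3f}")
```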


In [19]:
# Conclusion based on significance level (e.g., alpha = 0.05)
alpha = 0.05
if pval < alpha:
    print("Reject the null hypothesis (there is a significant difference between proportions)")
else:
    print("Fail to reject the null hypothesis (no significant evidence of a difference)")

Reject the null hypothesis (there is a significant difference between proportions)


## References

- [Understanding Z-Score with NumPy](https://medium.com/@whyamit101/understanding-z-score-with-numpy-bc8b23f81639)
- [Z-test : Formula, Types, Examples](https://www.geeksforgeeks.org/data-science/z-test/)
- [How to Perform A/B Testing with Hypothesis Testing in Python: A Comprehensive Guide](https://towardsdatascience.com/how-to-perform-a-b-testing-with-hypothesis-testing-in-python-a-comprehensive-guide-17b555928c7e/)