In [1]:
import numpy as np
import math

# z-score

In statistics, the standard score or **z-score** is the number of standard deviations by which the value of a raw score (i.e., an observed value $X$ or data point) is above or below the mean value ($\mu$) of what is being observed or measured.

\begin{equation}\tag{0}
z = \frac{X-\mu}{\sigma}
\end{equation}
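As a minimal sketch of equation (0), the snippet below standardizes a single raw score. The values of `x`, `mu`, and `sigma` are illustrative assumptions, not data used elsewhere in this notebook.

```python
# Illustrative (assumed) values: a raw score and the population parameters
x = 130.0      # observed value X
mu = 100.0     # population mean
sigma = 15.0   # population standard deviation

# Equation (0): number of standard deviations between x and the mean
z = (x - mu) / sigma
print(f"z = {z:.2f}")  # z = 2.00
```

A positive result means the observation lies above the mean; a negative result means it lies below.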

From this starting point, variations arise depending on whether you are working with sample data or testing a sample mean ($\bar{X}$) against a population mean. 

The **z-score** serves as the test statistic in a **Z-test**. 
The goal of the test is to determine whether the difference between the observed sample mean ($\bar{X}$) and the population mean ($\mu_{0}$) expected under the **null hypothesis** ($H_{0}$) is statistically significant or could plausibly be due to random chance.

## Formula

The formula for a one-sample **$z$-score** is:

$$\begin{equation}\tag{1}
z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}} 
\end{equation}$$

where:
* $\bar{X}$ is the sample mean
* $\mu _{0}$ is the hypothesized population mean (from the null hypothesis)
* $\sigma$ is the known population standard deviation
* $n$ is the sample size

## Hypothesis statements

The **null hypothesis** ($H_{0}$) in a **$Z$-test** is a statement of **no effect or no difference**. 
It typically asserts that the population mean ($\mu$) is equal to a specific hypothesized value ($\mu_{0}$). 
The purpose of the **$Z$-test** and its associated **$z$-score** is to determine if there is enough statistical evidence to reject this default assumption.

The **$z$-score** measures how far the sample mean ($\bar{X}$) deviates from the mean ($\mu_0$) specified in the **null hypothesis**, in units of the standard error. A very large positive or negative **$z$-score** suggests the observed data are highly unlikely if the **null hypothesis** were true, leading to its rejection.

## Hypothesis Testing

The general procedure for a **Z-test** involves a few key steps:

1. **State the Hypotheses**: Define the null hypothesis ($H_{0}$, e.g., $\mu =\mu_{0}$) and the alternative hypothesis ($H_{a}$, e.g., $\mu\ne\mu_{0}$, $\mu>\mu_{0}$, or $\mu<\mu_{0}$).

2. **Select a Significance Level** ($\alpha$): This is typically set to 0.05, representing a 5% risk of rejecting the null hypothesis when it is actually true.

3. **Calculate the $Z$-score**: Use formula (1) to compute the test statistic from your sample data.

4. **Make a Decision**: Compare the calculated **$z$-score** to a **critical value** or use the **$p$-value method**.
    * **Critical Value Method**: If the calculated **$z$-score** falls into the "rejection region" (beyond the critical values, e.g., outside $\pm1.96$ for a two-tailed test at $\alpha=0.05$), you reject the null hypothesis.
    * **$p$-value Method**: The **$p$-value** is the probability of observing a **$z$-score** at least as extreme as the one calculated, assuming the null hypothesis is true. If the **$p$-value** is less than the significance level ($\alpha$), you reject the null hypothesis.
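The critical value method in step 4 can be sketched with `scipy.stats.norm.ppf`, the inverse of the standard normal CDF. The helper name `reject_null` and its `alternative` labels are assumptions made for this illustration, not part of any library.

```python
from scipy.stats import norm

def reject_null(z, alpha=0.05, alternative="two-sided"):
    """Return True if z falls in the rejection region (hypothetical helper)."""
    if alternative == "two-sided":
        # Split alpha across both tails
        return bool(abs(z) > norm.ppf(1 - alpha / 2))
    if alternative == "larger":
        # Upper-tailed test: H_a is mu > mu_0
        return bool(z > norm.ppf(1 - alpha))
    if alternative == "smaller":
        # Lower-tailed test: H_a is mu < mu_0
        return bool(z < norm.ppf(alpha))
    raise ValueError("alternative must be 'two-sided', 'larger', or 'smaller'")

z_critical = norm.ppf(1 - 0.05 / 2)
print(f"Two-tailed critical value at alpha = 0.05: {z_critical:.2f}")  # 1.96
print(reject_null(-2.19))  # True, since |-2.19| > 1.96
```

The $p$-value method leads to the same decision, because a $z$-score lies beyond the critical value exactly when its $p$-value is below $\alpha$.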

## Computing

In [3]:
from scipy.stats import zscore, norm

In [2]:
# Example values
sample_mean = 49.2 # the mean of your sample
population_mean = 50 # the hypothesized mean of the population under the null hypothesis
population_std_dev = 2 # the known standard deviation of the population
sample_size = 30 # the number of observations in your sample

In [5]:
# Calculate the standard error of the mean
standard_error = population_std_dev / math.sqrt(sample_size)

In [6]:
# Calculate the z-score
z_score = (sample_mean - population_mean) / standard_error
print(f"The calculated z-score is: {z_score}")

The calculated z-score is: -2.1908902300206567


In [8]:
# For a two-tailed test, for example
p_value = 2 * norm.cdf(z_score) if z_score < 0 else 2 * (1 - norm.cdf(z_score))
print(f"The p-value is: {p_value}")

The p-value is: 0.028459736916311117
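For a one-tailed alternative, only one tail of the standard normal distribution contributes to the $p$-value. A minimal sketch, with the z-score from the example above copied in as a literal so the cell runs on its own:

```python
from scipy.stats import norm

z = -2.1908902300206567  # z-score from the example above

# H_a: mu < mu_0 (lower tail): probability of a z-score this small or smaller
p_lower = norm.cdf(z)

# H_a: mu > mu_0 (upper tail): sf is the survival function, 1 - cdf
p_upper = norm.sf(z)

# H_a: mu != mu_0 (both tails): matches the two-tailed p-value above
p_two_sided = 2 * norm.sf(abs(z))

print(f"lower-tailed: {p_lower:.4f}")
print(f"upper-tailed: {p_upper:.4f}")
print(f"two-tailed:   {p_two_sided:.4f}")
```

Using `norm.sf` rather than `1 - norm.cdf` avoids loss of precision when the tail probability is very small.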


**$Z$-scores** transform normally distributed data into the standard normal distribution, a bell curve with $\mu=0$ and $\sigma=1$. 
On this common scale the empirical rule applies directly: approximately 68% of values fall within one standard deviation of the mean (**$z$-scores** between -1 and +1), 95% fall within two standard deviations (-2 to +2), and 99.7% fall within three standard deviations (-3 to +3).
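The 68%, 95%, and 99.7% figures can be checked against the standard normal CDF. This is a small sketch using `scipy.stats.norm`; the `coverage` dictionary is just a name chosen for the illustration.

```python
from scipy.stats import norm

# Probability mass of the standard normal within k standard deviations of the mean
coverage = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}
for k, prob in coverage.items():
    print(f"within {k} standard deviation(s): {prob:.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```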

In [4]:
data = np.array([10, 20, 30, 40, 50])
print(zscore(data))

[-1.41421356 -0.70710678  0.          0.70710678  1.41421356]
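Note that `scipy.stats.zscore` divides by the population standard deviation by default (`ddof=0`). The sketch below reproduces the output above by hand and shows the `ddof=1` variant, which divides by the sample standard deviation instead.

```python
import numpy as np
from scipy.stats import zscore

data = np.array([10, 20, 30, 40, 50])

# Manual z-scores using the population standard deviation (ddof=0)
manual = (data - data.mean()) / data.std(ddof=0)
print(np.allclose(manual, zscore(data)))  # True

# With ddof=1 the sample standard deviation is used, so magnitudes are smaller
print(zscore(data, ddof=1))
```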


If one only has a sample of observations from the population, then the analogous computation using the sample mean $\bar{X}$ and sample standard deviation $s$ yields the **$t$-statistic**.
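A minimal sketch of that computation, compared against `scipy.stats.ttest_1samp`. The sample values and `mu_0` below are made up purely for illustration.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical sample; these values are illustrative only
sample = np.array([48.1, 50.3, 49.0, 47.6, 51.2, 49.5, 48.8, 50.1])
mu_0 = 50  # hypothesized population mean

# Same form as formula (1), with s (ddof=1) in place of the known sigma
s = sample.std(ddof=1)
t_manual = (sample.mean() - mu_0) / (s / np.sqrt(len(sample)))

# scipy computes the same statistic and takes the p-value from the t distribution
result = ttest_1samp(sample, popmean=mu_0)
print(f"manual t = {t_manual:.4f}, scipy t = {result.statistic:.4f}, p = {result.pvalue:.4f}")
```

The $p$-value here comes from a $t$ distribution with $n-1$ degrees of freedom rather than from the standard normal distribution.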

## One-Sample $Z$-test

In [2]:
from statsmodels.stats.weightstats import ztest

A **one-sample $Z$-test** is used to determine whether the mean of a single sample differs significantly from a known or hypothesized population mean. Use it when:

* The population standard deviation is known.
* The sample size is large (usually $n > 30$).
* The data are approximately normally distributed.

In [3]:
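data = [11.8] * 100  # battery-life measurements in hours
population_mean = 12
population_std_dev = 0.5

In [4]:
# ztest() estimates the standard deviation from the sample, which is zero for
# these constant values, so apply formula (1) with the known sigma instead
standard_error = population_std_dev / math.sqrt(len(data))
z_statistic = (np.mean(data) - population_mean) / standard_error
p_value = 2 * norm.sf(abs(z_statistic))

print(f"Z-Statistic: {z_statistic:.4f}")
print(f"P-Value: {p_value:.4f}")

Z-Statistic: -4.0000
P-Value: 0.0001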


In [6]:
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The average battery life is different from 12 hours.")
else:
    print("Fail to reject the null hypothesis: The average battery life is not significantly different from 12 hours.")

Reject the null hypothesis: The average battery life is different from 12 hours.


## Two-Sample $Z$-test

In [7]:
from statsmodels.stats.proportion import proportions_ztest

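This test compares the means of two independent, normally distributed populations with known standard deviations ($\sigma_1$ and $\sigma_2$), using a random sample of size $n_1$ and $n_2$ drawn from each. Under the null hypothesis, the difference $\mu_1-\mu_2$ takes a specified value, most often 0.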

\begin{equation}\tag{2}
Z = \frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}
\end{equation}
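Formula (2) can be sketched as a small function. The helper name `two_sample_z`, its parameter names, and the summary statistics passed to it are assumptions made for illustration only.

```python
import math
from scipy.stats import norm

def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, diff0=0.0):
    """Z statistic of formula (2); diff0 is mu_1 - mu_2 under H0 (hypothetical helper)."""
    standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((mean1 - mean2) - diff0) / standard_error

# Illustrative (assumed) summary statistics for two independent samples
z = two_sample_z(mean1=52.0, mean2=50.0, sigma1=4.0, sigma2=3.0, n1=64, n2=36)
p_value = 2 * norm.sf(abs(z))  # two-tailed p-value
print(f"z = {z:.4f}, p = {p_value:.4f}")
```

The decision rule is the same as in the one-sample case: reject the null hypothesis when the $p$-value is below the chosen significance level.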

## References

- [Understanding Z-Score with NumPy](https://medium.com/@whyamit101/understanding-z-score-with-numpy-bc8b23f81639)
- [Z-test : Formula, Types, Examples](https://www.geeksforgeeks.org/data-science/z-test/)
- [How to Perform A/B Testing with Hypothesis Testing in Python: A Comprehensive Guide](https://towardsdatascience.com/how-to-perform-a-b-testing-with-hypothesis-testing-in-python-a-comprehensive-guide-17b555928c7e/)