<img src="resources/central_limit_theorem.png" style="width: 600px; margin-left: 4em"/>

---

# Z-test

A Z-test is any statistical test for which the **distribution of the test statistic T** under the null hypothesis can be approximated by a **normal distribution**.

It is typically used to test the mean of a distribution.

Because of the central limit theorem, many test statistics are approximately normally distributed for large samples. Therefore, many statistical tests can be conveniently performed as approximate Z-tests **if the sample size is large or the population variance is known**.

https://en.wikipedia.org/wiki/Z-test

### Conditions

- Population variance should be known, or well estimated from a large sample (a common rule of thumb is n > 30 or n > 50)
- Z-tests focus on a single parameter, and treat all others as their true values
- The test statistic should follow a normal distribution. If the variation of the test statistic is strongly non-normal, a Z-test should not be used.

If sample size is not large enough for these estimates to be reasonably accurate, the Z-test may not perform well.


### One-sample z-test

- (Normal population **or** n large) **and** σ known.
- z is the distance of the sample mean from the hypothesized mean, measured in units of the standard deviation of the sample mean (the standard error)

<img src="resources/one_sample_z_test.png" style="width: 300px; margin-left: 4em"/>

```
x_bar = sample mean
miu0 = hypothesized population mean
σ = population standard deviation
n = sample size
```
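As a minimal sketch, the formula can be wrapped in a small helper. The function name `one_sample_z` and the numbers in the usage line are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
import math


def one_sample_z(x_bar, miu0, sigma, n):
    """Return the z statistic of a one-sample z-test."""
    # Standard error: standard deviation of the sample mean.
    standard_error = sigma / math.sqrt(n)
    return (x_bar - miu0) / standard_error


# Hypothetical inputs: standard error = 8 / sqrt(64) = 1, so z = (52 - 50) / 1.
z = one_sample_z(x_bar=52.0, miu0=50.0, sigma=8.0, n=64)
print(z)  # 2.0
```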

### One-proportion z-test

- n * p0 > 10
- n * (1 − p0) > 10
- The sample is an SRS (simple random sample)

<img src="resources/one_proportion_z_test.png" style="width: 300px; margin-left: 4em"/>

```
p_hat = x/n = sample proportion, unless specified otherwise
p0 = hypothesized population proportion
n = sample size
```
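A possible helper for this test is sketched below. The name `one_proportion_z` is hypothetical, and raising an error when the two sample-size conditions fail is a design choice made for this illustration, not something the formula itself requires.

```python
import math


def one_proportion_z(x, n, p0):
    """Return the z statistic of a one-proportion z-test.

    x is the number of successes, n the sample size and
    p0 the hypothesized population proportion.
    """
    # Conditions listed above: n * p0 > 10 and n * (1 - p0) > 10.
    if n * p0 <= 10 or n * (1 - p0) <= 10:
        raise ValueError("sample too small for the normal approximation")
    p_hat = x / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


# Hypothetical inputs: p_hat = 0.6, standard error = 0.05, so z is about 2.
z = one_proportion_z(x=60, n=100, p0=0.5)
print(z)
```

Whether the sample is a simple random sample cannot be checked from the numbers alone, so that condition is left to the reader.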


### How to understand the formula

#### one-sample z-test

Suppose a numerical random variable X follows a population distribution with mean **`miu`** and standard deviation **`std`**. Randomly draw **`n`** observations x1, x2, ..., xn and calculate the **`sample mean x_bar`**. If this sampling is repeated many times, it produces a collection of sample means that generally differ from one another. By the central limit theorem, however, all of these sample means follow the same approximately normal distribution, with mean equal to the population mean **`miu`** and standard deviation equal to **`std / sqrt(n)`**.

Z is the standard score of the sample mean x_bar within the sampling distribution of the mean. It tells how many standard deviations of that distribution the sample mean lies from the population mean.
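The repeated-sampling argument can be checked with a short simulation. This is only a sketch: the exponential population (mean 1, std 1), the sample size, the number of repeats and the random seed are all assumptions picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

# Assumed population: exponential with scale 1, so miu = 1 and std = 1.
miu, std = 1.0, 1.0
n = 50                # size of each sample
num_repeats = 20_000  # how many times the sampling is repeated

# Each row is one sample of n observations; take the mean of every row.
samples = rng.exponential(scale=1.0, size=(num_repeats, n))
sample_means = samples.mean(axis=1)

print("mean of sample means:", sample_means.mean())
print("std of sample means: ", sample_means.std(ddof=1))
print("std / sqrt(n):       ", std / np.sqrt(n))
```

Even though the population here is skewed, the mean of the sample means should land close to `miu` and their standard deviation close to `std / sqrt(n)`.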

#### one-proportion z-test

Suppose a boolean random variable X follows a Bernoulli distribution, whose **`miu`** and **`std`** are `p` and `sqrt(p * q)` respectively, with `q = 1 - p`. Randomly draw **`n`** observations x1, x2, ..., xn and count the number of successes. Dividing that count by n gives the sample mean, which is the sample proportion; the count itself has expected value **`n * p`**. Again, if this sampling is repeated many times, it produces a collection of different sample means. By the central limit theorem, all of these sample means follow the same approximately normal distribution, with mean equal to the population mean **`miu`** and standard deviation equal to **`std / sqrt(n)`**.

This time, `std / sqrt(n) = sqrt(p * q / n)`.
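The same kind of simulation works for proportions. In this sketch the values of `p`, `n`, the number of repeats and the seed are assumptions chosen for illustration; each simulated count is turned into a sample proportion by dividing by `n`.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

p, n = 0.5, 1000      # assumed success probability and sample size
q = 1 - p
num_repeats = 20_000  # how many times the sampling is repeated

# Number of successes in each repeated sample of n Bernoulli trials.
counts = rng.binomial(n, p, size=num_repeats)
sample_proportions = counts / n

theoretical_se = np.sqrt(p * q / n)

print("mean of sample proportions:", sample_proportions.mean())
print("std of sample proportions: ", sample_proportions.std(ddof=1))
print("sqrt(p * q / n):           ", theoretical_se)
```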

#### one-proportion z-test, another way of understanding

For a large n, the binomial distribution is well approximated by a normal distribution with **`miu = n * p`** and **`std = sqrt(n * p * q)`**. Every point in this distribution represents the total number of successes out of n Bernoulli trials. Thus, to calculate the distance of one point from the population mean, in relation to the standard deviation of the normal distribution, we can simply use the z-score.

This time `z-score = (num_of_success_in_sample - n * p) / sqrt(n * p * q)`

Dividing both the numerator and the denominator by n shows that the above formula is equivalent to `(p_hat - p) / sqrt(p * q / n)`, where `p_hat = num_of_success_in_sample / n`.
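A quick numeric check of this equivalence, using the numbers from the test anxiety example later in this notebook (450 successes out of 1000 with p = 0.5). The variable names are made up for this sketch.

```python
import math

n, p = 1000, 0.5
q = 1 - p
num_of_success = 450

# z-score computed on the scale of counts.
z_from_count = (num_of_success - n * p) / math.sqrt(n * p * q)

# z-score computed on the scale of proportions.
p_hat = num_of_success / n
z_from_proportion = (p_hat - p) / math.sqrt(p * q / n)

print(z_from_count, z_from_proportion)
print(math.isclose(z_from_count, z_from_proportion))  # True
```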

## Examples

### 1. one-sample z-test

### 1.1 left tailed
In a region, the mean and std of scores on a reading test are 100 and 12, respectively. 55 students from a school receive a mean score of 96. Are the students from this school comparable to the region as a whole, or is their score surprisingly low? Assume the region's scores follow a normal distribution.

- T: average score of the school (miu), which is the mean of the sampling distribution.
- H0: miu = miu0 = 100
- Ha: miu < miu0 = 100
- alpha: 0.05

In [1]:
import numpy as np
from scipy import stats

alpha = 0.05
miu, std = 100, 12
sample_mean, n = 96, 55

se = std / np.sqrt(n)
z_score = (sample_mean - miu) / se
p_value = stats.norm.cdf(z_score)  # 0.006716732602885773

print(p_value)

if p_value < alpha:
    print("H0 should be rejected")
else:
    print("H0 should not be rejected")

0.006716732602885773
H0 should be rejected


### 1.2 right tailed

We test the claim that women in a certain town are taller than the average U.S. height, which is 63.8 inches. From a random sample of 50 women, we get an average height of 64.7 inches with a standard deviation of 2.5 inches.

- T: average women height in town (miu), which is the mean of the sampling distribution.
- H0: miu = miu0 = 63.8
- Ha: miu > miu0 = 63.8
- alpha: 0.05
- population std unknown, but we can estimate it by the sample std because n is relatively large.

In [2]:
import numpy as np
from scipy import stats

alpha = 0.05
miu = 63.8
sample_mean, sample_std, n = 64.7, 2.5, 50

se = sample_std / np.sqrt(n)
z_score = (sample_mean - miu) / se
p_value = 1 - stats.norm.cdf(z_score)  # 0.0054547491821344

print(p_value)

if p_value < alpha:
    print("H0 should be rejected")
else:
    print("H0 should not be rejected")

0.0054547491821344
H0 should be rejected


### 1.3 both-tailed

Suppose a pharmaceutical company manufactures ibuprofen pills. They need to perform some quality assurance to ensure they have the correct dosage, which is supposed to be 500 milligrams. In a random sample of 125 pills, there is an average dose of 499.3 milligrams with a standard deviation of 6 milligrams.

- T: average dose of the pills (miu), which is the mean of the sampling distribution.
- H0: miu = miu0 = 500
- Ha: miu != miu0 = 500
- alpha: 0.05
- population std unknown, but we can estimate it by the sample std because n is relatively large.

In [3]:
import numpy as np
from scipy import stats

alpha = 0.05
miu = 500
sample_mean, sample_std, n = 499.3, 6, 125

se = sample_std / np.sqrt(n)
z_score = (sample_mean - miu) / se
p_value = 2 * min(1 - stats.norm.cdf(z_score), stats.norm.cdf(z_score))  # 0.1921064408679496

print(p_value)

if p_value < alpha:
    print("H0 should be rejected")
else:
    print("H0 should not be rejected")

0.1921064408679496
H0 should not be rejected
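As a side note, the two-sided p-value can also be written with the survival function `stats.norm.sf`, which returns `1 - cdf`. The sketch below reuses the numbers of this example and compares both forms; the names `p_min_form` and `p_sf_form` are made up for illustration.

```python
import numpy as np
from scipy import stats

miu = 500
sample_mean, sample_std, n = 499.3, 6, 125

se = sample_std / np.sqrt(n)
z_score = (sample_mean - miu) / se

# Form used in the cell above: twice the smaller tail.
p_min_form = 2 * min(1 - stats.norm.cdf(z_score), stats.norm.cdf(z_score))

# Equivalent form: twice the upper tail beyond |z|, by symmetry of the normal.
p_sf_form = 2 * stats.norm.sf(abs(z_score))

print(p_min_form, p_sf_form)
```

Both forms should agree; the second one avoids computing `1 - cdf` directly, which loses precision when the tail is very small.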


### 2. one-proportion z-test

### 2.1 left tailed

We'll look at the proportion of students who suffer from test anxiety. We want to test the claim that fewer than half of students suffer from test anxiety. In a random sample of 1000 students, 450 students claimed to have test anxiety.

- T: The proportion of students suffering from test anxiety, p, which is the mean of the sampling distribution.
- H0: p = p0 = 0.5
- Ha: p < p0 = 0.5
- alpha: 0.05

In [4]:
import numpy as np
from scipy import stats

alpha = 0.05
p = 0.5
sample_p, n = 0.45, 1000

se = np.sqrt(p * (1 - p) / n)
z_score = (sample_p - p) / se
p_value = stats.norm.cdf(z_score)  # 0.0007827011290012763

print(p_value)

if p_value < alpha:
    print("H0 should be rejected")
else:
    print("H0 should not be rejected")

0.0007827011290012763
H0 should be rejected


In [5]:
# verify the above result
stats.binom.cdf(450, 1000, 0.5)

0.0008652680424885023

### 2.2 right tailed

An article said 26% of Americans can speak more than one language. Someone was curious whether the figure is higher in their city, so they randomly surveyed 120 people and found that 40 of them can speak more than one language.

- T: The proportion of people who can speak more than one language, p, which is the mean of the sampling distribution.
- H0: p = p0 = 0.26
- Ha: p > p0 = 0.26
- alpha: 0.05


In [6]:
import numpy as np
from scipy import stats

alpha = 0.05
p = 0.26
sample_p, n = 40 / 120, 120

se = np.sqrt(p * (1 - p) / n)
z_score = (sample_p - p) / se
p_value = 1 - stats.norm.cdf(z_score)  # 0.03351844776878066

print(p_value)

if p_value < alpha:
    print("H0 should be rejected")
else:
    print("H0 should not be rejected")

0.03351844776878066
H0 should be rejected


In [7]:
# verify the above result
1 - stats.binom.cdf(39, 120, 0.26)

0.04464489214305367

### 2.3 both tailed

Redo the example from the p-value notebook: test whether a coin is fair after flipping it 20 times and getting 14 heads.

- T: The probability of getting heads, p, which is the mean of the sampling distribution.
- H0: p = p0 = 0.5
- Ha: p != p0 = 0.5
- alpha: 0.05


In [8]:
import numpy as np
from scipy import stats

alpha = 0.05
p = 0.5
sample_p, n = 14 / 20, 20

se = np.sqrt(p * (1 - p) / n)
z_score = (sample_p - p) / se
p_value = 2 * min(1 - stats.norm.cdf(z_score), stats.norm.cdf(z_score))  # 0.07363827012030266

print(p_value)

if p_value < alpha:
    print("H0 should be rejected")
else:
    print("H0 should not be rejected")

0.07363827012030266
H0 should not be rejected


The number is not even close to the one we calculated in the p-value notebook, 0.11531829833984375.

That's because of the small sample size n. The larger n is, the closer the binomial distribution is to a normal distribution.
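One way to see this effect is to measure the largest gap between the exact binomial CDF and its normal approximation for several sample sizes. This is a rough sketch: the sample sizes are arbitrary, and the maximum absolute CDF difference is just one possible measure of closeness.

```python
import numpy as np
from scipy import stats

p = 0.5
sample_sizes = (20, 200, 2000)  # arbitrary sizes chosen for illustration
gaps = []

for n in sample_sizes:
    k = np.arange(n + 1)  # every possible number of heads
    exact = stats.binom.cdf(k, n, p)
    approx = stats.norm.cdf(k, loc=n * p, scale=np.sqrt(n * p * (1 - p)))
    # Largest disagreement between the two CDFs over all k.
    gaps.append(np.max(np.abs(exact - approx)))

for n, gap in zip(sample_sizes, gaps):
    print(n, gap)
```

The printed gap should shrink as n grows.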

If we scale up the sample size while keeping the same proportion of heads, we see the following: flipping 1000 times and getting 700 heads.

In [9]:
import numpy as np
from scipy import stats

alpha = 0.05
p = 0.5
sample_p, n = 700 / 1000, 1000

se = np.sqrt(p * (1 - p) / n)
z_score = (sample_p - p) / se
p_value = 2 * min(1 - stats.norm.cdf(z_score), stats.norm.cdf(z_score))  # 0.0

print(p_value)

if p_value < alpha:
    print("H0 should be rejected")
else:
    print("H0 should not be rejected")

0.0
H0 should be rejected


In [10]:
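# verify the above result
# note: 1 - cdf(700) is P(X > 700) rather than P(X >= 700); at this magnitude
# the printed value only reflects the floating-point precision of 1 - cdf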
2 * min(1 - stats.binom.cdf(700, 1000, 0.5), stats.binom.cdf(700, 1000, 0.5))

2.220446049250313e-16