##  What Is a Z-Test?

A **z-test** is a **statistical hypothesis test** used to compare a sample to a population when:

* The **population standard deviation** is **known** (unlike the t-test).
* The **sample size is large** (typically $n > 30$), or the population itself is normally distributed.

It's based on the **standard normal distribution** (mean = 0, std = 1).


---

## Z-Test Assumptions

### 1. **Population Standard Deviation (σ) is Known**

* This is the main reason to use a Z-test instead of a t-test.
* In real life, σ is rarely truly known, so Z-tests are mostly used when it’s given (e.g., in industrial quality control).

---

### 2. **Random Sampling**

* The sample must be **randomly selected** from the population to avoid bias.

---

### 3. **Independence of Observations**

* Each observation in the sample should be **independent** of the others.
* This means no repeated measurements on the same subject (unless a paired test is explicitly used).

---

### 4. **Normality of the Population Distribution**

* If the population is normally distributed → Z-test works directly for **any n**.
* If the population is **not normal** → Central Limit Theorem kicks in for **large n (usually n > 30)**, making the sample mean approximately normal.
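
The Central Limit Theorem claim above can be checked with a small simulation. The sketch below is illustrative only: the exponential population, the sample size of 40, the number of repetitions, and the seed are all assumptions chosen for the demonstration, not values from these notes. It uses only the standard library.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable (arbitrary choice)

n = 40            # sample size, above the usual n > 30 rule of thumb
reps = 5000       # number of simulated samples (assumed)
pop_mean = 1.0    # mean of an exponential population with rate 1
pop_std = 1.0     # standard deviation of that same population

# Draw many samples from a clearly non-normal (right-skewed) population
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

std_error = pop_std / math.sqrt(n)

# If the sample mean is approximately normal, about 95% of the
# standardized means should fall within +/- 1.96
z_values = [(m - pop_mean) / std_error for m in sample_means]
coverage = sum(abs(z) < 1.96 for z in z_values) / reps

print(f"mean of sample means:  {statistics.fmean(sample_means):.3f}")
print(f"std of sample means:   {statistics.stdev(sample_means):.3f}")
print(f"theoretical std error: {std_error:.3f}")
print(f"share of |z| < 1.96:   {coverage:.3f}")
```

If the approximation holds, the spread of the simulated means should sit close to $\sigma / \sqrt{n}$ and the share of standardized means inside ±1.96 should be near 95%, even though the individual observations are strongly skewed.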

---

### 5. **Scale of Measurement**

* Data should be **interval** or **ratio scale** (not ordinal or nominal).

---

##  **Summary Table**

| Assumption                     | Why It Matters                                       |
| ------------------------------ | ---------------------------------------------------- |
| Known σ                        | Needed for standardization with normal distribution  |
| Random sample                  | Ensures representativeness and validity of inference |
| Independence                   | Avoids inflated significance due to correlation      |
| Normal distribution or large n | Justifies using the standard normal curve            |
| Interval/ratio data            | Means and standard deviations are meaningful         |



**Practical Tip:** If **σ is unknown**, especially when **n is small**, switch to a **t-test** instead.


---

##  What Is a Z-Score?

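The **z-score** measures how far a data point or sample mean lies from the population mean. A single data point is measured in units of the population standard deviation $\sigma$; a sample mean is measured in units of the standard error $\sigma / \sqrt{n}$, which is the form used in the z-test.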

### Formula:

$$
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
$$

Where:

* $\bar{x}$ = sample mean
* $\mu$ = population mean
* $\sigma$ = population standard deviation
* $n$ = sample size

> A **z-score of 2** means your sample mean is 2 standard errors above the population mean.
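
As a minimal sketch of the formula, the helper below computes the z-score of a sample mean. The function name `z_score` and the input values are hypothetical, chosen only to show how the sample size enters through the standard error.

```python
import math

def z_score(sample_mean, pop_mean, pop_std, n):
    """Return how many standard errors sample_mean lies from pop_mean."""
    standard_error = pop_std / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# Hypothetical values: the same 2-point gap at two different sample sizes
z_small = z_score(sample_mean=72, pop_mean=70, pop_std=10, n=50)
z_large = z_score(sample_mean=72, pop_mean=70, pop_std=10, n=200)

print(f"n = 50:  z = {z_small:.2f}")
print(f"n = 200: z = {z_large:.2f}")
```

Because the standard error shrinks with $\sqrt{n}$, quadrupling the sample size doubles the z-score for the same gap between $\bar{x}$ and $\mu$.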

---

##  Types of Z-Test

### 1. **One-Sample Z-Test**

Used to compare a **sample mean** with a **known population mean** when **population standard deviation is known**.

####  Example:

You want to know if your class of 50 students, with an average score of 72, is different from the population mean of 70, assuming population std dev is 10.

---

### 2. **Two-Sample Z-Test (Independent)**

Used to compare **two independent sample means** when **population std deviations are known**.

####  Example:

Compare the average heights of male and female students across two schools, assuming the population standard deviation of each group is known.

---

### 3. **Z-Test for Proportions**

Used when comparing **proportions**, such as success/failure rates, click-through rates, etc.

####  Example:

You ran an A/B test:

* Version A: 40 out of 200 users clicked.
* Version B: 30 out of 200 users clicked.

> Is there a significant difference in click-through rate?

---

## **Python Code**

###  1. One-Sample Z-Test



```python
import numpy as np
from scipy.stats import norm

sample_scores = [72, 74, 70, 76, 71, 69, 75, 73, 70, 74]
pop_mean = 70
pop_std_dev = 5

# Calculate the sample mean and size
sample_mean = np.mean(sample_scores)
n = len(sample_scores)

# Calculate the Z-statistic using the formula
z_stat = (sample_mean - pop_mean) / (pop_std_dev / np.sqrt(n))

# Calculate the two-sided p-value
p_value = 2 * norm.sf(abs(z_stat))

print(f"z-statistic: {z_stat:.2f}")
print(f"p-value: {p_value:.4f}")
```

Output:

```text
z-statistic: 1.52
p-value: 0.1290
```


**Interpretation:**

* p = 0.1290 > 0.05 ⇒ fail to reject the null hypothesis.
* The sample mean (72.4) is **not significantly different** from the population mean of 70 at the 5% level.

---

###  2. Two-Sample Z-Test (Independent)

```python
import numpy as np
from scipy.stats import norm

group1 = np.array([170, 172, 168, 169, 171])
group2 = np.array([165, 166, 164, 167, 163])

# Assume known and equal population std dev = 2
pop_std_dev = 2
n1 = len(group1)
n2 = len(group2)

# Calculate the sample means
mean1 = np.mean(group1)
mean2 = np.mean(group2)

# Calculate the Z-statistic
# Formula for two-sample z-test with known standard deviation:
# z = (mean1 - mean2) / sqrt( (sigma^2 / n1) + (sigma^2 / n2) )
z_stat = (mean1 - mean2) / np.sqrt((pop_std_dev**2 / n1) + (pop_std_dev**2 / n2))

# Calculate the two-sided p-value
p_value = 2 * norm.sf(abs(z_stat))

print(f"z-statistic: {z_stat:.2f}")
print(f"p-value: {p_value:.4f}")
```

Output:

```text
z-statistic: 3.95
p-value: 0.0001
```


---

###  3. Z-Test for Proportions

```python
from statsmodels.stats.proportion import proportions_ztest

clicks = [40, 30]          # successes
views = [200, 200]         # trials

# Two-sided z-test for the difference between the two proportions
z_stat, p_value = proportions_ztest(clicks, views)
print(f"z-statistic: {z_stat:.2f}")
print(f"p-value: {p_value:.4f}")
```

Output:

```text
z-statistic: 1.32
p-value: 0.1882
```


---

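| Test Type              | Use Case                               | Python Function                                                  |
| ---------------------- | -------------------------------------- | ---------------------------------------------------------------- |
| One-Sample Z-Test      | Compare sample mean to population mean | `ztest(x, value=mu)` (statsmodels; estimates σ from the sample)  |
| Two-Sample Z-Test      | Compare two independent means          | `ztest(x1, x2)` (statsmodels; estimates σ from the samples)      |
| Z-Test for Proportions | Compare two proportions                | `proportions_ztest()` (statsmodels)                              |

Note that statsmodels' `ztest()` uses the sample standard deviation rather than a known σ, so when σ is truly known the manual formulas shown above are the direct route.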

---

## When to Use t-Test vs z-Test?

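| Feature                   | t-Test                              | z-Test                                         |
| ------------------------- | ----------------------------------- | ---------------------------------------------- |
| Population std dev known? | ❌ No                                | ✅ Yes                                          |
| Sample size               | Any; most relevant when n < 30      | Large (n > 30), or any n if population normal  |
| Distribution assumption   | Normal / nearly normal              | Normal, or approximately normal via the CLT    |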

---

## **Numerical Examples**
###  **1. One-Sample Z-Test**

**Scenario:**

* Sample mean $\bar{x} = 72$
* Population mean $\mu = 70$
* Population std deviation $\sigma = 10$
* Sample size $n = 50$

---

### **z-statistic:**

$$
z = \frac{72 - 70}{10 / \sqrt{50}} = \frac{2}{10 / 7.07} = \frac{2}{1.41} \approx 1.41
$$

Critical z-value at $\alpha = 0.05$ (two-tailed): **1.96**

---

###  Conclusion:

* $z = 1.41 < 1.96$ → **Not significant**
* Sample mean not significantly different from 70.
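
The hand calculation above can be cross-checked in a few lines. This is a minimal sketch using only the standard library (`statistics.NormalDist`) rather than SciPy; the variable names are illustrative, and the inputs are the scenario values listed above.

```python
import math
from statistics import NormalDist

# Scenario values from the worked example
x_bar, mu, sigma, n = 72, 70, 10, 50
alpha = 0.05

std_normal = NormalDist()  # standard normal: mean 0, std 1

z = (x_bar - mu) / (sigma / math.sqrt(n))
z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-tailed critical value
p_value = 2 * (1 - std_normal.cdf(abs(z)))   # two-sided p-value

reject_null = abs(z) > z_crit

print(f"z = {z:.2f}, critical z = {z_crit:.2f}, p = {p_value:.4f}")
print("reject H0" if reject_null else "fail to reject H0")
```

Comparing $|z|$ with the critical value and comparing the p-value with $\alpha$ are two views of the same decision, so both should agree with the conclusion above.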

---

###  **2. Two-Sample Z-Test**

**Scenario:**

* Group A: $\bar{x}_1 = 84.33, \sigma_1 = 10, n_1 = 100$
* Group B: $\bar{x}_2 = 80.00, \sigma_2 = 10, n_2 = 100$

---

### **z-statistic:**

$$
z = \frac{84.33 - 80}{\sqrt{\frac{10^2}{100} + \frac{10^2}{100}}} = \frac{4.33}{\sqrt{2}} = \frac{4.33}{1.414} \approx 3.06
$$

Critical z = 1.96 → $3.06 > 1.96$

---

### Conclusion:

* Significant difference between means.
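
A short check of the two-sample calculation, again as a sketch that relies only on the standard library. The variable names are illustrative; the inputs are the Group A and Group B values from the scenario.

```python
import math
from statistics import NormalDist

# Scenario values from the worked example
mean1, sigma1, n1 = 84.33, 10, 100
mean2, sigma2, n2 = 80.00, 10, 100

# Standard error of the difference between two independent means
se_diff = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

z = (mean1 - mean2) / se_diff
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value

print(f"standard error = {se_diff:.3f}")
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Carrying more decimals for $\sqrt{2}$ gives a z-statistic slightly below 3.07, which does not change the conclusion at the 5% level.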

---

### **3. Z-Test for Proportions**

**Scenario:**

* Version A: 40 clicks / 200 → $p_1 = 0.20$
* Version B: 30 clicks / 200 → $p_2 = 0.15$

---

### **Pooled proportion:**

$$
p = \frac{40 + 30}{200 + 200} = \frac{70}{400} = 0.175
$$

$$
SE = \sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{0.175 \cdot 0.825 \cdot \left(\frac{1}{200} + \frac{1}{200}\right)} = \sqrt{0.175 \cdot 0.825 \cdot 0.01} \approx \sqrt{0.00144375} \approx 0.038
$$

---

### **z-statistic:**

$$
z = \frac{0.20 - 0.15}{0.038} \approx \frac{0.05}{0.038} \approx 1.32
$$

Critical z = 1.96

---

###  Conclusion:

* $z = 1.32 < 1.96$ → **Not significant**
* No significant difference in click rates.
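
The pooled-proportion steps above translate directly into code. This is a minimal sketch with the standard library only; the variable names are illustrative, and the result should agree with the `proportions_ztest` output shown in the Python Code section.

```python
import math
from statistics import NormalDist

# A/B test counts from the scenario
clicks_a, views_a = 40, 200
clicks_b, views_b = 30, 200

p1 = clicks_a / views_a
p2 = clicks_b / views_b

# Pooled proportion under the null hypothesis of equal click rates
p_pooled = (clicks_a + clicks_b) / (views_a + views_b)

# Standard error of the difference, using the pooled proportion
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / views_a + 1 / views_b))

z = (p1 - p2) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value

print(f"pooled p = {p_pooled:.3f}, SE = {se:.3f}")
print(f"z = {z:.2f}, p = {p_value:.4f}")
```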

---

