## 1 Z-Proportion test

### 1.1 Assumptions for Z-Proportion test

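1. The population standard deviation is treated as known: under $H_0$ it is fixed by the hypothesized proportion, $\sigma = \sqrt{p_0(1 - p_0)}$.
2. The Z-Proportion test is used to test a hypothesis about a population proportion $p$.
3. The sampling distribution of the sample proportion $\hat{p}$ is approximately normal, with mean $p_0$ under $H_0$.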

### 1.2 Conditions for Z-Proportion test

1. Random Sampling.
2. Observations are independent of each other.
3. The sample size must be large enough for the sampling distribution of the proportion to be approximately normal: $np_0 \ge 10$ and $n(1 - p_0) \ge 10$.
4. The population standard deviation is known (under $H_0$ it follows from $p_0$).

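The sample-size condition can be checked before running the test. The snippet below is a minimal sketch; the helper name `large_sample_ok` and the example inputs are illustrative assumptions, not part of any library.

```python
def large_sample_ok(n: int, p0: float, threshold: float = 10.0) -> bool:
    """Return True when n*p0 and n*(1 - p0) both reach the threshold."""
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    # A tiny tolerance guards against floating-point round-off at the boundary.
    tol = 1e-9
    return (expected_successes >= threshold - tol
            and expected_failures >= threshold - tol)


# Hypothetical inputs chosen only to illustrate both outcomes.
print(large_sample_ok(n=100, p0=0.80))  # expected counts: 80 and 20
print(large_sample_ok(n=30, p0=0.90))   # expected failures: 3
```

If the check fails, the normal approximation is questionable and the Z-Proportion test should not be relied on for that sample.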
### 1.3 Types of Z-Proportion tests

1. One Sample Z-Proportion test
2. Two Sample Z-Proportion test

> **Note**:
>
> In the Two Sample Z-Proportion test, the two samples can be of different sizes, but they must be independent of each other and the observations within each sample must be i.i.d.  
> i.i.d. := Independent and Identically Distributed

## 2 One Sample Z-Proportion test

### 2.1 Nature of hypothesis

- $H_0: p = p_0$
- $H_a: p \neq p_0$ (two-tailed), or $p > p_0$ / $p < p_0$ (one-tailed)

### 2.2 Test Statistic

1. The test statistic in the One Sample Z-Proportion test is called the Z-statistic.
2. The sampling distribution of the sample proportion is approximately normal.

$$
\text{Z-Statistic} = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0\,(1 - p_0)}{n}}}
$$

where $\hat{p} = x / n$ is the sample proportion, $p_0$ is the hypothesized proportion, and $n$ is the sample size.

> **Note**:
>
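> The $\text{Z-statistic}$ is different from a $\text{Z-Score}$: a Z-score standardizes a single observation using the population mean and standard deviation, whereas the Z-statistic standardizes a sample statistic (here $\hat{p}$) using its standard error under $H_0$.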

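The formula and the three forms of $H_a$ can be sketched in a few lines. This is a minimal illustration, not the library implementation: the helper names `one_sample_z` and `p_value_from_z` and the input numbers are assumptions, the `alternative` labels simply mirror the ones accepted by `proportions_ztest`, and the standard library's `statistics.NormalDist` stands in for `scipy.stats.norm`.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(x: int, n: int, p0: float) -> float:
    """Z-statistic for H_0: p = p0, with the standard error computed from p0."""
    p_hat = x / n
    standard_error = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


def p_value_from_z(z: float, alternative: str = "two-sided") -> float:
    """Turn a Z-statistic into a p-value for the chosen alternative."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":   # H_a: p != p0
        return 2 * std_normal.cdf(-abs(z))
    if alternative == "greater":     # H_a: p > p0 (right tail)
        return 1 - std_normal.cdf(z)
    if alternative == "less":        # H_a: p < p0 (left tail)
        return std_normal.cdf(z)
    raise ValueError(f"unknown alternative: {alternative!r}")


# Hypothetical data: 116 successes in 200 trials, tested against p0 = 0.5.
z = one_sample_z(x=116, n=200, p0=0.5)
print(round(z, 4), round(p_value_from_z(z), 4))
```

Using `-abs(z)` in the two-sided branch keeps the p-value correct whether the Z-statistic is positive or negative.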
### 2.3 API

```python
from statsmodels.stats.proportion import proportions_ztest
```

https://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.proportions_ztest.html

## 3 Two Sample Z-Proportion test

### 3.1 Nature of hypothesis

- $H_0: p_1 = p_2$
- $H_a: p_1 \neq p_2$ (two-tailed), or $p_1 > p_2$ / $p_1 < p_2$ (one-tailed)

### 3.2 Test Statistic

1. The test statistic in the Two Sample Z-Proportion test is called the Z-statistic.
2. The sampling distribution of the difference between the sample proportions is approximately normal.

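$$
\text{Z-Statistic} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\,(1 - \hat{p})\,\Bigl(\frac{1}{n_1} + \frac{1}{n_2}\Bigr)}}
$$

where $\hat{p}_1 = x_1 / n_1$ and $\hat{p}_2 = x_2 / n_2$ are the sample proportions, and $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$ is the pooled proportion under $H_0$.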

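The pooled calculation can be written out directly. The snippet below is a minimal sketch: the helper name `two_sample_z` and the counts are illustrative assumptions, and `statistics.NormalDist` from the standard library is used in place of `scipy.stats.norm`.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Z-statistic for H_0: p_1 = p_2 using the pooled proportion."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Under H_0 both samples share one proportion, so the successes are pooled.
    p_pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error


# Hypothetical data: 45/300 successes in sample 1, 60/300 in sample 2.
z = two_sample_z(x1=45, n1=300, x2=60, n2=300)
p_value = 2 * NormalDist().cdf(-abs(z))  # two-tailed p-value
print(round(z, 4), round(p_value, 4))
```

The sign of the Z-statistic depends only on which sample is labelled first; the two-tailed p-value is unaffected by that choice.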
### 3.3 API

```python
from statsmodels.stats.proportion import proportions_ztest
```

https://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.proportions_ztest.html

## 4 Quizzes

In [1]:
import numpy as np
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest

### Quiz #1

A fast-food restaurant claims that 80% of their customers prefer their new burger over the old one.  
In a random sample of 100 customers, 85 said they preferred the new burger.  

What are the null and alternative hypotheses?

A. Null Hypothesis $H_0: p = 0.80$, Alternative Hypothesis $H_a: p \neq 0.80$  
B. Null Hypothesis $H_0: p = 0.80$, Alternative Hypothesis $H_a: p < 0.80$  
C. Null Hypothesis $H_0: p = 0.85$, Alternative Hypothesis $H_a: p \neq 0.85$  
D. Null Hypothesis $H_0: p = 0.85$, Alternative Hypothesis $H_a: p > 0.85$  

#### Solution

In [2]:
# Given,
p = 0.8
n = 100  # Sample size
x = 85  # observed value

# H_0: p = 80%
# H_a: p != 80%

In [3]:
# Distribution: Normal
# Test statistic: Z-Statistic
# Significance level: 0.05
alpha = 0.05

In [4]:
x = 85  # observed value
# Since H_a contains != symbol, perform two-tailed test.

In [5]:
p_hat = x / n
p_hat

0.85

In [6]:
# Calculate test statistic.
z = (p_hat - p) / np.sqrt((p * (1 - p)) / n)
z.round(6).item()

1.25

In [7]:
# Compute p-value.
p_value = 2 * stats.norm.sf(np.abs(z))  # 2-tailed test; valid for either sign of z
p_value.round(6).item()

0.2113

In [8]:
# Compare p-value with significance level.
if p_value <= alpha:
    print("Reject Null hypothesis")
else:
    print("Failed to reject Null hypothesis")

Failed to reject Null hypothesis


### Quiz #2

You are a product manager for a company that has recently launched a new product.  

Customer satisfaction is a critical metric, and you want to determine if the proportion of  
satisfied customers with the new product meets your target satisfaction level of 70%.

You collected a random sample of 150 customer reviews, and 115 of them expressed satisfaction with the product.

#### Solution #1

Manual Calculation

In [9]:
# Given,
p = 0.7  # 70%
n = 150
x = 115

# H_0: p = 70%
# H_a: p != 70%

In [10]:
# Distribution: Normal Distribution
# Test Statistic: Z-Statistic
# Significance level: 0.05
alpha = 0.05

In [11]:
# Observed value
x = 115
# Since H_a contains != symbol, perform two-tailed test.

In [12]:
p_hat = x / n
p_hat

0.7666666666666667

In [13]:
# Calculate test statistic.
z = (p_hat - p) / np.sqrt((p * (1 - p)) / n)
z.round(6).item()

1.781742

In [14]:
# Compute p-value.
p_value = 2 * stats.norm.cdf(-np.abs(z))
p_value.round(6).item()

0.074791

In [15]:
# Compare p-value with significance level.
if p_value <= alpha:
    print("Reject Null hypothesis")
else:
    print("Failed to reject Null hypothesis")

Failed to reject Null hypothesis


#### Solution #2

Using library

In [16]:
# Given,
p = 0.7  # Target
n = 150  # Sample size
x = 115  # observed value

# H_0: p = 70%
# H_a: p != 70%

alpha = 0.05  # Significance level

x = 115  # Observed value.
# Since H_a contains != symbol, perform two-tailed test.

# Compute test statistic and p-value.
z_stat, p_value = proportions_ztest(
    count=x,
    nobs=n,
    value=p,
    alternative="two-sided",  # "less" (Left), "greater" (Right).
    prop_var=p,  # use the null proportion in the standard error, as in the manual formula
)
print("z-statistic:", z_stat.round(6).item())
print("p-value:", p_value.round(6).item())

# Compare p-value with alpha.
if p_value <= alpha:
    print("Reject Null Hypothesis.")
else:
    print("Failed to reject Null Hypothesis.")

z-statistic: 1.781742
p-value: 0.074791
Failed to reject Null Hypothesis.

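The `prop_var=p` argument in Solution #2 is what makes the library result match the manual calculation. When `prop_var` is not given, `proportions_ztest` computes the variance from the sample proportion instead of the hypothesized one. The sketch below reproduces both variants by hand with the quiz data; the variable names are illustrative, and only the standard library is used.

```python
from math import sqrt

# Quiz #2 data.
n, x, p0 = 150, 115, 0.7
p_hat = x / n

# Standard error from the null proportion p0 (what prop_var=p requests).
z_null_var = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Standard error from the sample proportion (used when prop_var is left unset).
z_sample_var = (p_hat - p0) / sqrt(p_hat * (1 - p_hat) / n)

print("z with null-proportion variance:  ", round(z_null_var, 6))
print("z with sample-proportion variance:", round(z_sample_var, 6))
```

The first value should agree with the Z-statistic reported in both solutions. The second is larger because $\hat{p}(1 - \hat{p})$ is smaller than $p_0(1 - p_0)$ for this sample.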

### Quiz #3

A fast-food restaurant claims that 80% of their customers prefer their new burger over the old one.  
In a random sample of 100 customers, 85 said they preferred the new burger.  

What are the null and alternative hypotheses?

A. Null Hypothesis $H_0 : p = 0.80$, Alternative Hypothesis $H_a: p \ne 0.80$  
B. Null Hypothesis $H_0 : p = 0.80$, Alternative Hypothesis $H_a: p < 0.80$  
C. Null Hypothesis $H_0 : p = 0.85$, Alternative Hypothesis $H_a: p \ne 0.85$  
D. Null Hypothesis $H_0 : p = 0.85$, Alternative Hypothesis $H_a: p > 0.85$

#### Solution

Option A:

$H_0 : p = 0.80$  
$H_a : p \ne 0.80$  

### Quiz #4

You are the manager of an e-commerce website, and you have recently implemented a new web page in hopes of increasing sales.  
To evaluate the effectiveness of the new page, you collected data on the conversion rates for both the old and new web pages. The conversion rate is defined as the proportion of visitors who make a purchase.

- For the old web page (Web Page A), you had **1000** visitors, resulting in **50** conversions.
- For the new web page (Web Page B), you had **500** visitors, resulting in **30** conversions.

Now, you want to determine if there is a statistically significant difference in the conversion rates between the old and new web pages.

#### Solution

In [17]:
# Given,
# Number of visits for Web Page A and Web Page B # n_1, n_2
visits = np.array([1000, 500])

# Number of conversions for Web Page A and Web Page B # x_1, x_2
conversions = np.array([50, 30])

In [18]:
p1 = 50 / 1000
p2 = 30 / 500

# H_0: p_1 = p_2
# H_a: p_1 != p_2

In [19]:
# Distribution: Normal Distribution
# Test-statistic: Z-statistic
# Significance level: 0.05
alpha = 0.05

In [20]:
# Since H_a contains != symbol, perform two-tailed test.

In [21]:
# Calculate test-statistic and p-value.
z_stat, p_value = proportions_ztest(
    count=conversions,
    nobs=visits,
    alternative="two-sided",
)
print("z-statistic:", z_stat.round(6).item())
print("p-value:", p_value.round(6).item())

z-statistic: -0.812534
p-value: 0.416485


In [22]:
# Compare p-value with alpha.
if p_value <= alpha:
    print("Reject Null Hypothesis.")
else:
    print("Failed to reject Null Hypothesis.")

Failed to reject Null Hypothesis.
