# Two-Proportion Z-Test

## 1. Introduction
The two-proportion z-test is a statistical method used to determine whether two population proportions are significantly different from each other. This guide will explain the concept, how to perform the test, and how to interpret the results.

## 2. Theoretical Background
### Statistical Hypotheses
- **Null Hypothesis (H₀)**: There is no difference between the two population proportions: $p_1 = p_2$.
- **Alternative Hypothesis (H₁)**: The two population proportions differ: $p_1 \neq p_2$ (a two-sided alternative).

### Importance of Hypothesis Testing
Hypothesis testing is a fundamental tool of statistical inference. It helps us judge whether the observed data are plausibly explained by random chance alone or whether they point to a real difference between populations.

## 3. Assumptions of the Two-Proportion Z-Test
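- **Sample Size**: Both samples must be large enough for the normal approximation to hold. A common rule of thumb is at least 5 successes and 5 failures in each group, i.e. $n_i\hat{p}_i \geq 5$ and $n_i(1-\hat{p}_i) \geq 5$ for $i = 1, 2$; more conservative texts require at least 10.
- **Independence**: Observations must be independent within each sample, and the two samples must be independent of each other.
- **Normality**: The sampling distribution of $\hat{p}_1 - \hat{p}_2$ should be approximately normal; in practice, this follows from the sample-size condition above.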
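
The sample-size condition can be checked mechanically before running the test. The sketch below is a hypothetical helper (`check_success_failure_condition` is our own name, not part of statsmodels) that checks the observed numbers of successes and failures in each group; some texts plug in the pooled proportion instead. Its default threshold of 10 is the conservative version of the rule of thumb.

```python
def check_success_failure_condition(successes, sizes, threshold=10):
    """Return True if every group has at least `threshold` successes and failures.

    Hypothetical helper: the threshold is a rule of thumb (5 and 10 are both
    common), not a hard requirement of the z-test itself.
    """
    for x, n in zip(successes, sizes):
        failures = n - x
        if x < threshold or failures < threshold:
            return False
    return True


# Counts used in the example later in this guide: 300/1000 and 250/1000
print(check_success_failure_condition([300, 250], [1000, 1000]))  # True
# Only 3 successes in the first group, so the condition fails
print(check_success_failure_condition([3, 250], [40, 1000]))      # False
```

If the check fails, the normal approximation behind the z-test is questionable and the p-value should be treated with caution.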

## 4. Test Statistic Calculation
### Formula
The z-statistic is calculated as:

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where:
- $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$ are the sample proportions, with $x_1$ and $x_2$ the numbers of successes in each sample
- $n_1$ and $n_2$ are the sample sizes
- $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$ is the pooled proportion of successes, which estimates the common proportion under $H_0$
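
As a worked check of the formula, here are the counts used in the example later in this guide (300 successes out of 1,000 in the first group, 250 out of 1,000 in the second). SE denotes the denominator of the formula, the standard error of $\hat{p}_1 - \hat{p}_2$ under $H_0$:

$$
\begin{aligned}
\hat{p}_1 &= \tfrac{300}{1000} = 0.30, \qquad \hat{p}_2 = \tfrac{250}{1000} = 0.25, \qquad \hat{p} = \tfrac{300 + 250}{1000 + 1000} = 0.275 \\
\text{SE} &= \sqrt{0.275 \times 0.725 \times \left(\tfrac{1}{1000} + \tfrac{1}{1000}\right)} = \sqrt{0.00039875} \approx 0.01997 \\
z &= \frac{0.30 - 0.25}{0.01997} \approx 2.504
\end{aligned}
$$

This agrees with the z-statistic of about 2.504 that the statsmodels implementation prints in the example.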

### Interpretation of Z-Score
The z-statistic measures how many standard errors the observed difference $\hat{p}_1 - \hat{p}_2$ lies from zero, the difference expected under the null hypothesis. The larger $|z|$, the stronger the evidence against $H_0$.
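
An equivalent way to read $z$ is to compare it with a critical value of the standard normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `critical_value` is our own, and 2.5039 is the z-statistic reported in the example later in this guide.

```python
from statistics import NormalDist


def critical_value(alpha=0.05, two_sided=True):
    """Standard normal critical value for a test at significance level alpha.

    Two-sided: reject H0 when |z| exceeds it.
    One-sided (upper tail): reject H0 when z exceeds it.
    """
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


z_crit = critical_value(0.05)
print(round(z_crit, 3))     # 1.96
z_obs = 2.5039              # z-statistic from the example in this guide
print(abs(z_obs) > z_crit)  # True: reject H0 at the 5% level
```

The critical-value rule and the p-value rule always reach the same decision at a given α; the p-value simply reports how far past (or short of) the threshold the statistic falls.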

## 5. Python Code Implementation
### Setting Up the Environment


```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest
import matplotlib.pyplot as plt
```

```python
def two_proportion_z_test(success_a, size_a, success_b, size_b):
    """Two-sided two-proportion z-test; returns (z-statistic, p-value)."""
    stat, pval = proportions_ztest([success_a, success_b], [size_a, size_b])
    return stat, pval
```
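
The wrapper above delegates to statsmodels. As a cross-check, the same statistic can be computed directly from the formula in Section 4 using only the standard library. This is a minimal sketch: `two_proportion_z_test_manual` is a hypothetical name, and it assumes a two-sided alternative with the pooled variance, which, to our understanding, is also what `proportions_ztest` uses when comparing two samples.

```python
import math


def two_proportion_z_test_manual(success_a, size_a, success_b, size_b):
    """Pooled two-proportion z-test computed from the formula.

    Returns (z, two-sided p-value). Hypothetical helper for cross-checking.
    """
    p_a = success_a / size_a
    p_b = success_b / size_b
    # Pooled proportion: the common success rate assumed under H0 (p1 = p2)
    p_pool = (success_a + success_b) / (size_a + size_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / size_a + 1 / size_b))
    z = (p_a - p_b) / se
    # Two-sided p-value for standard normal Z: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p = two_proportion_z_test_manual(300, 1000, 250, 1000)
print(f"Z-statistic: {z:.4f}, P-value: {p:.4f}")
# Z-statistic: 2.5039, P-value: 0.0123
```

Swapping the two groups flips the sign of $z$ but leaves the two-sided p-value unchanged.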


```python
def plot_proportions(success_a, size_a, success_b, size_b):
    """Bar chart of the observed success proportion in each group."""
    proportions = [success_a / size_a, success_b / size_b]
    plt.bar(['Group A', 'Group B'], proportions, color=['blue', 'green'])
    plt.ylabel('Proportion')
    plt.title('Proportion of Successes in Each Group')
    plt.show()
```


## 6. Example: Comparing Two Groups

```python
# Sample data: 300 successes out of 1000 in group A, 250 out of 1000 in group B
stat, pval = two_proportion_z_test(300, 1000, 250, 1000)
print(f"Z-statistic: {stat}, P-value: {pval}")
```


Output:

```text
Z-statistic: 2.503915429180671, P-value: 0.012282738972377402
```


## 7. Interpretation of Results
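Interpreting the output means reading both the z-statistic and the p-value. If the p-value is below the chosen significance level α (commonly 0.05), we reject the null hypothesis and conclude that the two proportions differ significantly. In the example, $z \approx 2.50$ and $p \approx 0.012 < 0.05$, so we reject $H_0$: the data indicate that the proportions differ, with group A's proportion (30%) higher than group B's (25%).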
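
The decision rule can be written as a small helper. `interpret_p_value` is a hypothetical name, and the p-value below is the one printed in the example above; running it at two levels shows that the same result can be significant at α = 0.05 but not at α = 0.01.

```python
def interpret_p_value(p_value, alpha=0.05):
    """Plain-language decision for a hypothesis test at significance level alpha."""
    if p_value < alpha:
        return (f"p = {p_value:.4f} < {alpha}: reject H0; "
                "the proportions differ significantly.")
    return (f"p = {p_value:.4f} >= {alpha}: fail to reject H0; "
            "no significant difference detected.")


p_value = 0.012282738972377402  # p-value from the example above
for alpha in (0.05, 0.01):
    print(interpret_p_value(p_value, alpha))
```

Choose α before looking at the data; picking it afterwards to reach a desired conclusion invalidates the test.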

## 8. Limitations and Considerations
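This test assumes large samples drawn independently. It is not appropriate for small samples, where an exact test such as Fisher's exact test is preferable, or for paired or otherwise correlated samples, where McNemar's test is the usual alternative.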

## 9. Conclusion
The two-proportion z-test is a widely used tool for comparing the proportions of successes in two independent groups when its conditions are met. It works best with large samples, where the Central Limit Theorem makes the sampling distribution of the difference in sample proportions approximately normal; the approximation degrades when counts are small or proportions are close to 0 or 1. The test lets you judge whether an observed difference in proportions is plausibly due to random chance or reflects a real difference between the populations.

This test should be used when:
- You have two independent samples from different populations.
- The sample sizes are large enough for the normal approximation to hold (a common rule of thumb is at least 10 successes and 10 failures in each sample; some texts accept 5).
- You are interested in testing a hypothesis about the equality (or lack thereof) of the two proportions.

