# Hypothesis Testing - Independent Samples Proportions Z-test

## Objective
The objective is to explore and apply statistical hypothesis tests to determine whether proportions differ between two populations.

## 1. Description of Hypothetical Populations:
In this study, we consider two hypothetical populations, representing employees from different groups in a workplace setting:

**Population A (Stress Management Training Group):**
- Employees who participated in a stress management training program.
- These individuals received training on stress management techniques, such as mindfulness exercises, time management strategies, and relaxation techniques.
- Assumption: 40% of the employees who underwent the stress management training program reported experiencing stress-related issues in the workplace.

**Population B (No Stress Management Training Group):**
- Employees who did not participate in the stress management training program.
- Assumption: 50% of these employees reported experiencing stress-related issues in the workplace.

## 2. Formulation of Null and Alternative Hypotheses:
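We formulate hypotheses to test differences in proportions between Population A and Population B. Throughout, p1 and p2 denote the proportions of employees experiencing stress-related issues in Population A and Population B, respectively.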

**Case 1: Testing that the proportions are not equal:**
- **Null Hypothesis (H₀):** The proportion of employees experiencing stress-related issues is equal in Population A and Population B.
- **Alternative Hypothesis (H₁):** The proportion of employees experiencing stress-related issues is not equal in Population A and Population B.

**Case 2: Testing that the proportion in one population is greater (or less) than in the other:**
- **Null Hypothesis (H₀):** The proportion of employees experiencing stress-related issues among those who participated in the stress management training program is less than or equal to (respectively, greater than or equal to) the proportion among those who did not.
- **Alternative Hypothesis (H₁):** The proportion of employees experiencing stress-related issues among those who participated in the stress management training program is greater than (respectively, less than) the proportion among those who did not.
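
With $p_1$ and $p_2$ denoting the proportions of employees experiencing stress-related issues in Population A and Population B, the hypotheses above can be written compactly. The labels "Case 2a" and "Case 2b" are introduced here only to separate the two directions of the one-tailed test:

$$
\begin{aligned}
\text{Case 1 (two-tailed):}\quad & H_0: p_1 = p_2 \quad\text{vs.}\quad H_1: p_1 \neq p_2 \\
\text{Case 2a (right-tailed):}\quad & H_0: p_1 \le p_2 \quad\text{vs.}\quad H_1: p_1 > p_2 \\
\text{Case 2b (left-tailed):}\quad & H_0: p_1 \ge p_2 \quad\text{vs.}\quad H_1: p_1 < p_2
\end{aligned}
$$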

## 3. Comparison Table:

| Significance Level | Test Type | Z-Statistic | Non-rejection Region | P-Value | Decision |
|--------------------|-----------|-------------|----------------------|---------|----------|
| **α = 0.05** | Two-tailed (H₁: p1 ≠ p2) | -2.72 | [-1.96, 1.96] | 0.0065 | Reject H₀: p1 = p2 |
|              | One-tailed (H₁: p1 > p2) | -2.72 | Z ≤ 1.645 | 0.9968 | Fail to reject H₀: p1 ≤ p2 |
|              | One-tailed (H₁: p1 < p2) | -2.72 | Z ≥ -1.645 | 0.0032 | Reject H₀: p1 ≥ p2 |
| **α = 0.01** | Two-tailed (H₁: p1 ≠ p2) | -2.72 | [-2.576, 2.576] | 0.0065 | Reject H₀: p1 = p2 |
|              | One-tailed (H₁: p1 > p2) | -2.72 | Z ≤ 2.326 | 0.9968 | Fail to reject H₀: p1 ≤ p2 |
|              | One-tailed (H₁: p1 < p2) | -2.72 | Z ≥ -2.326 | 0.0032 | Reject H₀: p1 ≥ p2 |
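
The critical values that bound the non-rejection regions are quantiles of the standard normal distribution, so they can be recomputed rather than read from a table. The sketch below is one way to do that; the helper name `critical_values` is hypothetical, and it uses the standard library's `statistics.NormalDist` so that it runs without SciPy (`scipy.stats.norm.ppf` should give the same quantiles).

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def critical_values(alpha):
    """Return (two_tailed, one_tailed) critical z-values for a given alpha."""
    two_tailed = std_normal.inv_cdf(1 - alpha / 2)  # alpha split across both tails
    one_tailed = std_normal.inv_cdf(1 - alpha)      # all of alpha in one tail
    return two_tailed, one_tailed


for alpha in (0.05, 0.01):
    z_two, z_one = critical_values(alpha)
    print(f"alpha = {alpha}")
    print(f"  two-tailed:   do not reject H0 if -{z_two:.3f} <= z <= {z_two:.3f}")
    print(f"  H1: p1 > p2:  do not reject H0 if z <= {z_one:.3f}")
    print(f"  H1: p1 < p2:  do not reject H0 if z >= -{z_one:.3f}")
```

For a right-tailed alternative the null hypothesis is retained for every z at or below the positive critical value, and for a left-tailed alternative for every z at or above the negative one.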

## 4. Brief Conclusions:
Based on the results of the two-sample proportion test:

**For α = 0.05:**
- For the two-tailed test, since the absolute value of the z-statistic (|-2.72| = 2.72) exceeds the critical value of 1.96, we reject the null hypothesis (p1 = p2) in favor of p1 ≠ p2.
- For the one-tailed test with H₁: p1 > p2, the null hypothesis is rejected only if the z-statistic exceeds the critical value of 1.645. Since -2.72 does not, we fail to reject the null hypothesis (p1 ≤ p2).
- For the one-tailed test with H₁: p1 < p2, since the z-statistic (-2.72) is less than the critical value of -1.645, we reject the null hypothesis (p1 ≥ p2) in favor of p1 < p2.

**For α = 0.01:**
- The same conclusions hold at the 0.01 significance level: the critical values change (2.576 for the two-tailed test and 2.326 for the one-tailed tests), but all three decisions remain the same.

These results suggest that there is a statistically significant difference in the proportion of employees experiencing stress-related issues between Population A and Population B. The decisions are the same at both significance levels, and the left-tailed test indicates the direction of the difference: the proportion is lower in Population A (the training group) than in Population B. Because the samples are simulated from assumed proportions, further investigation may be warranted to explore the effectiveness of the stress management training program in reducing workplace stress.
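
The critical-value reasoning in the conclusions can be expressed as a small decision rule. This is a minimal sketch rather than the analysis code itself: the function name `reject_null` and the labels `"two-sided"`, `"greater"`, and `"less"` are hypothetical, and z = -2.72 is the rounded statistic from the comparison table.

```python
from statistics import NormalDist


def reject_null(z, alpha, alternative):
    """Critical-value decision rule for a z-test.

    alternative: "two-sided" (H1: p1 != p2), "greater" (H1: p1 > p2),
    or "less" (H1: p1 < p2). Returns True when H0 is rejected.
    """
    std_normal = NormalDist()
    if alternative == "two-sided":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if alternative == "greater":
        return z > std_normal.inv_cdf(1 - alpha)
    if alternative == "less":
        return z < -std_normal.inv_cdf(1 - alpha)
    raise ValueError(f"unknown alternative: {alternative!r}")


z = -2.72  # rounded z-statistic reported in the comparison table
for alpha in (0.05, 0.01):
    for alternative in ("two-sided", "greater", "less"):
        rejected = reject_null(z, alpha, alternative)
        decision = "reject H0" if rejected else "fail to reject H0"
        print(f"alpha={alpha}, H1 {alternative}: {decision}")
```

Comparing z with a critical value and comparing the p-value with α are equivalent rules, so either can be used to check the other.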

```python
import numpy as np
from scipy.stats import norm

# Define parameters
n1 = n2 = 300  # sample sizes
p1 = 0.4  # assumed proportion in population 1 (Population A)
p2 = 0.5  # assumed proportion in population 2 (Population B)

# Generate samples
np.random.seed(42)  # for reproducibility
sample1 = np.random.binomial(1, p1, n1)
sample2 = np.random.binomial(1, p2, n2)

# Calculate empirical proportions
empirical_prop1 = np.mean(sample1)
empirical_prop2 = np.mean(sample2)

# Significance levels
alpha_05 = 0.05
alpha_01 = 0.01

# z-statistic (shared by all three tests). The standard error is built from the
# assumed population proportions p1 and p2, not from a pooled sample estimate.
z_statistic = (empirical_prop1 - empirical_prop2) / np.sqrt((p1 * (1 - p1) / n1) + (p2 * (1 - p2) / n2))

# Two-tailed test: H0: p1 = p2, H1: p1 ≠ p2
p_value_two_tailed = 2 * (1 - norm.cdf(np.abs(z_statistic)))

# One-tailed test: H0: p1 <= p2, H1: p1 > p2
p_value_one_tailed_greater = 1 - norm.cdf(z_statistic)

# One-tailed test: H0: p1 >= p2, H1: p1 < p2
p_value_one_tailed_less = norm.cdf(z_statistic)

# Print results (p-values do not depend on the significance level)
print("Empirical proportion in population 1:", empirical_prop1)
print("Empirical proportion in population 2:", empirical_prop2)
print("\nTwo-tailed z-test:")
print("z-statistic:", z_statistic)
print("p-value:", p_value_two_tailed)

print("\nOne-tailed z-test (H1: p1 > p2):")
print("z-statistic:", z_statistic)
print("p-value:", p_value_one_tailed_greater)

print("\nOne-tailed z-test (H1: p1 < p2):")
print("z-statistic:", z_statistic)
print("p-value:", p_value_one_tailed_less)

# Conclusion for alpha=0.05 (each message names the null hypothesis being tested)
print("\nConclusion for alpha=0.05:")
if p_value_two_tailed < alpha_05:
    print("Reject the null hypothesis (p1 = p2) at alpha=0.05 level of significance.")
else:
    print("Fail to reject the null hypothesis (p1 = p2) at alpha=0.05 level of significance.")

if p_value_one_tailed_greater < alpha_05:
    print("Reject the null hypothesis (p1 ≤ p2) at alpha=0.05 level of significance.")
else:
    print("Fail to reject the null hypothesis (p1 ≤ p2) at alpha=0.05 level of significance.")

if p_value_one_tailed_less < alpha_05:
    print("Reject the null hypothesis (p1 ≥ p2) at alpha=0.05 level of significance.")
else:
    print("Fail to reject the null hypothesis (p1 ≥ p2) at alpha=0.05 level of significance.")

# Conclusion for alpha=0.01
print("\nConclusion for alpha=0.01:")
if p_value_two_tailed < alpha_01:
    print("Reject the null hypothesis (p1 = p2) at alpha=0.01 level of significance.")
else:
    print("Fail to reject the null hypothesis (p1 = p2) at alpha=0.01 level of significance.")

if p_value_one_tailed_greater < alpha_01:
    print("Reject the null hypothesis (p1 ≤ p2) at alpha=0.01 level of significance.")
else:
    print("Fail to reject the null hypothesis (p1 ≤ p2) at alpha=0.01 level of significance.")

if p_value_one_tailed_less < alpha_01:
    print("Reject the null hypothesis (p1 ≥ p2) at alpha=0.01 level of significance.")
else:
    print("Fail to reject the null hypothesis (p1 ≥ p2) at alpha=0.01 level of significance.")
```

Output:

```
Empirical proportion in population 1: 0.41
Empirical proportion in population 2: 0.52

Two-tailed z-test:
z-statistic: -2.7217941261796654
p-value: 0.006492857745083791

One-tailed z-test (H1: p1 > p2):
z-statistic: -2.7217941261796654
p-value: 0.9967535711274581

One-tailed z-test (H1: p1 < p2):
z-statistic: -2.7217941261796654
p-value: 0.003246428872541932

Conclusion for alpha=0.05:
Reject the null hypothesis (p1 = p2) at alpha=0.05 level of significance.
Fail to reject the null hypothesis (p1 ≤ p2) at alpha=0.05 level of significance.
Reject the null hypothesis (p1 ≥ p2) at alpha=0.05 level of significance.

Conclusion for alpha=0.01:
Reject the null hypothesis (p1 = p2) at alpha=0.01 level of significance.
Fail to reject the null hypothesis (p1 ≤ p2) at alpha=0.01 level of significance.
Reject the null hypothesis (p1 ≥ p2) at alpha=0.01 level of significance.
```
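
The code above plugs the assumed population proportions (0.4 and 0.5) into the standard error, which is possible only because the populations are hypothetical. When the true proportions are unknown, a common alternative is to pool the two samples under H₀: p1 = p2. The following is a minimal sketch of that variant, not the method used above: the function name `pooled_two_proportion_ztest` is hypothetical, the counts 123 and 156 are inferred from the reported empirical proportions (0.41 × 300 and 0.52 × 300), and the standard library's `statistics.NormalDist` stands in for `scipy.stats.norm`.

```python
import math
from statistics import NormalDist


def pooled_two_proportion_ztest(successes1, n1, successes2, n2):
    """Two-sample z-test for proportions with a pooled standard error.

    Returns the z-statistic and the two-sided, right-tailed (H1: p1 > p2),
    and left-tailed (H1: p1 < p2) p-values.
    """
    prop1 = successes1 / n1
    prop2 = successes2 / n2
    pooled = (successes1 + successes2) / (n1 + n2)  # common proportion under H0
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (prop1 - prop2) / std_error
    cdf = NormalDist().cdf(z)
    return {
        "z": z,
        "two_sided": 2 * min(cdf, 1 - cdf),
        "greater": 1 - cdf,
        "less": cdf,
    }


# 123/300 = 0.41 and 156/300 = 0.52 match the empirical proportions reported above.
result = pooled_two_proportion_ztest(123, 300, 156, 300)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With these counts the pooled z-statistic comes out at roughly -2.70, close to the -2.72 obtained above, so the decisions at both significance levels would be expected to stay the same.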
