### Test Information
#### Proportion T-Test
##### Description
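The proportion t-test compares the proportions of successes in two independent groups to determine whether the difference between them is statistically significant. In large samples the test statistic is approximately standard normal, so the same procedure is more commonly called a two-proportion z-test; with large samples the t-based and z-based results are nearly identical.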

##### Assumptions
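* Each sample is drawn randomly from its population, and the two samples are independent of each other.
* Each sample is large enough for the normal approximation to hold. A common rule of thumb is at least 10 successes and 10 failures in each group (some texts use 5).
* Each observation is a binary outcome (success or failure). The raw data are not normally distributed, but the sample proportions are approximately normal in large samples.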
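
The sample-size assumption can be checked before running the test. The sketch below applies a common rule of thumb (at least 10 successes and 10 failures per group); the helper name `large_sample_ok` and the default threshold are illustrative assumptions, not part of the original notebook.

```python
def large_sample_ok(successes, n, minimum=10):
    """Return True if a group has at least `minimum` successes and failures.

    Hypothetical helper: the threshold of 10 is a rule of thumb, and some
    texts use 5 instead.
    """
    failures = n - successes
    return successes >= minimum and failures >= minimum


# Example counts: 60 successes out of 100, and 3 successes out of 40
print(large_sample_ok(60, 100))  # True
print(large_sample_ok(3, 40))    # False
```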

##### Formula
The formula for the proportion t-test is:

t = (p1 - p2) / sqrt(p * (1 - p) * (1/n1 + 1/n2))

where:
- t is the test statistic
- p1 and p2 are the sample proportions of the two groups (p1 = X1 / n1, p2 = X2 / n2)
- p is the pooled proportion (p = (X1 + X2) / (n1 + n2))
- n1 and n2 are the sample sizes of the two groups
- X1 and X2 are the number of successes in each group
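
As a minimal sketch, the formula can be written as a small function that takes success counts and sample sizes. The function name `two_proportion_statistic` and the example counts are assumptions made for illustration.

```python
import math


def two_proportion_statistic(x1, n1, x2, n2):
    """Pooled two-proportion test statistic from success counts and sample sizes."""
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Hypothetical counts: 45 successes of 150 versus 30 successes of 150.
# Here p1 = 0.3, p2 = 0.2, pooled = 0.25 and SE = 0.05, so the result is about 2.0.
print(two_proportion_statistic(45, 150, 30, 150))
```

Swapping the two groups only flips the sign of the statistic, which does not affect a two-sided test.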

##### Example
Suppose we want to compare the proportion of males in two cities. We collect a random sample of 100 people from each city and find 60 males and 40 females in city A, and 50 males and 50 females in city B.

| City | Males | Females | Total |
| --- | --- | --- | --- |
| A   | 60   | 40    | 100  |
| B   | 50   | 50    | 100  |

Using the formula above, we can calculate the test statistic and determine if there is a significant difference between the proportions of males and females in the two cities.
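
The arithmetic for the city example might look like the sketch below. It uses only the counts from the table; the two-sided p-value is taken from the standard normal distribution, which is an assumption of this sketch (a t-distribution with 198 degrees of freedom gives a similar value).

```python
import math
from statistics import NormalDist

# Counts of males from the table above
x_a, n_a = 60, 100  # city A
x_b, n_b = 50, 100  # city B

p_a = x_a / n_a
p_b = x_b / n_b
pooled = (x_a + x_b) / (n_a + n_b)  # 110 / 200 = 0.55
se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
stat = (p_a - p_b) / se

# Two-sided p-value under the standard normal approximation
p_value = 2 * (1 - NormalDist().cdf(abs(stat)))

print(f"statistic = {stat:.3f}, p-value = {p_value:.3f}")
```

The statistic comes out at about 1.42 with a p-value of about 0.155, so at α = 0.05 we would fail to reject the hypothesis that the two cities have the same proportion of males.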

In [2]:
import numpy as np

In [3]:
n1 = 247
p1 = .37


n2 = 308
p2 = .39

In [4]:
population1 = np.random.binomial(1,p1,n1)
population2 = np.random.binomial(1,p2,n2)   

In [5]:
population1

array([1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1,
       1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0,
       0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0,
       1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1,
       0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1,
       0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0,
       0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1,
       1, 1, 0, 0, 1])

In [6]:
len(population1)

247

In [15]:
population1.mean()

np.float64(0.4291497975708502)

In [14]:
population2.mean()

np.float64(0.36038961038961037)

In [12]:
!pip install statsmodels



In [13]:
import statsmodels.api as sm


In [16]:
t,p,df = sm.stats.ttest_ind(population1,population2)

In [17]:
t

np.float64(1.650831437929816)

In [18]:
p

np.float64(0.09934075670569809)

In [19]:
alpha = 0.05  # significance level
if p < alpha:
    print('reject Null hypothesis')
else:
    print('fail to reject Null hypothesis')

fail to reject Null hypothesis
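
Because the data are 0/1 outcomes, the same comparison can also be run with a test designed for proportions. The sketch below uses `proportions_ztest` from `statsmodels.stats.proportion`. The success counts 106 and 111 are back-calculated from the sample means printed above (0.4291 × 247 and 0.3604 × 308), which is an assumption of this example rather than something the notebook computes.

```python
from statsmodels.stats.proportion import proportions_ztest

# Success counts back-calculated from the printed sample means
counts = [106, 111]
nobs = [247, 308]

# With two samples, the default settings use the pooled proportion
# in the standard error, as in the formula given earlier
z_stat, p_value = proportions_ztest(counts, nobs)

print("z-statistic:", z_stat)
print("p-value:", p_value)
```

The z-statistic should land close to the t-statistic reported by `ttest_ind`, since both measure the same difference in sample proportions.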


### Manual Calculation of 2 Proportion t-test

#### Step 1: Calculate the Pooled Proportion
The pooled proportion (p̄) is the total number of successes divided by the total number of trials.

p̄ = (X1 + X2) / (n1 + n2)

where:
- X1 = number of successes in sample 1
- X2 = number of successes in sample 2
- n1 = sample size of sample 1
- n2 = sample size of sample 2

#### Step 2: Calculate the Standard Error of the Difference
The standard error of the difference (SE) is a measure of the variability of the difference between the two proportions.

SE = sqrt(p̄(1-p̄)(1/n1 + 1/n2))

#### Step 3: Calculate the Test Statistic (t)
The test statistic (t) is a measure of the number of standard errors that the observed difference is away from zero.

t = (p1 - p2) / SE

where:
- p1 = proportion of sample 1
- p2 = proportion of sample 2
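
Steps 1 to 3 can be sketched with NumPy as follows. The two 0/1 arrays are built to reproduce sample proportions of 106/247 and 111/308; they are illustrative data, not the notebook's actual random draws.

```python
import numpy as np

# 0/1 arrays that reproduce sample proportions of 106/247 and 111/308
# (illustrative data, not the notebook's random draws)
sample1 = np.repeat([1, 0], [106, 141])
sample2 = np.repeat([1, 0], [111, 197])

n1, n2 = len(sample1), len(sample2)
x1, x2 = sample1.sum(), sample2.sum()

# Step 1: pooled proportion
p_bar = (x1 + x2) / (n1 + n2)

# Step 2: standard error of the difference
se = np.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))

# Step 3: test statistic
p1, p2 = x1 / n1, x2 / n2
t_stat = (p1 - p2) / se

print("pooled proportion:", p_bar)
print("standard error:", se)
print("test statistic:", t_stat)
```

The statistic is about 1.65, close to the value reported by `ttest_ind` earlier. The small gap arises because `ttest_ind` estimates the variance from the within-group sample variances rather than from the pooled proportion.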

#### Step 4: Determine the Degrees of Freedom
The degrees of freedom (df) are the number of values in the final calculation of the test statistic that are free to vary. For two independent samples:

df = n1 + n2 - 2

#### Step 5: Look Up the Critical t-value or Calculate the p-value
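Using a t-distribution table or calculator, look up the critical t-value for the given degrees of freedom and desired significance level (α); for a two-sided test, place α/2 in each tail. Alternatively, calculate the p-value associated with the test statistic. With large samples the critical t-value is nearly the same as the standard normal one (about 1.96 for α = 0.05).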
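
In place of a table, the critical value and p-value can be computed with `scipy.stats.t`. The sketch below plugs in the statistic and sample sizes from the notebook run shown earlier; the variable names are illustrative.

```python
from scipy import stats

t_stat = 1.650831437929816  # statistic reported by ttest_ind earlier
df = 247 + 308 - 2          # n1 + n2 - 2 = 553
alpha = 0.05

# Two-sided critical value: alpha / 2 in each tail
t_crit = stats.t.ppf(1 - alpha / 2, df)

# Two-sided p-value: probability of a statistic at least this extreme
p_value = 2 * stats.t.sf(abs(t_stat), df)

print("critical t-value:", t_crit)
print("p-value:", p_value)
```

The p-value should agree with the 0.0993 printed by `ttest_ind`, and the critical value is only slightly above the normal-distribution value of 1.96.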

#### Step 6: Compare the Test Statistic to the Critical t-value or p-value
If the absolute value of the test statistic exceeds the critical t-value, or equivalently the p-value is less than the chosen significance level, reject the null hypothesis. Otherwise, fail to reject the null hypothesis.

Note: The null hypothesis is typically H0: p1 = p2, and the alternative hypothesis is Ha: p1 ≠ p2.

from scipy import stats
import numpy as np

# Sample data: 0/1 outcomes, so each group's mean is its sample proportion
group1 = np.random.binomial(1, 0.37, 247)
group2 = np.random.binomial(1, 0.39, 308)

# Perform the two-sample t-test on the 0/1 data (compares the two proportions)
t_stat, p_val = stats.ttest_ind(group1, group2)

print("T-statistic:", t_stat)
print("p-value:", p_val)

# Interpret the results
alpha = 0.05
if p_val < alpha:
    print("Reject the null hypothesis. The proportions are significantly different.")
else:
    print("Fail to reject the null hypothesis. The proportions are not significantly different.")