## Intuition: 
Students whose homes are nearby visit home more frequently than students whose homes are farther away.

### For Testing Purpose: 
- **Null Hypothesis:** &nbsp;&nbsp;$p_1 - p_2 \leq 0$  
- **Alternate Hypothesis:** &nbsp;&nbsp;$p_1 - p_2 > 0$  

Where,
- $p_1$: Population proportion of students living nearby who visit home at least 3 times
- $p_2$: Population proportion of students living far away who visit home at least 3 times

The hypotheses are statements about the population proportions. The sample proportions $\hat{p}_1$ and $\hat{p}_2$ computed below are their estimates.


In [1]:
import numpy as np
import pandas as pd
from scipy.stats import norm


### Taking $\alpha$ = 0.05 

In [2]:
alpha = 0.05      # significance level
distance = 750    # threshold on 'dist' separating "nearby" from "far away"

df = pd.read_csv("../data.csv")

# Students living within the distance threshold
nearBy = df[df['dist'] <= distance]
# Of those, the ones who visit home at least 3 times
nearBy_freq = nearBy[nearBy['freq'].isin(['3-4', '5-10', '>10'])]

p_near = nearBy_freq.shape[0] / nearBy.shape[0]
print("P_near: ", p_near)

P_near:  0.8391608391608392


In [3]:
# Students living beyond the distance threshold
farAway = df[df['dist'] > distance]
# Of those, the ones who visit home at least 3 times
farAway_freq = farAway[farAway['freq'].isin(['3-4', '5-10', '>10'])]

p_far = farAway_freq.shape[0] / farAway.shape[0]
print("P_far: ", p_far)

P_far:  0.5252525252525253
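A z-test on proportions relies on a normal approximation, which is usually considered reasonable when each group has at least about 10 successes and 10 failures. A minimal sketch of that check follows. The helper `large_sample_ok` and the threshold of 10 (a common rule of thumb) are assumptions, and the counts 120 of 143 and 52 of 99 are illustrative values chosen only because they reproduce the printed proportions; they are not read from `data.csv`.

```python
def large_sample_ok(successes, n, threshold=10):
    """Rule-of-thumb check: at least `threshold` successes and failures."""
    failures = n - successes
    return successes >= threshold and failures >= threshold


# Illustrative counts (assumption): consistent with the printed proportions,
# not taken from the data file.
near_ok = large_sample_ok(120, 143)   # nearby group
far_ok = large_sample_ok(52, 99)      # far-away group

print("Nearby group large enough:  ", near_ok)
print("Far-away group large enough:", far_ok)
```

In the notebook itself, the same check could be run with `nearBy_freq.shape[0]`, `nearBy.shape[0]`, `farAway_freq.shape[0]` and `farAway.shape[0]` in place of the illustrative counts.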


### Now finding the Test Statistic: 

Using the formula: 
#### $$ Z^* = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{ \frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2} }} $$

where $n_1$ and $n_2$ are the numbers of students in the nearby and far-away groups.
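The formula can also be wrapped in a small reusable function. This is a sketch rather than the notebook's own code: the name `two_proportion_z` and its `pooled` option are hypothetical, and the counts 120 of 143 and 52 of 99 are illustrative values chosen because they reproduce the proportions printed earlier, not values read from `data.csv`. The default branch uses the unpooled standard error from the formula above. The pooled branch is a common alternative when the null hypothesis is $p_1 = p_2$, and is included only for comparison.

```python
import math


def two_proportion_z(x1, n1, x2, n2, pooled=False):
    """z statistic for the difference of two sample proportions.

    x1, x2: number of successes in each group; n1, n2: group sizes.
    """
    p1, p2 = x1 / n1, x2 / n2
    if pooled:
        # Single pooled proportion estimated from both groups together
        p = (x1 + x2) / (n1 + n2)
        se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    else:
        # Unpooled standard error, as in the formula above
        se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


# Illustrative counts (assumption), not taken from the data file
z_unpooled = two_proportion_z(120, 143, 52, 99)
z_pooled = two_proportion_z(120, 143, 52, 99, pooled=True)
print("unpooled:", z_unpooled, "pooled:", z_pooled)
```

With these counts the two variants differ only slightly, so the choice would not change the outcome of the test here.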



In [4]:
var_far = (p_far*(1-p_far)) / farAway.shape[0] 
var_near = (p_near*(1-p_near)) / nearBy.shape[0] 

deviation = np.sqrt( var_far + var_near )

test_statistic = (p_near - p_far) / deviation 
print( "Test_Statistic: ", test_statistic )  

Test_Statistic:  5.334553698609998


### Calculating *p-value* and comparing with $\alpha$
Using the formula: 
#### $$ p\text{-value} = P( Z \geq Z^* ) $$


In [5]:
mean = 0
std_dev = 1 

p_value = 1 - norm.cdf( test_statistic, loc=mean, scale=std_dev )
print("P-value: ", p_value )

P-value:  4.7889920473664915e-08
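For upper-tail probabilities this small, `scipy.stats.norm` also provides the survival function `norm.sf`, which computes $P(Z \geq Z^*)$ directly instead of as `1 - norm.cdf(...)`. The sketch below compares the two forms. The value of `z` is copied from the test statistic printed above, and `z_extreme` is a made-up value used only to show how the subtraction form can lose precision far out in the tail.

```python
from scipy.stats import norm

z = 5.334553698609998  # test statistic printed above

p_cdf = 1 - norm.cdf(z)   # complement of the CDF, as in the cell above
p_sf = norm.sf(z)         # survival function: P(Z >= z) computed directly
print("1 - cdf:", p_cdf)
print("sf:     ", p_sf)

# Hypothetical, much larger statistic: the subtraction form may round to
# zero, while the survival function still returns a positive tail probability.
z_extreme = 10.0
print("1 - cdf:", 1 - norm.cdf(z_extreme))
print("sf:     ", norm.sf(z_extreme))
```

For the statistic in this notebook both forms agree to many digits, so the conclusion is unaffected; the survival function mainly matters for much larger statistics.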


### Conclusion: 

In [6]:
if p_value < alpha:
    print("Since p < \u03B1")
    print("Null Hypothesis rejected and Alternate Hypothesis accepted") 
else: 
    print("Failed to reject the Null Hypothesis with the given data")

Since p < α
Null Hypothesis rejected and Alternate Hypothesis accepted
