# Introduction

This notebook estimates the p-value of a hypothesis test by resampling. In modern computing, we resample or permute the observed samples to estimate the p-value instead of looking it up with the older table-based approaches, although some similar tests, such as the **Kolmogorov-Smirnov** test, still rely on tabulated values rather than permutation and resampling.

In [1]:
import numpy as np

## Sample 2 random variables

In [11]:
city_1_heights = np.random.uniform(low=140, high=185, size=50)
city_2_heights = np.random.uniform(low=140, high=185, size=50)

In [12]:
city_1_heights

array([ 162.1750727 ,  176.9668943 ,  144.82959801,  176.28823395,
        154.0266655 ,  173.67410065,  159.48609444,  151.39201431,
        177.73125168,  148.55280433,  176.03976771,  155.5956065 ,
        143.83720274,  158.6961084 ,  174.82661103,  164.31506462,
        153.30989551,  153.05149335,  177.28304299,  140.4962265 ,
        168.40433684,  153.48082981,  177.35096746,  156.19651856,
        163.68509129,  182.50075291,  172.09107727,  182.02953439,
        145.16205619,  170.87115606,  166.14328985,  160.81163071,
        179.80727322,  146.75331887,  161.7145309 ,  156.60748839,
        152.10317395,  156.22916131,  160.71556484,  173.30582415,
        166.86902416,  170.392744  ,  167.25188268,  172.56412748,
        162.95830006,  165.51109896,  149.5228493 ,  171.00007839,
        170.43218649,  155.18837737])

In [13]:
city_2_heights

array([ 144.39780371,  154.26090619,  164.65341446,  151.01438857,
        158.66881619,  150.96416582,  179.7023463 ,  176.13094103,
        165.81333735,  177.38790908,  149.69326777,  178.02772412,
        182.99228283,  157.13655253,  142.9411304 ,  180.78069901,
        161.27696419,  160.49720273,  142.40155843,  158.14770991,
        143.26862332,  152.54272157,  162.28388012,  161.91257794,
        180.75513919,  146.99337932,  146.25585699,  184.5087434 ,
        164.92628242,  170.86583692,  184.95097115,  145.09194849,
        174.61813749,  144.75808108,  146.67072194,  145.9479317 ,
        177.50640512,  144.44111688,  144.1073582 ,  143.0370369 ,
        154.16455297,  178.43332768,  155.12163199,  149.27485422,
        166.71797671,  160.31524502,  182.82971931,  169.93563951,
        167.06446858,  153.42470753])
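Because the draws above are unseeded, the printed arrays change on every run. A minimal sketch of a reproducible variant, assuming NumPy's `Generator` API; the function name `sample_heights` and the seed value are illustrative and not part of the original notebook:

```python
import numpy as np

def sample_heights(seed, size=50, low=140, high=185):
    """Draw two uniform height samples from a seeded generator (hypothetical helper)."""
    rng = np.random.default_rng(seed)  # the seed is an arbitrary assumption
    first = rng.uniform(low=low, high=high, size=size)
    second = rng.uniform(low=low, high=high, size=size)
    return first, second

# Calling the helper twice with the same seed yields identical samples.
seeded_city_1, seeded_city_2 = sample_heights(seed=42)
```

Different variable names are used here so the sketch does not overwrite the arrays shown above.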

## Calculate the Mean of the two series

In [26]:
mean_city_1 = city_1_heights.mean()
mean_city_2 = city_2_heights.mean()

## Define the test statistic

In [44]:
Obeserved_Value = round(abs(mean_city_1 - mean_city_2), 2)

In [45]:
Obeserved_Value

2.4100000000000001
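The observed value computed above is the absolute difference of the two sample means. Written out, with $\bar{x}_1$ and $\bar{x}_2$ as assumed notation for `mean_city_1` and `mean_city_2`, $h_{k,i}$ for the $i$-th height in city $k$, and $n = 50$ heights per city:

$$
T_{obs} = \left| \bar{x}_1 - \bar{x}_2 \right|, \qquad \bar{x}_k = \frac{1}{n} \sum_{i=1}^{n} h_{k,i}
$$

For the samples in this run, $T_{obs}$ rounds to 2.41.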

## Define Null and Alternative Hypothesis

$$
H_0: \text{The average heights of the two cities are not different}\\
H_1: \text{The average heights of the two cities are different}
$$

## Define the Significance Level

In [80]:
SIGNIFICANCE_LEVEL = 5  # in percent, to match p below

## Define the Sampling Frequency (number of resamples)

In [16]:
SAMPLING_FREQ = 1000

## Combine all the samples

In [19]:
all_heights = np.append(city_1_heights, city_2_heights)

In [20]:
all_heights

array([ 162.1750727 ,  176.9668943 ,  144.82959801,  176.28823395,
        154.0266655 ,  173.67410065,  159.48609444,  151.39201431,
        177.73125168,  148.55280433,  176.03976771,  155.5956065 ,
        143.83720274,  158.6961084 ,  174.82661103,  164.31506462,
        153.30989551,  153.05149335,  177.28304299,  140.4962265 ,
        168.40433684,  153.48082981,  177.35096746,  156.19651856,
        163.68509129,  182.50075291,  172.09107727,  182.02953439,
        145.16205619,  170.87115606,  166.14328985,  160.81163071,
        179.80727322,  146.75331887,  161.7145309 ,  156.60748839,
        152.10317395,  156.22916131,  160.71556484,  173.30582415,
        166.86902416,  170.392744  ,  167.25188268,  172.56412748,
        162.95830006,  165.51109896,  149.5228493 ,  171.00007839,
        170.43218649,  155.18837737,  144.39780371,  154.26090619,
        164.65341446,  151.01438857,  158.66881619,  150.96416582,
        179.7023463 ,  176.13094103,  165.81333735,  177.38790

## Resample `SAMPLING_FREQ` times at random

In [67]:
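# Preallocate one slot per resample instead of growing the array with np.append
test_stats = np.empty(SAMPLING_FREQ)
for i in range(SAMPLING_FREQ):
    # Draw two samples of 50 from the pooled heights, with replacement
    s1 = np.random.choice(all_heights, 50, replace=True)  # sample 1
    s2 = np.random.choice(all_heights, 50, replace=True)  # sample 2
    # Same test statistic as the observed value: rounded absolute difference of means
    test_stats[i] = round(abs(s1.mean() - s2.mean()), 2)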
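The loop above draws both samples with replacement from the pooled heights, which is a bootstrap-style resampling. The introduction also mentions permutation; a minimal sketch of that variant follows, where the pooled values are shuffled and split into two groups without replacement. The function name `permutation_test_stats` and its parameters are hypothetical, not taken from the notebook:

```python
import numpy as np

def permutation_test_stats(sample_a, sample_b, n_resamples=1000, seed=None):
    """Absolute differences of group means under random relabelling (a sketch)."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([sample_a, sample_b])
    n_a = len(sample_a)
    stats = np.empty(n_resamples)
    for i in range(n_resamples):
        # Shuffle the pooled values, then split at the original group size
        shuffled = rng.permutation(pooled)
        stats[i] = abs(shuffled[:n_a].mean() - shuffled[n_a:].mean())
    return stats
```

Under this scheme every height is used exactly once per resample, so each resample is a relabelling of the observed data rather than a fresh draw from it.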

## Sort all the values in non-decreasing order

In [72]:
test_stats.sort()

In [71]:
test_stats[:10]

array([ 0.  ,  0.01,  0.01,  0.02,  0.03,  0.03,  0.03,  0.03,  0.04,  0.04])

## Number of values >= ```Obeserved_Value``` in ```test_stats```

In [75]:
above_values = test_stats[test_stats >= Obeserved_Value].size

In [76]:
above_values

332

## Probability of a value >= ```Obeserved_Value``` given the NULL Hypothesis is true

$$
P(X \ge \text{Obeserved\_Value} \mid H_0) = \frac{\text{above\_values}}{\text{total}} \times 100
$$

In [78]:
p = above_values / SAMPLING_FREQ * 100  # p-value as a percentage

In [79]:
p

33.2
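The same calculation can be wrapped in a small function that returns the p-value as a fraction, so 33.2 percent corresponds to 0.332. This is a sketch only: the name `empirical_p_value` is hypothetical, and the toy values in the usage lines are made up for illustration rather than taken from the run above.

```python
import numpy as np

def empirical_p_value(resampled_stats, observed):
    """Fraction of resampled statistics at least as large as the observed one."""
    resampled_stats = np.asarray(resampled_stats, dtype=float)
    return float(np.mean(resampled_stats >= observed))

# Toy example: two of the four values are >= 2.41
p_fraction = empirical_p_value([0.5, 1.0, 2.5, 3.0], observed=2.41)
print(p_fraction)        # 0.5
print(p_fraction * 100)  # 50.0, the percentage form used in this notebook
```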

## Compare the p-value with the Significance Level

In [None]:
if p >= SIGNIFICANCE_LEVEL:
    print('We fail to reject NULL Hypothesis')
else:
    print('We reject NULL Hypothesis')