# Hypothesis Testing

The goal of hypothesis testing is to determine whether an effect observed in a sample is likely to also appear in the larger population, or whether it could plausibly be due to the randomness inherent in taking a sample.

You will be trying to answer “Given a sample and an apparent effect, what is the probability of seeing such an effect by chance?”

Since you are only looking at a sample of the population, if you calculate statistics from the sample, you are _unlikely_ to get the same values as the corresponding population parameters.
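To see this sampling variability in action, here is a minimal sketch using a synthetic population (the values are made up for illustration and are not from any dataset used in this notebook). It draws several small samples and compares each sample mean to the population mean.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# A synthetic "population" of 100,000 values (made up for illustration)
population = rng.normal(loc=500, scale=100, size=100_000)
population_mean = population.mean()

# Draw five samples of size 50 and record each sample mean
sample_means = [
    rng.choice(population, size=50, replace=False).mean()
    for _ in range(5)
]

print(f"Population mean: {population_mean:.2f}")
for i, sample_mean in enumerate(sample_means, start=1):
    difference = sample_mean - population_mean
    print(f"Sample {i} mean: {sample_mean:.2f} (difference: {difference:+.2f})")
```

Each sample mean lands near the population mean, but not exactly on it, and the samples disagree with each other.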

## Resampling/Permutation Tests

With **resampling**, you draw repeated samples from observed data with the goal of assessing random variability in a statistic. Similar to the bootstrap, you are not going to try to analytically determine the distribution of the test statistic, but instead build it out of the observed sample.

**The Big Idea:** We are trying to determine if two samples came from the same underlying distribution. If they came from the same distribution, then the group labels are irrelevant, and shuffling the labels still gives you a sample from that same distribution.

You start with the null hypothesis that the two samples came from the same distribution. Then you build the distribution of some test statistic (e.g., the difference in means) under the null hypothesis by randomly permuting the group labels a large number of times and recalculating the test statistic each time.

The $p$-value is then the proportion of test statistics calculated from permutations that were _at least as extreme_ as the observed test statistic.

This is a non-parametric method, since it makes no assumption about how the data were generated (i.e., it doesn't matter whether they came from a normal distribution).

See: http://faculty.washington.edu/yenchic/18W_425/Lec3_permutation.pdf
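The procedure described above can be sketched in plain NumPy. This is only an illustration under stated assumptions: the function `permuted_mean_differences` is a hypothetical helper written for this sketch (it is not the `nssstats` implementation), and the two groups are synthetic data.

```python
import numpy as np

def permuted_mean_differences(values, label_count, num_permutations=10_000, seed=None):
    """Hypothetical helper: shuffle the pooled values repeatedly and return
    the difference in group means for each shuffle."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    differences = np.empty(num_permutations)
    for i in range(num_permutations):
        shuffled = rng.permutation(values)
        # The first label_count shuffled values play the role of the first group
        differences[i] = shuffled[:label_count].mean() - shuffled[label_count:].mean()
    return differences

# Synthetic data (made up for illustration)
rng = np.random.default_rng(seed=42)
group_a = rng.normal(loc=10.5, scale=2, size=40)
group_b = rng.normal(loc=10.0, scale=2, size=60)

observed = group_a.mean() - group_b.mean()
pooled = np.concatenate([group_a, group_b])
differences = permuted_mean_differences(
    pooled, label_count=len(group_a), num_permutations=5_000, seed=1
)

# One-sided p-value: share of permuted differences at least as large as observed
p_value = np.mean(differences >= observed)
print(f"Observed difference: {observed:.3f}, p-value: {p_value:.4f}")
```

Because the labels are shuffled, the permuted differences are centered near zero, which is what the null hypothesis predicts.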

Let's look at the example with the amount of time spent sleeping. First, capture the observed difference in means.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
sleeping = pd.read_csv('../data/atus_sleeping.csv')

In [None]:
observed_difference = sleeping[sleeping.sex == 'Female'].minutes_spent_sleeping.mean() - sleeping[sleeping.sex == 'Male'].minutes_spent_sleeping.mean()
observed_difference

Now, you will randomly shuffle the sex labels and see what the distribution of differences looks like.

In [None]:
num_Female = len(sleeping[sleeping.sex == 'Female'])
sleep_times = sleeping.minutes_spent_sleeping.tolist()

In [None]:
from nssstats.permutation import generate_permutations, permutation_test_p, permutation_test_plot

In [None]:
permutation_differences = generate_permutations(values = sleep_times, 
                                               label_count = num_Female,
                                               num_permutations = 10000,
                                               statistic = np.mean)

In [None]:
permutation_test_plot(permutation_differences, observed_difference, alternative = 'larger')

In [None]:
permutation_test_p(permutation_differences, observed_difference, alternative = 'larger')

This tells you that if there were no difference in the distribution of sleeping times, you would see a difference in sample means _at least as large_ as what you observed 7.2% of the time. This is not below the 5% threshold, so you will not reject the null hypothesis that there is no difference in the distribution of sleeping times between males and females.

Now, repeat this for the grooming dataset.

In [None]:
grooming = pd.read_csv('../data/atus_grooming.csv')
observed_difference = grooming[grooming.sex == 'Female'].minutes_spent_grooming.mean() - grooming[grooming.sex == 'Male'].minutes_spent_grooming.mean()
observed_difference

In [None]:
num_Female = len(grooming[grooming.sex == 'Female'])
grooming_times = grooming.minutes_spent_grooming.tolist()

In [None]:
permutation_differences = generate_permutations(values = grooming_times,
                                                label_count = num_Female,
                                                statistic = np.mean)

In [None]:
permutation_test_plot(permutation_differences, observed_difference, alternative = 'larger')

In [None]:
permutation_test_p(permutation_differences, observed_difference, alternative = 'larger')

In this case, if the null hypothesis were true and there really was no difference in the distribution of time spent grooming for males and females, then you would observe a result that was as extreme or more extreme only 0.3% of the time. This is below the threshold of 5%, so you will reject the null hypothesis and conclude that there **is** a difference in the distribution of grooming times for males and females.
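The `alternative` argument decides which tail of the permutation distribution counts as "extreme". The sketch below shows one way such a tail proportion could be computed. The function `tail_proportion` is a hypothetical helper for illustration, not the `nssstats` code, and the `'two-sided'` option is an assumption added here (only `'larger'` and `'smaller'` appear in this notebook).

```python
import numpy as np

def tail_proportion(permuted_stats, observed, alternative="larger"):
    """Hypothetical helper: proportion of permuted statistics at least as
    extreme as the observed one, in the direction given by `alternative`."""
    permuted_stats = np.asarray(permuted_stats, dtype=float)
    if alternative == "larger":
        return np.mean(permuted_stats >= observed)
    if alternative == "smaller":
        return np.mean(permuted_stats <= observed)
    if alternative == "two-sided":  # assumption: extremeness measured by absolute value
        return np.mean(np.abs(permuted_stats) >= abs(observed))
    raise ValueError(f"Unknown alternative: {alternative!r}")

# Tiny made-up permutation distribution
stats = np.array([-3, -2, -1, 0, 1, 2, 3])

p_larger = tail_proportion(stats, 2, alternative="larger")      # counts 2 and 3
p_smaller = tail_proportion(stats, 2, alternative="smaller")    # counts -3 through 2
p_two_sided = tail_proportion(stats, 2, alternative="two-sided")  # counts -3, -2, 2, 3
```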

## Permutation Test for Proportion

You can also perform permutation tests for the difference in proportions. Here, you will randomly shuffle the values and recalculate the difference in observed proportions across groups.

You can use the same functions as above by passing in a list of Booleans (True/False), since computing the mean of a list of Booleans is the same as calculating the proportion of True values. Python treats True as 1 and False as 0.
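A quick check of this fact, using a small made-up list:

```python
import numpy as np

# Made-up observations: did each squirrel run away?
ran_away = [True, False, True, True, False]

count_true = sum(ran_away)          # True counts as 1, False as 0
proportion_true = np.mean(ran_away)  # 3 out of 5

print(count_true, proportion_true)
```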

In [None]:
squirrels = pd.read_csv('../data/2018_Central_Park_Squirrel_Census_-_Squirrel_Data.csv')

In [None]:
# .copy() avoids a SettingWithCopyWarning when the new column is added below
squirrels = squirrels[~squirrels['Primary Fur Color'].isna()].copy()

squirrels['Black'] = squirrels['Primary Fur Color'] == 'Black'

In [None]:
observed_difference = np.mean(squirrels[squirrels['Black'] == True]['Runs from']) - np.mean(squirrels[squirrels['Black'] == False]['Runs from'])
observed_difference

So in our data, we see that the proportion of black squirrels that ran away from humans was 8.8 percentage points higher than the proportion of non-black squirrels that ran away from humans.

In [None]:
num_black = len(squirrels[squirrels['Black'] == True])
run = squirrels['Runs from'].to_list()

In [None]:
permutation_differences = generate_permutations(values = run, label_count = num_black)

In [None]:
permutation_test_plot(permutation_differences, observed_difference, alternative = 'larger')

In [None]:
permutation_test_p(permutation_differences, observed_difference, alternative = 'larger')

Here, you would only see an observation that was at least as extreme 2.4% of the time if there were no difference between black squirrels and other squirrels. Therefore, you reject the null hypothesis and conclude that black squirrels are more likely to run away than other squirrels.

## Permutation Testing of Correlation

Let's see how to conduct a hypothesis test about correlation. We'll step through the example from the slides. Recall that the null and alternative hypotheses were

$$H_0: \text{The correlation between temperature and NOx concentration is 0}$$

$$H_1: \text{There is a negative correlation between temperature and NOx concentration.}$$

Read in the data.

In [None]:
air_quality = pd.read_csv('../data/air_quality.csv')

In [None]:
air_quality.head()

The scatterplot of the two relevant variables: `Temperature` and `NOx`

In [None]:
air_quality.plot(kind = 'scatter', x = 'Temperature', y = 'NOx');

The observed correlation:

In [None]:
np.corrcoef(air_quality['Temperature'], air_quality['NOx'])

In [None]:
observed_correlation = np.corrcoef(air_quality['Temperature'], air_quality['NOx'])[0,1]

To conduct the permutation test, you can first generate the permutations using the `generate_permutations_correlation` function from `nssstats.permutation`.

In [None]:
from nssstats.permutation import generate_permutations_correlation

To use this function, you need to pass in the two columns of interest as lists.

In [None]:
permutation_correlations = generate_permutations_correlation(air_quality['Temperature'].tolist(), 
                                                             air_quality['NOx'].tolist())

Now you can get the p-value and plot the permuted correlations.

In [None]:
permutation_test_p(permutation_correlations, observed_correlation, alternative = 'smaller')

In [None]:
permutation_test_plot(permutation_correlations, observed_correlation, alternative = 'smaller')

You can see from the $p$-value and plot that the observed correlation would not be all that unusual if the null hypothesis were true. You would expect to see a correlation _at least as negative_ as the observed one more than 12% of the time. Thus, you cannot reject the null hypothesis. The data do not provide enough evidence to conclude that there is a negative correlation between temperature and NOx concentration.
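For intuition about what a permutation test of correlation does, here is a minimal sketch in plain NumPy. It uses synthetic paired data (made up for illustration, not the air quality data) and is not the `nssstats` implementation. The key step is shuffling only one of the two variables, which breaks any pairing between them.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic paired data (made up for illustration) with a weak negative relationship
x = rng.normal(size=100)
y = -0.1 * x + rng.normal(size=100)

observed_r = np.corrcoef(x, y)[0, 1]

num_permutations = 5_000
permuted_r = np.empty(num_permutations)
for i in range(num_permutations):
    # Shuffling only y destroys the pairing, simulating a true correlation of 0
    permuted_r[i] = np.corrcoef(x, rng.permutation(y))[0, 1]

# One-sided p-value for H1 (negative correlation): share of permuted
# correlations at least as negative as the observed one
p_value = np.mean(permuted_r <= observed_r)
print(f"Observed correlation: {observed_r:.3f}, p-value: {p_value:.4f}")
```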