# T-Testing & Hypothesis Testing

Point estimates and confidence intervals are basic tools for hypothesis testing. Statistical hypothesis testing is used to determine whether observed data deviates from what is expected. The scipy.stats library contains many functions that we can use to carry out this hypothesis testing.


Statistical hypothesis tests are based on a statement called the null hypothesis, which assumes that nothing of note is occurring between the test variables. So if you are analysing whether two groups are different, the null hypothesis would be that the groups are the same. In practice, if you wanted to test whether the average age of one Premier League football team is different from the league's average, the null hypothesis would be that the average ages of both are the same.


The purpose of hypothesis testing is to determine whether the null hypothesis is likely to be true given the sample data provided. If the data cannot make a compelling argument against the null hypothesis, we fail to reject it and continue to treat it as true. If the argument is compelling, you reject the null hypothesis and infer that something of interest is occurring within the data, in favour of an alternative hypothesis. In the example above, the alternative hypothesis would be that the average age of the team does, in fact, differ from the league average.



With the null and alternative hypotheses defined, we must choose a significance level ($\alpha$), a probability threshold below which we will reject the null hypothesis. If the probability of getting a result as extreme as the one observed due to chance alone is lower than the significance level, then the null hypothesis is rejected in favour of the alternative. The probability of seeing a result as or more extreme than the one observed, assuming the null hypothesis is true, is known as the p-value.



A T-test is a statistical test used to determine whether a data sample differs from the population, or whether two samples differ from each other.

<br>

## One Sample T-test

Checks whether the mean of a sample differs from the population mean. Below I will create some sample data to run this t-test on, and we will see it in practice, but first let's import some Python packages we will be using in this sheet.

In [1]:
%matplotlib inline

import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import math

Below I am going to simulate some data: two datasets, one large population, and one smaller, theoretical sample taken from the population. We want to carry out a statistical hypothesis test to validate that the average age of our "Ros" sample is different from (here, lower than) that of the overall "Pop" data. We know that the sample average is lower, as we have constructed this data to reflect this difference. Will the statistical test flag that there is a significant difference within this data?

We know the means of the two datasets are different, so the null hypothesis that the sample comes from the same distribution as the population is not valid; however, we will show this using statistical testing.

In [2]:
# set the seed so the test is exactly replicable
np.random.seed(42)

pop_ages1 = stats.poisson.rvs(loc=18, mu = 35, size = 150000)
pop_ages2 = stats.poisson.rvs(loc=18, mu = 10, size = 100000)
pop_ages = np.concatenate((pop_ages1, pop_ages2))

ros_ages1 = stats.poisson.rvs(loc=18, mu = 30, size = 30)
ros_ages2 = stats.poisson.rvs(loc=18, mu = 10, size = 20)
ros_ages = np.concatenate((ros_ages1, ros_ages2))

In [3]:
print(pop_ages.mean())
print(ros_ages.mean())

42.998628
39.72


Below we will conduct a T-test at a 95% confidence level to see if it rightly rejects the null hypothesis that the sample comes from the same distribution as the population. We will use the [stats.ttest_1samp()](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_1samp.html) function to do this.

>"This is a two-sided test for the null hypothesis that the expected value (mean) of a sample of independent observations a is equal to the given population mean, popmean." REF 1

In the below, the arguments are:
a - this is our sample data to be compared
popmean - this is the mean that the sample data is being compared to.

Within the results we will be provided with a p-value. This p-value is the probability that we would see a result as extreme as the one observed due to chance, if the null hypothesis were true.

In [4]:
stats.ttest_1samp(a = ros_ages, popmean = pop_ages.mean()) 

Ttest_1sampResult(statistic=-2.1502410346712346, pvalue=0.03649747507423559)

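So in our case above, if the null hypothesis were true, there would be roughly a 3.6% chance of seeing a result as extreme as the one observed. As this is below our 5% significance level, the null hypothesis can be rejected, i.e. there is something of interest within the data.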

The statistic (t-statistic) of roughly -2.15 tells us how far our sample mean deviates from the population mean, measured in standard errors. If the t-statistic lies outside the quantiles of the t-distribution corresponding to our confidence level and degrees of freedom, we reject the null hypothesis. Essentially, if we see a result that is "extreme" enough, something of interest is going on within the data.
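To make the t-statistic less of a black box, here is a minimal sketch that recomputes it by hand and compares it with what `stats.ttest_1samp()` returns. The `sample` array and `pop_mean` value are hypothetical, made up for illustration and not taken from the data above; the formula is the standard one-sample t-statistic, the gap between the sample mean and the population mean divided by the standard error of the sample mean.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical sample and population mean, for illustration only
sample = np.array([38, 41, 29, 45, 33, 36, 40, 31, 44, 37])
pop_mean = 43.0

n = len(sample)
sample_mean = sample.mean()
sample_std = sample.std(ddof=1)       # sample standard deviation (divides by n-1)
std_error = sample_std / np.sqrt(n)   # standard error of the sample mean

# t-statistic: distance from the population mean, in standard errors
t_manual = (sample_mean - pop_mean) / std_error

# The same statistic as computed by scipy
result = stats.ttest_1samp(a=sample, popmean=pop_mean)

print(t_manual)
print(result.statistic)
```

Both printed values should agree, which shows that the statistic is simply a rescaled difference in means: a small sample or a noisy sample inflates the standard error and pulls the statistic towards zero.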

To check the quantiles, we will use [stats.t.ppf](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.t.html). With our 95% confidence level, we will have 2.5% in both the bottom and top tails; these will be our q, or quantile, values. df is the degrees of freedom, which is our sample size minus one, or n-1, which will be 49 (len(ros_ages)-1).

In [5]:
(len(ros_ages)-1)

49

In [6]:
# lower quantile
stats.t.ppf(q=0.025, df= 49)

-2.0095752344892093

In [7]:
# upper quantile, the mirror image of the lower
stats.t.ppf(q=0.975, df= 49)

2.009575234489209

Ok, so if our t-statistic lies outside of the above range (roughly +/- 2.01), we can reject the null hypothesis and accept our alternative hypothesis, as the result we have observed is far enough away from our population mean that we can consider it different. Our t-statistic of roughly -2.15 is below the lower quantile, so in this case we can accept that the mean of our ros sample is different to that of our pop data. We have confirmed, in a statistical manner, what we set our data up to reflect.
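The quantile check and the p-value are two views of the same decision. As a rough sketch, the snippet below takes the t-statistic and degrees of freedom reported above and derives both the critical value and the two-sided p-value from the t-distribution. The variable names (`t_statistic`, `t_crit`, `reject_null`) are my own, chosen for illustration.

```python
import scipy.stats as stats

t_statistic = -2.1502410346712346   # statistic reported by ttest_1samp above
df = 49                             # degrees of freedom: sample size minus one
alpha = 0.05                        # significance level for a 95% confidence level

# Critical value: the upper quantile that leaves alpha/2 in each tail
t_crit = stats.t.ppf(q=1 - alpha / 2, df=df)

# Two-sided p-value: probability mass in both tails beyond |t|
p_value = 2 * stats.t.cdf(x=-abs(t_statistic), df=df)

# Rejecting when |t| exceeds the critical value is the same decision
# as rejecting when the p-value is below alpha
reject_null = abs(t_statistic) > t_crit

print(t_crit)
print(p_value)
print(reject_null)
```

The p-value computed this way should match the one returned by `stats.ttest_1samp()` earlier, since that function uses the same t-distribution with n-1 degrees of freedom.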

## Two Sample T-test

Another type of test we can carry out compares our "ros" sample from above with another, similar sample: rather than testing against the known population mean from pop, we will compare the two sample means. We will use the [stats.ttest_ind()](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html) function for this. Firstly we will need to create this sample dataset, which we'll call "gal". Our a argument for the independent test will be our "ros" data, and b will be our "gal" data. The equal_var argument lets you specify whether the two samples are assumed to have equal variance; we will set it to False, which performs Welch's t-test.

In [15]:
np.random.seed(42)

gal_ages1 = stats.poisson.rvs(loc=18, mu = 33, size = 30)
gal_ages2 = stats.poisson.rvs(loc=18, mu = 13, size = 20)
gal_ages = np.concatenate((gal_ages1, gal_ages2))

In [16]:
gal_ages.mean()

42.44

In [17]:
stats.ttest_ind(a = ros_ages,
               b = gal_ages,
               equal_var = False)

Ttest_indResult(statistic=-1.2870506937293775, pvalue=0.20111338521248417)

So our p-value indicates that, if the two groups really had the same mean, there would be roughly a 20% chance of seeing samples this different. Using a 95% confidence level, the null hypothesis stands: we fail to reject it, as the p-value is greater than our significance level of 5%.
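For readers curious about what `equal_var = False` does, the sketch below recomputes a two-sample test by hand using the Welch formulation, where each sample contributes its own variance and the degrees of freedom come from the Welch-Satterthwaite approximation. The two arrays (`group_a`, `group_b`) are hypothetical values invented for illustration, not the "ros" and "gal" data.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical samples, for illustration only
group_a = np.array([38, 41, 29, 45, 33, 36, 40, 31, 44, 37])
group_b = np.array([42, 47, 35, 39, 50, 44, 41, 48, 36, 45, 43, 40])

# Variance of each sample mean (sample variance divided by sample size)
var_a = group_a.var(ddof=1) / len(group_a)
var_b = group_b.var(ddof=1) / len(group_b)

# Welch t-statistic: difference in means over the combined standard error
t_welch = (group_a.mean() - group_b.mean()) / np.sqrt(var_a + var_b)

# Welch-Satterthwaite approximation for the degrees of freedom
df_welch = (var_a + var_b) ** 2 / (
    var_a ** 2 / (len(group_a) - 1) + var_b ** 2 / (len(group_b) - 1)
)

# Two-sided p-value from the t-distribution
p_welch = 2 * stats.t.sf(abs(t_welch), df=df_welch)

result = stats.ttest_ind(a=group_a, b=group_b, equal_var=False)

print(t_welch, result.statistic)
print(p_welch, result.pvalue)
```

The manual values should agree with scipy's. Not pooling the variances is the safer default when you have no reason to believe the two groups are equally spread out.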

## Type I and Type II Errors

A statistical hypothesis test, and the resulting call on whether to reject the null hypothesis, is not infallible. It provides statistical evidence upon which a decision to reject or not reject the null hypothesis is made. When this decision is incorrect, it falls into one of two categories: type I and type II errors.

A type I error is when the null hypothesis is wrongly rejected, and is often called a "false positive" or "false hit". The significance level ($\alpha$) is the rate of false positives when the null hypothesis is true, so having a higher confidence level (a lower $\alpha$) will reduce the chances of running into a type I error.

Type II errors occur when there is a failure to reject the null hypothesis when it is actually false, commonly known as a "false negative" or a "miss". The higher the confidence level, the greater your risk of making a type II error.


In layman's terms, a type I error is when a fire alarm sounds but there is no fire present in the building, whereas a type II error is the inverse: there is a fire in the building, but the fire alarm has not sounded.
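The link between the significance level and the type I error rate can be checked with a small simulation. This is a sketch under assumed settings: the normal distribution, the mean of 40, the spread of 10, the sample size and the number of trials are all arbitrary choices of mine. Every sample is drawn from a population where the null hypothesis is true, so every rejection is a false positive.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)   # seeded generator so the run is repeatable
alpha = 0.05                     # significance level
n_trials = 2000                  # number of simulated experiments (arbitrary)
true_mean = 40.0                 # assumed population mean (arbitrary)

false_positives = 0
for _ in range(n_trials):
    # The sample really does come from a population with mean true_mean,
    # so the null hypothesis is true by construction
    sample = rng.normal(loc=true_mean, scale=10.0, size=50)
    p_value = stats.ttest_1samp(a=sample, popmean=true_mean).pvalue
    if p_value < alpha:
        false_positives += 1     # null wrongly rejected: a type I error

type_i_rate = false_positives / n_trials
print(type_i_rate)
```

The printed rate should land close to `alpha`. Lowering `alpha` lowers this rate, at the cost of making real differences harder to detect, which is the type II trade-off described above.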


References:

REF 1: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_1samp.html
