## One-Tailed T-Tests

So far we have only been checking whether two means differ.

We have not yet discussed how to test for a difference in a specific direction.

Before we discuss the two different types of t-tests we can run, let’s go over the fundamentals of calculating a t-test. If there is only one hypothesis test I will ask you all to commit to memory, it will be the t-test.

In [4]:
import random
from scipy import stats

# 1 = heads
# 0 = tails
unfair_coin = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1]
# Unseeded, so the fair-coin sample (and the numbers printed below) vary between runs
fair_coin = [random.randint(0, 1) for _ in range(len(unfair_coin))]

print(sum(unfair_coin)/len(unfair_coin))
print(sum(fair_coin)/len(fair_coin))

# Run the test once; it returns (t statistic, two-sided p-value)
t_val, p_val = stats.ttest_ind(unfair_coin, fair_coin)

# H0: u1 = u2
# H1: u1 != u2
print("Two-Tail t-test", p_val)

# H0: u1 = u2
# H1: u1 > u2
# Halving the two-sided p-value is only valid when t_val > 0,
# i.e. when the observed difference points in the direction of H1
print("One-Tail t-test", p_val/2)

print("T-score ", t_val)

0.8
0.4
Two-Tail t-test 0.07394020035116586
One-Tail t-test 0.03697010017558293
T-score  1.8973665961010278
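
Rather than halving the two-sided p-value by hand, SciPy 1.6 and later can run the one-tailed test directly through the `alternative` parameter of `stats.ttest_ind`. The sketch below assumes such a version, and it swaps the random fair coin for a fixed, hypothetical `fair_coin` sample with 4 heads in 10 tosses so that it is reproducible.

```python
from scipy import stats

unfair_coin = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1]
# Hypothetical fixed sample (4 heads in 10), standing in for the random fair coin
fair_coin = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]

# H1: u1 != u2
two_sided = stats.ttest_ind(unfair_coin, fair_coin)
# H1: u1 > u2 (requires SciPy >= 1.6)
one_sided = stats.ttest_ind(unfair_coin, fair_coin, alternative="greater")

print("Two-Tail t-test", two_sided.pvalue)
print("One-Tail t-test", one_sided.pvalue)
```

Because the t-score is positive here, the `alternative="greater"` p-value is half of the two-sided one. Had the t-score been negative, it would instead be larger than 0.5, which the divide-by-two shortcut would miss.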


## Calculating T-Score

A t-test is a parametric method for deciding whether the difference between two means is statistically significant.

A two-sample t-test calculates a score called the “t-score”, which is the difference of sample means divided by the standard error of differences.
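
Written out, and assuming the equal-variance (pooled) form of the test, which is the default for `stats.ttest_ind`, the t-score looks like this. The symbols $\bar{x}_i$, $s_i^2$, and $n_i$ (each group's sample mean, sample variance, and size) are notation introduced here for illustration.

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{SE}, \qquad
SE = \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}, \qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}
$$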

Standard error of the difference is simply an estimate of the variability in the difference of two sample means: i.e. how spread out do I expect my differences to be?
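
The definition above can be checked by hand. This is a minimal sketch, assuming the equal-variance (pooled) form of the test, which is the default for `stats.ttest_ind`. The `fair_coin` list is a fixed, hypothetical sample with 4 heads in 10 tosses, used in place of a random one so the result is reproducible.

```python
from math import sqrt
from statistics import mean, variance
from scipy import stats

unfair_coin = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1]
# Hypothetical fixed sample (4 heads in 10)
fair_coin = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]

n1, n2 = len(unfair_coin), len(fair_coin)

# Sample variances (n - 1 in the denominator)
v1, v2 = variance(unfair_coin), variance(fair_coin)

# Pooled variance: a weighted average of the two sample variances
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)

# Standard error of the difference between the two sample means
std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))

# t-score: difference of sample means divided by the standard error
t_manual = (mean(unfair_coin) - mean(fair_coin)) / std_err
t_scipy = stats.ttest_ind(unfair_coin, fair_coin).statistic

print("Manual t-score", t_manual)
print("SciPy t-score ", t_scipy)
```

The two printed values should agree up to floating-point rounding.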

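The t-score that we calculate will fall somewhere on the t-distribution, the bell-shaped curve of t-scores we would expect to see if the null hypothesis were true.

By setting the significance level to 0.05, we limit significant scores to the tails of the curve, the most extreme 5% of its area.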

Essentially we are stating:  
* We expect t-scores to follow this distribution when the null hypothesis is true  
* The area under the curve beyond a specific t-score is the probability of seeing a t-score at least that extreme  
* We only consider results to be significant if they are outliers  

We use calculus (integration) to calculate the area under the curve beyond our t-score. That area is the p-value.

If the area under the curve is below 0.05, this indicates significance!
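
As a rough illustration of the area calculation, the sketch below integrates the t-distribution's density to the right of the t-score from the coin example. It assumes 18 degrees of freedom (two samples of 10, pooled variance) and uses `scipy.integrate.quad` for the numerical integration, cross-checked against the survival function `stats.t.sf`.

```python
import numpy as np
from scipy import integrate, stats

t_score = 1.8973665961010278  # t-score printed by the coin example
df = 10 + 10 - 2              # degrees of freedom for two samples of 10

# Area under the t-distribution to the right of t_score, by numerical integration
right_tail, _ = integrate.quad(stats.t.pdf, t_score, np.inf, args=(df,))

# The same area from SciPy's survival function (1 - CDF)
right_tail_sf = stats.t.sf(t_score, df)

print("One-Tail p-value", right_tail)
print("Two-Tail p-value", 2 * right_tail)
```

Doubling the right-tail area covers both tails, which reproduces the two-tailed p-value of about 0.074 from the coin example. The one-tailed value of about 0.037 is below 0.05, while the two-tailed value is not.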

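Our p-value measures evidence against the null: the smaller it is, the stronger that evidence. Think of it as the null hypothesis's alibi. The smaller the alibi, the worse things look for the null.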

## Cohen's D (Effect Size)

In our t-tests, we’ve only discussed the presence of a statistically significant difference between means.

That is, we can tell if a result is significant or not. But how do we fairly quantify the size of the result? Consider an example:

* Medical data is collected on patient longevity 
* Using our t-test, we find out that there is a statistically significant difference between patients who eat bananas vs patients who don’t (p=0.04)
* However, the average lifespan difference between the two groups is two minutes.

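To quantify the size of an effect, we use Cohen’s D.

Cohen’s D is a parametric method that calculates a standardized (unit-less) difference between the means of two groups (mean of sample group 1 vs mean of sample group 2): the difference in sample means divided by the pooled standard deviation.
This lets us state the size of an effect on a common scale. By convention, 0.2 is considered a small effect, 0.5 a medium effect, and 0.8 a large effect.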
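
In symbols, using the pooled standard deviation as the denominator, the calculation is as follows. The notation $\bar{x}_i$, $s_i^2$, and $n_i$ for each group's sample mean, sample variance, and size is introduced here for illustration.

$$
d = \frac{\bar{x}_1 - \bar{x}_2}{s}, \qquad
s = \sqrt{\frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}}
$$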

In [2]:
from numpy.random import randn
from numpy import mean
from numpy import var
from math import sqrt

data1 = 10 * randn(10000) + 60
data2 = 10 * randn(10000) + 55

# get lengths of data
n1, n2 = len(data1), len(data2)

# calculate variances
s1, s2 = var(data1, ddof=1), var(data2, ddof=1)

# calculate pooled standard deviation (an estimate of pop std dev)
s = sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))

# get means
u1, u2 = mean(data1), mean(data2)

# calc cohen's d
cohens_d = (u1 - u2) / s

print(cohens_d)

0.4859598503562626
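
To tie effect size back to significance, here is a sketch of a case where a difference is statistically significant yet tiny. The data are hypothetical: two large simulated groups whose true means differ by only 0.5 with a standard deviation of 10. The helper `cohens_d` is an illustrative name, not a library function, and the seed is fixed only for reproducibility.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: large groups, true means 60.5 vs 60.0, std dev 10
group_a = rng.normal(60.5, 10, 100_000)
group_b = rng.normal(60.0, 10, 100_000)


def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * np.var(x, ddof=1) + (n2 - 1) * np.var(y, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)


p_val = stats.ttest_ind(group_a, group_b).pvalue
d = cohens_d(group_a, group_b)

print("p-value  ", p_val)
print("Cohen's d", d)
```

With samples this large, the p-value should fall far below 0.05 even though Cohen's D sits well under the 0.2 "small" threshold, which is the banana situation in miniature.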
