The purpose of this demo document is to show how Python can be used to perform hypothesis testing. As an example, we'll look at the effect of ascorbic acid (vitamin C) on the common cold. The data come from a 1961 experiment in which 279 French skiers were given either ascorbic acid or a placebo. After two weeks, they were examined to determine whether they had cold symptoms.

| Treatment     | Cold | No Cold | Total |
|---------------|------|---------|-------|
| Placebo       | 31   | 109     | 140   |
| Ascorbic Acid | 17   | 122     | 139   |
| Total         | 48   | 231     | 279   |



## Defining the Null and Alternative Hypothesis

To define the null and alternative hypotheses, we frame them around the question we want to investigate: is ascorbic acid associated with a different rate of cold symptoms than a placebo? This is a controlled experiment with two groups (ascorbic acid and placebo), and the outcome recorded for each participant is whether they had cold symptoms. Let $P_1$ be the proportion of ascorbic acid recipients with cold symptoms and $P_2$ the proportion of placebo recipients with cold symptoms.

* $H_0: P_1 = P_2$
* $H_1: P_1 \neq P_2$
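Since $H_0$ compares two proportions, it can help to compute the sample estimates of $P_1$ and $P_2$ directly and run a pooled two-proportion z-test. This is a minimal sketch added for illustration, not part of the original analysis; the variable names are hypothetical. For a 2×2 table, the square of this $z$ statistic equals the uncorrected ${\chi}^2$ statistic computed later in this notebook, and the two-sided p-values coincide.

```python
import math
from scipy import stats

# Counts from the table above
cold_aa, n_aa = 17, 139   # ascorbic acid group
cold_pl, n_pl = 31, 140   # placebo group

p1 = cold_aa / n_aa       # sample estimate of P_1
p2 = cold_pl / n_pl       # sample estimate of P_2

# Pooled proportion, assuming H0: P_1 = P_2
p_pool = (cold_aa + cold_pl) / (n_aa + n_pl)
se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_aa + 1 / n_pl))

z = (p1 - p2) / se_pool
p_two_sided = 2 * stats.norm.sf(abs(z))  # two-sided, matching H1: P_1 != P_2

print(f"p1 = {p1:.3f}, p2 = {p2:.3f}, z = {z:.3f}, z^2 = {z**2:.3f}, p = {p_two_sided:.4f}")
```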

## Generate Expected Distribution

```python
# Observed counts
obs = {
    "a": 17,   # Ascorbic Acid Treatment, Cold Symptoms
    "b": 31,   # Placebo Treatment, Cold Symptoms
    "c": 122,  # Ascorbic Acid Treatment, No Cold Symptoms
    "d": 109,  # Placebo Treatment, No Cold Symptoms
}

n = sum(obs.values())  # total number of participants

# Expected count under H0 = (cold / no-cold total) * (treatment-group total) / n
aEXP = (obs['a'] + obs['b']) * (obs['a'] + obs['c']) / n
bEXP = (obs['a'] + obs['b']) * (obs['b'] + obs['d']) / n
cEXP = (obs['c'] + obs['d']) * (obs['a'] + obs['c']) / n
dEXP = (obs['c'] + obs['d']) * (obs['b'] + obs['d']) / n

# Expected counts
exp = {
    "Ascorbic Acid Treatment, Cold Symptoms": aEXP,
    "Placebo Treatment, Cold Symptoms": bEXP,
    "Ascorbic Acid Treatment, No Cold Symptoms": cEXP,
    "Placebo Treatment, No Cold Symptoms": dEXP,
}

exp
```

```
{'Ascorbic Acid Treatment, Cold Symptoms': 23.913978494623656,
 'Ascorbic Acid Treatment, No Cold Symptoms': 115.08602150537635,
 'Placebo Treatment, Cold Symptoms': 24.086021505376344,
 'Placebo Treatment, No Cold Symptoms': 115.91397849462365}
```
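The four expected-count formulas above are all instances of $E_{ij} = R_i C_j / n$, where $R_i$ is the row total, $C_j$ the column total and $n$ the grand total. As a sketch (assuming NumPy is available; the array layout is a choice made here, not taken from the original), the whole table of expected counts can be produced at once with an outer product of the margins. SciPy also offers `scipy.stats.contingency.expected_freq` for the same computation.

```python
import numpy as np

# Rows: ascorbic acid, placebo; columns: cold, no cold
observed = np.array([[17, 122],
                     [31, 109]])

row_totals = observed.sum(axis=1)  # treatment-group sizes: 139, 140
col_totals = observed.sum(axis=0)  # cold / no-cold totals: 48, 231
n = observed.sum()                 # 279 participants

# E_ij = R_i * C_j / n for every cell at once
expected = np.outer(row_totals, col_totals) / n
print(expected)
```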

## ${\chi}^2$ Test Statistic

```python
# Contribution of each cell: (observed - expected)^2 / expected
arrVals = [
    (obs['a'] - aEXP) ** 2 / aEXP,
    (obs['b'] - bEXP) ** 2 / bEXP,
    (obs['c'] - cEXP) ** 2 / cEXP,
    (obs['d'] - dEXP) ** 2 / dEXP,
]

sum(arrVals)  # the chi-squared statistic
```

```
4.81141264632079
```
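As a cross-check, SciPy's `scipy.stats.chi2_contingency` computes the same statistic, p-value, degrees of freedom and expected counts from the observed table. For 2×2 tables it applies Yates' continuity correction by default, so `correction=False` is needed here to reproduce the hand-computed value. This is an added sketch, not part of the original notebook.

```python
import numpy as np
from scipy import stats

observed = np.array([[17, 122],   # ascorbic acid: cold, no cold
                     [31, 109]])  # placebo: cold, no cold

# correction=False disables Yates' correction so the statistic equals
# the plain sum of (O - E)^2 / E
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed, correction=False)
print(chi2_stat, p_value, dof)
```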

#### P-value (with 1 DF)

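```python
from scipy import stats

# Upper-tail probability of a chi-squared distribution with (2 - 1) * (2 - 1) = 1 DF.
# stats.chi2.sf(...) gives the same value and is more precise for very small p-values.
pValueChiSq = 1 - stats.chi2.cdf(sum(arrVals), 1)
pValueChiSq
```

```
0.0282718602468226
```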

Because the ${\chi}^2$ p-value is below 0.05, we reject, at the 5% significance level, the null hypothesis that the proportion of participants with cold symptoms is the same in the ascorbic acid and placebo groups.
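The same decision can be reached without a p-value by comparing the statistic with the critical value of the ${\chi}^2$ distribution with 1 degree of freedom at $\alpha = 0.05$. A short sketch, with the statistic copied from the output above:

```python
from scipy import stats

alpha = 0.05
chi2_stat = 4.81141264632079  # statistic computed above

# Value exceeded with probability alpha under H0 (chi-squared, 1 DF)
critical = stats.chi2.ppf(1 - alpha, df=1)
print(f"critical value = {critical:.3f}; reject H0: {chi2_stat > critical}")
```

Rejecting when the statistic exceeds the critical value is equivalent to rejecting when the p-value is below $\alpha$.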

## Fisher's Exact Test

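```python
# Rows: ascorbic acid, placebo; columns: cold, no cold.
# alternative="less" is one-sided: it tests whether the odds ratio is below 1,
# i.e. whether ascorbic acid lowers the odds of having cold symptoms.
oddsRatio, pValueFET = stats.fisher_exact([
    [obs['a'], obs['c']],
    [obs['b'], obs['d']]
], alternative="less")
pValueFET
```

```
0.020522715992754428
```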

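As with the ${\chi}^2$ test, the p-value is below 0.05, so we reject the null hypothesis of no difference between the ascorbic acid and placebo groups. Note, however, that `alternative="less"` makes this a one-sided test (the alternative is that ascorbic acid lowers the odds of cold symptoms), whereas $H_1$ as stated above and the ${\chi}^2$ test are two-sided.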
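To compare like with like, Fisher's exact test can also be run with the two-sided alternative that matches $H_1: P_1 \neq P_2$. The sketch below, an addition to the original notebook, runs both variants on the same table. The first value returned by `scipy.stats.fisher_exact` for a 2×2 table is the sample odds ratio $(a \cdot d)/(b \cdot c)$ for the layout passed in.

```python
from scipy import stats

table = [[17, 122],   # ascorbic acid: cold, no cold
         [31, 109]]   # placebo: cold, no cold

odds_ratio, p_less = stats.fisher_exact(table, alternative="less")
_, p_two_sided = stats.fisher_exact(table, alternative="two-sided")

# Sample odds ratio: (17 * 109) / (122 * 31)
print(f"odds ratio = {odds_ratio:.3f}")
print(f"one-sided p = {p_less:.4f}, two-sided p = {p_two_sided:.4f}")
```

The two-sided p-value also counts tables in the opposite tail, so it is larger than the one-sided value.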

#### Difference from ${\chi}^2$ test (|${\chi}^2$ p-value - Fisher's p-value|)

```python
abs(pValueChiSq - pValueFET)
```

```
0.0077491442540681722
```
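Two things separate these p-values: the Fisher test above is one-sided while the ${\chi}^2$ test is two-sided, and the ${\chi}^2$ p-value comes from a large-sample approximation to a discrete distribution. A common adjustment of that approximation for 2×2 tables is Yates' continuity correction, which moves each observed count 0.5 toward its expected count. As a sketch (not part of the original analysis), `scipy.stats.chi2_contingency` can compute both versions:

```python
import numpy as np
from scipy import stats

observed = np.array([[17, 122],   # ascorbic acid: cold, no cold
                     [31, 109]])  # placebo: cold, no cold

chi2_u, p_u, _, _ = stats.chi2_contingency(observed, correction=False)
chi2_c, p_c, _, _ = stats.chi2_contingency(observed, correction=True)  # Yates

print(f"uncorrected: chi2 = {chi2_u:.3f}, p = {p_u:.4f}")
print(f"Yates:       chi2 = {chi2_c:.3f}, p = {p_c:.4f}")
```

The correction shrinks the statistic and therefore raises the p-value, making the ${\chi}^2$ test more conservative.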

## Relative Risk Ratio

For the purpose of clarity, we'll define the following variables:

* Exposed: treatment with ascorbic acid
* Nonexposed: treatment with placebo
* Disease: presence of cold symptoms after 2 weeks
* Nondisease: absence of cold symptoms after 2 weeks
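In terms of the cells of `obs` (a: exposed with disease, b: nonexposed with disease, c: exposed without disease, d: nonexposed without disease), the relative risk compares the risk of cold symptoms in the two groups. The standard error below is the usual large-sample formula for $\ln RR$, given here as a reference rather than taken from the original notebook:

$$RR = \frac{a/(a+c)}{b/(b+d)} = \frac{17/139}{31/140} \approx 0.552$$

$$\widehat{SE}(\ln RR) = \sqrt{\frac{1}{a} - \frac{1}{a+c} + \frac{1}{b} - \frac{1}{b+d}}, \qquad 95\%\ \text{CI} = \exp\left(\ln RR \pm 1.96\,\widehat{SE}(\ln RR)\right)$$

An $RR$ below 1 means cold symptoms were less common in the ascorbic acid group than in the placebo group.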

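```python
# Group sizes
exposed = obs['a'] + obs['c']     # ascorbic acid group (139)
nonExposed = obs['b'] + obs['d']  # placebo group (140)

# Relative risk: risk of cold symptoms with ascorbic acid / risk with placebo
rr = (obs['a'] / exposed) / (obs['b'] / nonExposed)

# Standard error of ln(RR)
se = (1 / obs['a'] - 1 / exposed + 1 / obs['b'] - 1 / nonExposed) ** 0.5

print(f"RR = {rr:.3f}, SE(ln RR) = {se:.3f}")
```

```
RR = 0.552, SE(ln RR) = 0.277
```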
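A relative risk is usually reported with a confidence interval, computed on the log scale because $\ln RR$ is closer to normally distributed than $RR$ itself. The function below is a hypothetical helper (`relative_risk_ci` is not part of any library) that uses the same a–d layout as `obs` and the usual log-scale (Katz) interval; the default `z` of 1.959964 gives a 95% interval.

```python
import math


def relative_risk_ci(a, b, c, d, z=1.959964):
    """Relative risk (exposed vs. nonexposed) with a log-scale confidence interval.

    a: exposed with disease      c: exposed without disease
    b: nonexposed with disease   d: nonexposed without disease
    """
    risk_exposed = a / (a + c)
    risk_nonexposed = b / (b + d)
    rr = risk_exposed / risk_nonexposed

    # Large-sample standard error of ln(RR)
    se_log_rr = math.sqrt(1 / a - 1 / (a + c) + 1 / b - 1 / (b + d))

    # Build the interval on the log scale, then transform back
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, se_log_rr, (lower, upper)


rr, se_log_rr, (lower, upper) = relative_risk_ci(a=17, b=31, c=122, d=109)
print(f"RR = {rr:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

For these counts the interval lies entirely below 1, in line with the significant ${\chi}^2$ and Fisher results.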