# Assignment 3: Hypothesis Testing
*Andrea Hassler (ah4412)*

In [1]:
# python 2 and 3 compatibility
from __future__ import print_function

# import relevant modules
import pylab as pl
import pandas as pd
import numpy as np
import scipy.stats as spst

## Hypotheses
### State the Null and Alternative Hypotheses
**NULL:** The mean duration of the new bus route is greater than or the same as the mean duration of the original bus route.  
**ALTERNATIVE:** The mean duration of the new bus route is less than the mean duration of the original bus route.

### State the Null and Alternative Hypotheses as Formulas
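$H_0$: $\mu_{population} - \mu_{sample} \leq 0$  
$H_a$: $\mu_{population} - \mu_{sample} > 0$  

Here $\mu_{population}$ is the mean duration of the original route and $\mu_{sample}$ is the mean duration of the new route, from which the sample of trips is drawn.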

## Significance Threshold and Test Choice
### Set alpha level
The significance threshold is set to $\alpha=0.05$.

### Test Choice
We are comparing a sample to a population with known mean and standard deviation, and we assume the population is Gaussian, so we use a one-tailed, one-sample Z-test.
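
For reference, the test statistic for this choice can be written as below, using the same sign convention as the hypotheses (population mean minus sample mean). The symbols $\bar{x}$ (sample mean), $\sigma$ (known population standard deviation), and $n$ (sample size) are notation introduced here for the sketch.

$$Z = \frac{\mu_{population} - \bar{x}}{\sigma / \sqrt{n}}$$

With this convention, a large positive Z is evidence for the alternative hypothesis, and the one-tailed $p$-value is the upper-tail probability of the standard normal distribution at Z.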

## Find $p$-value
Here we compute the probability $p$ of obtaining our test statistic Z, or one more extreme, if the null hypothesis is true. In the context of this problem, that is the probability of observing a positive difference between the two means at least as large as ours when the true difference is 0 or negative (evaluated at the boundary case, a true difference of exactly 0).
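
As a minimal sketch of what "Z or more extreme" means for a one-tailed test, the upper-tail probability of a standard normal can be computed with the standard library alone. The helper name `upper_tail_p` and the Z values are hypothetical, chosen only for illustration. It should give the same quantity as `scipy.stats.norm.sf`.

```python
import math

def upper_tail_p(z):
    """P(Z' >= z) for a standard normal Z', via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical Z values, for illustration only
for z in (-1.0, 0.0, 1.0, 2.0):
    print("Z = {0:+.1f} -> one-tailed p = {1:.4f}".format(z, upper_tail_p(z)))
```

Note that a negative Z (a sample mean above the population mean) gives $p > 0.5$, so it can never lead to rejecting the null in this one-tailed setup.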

### Load the Data

In [2]:
# download dataset
times = pd.read_csv("https://raw.githubusercontent.com/fedhere/PUI2018_fb55/master/Lab4_fb55/times.txt", 
                    header=None)

In [3]:
# save population parameters as floats
pop_mean = 36.
pop_sd = 6.

### Calculate Sample Statistics

In [4]:
# find sample statistics
samp_mean = times.values.mean()
samp_sd = times.values.std()  # NumPy default ddof=0; reported for reference only, the Z-test uses pop_sd
samp_size = float(len(times))

In [5]:
# print all info so far
print("Population Parameters\nMean: {0:.1f} minutes\n\
Standard Deviation: {1:.1f} minutes\n\n\
Sample Statistics\n\
Mean: {2:.1f} minutes\n\
Standard Deviation: {3:.1f} minutes\n\
Sample Size: {4:.0f} trips"\
.format(pop_mean, pop_sd, samp_mean, samp_sd, samp_size))

Population Parameters
Mean: 36.0 minutes
Standard Deviation: 6.0 minutes

Sample Statistics
Mean: 34.5 minutes
Standard Deviation: 7.1 minutes
Sample Size: 100 trips


### Compute Z-statistic

In [6]:
# compute Z
Z = (pop_mean - samp_mean) / (pop_sd/np.sqrt(samp_size))

In [7]:
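# compute one-tailed p-value: upper-tail probability at Z
# (no abs(): a negative Z must not produce a small p-value in a one-tailed test)
p_val = spst.norm.sf(Z)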
print("The Z-statistic is {0:.2f}\n\
The computed p-value is {1:.2f}".format(Z, p_val))

The Z-statistic is 2.56
The computed p-value is 0.01
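
As a quick sanity check, the arithmetic can be repeated by hand from the rounded values printed above. This is a sketch only: the names `mu0`, `sigma`, `xbar_rounded`, and `n` are introduced here and are not part of the notebook, and the sample mean is the one-decimal rounded value, so the result is approximate.

```python
import math

# Rounded values copied from the printed output above
mu0, sigma = 36.0, 6.0       # population mean and standard deviation (minutes)
xbar_rounded, n = 34.5, 100  # sample mean (rounded to one decimal) and sample size

std_error = sigma / math.sqrt(n)             # 6 / 10 = 0.6
z_approx = (mu0 - xbar_rounded) / std_error  # 1.5 / 0.6 = 2.5
print("standard error = {0:.2f}, approximate Z = {1:.2f}".format(std_error, z_approx))
```

The approximate value of 2.5 is close to the reported 2.56. The gap is consistent with the sample mean having been rounded to one decimal place in the printout.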


## Decision
Since this is a one-tailed test, the rejection region lies entirely in one tail of the distribution, so $\alpha$ is not split between two tails. Since the $p$-value (0.01) is less than $\alpha$ (0.05), **we reject the null hypothesis**: the data support the claim that the mean duration of the new bus route is less than that of the original route. Equivalently, the Z-statistic of 2.56 exceeds the one-tailed critical value of about 1.645 for $\alpha=0.05$, so it falls in the rejection region.
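
As a cross-check on the decision, Z can be compared against the one-tailed critical value for $\alpha$. The sketch below assumes Python 3.8+ for `statistics.NormalDist`; `scipy.stats.norm.ppf(1 - alpha)` should give the same critical value. The variable names are introduced here, and `z_stat` is the rounded value printed above.

```python
from statistics import NormalDist

alpha = 0.05   # significance threshold stated above
z_stat = 2.56  # Z-statistic as printed above (rounded)

# One-tailed critical value: the point that leaves probability alpha in the upper tail
z_crit = NormalDist().inv_cdf(1.0 - alpha)
reject_null = z_stat > z_crit
print("critical Z = {0:.3f}, reject null: {1}".format(z_crit, reject_null))
```

Comparing Z to the critical value and comparing $p$ to $\alpha$ are two views of the same decision rule, so they always agree.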