# Formulate the Null hypothesis in words and in formulae for the 4 experiments below:

### Do diets help people lose more fat than exercise?
Experimental setup: you have a test and a control sample.

**$H_a$**: The average percent change in BMI of people in the diet sample is higher than the average percent change in BMI of people in the exercise sample after 1 year.
$$\Delta BMI_{\%,diet} > \Delta BMI_{\%,exercise}$$

**$H_0$**: The average percent change in BMI of people in the diet sample is the same as or lower than the average percent change in BMI of people in the exercise sample after 1 year.
$$\Delta BMI_{\%,diet} \leq \Delta BMI_{\%,exercise}$$

### Quantify the danger of smoking for pregnant women. (Olds et al., 1994, p. 223)
Experimental setup: measure IQ of children of smokers and non-smokers at age 1, 2, 3, and 4 years.

**$H_a$**: The average IQ of children whose parents smoked during pregnancy is lower than the average IQ of children whose parents did not smoke during pregnancy.
$$\overline{IQ}_{smoke} < \overline{IQ}_{no-smoke}$$

**$H_0$**: The average IQ of children at ages 1, 2, 3, and 4 whose parents smoked at least X cigarettes a day during pregnancy is the same as or higher than the average IQ of children whose parents did not smoke during pregnancy.
$$\overline{IQ}_{smoke} \geq \overline{IQ}_{no-smoke}$$

### Food deserts contribute to the incidence of diabetes. 
Food deserts are defined as parts of the country devoid of fresh fruit, vegetables, and other healthful whole foods, usually found in impoverished areas. This is largely due to a lack of grocery stores, farmers' markets, and healthy food providers.

**$H_a$**: The distance to healthy foods is correlated with the incidence of diabetes in that location.
$$\rho(d_{healthy}, L_{diabetes}) \neq 0$$

**$H_0$**: There is no correlation between the incidence of diabetes and the distance to healthy foods in that location.
$$\rho(d_{healthy}, L_{diabetes}) = 0$$

# Bus Times

**$H_a$**: The new average bus time is statistically significantly lower than the old average bus time.
$$Z \leq Z_{thresh}$$

**$H_0$**: The new average bus time is not statistically significantly lower than the old average bus time.
$$Z > Z_{thresh}$$

Significance level: $\alpha = 0.05$

Critical value: $Z_{thresh} = -1.645$
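
The critical value follows from the significance level: for a left-tailed test it is the $\alpha$ quantile of the standard normal distribution. A minimal sketch of that lookup, assuming Python 3.8+ for the standard library's `statistics.NormalDist` (the notebook itself hard-codes the value, and the name `z_thresh` is only illustrative):

```python
from statistics import NormalDist

alpha = 0.05  # significance level stated above

# Left-tailed test: the critical value is the alpha quantile of N(0, 1)
z_thresh = NormalDist().inv_cdf(alpha)

print('Critical value: {:.3f}'.format(z_thresh))
```

For a right-tailed test the corresponding value would be `NormalDist().inv_cdf(1 - alpha)`, which has the same magnitude and the opposite sign.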

In [8]:
from __future__ import print_function
import numpy as np
try:
    import urllib2 as urllib
except ImportError:
    import urllib.request as urllib

In [2]:
data_url = 'https://github.com/fedhere/PUI2017_fb55/raw/master/Lab3_fb55/times.txt'
old_mu = 36
old_sigma = 6
Z_critical_value = -1.645

In [10]:
# Load data string from url
data_str = urllib.urlopen(data_url).read().decode("utf-8")

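# Convert line-separated strings to a np array, skipping empty lines.
# A list is built explicitly because filter() returns an iterator in
# Python 3, which np.array cannot convert to a float array.
data = np.array([line for line in data_str.split('\n') if line], dtype=float)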

new_mu = data.mean()
new_sigma = data.std()

print('Data Type:', data.dtype)
print('Data Shape:', data.shape)
print('')
print('Time Mean:', new_mu)
print('Time Standard Deviation:', new_sigma)
data[:10]

Data Type: float64
Data Shape: (100,)

Time Mean: 34.4661616883
Time Standard Deviation: 7.10150406819


array([ 31.62223931,  32.82137636,  30.2291014 ,  31.41376587,
        39.01055035,  34.82207891,  39.87188366,  39.57994562,
        31.02658678,  27.66246068])

### Z Test:
$$Z = \frac{\mu_{sample} - \mu_{pop}}{\sigma / \sqrt{N}}$$

In [11]:
Z = (new_mu - old_mu) * np.sqrt(len(data)) / old_sigma

print('Z score / Critical Value: {:.2f} / {:.2f}'.format(Z, Z_critical_value))

Z score / Critical Value: -2.56 / -1.65


### The Z score (-2.56) is below the critical value (-1.645), so the result is statistically significant at the $\alpha = 0.05$ level. This means that the null hypothesis can be rejected.
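
As a cross-check on this conclusion, the same left-tailed test can be expressed as a p-value and compared directly with $\alpha$. This is a minimal sketch, assuming Python 3.8+ and reusing the sample mean and sample size printed by the notebook above. The names `n` and `p_value` are illustrative and do not appear in the original cells.

```python
from math import sqrt
from statistics import NormalDist

old_mu, old_sigma = 36, 6   # population parameters given above
new_mu = 34.4661616883      # sample mean printed by the notebook
n = 100                     # sample size (Data Shape above)
alpha = 0.05                # significance level

# Same Z statistic as in the notebook cell
z = (new_mu - old_mu) / (old_sigma / sqrt(n))

# Left-tailed p-value: probability of a Z this low or lower under H_0
p_value = NormalDist().cdf(z)

print('Z = {:.2f}, p = {:.4f}'.format(z, p_value))
print('Reject H_0:', p_value < alpha)
```

Comparing `p_value` with `alpha` is equivalent to comparing `Z` with the critical value, so both routes should lead to the same decision.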