## What is the true normal human body temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. In 1992, this value was revised to 36.8$^{\circ}$C or 98.2$^{\circ}$F. 

#### Exercise
In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.

Answer the following questions **in this notebook below and submit to your GitHub account**. 

1.  Is the distribution of body temperatures normal? 
    - Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply. 
2.  Is the true population mean really 98.6 degrees F?
    - Bring out the one-sample hypothesis test! In this situation, is it appropriate to apply a z-test or a t-test? How will the result be different?
3.  At what temperature should we consider someone's temperature to be "abnormal"?
    - Start by computing the margin of error and confidence interval.
4.  Is there a significant difference between males and females in normal temperature?
    - Set up and solve for a two sample hypothesis testing.

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm

****

In [32]:
import pandas as pd
import scipy.stats  # makes scipy.stats.norm, scipy.stats.mstats and scipy.stats.ttest_ind available
from scipy.stats.mstats import normaltest
import numpy as np

In [6]:
temp = pd.read_csv('human_body_temperature.csv',index_col=0)

In [25]:
temp=temp.reset_index()
temp=temp[['temperature','gender']]
temp.head()

| | temperature | gender |
|---|---|---|
| 0 | 99.3 | F |
| 1 | 98.4 | F |
| 2 | 97.8 | M |
| 3 | 99.2 | F |
| 4 | 98.0 | F |


# Is the distribution normal?
To test whether the distribution is normal, I apply `scipy.stats.mstats.normaltest`. It is based on D'Agostino and Pearson's test, which combines skew and kurtosis to produce an omnibus test of normality.
To do this, I first need to convert the temperature column into an array.

In [26]:
# as_matrix() was removed from recent pandas versions; to_numpy() returns the same array
aa=temp.temperature.to_numpy()
aa

array([  99.3,   98.4,   97.8,   99.2,   98. ,   99.2,   98. ,   98.8,
         98.4,   98.6,   98.8,   96.7,   98.2,   98.7,   97.8,   98.8,
         98.3,   98.2,   97.2,   99.4,   98.3,   98.2,   98.6,   98.4,
         97.8,   98. ,   97.8,   98.2,   98.4,   98.1,   98.3,   97.6,
         98.5,   98.6,   99.3,   99.5,   99.1,   98.3,   97.9,   96.4,
         98.4,   98.4,   96.9,   97.2,   99. ,   97.9,   97.4,   97.4,
         97.9,   97.1,   98.9,   98.3,   98.5,   98.6,   98.2,   98.6,
         98.8,   98.2,   98.2,   97.6,   99.1,   98.4,   98.2,   98.6,
         98.7,   97.4,   97.4,   98.6,   98.7,   98.9,   98.1,   97.7,
         98. ,   98.8,   99. ,   98.8,   98. ,   98.4,   97.4,   97.6,
         98.8,   98. ,   97.5,   99.2,   98.6,   97.1,   98.6,   98. ,
         98.7,   98.1,   97.8,  100. ,   98.8,   97.1,   97.8,   96.8,
         99.9,   98.7,   98.8,   98. ,   99. ,   98.5,   98. ,   99.4,
         97.6,   96.7,   97. ,   98.6,   98.7,   97.3,   98.8,   98. ,
      

In [34]:
scipy.stats.mstats.normaltest(aa, axis=0)

NormaltestResult(statistic=2.7038014333192031, pvalue=0.2587479863488254)

## Normality Test
The test does not reject the null hypothesis of normality, since the p value of 0.26 is well above 0.05.
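To illustrate how the p value of this test is read, here is a minimal sketch on synthetic data. The two samples, the seed, and the 0.05 threshold are assumptions chosen for illustration; they are not the body temperature data.

```python
import numpy as np
from scipy import stats

# Synthetic samples (assumed for illustration only, not the temperature data)
rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=98.25, scale=0.73, size=130)
skewed_sample = rng.exponential(scale=1.0, size=130)

for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    # D'Agostino-Pearson omnibus test: H0 is that the sample is normally distributed
    statistic, pvalue = stats.normaltest(sample)
    verdict = "reject" if pvalue < 0.05 else "do not reject"
    print(f"{name}: statistic={statistic:.2f}, p={pvalue:.3f} -> {verdict} normality")
```

A small p value is evidence against normality; a large one only means the test found no evidence of a departure.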

## Observe the temperature sample statistics

In [28]:
temp.describe()

| | temperature |
|---|---|
| count | 130.0 |
| mean | 98.249231 |
| std | 0.733183 |
| min | 96.3 |
| 25% | 97.8 |
| 50% | 98.3 |
| 75% | 98.7 |
| max | 100.8 |


## H0
The null hypothesis is that the true population mean is 98.6 degrees F. Under H0, the sample mean corresponds to a z value of 

In [42]:
z=(98.6-98.249231)/(0.733183/np.sqrt(130))
# the denominator is the estimated standard error of the sample mean
z

5.4548208794611863

## This value corresponds to a one-sided p value of

In [40]:
p = scipy.stats.norm.sf(abs(z))
p

2.45111178952425e-08

## Hence H0 is rejected
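The exercise also asks whether a z-test or a t-test is appropriate. Since the population standard deviation is estimated from the sample, a t-test with n - 1 degrees of freedom is the stricter choice, although with n = 130 the two give nearly the same answer. The sketch below compares two-sided p values computed from the summary statistics reported above; the variable names are illustrative.

```python
import math
from scipy import stats

# Summary statistics taken from temp.describe() above
n, sample_mean, sample_std = 130, 98.249231, 0.733183
mu0 = 98.6  # hypothesized population mean under H0

standard_error = sample_std / math.sqrt(n)
test_statistic = (sample_mean - mu0) / standard_error

# Two-sided p values: normal reference (z-test) vs. Student t with n - 1 df (t-test)
p_z = 2 * stats.norm.sf(abs(test_statistic))
p_t = 2 * stats.t.sf(abs(test_statistic), df=n - 1)

print(f"statistic = {test_statistic:.4f}")
print(f"z-test p = {p_z:.3g}, t-test p = {p_t:.3g}")
```

The t distribution has heavier tails, so its p value is slightly larger, but both are far below 0.05 and H0 is rejected either way.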

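## Someone's temperature can be considered abnormal 
if it differs from the mean at the 5% significance level (95% confidence), that is with abs(z) > 1.96. Note that the bounds computed here use the standard error of the mean, so they describe a 95% confidence interval for the population mean. The upper bound is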


In [71]:
t=98.249231+1.96*(0.733183/np.sqrt(130-1))
print(t)

98.3757552008


## or below

In [72]:
t=98.249231-1.96*(0.733183/np.sqrt(130-1))
print(t)

98.1227067992
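The bounds above are built from the standard error, so they bracket the population mean rather than an individual reading. As a hedged sketch of an alternative reading of the question, one could call an individual temperature "abnormal" when it falls outside the central 95% of an approximately normal population, which uses the sample standard deviation instead. That definition is an assumption made here for illustration.

```python
import math

# Summary statistics taken from temp.describe() above
n, sample_mean, sample_std = 130, 98.249231, 0.733183
z_crit = 1.96  # two-sided 95% critical value of the standard normal

# 95% confidence interval for the population mean (uses the standard error)
margin_mean = z_crit * sample_std / math.sqrt(n)
ci_mean = (sample_mean - margin_mean, sample_mean + margin_mean)

# Central 95% range for a single reading (uses the standard deviation itself)
margin_individual = z_crit * sample_std
range_individual = (sample_mean - margin_individual, sample_mean + margin_individual)

print(f"95% CI for the mean:     {ci_mean[0]:.2f} to {ci_mean[1]:.2f}")
print(f"95% range for a reading: {range_individual[0]:.2f} to {range_individual[1]:.2f}")
```

Under this reading, a temperature below roughly 96.8 or above roughly 99.7 degrees F would be flagged, a much wider band than the interval for the mean.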


## Consider now H0 = "Male temperature equals Female temperature"

In [56]:
temp[temp.gender=='F'].describe()

| | temperature |
|---|---|
| count | 65.0 |
| mean | 98.393846 |
| std | 0.743488 |
| min | 96.4 |
| 25% | 98.0 |
| 50% | 98.4 |
| 75% | 98.8 |
| max | 100.8 |


In [57]:
temp[temp.gender=='M'].describe()

| | temperature |
|---|---|
| count | 65.0 |
| mean | 98.104615 |
| std | 0.698756 |
| min | 96.3 |
| 25% | 97.6 |
| 50% | 98.1 |
| 75% | 98.6 |
| max | 99.5 |


## Hypothesis test
At the 5% significance level, women's mean temperature is higher than men's.

In [69]:
# two-sample t statistic: use each group's own standard deviation and n = 65 per group
t=(98.393846-98.104615)/np.sqrt((0.743488**2+0.698756**2)/65)
print(round(t, 4))

2.2854

## Alternatively


In [68]:
# as_matrix() was removed from recent pandas versions; to_numpy() returns the same arrays
aa=temp[temp.gender=='F'].temperature.to_numpy()
bb=temp[temp.gender=='M'].temperature.to_numpy()
scipy.stats.ttest_ind(aa,bb)


Ttest_indResult(statistic=2.2854345381656103, pvalue=0.023931883122395609)

## This confirms the previous result
The p value of 0.024 is below 0.05, so H0 is rejected at the 5% significance level.
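As a cross-check that does not need the raw data, the same test can be run from the summary statistics alone. This sketch uses `scipy.stats.ttest_ind_from_stats` with the means, standard deviations, and counts reported by `describe()` above, and assumes equal variances, which is also the default of `ttest_ind`.

```python
from scipy import stats

# Group summaries taken from the describe() outputs above (F first, then M)
result = stats.ttest_ind_from_stats(
    mean1=98.393846, std1=0.743488, nobs1=65,
    mean2=98.104615, std2=0.698756, nobs2=65,
    equal_var=True,  # pooled-variance test, matching the ttest_ind default
)

print(f"t = {result.statistic:.4f}, p = {result.pvalue:.4f}")
```

Because the inputs are rounded to six decimals, the statistic can differ from the raw-data result in the last few digits.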