## What is the true normal human body temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. In 1992, this value was revised to 36.8$^{\circ}$C or 98.2$^{\circ}$F. 

#### Exercise
In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.

Answer the following questions **in this notebook below and submit to your Github account**. 

1.  Is the distribution of body temperatures normal? 
    - Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply. 
2.  Is the true population mean really 98.6 degrees F?
    - Bring out the one sample hypothesis test! In this situation, is it appropriate to apply a z-test or a t-test? How will the result be different?
3.  At what temperature should we consider someone's temperature to be "abnormal"?
    - Start by computing the margin of error and confidence interval.
4.  Is there a significant difference between males and females in normal temperature?
    - Set up and carry out a two-sample hypothesis test.

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

****

## IMPORT PACKAGES AND LOAD DATASET

In [15]:
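import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats

# Print NumPy arrays with 4 decimal places and without scientific notation
np.set_printoptions(precision=4, suppress=True)

# Set to False to load the dataset from a path relative to the notebook
useFullPath = True
if useFullPath:
    # Raw string so the backslashes in the Windows path are not treated as escapes
    df = pd.read_csv(r'C:\Users\dwhitney\Desktop\Springboard Course\Course Notes\Mini-projects\statistics project 1\data\human_body_temperature.csv')
else:
    df = pd.read_csv('data/human_body_temperature.csv')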

# Question #1: Is the distribution of body temperatures normal?

In [16]:
# Both tests take normality as the null hypothesis
testStatShapiro, pShapiro = stats.shapiro(df['temperature'])
testStatDAgostino, pDAgostino = stats.normaltest(df['temperature'])
print("Test for normality: p-value (Shapiro-Wilk) is {:.2f} and p-value (D’Agostino Omnibus) is {:.2f}".format(pShapiro, pDAgostino))

Test for normality: p-value (Shapiro-Wilk) is 0.23 and p-value (D’Agostino Omnibus) is 0.26


To answer this question, I applied two standard normality tests from SciPy (`scipy.stats`): the Shapiro-Wilk test and the D’Agostino omnibus test. The null hypothesis in both tests is that the data were drawn from a normal distribution. Because both p-values are greater than 0.05, we fail to reject that null hypothesis: the tests find no evidence that the temperature data deviate from a normal distribution, although this does not prove that the distribution is exactly normal.
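
Since a non-significant p-value only shows that no departure from normality was detected, a Q-Q comparison is a common complementary check. The sketch below is illustrative only: `normality_summary` is a hypothetical helper, and the data are synthetic values drawn with an assumed mean, standard deviation, and sample size rather than the real dataset.

```python
import numpy as np
from scipy import stats


def normality_summary(values):
    """Return normality-test p-values and the Q-Q correlation for a 1-D sample."""
    values = np.asarray(values, dtype=float)
    _, p_shapiro = stats.shapiro(values)
    _, p_dagostino = stats.normaltest(values)
    # probplot fits a line to the ordered data vs. normal quantiles;
    # r close to 1 means the points lie near a straight line.
    (_, _), (_, _, r) = stats.probplot(values, dist="norm")
    return {"p_shapiro": p_shapiro, "p_dagostino": p_dagostino, "qq_r": r}


# Synthetic stand-in for df['temperature'] (assumed parameters, not the real data)
rng = np.random.default_rng(0)
synthetic_temps = rng.normal(loc=98.25, scale=0.73, size=130)
summary = normality_summary(synthetic_temps)
print(summary)
```

To apply the same check to the notebook's data, pass `df['temperature']` instead of the synthetic array.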

# Question #2: Is the true population mean really 98.6 degrees F?

In [24]:
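# One-sample t-test of H0: population mean == 98.6
testStat, p = stats.ttest_1samp(df.temperature, 98.6)
# Distance of 98.6 from the sample mean in units of the sample standard deviation.
# Note: this divides by std, not by the standard error std/sqrt(n)
zValue = (98.6 - df.temperature.mean()) / df.temperature.std()
print("One-sample t-test: p={:08.8f}\nOne-sample Z-test: Z={:08.8f}".format(p, zValue))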

One-sample t-test: p=0.00000024
One-sample Z-test: Z=0.47841965


A one-sample hypothesis test helps us determine whether the sample mean of our measurements differs significantly from a hypothesized population mean of 98.6 degrees F. A one-sample t-test is more appropriate here than a one-sample z-test because the true population standard deviation is unknown: we only have the sample mean, the sample standard deviation, and a putative population mean. The t-test gives p=0.00000024, far below 0.05, so we reject the null hypothesis that the true population mean is 98.6 degrees F.

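The Z value computed above is not a proper z-test of the mean. It divides the difference between 98.6 and the sample mean by the sample standard deviation rather than by the standard error (the standard deviation divided by the square root of the sample size), so it measures how far a single reading of 98.6 degrees F lies from the sample mean. By that measure, Z=0.48 is well below the critical value of 1.96 (p<0.05, two-tailed), which means an individual temperature of 98.6 degrees F is unremarkable within this sample. It does not contradict the t-test: a z-test built on the standard error uses the same test statistic as the t-test and differs only in its reference distribution, and because the t distribution approaches the normal distribution as the sample size grows, the two tests should agree closely for a large sample.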
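
For comparison, here is a minimal sketch of a z-test of the mean that divides by the standard error, evaluated alongside the t-test on the same statistic. The helper `one_sample_z_and_t` is hypothetical, and the data are synthetic with assumed parameters, so the printed numbers are not the results for the real dataset.

```python
import numpy as np
from scipy import stats


def one_sample_z_and_t(values, mu0):
    """Compare a z-test and a t-test of H0: population mean == mu0."""
    values = np.asarray(values, dtype=float)
    n = values.size
    standard_error = values.std(ddof=1) / np.sqrt(n)
    statistic = (values.mean() - mu0) / standard_error
    p_z = 2 * stats.norm.sf(abs(statistic))         # normal reference distribution
    p_t = 2 * stats.t.sf(abs(statistic), df=n - 1)  # t reference distribution
    return statistic, p_z, p_t


# Synthetic stand-in for df.temperature (assumed parameters, not the real data)
rng = np.random.default_rng(1)
synthetic_temps = rng.normal(loc=98.25, scale=0.73, size=130)
statistic, p_z, p_t = one_sample_z_and_t(synthetic_temps, 98.6)
print("statistic={:.3f}, z-test p={:.2e}, t-test p={:.2e}".format(statistic, p_z, p_t))
```

The two p-values differ only because the t distribution has slightly heavier tails than the normal distribution.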

# Question #3: At what temperature should we consider someone's temperature to be "abnormal"?

In [30]:
significanceThreshold = 0.05
n = df.temperature.shape[0]
# Two-tailed critical value from the t distribution with n-1 degrees of freedom
# (slightly larger than the normal value of 1.96)
criticalValue = stats.t.ppf(1 - significanceThreshold / 2, n - 1)
marginOfError = criticalValue * df.temperature.std() / np.sqrt(n)
sampleMean = df.temperature.mean()
print("Margin of Error = {:.2f}, 95% Confidence Intervals Around the Mean: [{:.2f},{:.2f}]".format(marginOfError, sampleMean - marginOfError, sampleMean + marginOfError))

Margin of Error = 0.13, 95% Confidence Intervals Around the Mean: [98.12,98.38]


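The 95% confidence interval of [98.12, 98.38] describes the uncertainty in the estimated population mean, not the spread of individual temperatures. It is therefore too narrow to serve as a threshold for a single reading: in a sample with this much spread, most individual readings fall outside it. A more suitable range for an individual is a prediction interval, which multiplies the critical value by the standard deviation (times sqrt(1 + 1/n)) rather than by the standard error. Readings outside that wider range are reasonable candidates for "abnormal."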
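
A confidence interval for the mean and a range for individual readings answer different questions, and the contrast can be sketched as follows. The helper `mean_ci_and_prediction_interval` is hypothetical, the prediction interval assumes roughly normal data, and the synthetic values use assumed parameters rather than the real dataset.

```python
import numpy as np
from scipy import stats


def mean_ci_and_prediction_interval(values, alpha=0.05):
    """Return (CI for the mean, prediction interval for one new reading)."""
    values = np.asarray(values, dtype=float)
    n = values.size
    mean = values.mean()
    sd = values.std(ddof=1)  # sample standard deviation
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    ci_half = t_crit * sd / np.sqrt(n)            # uncertainty of the mean
    pi_half = t_crit * sd * np.sqrt(1 + 1.0 / n)  # spread of a single new reading
    return (mean - ci_half, mean + ci_half), (mean - pi_half, mean + pi_half)


# Synthetic stand-in for df.temperature (assumed parameters, not the real data)
rng = np.random.default_rng(2)
synthetic_temps = rng.normal(loc=98.25, scale=0.73, size=130)
ci, pi = mean_ci_and_prediction_interval(synthetic_temps)
print("95% CI for the mean:     [{:.2f}, {:.2f}]".format(*ci))
print("95% prediction interval: [{:.2f}, {:.2f}]".format(*pi))
```

The prediction interval is wider than the confidence interval by a factor of sqrt(n + 1), which is why the two should not be used interchangeably.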

# Question #4: Is there a significant difference between males and females in normal temperature?

In [32]:
males = df.temperature[df.gender == 'M']
females = df.temperature[df.gender == 'F']
# Two-sample t-test (ttest_ind assumes equal variances by default)
testStat, p = stats.ttest_ind(males, females)
# Mean and standard error of the mean (SEM) for each group
meanTemp_Male = males.mean()
SEMTemp_Male = males.std() / np.sqrt(len(males))
meanTemp_Female = females.mean()
SEMTemp_Female = females.std() / np.sqrt(len(females))
print("Temperatures (Mean+/-SEM): Males={:.3f}+/-{:.3f} and Females={:.3f}+/-{:.3f}".format(meanTemp_Male, SEMTemp_Male, meanTemp_Female, SEMTemp_Female))
print("Two-sample t-test: p={:.3f}".format(p))

Temperatures (Mean+/-SEM): Males=98.105+/-0.087 and Females=98.394+/-0.092
Two-sample t-test: p=0.024


Yes. Based on a two-sample t-test, the difference in mean temperature between males and females is statistically significant (p=0.024, below the 0.05 threshold), with females slightly warmer on average (98.394 vs. 98.105 degrees F).
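
Statistical significance says nothing about how large the difference is, and the default test assumes equal variances in the two groups. One possible follow-up, sketched below, is to compare the result with Welch's t-test and report Cohen's d as an effect size. The helper `two_sample_summary` is hypothetical, and the two groups are synthetic with assumed means, standard deviations, and sizes, not the real data.

```python
import numpy as np
from scipy import stats


def two_sample_summary(a, b):
    """Student and Welch t-tests plus Cohen's d for two independent samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    t_student, p_student = stats.ttest_ind(a, b)               # pooled variance
    t_welch, p_welch = stats.ttest_ind(a, b, equal_var=False)  # Welch's t-test
    # Cohen's d: mean difference in units of the pooled standard deviation
    pooled_var = ((a.size - 1) * a.var(ddof=1)
                  + (b.size - 1) * b.var(ddof=1)) / (a.size + b.size - 2)
    cohens_d = (a.mean() - b.mean()) / np.sqrt(pooled_var)
    return {"t_student": t_student, "p_student": p_student,
            "t_welch": t_welch, "p_welch": p_welch, "cohens_d": cohens_d}


# Synthetic stand-ins for the two groups (assumed parameters, not the real data)
rng = np.random.default_rng(3)
synthetic_males = rng.normal(loc=98.10, scale=0.70, size=65)
synthetic_females = rng.normal(loc=98.39, scale=0.74, size=65)
result = two_sample_summary(synthetic_males, synthetic_females)
print(result)
```

With the notebook's data, the inputs would be `df.temperature[df.gender=='M']` and `df.temperature[df.gender=='F']`.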