# What is the True Normal Human Body Temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years after it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. But is this value statistically correct?

<div class="span5 alert alert-info">
<h3>Exercises</h3>

<p>In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.</p>

<p>Answer the following questions <b>in this notebook below and submit to your GitHub account</b>.</p>

<ol>
<li>  Is the distribution of body temperatures normal? 
    <ul>
    <li> Although this is not a requirement for the CLT to hold (read the CLT carefully), it gives us some peace of mind that the population may also be normally distributed if we assume that this sample is representative of the population.
    </ul>
<li>  Is the sample size large? Are the observations independent?
    <ul>
    <li> Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply.
    </ul>
<li>  Is the true population mean really 98.6 degrees F?
    <ul>
    <li> Would you use a one-sample or two-sample test? Why?
    <li> In this situation, is it appropriate to use the $t$ or $z$ statistic? 
    <li> Now try using the other test. How is the result different? Why?
    </ul>
<li>  At what temperature should we consider someone's temperature to be "abnormal"?
    <ul>
    <li> Start by computing the margin of error and confidence interval.
    </ul>
<li>  Is there a significant difference between males and females in normal temperature?
    <ul>
    <li> What test did you use and why?
    <li> Write a story with your conclusion in the context of the original problem.
    </ul>
</ol>

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm

****
</div>

In [1]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import random
import math

df = pd.read_csv('data/human_body_temperature.csv')

### Is the distribution of body temperatures normal?

We can use the `normaltest` function from SciPy to check whether the data are consistent with a normal distribution. The function performs D'Agostino and Pearson's omnibus test, which combines skewness and kurtosis, with normality as the null hypothesis. The p-value here is about 0.26, well above 0.05, so we cannot reject the null hypothesis. Failing to reject does not prove normality, but the sample gives no evidence against treating body temperatures as approximately normally distributed.

In [2]:
stats.normaltest(df['temperature'])

NormaltestResult(statistic=2.7038014333192031, pvalue=0.2587479863488254)
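A second check can make the normality conclusion less dependent on a single test. The sketch below is an illustration, not part of the original analysis: it runs a Shapiro-Wilk test and computes the correlation coefficient of a normal Q-Q fit. To stay self-contained it uses a synthetic sample (the `temps` array is an assumption standing in for `df['temperature']`), so its numbers will not match the notebook's.

```python
import numpy as np
import scipy.stats as stats

# ASSUMPTION: synthetic stand-in for df['temperature'], not the real data.
rng = np.random.default_rng(0)
temps = rng.normal(loc=98.25, scale=0.73, size=130)

# Shapiro-Wilk test: the null hypothesis is that the sample is drawn
# from a normal distribution, as with normaltest.
shapiro_stat, shapiro_p = stats.shapiro(temps)

# Q-Q comparison against a normal distribution. probplot returns the
# ordered quantile pairs and a least-squares fit; r close to 1 means the
# points fall close to a straight line.
(osm, osr), (slope, intercept, r) = stats.probplot(temps, dist="norm")

print("Shapiro-Wilk p-value: %f" % shapiro_p)
print("Q-Q correlation coefficient: %f" % r)
```

In the notebook itself, `temps` would be replaced by `df['temperature']`.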

### Is the sample size large? Are the observations independent?

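In total, our sample comprises 130 observations. This is well above the common rule of thumb of 30 observations for the CLT, so the sample is large enough for the hypothesis tests that follow. Independence cannot be verified from the data alone. We assume that each observation comes from a different person; since 130 people is far less than 10% of the population, treating the observations as independent is reasonable.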

In [3]:
df['temperature'].count()

130

### Is the true population mean really 98.6 degrees F?

A one-sample test is appropriate here because we are comparing the mean of a single sample against a fixed reference value, and the t statistic is the natural choice because the population standard deviation is unknown. The null hypothesis is that the population mean is 98.6 degrees F. The t-test gives a p-value of about 0.00000024, far below 0.05, so we reject the null hypothesis and conclude that the true population mean is not 98.6 degrees F.

In [4]:
stats.ttest_1samp(df['temperature'], 98.6)

Ttest_1sampResult(statistic=-5.4548232923645195, pvalue=2.4106320415561276e-07)
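The exercise also asks what happens with the other test. A minimal sketch of a z-test is given below, assuming the sample standard deviation is used in place of the unknown population value. It again uses a synthetic `temps` array (an assumption, not the real data), so the printed values differ from the notebook's. Under that assumption the z and t statistics are identical and only the reference distribution changes.

```python
import math

import numpy as np
import scipy.stats as stats

# ASSUMPTION: synthetic stand-in for df['temperature'], not the real data.
rng = np.random.default_rng(0)
temps = rng.normal(loc=98.25, scale=0.73, size=130)

n = len(temps)
# Standard error of the mean, using the sample standard deviation (ddof=1).
std_err = temps.std(ddof=1) / math.sqrt(n)

# z-test: same statistic as the t-test, but the two-sided p-value comes
# from the standard normal distribution.
z_stat = (temps.mean() - 98.6) / std_err
z_p = 2 * stats.norm.sf(abs(z_stat))

# t-test for comparison: p-value from a t distribution with n - 1 degrees
# of freedom.
t_stat, t_p = stats.ttest_1samp(temps, 98.6)

print("z statistic: %f, p-value: %g" % (z_stat, z_p))
print("t statistic: %f, p-value: %g" % (t_stat, t_p))
```

Because the t distribution has heavier tails than the normal distribution, the t-test p-value is never smaller than the z-test p-value for the same statistic. With 130 observations the two distributions are close, so the conclusion is the same either way.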

### At what temperature should we consider someone's temperature to be "abnormal"?

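By calculating the margin of error and then subtracting it from and adding it to the sample mean, we obtain a 95% confidence interval for the population mean: 98.12 to 98.38 degrees F. Taking this interval as a working definition of normal, temperatures outside it would be flagged as abnormal. This cutoff is very strict, however, because the interval describes the uncertainty in the mean rather than the spread of individual readings. A range for an individual's temperature would be built from the sample standard deviation itself rather than the standard error, and would be considerably wider.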

In [5]:
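# Critical z value for a two-sided 95% confidence level
z_crit = stats.norm.ppf(q = 0.975)
# Sample standard deviation (pandas uses ddof=1), used as an estimate of the population value
sample_stdev = df['temperature'].std()
# Margin of error = critical value * standard error of the mean
moe = z_crit * (sample_stdev/math.sqrt(df['temperature'].count()))
print("Margin of error: %f" % moe)
ci = (df['temperature'].mean() - moe , df['temperature'].mean() + moe)
print("Confidence interval:")
print(ci)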

Margin of error: 0.126034
Confidence interval:
(98.123196428181657, 98.375265110279898)
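To show how an interval for the mean differs from a range for individual readings, the sketch below computes both on a synthetic sample. The `temps` array is an assumption standing in for `df['temperature']`, and the prediction interval is a swapped-in technique that assumes approximately normal data; it is not the method used in the cell above.

```python
import math

import numpy as np
import scipy.stats as stats

# ASSUMPTION: synthetic stand-in for df['temperature'], not the real data.
rng = np.random.default_rng(0)
temps = rng.normal(loc=98.25, scale=0.73, size=130)

n = len(temps)
mean = temps.mean()
sd = temps.std(ddof=1)  # sample standard deviation

# 95% confidence interval for the population mean (uses the standard error).
z_crit = stats.norm.ppf(0.975)
moe = z_crit * sd / math.sqrt(n)
ci_mean = (mean - moe, mean + moe)

# 95% prediction interval for one new observation, assuming normality:
# mean +/- t_crit * sd * sqrt(1 + 1/n).
t_crit = stats.t.ppf(0.975, df=n - 1)
pred_half_width = t_crit * sd * math.sqrt(1 + 1 / n)
pred_interval = (mean - pred_half_width, mean + pred_half_width)

print("CI for the mean:", ci_mean)
print("Prediction interval for an individual:", pred_interval)
```

The prediction interval is wider by roughly a factor of the square root of the sample size, which is why it is the more natural basis for calling a single reading abnormal.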


### Is there a significant difference between males and females in normal temperature?

Because the male and female observations come from different people, the two groups can be treated as independent samples, so we use the `ttest_ind` function from SciPy with a significance level of 0.05. This test compares the means of two independent samples under the null hypothesis that the two means are equal; by default it also assumes the two groups have equal variances. Running the test on our male and female samples gives a p-value of about 0.024, below our significance level. We therefore reject the null hypothesis and conclude that there is a statistically significant difference between males and females in normal temperature. The negative t statistic (male minus female) shows that females have the higher mean temperature in this sample.

In [6]:
male_temp = df['temperature'][df['gender'] == 'M']
female_temp = df['temperature'][df['gender'] == 'F']
stats.ttest_ind(male_temp, female_temp)

Ttest_indResult(statistic=-2.2854345381656103, pvalue=0.023931883122395609)
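The default `ttest_ind` call pools the two variances. As a robustness check, the sketch below compares it with Welch's t-test (`equal_var=False`), which does not assume equal variances. The `male` and `female` arrays are synthetic, and their sizes, means, and spreads are illustrative assumptions rather than values taken from the dataset.

```python
import numpy as np
import scipy.stats as stats

# ASSUMPTION: synthetic stand-ins for the male and female temperature
# samples; sizes, means, and spreads are illustrative only.
rng = np.random.default_rng(0)
male = rng.normal(loc=98.1, scale=0.70, size=65)
female = rng.normal(loc=98.4, scale=0.74, size=65)

# Student's t-test with pooled variance (the default, as in the cell above).
pooled = stats.ttest_ind(male, female)

# Welch's t-test, which drops the equal-variance assumption.
welch = stats.ttest_ind(male, female, equal_var=False)

# Observed difference in sample means (female minus male).
mean_diff = female.mean() - male.mean()

print("Pooled: statistic=%f, p-value=%f" % (pooled.statistic, pooled.pvalue))
print("Welch:  statistic=%f, p-value=%f" % (welch.statistic, welch.pvalue))
print("Mean difference (F - M): %f" % mean_diff)
```

When the two groups have the same size, the two statistics coincide and only the degrees of freedom differ, so close agreement between the p-values suggests the conclusion does not hinge on the equal-variance assumption.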