# What is the True Normal Human Body Temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years after it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. But is this value statistically correct?

<div class="span5 alert alert-info">
<h3>Exercises</h3>

<p>In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.</p>

<p>Answer the following questions <b>in this notebook below and submit to your GitHub account</b>.</p>

<ol>
<li>  Is the distribution of body temperatures normal? 
    <ul>
    <li> Although this is not a requirement for CLT to hold (read CLT carefully), it gives us some peace of mind that the population may also be normally distributed if we assume that this sample is representative of the population.
    </ul>
<li>  Is the sample size large? Are the observations independent?
    <ul>
    <li> Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply.
    </ul>
<li>  Is the true population mean really 98.6 degrees F?
    <ul>
    <li> Would you use a one-sample or two-sample test? Why?
    <li> In this situation, is it appropriate to use the $t$ or $z$ statistic? 
    <li> Now try using the other test. How is the result different? Why?
    </ul>
<li>  At what temperature should we consider someone's temperature to be "abnormal"?
    <ul>
    <li> Start by computing the margin of error and confidence interval.
    </ul>
<li>  Is there a significant difference between males and females in normal temperature?
    <ul>
    <li> What test did you use and why?
    <li> Write a story with your conclusion in the context of the original problem.
    </ul>
</ol>

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm

****
</div>

In [7]:
import pandas as pd

df = pd.read_csv('data/human_body_temperature.csv')

In [8]:
import numpy as np
import scipy.stats as scipy
import matplotlib.pyplot as pyplot

In [5]:
# let's have a look at the data first
df.head(10)

Unnamed: 0,temperature,gender,heart_rate
0,99.3,F,68.0
1,98.4,F,81.0
2,97.8,M,73.0
3,99.2,F,66.0
4,98.0,F,73.0
5,99.2,M,83.0
6,98.0,M,71.0
7,98.8,M,78.0
8,98.4,F,84.0
9,98.6,F,86.0


In [None]:
# Question 1, is the distribution of body temperatures normal
# Let's try a histogram first
df.temperature.hist(bins=25)
pyplot.show()

In [13]:
# With only 130 points spread over 25 bins, the histogram is too ragged to clearly show a normal (bell-shaped) distribution
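
A histogram alone is a weak basis for judging normality, so a formal test can complement it. The sketch below applies the D'Agostino-Pearson test (`scipy.stats.normaltest`) to simulated readings; the `temps` array and its parameters are hypothetical stand-ins for `df.temperature`, and `stats` is imported here under its usual name rather than the notebook's `scipy` alias.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for df.temperature: 130 simulated readings.
rng = np.random.default_rng(0)
temps = rng.normal(loc=98.25, scale=0.73, size=130)

# D'Agostino-Pearson test. H0: the sample comes from a normal distribution.
stat, p_value = stats.normaltest(temps)
print(f"statistic={stat:.3f}, p-value={p_value:.3f}")

# A p-value above 0.05 means we fail to reject normality;
# it does not prove that the data are normal.
looks_normal = p_value > 0.05
```

In the notebook itself, `temps` would be replaced by `df.temperature`.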

In [10]:
# Calculate the mean xmean, standard deviation xsdev and standard error SE
# xmean, xsdev, SE = df.temperature.mean(), df.temperature.std(), df.temperature.std()/len(df)**0.5
# Note: `scipy` is the alias given to scipy.stats in the import cell, so sem is called directly on it
xmean, xsdev, SE = df.temperature.mean(), df.temperature.std(), scipy.sem(df.temperature)
print(xmean, xsdev, SE)

98.24923076923078 0.7331831580389454 0.0643044168379


In [9]:
# Question 2, is the sample size large
len(df)

130

In [None]:
# 130 observations is well above the common rule of thumb of n >= 30, so the sample is large enough for the CLT to apply

In [None]:
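# Question 2, are the observations independent
# Independence cannot be verified from the data alone; it depends on how the sample was collected
# Assuming each row is a different person, 130 people is far less than 10% of the population, so treating the observations as independent is reasonable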

In [12]:
# Question 3, is the true population mean really 98.6 degrees F?
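# One-sample test: we compare a single sample mean against the fixed value 98.6
# The population standard deviation is unknown, so strictly a t-test applies, but with more than 30 samples (n = 130) a z-test is a close approximation
# Null hypothesis: H0: population mean = 98.6
# Alternative hypothesis: H1: population mean != 98.6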
Z = (xmean - 98.6) / SE
print (Z)

-5.45482329236


In [17]:
# Two-sided test, hence we can use the calculation below to determine P (the p-value)
P = scipy.norm.sf(abs(Z))*2
print(P)

4.90215701411e-08


In [None]:
# The probability turns out to be almost zero, much smaller than a significance level of 0.05
# Hence we can reject our null hypothesis, meaning the mean temperature is not 98.6 degrees F
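
The exercise also asks what happens with the other test. The sketch below runs a one-sample t-test with `scipy.stats.ttest_1samp` next to the manual z-test. The `temps` array is simulated and hypothetical, not the notebook's data. The test statistic is identical in both cases; only the reference distribution differs, and with about 130 observations the t distribution is close to the normal.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for df.temperature.
rng = np.random.default_rng(1)
temps = rng.normal(loc=98.25, scale=0.73, size=130)

# One-sample t-test against the hypothesised mean of 98.6.
t_stat, t_p = stats.ttest_1samp(temps, popmean=98.6)

# Manual z-test: same statistic, but the p-value comes from the normal distribution.
z_stat = (temps.mean() - 98.6) / stats.sem(temps)
z_p = 2 * stats.norm.sf(abs(z_stat))

print(f"t: statistic={t_stat:.3f}, p={t_p:.3g}")
print(f"z: statistic={z_stat:.3f}, p={z_p:.3g}")
```

Because the t distribution has heavier tails, its p-value is slightly larger than the z-based one, which matters mainly for small samples.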

In [None]:
# Question 4, at what temperature should we consider someone's temperature to be "abnormal"
# Start by computing the margin of error and confidence interval.

In [19]:
# ME = margin of error
# CI = confidence interval
ME = 1.96 * SE
CI = (xmean - ME, xmean + ME)
print (CI)

(98.123194112228518, 98.375267426233037)


In [None]:
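# The interval above is a 95% confidence interval for the population MEAN (about 98.12 to 98.38 degrees F), not a range for individual readings
# To flag an individual temperature as "abnormal", use the spread of the data instead: xmean +/- 1.96 * xsdev, roughly 96.8 to 99.7 degrees F (assuming approximate normality)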
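
A confidence interval for the mean is much narrower than the range of individual readings, because it uses the standard error rather than the standard deviation. As a minimal sketch, the range expected to contain about 95% of individual temperatures can be computed from the summary statistics printed earlier, under the assumption that temperatures are approximately normal.

```python
from statistics import NormalDist

# Summary statistics copied from the notebook output above.
xmean = 98.24923076923078
xsdev = 0.7331831580389454

# Two-sided 95% critical value of the standard normal (about 1.96).
z_crit = NormalDist().inv_cdf(0.975)

# Range expected to contain about 95% of individual readings,
# ASSUMING temperatures are approximately normally distributed.
low, high = xmean - z_crit * xsdev, xmean + z_crit * xsdev
print(f"{low:.2f} to {high:.2f} degrees F")
```

Readings outside this range could reasonably be called "abnormal", although the 95% cut-off is a convention rather than a clinical threshold.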

In [None]:
# Question 5, is there a significant difference between males and females in normal temperature? 
# Null hypothesis H0: mean male temperature = mean female temperature
# Alternative hypothesis H1: mean male temperature <> mean female temperature

In [29]:
# Check what kind of test we can use:
df.gender.value_counts()
# You should probably also check whether the observations are independent

M    65
F    65
Name: gender, dtype: int64

In [56]:
# Two-sample z-test (two independent groups of 65 observations each): calculate the Z score and probability P
f = df.temperature[df.gender=='F']
fmean = f.mean()
fdev = f.std()
m = df.temperature[df.gender=='M']
mmean = m.mean()
mdev = m.std()
mean_diff = fmean - mmean
SEfm = (fdev**2/len(f) + mdev**2/len(m))**0.5
Z = mean_diff/SEfm
P = scipy.norm.sf(abs(Z))*2
print(P)

0.0222873607607


In [None]:
# The probability (about 0.022) is smaller than a significance level of 0.05
# Hence we can reject our null hypothesis, meaning there is a statistically significant difference between male and female normal temperature
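
As a cross-check on the manual two-sample z-test, the same comparison can be run as Welch's t-test with `scipy.stats.ttest_ind(..., equal_var=False)`. The sketch below uses simulated groups; the `female` and `male` arrays and their parameters are hypothetical, not the notebook's data. Welch's statistic uses the same standard error formula as the manual calculation, so only the reference distribution changes.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-ins for the female and male temperature columns.
rng = np.random.default_rng(2)
female = rng.normal(loc=98.4, scale=0.74, size=65)
male = rng.normal(loc=98.1, scale=0.70, size=65)

# Welch's t-test: does not assume equal variances in the two groups.
res = stats.ttest_ind(female, male, equal_var=False)
t_stat, p_value = res.statistic, res.pvalue

# Manual statistic, mirroring the notebook's SEfm formula.
se = np.sqrt(female.var(ddof=1) / female.size + male.var(ddof=1) / male.size)
manual_stat = (female.mean() - male.mean()) / se

print(f"Welch t={t_stat:.3f}, manual={manual_stat:.3f}, p={p_value:.3g}")
```

In the notebook, `female` and `male` would be replaced by the `f` and `m` series.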