## What is the true normal human body temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. In 1992, this value was revised to 36.8$^{\circ}$C or 98.2$^{\circ}$F. 

#### Exercise
In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.

Answer the following questions **in this notebook below and submit to your Github account**. 

1.  Is the distribution of body temperatures normal? 
    - Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply. 
2.  Is the true population mean really 98.6 degrees F?
    - Bring out the one-sample hypothesis test! In this situation, is it appropriate to apply a z-test or a t-test? How will the result be different?
3.  At what temperature should we consider someone's temperature to be "abnormal"?
    - Start by computing the margin of error and confidence interval.
4.  Is there a significant difference between males and females in normal temperature?
    - Set up and solve a two-sample hypothesis test.

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

****

In [2]:
import pandas as pd
import math

In [3]:
df = pd.read_csv('data/human_body_temperature.csv')


1. Is the distribution of body temperatures normal? 
    
    The conditions required for the CLT to apply are:
        a) Independence - One person's body temperature does not depend on another person's, so it is reasonable to assume the observations are independent.
        b) Sample size - The sample (n = 130) is far less than 10% of the population and exceeds the rule-of-thumb minimum of 30.
    These conditions justify treating the sample mean as approximately normal. They do not show that the individual temperatures are normally distributed, which needs a histogram, a Q-Q plot, or a formal normality test.
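A rough numerical check of normality can be sketched with the standard library alone. The helper `normality_summary` below is a hypothetical name, and the data are simulated stand-ins (not the notebook's dataset); for a normal sample, skewness and excess kurtosis should be near 0, with about 68% and 95% of values within 1 and 2 standard deviations of the mean. This is an informal diagnostic, not a formal test.

```python
import random
import statistics


def normality_summary(values):
    """Return skewness, excess kurtosis, and the share of values within 1 and 2 SD."""
    n = len(values)
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population SD, used for the moment ratios
    z = [(v - mu) / sigma for v in values]
    return {
        "skew": sum(x ** 3 for x in z) / n,
        "excess_kurtosis": sum(x ** 4 for x in z) / n - 3.0,
        "within_1sd": sum(abs(x) <= 1 for x in z) / n,
        "within_2sd": sum(abs(x) <= 2 for x in z) / n,
    }


# Hypothetical stand-in data: 130 simulated readings, NOT the real dataset.
rng = random.Random(0)
temps = [rng.gauss(98.25, 0.73) for _ in range(130)]
summary = normality_summary(temps)
print(summary)
```

In the notebook itself, the same helper could be called as `normality_summary(df['temperature'].tolist())`.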
        
2. Is the true population mean really 98.6 degrees F?
   
   Assuming the CLT holds, the sampling distribution of the mean is approximately normal and centered at the population mean. We start by assuming that the population mean is indeed 98.6 and see how well the data agree with that.
   
   The null hypothesis is that the population mean is 98.6.
   
   The alternative hypothesis is that the mean temperature is not equal to 98.6.
   
   To test the null hypothesis, let's first find the sample mean:

In [9]:
mean = df['temperature'].mean()
mean

98.24923076923078

In [10]:
sd = df['temperature'].std()
sd

0.7331831580389454

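We don't know the population standard deviation, so we estimate it with the sample standard deviation. Strictly, that calls for a t-test, but with n = 130 the t distribution is nearly identical to the normal, so a z-test gives almost the same result.
The test statistic measures how far the sample mean (98.25) is from 98.6 in units of the standard error, sd / sqrt(n), not the standard deviation itself:


In [14]:
# standard error of the mean = sample sd / sqrt(n)
se = sd / math.sqrt(len(df))
z = (mean - 98.6) / se
round(z, 4)

-5.4548

A sample mean about 5.45 standard errors below 98.6 has a two-sided p-value far below 0.001 under the normal approximation.
This is well under the 5% threshold, so we reject the null hypothesis: the data indicate that the population mean is not 98.6.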
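To compare the z-test and the t-test directly, the sketch below recomputes the one-sample statistic from the summary values printed above (sample mean, sample standard deviation, n = 130) and evaluates a two-sided p-value under both the normal and the t distribution. The variable names are illustrative, and `scipy` is assumed to be installed.

```python
from math import sqrt

from scipy import stats

# Summary statistics as printed earlier in this notebook
sample_mean = 98.24923076923078
sample_sd = 0.7331831580389454
n = 130
mu0 = 98.6  # hypothesized population mean

# The statistic divides by the standard error of the mean, sd / sqrt(n)
standard_error = sample_sd / sqrt(n)
statistic = (sample_mean - mu0) / standard_error

# Two-sided p-values under the normal and the t distribution (n - 1 df)
p_z = 2 * stats.norm.sf(abs(statistic))
p_t = 2 * stats.t.sf(abs(statistic), df=n - 1)

print(round(statistic, 4))
print(p_z < 0.05, p_t < 0.05)
```

The t distribution has heavier tails, so its p-value is somewhat larger, but with 129 degrees of freedom both tests lead to the same decision here.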

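3)  At what temperature should we consider someone's temperature to be "abnormal"?
   The CLT describes the sample mean, not individual readings, so the range for one person has to come from the spread of the data itself. Assuming temperatures are roughly normal, about 99% of individuals fall within 2.58 standard deviations of the mean. Using the sample mean as the best estimate of the center, a reading is unusual if it falls outside:

In [15]:
# 2.58 is the two-sided 99% critical value of the standard normal
temp_high = mean + 2.58 * sd
temp_low = mean - 2.58 * sd
print(round(temp_high, 2))
print(round(temp_low, 2))


100.14
96.36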
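The margin of error for the mean is a different quantity from the spread of individual readings: it uses the standard error, sd / sqrt(n). A minimal sketch, using the summary values printed earlier in this notebook and a hypothetical helper named `mean_confidence_interval`:

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_crit * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin, margin


# Sample mean, sample SD and n as reported in this notebook
low, high, margin = mean_confidence_interval(
    98.24923076923078, 0.7331831580389454, 130
)
print(round(margin, 3), round(low, 2), round(high, 2))
```

The interval is much narrower than the range for individual readings because it describes uncertainty about the mean only, and it does not contain 98.6.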


4) Is there a significant difference between males and females in normal temperature?


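In [19]:
df_male = df[df.gender.str.contains('M')]
df_female = df[df.gender.str.contains('F')]
# calculate the two means
mean_male = df_male['temperature'].mean()
mean_female = df_female['temperature'].mean()
# calculate the two standard deviations
std_male = df_male['temperature'].std()
std_female = df_female['temperature'].std()
print(std_male)
print(std_female)

0.698755762327
0.743487752731


A two-sample test compares the difference between the two group means with the standard error of that difference. The null hypothesis is that males and females have the same mean temperature; the alternative is that the means differ. Let's calculate the test statistic and the probability of seeing a difference at least this large:

In [24]:
from scipy import stats
n_male, n_female = len(df_male), len(df_female)
# standard error of the difference between two independent sample means
se_diff = math.sqrt(std_male**2 / n_male + std_female**2 / n_female)
z_diff = (mean_male - mean_female) / se_diff
# two-sided p-value from the standard normal distribution
p_value = 2 * stats.norm.sf(abs(z_diff))
print(round(z_diff, 2))
print(round(p_value, 3))

-2.29
0.022


The p-value is below the 5% significance level, so the difference is statistically significant: in this sample, males have a slightly lower mean temperature than females.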
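A two-sample comparison can also be run with Welch's t-test, which does not assume the two groups share a variance. The sketch below uses `scipy.stats.ttest_ind` on simulated stand-in data (hypothetical values, not the notebook's dataset) purely to show the call.

```python
import random

from scipy import stats

# Hypothetical stand-in data, NOT the notebook's dataset: two groups of 65
# simulated readings with slightly different means.
rng = random.Random(1)
group_a = [rng.gauss(98.1, 0.7) for _ in range(65)]
group_b = [rng.gauss(98.4, 0.7) for _ in range(65)]

# equal_var=False selects Welch's t-test (unequal variances allowed)
result = stats.ttest_ind(group_a, group_b, equal_var=False)
print("t statistic:", result.statistic)
print("two-sided p-value:", result.pvalue)
```

In the notebook, the two arguments would be `df_male['temperature']` and `df_female['temperature']`; a p-value below 0.05 would indicate a statistically significant difference between the group means.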