# What is the True Normal Human Body Temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. But, is this value statistically correct?

<div class="span5 alert alert-info">
<h3>Exercises</h3>

<p>In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.</p>

<p>Answer the following questions <b>in this notebook below and submit to your Github account</b>.</p> 

<ol>
<li>  Is the distribution of body temperatures normal? 
    <ul>
    <li> Although this is not a requirement for CLT to hold (read CLT carefully), it gives us some peace of mind that the population may also be normally distributed if we assume that this sample is representative of the population.
    </ul>
<li>  Is the sample size large? Are the observations independent?
    <ul>
    <li> Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply.
    </ul>
<li>  Is the true population mean really 98.6 degrees F?
    <ul>
    <li> Would you use a one-sample or two-sample test? Why?
    <li> In this situation, is it appropriate to use the $t$ or $z$ statistic? 
    <li> Now try using the other test. How is the result different? Why?
    </ul>
<li>  At what temperature should we consider someone's temperature to be "abnormal"?
    <ul>
    <li> Start by computing the margin of error and confidence interval.
    </ul>
<li>  Is there a significant difference between males and females in normal temperature?
    <ul>
    <li> What test did you use and why?
    <li> Write a story with your conclusion in the context of the original problem.
    </ul>
</ol>

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

****
</div>

In [1]:
import pandas as pd
import scipy.stats as stats
import math

df = pd.read_csv('data/human_body_temperature.csv')
df.head()

Unnamed: 0,temperature,gender,heart_rate
0,99.3,F,68.0
1,98.4,F,81.0
2,97.8,M,73.0
3,99.2,F,66.0
4,98.0,F,73.0


<div>
<h3>Exercises</h3>

<ol>
<li>  Is the distribution of body temperatures normal? 
    <ul>
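    <li> With 130 observations the sample is well above the usual CLT rule of thumb of 30, but the CLT only guarantees that the distribution of the sample mean is approximately normal. It says nothing about the shape of the data themselves, so we test normality directly.
    <li> We first use Python to check the number of samples.
    <li> We then use the SciPy function normaltest() to check the distribution. Its null hypothesis is that the sample comes from a normal distribution. Since the p-value (about .26) is greater than .05, we fail to reject that hypothesis, so the data are consistent with a normal distribution.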
    </ul>
</div>

In [2]:
# Select the temperature column as a pandas Series
temp = df.temperature

# Print the number of samples using len()
print("Number of samples:",len(temp))

# Use normaltest() (D'Agostino-Pearson K^2 test) to check whether the distribution is normal
k2, pvalue = stats.normaltest(temp)

# Print the pvalue
print("Pvalue: ",pvalue)

Number of samples: 130
Pvalue:  0.258747986349
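
As a cross-check on `normaltest()`, the sketch below runs both the D'Agostino-Pearson test and the Shapiro-Wilk test. It uses synthetic data rather than this notebook's CSV, so the two samples, the seed, and the helper name `normality_pvalues` are assumptions made for illustration only.

```python
import numpy as np
from scipy import stats

def normality_pvalues(sample):
    """Return the p-values of two normality tests.

    The null hypothesis of both tests is that the data are normal,
    so a small p-value is evidence AGAINST normality.
    """
    _, p_dagostino = stats.normaltest(sample)  # D'Agostino-Pearson K^2 (skewness + kurtosis)
    _, p_shapiro = stats.shapiro(sample)       # Shapiro-Wilk
    return {"normaltest": p_dagostino, "shapiro": p_shapiro}

rng = np.random.default_rng(0)

# Hypothetical samples (not the notebook's data)
bell_shaped = rng.normal(loc=98.2, scale=0.7, size=130)
skewed = rng.exponential(scale=1.0, size=500)

print("bell-shaped sample:", normality_pvalues(bell_shaped))
print("skewed sample:     ", normality_pvalues(skewed))
```

A p-value above .05 does not prove normality; it only means the test found no strong evidence against it.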


<div>
<h3>Exercises</h3>

<p>2. Is the sample size large? Are the observations independent? 
    <ul>
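    <li> As mentioned earlier, the sample contains 130 observations, well above the usual threshold of 30, so it is large enough for the CLT to apply.
    <li> The observations can be treated as independent because they come from different people, and 130 people are a tiny fraction of the population.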
    </ul>
</div>
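
The role of the sample size can be illustrated with a small simulation. This is a sketch on a made-up, strongly skewed population (exponential with mean 1), not on the body temperature data; the seed and the number of repetitions are arbitrary assumptions. It checks that the means of samples of size 130 spread out as the CLT predicts, with standard deviation close to sigma / sqrt(n).

```python
import numpy as np

rng = np.random.default_rng(42)
n = 130            # sample size used in this notebook
n_repeats = 5000   # hypothetical number of simulated samples

# Exponential population with mean 1 and standard deviation 1 (right-skewed)
samples = rng.exponential(scale=1.0, size=(n_repeats, n))
sample_means = samples.mean(axis=1)

expected_se = 1.0 / np.sqrt(n)            # sigma / sqrt(n) predicted by the CLT
observed_se = sample_means.std(ddof=1)    # spread of the simulated sample means

print("expected standard error:        ", expected_se)
print("observed spread of sample means:", observed_se)
```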

<div>
<h3>Exercises</h3>

<p>3.  Is the true population mean really 98.6 degrees F?
    <ul>
    <li> Would you use a one-sample or two-sample test? Why?
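    <ul>Since we have only one group of samples and compare its mean to a fixed value (98.6), I will use a one-sample test.</ul>
    <li>In this situation, is it appropriate to use the $t$ or $z$ statistic?
    <ul>Because we have more than 30 samples, we can use the $z$ statistic as a large-sample approximation, even though the population standard deviation is unknown and has to be estimated from the sample.</ul>
    <li> Now try using the other test. How is the result different? Why?
    <ul>I also used the $t$ statistic. With a large number of samples the $t$ distribution is very close to the normal distribution, so both tests lead to the same conclusion.</ul>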
    </ul>
    </div>

<div>
<p>First, I use the z statistic to determine if 98.6 is the true population mean.
    <ul>
    <li> The z-test method returns the z statistic and the p-value of the test.
    <li> The null hypothesis is that the true mean is 98.6. The p-value measures how likely a sample mean at least this far from 98.6 would be if that were true.
    </ul>
    </div>

In [3]:

from statsmodels.stats import weightstats

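# Use the ztest() method to calculate the z statistic and the p-value.
# ztest() returns a two-sided p-value by default; p/2 printed below is the one-sided p-value.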
z,p = weightstats.ztest(temp,value = 98.6, usevar='pooled', ddof=1.0)
print('z-statistic:  {z} \np-value:  {p}'.format(z=z,p=p/2))

z-statistic:  -5.4548232923645195 
p-value:  2.4510785070506077e-08
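
The z statistic can also be reproduced by hand from $z = (\bar{x} - \mu_0) / (s/\sqrt{n})$. The sketch below does not load the CSV; it assumes the summary values printed by the confidence interval cell of this notebook (interval bounds, margin of error, and critical value) and recovers the sample mean and standard error from them.

```python
from scipy import stats

# Summary values copied from the confidence interval output in this notebook
ci_low, ci_high = 98.123196428181657, 98.375265110279898
margin_of_error = 0.1260343410491174
z_crit = 1.959963984540054

sample_mean = (ci_low + ci_high) / 2   # midpoint of the confidence interval
sem = margin_of_error / z_crit         # standard error of the mean, s / sqrt(n)
mu0 = 98.6                             # hypothesized population mean

z = (sample_mean - mu0) / sem
p_one_sided = stats.norm.sf(abs(z))    # area in one tail of the standard normal
p_two_sided = 2 * p_one_sided          # area in both tails

print("z statistic:      ", z)
print("one-sided p-value:", p_one_sided)
print("two-sided p-value:", p_two_sided)
```

The two-sided value is the one that matches the question "is the mean different from 98.6?"; the one-sided value answers "is the mean lower than 98.6?".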


<div>

<p>The z test gives a p-value far below .05 (the 5% significance level). This holds even for the two-sided p-value, which is twice the one-sided value printed above.
    <ul>
    <li> With this p-value we can reject the null hypothesis that the true mean of the population is 98.6.
    </ul>
    </div>

In [4]:
# Now we use the same procedure and calculate the t statistic.
# ttest_1samp() returns a two-sided p-value by default; p/2 printed below is the one-sided p-value.
t,p = stats.ttest_1samp(temp,98.6)
print('t-statistic:  {t} \np-value:  {p}'.format(t=t,p=p/2))

t-statistic:  -5.4548232923645195 
p-value:  1.2053160207780638e-07


<div>

<p>The t test also gives a p-value far below .05 (the 5% significance level).
    <ul>
    <li> The t statistic equals the z statistic. The p-value is somewhat larger (1.2e-07 versus 2.5e-08) because the $t$ distribution has heavier tails than the normal distribution, but both values are tiny.
    <li> With this p-value we can reject the null hypothesis that the true mean of the population is 98.6.
    <li> The data indicate that the true mean of the population is not 98.6 F.
    </ul>
    </div>
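
To see why the two tests agree so closely, the following sketch compares the tail areas and the 5% critical values of the standard normal distribution and the $t$ distribution with 129 degrees of freedom. It only uses the statistic reported above and assumes nothing about the raw data.

```python
from scipy import stats

n = 130
df_t = n - 1                      # degrees of freedom of a one-sample t test
statistic = 5.4548232923645195    # |t| = |z| reported in this notebook

p_normal = stats.norm.sf(statistic)       # one-sided tail area, standard normal
p_t = stats.t.sf(statistic, df=df_t)      # one-sided tail area, t with 129 d.o.f.

crit_normal = stats.norm.ppf(0.975)       # two-sided 5% critical value, normal
crit_t = stats.t.ppf(0.975, df=df_t)      # two-sided 5% critical value, t

print("tail area  normal:", p_normal, " t:", p_t)
print("critical   normal:", crit_normal, " t:", crit_t)
```

The $t$ critical value is only slightly larger than the normal one at this sample size, which is why the choice of test does not change the conclusion here.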

<div>
<h3>Exercises</h3>

<p>4. At what temperature should we consider someone's temperature to be "abnormal"?
    <ul>
    <li> We first need to calculate the 95% confidence interval for the mean.
    <li> To do so we calculate the standard deviation, the z-value, the standard error, and the margin of error.
    </ul>
</div>

In [5]:

# z-value for a two-sided 95% confidence interval: the 97.5th percentile of the standard normal
z_value = stats.norm.ppf(q = 0.975)

# sample size
sample_size = len(temp)

# mean value of the sample
sample_mean = temp.mean()

# sample standard deviation (pandas uses ddof=1 by default)
stdev = temp.std()

# standard error of the mean
sem = stdev/math.sqrt(sample_size)

# margin of error
margin_of_error = z_value * sem

# 95% confidence interval for the mean
confidence_interval = (sample_mean - margin_of_error,
                       sample_mean + margin_of_error)

print('z-value(95% confidence):  {z} \nMargin of Error:  {m}\nConfidence Interval(95%): {c}'.format(z=z_value,m=margin_of_error,c=confidence_interval))

z-value(95% confidence):  1.959963984540054 
Margin of Error:  0.1260343410491174
Confidence Interval(95%): (98.123196428181657, 98.375265110279898)


<div>

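<p>The confidence interval computed above tells us the following:
    <ul>
    <li> The lower bound of the 95% confidence interval is 98.12 F and the upper bound is 98.38 F.
    <li> This interval describes where the population mean is likely to lie, not the spread of individual readings. Taken as an "abnormal" threshold, it would flag any body temperature below 98.12 F or above 98.38 F, which is too strict for individuals. A range for a single reading should be based on the sample standard deviation rather than the standard error, and is therefore wider.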
    </ul>
</div>
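
The difference between an interval for the mean and a range for individual readings can be sketched as follows. The code assumes the summary values printed above (it does not reload the CSV), recovers the sample standard deviation from the standard error, and assumes that individual temperatures are approximately normal.

```python
import math

# Summary values copied from the output above
n = 130
z_crit = 1.959963984540054
margin_of_error = 0.1260343410491174
ci_low, ci_high = 98.123196428181657, 98.375265110279898

sample_mean = (ci_low + ci_high) / 2
sem = margin_of_error / z_crit      # standard error, s / sqrt(n)
stdev = sem * math.sqrt(n)          # recover the sample standard deviation s

# Range expected to contain about 95% of individual readings (normality assumed)
individual_low = sample_mean - z_crit * stdev
individual_high = sample_mean + z_crit * stdev

print("95% CI for the mean:          ", (round(ci_low, 2), round(ci_high, 2)))
print("95% range for a single person:", (round(individual_low, 2), round(individual_high, 2)))
```

The second range uses the standard deviation instead of the standard error, so it is wider by a factor of sqrt(n).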

<div>
<h3>Exercises</h3>
<p>5. Is there a significant difference between males and females in normal temperature?
    <ul>
    <li> What test did you use and why?
    <li> Write a story with your conclusion in the context of the original problem.
    </ul>
    </div>

<div>
<p>To answer this question I will do the following steps.
    <ul>
    <li> Separate the data from males and females.
    <li> We will use the two-sample ztest() method in Python, which compares the means of two independent groups.
    <li> The null hypothesis is that both means are equal. The p-value tells us whether to reject it.
    </ul>
    </div>

In [6]:


# 2 sample z test is used.  We separate the data by gender.

# separate the data in to two groups depending on gender
male_temp = df[df['gender'] == 'M']['temperature']
female_temp = df[df['gender']== 'F']['temperature']

# use the z test method ztest() to calculate the p-value of the difference in means of the two groups.
# ztest() returns a two-sided p-value by default; p/2 printed below is the one-sided p-value.
z,p = weightstats.ztest(male_temp, female_temp, value = 0, usevar='pooled', ddof=1.0)
print('z-statistic:  {z} \np-value:  {p}'.format(z=z,p=p/2))



z-statistic:  -2.2854345381656103 
p-value:  0.011143680380328775


<div>
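<p>As we can see, the z statistic is -2.285 and the one-sided p-value is .011 (the two-sided p-value is twice that, about .022).
    <ul>
    <li> Since the p-value is less than our 5% significance level (p $<$ .05) in either case, we can reject the null hypothesis of equal means.
    <li> We can conclude that there is a statistically significant difference in body temperature between males and females. The negative z statistic means that the male mean in this sample is lower than the female mean.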
    </ul>
    </div>
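
A common alternative to the two-sample z test is Welch's t test, which does not assume equal variances in the two groups. The sketch below is not the method used in this notebook; the helper name `compare_means` and the two groups are hypothetical and stand in for the male and female temperature columns.

```python
import numpy as np
from scipy import stats

def compare_means(a, b, alpha=0.05):
    """Two-sided Welch t test; returns (statistic, p_value, reject_null)."""
    result = stats.ttest_ind(a, b, equal_var=False)
    return result.statistic, result.pvalue, result.pvalue < alpha

# Hypothetical data, not the notebook's measurements
group_a = np.linspace(97.0, 99.0, 65)
group_b = group_a + 0.3               # same spread, mean shifted by 0.3

stat, p_value, reject = compare_means(group_a, group_b)
print("statistic:", stat)
print("two-sided p-value:", p_value)
print("reject equal means at 5%:", reject)
```

With the real data, `group_a` and `group_b` would be replaced by `male_temp` and `female_temp`.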