# What is the True Normal Human Body Temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. But, is this value statistically correct?

<h3>Exercises</h3>

<p>In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.</p>

<p>Answer the following questions <b>in this notebook below and submit to your Github account</b>.</p> 

<ol>
<li>  Is the distribution of body temperatures normal? 
    <ul>
    <li> Although this is not a requirement for CLT to hold (read CLT carefully), it gives us some peace of mind that the population may also be normally distributed if we assume that this sample is representative of the population.
    </ul>
<li>  Is the sample size large? Are the observations independent?
    <ul>
    <li> Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply.
    </ul>
<li>  Is the true population mean really 98.6 degrees F?
    <ul>
    <li> Would you use a one-sample or two-sample test? Why?
    <li> In this situation, is it appropriate to use the $t$ or $z$ statistic? 
    <li> Now try using the other test. How is the result different? Why?
    </ul>
<li>  Draw a small sample of size 10 from the data and repeat both tests. 
    <ul>
    <li> Which one is the correct one to use? 
    <li> What do you notice? What does this tell you about the difference in application of the $t$ and $z$ statistic?
    </ul>
<li>  At what temperature should we consider someone's temperature to be "abnormal"?
    <ul>
    <li> Start by computing the margin of error and confidence interval.
    </ul>
<li>  Is there a significant difference between males and females in normal temperature?
    <ul>
    <li> What test did you use and why?
    <li> Write a story with your conclusion in the context of the original problem.
    </ul>
</ol>

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

****

In [1]:
import pandas as pd

df = pd.read_csv('data/human_body_temperature.csv')

In [2]:
# Your work here.
# My work:

# See what data we have
df.head()

Unnamed: 0,temperature,gender,heart_rate
0,99.3,F,68.0
1,98.4,F,81.0
2,97.8,M,73.0
3,99.2,F,66.0
4,98.0,F,73.0


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 130 entries, 0 to 129
Data columns (total 3 columns):
temperature    130 non-null float64
gender         130 non-null object
heart_rate     130 non-null float64
dtypes: float64(2), object(1)
memory usage: 3.1+ KB


In [4]:
import scipy
from scipy import stats
import numpy as np

In [5]:
# Run the Scipy D'Agostino and Pearson test for normality

normal_test = scipy.stats.mstats.normaltest(df['temperature'])

In [6]:
print('Test statistic:  {}'.format(normal_test[0]))
print('p-value: {}'.format(normal_test[1]))

Test statistic:  2.703801433319203
p-value: 0.2587479863488254


1) The p-value (about 0.26) is well above the usual 0.05 significance level, so we fail to reject the null hypothesis that the sample comes from a normal distribution.  This does not prove normality, but the distribution of body temperatures is consistent with a normal distribution.
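As a minimal sketch of the decision rule for a normality test, the snippet below runs `scipy.stats.normaltest` on synthetic data and turns the p-value into a decision. The helper `normality_decision`, the seed, and the synthetic sample (mean 98.25, standard deviation 0.73) are assumptions made for illustration; they are not part of the notebook or its dataset.

```python
import numpy as np
from scipy import stats


def normality_decision(p_value, alpha=0.05):
    """Turn a normality-test p-value into a plain-language decision."""
    if p_value < alpha:
        return "reject normality"
    return "fail to reject normality"


# Synthetic stand-in for the temperature column (an assumption, not the real data).
rng = np.random.default_rng(0)
synthetic_temps = rng.normal(loc=98.25, scale=0.73, size=130)

statistic, p_value = stats.normaltest(synthetic_temps)
print("statistic:", statistic, "p-value:", p_value)
print("synthetic sample:", normality_decision(p_value))

# The p-value the notebook reports for the real data.
print("notebook sample:", normality_decision(0.2587479863488254))
```

A p-value at or above the chosen significance level means only that the test found no evidence against normality, not that normality has been demonstrated.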

2) There are 130 observations, which is above the common rule-of-thumb minimum of 30, so yes, the sample size is large enough.  Nothing in the dataset suggests the readings are related to one another, so without evidence to the contrary we can assume the observations are independent.

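3) The appropriate test is the one-sample test, because we're comparing the mean of a single sample to a fixed reference value (98.6) rather than comparing two groups.  Strictly speaking, the population standard deviation is unknown, which calls for the $t$ statistic, but with a sample this large (130 > 30) the $z$ statistic is a close approximation, so I start with it.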

In [7]:
# Run the Statsmodels z-test

# Import only the weightstats submodule so that the name `stats`
# keeps pointing at scipy.stats from the earlier cell
import statsmodels
from statsmodels.stats import weightstats

z_test_tstat,z_test_pval = statsmodels.stats.weightstats.ztest(df['temperature'], value=98.6, alternative='two-sided')
print('z-statistic is',z_test_tstat)
print('p-value is',z_test_pval)

z-statistic is -5.45482329236
p-value is 4.9021570141e-08


The p-value is far below the conventional 0.05 significance level, so the null hypothesis, that the mean body temperature is 98.6, can be rejected.
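To make the z-test less of a black box, here is a small sketch that computes the statistic from its formula, $z = (\bar{x} - \mu_0) / (s / \sqrt{n})$, using only the Python standard library. The function name `one_sample_z_test` and the five toy readings are made up for illustration; the sketch uses the sample standard deviation (dividing by $n - 1$) as an estimate of the unknown population value.

```python
import math
from statistics import NormalDist, mean, stdev


def one_sample_z_test(sample, mu0):
    """Two-sided one-sample z-test using the sample standard deviation."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)  # stdev divides by n - 1
    z = (mean(sample) - mu0) / standard_error
    # Two-sided p-value: probability of a value at least this extreme in either tail.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


toy_sample = [98.0, 98.2, 98.4, 98.6, 98.8]  # made-up readings, not from the dataset
z, p = one_sample_z_test(toy_sample, mu0=98.6)
print("z =", z)
print("p =", p)
```

With only five readings a z-test would not be appropriate in practice; the toy sample is there just to keep the arithmetic easy to follow.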

In [8]:
# Now run the Scipy one sample t-test for comparison

t_test_tstat,t_test_pval = scipy.stats.ttest_1samp(df['temperature'], 98.6)
print('t-statistic is',t_test_tstat)
print('p-value is',t_test_pval)

t-statistic is -5.45482329236
p-value is 2.41063204156e-07


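Using the t-test instead yields the same test statistic, but a larger p-value, because the $t$ distribution (here with 129 degrees of freedom) has heavier tails than the normal distribution.  The result is the same, however, and the null hypothesis can be rejected.  The two tests differ in what they assume.  The z-test treats the standard deviation as known, which is a reasonable approximation for larger sample sizes (> 30), while the t-test accounts for estimating it from the sample, which matters most for smaller sample sizes (< 30).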
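The two p-values can be reproduced from the shared test statistic alone, which shows that the only difference between the tests is the reference distribution. This is a sketch; the helper name `two_sided_p_values` is hypothetical, and the statistic and sample size are the ones printed in the notebook output.

```python
from scipy import stats


def two_sided_p_values(statistic, n):
    """Return (z-based p, t-based p) for one statistic from a sample of size n."""
    # Same statistic, two reference distributions: standard normal vs. t with n - 1 df.
    p_z = 2 * stats.norm.sf(abs(statistic))
    p_t = 2 * stats.t.sf(abs(statistic), df=n - 1)
    return p_z, p_t


# Statistic and sample size taken from the notebook output.
p_z, p_t = two_sided_p_values(-5.45482329236, n=130)
print("z-based p-value:", p_z)
print("t-based p-value:", p_t)
```

The gap between the two p-values shrinks as `n` grows, because the $t$ distribution approaches the normal distribution.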

In [9]:
# Setup the small sample and run the 2 tests on it

small_sample = np.random.choice(df['temperature'], 10, replace=False)

small_z_test_tstat,small_z_test_pval = statsmodels.stats.weightstats.ztest(small_sample, value=98.6, alternative='two-sided')
print('z-statistic from small sample is',small_z_test_tstat)
print('p-value from small sample is',small_z_test_pval)

small_t_test_tstat,small_t_test_pval = scipy.stats.ttest_1samp(small_sample, 98.6)
print('t-statistic from small sample is',small_t_test_tstat)
print('p-value from small sample is',small_t_test_pval)

z-statistic from small sample is -2.12948453157
p-value from small sample is 0.0332141933261
t-statistic from small sample is -2.12948453157
p-value from small sample is 0.0620734716746


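4) With the small sample size, the t-test is the more appropriate one to use.  Between the 2 tests, the p-value is smaller using the z-test with both the large and small sample sizes, and for this small sample the two tests disagree at the 0.05 level: the z-test rejects the null hypothesis (p ≈ 0.033) while the t-test does not (p ≈ 0.062).  To me, that means the t-test is probably safer, because if the z-test always provides a smaller p-value, always using it would reject the null hypothesis more readily, which might be a false conclusion.  Because the sample is drawn at random without a fixed seed, these numbers will change from run to run.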
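One way to check the intuition that the z-test is too eager on small samples is a quick simulation. The sketch below draws many samples of size 10 from a population whose mean really is 98.6, then counts how often each test would reject at the 0.05 level. The function name, the seed, the number of trials, and the population standard deviation of 0.73 are all assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats


def false_positive_rates(n=10, trials=20000, alpha=0.05, seed=0):
    """Estimate how often each test rejects a null hypothesis that is true."""
    rng = np.random.default_rng(seed)
    # Every simulated sample has true mean 98.6, so any rejection is a false alarm.
    samples = rng.normal(loc=98.6, scale=0.73, size=(trials, n))
    standard_errors = samples.std(axis=1, ddof=1) / np.sqrt(n)
    test_stats = (samples.mean(axis=1) - 98.6) / standard_errors

    # Two-sided critical values from the normal and the t distribution.
    z_critical = stats.norm.ppf(1 - alpha / 2)
    t_critical = stats.t.ppf(1 - alpha / 2, df=n - 1)
    z_rate = float(np.mean(np.abs(test_stats) > z_critical))
    t_rate = float(np.mean(np.abs(test_stats) > t_critical))
    return z_rate, t_rate


z_rate, t_rate = false_positive_rates()
print("z-test false-positive rate:", z_rate)
print("t-test false-positive rate:", t_rate)
```

Under these assumptions the t-test rate should land near the nominal 0.05, while the z-test rate should come out noticeably higher, which is the sense in which the t-test is the safer choice for small samples.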

In [10]:
# Compute the confidence interval and margin of error

from statsmodels.stats.weightstats import DescrStatsW

low_bound,up_bound = statsmodels.stats.weightstats.DescrStatsW(df['temperature']).tconfint_mean(alpha=0.05, alternative='two-sided')
margin_error = 1.96 * np.std(df['temperature']) / np.sqrt(len(df))

print('95% Confidence Interval is [{} - {}]'.format(low_bound,up_bound))
print('95% Margin of error is +/-',margin_error)

95% Confidence Interval is [98.12200290560803 - 98.3764586328535]
95% Margin of error is +/- 0.125550964803


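5) The 95% confidence interval for the mean runs from about 98.12 to 98.38, so the assumed mean body temperature of 98.6 lies outside it, which agrees with the tests above.  The interval already includes the margin of error (roughly +/- 0.13), so the margin should not be added to the bounds a second time.  Note that this interval describes the uncertainty in the population mean, not the spread of individual readings.  Deciding whether one person's temperature is "abnormal" calls for a wider interval based on the sample standard deviation rather than on the standard error of the mean.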
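The difference between an interval for the mean and an interval for a single reading can be sketched as follows. The function name and the synthetic sample are assumptions for illustration, and the prediction interval relies on the readings being approximately normal; the confidence interval uses $t^{*} s / \sqrt{n}$ as its half-width, while the prediction interval for one new reading uses $t^{*} s \sqrt{1 + 1/n}$.

```python
import numpy as np
from scipy import stats


def mean_ci_and_prediction_interval(sample, confidence=0.95):
    """Return (CI for the mean, prediction interval for one new reading)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    center = sample.mean()
    s = sample.std(ddof=1)  # sample standard deviation
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)

    ci_half_width = t_crit * s / np.sqrt(n)          # uncertainty in the mean
    pi_half_width = t_crit * s * np.sqrt(1 + 1 / n)  # spread of a single new reading
    ci = (center - ci_half_width, center + ci_half_width)
    pi = (center - pi_half_width, center + pi_half_width)
    return ci, pi


# Synthetic stand-in for the temperature column (an assumption, not the real data).
rng = np.random.default_rng(1)
synthetic_temps = rng.normal(loc=98.25, scale=0.73, size=130)
ci, pi = mean_ci_and_prediction_interval(synthetic_temps)
print("95% CI for the mean:", ci)
print("95% prediction interval for one reading:", pi)
```

The prediction interval is wider than the confidence interval by a factor of $\sqrt{n + 1}$, which is why a range for "normal" individual temperatures cannot be read off the confidence interval for the mean.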

In [11]:
# Break the sample into male and female groups and make sure the shape is the same

df_male = df[df['gender'] == 'M']
df_female = df[df['gender'] == 'F']

df_male.info()
df_female.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 65 entries, 2 to 128
Data columns (total 3 columns):
temperature    65 non-null float64
gender         65 non-null object
heart_rate     65 non-null float64
dtypes: float64(2), object(1)
memory usage: 2.0+ KB
<class 'pandas.core.frame.DataFrame'>
Int64Index: 65 entries, 0 to 129
Data columns (total 3 columns):
temperature    65 non-null float64
gender         65 non-null object
heart_rate     65 non-null float64
dtypes: float64(2), object(1)
memory usage: 2.0+ KB


In [12]:
t2_test_tstat, t2_test_pval = scipy.stats.ttest_ind(df_male['temperature'],df_female['temperature'],equal_var=False)

print('t-statistic from gender comparison is',t2_test_tstat)
print('p-value from gender comparison is',t2_test_pval)

t-statistic from gender comparison is -2.28543453817
p-value from gender comparison is 0.0239382641829


I ran a 2-sample t-test (Welch's version, which does not assume equal variances) to determine if there is a significant difference in mean body temperature between males and females.  The p-value of 0.02 is < 0.05, so there is evidence to reject the null hypothesis that the two means are equal.  The negative t-statistic indicates that the male sample mean is the lower of the two.  The test by itself does not say how large the difference is; that requires looking at the difference in means or at an effect size.
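For readers who want to see what the two-sample test computes, the sketch below implements Welch's statistic and its approximate degrees of freedom by hand, and adds Cohen's d as one common effect-size measure. The function names and the two synthetic groups (made-up means and standard deviations, 65 values each) are assumptions for illustration, not the notebook's data.

```python
import numpy as np
from scipy import stats


def welch_t_test(a, b):
    """Welch's two-sample t-test computed directly from the formulas."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    se2_a = a.var(ddof=1) / a.size  # squared standard error of each group mean
    se2_b = b.var(ddof=1) / b.size
    t_stat = (a.mean() - b.mean()) / np.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (a.size - 1) + se2_b ** 2 / (b.size - 1))
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, p_value


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled_var = ((a.size - 1) * a.var(ddof=1) + (b.size - 1) * b.var(ddof=1)) / (a.size + b.size - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


# Two synthetic groups with made-up parameters (assumptions, not the real data).
rng = np.random.default_rng(2)
group_a = rng.normal(98.1, 0.70, size=65)
group_b = rng.normal(98.4, 0.74, size=65)

t_stat, p_value = welch_t_test(group_a, group_b)
print("Welch t-statistic:", t_stat, "p-value:", p_value)
print("Cohen's d:", cohens_d(group_a, group_b))
```

The t-statistic grows with sample size even when the underlying difference stays the same, whereas Cohen's d does not, which is why the two answer different questions.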

In [13]:
# How about a little extra work?  Since there is a difference between male and female mean body temperature,
# which group is farther from the assumed mean of 98.6?

tm_test_tstat,tm_test_pval = scipy.stats.ttest_1samp(df_male['temperature'], 98.6)
tf_test_tstat,tf_test_pval = scipy.stats.ttest_1samp(df_female['temperature'], 98.6)
print('t-statistic for male difference is',tm_test_tstat)
print('p-value for male difference is',tm_test_pval)
print('t-statistic for female difference is',tf_test_tstat)
print('p-value for female difference is',tf_test_pval)

t-statistic for male difference is -5.71575744932
p-value for male difference is 3.08384031731e-07
t-statistic for female difference is -2.23549807968
p-value for female difference is 0.0288804507897


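Both p-values are less than 0.05, although the female one (about 0.029) is fairly close to it, so the null hypothesis can be rejected for each group: both males and females have mean body temperatures under 98.6.  The t-statistic for males is larger in magnitude than for females, meaning the male mean lies more standard errors below 98.6.  Together with the 2-sample test above, this suggests the mean body temperature for males is lower than for females.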

# Conclusion

Based on all my calculations, the mean body temperature is likely less than 98.6 F, with males having a lower mean body temperature than females.  The sample of 130 readings is very small compared to the total population, so collecting another, larger sample would be appropriate to verify these results.  Also, since the mean body temperature of females is closer to the assumed 98.6 F, it might be worth researching the gender split of the original book.  It is possible the original book sampled more females than males, leading to the higher assumed mean body temperature, although nothing in this dataset can confirm that.