## What is the true normal human body temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. In 1992, this value was revised to 36.8$^{\circ}$C or 98.2$^{\circ}$F. 

#### Exercise
In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance to answer the following questions:

1.  Is the distribution of body temperatures normal? 
2.  Is the true population mean really 98.6 degrees F?
3.  At what temperature should we consider someone's temperature to be "abnormal"?
4.  Is there a significant difference between males and females in normal temperature?



#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm


In [91]:
import pandas as pd
from scipy import stats
import numpy as np
from statsmodels.stats.weightstats import DescrStatsW, CompareMeans, ttest_ind, ztest, zconfint


In [31]:
df = pd.read_csv('human_body_temperature.csv')

# Exercise

Answer the following questions in this notebook and submit it to your GitHub account. 

1.  Is the distribution of body temperatures normal? 
    - Remember that approximate normality is an assumption behind the statistical tests we are using, so check it before applying them. 
2.  Is the true population mean really 98.6 degrees F?
    - Bring out the one-sample hypothesis test! In this situation, is it appropriate to apply a z-test or a t-test? How will the result be different?
3.  At what temperature should we consider someone's temperature to be "abnormal"?
    - Start by computing the margin of error and confidence interval.
4.  Is there a significant difference between males and females in normal temperature?
    - Set up and carry out a two-sample hypothesis test.

In [79]:
df.temperature.describe()

count    130.000000
mean      98.249231
std        0.733183
min       96.300000
25%       97.800000
50%       98.300000
75%       98.700000
max      100.800000
Name: temperature, dtype: float64

In [57]:
# p-value above .05: we fail to reject the null hypothesis of normality, so the data are consistent with a normal distribution

stats.normaltest(df.temperature)

NormaltestResult(statistic=2.7038014333192031, pvalue=0.2587479863488254)
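
As a cross-check on the D'Agostino-Pearson test above, a Shapiro-Wilk test can be run alongside it. The sketch below uses a synthetic, seeded sample as a stand-in for `df.temperature`: the mean, standard deviation, and size are borrowed from the `describe()` output, but the values themselves are simulated and are not the real data.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for df.temperature (assumption: simulated, not the real data).
rng = np.random.default_rng(0)
sample = rng.normal(loc=98.25, scale=0.73, size=130)

# D'Agostino-Pearson test (based on skewness and kurtosis), as used in the notebook.
k2_stat, k2_p = stats.normaltest(sample)

# Shapiro-Wilk test as a second opinion.
w_stat, w_p = stats.shapiro(sample)

print(f"normaltest: statistic={k2_stat:.3f}, p={k2_p:.3f}")
print(f"shapiro:    statistic={w_stat:.3f}, p={w_p:.3f}")
```

In both tests the null hypothesis is that the sample comes from a normal distribution, so a large p-value means there is no evidence against normality rather than proof of it.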

In [94]:
# p-value below .05: we reject the null hypothesis that the true population mean is 98.6
# we use a t-test because the population standard deviation is unknown

stats.ttest_1samp(df.temperature, 98.6, axis=0)

Ttest_1sampResult(statistic=-5.4548232923645195, pvalue=2.4106320415561276e-07)
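
To see how a z-test would differ, the test statistic can be recomputed by hand from the summary statistics in the `describe()` output. The statistic is the same for both tests; only the reference distribution for the p-value changes. This is a minimal sketch, and the variable names are illustrative.

```python
import math
from scipy import stats

# Summary statistics copied from the describe() output above.
n = 130
sample_mean = 98.249231
sample_std = 0.733183
mu0 = 98.6  # hypothesized population mean

standard_error = sample_std / math.sqrt(n)
test_stat = (sample_mean - mu0) / standard_error

# Two-sided p-values: t distribution with n - 1 degrees of freedom
# versus the standard normal distribution.
p_t = 2 * stats.t.sf(abs(test_stat), df=n - 1)
p_z = 2 * stats.norm.sf(abs(test_stat))

print(f"statistic      = {test_stat:.4f}")
print(f"t-test p-value = {p_t:.3e}")
print(f"z-test p-value = {p_z:.3e}")
```

With 130 observations the t distribution is already close to the normal, so the z-test p-value is somewhat smaller but the conclusion does not change.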

In [74]:
# 99.9% confidence interval for the population mean: roughly 98.03 to 98.47 degrees F
# note: this interval bounds the mean, not the range of individual readings

DescrStatsW(df.temperature).tconfint_mean(alpha=.001)

(98.032682658320084, 98.465778880141443)
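
The margin of error behind this interval can be worked out by hand from the `describe()` output. The sketch below also computes a prediction interval for a single new reading, which is arguably closer to the question of when an individual temperature is "abnormal". The 95% level for the prediction interval is an assumption made here for illustration, and the calculation presumes the readings are approximately normal.

```python
import math
from scipy import stats

# Summary statistics copied from the describe() output above.
n = 130
sample_mean = 98.249231
sample_std = 0.733183

# Confidence interval for the mean at the notebook's 99.9% level.
alpha_mean = 0.001
t_crit_mean = stats.t.ppf(1 - alpha_mean / 2, df=n - 1)
margin_of_error = t_crit_mean * sample_std / math.sqrt(n)
ci_mean = (sample_mean - margin_of_error, sample_mean + margin_of_error)

# Prediction interval for one new reading (assumed 95% level).
# The extra "1 +" term accounts for the spread of individual readings.
alpha_pred = 0.05
t_crit_pred = stats.t.ppf(1 - alpha_pred / 2, df=n - 1)
pred_margin = t_crit_pred * sample_std * math.sqrt(1 + 1 / n)
pred_interval = (sample_mean - pred_margin, sample_mean + pred_margin)

print(f"margin of error (mean): {margin_of_error:.4f}")
print(f"99.9% CI for the mean:  ({ci_mean[0]:.4f}, {ci_mean[1]:.4f})")
print(f"95% prediction interval: ({pred_interval[0]:.2f}, {pred_interval[1]:.2f})")
```

The confidence interval narrows as the sample grows, while the prediction interval stays roughly as wide as the spread of the data, which is why the two answer different questions.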

In [67]:
a = df[df.gender == 'F']

In [69]:
b = df[df.gender == 'M']

In [72]:
# two-sample Kolmogorov-Smirnov test: p > .05, so no statistically significant difference between the female and male temperature distributions

stats.ks_2samp(a.temperature, b.temperature)

Ks_2sampResult(statistic=0.18461538461538457, pvalue=0.19539014047941772)
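
The Kolmogorov-Smirnov test compares whole distributions. Because the exercise asks about a difference in means, a two-sample t-test is a natural complement. Below is a minimal sketch using Welch's t-test. The helper name `compare_group_means` is hypothetical, and the two groups are synthetic, seeded stand-ins with made-up means and sizes, not the real data.

```python
import numpy as np
from scipy import stats


def compare_group_means(x, y):
    """Welch's two-sample t-test (does not assume equal variances)."""
    result = stats.ttest_ind(x, y, equal_var=False)
    return float(result.statistic), float(result.pvalue)


# Synthetic stand-ins for the two gender groups (assumption: simulated values
# with hypothetical means, spreads, and sizes).
rng = np.random.default_rng(1)
group_a = rng.normal(loc=98.4, scale=0.7, size=65)
group_b = rng.normal(loc=98.1, scale=0.7, size=65)

t_stat, p_value = compare_group_means(group_a, group_b)
print(f"Welch t statistic = {t_stat:.3f}, p-value = {p_value:.4f}")
```

In the notebook, the same helper could be called as `compare_group_means(a.temperature, b.temperature)`. The two tests can disagree, since a t-test targets a shift in means and is usually more sensitive to it than the Kolmogorov-Smirnov test.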