# What is the True Normal Human Body Temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. But, is this value statistically correct?

<h3>Exercises</h3>

<p>In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.</p>

<p>Answer the following questions <b>in this notebook below and submit to your GitHub account</b>.</p> 

<ol>
<li>  Is the distribution of body temperatures normal? 
    <ul>
    <li> Although this is not a requirement for CLT to hold (read CLT carefully), it gives us some peace of mind that the population may also be normally distributed if we assume that this sample is representative of the population.
    </ul>
<li>  Is the sample size large? Are the observations independent?
    <ul>
    <li> Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply.
    </ul>
<li>  Is the true population mean really 98.6 degrees F?
    <ul>
    <li> Would you use a one-sample or two-sample test? Why?
    <li> In this situation, is it appropriate to use the $t$ or $z$ statistic? 
    <li> Now try using the other test. How is the result different? Why?
    </ul>
<li>  Draw a small sample of size 10 from the data and repeat both tests. 
    <ul>
    <li> Which one is the correct one to use? 
    <li> What do you notice? What does this tell you about the difference in application of the $t$ and $z$ statistic?
    </ul>
<li>  At what temperature should we consider someone's temperature to be "abnormal"?
    <ul>
    <li> Start by computing the margin of error and confidence interval.
    </ul>
<li>  Is there a significant difference between males and females in normal temperature?
    <ul>
    <li> What test did you use and why?
    <li> Write a story with your conclusion in the context of the original problem.
    </ul>
</ol>

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

****

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('human_body_temperature.csv')

In [3]:
print(df)
#Printing the DataFrame shows a sample of 130 patients with 3 attributes: body temperature, gender, and heart rate.


     temperature gender  heart_rate
0           99.3      F        68.0
1           98.4      F        81.0
2           97.8      M        73.0
3           99.2      F        66.0
4           98.0      F        73.0
5           99.2      M        83.0
6           98.0      M        71.0
7           98.8      M        78.0
8           98.4      F        84.0
9           98.6      F        86.0
10          98.8      F        89.0
11          96.7      F        62.0
12          98.2      M        72.0
13          98.7      F        79.0
14          97.8      F        77.0
15          98.8      F        83.0
16          98.3      F        79.0
17          98.2      M        64.0
18          97.2      F        68.0
19          99.4      M        70.0
20          98.3      F        78.0
21          98.2      M        71.0
22          98.6      M        70.0
23          98.4      M        68.0
24          97.8      M        65.0
25          98.0      F        87.0
26          97.8      F     

## Question 1 - Is the distribution of body temperatures normal?
### Although this is not a requirement for CLT to hold (read CLT carefully), it gives us some peace of mind that the population may also be normally distributed if we assume that this sample is representative of the population.

In [4]:
print(df.head())
df.describe()
#Notice the describe function does not give us counts for categorical data
#(i.e. male and female counts), so we will have to find these another way.


   temperature gender  heart_rate
0         99.3      F        68.0
1         98.4      F        81.0
2         97.8      M        73.0
3         99.2      F        66.0
4         98.0      F        73.0


       temperature  heart_rate
count   130.000000  130.000000
mean     98.249231   73.761538
std       0.733183    7.062077
min      96.300000   57.000000
25%      97.800000   69.000000
50%      98.300000   74.000000
75%      98.700000   79.000000
max     100.800000   89.000000


In [5]:
#A bare "pip install" is not valid Python; inside a notebook cell use the %pip magic
%pip install pandas-profiling
import pandas_profiling
pandas_profiling.ProfileReport(df)

In [None]:
print(df['gender'].value_counts())
#Luckily for us, there are 65 patients of each sex! 

## Now to perform some EDA to visualize whether body temperature is normally distributed (Gaussian)!

In [None]:
plt.plot(df['temperature'])
plt.show()
#First we simply plot all patients vs body temperature

print(np.mean(df['temperature']))

## The histogram below, overlaid with a fitted normal curve, shows that the distribution of body temperatures does not look quite normal!
### In particular, several bars rise well above the curve, most notably at about 98.8 degrees!

In [None]:
from scipy import stats

#density=True (formerly normed=True) scales the histogram to unit area so it can be compared with the normal pdf
plt.hist(df['temperature'], bins=25, density=True)
mu = np.mean(df['temperature'])
sigma = np.std(df['temperature'])
x = np.linspace(mu - 3*sigma, mu + 3*sigma, 130)
#scipy.stats.norm.pdf replaces mlab.normpdf, which was removed from matplotlib
plt.plot(x, stats.norm.pdf(x, mu, sigma))
plt.xlabel('Body Temperature Deg F')
plt.ylabel('Probability density')
plt.show()
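
As a numerical complement to the histogram, sample skewness and excess kurtosis give a rough check on normality: both are close to 0 for normally distributed data. The sketch below is a minimal standard-library illustration, not part of the original analysis. The helper name `skew_and_excess_kurtosis` is hypothetical, and the input is only the first ten temperatures printed earlier in this notebook, so the numbers it prints say little about the full dataset.

```python
import statistics


def skew_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) computed from population central moments."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# First ten temperatures from the printed DataFrame (illustration only).
first_ten = [99.3, 98.4, 97.8, 99.2, 98.0, 99.2, 98.0, 98.8, 98.4, 98.6]

skew, kurt = skew_and_excess_kurtosis(first_ten)
print("skewness:", round(skew, 3))
print("excess kurtosis:", round(kurt, 3))
```

With the full data loaded, `scipy.stats.normaltest(df['temperature'])` runs a formal test built on the same two quantities.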

## Question 2 - Is the sample size large? Are the observations independent?

### The sample size is 130, which is sufficiently large for our purposes (n >= 30 is a common rule of thumb). Assuming all patients are in fact different people, we can treat the observations as independent.
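
The role of sample size in the CLT can be seen in a small simulation. The sketch below is an illustration on synthetic data, not on the temperature dataset: it draws repeated samples of size 130 from a strongly skewed exponential distribution (an arbitrary choice for this example) and checks that the spread of the sample means is close to the theoretical value sigma / sqrt(n). The seed and the number of repetitions are arbitrary as well.

```python
import math
import random
import statistics

random.seed(0)  # arbitrary seed so the run is repeatable

n = 130      # same sample size as the temperature data
reps = 2000  # number of simulated samples (arbitrary)

# Exponential distribution with rate 1: mean 1, standard deviation 1, heavily skewed.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

observed_sd = statistics.stdev(sample_means)
theoretical_sd = 1.0 / math.sqrt(n)

print("sd of sample means:", round(observed_sd, 4))
print("sigma / sqrt(n):   ", round(theoretical_sd, 4))
```

Even though individual draws are far from normal, the sample means cluster around the true mean with the predicted spread, which is what the tests on the mean rely on.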

## Is the true population mean really 98.6 degrees F?

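### See above for describe(): the sample mean is about 98.25 Deg F, below the conventional 98.6. The sample mean only estimates the population mean, so whether the population mean really differs from 98.6 is a question for a hypothesis test.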

### Would you use a one-sample or two-sample test? Why?

### A one-sample test is appropriate: a single sample of temperatures is being compared against one fixed reference value (98.6 Deg F), not against a second group.

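## In this situation, is it appropriate to use the t or z statistic?

### Strictly, the t statistic is appropriate, because the population standard deviation is unknown and has to be estimated from the sample. With a sample size of 130, however, the t distribution is almost identical to the normal distribution, so the z statistic gives nearly the same result.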
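
To make the choice of statistic concrete, the test statistic can be computed by hand from the summary values that `describe()` printed above (n = 130, mean 98.249231, standard deviation 0.733183). This is a minimal standard-library sketch that assumes those printed values; the variable names are illustrative. The same number serves as the z statistic or the t statistic, and only the reference distribution used for the p-value differs.

```python
import math
from statistics import NormalDist

n = 130
sample_mean = 98.249231   # from df.describe()
sample_std = 0.733183     # from df.describe() (sample standard deviation, ddof=1)
hypothesized_mean = 98.6

standard_error = sample_std / math.sqrt(n)
statistic = (sample_mean - hypothesized_mean) / standard_error

# Two-sided p-value under the standard normal (z) reference distribution.
p_value_z = 2 * NormalDist().cdf(-abs(statistic))

print("standard error:", round(standard_error, 4))
print("test statistic:", round(statistic, 3))
print("two-sided p-value (z):", p_value_z)
```

The statistic is more than five standard errors below zero, so the p-value is tiny under either reference distribution.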



Now try using the other test. How is the result different? Why?

In [None]:
print(df.sample(5))
print(df['temperature'].sample(30))


In [None]:
import scipy
from scipy import stats

#One-sample t-test: compare the sample mean against the hypothesized population mean of 98.6
t_test = scipy.stats.ttest_1samp(df['temperature'], 98.6)

print(t_test)

In [None]:
from statsmodels.stats.weightstats import ztest

#One-sample z-test against the hypothesized mean of 98.6; returns (z statistic, p-value)
z_test = ztest(df['temperature'], value=98.6, alternative='two-sided')

print(z_test)
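
Question 4 asks for both tests to be repeated on a sample of size 10. The sketch below is a hedged illustration using only the standard library: it takes the first ten temperatures printed earlier in this notebook as a stand-in for a random draw (with the full data loaded, `df['temperature'].sample(10)` would give a proper random sample) and compares the two-sided 95% critical values. The t critical value for 9 degrees of freedom, 2.262, is taken from a standard t table.

```python
import math
import statistics
from statistics import NormalDist

# Stand-in for a random sample of size 10 (first ten printed rows).
sample = [99.3, 98.4, 97.8, 99.2, 98.0, 99.2, 98.0, 98.8, 98.4, 98.6]
hypothesized_mean = 98.6

n = len(sample)
mean = statistics.fmean(sample)
std = statistics.stdev(sample)            # sample standard deviation (ddof=1)
statistic = (mean - hypothesized_mean) / (std / math.sqrt(n))

z_critical = NormalDist().inv_cdf(0.975)  # about 1.96
t_critical = 2.262                        # t table, 9 degrees of freedom, two-sided 95%

print("test statistic:", round(statistic, 3))
print("reject at 5% using z:", abs(statistic) > z_critical)
print("reject at 5% using t:", abs(statistic) > t_critical)
```

Because the t critical value is larger than the z critical value, the t-test is the more conservative of the two, and it is the appropriate choice when n is small and the population standard deviation is unknown.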

## Question 5 - At what temperature should we consider someone's temperature to be "abnormal"?
### Start by computing the margin of error and confidence interval.

In [None]:
zscore = scipy.stats.zscore(df['temperature'], axis=0, ddof=0)
print(zscore)

In [None]:
standard_error = scipy.stats.sem(df['temperature'], axis=0, ddof=1, nan_policy='propagate')

print("standard error is " +  str(standard_error))

#Perhaps z = 2 should be considered abnormally high? About 97.7% of a normal distribution lies below z = 2


print("standard deviation is " +  str(np.std(df['temperature'])))
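
The margin of error and confidence interval can be worked out directly from the summary statistics that `describe()` reported (n = 130, mean 98.249231, standard deviation 0.733183). The following is a sketch under the assumption of a 95% confidence level and a normal approximation; the variable names are illustrative. It separates the interval for the population mean from the much wider range that covers about 95% of individual readings, which is arguably the more natural basis for calling one person's temperature "abnormal".

```python
import math
from statistics import NormalDist

n = 130
sample_mean = 98.249231   # from df.describe()
sample_std = 0.733183     # from df.describe()

z_critical = NormalDist().inv_cdf(0.975)   # two-sided 95% level (assumed)

# Margin of error and confidence interval for the population mean.
margin_of_error = z_critical * sample_std / math.sqrt(n)
ci_low = sample_mean - margin_of_error
ci_high = sample_mean + margin_of_error

# Range expected to contain about 95% of individual temperatures.
ind_low = sample_mean - z_critical * sample_std
ind_high = sample_mean + z_critical * sample_std

print("margin of error:", round(margin_of_error, 3))
print("95% CI for the mean:", (round(ci_low, 2), round(ci_high, 2)))
print("95% range for individuals:", (round(ind_low, 2), round(ind_high, 2)))
```

The interval for the mean is narrow because it shrinks with sqrt(n); the range for individuals does not, so the two should not be confused when setting a threshold.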

## Is there a significant difference between males and females in normal temperature?
### What test did you use and why?
### Write a story with your conclusion in the context of the original problem

In [31]:
from scipy import stats

#Male temps: select only the temperature values where gender is 'M'
dfm = df.loc[df['gender'] == 'M', 'temperature']

print("average male temp is " +  str(np.average(dfm)))
print("male temp standard deviation is " +  str(np.std(dfm)))

#Female temps: select only the temperature values where gender is 'F'
dff = df.loc[df['gender'] == 'F', 'temperature']

print("average female temp is " +  str(np.average(dff)))
print("female temp standard deviation is " +  str(np.std(dff)))

#Two-sample t-test: males and females are two independent groups
print(stats.ttest_ind(dfm, dff))

## Males and females form two independent groups, so a two-sample t-test is the appropriate comparison. The difference in mean temperature is statistically significant only if the p-value printed by the test falls below the chosen significance level (commonly 0.05).
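
One way to check for a difference between two groups without distributional assumptions is a permutation test, shown here in place of a t-test purely as a self-contained illustration. The sketch uses only the standard library and only the 26 complete rows printed at the top of this notebook, so its result should not be read as a conclusion about the full dataset of 130. The function name, seed, and number of shuffles are arbitrary choices.

```python
import random
import statistics

# The 26 complete (temperature, gender) rows printed earlier in the notebook.
rows = [
    (99.3, 'F'), (98.4, 'F'), (97.8, 'M'), (99.2, 'F'), (98.0, 'F'),
    (99.2, 'M'), (98.0, 'M'), (98.8, 'M'), (98.4, 'F'), (98.6, 'F'),
    (98.8, 'F'), (96.7, 'F'), (98.2, 'M'), (98.7, 'F'), (97.8, 'F'),
    (98.8, 'F'), (98.3, 'F'), (98.2, 'M'), (97.2, 'F'), (99.4, 'M'),
    (98.3, 'F'), (98.2, 'M'), (98.6, 'M'), (98.4, 'M'), (97.8, 'M'),
    (98.0, 'F'),
]

male = [t for t, g in rows if g == 'M']
female = [t for t, g in rows if g == 'F']


def permutation_p_value(a, b, shuffles=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = a + b
    hits = 0
    for _ in range(shuffles):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / shuffles


print("male mean:", round(statistics.fmean(male), 3))
print("female mean:", round(statistics.fmean(female), 3))
print("permutation p-value:", permutation_p_value(male, female))
```

With the full DataFrame loaded, `scipy.stats.ttest_ind` applied to the male and female temperature values gives the corresponding two-sample t-test.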
