## What is the true normal human body temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. In 1992, this value was revised to 36.8$^{\circ}$C or 98.2$^{\circ}$F. 

#### Exercise
In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.

Answer the following questions **in this notebook below and submit to your Github account**. 

1.  Is the distribution of body temperatures normal? 
    - Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply. 
2.  Is the true population mean really 98.6 degrees F?
    - Bring out the one-sample hypothesis test! In this situation, is it appropriate to apply a z-test or a t-test? How will the result be different?
3.  At what temperature should we consider someone's temperature to be "abnormal"?
    - Start by computing the margin of error and confidence interval.
4.  Is there a significant difference between males and females in normal temperature?
    - Set up and solve a two-sample hypothesis test.

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm

****

In [42]:
import pandas as pd
import numpy as np
import scipy as sp

In [2]:
df = pd.read_csv('data/human_body_temperature.csv')

# Q1 Is the distribution of body temperatures normal?

In [20]:
import scipy.stats as stats
import matplotlib.pyplot as plt
%matplotlib inline  

In [21]:
temp = np.array(df['temperature'])

In [29]:
meantemp = temp.mean()
stdtemp = temp.std()
print(meantemp)
print(stdtemp)

98.2492307692
0.730357778905


* H0: the temperatures are drawn from a normal distribution (the sample has mean = 98.249 and std = 0.730)
* H1: the temperatures are not drawn from a normal distribution

In [34]:
# stats.normaltest tests the null hypothesis that a sample comes from a normal distribution.
# I use a significance level of 5%
_, p_value = stats.normaltest(temp)
print(p_value)

0.258747986349


p_value > 0.05, so we fail to reject the null hypothesis: the data are consistent with a normal distribution, and we proceed under that assumption.

**Q1 Is the distribution of body temperatures normal?** : **Yes**
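For context, `scipy.stats.normaltest` implements the D'Agostino-Pearson omnibus test, which combines a skewness test and a kurtosis test into one statistic. A minimal sketch of that relationship follows, run on synthetic data (the `sample` array and its parameters are assumptions chosen to resemble the dataset, not the real measurements):

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the temperature column (assumed values, not the real data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=98.25, scale=0.73, size=130)

# Component tests: z-scores for skewness and kurtosis
z_skew, _ = stats.skewtest(sample)
z_kurt, _ = stats.kurtosistest(sample)

# Omnibus statistic and p-value reported by normaltest
k2, p_value = stats.normaltest(sample)

# The omnibus statistic is the sum of the squared z-scores,
# compared against a chi-squared distribution with 2 degrees of freedom
k2_manual = z_skew**2 + z_kurt**2
p_manual = stats.chi2.sf(k2_manual, 2)

print(k2, k2_manual)
print(p_value, p_manual)
```

A large p-value only means the test found no evidence against normality; a histogram or Q-Q plot is a useful complement.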

# Q2 Is the true population mean really 98.6 degrees F?

Setting up a one-sample hypothesis test with a 5% significance level

* H0: the true population mean is 98.6 degrees F
* H1: the true population mean **is not** 98.6 degrees F

In [46]:
# The sample mean is meantemp
# Standard error of the mean (temp.std() uses ddof=0, the population formula)
sstdtemp = stdtemp/np.sqrt(len(temp))
print(len(temp))

130


Our sample size is greater than 30, so we can use the z-statistic for hypothesis testing.

In [49]:
# z-score: distance between the hypothesized mean and the sample mean, in standard errors
score = (98.6-meantemp)/sstdtemp
print(score)

5.47592520208


In [50]:
zp_value = stats.norm.sf(abs(score))  # one-tailed p-value from the z-statistic; double it for the two-sided H1
print(zp_value)

2.17615758294e-08


**The one-sample z-test rejects the null hypothesis: even when doubled for the two-sided alternative, the p-value is far below 0.05.**
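Because H1 is two-sided, the one-tailed probability is normally doubled before comparing it with the significance level. A minimal sketch of that step, assuming only the z-score printed above (the variable names here are illustrative and not part of the original notebook):

```python
from scipy import stats

# z-score copied from the printed output above
score = 5.47592520208

# One-tailed probability of a value at least this extreme
p_one_sided = stats.norm.sf(abs(score))

# Two-sided p-value: an equally extreme deviation in either direction counts
p_two_sided = 2 * p_one_sided

print(p_one_sided)
print(p_two_sided)
```

Doubling does not change the conclusion here, since both values sit many orders of magnitude below 0.05.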

In [52]:
tp_value = stats.t.sf(np.abs(score), len(temp)-1)  # one-tailed p-value from the t-distribution with n-1 degrees of freedom
print(tp_value)

1.0943732312e-07


This p-value is slightly larger than the z-based one but still extremely small, so the t-test also rejects the null hypothesis.
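As a cross-check, SciPy's `stats.ttest_1samp` runs the one-sample t-test directly and returns a two-sided p-value. The sketch below compares it with the manual formula on synthetic data (the `sample` array is an assumed stand-in, not the real measurements). Note that `ttest_1samp` uses the sample standard deviation with `ddof=1`, so it will differ slightly from a calculation based on `ddof=0`.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the temperature column (assumed values, not the real data)
rng = np.random.default_rng(1)
sample = rng.normal(loc=98.25, scale=0.73, size=130)

mu0 = 98.6  # hypothesized population mean

# Manual t-statistic using the unbiased sample standard deviation (ddof=1)
sem = sample.std(ddof=1) / np.sqrt(len(sample))
t_manual = (sample.mean() - mu0) / sem
p_manual = 2 * stats.t.sf(abs(t_manual), len(sample) - 1)  # two-sided

# Same test via SciPy
t_scipy, p_scipy = stats.ttest_1samp(sample, mu0)

print(t_manual, t_scipy)
print(p_manual, p_scipy)
```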

# Q3 At what temperature should we consider someone's temperature to be "abnormal"?

Calculating the 99% confidence interval for the population mean

In [66]:
per = 0.99
zscore = np.abs(stats.norm.ppf(abs((1-per)/2)))

In [67]:
confidence_intervals = [meantemp - zscore*sstdtemp, meantemp + zscore*sstdtemp]
print(confidence_intervals)

[98.084231864012764, 98.414229674448791]


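This is a 99% confidence interval for the population mean: if the true mean were 98.25 deg F, the mean of 130 readings would fall below 98.08 or above 98.41 deg F less than 1% of the time. By this criterion, **temperature < 98.08 or > 98.41** deg F can be considered abnormal. Note that the interval reflects uncertainty in the mean rather than the spread of individual readings, so a threshold for a single person's temperature would be wider.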
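To illustrate the difference between an interval for the mean and a range for individual readings, here is a rough sketch that uses only the summary statistics printed earlier. The "individual range" assumes readings are normally distributed with the sample mean and standard deviation; it is one possible convention, not the notebook's original answer.

```python
import numpy as np
from scipy import stats

# Summary statistics copied from the printed outputs above
mean_temp = 98.2492307692
std_temp = 0.730357778905
n = 130

# Two-sided 99% critical value of the standard normal distribution
z = stats.norm.ppf(0.995)

# Interval for the population mean: the margin shrinks with sqrt(n)
margin_mean = z * std_temp / np.sqrt(n)
ci_mean = (mean_temp - margin_mean, mean_temp + margin_mean)

# Range expected to cover 99% of individual readings (normality assumed)
margin_individual = z * std_temp
range_individual = (mean_temp - margin_individual, mean_temp + margin_individual)

print(ci_mean)
print(range_individual)
```

Under these assumptions the range for individual readings is roughly 96.4 to 100.1 deg F, which is much wider than the interval for the mean.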

# Q4 Is there a significant difference between males and females in normal temperature?

In [71]:
tempF = df['temperature'][df.gender == 'F']
tempM = df['temperature'][df.gender == 'M']

In [72]:
nF = len(tempF)
nM = len(tempM)

meanF = np.mean(tempF)
meanM = np.mean(tempM)

stdF = np.std(tempF)
stdM = np.std(tempM)

In [74]:
difftempmean = meanF - meanM

difftempstd = np.sqrt(stdF**2/nF + stdM**2/nM)

* H0: there is **no difference** between the mean normal temperatures of males and females (true difference in means = 0)
* H1: there is **a difference** between the mean normal temperatures of males and females

Setting up a two-tailed hypothesis test with a 5% significance level

In [96]:
per = 0.95
zscore_crit = np.abs(stats.norm.ppf(abs((1-per)/2)))

In [97]:
zscore_crit*difftempstd

0.24612578031405749

In [95]:
sp.stats.norm.sf(difftempmean/difftempstd)

0.010633225915065233

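**p_value < 5%, so we reject the null hypothesis.** The value above is one-tailed; the two-tailed p-value is twice as large (about 0.021), which is still below 5%.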

95% confidence interval for the difference between the two means

In [98]:
confidence_intervals = [difftempmean - zscore_crit*difftempstd, difftempmean + zscore_crit*difftempstd]
print(confidence_intervals)

[0.043104988916669501, 0.53535654954478451]


The interval excludes zero, and the female mean is higher by about 0.29 deg F, so women appear to be slightly warmer on average.
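The two-sample comparison can also be cross-checked with SciPy's `stats.ttest_ind`. With `equal_var=False` it performs Welch's t-test, which uses the same standard error of the difference (computed with `ddof=1`) but refers the statistic to a t-distribution. The sketch below uses synthetic groups; the group sizes, means, and spreads are assumptions for illustration, not the real data.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the female and male temperature columns (assumed values)
rng = np.random.default_rng(2)
group_f = rng.normal(loc=98.4, scale=0.74, size=65)
group_m = rng.normal(loc=98.1, scale=0.70, size=65)

# Standard error of the difference in means, using unbiased variances (ddof=1)
se_diff = np.sqrt(group_f.var(ddof=1) / len(group_f) + group_m.var(ddof=1) / len(group_m))
stat_manual = (group_f.mean() - group_m.mean()) / se_diff

# Two-sided p-value under the normal approximation
p_z_two_sided = 2 * stats.norm.sf(abs(stat_manual))

# Welch's t-test: same statistic, t-distribution for the p-value
t_welch, p_welch = stats.ttest_ind(group_f, group_m, equal_var=False)

print(stat_manual, t_welch)
print(p_z_two_sided, p_welch)
```

The t-based p-value is slightly larger than the normal-approximation one because the t-distribution has heavier tails; with samples of this size the difference is small.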