# What is the True Normal Human Body Temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. But, is this value statistically correct?

#### Exercises

<p>In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.</p>

<p>Answer the following questions <b>in this notebook below and submit to your GitHub account</b>.</p> 

<ol>
<li>  Is the distribution of body temperatures normal? 
    <ul>
    <li> Although this is not a requirement for the Central Limit Theorem to hold (read the introduction on Wikipedia's page about the CLT carefully: https://en.wikipedia.org/wiki/Central_limit_theorem), it gives us some peace of mind that the population may also be normally distributed if we assume that this sample is representative of the population.
    <li> Think about the way you're going to check for the normality of the distribution. Graphical methods are usually used first, but there are also other ways: https://en.wikipedia.org/wiki/Normality_test
    </ul>
<li>  Is the sample size large? Are the observations independent?
    <ul>
    <li> Remember that this is a condition for the Central Limit Theorem, and hence the statistical tests we are using, to apply.
    </ul>
<li>  Is the true population mean really 98.6 degrees F?
    <ul>
    <li> First, try a bootstrap hypothesis test.
    <li> Now, let's try frequentist statistical testing. Would you use a one-sample or two-sample test? Why?
    <li> In this situation, is it appropriate to use the $t$ or $z$ statistic? 
    <li> Now try using the other test. How is the result different? Why?
    </ul>
<li>  Draw a small sample of size 10 from the data and repeat both frequentist tests. 
    <ul>
    <li> Which one is the correct one to use? 
    <li> What do you notice? What does this tell you about the difference in application of the $t$ and $z$ statistic?
    </ul>
<li>  At what temperature should we consider someone's temperature to be "abnormal"?
    <ul>
    <li> As in the previous example, try calculating everything using the bootstrap approach, as well as the frequentist approach.
    <li> Start by computing the margin of error and confidence interval. When calculating the confidence interval, keep in mind that you should use the appropriate formula for one draw, and not N draws.
    </ul>
<li>  Is there a significant difference between males and females in normal temperature?
    <ul>
    <li> What testing approach did you use and why?
    <li> Write a story with your conclusion in the context of the original problem.
    </ul>
</ol>

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

****

In [1]:
import pandas as pd

df = pd.read_csv('data/human_body_temperature.csv')

In [2]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns

plt.hist(df.temperature)
plt.xlabel("Temperature")
plt.ylabel("Frequency")
plt.title("Temperature Histogram")

[Figure: Temperature Histogram (x-axis: Temperature, y-axis: Frequency)]

In [3]:
#1. Is the distribution of body temperatures normal?

Based solely on the histogram, the data look approximately normal. However, I will test for normality using the Shapiro-Wilk
test below. The null hypothesis of the Shapiro-Wilk test is that the population is normally distributed.

In [4]:
stats.shapiro(df.temperature)

(0.9865769743919373, 0.2331680953502655)

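Since the p-value (about 0.233) is greater than 0.05, we do not have enough evidence to reject the null hypothesis that the
distribution is normal. Failing to reject does not prove normality, but together with the histogram it makes the normality assumption reasonable.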
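The exercise also mentions graphical checks and other normality tests. The sketch below, which is not part of the original analysis, compares the empirical CDF (ECDF) with a normal CDF of the same mean and standard deviation and runs SciPy's D'Agostino-Pearson test (`scipy.stats.normaltest`) as a second opinion. The helper names `ecdf` and `normality_checks` are hypothetical, and the demo uses synthetic normal data rather than the body-temperature file; in the notebook you would pass `df.temperature` instead.

```python
import numpy as np
import scipy.stats as stats


def ecdf(data):
    """Return sorted values and the fraction of observations <= each value."""
    x = np.sort(np.asarray(data, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


def normality_checks(data):
    """Rough graphical-style check plus a second formal test of normality."""
    data = np.asarray(data, dtype=float)
    mu, sigma = data.mean(), data.std(ddof=1)
    x, y = ecdf(data)
    # Largest vertical gap between the ECDF and a normal CDF with the same mean and SD
    max_gap = np.max(np.abs(y - stats.norm.cdf(x, loc=mu, scale=sigma)))
    # D'Agostino-Pearson omnibus test; null hypothesis: the sample comes from a normal distribution
    _, p_dagostino = stats.normaltest(data)
    return {"ecdf": (x, y), "max_gap": max_gap, "p_dagostino": p_dagostino}


# Demo on SYNTHETIC normal data, not the body-temperature file
rng = np.random.default_rng(0)
demo = rng.normal(loc=98.25, scale=0.73, size=130)
result = normality_checks(demo)
print(f"max ECDF gap: {result['max_gap']:.3f}, D'Agostino-Pearson p: {result['p_dagostino']:.3f}")
```

Plotting the returned ECDF points with `plt.plot(x, y, marker='.', linestyle='none')` over a normal CDF gives the graphical check. The maximum gap is only a rough diagnostic, not a formal Kolmogorov-Smirnov p-value, because the mean and SD are estimated from the same data.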

In [5]:
#2. Is the sample size large? Are the observations independent? 
df.info()
df.shape

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 130 entries, 0 to 129
Data columns (total 3 columns):
temperature    130 non-null float64
gender         130 non-null object
heart_rate     130 non-null float64
dtypes: float64(2), object(1)
memory usage: 3.1+ KB


(130, 3)

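With 130 observations, the sample is comfortably larger than the common rule-of-thumb minimum of 30, so it is large enough to meet the requirements of the CLT. Assuming each row is a different, randomly selected person, and since 130 people is far less than 10% of the population, the observations can also be
treated as independent. This allows us to apply the Central Limit Theorem and the tests based on it.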

In [12]:
#3. Is the true population mean really 98.6 degrees F?
"""Bootstrap test: (1 sample bootstrap test because we are comparing a sinlge number (mean of 98.6) to an array). Our test 
statistic is the difference of means.
Null Hypothesis: The true population mean of temperature is 98.6 degrees F. 
We must simulate the situation where the true mean of the temperature is equal to 98.6. To do this, we must shift the value of
the sample mean to equal 98.6. 
We must define two functions, one to generate a bootstrap replicate of 1 dimensional data and another to generate many
bootstrap replicates for our hypothesis test:"""
def bootstrap_replicate_1d(data, func):
    return func(np.random.choice(data, size=len(data)))

def draw_bs_reps(data, func, size=1):
    """Draw bootstrap replicates."""

    # Initialize array of replicates: bs_replicates
    bs_replicates = np.empty(size)

    # Generate replicates
    for i in range(size):
        bs_replicates[i] = bootstrap_replicate_1d(data, func)

    return bs_replicates

"""Shifting data:"""
test_mean = 98.6
temp_shifted = df.temperature - np.mean(df.temperature) + test_mean

"""We must define another function in order to calcualte the test statistic, diff_obs:"""
def diff_from_test_mean(data, test_mean=98.6):
    return np.mean(data) - test_mean

diff_obs = diff_from_test_mean(df.temperature)

"""Next, we generate the 10000 bootstrap replicates and compute the p-value:"""
bs_replicates = draw_bs_reps(temp_shifted, diff_from_test_mean, 10000)

p = (np.sum(bs_replicates <= diff_obs) / 10000)
print('p= {:1.10f}'.format(p))

p= 0.0000000000


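None of the 10,000 bootstrap replicates was as small as the observed difference, so the estimated (one-sided) p-value is below 1/10,000, far below 0.05. We therefore reject the null hypothesis that the true mean is 98.6 degrees F.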
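The bootstrap test above counts only replicates at or below the observed difference, so it is one-sided, and it builds replicates in a Python loop. A minimal two-sided, vectorized variant is sketched below, assuming NumPy's `Generator` API (`np.random.default_rng`). The function name `bootstrap_one_sample_test` is hypothetical, and the demo uses synthetic evenly spaced values (mean 98.25) rather than the real measurements.

```python
import numpy as np


def bootstrap_one_sample_test(data, mu0, n_reps=10000, seed=None):
    """Two-sided bootstrap test of H0: the population mean equals mu0."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    observed = data.mean() - mu0
    # Shift the sample so its mean is exactly mu0, i.e. H0 holds in the resampling world
    shifted = data - data.mean() + mu0
    # Each row is one bootstrap resample (with replacement) of the shifted data
    samples = rng.choice(shifted, size=(n_reps, len(shifted)), replace=True)
    reps = samples.mean(axis=1) - mu0
    # Two-sided p-value: fraction of replicates at least as far from 0 as observed
    p_value = np.mean(np.abs(reps) >= abs(observed))
    return observed, p_value


# Demo on SYNTHETIC evenly spaced values (mean 98.25), not the real measurements
demo = np.linspace(97.0, 99.5, 130)
obs, p = bootstrap_one_sample_test(demo, 98.6, seed=42)
# If p == 0, report it as p < 1 / n_reps rather than exactly zero
print(f"observed difference = {obs:.3f}, p = {p}")
```

In the notebook the call would be `bootstrap_one_sample_test(df.temperature, 98.6)`. Sampling a whole `(n_reps, n)` matrix at once trades a little memory for much less Python overhead.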

In [13]:
"""Now we will try a one sample t-statistic test as we do not know the population standard deviation"""
t_stat, pvalue = stats.ttest_1samp(df.temperature, test_mean)
print("The t-statistic is {:1.3f} and the p-value is {:1.7f}".format(t_stat, pvalue))

The t-statistic is -5.455 and the p-value is 0.0000002


In [14]:
"""Next we will try the z-statistic"""
n = len(df.temperature)
se = np.std(df.temperature)/np.sqrt(n)
z_stat = (np.mean(df.temperature) - 98.6)/se
p_value = stats.norm.sf(np.abs(z_stat))*2
print("The z-statistic is {:1.3f} and the p-value is {:1.8f}".format(z_stat, p_value))

The z-statistic is -5.476 and the p-value is 0.00000004
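The full-sample t- and z-statistics differ slightly (-5.455 vs. -5.476) for two reasons: `stats.ttest_1samp` uses the sample standard deviation with `ddof=1` and the t distribution, while the z cell uses `np.std` (default `ddof=0`) and the standard normal. The hypothetical helper below computes both by hand so the difference is explicit; its t-statistic and p-value should match `stats.ttest_1samp`. The demo data are synthetic.

```python
import numpy as np
import scipy.stats as stats


def one_sample_t_and_z(data, mu0):
    """Return (t, p_t, z, p_z) for H0: population mean == mu0, both two-sided."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    xbar = data.mean()
    # t-statistic: sample SD with ddof=1, referred to the t distribution with n - 1 df
    t = (xbar - mu0) / (data.std(ddof=1) / np.sqrt(n))
    p_t = 2 * stats.t.sf(abs(t), df=n - 1)
    # z-statistic as in the notebook cell: np.std default ddof=0, referred to the standard normal
    z = (xbar - mu0) / (data.std(ddof=0) / np.sqrt(n))
    p_z = 2 * stats.norm.sf(abs(z))
    return t, p_t, z, p_z


# Demo on SYNTHETIC evenly spaced values, not the real measurements
demo = np.linspace(97.0, 99.5, 130)
t, p_t, z, p_z = one_sample_t_and_z(demo, 98.6)
print(f"t = {t:.3f} (p = {p_t:.2e}), z = {z:.3f} (p = {p_z:.2e})")
```

With n = 130 both choices barely matter, which is why the two cells above agree so closely.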


In [10]:
#4. Draw a small sample of size 10 from the data and repeat both frequentist tests.
#Which one is the correct one to use?
#What do you notice? What does this tell you about the difference in application of the  t  and  z  statistic?
"""Draw random sample of 10"""
samp10 = np.random.choice(df.temperature, 10)
samp10

array([97.8, 97.2, 98.6, 97.8, 97.6, 98.8, 98. , 98.2, 98. , 98.6])

In [15]:
"""Assign variables"""
samp10 = np.array([97.8, 97.2, 98.6, 97.8, 97.6, 98.8, 98. , 98.2, 98. , 98.6])
pop_mean = np.mean(df.temperature)
samp10_mean = np.mean(samp10)

In [16]:
"""First the t-statistic"""
t_stat10, pvalue10 = stats.ttest_1samp(samp10, pop_mean)
print("The t-statistic is {:1.3f} and the p-value is {:1.7f}".format(t_stat10, pvalue10))

The t-statistic is -1.198 and the p-value is 0.2613652


In [17]:
"""Next we will try the z-statistic"""
se = np.std(samp10)/np.sqrt(10)
z_stat10 = (samp10_mean - pop_mean)/se
p_value10 = stats.norm.sf(np.abs(z_stat))*2
print("The z-statistic is {:1.3f} and the p-value is {:1.8f}".format(z_stat10, p_value10))

The z-statistic is -1.263 and the p-value is 0.00000004


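With only 10 observations and an unknown population standard deviation, the t-test is the correct one to use. The z-test treats the sample standard deviation as if it were the known population value and compares the statistic against the standard normal distribution; the t-test instead uses the t distribution with n - 1 = 9 degrees of freedom, whose heavier tails account for the extra uncertainty of estimating the standard deviation from so few points. As a result, the z-test gives a smaller p-value and overstates the evidence against the null hypothesis.
With the full sample of 130, the two statistics are nearly identical (t = -5.455 vs. z = -5.476), so the choice matters much less for large samples.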
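The practical difference between t and z for small samples shows up directly in the critical values. This short sketch uses only `scipy.stats.t.ppf` and `scipy.stats.norm.ppf` to compare two-sided 95% cutoffs for a few sample sizes; the function name `critical_values` is hypothetical.

```python
import scipy.stats as stats


def critical_values(n, level=0.95):
    """Two-sided critical values for a sample of size n: (t with n - 1 df, standard normal)."""
    q = 1 - (1 - level) / 2
    return stats.t.ppf(q, df=n - 1), stats.norm.ppf(q)


for n in (10, 30, 130):
    t_crit, z_crit = critical_values(n)
    print(f"n={n:>3}: t_crit={t_crit:.3f}  z_crit={z_crit:.3f}")
```

For n = 10 the t cutoff is about 2.262 versus 1.960 for z, so a z-test rejects more easily than it should; by n = 130 the two cutoffs are close.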

In [18]:
#5. At what temperature should we consider someone's temperature to be "abnormal"?
"""We will compute the 95% confidence interval of the mean using bootstraps technique and functions defined before."""
bs_replicates_mean = draw_bs_reps(df.temperature, np.mean, 10000)

conf_int_95 = np.percentile(bs_replicates_mean, [2.5, 97.5])
conf_int_95

array([98.12536538, 98.37384615])

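This interval describes where the population *mean* temperature likely lies (about 98.13 to 98.37 degrees F); it is not a range for an individual's temperature, so it should not be used to flag individual readings as abnormal. Following the hint to use the formula for one draw rather than N draws, the margin of error for a single person must use the standard deviation of individual temperatures rather than the standard error of the mean. With a sample mean of about 98.25 degrees F and a sample standard deviation of about 0.73 degrees F, mean ± 1.96 × SD gives a normal range of roughly 96.8 to 99.7 degrees F; temperatures outside that range would be considered abnormal.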
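The exercise's hint about using the formula for one draw can be made concrete. The sketch below, with a hypothetical function name and synthetic demo data, computes (a) a frequentist prediction interval for a single new observation, x̄ ± t·s·√(1 + 1/n), which assumes roughly normal individual temperatures, and (b) a bootstrap-style interval from many single draws of the observed values, which amounts to empirical percentiles of the data. Both are much wider than the confidence interval for the mean because they use s rather than s/√n.

```python
import numpy as np
import scipy.stats as stats


def individual_reference_range(data, level=0.95, n_draws=10000, seed=None):
    """Intervals for ONE new temperature, not for the mean of N temperatures."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    alpha = 1 - level
    xbar, s = data.mean(), data.std(ddof=1)
    # Frequentist prediction interval (assumes roughly normal individual values):
    # xbar +/- t * s * sqrt(1 + 1/n), using s rather than the standard error s / sqrt(n)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    margin = t_crit * s * np.sqrt(1 + 1 / n)
    freq = (xbar - margin, xbar + margin)
    # Bootstrap-style version: percentiles of many single draws from the observed values
    rng = np.random.default_rng(seed)
    single_draws = rng.choice(data, size=n_draws, replace=True)
    lo, hi = np.percentile(single_draws, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return freq, (lo, hi)


# Demo on SYNTHETIC evenly spaced values, not the real measurements
demo = np.linspace(97.0, 99.5, 130)
freq, boot = individual_reference_range(demo, seed=0)
print(f"prediction interval: {freq[0]:.2f} to {freq[1]:.2f}")
print(f"bootstrap single-draw interval: {boot[0]:.2f} to {boot[1]:.2f}")
```

In the notebook the call would be `individual_reference_range(df.temperature)`. The bootstrap version cannot extend beyond the observed minimum and maximum, which is one reason to report the frequentist interval alongside it.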

In [21]:
#6.Is there a significant difference between males and females in normal temperature?
#What testing approach did you use and why?
#Write a story with your conclusion in the context of the original problem.
"""Null hypothesis: there is no significant difference between the average temperatures of males and females. We will need to use
a two sample bootstrap hypothesis test for difference of means because we are comparing the means of two sample arrays. First we
will need ot define the male and female temperature arrays and shift both of the arrays to have the same mean"""
male = df['temperature'][df.gender=='M']
female = df['temperature'][df.gender=='F']
male_shifted = male - np.mean(male) + np.mean(df.temperature)
female_shifted = female - np.mean(female) + np.mean(df.temperature)

In [22]:
"""Now we compute bootstrap replicates from both shifted arrays"""
bs_replicates_male = draw_bs_reps(male_shifted, np.mean, size=10000)
bs_replicates_female = draw_bs_reps(female_shifted, np.mean, size=10000)

In [23]:
"""Next we get replicates of difference of means and set the empirical differnece of means """
bs_replicates_mf = bs_replicates_male - bs_replicates_female
empirical_diff_means = np.mean(male) - np.mean(female)

In [24]:
"""Finally compute the pvalue"""
p = np.sum(bs_replicates_mf >= empirical_diff_means) / len(bs_replicates_mf)
print('p =', p)

p = 0.9891


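The observed difference is the male mean minus the female mean, which is negative here (the females in this dataset average about 0.3 degrees F warmer), so counting replicates greater than or equal to it measures the wrong tail: 0.9891 is roughly the probability of a difference *less* extreme than the one observed. The probability of a difference at least as extreme in the observed direction is about 1 - 0.9891 = 0.0109, or roughly double that for a two-sided test; both are below 0.05.
We therefore reject the null hypothesis: there is a statistically significant, though small, difference between male and female normal temperatures. In the context of the original problem, the average normal temperature in this sample is below Wunderlich's 98.6 degrees F, and it differs slightly between the sexes.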
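A two-sided version of the male/female bootstrap test, plus a frequentist counterpart, might look like the sketch below. It keeps the notebook's shifting approach, counts replicates at least as far from zero as the observed difference in either direction, and adds Welch's t-test (`scipy.stats.ttest_ind` with `equal_var=False`), which does not assume equal variances. Function names are hypothetical and the demo groups are synthetic; in the notebook you would pass `male` and `female`.

```python
import numpy as np
import scipy.stats as stats


def two_sample_bootstrap_test(a, b, n_reps=10000, seed=None):
    """Two-sided bootstrap test of H0: mean(a) == mean(b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rng = np.random.default_rng(seed)
    observed = a.mean() - b.mean()
    # Shift both groups to the pooled mean so H0 holds in the resampling world
    pooled_mean = np.concatenate([a, b]).mean()
    a_shift = a - a.mean() + pooled_mean
    b_shift = b - b.mean() + pooled_mean
    reps_a = rng.choice(a_shift, size=(n_reps, len(a))).mean(axis=1)
    reps_b = rng.choice(b_shift, size=(n_reps, len(b))).mean(axis=1)
    reps = reps_a - reps_b
    # Two-sided: replicates at least as far from 0 as observed, in either direction
    p_value = np.mean(np.abs(reps) >= abs(observed))
    return observed, p_value


def welch_t_test(a, b):
    """Frequentist counterpart: Welch's t-test, which does not assume equal variances."""
    return stats.ttest_ind(a, b, equal_var=False)


# Demo on SYNTHETIC groups, not the real male/female measurements
group_a = np.linspace(97.0, 99.0, 65)
group_b = group_a + 0.3
obs, p = two_sample_bootstrap_test(group_a, group_b, seed=1)
res = welch_t_test(group_a, group_b)
print(f"diff = {obs:.3f}, bootstrap p = {p:.4f}, Welch t = {res.statistic:.3f}, p = {res.pvalue:.4f}")
```

Taking the absolute value of both the replicates and the observed difference makes the result independent of which group is subtracted from which, which avoids the tail-direction issue in the one-sided count.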