# What is the True Normal Human Body Temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. But, is this value statistically correct?

<h3>Exercises</h3>

<p>In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.</p>

<ol>
<li>  Is the distribution of body temperatures normal? 
    <ul>
    <li> Although this is not a requirement for the Central Limit Theorem to hold (read the introduction on Wikipedia's page about the CLT carefully: https://en.wikipedia.org/wiki/Central_limit_theorem), it gives us some peace of mind that the population may also be normally distributed if we assume that this sample is representative of the population.
    <li> Think about the way you're going to check for the normality of the distribution. Graphical methods are usually used first, but there are also other ways: https://en.wikipedia.org/wiki/Normality_test
    </ul>
<li>  Is the sample size large? Are the observations independent?
    <ul>
    <li> Remember that these are conditions for the Central Limit Theorem, and hence for the statistical tests we are using, to apply.
    </ul>
<li>  Is the true population mean really 98.6 degrees F?
    <ul>
    <li> First, try a bootstrap hypothesis test.
    <li> Now, let's try frequentist statistical testing. Would you use a one-sample or two-sample test? Why?
    <li> In this situation, is it appropriate to use the $t$ or $z$ statistic? 
    <li> Now try using the other test. How is the result different? Why?
    </ul>
</ol>

In [None]:
#Let's take a look at the data
import pandas as pd

df = pd.read_csv('data/human_body_temperature.csv')

In [None]:
#View the first 5 rows of the data 
df.head()

In [None]:
#Get a sense of the descriptive statistics
df.describe()

### Task 1: Is the distribution of body temperatures normal? 

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from scipy.stats import norm


%matplotlib inline 

#Fit the given dataset to a normal distribution. The obtained parameters 
#are the mean and the standard deviation of the fitted normal curve
mu, std = norm.fit(np.array(df['temperature']))

#Generate an array of x-values based on the min and max values of the given body temperature dataset
xmin, xmax = np.min(df['temperature']), np.max(df['temperature'])
x = np.linspace(xmin, xmax, 1000)

#Evaluate the fitted normal probability density at each x value
p = norm.pdf(x, mu, std)

#Plotting histogram and the line plot together
plt.hist(df['temperature'], bins=25, density=True, alpha=0.6, color='r')
plt.plot(x, p, 'k', linewidth=2)
title = "Fit Results: mu = %.2f,  std = %.2f" % (mu, std)
plt.xlabel('body temperature')
plt.ylabel('density')
plt.title('Histogram of Body Temperature with %s' %title)
plt.legend(['fitted normal distribution'])
plt.show()
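The histogram gives a visual impression only. As a complement, the sketch below runs two of the formal normality tests available in SciPy. It is a minimal illustration rather than part of the original analysis: `normality_report` is a hypothetical helper name, and the sample is synthetic stand-in data, so `df['temperature']` would need to be passed in to test the real measurements.

```python
import numpy as np
from scipy import stats

def normality_report(sample, alpha=0.05):
    """Return p-values from two normality tests and whether both exceed alpha."""
    sample = np.asarray(sample, dtype=float)
    _, p_dagostino = stats.normaltest(sample)  # D'Agostino-Pearson omnibus test
    _, p_shapiro = stats.shapiro(sample)       # Shapiro-Wilk test
    return {
        'p_dagostino': float(p_dagostino),
        'p_shapiro': float(p_shapiro),
        # True means neither test rejects normality at level alpha
        'consistent_with_normal': bool(min(p_dagostino, p_shapiro) >= alpha),
    }

# Synthetic stand-in for df['temperature'] (hypothetical values, not the real data)
rng = np.random.default_rng(0)
synthetic_temps = rng.normal(loc=98.6, scale=0.7, size=130)
print(normality_report(synthetic_temps))
```

A large p-value does not prove normality. It only means the test found no evidence against it at the chosen level.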

### Task 2: Is the sample size large? Are the observations independent?
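
One way to make the sample-size question concrete is a rule-of-thumb check. The sketch below is only a heuristic: the threshold of 30 and the 10% condition are common textbook conventions, not properties of this dataset, and `clt_conditions` and `population_size` are hypothetical names introduced for illustration. Independence itself depends on how the measurements were collected and cannot be confirmed from the numbers alone.

```python
def clt_conditions(n_obs, population_size, min_n=30):
    """Heuristic checks for using CLT-based tests on a sample mean."""
    return {
        # Rule of thumb: at least min_n observations
        'large_sample': n_obs >= min_n,
        # Sampling without replacement: sample is under 10% of the population
        'under_10_percent': n_obs < 0.10 * population_size,
    }

# 130 observations; the population size is a rough, assumed order of magnitude
print(clt_conditions(n_obs=130, population_size=7_000_000_000))
```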

### Task 3: Is the true population mean really 98.6 degrees F?

To answer this question, we conduct a hypothesis test. The null hypothesis is that the true mean body temperature is 98.6 F, while the alternative hypothesis is that the true mean body temperature differs from 98.6 F. Let T denote the true mean body temperature.
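
Written in terms of $T$, the two hypotheses can be stated as follows, where the labels $H_0$ and $H_a$ are notation introduced here for the null and alternative hypotheses:

$$H_0: T = 98.6^{\circ}\mathrm{F} \qquad\qquad H_a: T \neq 98.6^{\circ}\mathrm{F}$$

Because $H_a$ allows a deviation in either direction, the tests below are two-sided.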

#### Step 1: Bootstrap Hypothesis Testing

In [None]:
size=10000
alpha = 0.05

# Under the assumption that the null hypothesis is true, we will need to shift the mean of the temperature dataset 
# to be exactly 98.6 
shifted_temp = df['temperature'] -np.mean(df['temperature']) + 98.6

def draw_bs_replicates(data, func, size):
    #Initialize array of replicates 
    bs_replicates = np.empty(size)
    for i in range(size):
        # Generate bootstrap sample: bs_sample
        bs_sample = np.random.choice(data, len(data))
        #Calculate the mean of bootstrap sample
        rep = func(bs_sample)
        bs_replicates[i] = rep
    return bs_replicates

#Compute and print mean of bootstrap replicates
bs_replicates = draw_bs_replicates(shifted_temp, np.mean, size)
bs_mean = np.mean(bs_replicates)
print('The mean of bootstrap replicates is %f' %bs_mean)

#Compute and print the standard error of the mean, using the sample standard deviation (ddof=1)
sem = np.std(df['temperature'], ddof=1)/np.sqrt(len(df))
print('The standard error of the mean body temperature is %f' %sem)


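#Compute and print the two-sided p-value: the fraction of bootstrap replicates
#that lie at least as far from 98.6 as the observed sample mean
observed_mean1 = np.mean(df['temperature'])
interval = 98.6 - observed_mean1
#Mirror the observed mean around the hypothesized mean of 98.6
observed_mean2 = 98.6 + interval

p = (np.sum(bs_replicates <= observed_mean1) + np.sum(bs_replicates >= observed_mean2))/len(bs_replicates)
print('The two-sided bootstrap p-value is %f' %p)

#Testing null hypothesis using p-value
def test_null(p_value, alpha=0.05):
    #Return the test decision as a sentence fragment
    if p_value < alpha:
        return 'the null hypothesis is rejected.'
    return 'the null hypothesis cannot be rejected.'

print(test_null(p))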

#Generate histogram plot of the results
#(density=True replaces the normed argument, which was removed from newer matplotlib versions)
plt.hist(bs_replicates, bins=40, density=True)
plt.xlabel('Mean Body Temperature')
plt.ylabel('Density')
plt.axvline(x=98.6, color='r')
plt.axvline(x=observed_mean1, color='k')
plt.axvline(x=observed_mean2, color='k')
plt.show()
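
The same resampling idea can also produce a confidence interval for the mean, which the exercise introduction lists among the concepts to practice. The sketch below is a vectorized percentile bootstrap under stated assumptions: `bootstrap_mean_ci` is a hypothetical helper, the sample is synthetic stand-in data rather than the real measurements, and the unshifted `df['temperature']` would be passed in for the actual analysis.

```python
import numpy as np

def bootstrap_mean_ci(sample, n_reps=10000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of sample."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    # Draw all resamples at once: one row of indices per bootstrap replicate
    idx = rng.integers(0, len(sample), size=(n_reps, len(sample)))
    rep_means = sample[idx].mean(axis=1)
    tail = (1.0 - confidence) / 2.0 * 100.0
    lower, upper = np.percentile(rep_means, [tail, 100.0 - tail])
    return float(lower), float(upper)

# Synthetic stand-in for df['temperature'] (hypothetical values, not the real data)
synthetic_temps = np.random.default_rng(0).normal(loc=98.6, scale=0.7, size=130)
print(bootstrap_mean_ci(synthetic_temps))
```

If 98.6 falls outside the interval computed from the real data, that agrees with rejecting the null hypothesis at the matching significance level.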

#### Step 2: Frequentist Statistical Testing

A one-sample frequentist statistical test is appropriate in this scenario since the sample of body temperatures is tested against a single hypothesized population mean of 98.6 F, not against a second sample. Since the population standard deviation is unknown, the t-statistic is more appropriate than the z-statistic in this case. However, the sample size of this dataset is 130, which is sufficiently large that the results obtained from the z-statistic are very close to those obtained from the t-statistic. In this scenario, a two-tailed z-test is performed at the 5% significance level, corresponding to a 95% confidence level.
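
For reference, the test statistic can be written as below. The symbols are notation introduced here: $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n = 130$ the sample size, and $\Phi$ the standard normal cumulative distribution function.

$$z = \frac{\bar{x} - 98.6}{s / \sqrt{n}} \qquad\qquad p = 2\,\bigl(1 - \Phi(|z|)\bigr)$$

The t-statistic has the same form. Only the reference distribution changes, from the standard normal to a $t$ distribution with $n - 1$ degrees of freedom.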

In [None]:
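#Testing using z_score
from scipy.stats import norm

mean = np.mean(df['temperature'])
#Sample standard deviation (ddof=1) as the estimate of the population standard deviation
std = np.std(df['temperature'], ddof=1)
z_val = (mean - 98.6)/(std/np.sqrt(len(df)))
#Two-tailed p-value: probability of a z-value at least this extreme in either direction
z_test = 2*norm.sf(abs(z_val))
print('The z-value is %f' %z_val + ' and the p-value is %.3g' %z_test)
#State the decision at the 0.05 significance level
conclusion = 'the null hypothesis is rejected.' if z_test < 0.05 else 'the null hypothesis cannot be rejected.'
print('At the 0.05 significance level for the two-tailed z-test, ' + conclusion)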
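
To try the other test, the sketch below computes the z-based and t-based p-values side by side. It is a minimal illustration under stated assumptions: `one_sample_tests` is a hypothetical helper, the sample is synthetic stand-in data, and `df['temperature']` would be passed in for the real comparison. The t-test uses `scipy.stats.ttest_1samp`.

```python
import numpy as np
from scipy import stats

def one_sample_tests(sample, mu0=98.6):
    """Two-sided one-sample z-test and t-test of the mean against mu0."""
    sample = np.asarray(sample, dtype=float)
    std_err = sample.std(ddof=1) / np.sqrt(len(sample))
    z_stat = (sample.mean() - mu0) / std_err
    p_z = 2 * stats.norm.sf(abs(z_stat))          # normal reference distribution
    t_stat, p_t = stats.ttest_1samp(sample, mu0)  # t distribution, n - 1 degrees of freedom
    return {'z': float(z_stat), 'p_z': float(p_z), 't': float(t_stat), 'p_t': float(p_t)}

# Synthetic stand-in for df['temperature'] (hypothetical values, not the real data)
synthetic_temps = np.random.default_rng(0).normal(loc=98.6, scale=0.7, size=130)
print(one_sample_tests(synthetic_temps))
```

The two statistics are identical when the sample standard deviation is used. The t-based p-value is slightly larger because the $t$ distribution has heavier tails, and the gap shrinks as the sample size grows.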