# What is the True Normal Human Body Temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. But, is this value statistically correct?

<h3>Exercises</h3>

<p>In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.</p>

<p>Answer the following questions <b>in this notebook below and submit to your Github account</b>.</p> 

<ol>
<li>  Is the distribution of body temperatures normal? 
    <ul>
    <li> Although this is not a requirement for the Central Limit Theorem to hold (read the introduction on Wikipedia's page about the CLT carefully: https://en.wikipedia.org/wiki/Central_limit_theorem), it gives us some peace of mind that the population may also be normally distributed if we assume that this sample is representative of the population.
    <li> Think about the way you're going to check for the normality of the distribution. Graphical methods are usually used first, but there are also other ways: https://en.wikipedia.org/wiki/Normality_test
    </ul>
<li>  Is the sample size large? Are the observations independent?
    <ul>
    <li> Remember that this is a condition for the Central Limit Theorem, and hence the statistical tests we are using, to apply.
    </ul>
<li>  Is the true population mean really 98.6 degrees F?
    <ul>
    <li> First, try a bootstrap hypothesis test.
    <li> Now, let's try frequentist statistical testing. Would you use a one-sample or two-sample test? Why?
    <li> In this situation, is it appropriate to use the $t$ or $z$ statistic? 
    <li> Now try using the other test. How is the result be different? Why?
    </ul>
<li>  Draw a small sample of size 10 from the data and repeat both frequentist tests. 
    <ul>
    <li> Which one is the correct one to use? 
    <li> What do you notice? What does this tell you about the difference in application of the $t$ and $z$ statistic?
    </ul>
<li>  At what temperature should we consider someone's temperature to be "abnormal"?
    <ul>
    <li> As in the previous example, try calculating everything using the boostrap approach, as well as the frequentist approach.
    <li> Start by computing the margin of error and confidence interval. When calculating the confidence interval, keep in mind that you should use the appropriate formula for one draw, and not N draws.
    </ul>
<li>  Is there a significant difference between males and females in normal temperature?
    <ul>
    <li> What testing approach did you use and why?
    <li> Write a story with your conclusion in the context of the original problem.
    </ul>
</ol>

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
****

In [1]:
import pandas as pd
import numpy as np
from scipy import stats
import plotly as py
import plotly.graph_objs as go
import plotly.figure_factory as ff

py.offline.init_notebook_mode(connected=True)
df = pd.read_csv('data/human_body_temperature.csv')

In [2]:
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 130 entries, 0 to 129
Data columns (total 3 columns):
temperature    130 non-null float64
gender         130 non-null object
heart_rate     130 non-null float64
dtypes: float64(2), object(1)
memory usage: 3.1+ KB


Unnamed: 0,temperature,gender,heart_rate
0,99.3,F,68.0
1,98.4,F,81.0
2,97.8,M,73.0
3,99.2,F,66.0
4,98.0,F,73.0


In [3]:
unique_temps = df.temperature.unique()
unique_temps.sort()
print(unique_temps)
print(len(unique_temps))

[ 96.3  96.4  96.7  96.8  96.9  97.   97.1  97.2  97.3  97.4  97.5  97.6
  97.7  97.8  97.9  98.   98.1  98.2  98.3  98.4  98.5  98.6  98.7  98.8
  98.9  99.   99.1  99.2  99.3  99.4  99.5  99.9 100.  100.8]
34


***
#### 1.) Is the distribution of body temperatures normal?

In [4]:
def ecdf(data):
    """Computes the Empirical Cummulative Distribution Function of a 1-dimensional numerical array"""
    n = len(data)
    x = np.sort(data)
    y = np.arange(1, n + 1) / n
    return x, y

In [5]:
x_temp, y_temp = ecdf(df.temperature)

sample_mean = np.mean(df.temperature)
sample_stdv = np.std(df.temperature)
samps_nrm = np.random.normal(sample_mean, sample_stdv, 10000)
x_nrml, y_nrml = ecdf(samps_nrm)
x_mean, y_mean = np.array([[sample_mean, sample_mean], [-0.05, 1.05]])


trace0 = go.Histogram(name="Actual Samples", x=x_temp, histnorm="probability")
trace1 = go.Scatter(name="Actual Samples", x=x_temp, y=y_temp, mode="markers", marker=dict(color="rgba(31,119,180,1)"))
trace2 = go.Scatter(name="Normal Distribution", x=x_nrml, y=y_nrml, mode="line")
trace3 = go.Scatter(x=x_mean, y=y_mean, mode="lines", line=dict(color="red"), name="Actual Sample Mean")

fig = py.tools.make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.01)
fig.append_trace(trace0, 1, 1)
fig.append_trace(trace1, 2, 1)
fig.append_trace(trace2, 2, 1)
fig.append_trace(trace3, 2, 1)

fig["layout"].update(
    plot_bgcolor="rgb(247,247,247)",
    legend=dict(font=dict(family="serif", size=12)),
    height=600,
    title="<b>Human Body Temperature Sample Distribution</b>",
    titlefont=dict(family="serif", size=24),
    yaxis1=dict(title="<b>PDF</b>", titlefont=dict(family="serif", size=14),
               tickfont=dict(family="serif", size=14)), 
    yaxis2=dict(title="<b>ECDF</b>", titlefont=dict(family="serif", size=14), dtick=0.2,
               tickfont=dict(family="serif", size=14), range=[-0.05, 1.05]),
    xaxis1=dict(title="<b>Human Body Temperature, {0}F</b>".format(u'\xb0'), dtick=0.5,
               titlefont=dict(family="serif", size=14), tickfont=dict(family="serif", size=14)))

py.offline.iplot(fig)

This is the format of your plot grid:
[ (1,1) x1,y1 ]
[ (2,1) x1,y2 ]



With some quick, graphical exploratory data analysis (EDA) the human body temperature sample distribution appears to be normally distributed.

***
#### 2.) Is the sample size large? Are the observations independent?

Yes, the sample size can be considered large enough (n >= 30) for statistical inference because there are 130 samples (n = 130). Yes, the samples can be considered as independent observations without replacement because the sample size is well below 10% of the population being considered (all humans).

***
#### 3.) Is the true population mean really 98.6 degrees F?
<ul>
    *<li> First, try a bootstrap hypothesis test.*
    *<li> Now, let's try frequentist statistical testing. Would you use a one-sample or two-sample test? Why?*
    *<li> In this situation, is it appropriate to use the $t$ or $z$ statistic?* 
    *<li> Now try using the other test. How is the result be different? Why?*
</ul>

We need to start this statistical inference analysis by setting up some hypothesis testing parameters. The parameters I used are as follows:  
  
**Null Hypothesis:** The Population Mean Human Body Temperature is equal to 98.6 degrees F (mu = 98.6)  
**Alternative Hypothesis:** The Population Mean Human Body Temperature is NOT equal to 98.6 degrees F (mu != 98.6)  
**Confidence Interval:** 95%  
**Signifigence Level:** 5%  

In [6]:
def draw_bs_replicates(data, func, size=1):
    """Computes the bootstrap replicate of a 1-dimensional numerical array"""
    bs_replicates = np.empty(size)
    
    for i in range(size):
        bs_sample = np.random.choice(data, size=len(data))
        bs_replicates[i] = func(bs_sample)
        
    return bs_replicates

In [7]:
# Bootstrap Replication Statistical Analysis
mu0 = 98.6
bs_means = draw_bs_replicates(df.temperature, np.mean, 1000)
x_bsrp, y_bsrp = ecdf(bs_means)
x_mu0, y_mu0 = np.array([[mu0, mu0], [-0.05, 1.05]])

trace0 = go.Histogram(name="Bootstrap Means", x=x_bsrp, histnorm="probability")
trace1 = go.Scatter(x=x_bsrp, y=y_bsrp, mode="markers", name="Bootstrap Means", marker=dict(color="rgba(31, 119, 180, 1)"))
trace2 = go.Scatter(x=x_mean, y=y_mean, mode="lines", line=dict(color="rgba(31, 119, 180, 1)"), name="Actual Sample Mean")
trace3 = go.Scatter(name="Null Hypothesis Mean", x=x_mu0, y=y_mu0, mode="lines")

fig = py.tools.make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.01)
fig.append_trace(trace0, 1, 1)
fig.append_trace(trace1, 2, 1)
fig.append_trace(trace2, 2, 1)
fig.append_trace(trace3, 2, 1)

fig["layout"].update(
    plot_bgcolor="rgb(247,247,247)",
    legend=dict(font=dict(family="serif", size=12)),
    height=600,
    title="<b>Bootstrap Replication of Human Body Temperature Means</b>",
    titlefont=dict(family="serif", size=24),
    yaxis1=dict(title="<b>PDF</b>", titlefont=dict(family="serif", size=14),
               tickfont=dict(family="serif", size=14)), 
    yaxis2=dict(title="<b>ECDF</b>", titlefont=dict(family="serif", size=14), dtick=0.2,
               tickfont=dict(family="serif", size=14), range=[-0.05, 1.05]),
    xaxis1=dict(title="<b>Human Body Temperature, {0}F</b>".format(u'\xb0'), dtick=0.1,
               titlefont=dict(family="serif", size=14), tickfont=dict(family="serif", size=14), range=[98, 98.7]))

py.offline.iplot(fig)

print("Bootstrap Replication Statistics for mu=98.6{0}F:".format(u'\xb0'))

bs_conf_int = np.percentile(bs_means, [2.5, 97.5])
bs_moe = (bs_conf_int[1] - bs_conf_int[0]) / 2
print("     The margin of error is {0}".format(u"\u00B1") + str(round(bs_moe, 3)) + "{0}F.".format(u'\xb0'))
print("     The 95% confidence interval is " + str(round(bs_conf_int[0], 3)) + "{0}F to ".format(u"\xb0") + str(round(bs_conf_int[1], 3)) + "{0}F.".format(u"\xb0"))

bs_p = np.sum(bs_means >= mu0) / len(bs_means) * 2 # two-tail p-value
print("     The p-value is " + str(round(bs_p * 100, 2)) + "%.")

This is the format of your plot grid:
[ (1,1) x1,y1 ]
[ (2,1) x1,y2 ]



Bootstrap Replication Statistics for mu=98.6°F:
     The margin of error is ±0.124°F.
     The 95% confidence interval is 98.122°F to 98.371°F.
     The p-value is 0.0%.


The **bootstrap replication hypothesis test** above suggests that we reject the null hypothesis and support the alternative hypothesis that the mean human body temperature is NOT 98.6 degrees F when considering the provided sample data.

I have used a one-sample inference test for the **frequentist statistical analysis**  because we have been provided the historical mean human body temperature (98.6 degrees F) which represents the population mean (mu) and a sample set of human body temperature measurements where we can extract the sample mean (x-bar) and the sample standard deviation (Sx).  

In the situation of analyzing statictical inference about the population mean, I'm choosing to use the t-statistic because the standard deviation of the population (sigma) is unknown.

In [8]:
# Frequentist Statistical Analysis
print("Frequentist Statistics for mu=98.6{0}F using a t-statistic:".format(u'\xb0'))

n = len(df.temperature)
t_star = stats.t.ppf(0.975, n - 1)
pop_stdv = (np.var(df.temperature) / n) ** 0.5
moe = t_star * pop_stdv
conf_int = np.array([sample_mean - moe, sample_mean + moe])

print("     The margin of error is {0}".format(u"\u00B1") + str(round(moe, 3)) + "{0}F.".format(u'\xb0'))
print("     The 95% confidence interval is " + str(round(conf_int[0], 3)) + "{0}F to ".format(u'\xb0') + str(round(conf_int[1], 3)) + "{0}F.".format(u'\xb0'))

t = (sample_mean - mu0) / pop_stdv
t_p = stats.t.sf(np.abs(t), n - 1) * 2 # two-sided p-value

print("     The p-value is " + str(round(t_p * 100, 2)) + "%.")

Frequentist Statistics for mu=98.6°F using a t-statistic:
     The margin of error is ±0.127°F.
     The 95% confidence interval is 98.122°F to 98.376°F.
     The p-value is 0.0%.


The **frequentist statistical hypothesis testing** above also suggests that we reject the null hypothesis and support the alternative hypothesis that the mean human body temperature is NOT 98.6 degrees F when considering the provided sample data.

In [9]:
# Frequentist Statistical Analysis
print("Frequentist Statistics for mu=98.6{0}F using a z-statistic:".format(u'\xb0'))

z_star = stats.norm.ppf(0.975) # two-tailed probability around 95% confidence has 2.5% on each side of the mean
pop_stdv = (np.var(df.temperature) / n) ** 0.5
moe = z_star * pop_stdv
conf_int = np.array([sample_mean - moe, sample_mean + moe])

print("     The margin of error is {0}".format(u"\u00B1") + str(round(moe, 3)) + "{0}F.".format(u'\xb0'))
print("     The 95% confidence interval is " + str(round(conf_int[0], 3)) + "{0}F to ".format(u'\xb0') + str(round(conf_int[1], 3)) + "{0}F.".format(u'\xb0'))

z = (sample_mean - mu0) / pop_stdv
z_p = stats.norm.sf(np.abs(z)) * 2 # two-sided p-value

print("     The p-value is " + str(round(z_p * 100, 2)) + "%.")

Frequentist Statistics for mu=98.6°F using a z-statistic:
     The margin of error is ±0.126°F.
     The 95% confidence interval is 98.124°F to 98.375°F.
     The p-value is 0.0%.


If we approximate the z-statistic equal to the t-statistic and then evaluate the probability, then the probability decreases as the z-probability distribution (the standard normal probability distribution) has a tighter probability distribuiton around the mean than the t-probability distribution.

***
#### 4.) Draw a small sample of size 10 from the data and repeat both frequentist tests.
<ul>
    *<li> Which one is the correct one to use?*
    *<li> What do you notice? What does this tell you about the difference in application of the $t$ and $z$ statistic?*
</ul>

In [10]:
print("Frequentist Statistics for mu=98.6{0}F using a t-statistic and a small sample size (n=10):".format(u'\xb0'))

small_n = 10
small_sample = np.random.choice(df.temperature, small_n)
small_sample_mean = np.mean(small_sample)

t_star = stats.t.ppf(0.975, n - 1)
small_pop_stdv = (np.var(small_sample) / n) ** 0.5
moe = t_star * small_pop_stdv
conf_int = np.array([small_sample_mean - moe, small_sample_mean + moe])

print("     The margin of error is {0}".format(u"\u00B1") + str(round(moe, 3)) + "{0}F.".format(u'\xb0'))
print("     The 95% confidence interval is " + str(round(conf_int[0], 3)) + "{0}F to ".format(u'\xb0') + str(round(conf_int[1], 3)) + "{0}F.".format(u'\xb0'))

t = (small_sample_mean - mu0) / small_pop_stdv
t_p = stats.t.sf(np.abs(t), small_n - 1) * 2 # two-sided p-value

print("     The p-value is " + str(round(t_p * 100, 2)) + "%.")

print("Frequentist Statistics for mu=98.6{0}F using a z-statistic and a small sample size (n=10):".format(u'\xb0'))

z_star = stats.norm.ppf(0.975)
small_pop_stdv = (np.var(small_sample) / n) ** 0.5
moe = z_star * small_pop_stdv
conf_int = np.array([small_sample_mean - moe, small_sample_mean + moe])

print("     The margin of error is {0}".format(u"\u00B1") + str(round(moe, 3)) + "{0}F.".format(u'\xb0'))
print("     The 95% confidence interval is " + str(round(conf_int[0], 3)) + "{0}F to ".format(u'\xb0') + str(round(conf_int[1], 3)) + "{0}F.".format(u'\xb0'))

z = t
z_p = stats.norm.sf(abs(z)) * 2 # two-sided p-value

print("     The p-value is " + str(round(z_p * 100, 2)) + "%.")

Frequentist Statistics for mu=98.6°F using a t-statistic and a small sample size (n=10):
     The margin of error is ±0.176°F.
     The 95% confidence interval is 98.024°F to 98.376°F.
     The p-value is 0.15%.
Frequentist Statistics for mu=98.6°F using a z-statistic and a small sample size (n=10):
     The margin of error is ±0.175°F.
     The 95% confidence interval is 98.025°F to 98.375°F.
     The p-value is 0.0%.


The correct statistic to use is the t-static because this small sample set is less than 30 (n < 30) and we don't know the standard deviation of the population (sigma). The z-statistic is going to result in a smaller probability which could falsely reject the null hypothesis.

***
#### 5.) At what temperature should we consider someone's temperature to be "abnormal"?
<ul>
    *<li> As in the previous example, try calculating everything using the boostrap approach, as well as the frequentist approach.*
    *<li> Start by computing the margin of error and confidence interval. When calculating the confidence interval, keep in mind that you should use the appropriate formula for one draw, and not N draws.*
</ul>

In [11]:
bs_means = draw_bs_replicates(df.temperature, np.mean, 1000)
bs_conf_int = np.percentile(bs_means, [2.5, 97.5])

print("The bootstrap replicated mean 95% confidence interval is " + str(bs_conf_int))

t_95_129 = stats.t.ppf(1 - 0.025, len(df.temperature) - 1) # two-tail for 95% confidence interval
conf_int = np.array([sample_mean - (t_95_129 * sample_stdv / (len(df.temperature) ** 0.5)),
                     sample_mean + (t_95_129 * sample_stdv / (len(df.temperature) ** 0.5))])

print("The frequentist mean 95% confidence interval utilizing the t-statistic is " + str(conf_int))

The bootstrap replicated mean 95% confidence interval is [98.11844231 98.36846154]
The frequentist mean 95% confidence interval utilizing the t-statistic is [98.12249319 98.37596835]


According to the resulting 95% confidence interval(s) obtained above from the provided sample data, an "abnormal" human body temperature could be any temperature below 98.1 and above 98.4 degrees F.

***
#### 6.) Is there a significant difference between males and females in normal temperature?</h4>
<ul>
    *<li> What testing approach did you use and why?*
    *<li> Write a story with your conclusion in the context of the original problem.*
</ul>

We need to start this statistical inference analysis by setting up some hypothesis testing parameters. The parameters I used are as follows:  
  
**Null Hypothesis:** The difference between Male and Female Population Mean Human Body Temperatures is equal to 0 degrees F  (mu1 = mu2)  
**Alternative Hypothesis:** The diffence between Male and Female Population Mean Human Body Temperatures is NOT equal to 0 degrees F  (mu1 != mu2)  
**Confidence Interval:** 95%  
**Signifigence Level:** 5%  

In [12]:
# Plot 1
male_samples = df.temperature[df.gender == "M"]
female_samples = df.temperature[df.gender == "F"]
male_mean = np.mean(male_samples)
female_mean = np.mean(female_samples)

x_male, y_male = ecdf(male_samples)
x_female, y_female = ecdf(female_samples)
x_mmean, y_mmean = np.array([[male_mean, male_mean], [-0.05, 1.05]])
x_fmean, y_fmean = np.array([[female_mean, female_mean], [-0.05, 1.05]])

trace0 = go.Histogram(name="Male Samples", x=male_samples, histnorm="probability", 
                      opacity=0.75, nbinsx=50)
trace1 = go.Histogram(name="Female Samples", x=female_samples, histnorm="probability", 
                      opacity=0.75, nbinsx=50)
trace2 = go.Scatter(name="Male Samples", x=x_male, y=y_male, mode="markers", 
                    marker=dict(color="rgba(87, 153, 199, 1)"))
trace3 = go.Scatter(name="Female Samples", x=x_female, y=y_female, mode="markers",
                    marker=dict(color="rgba(255, 159, 74, 1)"))
trace4 = go.Scatter(name="Male Sample Mean", x=x_mmean, y=y_mmean, mode="lines",
                    line=dict(color="rgba(87, 153, 199, 1)"))
trace5 = go.Scatter(name="Female Sample Mean", x=x_fmean, y=y_fmean, mode="lines",
                    line=dict(color="rgba(255, 159, 74, 1)"))

fig = py.tools.make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.01)
fig.append_trace(trace0, 1, 1)
fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 2, 1)
fig.append_trace(trace3, 2, 1)
fig.append_trace(trace4, 2, 1)
fig.append_trace(trace5, 2, 1)

fig["layout"].update(
    plot_bgcolor="rgb(247,247,247)",
    legend=dict(font=dict(family="serif", size=12)),
    height=600,
    title="<b>Human Body Temperature Sample Distribution by Gender</b>",
    titlefont=dict(family="serif", size=24),
    yaxis1=dict(title="<b>PDF</b>", titlefont=dict(family="serif", size=14),
               tickfont=dict(family="serif", size=14)), 
    yaxis2=dict(title="<b>ECDF</b>", titlefont=dict(family="serif", size=14), dtick=0.2,
               tickfont=dict(family="serif", size=14), range=[-0.05, 1.05]),
    xaxis1=dict(title="<b>Human Body Temperature, {0}F</b>".format(u'\xb0'), dtick=0.5,
               titlefont=dict(family="serif", size=14), tickfont=dict(family="serif", size=14)))

py.offline.iplot(fig)

# Plot 2
bs_mmean = draw_bs_replicates(male_samples, np.mean, 1000)
bs_fmean = draw_bs_replicates(female_samples, np.mean, 1000)
bs_mean_diff = bs_fmean - bs_mmean

x_bsdf, y_bsdf = ecdf(bs_mean_diff)
x_bsdm, y_bsdm = np.array([[np.mean(bs_mean_diff), np.mean(bs_mean_diff)], [-0.05, 1.05]])
x_d0, y_d0 = np.array([[0, 0], [-0.05, 1.05]])

trace0 = go.Histogram(name="Sample Difference", x=bs_mean_diff, histnorm="probability")
trace1 = go.Scatter(name="Sample Difference", x=x_bsdf, y=y_bsdf, mode="markers", marker=dict(color="rgba(31, 119, 180, 1)"))
trace2 = go.Scatter(name="Mean Sample Difference", x=x_bsdm, y=y_bsdm, mode="lines", line=dict(color="rgba(31, 119, 180, 1)"))
trace3 = go.Scatter(name="Null Hypothesis Mean Diff", x=x_d0, y=y_d0, mode="lines")

fig = py.tools.make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.01)
fig.append_trace(trace0, 1, 1)
fig.append_trace(trace1, 2, 1)
fig.append_trace(trace2, 2, 1)
fig.append_trace(trace3, 2, 1)

fig["layout"].update(
    plot_bgcolor="rgb(247,247,247)",
    legend=dict(font=dict(family="serif", size=12)),
    height=600,
    title="<b>Bootstrap Human Body Temperature Sample Difference by Gender</b>",
    titlefont=dict(family="serif", size=24),
    yaxis1=dict(title="<b>PDF</b>", titlefont=dict(family="serif", size=14),
               tickfont=dict(family="serif", size=14)), 
    yaxis2=dict(title="<b>ECDF</b>", titlefont=dict(family="serif", size=14), dtick=0.2,
               tickfont=dict(family="serif", size=14), range=[-0.05, 1.05]),
    xaxis1=dict(title="<b>Difference in Human Body Temperature, {0}F</b>".format(u'\xb0'),
               titlefont=dict(family="serif", size=14), tickfont=dict(family="serif", size=14)))

py.offline.iplot(fig)

# Bootstrap Replication Statistical Analysis
print("Bootstrap Replication Statistics for mu1-mu2:")

bs_conf_int = np.percentile(bs_mean_diff, [2.5, 97.5])
bs_moe = (bs_conf_int[1] - bs_conf_int[0]) / 2
print("     The margin of error is {0}".format(u"\u00B1") + str(round(bs_moe, 4)) + "{0}F.".format(u'\xb0'))
print("     The 95% confidence interval is " + str(round(bs_conf_int[0], 4)) + "{0}F to ".format(u"\xb0") + str(round(bs_conf_int[1], 4)) + "{0}F.".format(u"\xb0"))

d0 = 0
bs_p = np.sum(bs_mean_diff <= d0) / len(bs_mean_diff) * 2 # two-tail p-value
print("     The p-value is " + str(round(bs_p * 100, 2)) + "%.")

# Frequentist Statistical Analysis
print()
print("Frequentist Statistics for mu1-mu2 using a z-statistic:")

# The margin of error for a 95% confidence interval
n = 65 # both probability batches have the same number of trials n1 = n2 = n
z_star = stats.norm.ppf(0.975) # two-tail 95% interval has 2.5% on both sides
sample_diff_stdv = (np.var(male_samples) / len(male_samples) + np.var(female_samples) / len(female_samples)) ** 0.5
moe = z_star * sample_diff_stdv

print("     The margin of error is {0}".format(u"\u00B1") + str(round(moe, 4)) + "{0}F.".format(u'\xb0'))

# The 95% confidence interval
sample_mean_diff = abs(male_mean - female_mean)
conf_int = np.array([sample_mean_diff - moe, sample_mean_diff + moe])

print("     The 95% confidence interval is " + str(round(conf_int[0], 4)) + "{0}F to ".format(u'\xb0') + str(round(conf_int[1], 4)) + "{0}F.".format(u'\xb0'))

# The p-value
z = (sample_mean_diff - d0) / sample_diff_stdv
z_p = stats.norm.sf(abs(z)) * 2 # two-tail p-value

print("     The p-value is " + str(round(z_p * 100, 2)) + "%.")

This is the format of your plot grid:
[ (1,1) x1,y1 ]
[ (2,1) x1,y2 ]



This is the format of your plot grid:
[ (1,1) x1,y1 ]
[ (2,1) x1,y2 ]



Bootstrap Replication Statistics for mu1-mu2:
     The margin of error is ±0.2523°F.
     The 95% confidence interval is 0.0369°F to 0.5416°F.
     The p-value is 1.8%.

Frequentist Statistics for mu1-mu2 using a z-statistic:
     The margin of error is ±0.2461°F.
     The 95% confidence interval is 0.0431°F to 0.5354°F.
     The p-value is 2.13%.


The resulting statistical inference test(s) above suggests that we *reject the null hypothesis* and that there may be a significant difference between the normal body temperature of males and females.