# The Mean and Standard Deviation of a Normal Distribution
\begin{equation}\newcommand{\prob}{\mathcal{P}}\end{equation}
In this notebook, you are going to explore how the *normal* probability distribution $\prob(x)$ is characterized by the mean,
\begin{equation}
\newcommand{\mean}[1]{\langle#1\rangle}
\newcommand{\dif}{\mathrm{d}}
\mean{x} = \int x\prob(x)\,\dif x,
\end{equation}
and the variance, whose square root is the standard deviation,
\begin{equation}
\mean{(x-\mean{x})^2} = \mean{x^2}-\mean{x}^2   
\end{equation}
with
\begin{equation}
\mean{x^2} = \int x^2\prob(x)\,\dif x.
\end{equation}
For a normal distribution,
\begin{equation}
\prob(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left[-\frac{\left(x-\mu\right)^{2}}{2\sigma^{2}}\right],
\end{equation}
you can show that $\mean{x} = \mu$ and $\mean{x^2}-\mean{x}^2 = \sigma^2$.
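These two results can be checked numerically before doing the integrals by hand. The following is a minimal sketch, assuming illustrative values $\mu=1.5$ and $\sigma=0.7$ (they are not taken from the text) and truncating the integrals at $\mu\pm10\sigma$, where the integrand is negligible. It uses `scipy.integrate.quad` to evaluate $\mean{x}$ and $\mean{x^2}$.

```python
from scipy.integrate import quad
from scipy.stats import norm

# Illustrative parameters (an assumption for this sketch, not from the text)
mu, sigma = 1.5, 0.7

def pdf(x):
    """Normal probability density with mean mu and standard deviation sigma."""
    return norm.pdf(x, loc=mu, scale=sigma)

# Truncate the infinite range at mu +/- 10 sigma; the tails beyond are negligible
lo, hi = mu - 10*sigma, mu + 10*sigma

mean_x, _ = quad(lambda x: x*pdf(x), lo, hi)       # <x>
mean_x2, _ = quad(lambda x: x**2*pdf(x), lo, hi)   # <x^2>
variance = mean_x2 - mean_x**2                     # <x^2> - <x>^2

print('mean     =', mean_x, ' (compare with mu =', mu, ')')
print('variance =', variance, ' (compare with sigma^2 =', sigma**2, ')')
```

Changing `mu` and `sigma` and rerunning is a quick way to convince yourself that the relations hold in general.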

To begin, we'll set up our work environment and import the `norm` distribution from `scipy.stats`.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.stats import norm

We are going to make several different plots, so let's put all of the steps for doing that in a single function `plot_one`, which is in the following cell.

The line
```python
    d = norm(loc=mu,scale=sigma)
```
means that we make an *object* named `d` that holds a description of our normal distribution.  The keywords `loc` and `scale` set the mean and standard deviation of this distribution.
We next set up an array of `npts=100` values from $x_\min=\mu-5\sigma$ to $x_\max=\mu+5\sigma$ and define $p$ to be the values of the normal distribution at these points.
```python
    x = np.linspace(xmin,xmax,npts)
    p = d.pdf(x)
```
We then make a simple plot.

The last line
```python
    return d
```
means that the routine returns the object `d` for use.

Thus if we write
```python
    d = plot_one(0.0,1.0)
```
we will get a plot of a normal distribution with $\mu=0$, $\sigma=1$.
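The object that `norm(loc=..., scale=...)` returns can also be queried directly, without plotting. The sketch below uses a separate, hypothetical object `d_example` with $\mu=2$ and $\sigma=0.5$ (values chosen only for illustration) and calls a few of its methods.

```python
from scipy.stats import norm

# A distribution object with illustrative parameters (not from the text)
d_example = norm(loc=2.0, scale=0.5)

print('mean               :', d_example.mean())
print('standard deviation :', d_example.std())
print('variance           :', d_example.var())
# height of the distribution at its mean
print('pdf at the mean    :', d_example.pdf(d_example.mean()))
```

Because the parameters are stored in the object, none of these calls needs `loc` or `scale` to be repeated.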

In [None]:
def plot_one(mu,sigma):
    d = norm(loc=mu,scale=sigma)
    xmin = mu - 5*sigma
    xmax = mu + 5*sigma
    npts = 100
    x = np.linspace(xmin,xmax,npts)
    p = d.pdf(x)
    plt.xlim(xmin,xmax)
    plt.plot(x,p,color='black')
    plt.xlabel('x')
    plt.ylabel('p(x)')
    return d

In [None]:
d = plot_one(0.0,1.0)

This is good.  Let's mark the mean of the distribution on the plot.

In [None]:
d = plot_one(0.0,1.0)
# the mean of the distribution
xmean = d.mean()
# height of the distribution at the mean
pmean = d.pdf(xmean)
plt.vlines(xmean,0,pmean,color='red',linestyle='dotted')
# the label is passed positionally: the keyword `s` was renamed to `text` in newer Matplotlib releases
plt.annotate('mean: {0:4.1f}'.format(xmean),xy=(xmean,pmean),xytext=(15,0),xycoords='data',
            textcoords='offset points',ha='left',va='top')

Now let's explore how the standard deviation is related to the width of the distribution.  We'll define a new function, which makes the plot and then shades in the region $\mu-N\sigma$ to $\mu+N\sigma$, where $N$ is an argument of the function.

In [None]:
def plot_one_with_sigma(mu,sigma,N):
    d = plot_one(mu,sigma)
    x = np.linspace(mu-N*sigma,mu+N*sigma,100)
    plt.fill_between(x, 0.0, d.pdf(x), facecolor='0.6',edgecolor='none')
    return d

Let's try it for $N=1$.

In [None]:
d = plot_one_with_sigma(0.0,1.0,1)

Now let's make $\mu=4$, $\sigma=1$, and shade the region $\mu\pm2\sigma$.

In [None]:
d = plot_one_with_sigma(4.0,1.0,2)

Another convenient measure of the width is the *full width at half maximum* (FWHM):

1. Draw a horizontal line halfway between 0 and the maximum value of the distribution.

2. The width of the region in which this horizontal line is under the curve is the FWHM.
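The two steps above can also be carried out numerically. The following is a minimal sketch, assuming a hypothetical helper `fwhm_numeric` (not part of this notebook or of SciPy) that uses `scipy.optimize.brentq` to find where the curve crosses the half-maximum line on each side of the mean.

```python
from scipy.optimize import brentq
from scipy.stats import norm

def fwhm_numeric(dist):
    """Estimate the FWHM of a frozen normal distribution `dist` by root finding."""
    mu, sigma = dist.mean(), dist.std()
    # step 1: the half-maximum level (the maximum of a normal pdf is at the mean)
    half_max = 0.5*dist.pdf(mu)

    def above_half(x):
        # positive where the curve is above the half-maximum line
        return dist.pdf(x) - half_max

    # step 2: locate the crossing on each side of the mean
    x_left = brentq(above_half, mu - 5*sigma, mu)
    x_right = brentq(above_half, mu, mu + 5*sigma)
    return x_right - x_left

sigma_example = 1.0
width = fwhm_numeric(norm(loc=4.0, scale=sigma_example))
print('FWHM / sigma =', width/sigma_example)
```

Trying a few different values of `scale` shows whether the ratio printed depends on $\sigma$.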

In [None]:
d = plot_one_with_sigma(4.0,1.0,1)
half_max = 0.5*d.pdf(d.mean())
plt.axhline(half_max,0,1,linestyle=':',color='red')

<i class="fa fa-pencil" style="color:red; font-size:1.5em"></i> **Question:** (edit this cell to provide an answer)  How does the FWHM compare to $\sigma$?

The probability that $x$ lies between two values, call them $x_L$ and $x_H$, is given by the integral over the distribution:
\begin{equation}
p(x_L < x < x_H) = \int_{x_L}^{x_H} \prob(x)\,\dif x.
\end{equation}
Thus, the probability that $x$ is within one standard deviation of the mean, $p(\mu-\sigma < x < \mu + \sigma)$, is the shaded area in the following plot.

In [None]:
d = plot_one_with_sigma(3.0,2.0,1)

The shaded region contains about 68% of the total area under the curve.  In other words, if our measurement error follows a normal distribution and we make many repeated measurements, we expect about 68% of them to lie within one standard deviation of the mean.
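The area can be computed rather than read off the plot. The sketch below evaluates $p(\mu-N\sigma < x < \mu+N\sigma)$ as a difference of the cumulative distribution function, `cdf`, at the two limits, and then compares the $N=1$ value with the fraction of simulated measurements that fall in the same range. The sample size and random seed are arbitrary choices made for this illustration.

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 3.0, 2.0
dist = norm(loc=mu, scale=sigma)

# probability of lying within N standard deviations of the mean:
# the integral of the pdf from x_L to x_H equals cdf(x_H) - cdf(x_L)
p_within = {N: dist.cdf(mu + N*sigma) - dist.cdf(mu - N*sigma) for N in (1, 2, 3)}
for N, p in p_within.items():
    print('within {0} sigma: {1:.4f}'.format(N, p))

# simulate repeated measurements and count how many land within one sigma
rng = np.random.default_rng(seed=0)   # arbitrary seed, for reproducibility only
samples = dist.rvs(size=100_000, random_state=rng)
fraction = np.mean(np.abs(samples - mu) < sigma)
print('simulated fraction within 1 sigma: {0:.4f}'.format(fraction))
```

The simulated fraction fluctuates from one seed to the next, but it settles toward the computed probability as the number of samples grows.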