In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats

In [None]:
def gaussian(x, mu=0., sigma=1.):
    return np.exp(-((x-mu)/(np.sqrt(2)*sigma))**2)/(sigma*np.sqrt(2*np.pi))

# Gaussian distribution, aka Normal Distribution, aka Bell Curve

A Gaussian distribution is a distribution defined by:

$G(x | \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x - \mu)^2}{2\sigma^2}}$

Where the notation $G(x | \mu, \sigma)$ means that:

The function $G$ depends on $x$, $\mu$ and $\sigma$.

$x$ is what we sometimes call the "independent variable", while $\mu$ and $\sigma$ are sometimes called "parameters".  Basically, each set of values of $\mu$ and $\sigma$ defines a different curve.



If we take $\mu = 0$ and $\sigma = 1$ then the formula simplifies somewhat:

$G(x | \mu=0, \sigma=1) = \frac{1}{1\sqrt{2\pi}}e^{-\frac{(x - 0)^2}{2 \cdot 1^2}} = \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}$

Let's plot that.  We are going to compute the curve two ways, once with the 'gaussian' function defined above (which implements the equation I wrote out) and once with the 'scipy.stats.norm' function, and then compare the two.

In [None]:
x_vals = np.linspace(-6, 6, 601)
y_vals_check = gaussian(x_vals, mu=0, sigma=1)
# Note that the arguments have different names, and the function is called slightly differently
y_vals = stats.norm(loc=0, scale=1).pdf(x_vals)

In [None]:
_ = plt.plot(x_vals, y_vals, label="stats")
_ = plt.plot(x_vals, y_vals_check, label="func")
_ = plt.legend()
_ = plt.title("My first Gaussian")
_ = plt.xlabel(r'$x$')
_ = plt.ylabel(r'$G(x, \mu=0, \sigma=1)$')
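
If the two curves agree, they lie on top of each other in the plot, so only one line is visible. A numerical comparison makes the agreement explicit. This is a minimal sketch that repeats the definitions from the cells above so that it runs on its own; using `np.allclose` for the comparison is my choice here, not something the cells above already do.

```python
import numpy as np
import scipy.stats as stats

def gaussian(x, mu=0., sigma=1.):
    # Same formula as the function defined at the top of the notebook
    return np.exp(-((x - mu) / (np.sqrt(2) * sigma))**2) / (sigma * np.sqrt(2 * np.pi))

x_vals = np.linspace(-6, 6, 601)
y_vals_check = gaussian(x_vals, mu=0, sigma=1)
y_vals = stats.norm(loc=0, scale=1).pdf(x_vals)

# True if the two arrays agree to within floating-point tolerance
curves_agree = np.allclose(y_vals, y_vals_check)
print("Do the two curves agree?", curves_agree)
```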

Some properties of the Gaussian we can infer just by examining the formula.

1. Since $\frac{(x-\mu)^2}{2\sigma^2}$ is never negative, and there is a minus sign in front of it, the term in the exponent is always zero or negative.  That means the maximum occurs when that term is zero, i.e., when $x = \mu$.  Thus, $\mu$ gives the location of the peak of the distribution.
2. Since $\frac{(x - \mu)^2}{2\sigma^2}$ is symmetric about $\mu$, the distribution is symmetric about $\mu$.  I.e., $G(x = \mu + a, \mu, \sigma) = G(x = \mu - a, \mu, \sigma)$
3. The distribution is always positive, because $e^{-z} > 0$ for every real number $z$.
4. The distribution goes towards zero pretty quickly once $(x - \mu)^2$ is much bigger than $\sigma^2$, i.e., once $x$ is more than a few $\sigma$ away from $\mu$.

And we can confirm those by looking at the plot. 
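
Beyond looking at the plot, the four properties can also be checked numerically. The sketch below does that for one arbitrary choice of parameters; the values `mu = 1.5`, `sigma = 0.8` and the offset `a = 0.7` are illustrative assumptions, and `gaussian` is repeated so the cell runs on its own.

```python
import numpy as np

def gaussian(x, mu=0., sigma=1.):
    # Same formula as the function defined at the top of the notebook
    return np.exp(-((x - mu) / (np.sqrt(2) * sigma))**2) / (sigma * np.sqrt(2 * np.pi))

mu, sigma = 1.5, 0.8                      # arbitrary illustrative values
x = np.linspace(mu - 6, mu + 6, 1201)     # grid centered on mu
y = gaussian(x, mu=mu, sigma=sigma)

# 1. The maximum sits at x = mu
peak_x = x[np.argmax(y)]
# 2. Symmetry: G(mu + a) equals G(mu - a)
a = 0.7
is_symmetric = np.isclose(gaussian(mu + a, mu, sigma), gaussian(mu - a, mu, sigma))
# 3. Every value on the grid is positive
all_positive = np.all(y > 0)
# 4. Five sigma away from the peak, the curve is a tiny fraction of its peak height
tail_ratio = gaussian(mu + 5 * sigma, mu, sigma) / gaussian(mu, mu, sigma)

print("peak at x =", peak_x)
print("symmetric:", is_symmetric, " positive:", all_positive)
print("G(mu + 5 sigma) / G(mu) =", tail_ratio)
```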

One thing to note:  for $\sigma = 1$ the height of the peak is $\frac{1}{\sqrt{2\pi}} \approx 0.4$.  In general the peak height is $\frac{1}{\sigma\sqrt{2\pi}}$; this factor is there to ensure that the integral of the distribution is 1.  

$\int_{-\infty}^{\infty} G(x, \mu, \sigma) \, dx = 1$
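
The normalization can be checked numerically. The sketch below approximates the integral with a simple sum over a fine grid, for a few values of $\sigma$, and prints the peak height next to $\frac{1}{\sigma\sqrt{2\pi}}$. The grid limits of $\pm 20$ and the particular $\sigma$ values are assumptions chosen for illustration; the grid only needs to be wide enough that the tails contribute essentially nothing.

```python
import numpy as np

def gaussian(x, mu=0., sigma=1.):
    # Same formula as the function defined at the top of the notebook
    return np.exp(-((x - mu) / (np.sqrt(2) * sigma))**2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-20, 20, 40001)   # wide, fine grid with step 0.001
dx = x[1] - x[0]

areas = []
for sigma in [0.5, 1.0, 2.0]:
    y = gaussian(x, mu=0., sigma=sigma)
    area = np.sum(y) * dx         # Riemann-sum approximation of the integral
    areas.append(area)
    print("sigma = %0.1f: area = %0.6f, peak = %0.4f, 1/(sigma*sqrt(2*pi)) = %0.4f"
          % (sigma, area, y.max(), 1. / (sigma * np.sqrt(2 * np.pi))))
```

A narrower curve has to be taller, and a wider curve shorter, for the area to stay at 1.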


In [None]:
for mu in np.linspace(-3, 3, 7):
    _ = plt.plot(x_vals, gaussian(x_vals, mu=mu), label="mu = %0.1f" % mu)

_ = plt.xlabel(r'$x$')
_ = plt.ylabel(r'$G(x, \mu, \sigma=1)$')
_ = plt.legend()

In [None]:
for sigma in np.linspace(0.4, 1.6, 7):
    _ = plt.plot(x_vals, gaussian(x_vals, sigma=sigma), label="sigma = %0.1f" % sigma)

_ = plt.xlabel(r'$x$')
_ = plt.ylabel(r'$G(x, \mu=0, \sigma)$')
_ = plt.legend()

# Why a Gaussian

We are learning about Gaussians because they occur all the time in nature.

In short, a Gaussian distribution is what you get when a lot of independent random effects add together.

We are going to do two different things and show that we get very Gaussian-looking distributions.

1.  We are going to generate 10000 sets of 12 random numbers between 0 and 1, and add each set together.  This will give us 10000 numbers between 0 and 12, and we will see that their distribution looks a lot like a Gaussian with $\mu = 6$ and $\sigma = 1$.

2.  We are going to generate 10000 sets of 1000 random numbers between 0 and 1, and count how many numbers in each set are less than 0.1.   This will give us 10000 numbers between 0 and 1000, and we will see that their distribution looks a lot like a Gaussian with $\mu = 100$ and $\sigma \approx 10$.
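
Where do those values of $\mu$ and $\sigma$ come from?  Here is a short calculation, using two standard facts: means always add, and variances (the squares of the $\sigma$'s) add for independent random numbers.

A single random number drawn uniformly between 0 and 1 has mean $\frac{1}{2}$ and variance $\frac{1}{12}$, so for the sum of 12 of them:

$\mu_{\mathrm{sum}} = 12 \times \frac{1}{2} = 6, \qquad \sigma_{\mathrm{sum}}^2 = 12 \times \frac{1}{12} = 1$

For the counting experiment, each of the $n = 1000$ numbers is less than 0.1 with probability $p = 0.1$, so the count follows a binomial distribution:

$\mu_{\mathrm{count}} = n p = 100, \qquad \sigma_{\mathrm{count}}^2 = n p (1 - p) = 90, \qquad \sigma_{\mathrm{count}} = \sqrt{90} \approx 9.5$

So $\sigma = 1$ is exact for the first experiment, while $\sigma = 10$ is a convenient round number that is close to, but slightly larger than, the true width in the second.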

In [None]:
# This line tells numpy to generate 120000 random numbers between 0 and 1
# then to split them into 10000 groups of 12 
randomNumbers = np.random.uniform(size=120000).reshape(10000, 12)
# This line takes the sum of each group of 12, giving us a total of 10000 numbers
sums = np.sum(randomNumbers, axis=1)
print("Some numbers are ", sums)
print("And we have %i numbers total" % sums.size)

In [None]:
xvals = np.linspace(0,12,121)

# Note.  The Gaussian is defined so that it integrates to 1.  But:
#  1) We generated 10000 numbers
#  2) Our histogram bins are 0.1 units wide.  
# So, to get the height of the curve to match the histogram we need to multiply by a prefactor
prefactor = 10000 * 0.1
_ = plt.hist(sums, bins=xvals)
myGauss = prefactor*stats.norm(loc=6, scale=1).pdf(xvals)
_ = plt.plot(xvals, myGauss, label=r'$G(x, \mu=6, \sigma=1)$')
_ = plt.xlabel(r'Sum of 12 random numbers [$a.u.$]')
_ = plt.ylabel(r'Counts [per $0.1 a.u.$]')
_ = plt.legend()
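
The comparison above is by eye. A rough numerical cross-check is to compute the mean and standard deviation of the sums directly, and to confirm that the prefactor really does scale the Gaussian to the same total as the histogram. This is a sketch under a few assumptions: it uses a seeded generator from `np.random.default_rng` (not the `np.random.uniform` call used above) so that the numbers are reproducible, and it evaluates the Gaussian at the bin centers rather than at the bin edges.

```python
import numpy as np

def gaussian(x, mu=0., sigma=1.):
    # Same formula as the function defined at the top of the notebook
    return np.exp(-((x - mu) / (np.sqrt(2) * sigma))**2) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(42)   # seeded generator (an assumption, for reproducibility)
sums = rng.uniform(size=(10000, 12)).sum(axis=1)

sample_mean = sums.mean()
sample_std = sums.std()
print("sample mean = %0.3f (expect about 6)" % sample_mean)
print("sample std  = %0.3f (expect about 1)" % sample_std)

# Histogram with the same bin edges as the plot: 120 bins, each 0.1 wide
counts, edges = np.histogram(sums, bins=np.linspace(0, 12, 121))
centers = 0.5 * (edges[:-1] + edges[1:])
prefactor = 10000 * 0.1           # number of entries times bin width
expected = prefactor * gaussian(centers, mu=6, sigma=1)
print("total counts = %i, total of scaled Gaussian = %0.1f" % (counts.sum(), expected.sum()))
```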

In [None]:
# This line tells numpy to generate 10000000 random numbers between 0 and 1
# Then to split them into 10000 groups of 1000 each
randomNumbers = np.random.uniform(size=10000000).reshape(10000, 1000)
# Then this line tells numpy to count how many numbers in each group of 1000 are less than 0.1
nPass = np.sum(randomNumbers < 0.1, axis=1)
print("Some numbers are ", nPass)
print("And we have %i numbers total" % nPass.size)

In [None]:
xvals = np.linspace(50,150,101)
_ = plt.hist(nPass, bins=xvals)
# Note.  The Gaussian is defined so that it integrates to 1.  But:
#  1) We generated 10000 numbers
#  2) Our histogram bins are 1.0 units wide.  
# So, to get the height of the curve to match the histogram we need to multiply by 10000
myGauss = 10000*stats.norm(loc=100, scale=10).pdf(xvals)
# Plot the scaled Gaussian once, with a label so that it shows up in the legend
_ = plt.plot(xvals, myGauss, label=r'$G(x, \mu=100, \sigma=10)$')
_ = plt.xlabel(r'Number of values with $x < 0.1$')
_ = plt.ylabel(r'Counts [per number]')
_ = plt.legend()
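
As with the sums, the mean and width of the counts can be checked numerically against what the binomial distribution predicts, $np$ and $\sqrt{np(1-p)}$. This is a minimal sketch; the seeded `np.random.default_rng` generator and the names `n_per_set` and `p` are assumptions introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)    # seeded generator (an assumption, for reproducibility)
n_per_set, p = 1000, 0.1

# 10000 sets of 1000 uniform numbers; count how many in each set are below p
nPass = np.sum(rng.uniform(size=(10000, n_per_set)) < p, axis=1)

expected_mean = n_per_set * p                      # 100
expected_std = np.sqrt(n_per_set * p * (1 - p))    # sqrt(90), about 9.5
sample_mean = nPass.mean()
sample_std = nPass.std()

print("mean: sample = %0.2f, binomial prediction = %0.2f" % (sample_mean, expected_mean))
print("std:  sample = %0.2f, binomial prediction = %0.2f" % (sample_std, expected_std))
```

The sample width should come out a little below 10, which is why the $\sigma = 10$ curve in the plot is a close but not perfect match.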