# Measurements with normally-distributed fluctuations
In this notebook, we explore what happens when we make a number of measurements that follow a *normal* (i.e., Gaussian) distribution with a given mean $\mu$ and standard deviation $\sigma$.

To follow along, click on each cell and then press `shift-return`.  Don't worry if you don't understand all of the code; the point of this exercise is to look at some numerical experiments, not to learn `Python`.

There are several questions for you to answer in this workbook.  Look for the text, "**Double-click in this box and type your answer:**".  Do just that: double-click on the text, and then type your answer in the box.  Don't forget to save your work!  When you are finished, upload your notebook into the dropbox on [D2L](https://d2l.msu.edu/).

## Set up your work environment
First, we need to set up our work environment.  We'll import the `numpy` and `matplotlib.pyplot` modules, and tell ipython that we want our plots to be inline.

In [None]:
from __future__ import print_function, division
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

For this exercise I wrote a module called `measurements`. This module contains a class, called `measurementWithUncertainty`, that supplies random numbers that follow a normal distribution.  That is, if you used this tool to generate a great many random numbers and plotted a histogram, it would appear Gaussian with a certain mean and standard deviation.  By the way, the mean and standard deviation are chosen at random: you can compare your results to those of a classmate to see how different distributions look.

Run the following cell to load the module.

In [None]:
# %load measurements.py
################################################################################
# Edward Brown
# Michigan State University
#
# a simple class to make a measurement with gaussian uncertainties.
#
################################################################################
from __future__ import print_function, division
import numpy as np
import numpy.random as nr

class measurementWithUncertainty(object):
    """
    Upon initialization, generates a normal (gaussian) distribution with a mean 
    in the range [10.0,20.0] and a standard deviation in the range [1.0,2.0].   
    The mean and standard deviation are themselves chosen at random.
    
    Example
    -------
    
        >>> from measurements import measurementWithUncertainty
        >>> d = measurementWithUncertainty()
        made a gaussian distribution with mean  12.492 and std. dev.   1.12
        >>> x = d.make_measurements(10)
        >>> print(x)
        [ 13.85981811  11.70694115  10.28276517  12.41290813  11.7629273
          11.5970737   10.06126323  13.05767109  13.45734073  12.21781776]
        >>> print(d.mean)
        12.4922289309
        >>> print(d.stddev)
        1.12196280663
        
    Here the call to make_measurements(N) generates N measurements chosen 
    randomly from this distribution.
    """
    
    _sample_low = 10.0
    _sample_high = 20.0
    _stddev_bias = 1.0
    
    def __init__(self):
        """
        Sets the mean and standard deviation of our sample.
        """
        a = self._sample_low
        b = self._sample_high
        self.mean = (b-a)*nr.random() + a
        self.stddev = nr.random() + self._stddev_bias
        print('made a gaussian distribution with mean {0:7.3f} and std. dev. {1:6.3}'.\
              format(self.mean,self.stddev))

    def make_measurements(self,n):
        """
        Returns a set of measurements drawn from distribution. Called with  
        argument n, the number of desired measurements.
        """
        return nr.normal(loc=self.mean,scale=self.stddev,size=n)

    @property
    def mean(self):
        """
        Returns the mean of the distribution.  Note that this returns the 
        parameter in the distribution, and *not* the average of a set of 
        numbers drawn from the distribution.
        """
        return self._mean
    
    @mean.setter
    def mean(self,value):
        self._mean = value
    
    @property
    def stddev(self):
        """
        Returns the standard deviation of the distribution. Note that this  
        returns the parameter in the distribution, and *not* the standard 
        deviation of a set of numbers drawn from the distribution.
        """
        return self._stddev
    
    @stddev.setter
    def stddev(self,value):
        self._stddev = value


The code
```python
class measurementWithUncertainty(object):
```
defines a *class* called `measurementWithUncertainty`.  The functions defined inside this class, like `make_measurements`, are its *methods*: they do things with objects of this class.  First, let's create an object.

In [None]:
d = measurementWithUncertainty()

Make a note of the mean and std. dev. of this distribution; you will need them later.

Now that we have a gaussian distribution $d$ with a given mean and standard deviation, let's generate a set of $N = 10$ data points $x_i, i=1,\ldots,N$ from this distribution, and compute their average
\begin{equation*}
\langle x\rangle = \frac{1}{N} \sum_{i=1}^{N} x_i
\end{equation*}
and standard deviation
\begin{equation*}
s =  \left[\frac{1}{N} \sum_{i=1}^{N} \left(x_i - \langle x\rangle\right)^2\right]^{1/2} =  \sqrt{\langle x^2\rangle - \langle x\rangle^2}
\end{equation*}
of these 10 data points.

In [None]:
x = d.make_measurements(10)
xbar = np.average(x)
x2bar = np.average(x**2)
s = np.sqrt(x2bar - xbar**2)
print('mean(x) = {0:7.3f}; std. dev.(x) = {1:7.3f}'.format(xbar,s))
print('compare with mu = {0:7.3f}; sigma = {1:7.3f}'.format(d.mean,d.stddev))

The average and standard deviation computed from our dataset are close, but not equal, to the true mean and standard deviation of our gaussian distribution.
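
As a quick check on the formula for $s$, the sketch below computes it three ways on a small made-up array and confirms that they agree. The array `x_demo` and the `_demo` variable names are assumptions used only for illustration; they are not part of the `measurements` module. Note that `np.std` with its default settings divides by $N$, as the formula here does.

```python
import numpy as np

# Made-up data, used only to illustrate the formula for s
x_demo = np.array([11.0, 12.5, 13.0, 12.0, 14.5])

xbar_demo = np.average(x_demo)

# Root-mean-square deviation from the average
s_direct = np.sqrt(np.average((x_demo - xbar_demo)**2))

# Shortcut form: sqrt(<x^2> - <x>^2)
s_shortcut = np.sqrt(np.average(x_demo**2) - xbar_demo**2)

# numpy's built-in routine; the default (ddof=0) divides by N
s_numpy = np.std(x_demo)

print(s_direct, s_shortcut, s_numpy)
```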

**Exercise:** What would happen to the average and standard deviation of the sample if we repeated the cell above with `x = d.make_measurements(100)`?  What about with 1000 or 10000 measurements?  Predict what would happen, and then try it.  Describe in this box what you find.

## Make many datasets, and compute their statistics
We're now ready to explore how the statistical measures of a dataset relate to the underlying distribution. What we are going to do is repeatedly make datasets of `Nmeasures = 10` measurements—just as before—and take the average and variance of each dataset.

For each dataset we shall also compute
\begin{equation*}
\chi^2 = \sum_{i=1}^{N} \frac{\left(x_i-\mu\right)^2}{\sigma^2}
\end{equation*}
(pronounced "kai-square") where $\mu$ and $\sigma$ are the true mean and standard deviation, which we access with `d.mean` and `d.stddev`.
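
To make the definition of $\chi^2$ concrete, here is a small worked example with made-up numbers. The values `mu_demo`, `sigma_demo`, and `x_demo` are assumptions chosen so the arithmetic is easy to follow by hand; they do not come from the distribution `d`.

```python
import numpy as np

# Assumed "true" parameters, chosen only for illustration
mu_demo = 15.0
sigma_demo = 2.0

# Four made-up measurements
x_demo = np.array([13.0, 15.0, 17.0, 19.0])

# Each term is (x_i - mu)^2 / sigma^2: here 1, 0, 1, and 4
q2_demo = (x_demo - mu_demo)**2 / sigma_demo**2

# chi^2 is the sum of those terms
chi2_demo = np.sum(q2_demo)
print(chi2_demo)   # 6.0
```

Each term measures how far one measurement lies from $\mu$ in units of $\sigma$, squared.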

We will make `Ntrials = 1000` datasets, and we'll examine the distributions of $\langle x\rangle$, $s^2$, and $\chi^2$.  To start, let's first make some arrays — `xbar`, `s2`, and `chi2` — to hold these values.  The `numpy` routine `zeros` will generate such an array and initialize it to zero.  We also will make a giant array `xall` to hold *all* `Ntrials*Nmeasures=10000` of the measurements.

In [None]:
Nmeasures = 10
Ntrials = 1000
xbar = np.zeros(Ntrials)
s2 = np.zeros(Ntrials)
chi2 = np.zeros(Ntrials)
xall = np.zeros(Ntrials*Nmeasures)

Now let's run our trials.  The following block of code is a *loop*.
```python
    for i in range(Ntrials):
        x = d.make_measurements(Nmeasures)
        xbar[i] = np.average(x)
        x2bar = np.average(x**2)
        s2[i] = x2bar-xbar[i]**2
        q2 = (x-d.mean)**2/(d.stddev)**2
        chi2[i] = np.sum(q2)
        xall[i*10:i*10+10] = x
```
The first line,
```python
    for i in range(Ntrials):
```
means that we are going to repeat the following indented lines `Ntrials` times.  The variable `i` is a counter: the first time through the loop, `i = 0`; the next time, `i=1`, and so on until the 1000th time through the loop, when `i = 999`.

The next line
```python
        x = d.make_measurements(Nmeasures)
```     
generates an array, `x`, that contains our set of 10 measurements *for this trial*.
Next we'll compute the average of $x$ and $x^2$, followed by the variance $s^2$.
```python
        xbar[i] = np.average(x)
        x2bar = np.average(x**2)
        s2[i] = x2bar-xbar[i]**2
```
Last, we'll compute an array `q2`, with each element holding the value $(x_i-\mu)^2/\sigma^2$, and then we'll sum up the elements of `q2` to get $\chi^2$ for this trial.
```python
        q2 = (x-d.mean)**2/(d.stddev)**2
        chi2[i] = np.sum(q2)
```
Finally, we'll store these numbers in `xall`.  The line
```python
        xall[i*10:i*10+10] = x
```
means to put the 10 elements of `x` in the slots in `xall` starting from `i*10` to `i*10 + 9`.  So on the first trip through the loop, `x` is stored in `xall[0]...xall[9]`; on the second trip `x` is stored in `xall[10]...xall[19]`; and so on.
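
If the slice notation is unfamiliar, the following toy version of the same bookkeeping may help. It is only a sketch: the names `n_per_trial`, `storage`, and `chunk` are made up for this example, and it uses 3 values per trip instead of 10 so that the result is easy to read.

```python
import numpy as np

n_per_trial = 3                       # stands in for the 10 measurements per trial
storage = np.zeros(2 * n_per_trial)   # room for two trips through the loop

for i in range(2):
    # A made-up "dataset" for trip i: [1, 2, 3], then [11, 12, 13]
    chunk = np.array([1.0, 2.0, 3.0]) + 10 * i
    # The slice start:stop includes start but excludes stop
    storage[i*n_per_trial:(i+1)*n_per_trial] = chunk

print(storage)
```

After the loop, the first three slots of `storage` hold the first chunk and the last three hold the second.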

In [None]:
for i in range(Ntrials):
    x = d.make_measurements(Nmeasures)
    xbar[i] = np.average(x)
    x2bar = np.average(x**2)
    s2[i] = x2bar-xbar[i]**2
    q2 = (x-d.mean)**2/(d.stddev)**2
    chi2[i] = np.sum(q2)
    xall[i*10:i*10+10] = x
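
For the curious, the same statistics can also be computed without an explicit loop by drawing all of the measurements at once into a two-dimensional array with one row per trial. This is an alternative sketch, not the approach this notebook uses: it draws from `np.random.default_rng` with assumed values `mu_demo` and `sigma_demo` instead of using the object `d`, and all of the names ending in `_demo` or `_v` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded generator, so the sketch is repeatable
mu_demo, sigma_demo = 15.0, 1.5         # assumed parameters, standing in for d.mean and d.stddev
n_trials, n_measures = 1000, 10

# One row per trial, one column per measurement
x2d = rng.normal(loc=mu_demo, scale=sigma_demo, size=(n_trials, n_measures))

# axis=1 means "combine along each row", giving one number per trial
xbar_v = x2d.mean(axis=1)
s2_v = (x2d**2).mean(axis=1) - xbar_v**2
chi2_v = np.sum((x2d - mu_demo)**2 / sigma_demo**2, axis=1)

# Flatten to a single long array of all measurements
xall_v = x2d.ravel()

print(xbar_v.shape, s2_v.shape, chi2_v.shape, xall_v.shape)
```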

## Analyze the results
Now let's look at our trials! We'll first plot the distribution of all points, `xall`, along with the distribution of the `Ntrials=1000` individual estimates of the mean from our datasets.  To put both plots on the same footing, we call `plt.hist` with `density=True`, which scales each histogram so that the area under it is one.  The histogram labeled $x$ is all 10000 points, while the histogram labeled $\langle x\rangle$ is the distribution of the 1000 different averages that we computed.

In [None]:
# density=True replaces the older normed=True option, which recent versions of matplotlib no longer accept
# passing the labels to plt.hist ties each label to its dataset, whatever order the legend lists them in
plt.hist([xall,xbar],bins=50,density=True,histtype='step',
         label=[r'$x$',r'$\langle x\rangle$'])
plt.legend()
plt.xlabel(r'$x$')
plt.ylabel(r'$N$')
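
To see what "area under the curve is one" means in practice, the sketch below uses `np.histogram`, whose `density=True` option applies this kind of normalization, and adds up height times width over all of the bins. The sample here is drawn from a standard normal distribution with a seeded generator; that choice, and the names `samples`, `heights`, `edges`, and `area`, are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.normal(size=10000)    # made-up sample, not the notebook's xall

# heights[k] is the normalized height of bin k; edges holds the 51 bin boundaries
heights, edges = np.histogram(samples, bins=50, density=True)

# Area of the histogram = sum over bins of (height * bin width)
area = np.sum(heights * np.diff(edges))
print(area)
```

The printed area should equal one to within rounding error, no matter how many points are in the sample.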

**Exercise:** Are the distributions for $x$ and $\langle x\rangle$ as expected?  Explain why or why not.  How do their means and widths compare with the "true" values of the distribution?

**Double-click in this box and type your answer:**


Next we'll plot the distribution of $s^2$.  For convenience, we'll divide by $\sigma^2$; that way, if $s^2 = \sigma^2$, then the distribution will peak at 1. We'll print the average value, $\langle s^2\rangle/\sigma^2$, on the plot.

In [None]:
sigma2 = d.stddev**2
plt.hist(s2/sigma2,bins=50,histtype='step')
plt.xlabel(r'$s^2/\sigma^2$')
plt.ylabel(r'$N$')
s2bar = np.average(s2)
desc = r'$\langle s^2\rangle/\sigma^2 = {0:6.2f}$'.format(s2bar/sigma2)
# pass the text as the first positional argument; newer matplotlib no longer accepts the keyword s=
plt.annotate(desc,xy=(0.7,0.8),xycoords='figure fraction')

**Exercise:** Is the value of $\langle s^2\rangle/\sigma^2$ what you expect? Explain why or why not.

**Double-click in this box and type your answer:**


Finally, let's plot the distribution of $\chi^2$.  As before, we'll write the value of $\langle\chi^2\rangle$ on the plot.

In [None]:
plt.hist(chi2,bins=50,histtype='step')
plt.xlabel(r'$\chi^2$')
plt.ylabel(r'$N$')
chi2bar = np.average(chi2)
desc = r'$\langle\chi^2\rangle = {0:6.2f}$'.format(chi2bar)
# pass the text as the first positional argument; newer matplotlib no longer accepts the keyword s=
plt.annotate(desc,xy=(0.7,0.8),xycoords='figure fraction')

**Exercise:** Describe the distribution of $\chi^2$.  Does the mean value of $\chi^2$ make sense to you? Explain why or why not.

**Double-click in this box and type your answer:**
