# ASTR302 Lab 2: Statistical Distributions

In the Lab 1 tutorials you were introduced to the Python language using several Jupyter notebooks.  You will now use those skills to explore the statistical distributions we discussed in lecture.

## Jupyter Review

Each entry in a Jupyter notebook is called a cell.  

You can run the contents of individual cells by selecting them and pressing **CTRL-Enter**.

You can run the contents of individual cells AND add a new cell underneath by pressing **ALT-Enter**.

You can delete a cell by selecting it and pressing **d** twice.

You can add a text block like this cell by pressing **Esc-m** (or use the dropdown at the top and change from Code to Markdown).  This is useful for adding notes that you want to remember.

You can find more information on Jupyter notebooks here: 
https://jupyter-notebook.readthedocs.io/en/stable/
***

## Python Review

The core python programming language provides a small number of built-in functions.  You can see a description of them here: https://docs.python.org/3/library/functions.html . Most of the high-level functions you will want for numerical data analysis are not built-in.  You access these by importing *packages*.  

The Anaconda distribution you installed included a large library of packages, but in order to use them you need to first import them into your current programming environment.

You can import an entire package like this:

In [None]:
import astropy

This gives you access to the *astropy* package, which provides numerous astronomical utilities.  There are sub-packages within *astropy*, such as *constants*, which contains useful astronomical constants.  You import the subpackage like this:

In [None]:
import astropy.constants
c = astropy.constants.c  # Retrieve the speed of light, and store it in variable 'c'
print(c)  # Print the variable using the print function
print(c.value)  # Get just the numerical value

You will frequently see someone do imports like this:

In [None]:
from datetime import *

This says *from the datetime package, import everything directly into the current environment.* **This is very bad, and you should not do it.**  If two different packages have functions in them that are named the same thing, and if you import both of them like this, then they will overwrite each other and your code will be confusing.  

*There are a few exceptions to this rule, but you will likely not encounter them in this class.*
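The name collision described above can be seen with two standard-library modules. This is a minimal sketch: it uses explicit `from ... import sqrt` lines rather than `import *` so that the overwrite is visible, but a wildcard import does the same thing silently for every name the two modules share. The choice of `math` and `cmath` is only for illustration.

```python
import math
import cmath

from math import sqrt       # 'sqrt' now refers to math.sqrt
real_root = sqrt(4)         # a float

from cmath import sqrt      # 'sqrt' is silently replaced by cmath.sqrt
complex_root = sqrt(4)      # now a complex number

print(type(real_root).__name__, type(complex_root).__name__)

# Keeping the module name in front removes the ambiguity.
print(math.sqrt(4), cmath.sqrt(4))
```

The same call, `sqrt(4)`, returns a different type depending on which import ran last, which is exactly the kind of confusion the rule is meant to prevent.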

***

## Mathematics

To calculate things we need to import a mathematical package.  *numpy* is the standard.

In [None]:
import numpy as np

Note that I am both importing the 'numpy' package, and changing its name (in this session) to 'np'.  This is because I am lazy and typing 'np' is faster than typing 'numpy'.

You can find the documentation on the numpy package here: [https://numpy.org/devdocs/reference/index.html](https://numpy.org/devdocs/reference/index.html) . I found this by searching for 'numpy manual'.

Numpy allows you to create vector and matrix arrays:

In [None]:
v = np.array([1,2,3,4])
v

Note that I did not use print(v).  Instead I asked python to give me information about the object itself.  It tells me that it is *type* array, and has elements [1, 2, 3, 4]

In [None]:
print(v)

Print just shows the contents (formatted like a python list, but without the commas).

Now make a bigger array:

In [None]:
v = np.arange(0,10000,1) #This arange call creates a long array from 0 to 9999 (the stop value is excluded) with step size = 1

In [None]:
v

Note that python doesn't display the entire array, only the ends.
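As a quick check on how `np.arange` treats its end point, the sketch below inspects the same array. The stop value is excluded, so the last element is 9999, not 10000. The final lines show that arithmetic on an array is applied to every element at once; the variable name `v_squared` is just for illustration.

```python
import numpy as np

v = np.arange(0, 10000, 1)   # start=0, stop=10000 (excluded), step=1

print(v.size)                # number of elements
print(v[0], v[-1])           # first and last elements

# Arithmetic acts element by element, with no explicit loop.
v_squared = v**2
print(v_squared[:5])
```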

***

## Uniform Distribution

Let's use numpy to generate a uniform set of 100 random numbers from -10 to 10

In [None]:
samples = np.random.uniform(-10,10,100)

Let's examine those samples:

In [None]:
print(samples)

Create a histogram of those samples, with one bin per integer value:

In [None]:
bins = np.linspace(-10,10,21)

In [None]:
print(bins)

In [None]:
histogram, bins = np.histogram(samples, bins=bins)

In [None]:
print(histogram)

Why did we use linspace instead of array?  Read the documentation on numpy.linspace.
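One way to explore this question is to build the same bin edges two ways and compare them. This sketch assumes nothing beyond numpy; the names `edges_linspace` and `edges_arange` are illustrative.

```python
import numpy as np

# linspace: give the start, the end (included), and the NUMBER of points.
edges_linspace = np.linspace(-10, 10, 21)

# arange: give the start, the end (excluded), and the STEP between points.
edges_arange = np.arange(-10, 11, 1)

print(edges_linspace.dtype, edges_arange.dtype)   # floats vs. integers
print(np.array_equal(edges_linspace, edges_arange))

# 21 edges define 20 bins, so the histogram has one fewer entry than the edges.
samples = np.random.uniform(-10, 10, 100)
histogram, bins = np.histogram(samples, bins=edges_linspace)
print(len(bins), len(histogram), histogram.sum())
```

With `linspace` you state how many edges you want and the end point is included, which makes it harder to be off by one when setting up bins.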

## Plotting

Now we can use the matplotlib package to plot this distribution.  First import it, and configure it to draw within the Jupyter notebook.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

We want the plot to show bin centers, halfway between the two edges of each bin.

In [None]:
bin_centers = 0.5*(bins[1:]+bins[:-1])

In [None]:
print(bin_centers)

Now set up the plot.

In [None]:
plt.figure(figsize=(10,6))
plt.plot(bin_centers, histogram)  # line plot through the bin centers
plt.ylim(0,np.max(histogram*1.2))
plt.xlabel('Value')
plt.ylabel('Frequency (N)')
plt.title('Uniform Distribution')
plt.show()

That doesn't look very uniform.  What happens if we increase the number of points?  Change the size of the sample in the original np.random.uniform call and re-run the notebook steps.
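To see how the histogram flattens out, the sketch below measures the bin-to-bin scatter relative to the mean bin count for several sample sizes. For pure counting noise this ratio is expected to fall roughly as one over the square root of the number of samples per bin. It uses `np.random.default_rng` with a fixed seed so the numbers are repeatable; that generator, the seed, and the helper name `relative_scatter` are choices made for this example, not part of the lab steps.

```python
import numpy as np

rng = np.random.default_rng(302)   # seeded generator (assumption: any seed works)
bins = np.linspace(-10, 10, 21)

def relative_scatter(n_samples):
    """Standard deviation of the bin counts divided by their mean."""
    samples = rng.uniform(-10, 10, n_samples)
    histogram, _ = np.histogram(samples, bins=bins)
    return histogram.std() / histogram.mean()

scatter = {n: relative_scatter(n) for n in (100, 10_000, 1_000_000)}
for n, value in scatter.items():
    print(f"N = {n:>9d}   scatter / mean = {value:.4f}")
```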

When plotting histograms, it is useful for the plot to not play 'connect-the-dots'.  Rather, we want step style plots.

In [None]:
plt.figure(figsize=(10,4))
plt.step(bin_centers, histogram, where='mid')  # center each step on its bin
plt.ylim(0,np.max(histogram*1.2))
plt.xlabel('Value')
plt.ylabel('Frequency (N)')
plt.title('Uniform Distribution')
plt.show()

## Gaussian Distribution

Now change from a uniform distribution to a normal (or Gaussian) distribution.

In [None]:
samples = np.random.normal(0,2.5,100000)

Note that the function parameters for 'normal' are different from those for 'uniform'.  Use the numpy manual to figure out what they are.

In [None]:
print(samples)

In [None]:
histogram, bins = np.histogram(samples, bins=bins)

In [None]:
plt.figure(figsize=(10,4))
plt.step(bin_centers, histogram, where='mid')  # center each step on its bin
plt.ylim(0,np.max(histogram*1.2))
plt.xlabel('Value')
plt.ylabel('Frequency (N)')
plt.title('Gaussian Distribution')
plt.show()

Experiment with changing the standard deviation and center location of the distribution.  You may need to change the range of the bins.  Insert the required code into the notebook above the plotting section.
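When experimenting, it helps to confirm that the samples have the mean and standard deviation you asked for. The sketch below does this and also counts how many samples land outside the original -10 to 10 bin range, since `np.histogram` does not count values outside the bin edges. The seeded `default_rng` generator, the 5-sigma bin range, and the variable names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(302)
mu, sigma = 0.0, 2.5                   # center and standard deviation
samples = rng.normal(mu, sigma, 100000)

print(f"sample mean = {samples.mean():.3f}")
print(f"sample std  = {samples.std():.3f}")

# Fraction of samples within one standard deviation of the center
# (about 0.68 for a Gaussian).
within_one_sigma = np.mean(np.abs(samples - mu) < sigma)
print(f"within 1 sigma = {within_one_sigma:.3f}")

# Samples outside the bin range are not counted by np.histogram.
bins = np.linspace(-10, 10, 21)
histogram, _ = np.histogram(samples, bins=bins)
n_outside = samples.size - histogram.sum()
print(f"samples outside the bins = {n_outside}")

# One way to widen the bins when sigma or mu changes: cover mu +/- 5 sigma.
wide_bins = np.linspace(mu - 5 * sigma, mu + 5 * sigma, 51)
```

If you use new bin edges such as `wide_bins`, remember to recompute `bin_centers` before plotting.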

## Binomial Distribution

The binomial distribution governs the number of successes in a series of independent trials, where each trial has two possible outcomes.  This is the coin-flip experiment.

The distribution has two parameters: n=number of trials; p=success probability for each trial (0<p<1)

numpy will generate a binomial distribution.  For a class of 20 students, conducting the coin flip experiment 100 times:

In [None]:
n = 20
p = 0.5
samples = np.random.binomial(n,p,100)  # repeat the 20-flip experiment 100 times

In [None]:
print(samples)

In [None]:
bins = np.linspace(0,20,21)

In [None]:
print(bins)

In [None]:
bin_centers = 0.5*(bins[1:]+bins[:-1])

In [None]:
histogram, bins = np.histogram(samples, bins=bins)

In [None]:
plt.figure(figsize=(10,4))
plt.step(bin_centers, histogram, where='mid')  # center each step on its bin
plt.ylim(0,np.max(histogram*1.2))
plt.xlabel('Number of Occurrences')
plt.ylabel('Frequency (N)')
plt.title('Binomial Distribution')
plt.show()

**What happens when you increase the number of experiments?  What if you use a six-sided die instead of a coin?**
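As a starting point for these questions, the sketch below compares the sample mean and variance with the binomial expectations, n*p and n*p*(1-p), for a fair coin and for one face of a six-sided die (p = 1/6). It also places the bin edges at half-integers so that each integer count sits in the middle of its own bin; that binning choice, the seeded generator, and the helper name `summarize` are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(302)
n = 20                                   # trials per experiment
n_experiments = 100000

# Edges at -0.5, 0.5, ..., 20.5 give one bin per integer from 0 to 20.
bins = np.arange(-0.5, n + 1.5, 1)

def summarize(p):
    """Draw binomial samples and return (mean, variance, histogram)."""
    samples = rng.binomial(n, p, n_experiments)
    histogram, _ = np.histogram(samples, bins=bins)
    return samples.mean(), samples.var(), histogram

results = {}
for label, p in (("coin", 0.5), ("die", 1 / 6)):
    mean, var, histogram = summarize(p)
    results[label] = (mean, var, histogram)
    print(f"{label}: mean = {mean:.2f} (expect {n * p:.2f}), "
          f"variance = {var:.2f} (expect {n * p * (1 - p):.2f})")
```

For the die, the distribution shifts toward smaller counts and becomes asymmetric, because a success is now less likely than a failure on each trial.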

## Poisson Distribution

The Poisson distribution gives you the probability of encountering a certain number of events in a given period, if those events are occurring at a known constant mean rate.  This is the distribution that governs most 'counting' situations, including the collection of photons from stars!  It is also the limit of the binomial distribution when the number of trials n is large and the success probability p is small, with the mean n*p held fixed.
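The binomial-to-Poisson limit can be checked directly from the two probability formulas. The sketch below holds the mean fixed at lambda = n*p = 3 and increases n, printing the largest difference between the two distributions over k = 0 to 10. It uses only the standard library (`math.comb` needs Python 3.8 or later); the function names are illustrative.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 3.0
max_diff = {}
for n in (10, 100, 10000):
    p = lam / n                      # keep the mean n*p fixed
    max_diff[n] = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
                      for k in range(11))
    print(f"n = {n:>6d}, p = {p:.4f}, largest difference = {max_diff[n]:.2e}")
```

The difference shrinks steadily as n grows, which is why photon counting (a huge number of chances, each with a tiny probability) is treated as a Poisson process.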

The distribution has one parameter: the average rate of events, often expressed as lambda.

Use the Poisson generator in numpy to generate 100 samples with a mean rate of 3 events.

(Note: 'lambda' is a reserved keyword in python, so we will call this variable 'lam')

In [None]:
lam = 3
samples = np.random.poisson(lam, 100)

In [None]:
print(samples)

In [None]:
bins = np.linspace(0,20,21)

In [None]:
bin_centers = 0.5*(bins[1:]+bins[:-1])

In [None]:
histogram, bins = np.histogram(samples, bins=bins)

In [None]:
plt.figure(figsize=(10,4))
plt.step(bin_centers, histogram, where='mid')  # center each step on its bin
plt.ylim(0,np.max(histogram*1.2))
plt.xlabel('Number of Occurrences')
plt.ylabel('Frequency (N)')
plt.title('Poisson Distribution')
plt.show()

**Increase the number of samples to smooth out the function**

**What happens to the Poisson distribution at very small lambda?**

**What happens at very large lambda?**
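A numerical starting point for the last two questions: for a Poisson distribution the mean and the variance both equal lambda, and the probability of seeing zero events is exp(-lambda). The sketch below checks these for a small, a moderate, and a large rate. The seeded generator and the chosen rates are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(302)
n_samples = 100000

stats = {}
for lam in (0.1, 3, 100):
    samples = rng.poisson(lam, n_samples)
    frac_zero = np.mean(samples == 0)       # fraction of samples with no events
    stats[lam] = (samples.mean(), samples.var(), frac_zero)
    print(f"lam = {lam:>5}: mean = {samples.mean():7.3f}, "
          f"variance = {samples.var():7.3f}, "
          f"fraction of zeros = {frac_zero:.4f} "
          f"(expect {np.exp(-lam):.4f})")
```

At small lambda nearly every sample is zero. At large lambda the samples spread symmetrically around lambda with a width of about the square root of lambda, so the histogram starts to resemble a Gaussian, and the 0 to 20 bin range used earlier no longer covers it.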

***

## Conclusion

Save your notebook.  Append your LastNameFirstInitial to the filename, and email it to me.