In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random
%matplotlib inline 

**[matplotlib.pyplot.plot](https://matplotlib.org/devdocs/api/_as_gen/matplotlib.pyplot.plot.html)**
```python
matplotlib.pyplot.plot(*args, **kwargs)
```
> Plot lines and/or markers to the Axes. args is a variable length argument, allowing for multiple x, y pairs with an optional format string.
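
As a minimal sketch of the "multiple x, y pairs with an optional format string" form, the call below passes two `(x, y, fmt)` groups at once. The data and the format strings `"r--"` and `"bo"` are illustrative choices, and the `matplotlib.use("Agg")` line is only there so the snippet also runs outside a notebook (drop it inside Jupyter).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumption for running outside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10)

# Two (x, y, fmt) groups in a single call:
# a red dashed line for y = x and blue circle markers for y = x**2
lines = plt.plot(x, x, "r--", x, x**2, "bo")

line_count = len(lines)                  # plt.plot returns one Line2D per (x, y) pair
first_style = lines[0].get_linestyle()   # linestyle parsed from "r--"
second_marker = lines[1].get_marker()    # marker parsed from "bo"
plt.close("all")
```

Each returned `Line2D` object can be restyled afterwards, which is handy when a format string is not expressive enough.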

Calling `plt.show()` renders all graph objects created so far in the cell to a single graph. You can also call `plt.show()` between graph objects to render them as separate graphs.

# SIMPLE CHARTS

In [None]:
# plotting a line
x = np.arange(0,10)
y = np.arange(0,10)
plt.plot(x,y)

In [None]:
# generate linear, quadratic and flat lines
x = np.arange(-10,10)
a = np.arange(0,20)
b = [b**2 for b in range(0,20)] # squares of 0..19, i.e. quadratic growth
c = [100 for c in range(0,20)]

# metadata for the plot
plt.title("Title")
plt.plot(x,a, label='linear')
plt.plot(x,b, label='quadratic')
plt.plot(x,c, label='flat')
plt.legend(loc='upper left', frameon=True)
plt.ylabel('Y')
plt.xlabel('X')

In [None]:
# make a random scatter plot
n = 1000
ax = np.random.randn(n)
ay = np.random.randn(n)
plt.scatter(ax, ay)

**[matplotlib.pyplot.hist](https://matplotlib.org/devdocs/api/_as_gen/matplotlib.pyplot.hist.html)**
```python
n, bins, patches = matplotlib.pyplot.hist(x, bins, **kwargs)
```
> Plot a histogram.

# NORMAL (GAUSSIAN) DISTRIBUTION

**[numpy.random.normal](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.random.normal.html)**
```python
numpy.random.normal(loc, scale, size)
```
> Draw random samples from a normal (Gaussian) distribution.
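
A small sketch of how the three parameters map onto a call, assuming NumPy's newer `default_rng` generator interface rather than the legacy `np.random.normal` function shown above. The seed `42` and the sample size are arbitrary choices made only so the result is reproducible.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded generator; the seed value is an arbitrary assumption
mu, sigma = 0, 0.1               # loc is the mean, scale is the standard deviation

# size controls how many independent draws are returned
sample = rng.normal(loc=mu, scale=sigma, size=10_000)

sample_mean = sample.mean()  # should land close to mu
sample_std = sample.std()    # should land close to sigma
```

With more draws, the sample mean and standard deviation tend to settle closer to `mu` and `sigma`.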

**[probability density function for normal distribution](https://en.wikipedia.org/wiki/Normal_distribution)**
$$f(x|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
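
The density can be evaluated directly with the standard library. This is only a sketch: `normal_pdf` is a hypothetical helper name, not a NumPy or SciPy function.

```python
import math

def normal_pdf(x, mu, sigma):
    """Evaluate the normal density f(x | mu, sigma^2) at a single point x."""
    coefficient = 1.0 / math.sqrt(2.0 * math.pi * sigma**2)
    exponent = -((x - mu) ** 2) / (2.0 * sigma**2)
    return coefficient * math.exp(exponent)

# Density at the mean and one standard deviation away, for mu = 0 and sigma = 0.1
peak = normal_pdf(0.0, mu=0.0, sigma=0.1)
one_sigma = normal_pdf(0.1, mu=0.0, sigma=0.1)
ratio = one_sigma / peak  # equals exp(-1/2) regardless of mu and sigma
```

Note that `peak` is greater than 1 here: a density is not a probability, and only its integral over an interval has to stay at or below 1.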


#### SAMPLE FROM NORMAL DISTRIBUTION 
   we want to take a certain number of random draws from a 
   normal distribution with a given mean and standard deviation

In [None]:
mu, sigma = 0, 0.1 # mean and standard deviation
draws = 1000
s = np.random.normal(mu, sigma, draws)

#### PLOT NORMAL DISTRIBUTION WITH NORMAL PDF
   we will plot a histogram of the samples `s` as well as a pdf using the 
   formula above for a normal distribution with the same parameters

In [None]:
count, bins, ignored = plt.hist(s, 30, density=True) # plt.hist returns a tuple of three values; density=True replaces the removed normed=True
plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
         np.exp( - (bins - mu)**2 / (2 * sigma**2) ),
         linewidth=2, color='r')
plt.show()
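
The histogram and the pdf curve share a y-axis only because the bar heights are normalized to a density. A rough check of that normalization, assuming `np.histogram` (which performs the same binning without drawing anything) and an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(0)    # arbitrary seed, for reproducibility only
sample = rng.normal(0, 0.1, 1000)

# density=True rescales bar heights so that the bars integrate to 1
density, edges = np.histogram(sample, bins=30, density=True)

bin_widths = np.diff(edges)            # 30 bins are described by 31 edges
area = np.sum(density * bin_widths)    # total area under the histogram
```

Without the normalization the bars would show raw counts, and the pdf curve would look flat against them.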

# UNIFORM DISTRIBUTION

**[numpy.random.uniform](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.random.uniform.html)**
```python
numpy.random.uniform(low, high, size)
```
> Draw samples from a uniform distribution.

#### SAMPLE FROM UNIFORM DISTRIBUTION 
   we want to take a certain number of random draws from a 
   uniform distribution with a given lower and upper bound

In [None]:
bins = 30
draws = 10000
s = np.random.uniform(0, 1, draws)

#### PLOT UNIFORM DISTRIBUTION WITH PDF
   we will plot a histogram of the samples `s` together with the expected count
   per bin for a uniform distribution, ie the case where the draws are spread
   equally across all bins, `draws/bins`
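
A quick numerical version of the same idea, assuming `np.histogram` for the counting and an arbitrary seed: every bin should hold roughly `draws/bins` samples, with some random scatter around that value.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
draws, bins = 10_000, 30
sample = rng.uniform(0, 1, draws)

# Count samples per bin over the full [0, 1] range
counts, edges = np.histogram(sample, bins=bins, range=(0, 1))

expected_per_bin = draws / bins  # height of the flat reference line
largest_deviation = np.abs(counts - expected_per_bin).max()
```

The reference line is drawn at a count rather than a density because this histogram is not normalized.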

In [None]:
plt.hist(s, bins)
plt.plot((0, 1), (draws/bins, draws/bins), color='red') 
plt.show()

# BINOMIAL DISTRIBUTION

   A **bi**nomial distribution describes the number of successes in n independent trials
   (also called Bernoulli trials), where each trial has only two outcomes, such as Heads or Tails
   on a coin flip, and the probability of success is set by a single probability parameter p.
   We want to see the probability of getting 0 to n Heads in n trials, which is given by the binomial
   probability mass function. Probability Mass Functions (pmf) are the discrete
   equivalent of the continuous Probability Density Functions (pdf).
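
For reference, the binomial pmf is $P(k) = \binom{n}{k} p^k (1-p)^{n-k}$. A standard-library sketch of that formula follows; `binomial_pmf` is a hypothetical helper name, and `math.comb` assumes Python 3.8 or newer.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 5, 0.4
# One probability for each possible number of Heads, 0 through n
pmf_values = [binomial_pmf(k, n, p) for k in range(n + 1)]
total = sum(pmf_values)  # a pmf sums to 1 over all outcomes
```

For these parameters the most likely outcome is 2 Heads, which matches the mean `n * p = 2`.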


**[scipy.stats.binom](https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.stats.binom.html)**
```python
scipy.stats.binom = <scipy.stats._discrete_distns.binom_gen object>
```
> A binomial discrete random variable.
As an instance of the rv_discrete class, binom object inherits from it a collection of generic methods (see below for the full list), and completes them with details specific for this particular distribution.

```python
mean, var, skew, kurtosis = scipy.stats.binom.stats(n, p, moments='mvsk')
```

> Mean(‘m’), variance(‘v’), skew(‘s’), and/or kurtosis(‘k’).

```python
scipy.stats.binom.ppf(q, n, p)
```

> Percent point function (inverse of `cdf` — percentiles).

```python
scipy.stats.binom.pmf(k, n, p)
```
> Probability mass function.
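
A short sketch of the three calls above in use, with `n = 5` and `p = 0.4` as assumed example parameters:

```python
from scipy.stats import binom

n, p = 5, 0.4

# Request only the mean ('m') and variance ('v')
mean, var = binom.stats(n, p, moments="mv")

# Smallest outcome k whose cumulative probability reaches q
low = binom.ppf(0.01, n, p)
high = binom.ppf(0.99, n, p)

# Probability of exactly 2 successes in 5 trials
prob_two_heads = binom.pmf(2, n, p)
```

`ppf` returns floats even though the outcomes are integers, which is why its results can be passed straight to `np.arange`.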

#### RENDERING MULTIPLE GRAPHS PER FIGURE

**[matplotlib.pyplot.subplots](https://matplotlib.org/devdocs/api/_as_gen/matplotlib.pyplot.subplots.html)**
```python
figure, axes = matplotlib.pyplot.subplots(nrows, ncols, **kwargs)
```
> Create a figure and a set of subplots. This utility wrapper makes it convenient to create common layouts of subplots, including the enclosing figure object, in a single call.  
> [Here's a pretty good overview of how `figure`, `axes`, and other subplot grid functions work in `matplotlib`](http://nbviewer.jupyter.org/github/WeatherGod/AnatomyOfMatplotlib/blob/master/AnatomyOfMatplotlib-Part1-Figures_Subplots_and_layouts.ipynb)
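
What `axes` looks like depends on the grid requested. This sketch shows three cases, relying on the default `squeeze=True` behavior. The variable names are illustrative, and the `matplotlib.use("Agg")` line is only there so the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumption for running outside a notebook
import matplotlib.pyplot as plt

# No grid arguments: a single Axes object, not an array
fig_single, ax_single = plt.subplots()

# One column, two rows: a 1-D numpy array indexed by row
fig_column, ax_column = plt.subplots(2, 1)
column_shape = ax_column.shape

# Two rows, two columns: a 2-D numpy array indexed by [row, column]
fig_grid, ax_grid = plt.subplots(2, 2)
grid_shape = ax_grid.shape
ax_grid[0, 1].set_title("row 0, column 1")

plt.close("all")
```

Passing `squeeze=False` makes `subplots` return a 2-D array in every case, which simplifies code that loops over grids of varying size.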

A note about **[`matplotlib.pyplot.axes`](https://matplotlib.org/devdocs/api/_as_gen/matplotlib.pyplot.axes.html)**
> *Speaking of terminology, I must pause here and complain about one aspect of Matplotlib itself: the Axes class is horribly named. First, it’s a plural word used for singular instances, so the documentation is littered with references to “an Axes,” which is jarring to any native speaker of English. Even worse, the name of the Axes class is thoroughly misleading. If you know what the word axes means, you would think that the Axes class would be restricted to things like plot limits, log vs. linear scaling, tick mark spacing, tick labeling, and so on. But you’d be wrong. All of the methods for plotting data are Axes methods, which makes no sense. Why the Axes class wasn’t called Plot or Frame or Graph is beyond me, but that name has been my biggest source of confusion when reading the documentation. Even after I knew that Axes covered more than just the axes, my natural tendency to assign the usual meaning to the word would lead me astray.*  
[Source](http://leancrew.com/all-this/2013/07/the-matplotlib-documentation-problem/)



#### GENERATE 2 PMF BINOMIAL GRAPHS IN A FIGURE

In [None]:
from scipy.stats import binom

# create a subplot grid and unpack the returned figure (fig) and axes (ax).
# axes in this case is a numpy array in which each index is one row of the figure
fig, ax = plt.subplots(2, 1) 


# SET PARAMS FOR BINOMIAL PMF

n, p = 5, 0.4 # set the number of draws and the p value for the binomial pmf
mean, var, skew, kurt = binom.stats(n, p, moments='mvsk')

# MAKE AXES FOR BOUNDED PLOT

x = np.arange(binom.ppf(0.01, n, p), # integer x values (number of Heads) from the 1st
              binom.ppf(0.99, n, p)) # percentile up to, but not including, the 99th
b_pmf = binom.pmf(x, n, p) # get Y values for a binomial pmf using the values in x

# plot to the first axes in the figure the pmf, set the title, and set a vertical 
# line up to the pmf scatter plot value (for style!)
ax[0].plot(x, b_pmf, 'bo', ms=8, label='binom pmf')
ax[0].set_title('Bounded Binomial PMF (1st to 99th Percentile)')
ax[0].vlines(x, 0, binom.pmf(x, n, p), colors='b', lw=5, alpha=0.5)

# MAKE AXES FOR ALL OUTCOMES PLOT

x2 = np.arange(0, n+1, 1)
b_pmf2 = binom.pmf(x2, n, p)
             
ax[1].plot(x2, b_pmf2, 'ro', ms=8, label='binom pmf')
ax[1].set_title('Binomial PMF')
ax[1].vlines(x2, 0, b_pmf2, colors='r', lw=5, alpha=0.5) # use x2 so every outcome from 0 to n gets a line

# PLOT FIGURE

plt.tight_layout() # adjusts subplot spacing so that titles and axis labels do not overlap

# POISSON DISTRIBUTION

A Poisson distribution is a discrete distribution used to describe the probability of an event occurring within
a fixed interval, often of time or space. The number of events expected to occur within the interval is described by the lambda parameter, and the events are considered independent of one another, ie an event occurring at some time point does not affect the probability of an event occurring at any other time point (the probability of an event occurring remains the same at all time points).
> *For instance, an individual keeping track of the amount of mail they receive each day may notice that they receive an average number of 4 letters per day. If receiving any particular piece of mail does not affect the arrival times of future pieces of mail, i.e., if pieces of mail from a wide range of sources arrive independently of one another, then a reasonable assumption is that the number of pieces of mail received in a day obeys a Poisson distribution. Other examples that may follow a Poisson include the number of phone calls received by a call center per hour and the number of decay events per second from a radioactive source.*  
[Source](https://en.wikipedia.org/wiki/Poisson_distribution)

For sufficiently large values of lambda (say > 1000), the normal distribution with mean = lambda and variance = lambda (standard deviation = sqrt(lambda)) is an excellent approximation to the Poisson distribution.
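
A rough standard-library check of how the approximation improves with lambda. The helper names (`poisson_pmf`, `normal_pdf`, `max_relative_gap`) are hypothetical, and measuring the gap relative to the normal peak over a window of four standard deviations is just one reasonable choice.

```python
import math

def poisson_pmf(k, lam):
    """Poisson pmf, computed in log space to avoid overflow for large lam."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def max_relative_gap(lam):
    """Largest |Poisson pmf - normal pdf| within 4 sigma of the mean, relative to the normal peak."""
    sigma = math.sqrt(lam)  # variance of a Poisson equals lam
    lowest = max(0, int(lam - 4 * sigma))
    highest = int(lam + 4 * sigma)
    peak = normal_pdf(lam, lam, sigma)
    gap = max(abs(poisson_pmf(k, lam) - normal_pdf(k, lam, sigma))
              for k in range(lowest, highest + 1))
    return gap / peak

gap_small = max_relative_gap(1)     # the approximation is poor for small lambda
gap_large = max_relative_gap(1000)  # and much tighter for large lambda
```

The gap shrinks because the skew of a Poisson distribution falls off as `1/sqrt(lambda)`, so the distribution becomes more symmetric as lambda grows.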

**[numpy.random.poisson](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.random.poisson.html)**
```python
numpy.random.poisson(lam, size)
```
> Draw samples from a Poisson distribution.



#### SAMPLE FROM POISSON DISTRIBUTIONS WITH DIFFERENT LAMBDA VALUES AND PLOT VS NORMAL APPROXIMATION

In [None]:
import math

# figsize is one of the keyword arguments subplots accepts; it sets (width, height) in inches
fig, ax = plt.subplots(6,1, figsize=(8, 12)) 

lambdas = [1,5,10,100,1000,10000] # these are the lambda values we will loop through
for index, lambduh in enumerate(lambdas): # 'lambda' is a reserved keyword in Python (used to write short anonymous functions), hence 'lambduh'
    # GENERATE RANDOM POISSON SAMPLES
    s = np.random.poisson(lambduh, 10000)
     
    # PLOT HISTOGRAM FOR THIS LAMBDA VALUE
    
    # num_bins is 11 for lambduh up to 100, then grows as ceil(lambduh/100) + 10 (20 for 1000, 110 for 10000)
    num_bins = int(math.ceil(lambduh/100)+10)                   
    count, bins, ignored = ax[index].hist(s, num_bins, density=True) # density=True replaces the removed normed=True
    
    # PLOT NORMAL DISTRIBUTION WITH MEAN AND VARIANCE SET TO LAMBDA
    sigma = lambduh ** .5
    mu = lambduh
    ax[index].plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
                   np.exp( - (bins - mu)**2 / (2 * sigma**2) ),
                   linewidth=2, color='r')
    ax[index].set_title(('lambda = {}').format(lambduh))
plt.tight_layout()

# [PyPlot tutorial](https://matplotlib.org/users/pyplot_tutorial.html) - excellent for more practice plotting with matplotlib
# [Seaborn](https://seaborn.pydata.org/) - a plotting package built on top of matplotlib that makes beautiful graphs