# Distributions in Pandas
------

- A distribution is the set of all possible values a random variable can take, together with how likely each value is.
- Example:
 - Flipping a **coin** for heads and tails
 - A **binomial distribution** (two possible outcomes)
 - Discrete (categories of heads and tails, no real numbers)
 - Evenly weighted (heads are just as likely as tails)

In [None]:
import pandas as pd
import numpy as np

- `numpy` actually has some distributions built into it allowing us to make random flips of a coin with given parameters. 
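- Here we ask for a number from the `numpy` binomial distribution. We pass in two parameters. The first is the number of trials to run. The second is the probability that each trial comes up one. The call returns how many ones occurred, so with a single trial it returns either zero or one. At 0.5 both are equally likely, and we will use zero to represent heads here.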

In [None]:
np.random.binomial(1, 0.5)
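
- To make the roles of the parameters concrete, here is a minimal sketch. It assumes NumPy's newer `default_rng` generator in place of the `np.random.binomial` call above so that a seed can be fixed. The seed value and the names `rng`, `single_flip`, `ten_flips` and `ones_in_ten` are illustrative choices, not part of the original notebook.

```python
import numpy as np

# Seeded generator so the run is repeatable (the seed value is arbitrary)
rng = np.random.default_rng(seed=0)

# n=1 trial with p=0.5: returns a single 0 or 1
single_flip = rng.binomial(n=1, p=0.5)

# size=10 repeats the one-trial experiment ten times: an array of 0s and 1s
ten_flips = rng.binomial(n=1, p=0.5, size=10)

# n=10 without size returns one number: how many ones came up in ten trials
ones_in_ten = rng.binomial(n=10, p=0.5)

print(single_flip, ten_flips, ones_in_ten)
```

- The difference between `n` and `size` is easy to mix up: `n` is the number of trials inside one experiment, while `size` is how many experiments to run.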

- What if we run the simulation a thousand times and divide the result by a thousand? We see a number pretty close to 0.5, which means that about half of the time we got heads and half of the time we got tails.

In [None]:
np.random.binomial(1000, 0.5) / 1000
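
- As a rough illustration of why the result lands near 0.5, the sketch below repeats the experiment with more and more flips. It assumes a seeded `default_rng` generator, and the flip counts and the names `rng` and `proportions` are illustrative.

```python
import numpy as np

# Seeded generator so the run is repeatable (the seed value is arbitrary)
rng = np.random.default_rng(seed=0)

proportions = {}
for n_flips in (10, 1_000, 100_000):
    # Number of ones observed in n_flips fair coin flips
    ones = rng.binomial(n=n_flips, p=0.5)
    proportions[n_flips] = ones / n_flips
    print(f"{n_flips:>7} flips: proportion of ones = {proportions[n_flips]:.4f}")
```

- With only a few flips the proportion can stray well away from 0.5, and it tends to settle closer to 0.5 as the number of flips grows.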

- We can also have unevenly weighted binomial distributions. For instance, what's the chance that there will be a tornado today? It is pretty low, so maybe there is a hundredth of a percent chance.
- We can put this into a binomial distribution as a weighting in NumPy. If we run this 100,000 times we see only a handful of tornado events, about 10 on average.

In [None]:
chance_of_tornado = 0.01 / 100
np.random.binomial(100000, chance_of_tornado)
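
- One way to sanity-check the simulated count is to compare it with the expected value of a binomial distribution, which is the number of trials times the probability. This is a sketch under the same assumed probability. The seeded `default_rng` generator and the names `n_days`, `expected_tornadoes` and `simulated` are illustrative.

```python
import numpy as np

# Seeded generator so the run is repeatable (the seed value is arbitrary)
rng = np.random.default_rng(seed=0)

chance_of_tornado = 0.01 / 100   # a hundredth of a percent, as above
n_days = 100_000

# Expected number of tornado days: n * p
expected_tornadoes = n_days * chance_of_tornado

# Repeat the 100,000-day experiment 1,000 times and average the counts
simulated = rng.binomial(n=n_days, p=chance_of_tornado, size=1_000)

print(f"expected:       {expected_tornadoes:.2f}")
print(f"simulated mean: {simulated.mean():.2f}")
```

- Any single run can differ from the expected value by a few events, but the average over many runs should sit close to it.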

- Let's take one more example. Let's say the chance of a tornado here in Ann Arbor on any given day is 1%, regardless of the time of year. And let's say that if there's a tornado I'm going to get away from the windows and hide, then come back and do my recording the next day. So what's the chance of this happening two days in a row?

$~$

- Here we create an array of potential tornado events by asking the NumPy binomial function for one draw per day, using our chance of tornado. We'll do this a million times, which is roughly 2,740 years' worth of daily events.
- This process is called sampling the distribution.
- Now we can write a little loop to go through the array and look for any adjacent pair of ones, which means that there were back-to-back tornadoes on two consecutive days.

In [None]:
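chance_of_tornado = 0.01

# One 0/1 draw per day for a million days; 1 means a tornado that day
tornado_events = np.random.binomial(1, chance_of_tornado, 1000000)

two_days_in_a_row = 0

# Start at 1 so j-1 is valid, and go through the last day so no pair is skipped
for j in range(1, len(tornado_events)):
    if tornado_events[j]==1 and tornado_events[j-1]==1:
        two_days_in_a_row += 1

print(f'{two_days_in_a_row} tornadoes back to back in {1000000/365:.0f} years')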
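
- The same count can be sketched without an explicit loop by comparing the array with a copy of itself shifted by one day, and the result can be checked against a back-of-the-envelope estimate. If days are independent, the chance of tornadoes on two given consecutive days is 0.01 × 0.01, so the expected number of back-to-back pairs is about that times the number of adjacent day pairs. The seeded `default_rng` generator and the names `n_days`, `back_to_back` and `expected_pairs` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

# Seeded generator so the run is repeatable (the seed value is arbitrary)
rng = np.random.default_rng(seed=0)

chance_of_tornado = 0.01
n_days = 1_000_000

# One 0/1 draw per day
tornado_events = rng.binomial(n=1, p=chance_of_tornado, size=n_days)

# tornado_events[1:] is "today", tornado_events[:-1] is "yesterday";
# count the days where both are 1
back_to_back = np.sum((tornado_events[1:] == 1) & (tornado_events[:-1] == 1))

# Expected count assuming independent days: (number of adjacent pairs) * p^2
expected_pairs = (n_days - 1) * chance_of_tornado ** 2

print(f"simulated back-to-back pairs: {back_to_back}")
print(f"expected back-to-back pairs:  {expected_pairs:.1f}")
```

- The slicing version does the same comparison as a loop over adjacent days, but lets NumPy do the work in one pass, which is usually much faster on arrays this large.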

- Many of the distributions you use in data science are not discrete binomial distributions. Instead they are continuous, where the value of a given observation isn't a category like heads or tails but can be represented by a real number.
- It's common to graph these distributions when talking about them, where the x axis is the value of the observation and the y axis represents how likely observations near that value are (the probability density).
- If all numbers are equally likely to be drawn when you sample from it, this should be graphed as a flat horizontal line. And this flat line is actually called the **uniform distribution**.  
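
- A quick way to see the flat line without plotting is to sample a uniform distribution and count how many samples fall into equal-width bins. This is a minimal sketch. The range of 0 to 1, the ten bins, the seeded `default_rng` generator and the names `samples`, `counts` and `fractions` are all illustrative choices.

```python
import numpy as np

# Seeded generator so the run is repeatable (the seed value is arbitrary)
rng = np.random.default_rng(seed=0)

# Draw 100,000 real numbers, each equally likely anywhere in [0, 1)
samples = rng.uniform(low=0.0, high=1.0, size=100_000)

# Split [0, 1] into ten equal-width bins and count the samples in each
counts, bin_edges = np.histogram(samples, bins=10, range=(0.0, 1.0))
fractions = counts / samples.size

for left, right, frac in zip(bin_edges[:-1], bin_edges[1:], fractions):
    print(f"[{left:.1f}, {right:.1f}): {frac:.3f}")
```

- Each bin should hold roughly a tenth of the samples, which is the histogram version of the flat horizontal line.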

$~$

- There are a few other distributions that get a lot more interesting. Let's take the **normal distribution**, which is also called the **Gaussian distribution** or sometimes a **bell curve**.
- This distribution looks like a hump where the value with the highest probability of being drawn is zero, and there are two decreasing curves on either side of that peak.
- One of the properties of this distribution, in its standard form, is that the mean is zero and the two curves on either side are symmetric.
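
- These properties can be checked roughly by sampling. The sketch below assumes a standard normal distribution (mean 0, standard deviation 1) and a seeded `default_rng` generator. The names `samples`, `sample_mean`, `fraction_below_zero` and `fraction_within_one` are illustrative.

```python
import numpy as np

# Seeded generator so the run is repeatable (the seed value is arbitrary)
rng = np.random.default_rng(seed=0)

# Draw from a normal distribution with mean 0 and standard deviation 1
samples = rng.normal(loc=0.0, scale=1.0, size=100_000)

# The sample mean should sit close to zero
sample_mean = samples.mean()

# Symmetry: about half of the samples should fall on each side of zero
fraction_below_zero = np.mean(samples < 0)

# The hump: most samples cluster near the peak (about 68% within one
# standard deviation of the mean for a normal distribution)
fraction_within_one = np.mean(np.abs(samples) < 1)

print(f"mean:                {sample_mean:.4f}")
print(f"fraction below zero: {fraction_below_zero:.4f}")
print(f"fraction within ±1:  {fraction_within_one:.4f}")
```

- The exact numbers change with the seed and the sample size, but they should stay close to 0, 0.5 and roughly 0.68.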