# Probability distributions & meteor shower gazing


**Our goals for today:**
- Discuss some key statistics topics: samples versus populations and empirical versus theoretical distributions
- Simulate a head/tail coin toss and well drilling (binomial distribution)
- Simulate meteors entering Earth's atmosphere (Poisson distribution)
- Simulate geomagnetic polarity reversals (gamma distribution)


## Setup

Run this cell as it is to set up your environment.

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import scipy
import scipy.stats as sps

## Flipping a coin

Let's pretend we are flipping a coin 10 times using ```np.random.choice([0, 1])```. How many times will we get heads? 1 is heads, 0 is tails. Let's use a for loop and get Python to simulate such a coin flip scenario for us.

This code block is the first time we are using a **for loop**. A for loop runs a chunk of code (in Python, the chunk that is indented) multiple times. In this case, the code will be looped through 10 times, as specified by ```range(0,10)```.

In [None]:
for flip in range(0,10):
    flip_result = np.random.choice([0, 1])
    print(flip_result)
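
To see what the loop is iterating over, here is a minimal sketch (the variable names are illustrative, not part of the notebook) that lists the values `range(0,10)` produces and shows that only the indented line repeats:

```python
# range(0, 10) yields the integers 0 through 9: ten values, so ten passes through the loop
loop_values = list(range(0, 10))
print(loop_values)

n_passes = 0
for flip in range(0, 10):
    n_passes = n_passes + 1  # indented: runs on every pass through the loop

print(n_passes)  # not indented: runs once, after the loop has finished
```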

Now let's record how many times the result was heads. We will make a list called `flip_results` and have it be blank to start. Each time we go through the loop we will append the result to the list:

In [None]:
flip_results = []

for flip in range(0,10):
    flip_result = np.random.choice([0, 1])
    flip_results.append(...)
    
flip_results    

We can calculate how many flips were heads by taking the sum of the list:

In [None]:
np.sum(flip_results)

Now let's flip the coin 10 times and repeat that experiment 10 times. Each time we run an experiment, let's record how many heads resulted from its 10 flips.

In [None]:
number_heads = []

for flip_experiment in range(0,10):

    flip_results = []
    
    for flip in range(0,10):
        flip_result = np.random.choice([0, 1])
        flip_results.append(flip_result)
    
    number_heads.append(...)
        
number_heads

In [None]:
plt.hist(number_heads,bins=[-0.5,0.5,1.5,2.5,3.5,4.5,5.5,6.5,7.5,8.5,9.5,10.5],density=True)
plt.show()
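
The bin edges in the histogram above sit at half-integers so that each bin is centered on a whole number of heads. Here is a sketch of one way to build the same edges without typing them out, using `np.arange` (the example counts are made up for illustration):

```python
import numpy as np

# edges at -0.5, 0.5, ..., 10.5 so that each bin is centered on an integer from 0 to 10
bin_edges = np.arange(-0.5, 11.5, 1.0)

# made-up head counts, used only to illustrate the binning
example_counts = [4, 5, 5, 6, 3, 7, 5, 4, 6, 5]

# np.histogram computes the binned values without drawing a plot
fractions, edges = np.histogram(example_counts, bins=bin_edges, density=True)
print(fractions)
```

Because each bin has a width of 1, the `density=True` values are simply the fraction of experiments that fell in each bin.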

<font color=goldenrod>**_Code for you to write_**</font>

Instead of doing 10 coin flips 10 times, do 10 coin flips 1000 times. Plot the histogram of the result.

## Binomial distribution:

### Theoretical

A relatively straightforward distribution is the _binomial_ distribution, which describes the probability of a particular outcome when there are only two possibilities (yes or no, heads or tails, 1 or 0). For example, in a coin toss experiment (heads or tails), if we flip the coin $n$ times, what is the probability of getting $x$ 'heads'? We assume that the probability $p$ of a head for any given coin toss is 50%; put another way, $p$ = 0.5.

The binomial distribution can be described by an equation: 

$$P=f(x,p,n)= \frac{n!}{x!(n-x)!}p^x(1-p)^{n-x}$$

We can look at this kind of distribution by evaluating the probability for getting $x$ 'heads' out of $n$ attempts. We'll code the equation as a function, and calculate the probability $P$ of a particular outcome (e.g., $x$ heads in $n$ attempts). 

Note that for a coin toss, $p$ is 0.5, but other yes/no questions can be investigated as well (e.g., chance of finding a fossil in a sedimentary layer, whether or not a landslide occurs following an earthquake). 

In [None]:
import math  # math.factorial is used because np.math is not available in recent NumPy versions

def binomial_probability(x,p,n):
    """
    This function computes the probability of getting x particular outcomes (heads) in n attempts, where p is the 
    probability of a particular outcome (head) for any given attempt (coin toss).
    
    Parameters
    ----------
    x : number of a particular outcome
    p : probability of that outcome in a given attempt
    n : number of attempts
    
    Returns
    ---------
    prob : probability of that number of the given outcome occurring in that number of attempts
    """

    # number of ways to choose which x of the n attempts give the outcome
    n_combinations = math.factorial(n)/(math.factorial(x)*math.factorial(n-x))
    prob = n_combinations*(p**x)*(1.-p)**(n-x)

    return prob

We can use this function to calculate the probability of getting 10 heads ($x=10$) when there are 10 coin tosses ($n=10$), given a probability $p$ of 0.5 for each toss.

In [None]:
binomial_probability(x=10,p=0.5,n=10)
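
For this particular case the arithmetic can be done by hand: with $x = n = 10$ the factorial term equals 1, so $P = 0.5^{10} = 1/1024 \approx 0.00098$. As an independent check, the sketch below evaluates the same probability with `scipy.stats.binom.pmf`; it is offered as a cross-check, not as a replacement for the function above.

```python
import scipy.stats as sps

# probability of exactly 10 heads in 10 tosses of a fair coin, from SciPy's binomial pmf
p_scipy = sps.binom.pmf(10, 10, 0.5)  # arguments: k (successes), n (trials), p

# the same value worked out by hand: 0.5**10 = 1/1024
p_by_hand = 0.5**10

print(p_scipy, p_by_hand)
```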

Let's calculate the probability of getting [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10] heads.

In [None]:
head_numbers = np.arange(0,11)
head_numbers

In [None]:
prob_heads = 0.5
n_flips = 10
probabilities = []

for head_number in head_numbers:
    prob = binomial_probability(...)
    probabilities.append(prob)
    
probabilities

<font color=goldenrod>**_Code for you to write_**</font>

Make a plot that shows both the histogram from 1000 coin flip experiments (using ```plt.hist()``` with ```density=True```) and the theoretical probabilities for each value in `head_numbers` (using ```plt.plot()```).

In [None]:
plt.hist()
plt.plot()

plt.xlabel('Number of heads out of $n$ attempts') # add labels
plt.ylabel('Fraction of times with this number of heads') 

plt.title('Coin flip results (n=10)');

Hopefully what we should see is that the distribution of the number of heads in our random samples matches the theoretical probability distribution pretty well. The more flip experiments we simulate, the better it should match.

### Empirical

The type of sampling we were doing above where we were flipping coins is called a _Monte Carlo simulation_. We can simulate data from all sorts of distributions. Let's keep focusing on the binomial distribution and look at using the ```np.random.binomial``` function.

In [None]:
help(np.random.binomial)

`np.random.binomial( )` requires 2 parameters, $n$ and $p$, with an optional keyword argument `size` (if `size` is not specified, it returns a single value: the number of successes in one set of $n$ attempts). We could have used this function earlier to get the number of heads that were flipped, but the way we did it also worked.
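
As a sketch of that alternative, a single call can replace the nested loops: each draw is the number of heads in 10 flips of a fair coin. The seed value and the variable names here are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so the example is repeatable

# 1000 experiments, each counting the heads in 10 fair coin flips
heads_per_experiment = np.random.binomial(10, 0.5, size=1000)

print(heads_per_experiment[:10])    # first ten experiments
print(heads_per_experiment.mean())  # should be near n*p = 5
```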

Let's follow the example that is given in the `np.random.binomial( )` docstring.

A company drills 9 wild-cat oil exploration wells (high risk drilling in unproven areas), each with an estimated probability of success of 0.1. All nine wells fail. What is the probability of that happening? *Note that success in this context means that liquid hydrocarbons came out of the well. In reality, you may not consider this a success given that the result is that more hydrocarbons will be combusted, leading to higher atmospheric carbon dioxide levels and associated global warming.*

If we do ```np.random.binomial(9, 0.1, 100)``` we will get an array of 100 values, each of which represents the number of wells (out of 9) that yielded oil when there is a 10% (p = 0.1) chance of each individual well yielding oil.

In [None]:
np.random.binomial(9, 0.1, 100)

In [None]:
np.random.binomial(9, 0.1, 100) == 0

In [None]:
sum(np.random.binomial(9, 0.1, 100) == 0)
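
Note that each of the three cells above calls `np.random.binomial` separately, so each one works with a different set of random draws. Here is a sketch of keeping one set of draws in a variable (the name `wells_with_oil` is hypothetical) so that every step refers to the same simulated data:

```python
import numpy as np

# draw once and keep the result so every later step uses the same simulations
wells_with_oil = np.random.binomial(9, 0.1, 100)

# boolean array: True where all nine wells failed
all_failed = wells_with_oil == 0

# True counts as 1, so the sum is the number of simulations with no successful wells
n_all_failed = np.sum(all_failed)
print(n_all_failed, "of", len(wells_with_oil), "simulations had no successful wells")
```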

We can write a function that uses this process to simulate the fraction of times that there are no successful wells for a given number of wells, a given probability and a given number of simulations:

In [None]:
def wildcat_failure_rate(n_wells,prob,n_simulations):
    '''
    Simulate the fraction of times that there are no successful wells for a given number of wells and a given probability for each well.
    
    Parameters
    ----------
    n_wells : number of wells drilled in each simulation
    prob : probability that each well will be successful
    n_simulations : number of times that drilling n_wells is simulated
    
    Returns
    ---------
    failure_rate : fraction of simulations in which none of the wells were successful
    '''
    
    failure_rate = sum(np.random.binomial(n_wells, prob, n_simulations) == 0)/n_simulations
    return failure_rate
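
Because a rate like this is estimated from random draws, it changes from run to run, and the scatter shrinks as the number of simulations grows. The sketch below illustrates this with the standard error of a simulated fraction $f$ from $N$ simulations, $\sqrt{f(1-f)/N}$. The helper name `simulated_fraction_and_error` and the coin-flip case used here are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def simulated_fraction_and_error(n_trials, prob, n_simulations):
    """Fraction of simulations with zero successes, plus its approximate standard error."""
    draws = np.random.binomial(n_trials, prob, n_simulations)
    fraction = np.mean(draws == 0)
    # standard error of a fraction estimated from n_simulations independent draws
    std_error = np.sqrt(fraction * (1 - fraction) / n_simulations)
    return fraction, std_error

# illustrative case: three flips of a fair coin, none of them heads (theory: 0.5**3 = 0.125)
for n_simulations in [10, 1000, 100000]:
    fraction, std_error = simulated_fraction_and_error(3, 0.5, n_simulations)
    print(n_simulations, fraction, std_error)
```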

<font color=goldenrod>**Put the `wildcat_failure_rate` function to use**</font>

Use the function to simulate the failure rate for the above scenario (9 wells drilled, 0.1 probability of success for each well) and do it for 10 simulations.

<font color=goldenrod>**Put the `wildcat_failure_rate` function to use**</font>

Use the function to simulate the failure rate for the same scenario for 1000 simulations

<font color=goldenrod>**Put the `wildcat_failure_rate` function to use**</font>

Use the function to simulate the failure rate for 100,000 simulations

<font color=goldenrod>**Put the `binomial_probability` function to use**</font>

In the examples above we are simulating the result. Instead we could use the `binomial_probability` function to calculate the probability directly. Go ahead and do this for this wildcat drilling example. 

In [None]:
binomial_probability()

**How well does the calculated `binomial_probability` match the simulated `wildcat_failure_rate` results? How many times do you need to simulate the problem to get a number that matches the theoretical probability?** 

*Write your answer here*

## Poisson distribution:

A Poisson Distribution gives the probability of a number of events in an interval generated by a Poisson process: the average time between events is known, but the exact timing of events is random. The events must be independent and may occur only one at a time.

Within Earth and Planetary Science there are many processes that approximately meet these criteria.

### Theoretical

The Poisson distribution gives the probability that an event (one that either occurs or does not) occurs $k$ times in an interval of time, where $\lambda$ is the expected number of occurrences in that interval. The Poisson distribution is the limit of the binomial distribution for a large number of attempts $n$ and a small probability $p$ per attempt. So if you take the limit of the binomial distribution as $n \rightarrow \infty$ while holding $\lambda = np$ fixed, you'll get the Poisson distribution:

$$P(k) = e^{-\lambda}\frac{\lambda^{k}}{k!}$$


In [None]:
import math  # math.factorial is used because np.math is not available in recent NumPy versions

def poisson_probability(k,lam):
    """
    This function computes the probability of getting k particular outcomes when the expected rate is lam.
    """
    
    # compute the poisson probability of getting k outcomes when the expected rate is lam
    prob = (np.exp(-1*lam))*(lam**k)/math.factorial(k)
    
    # return the output
    return prob
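
The limiting relationship stated above can be checked numerically. The following sketch holds the expected number of events $\lambda = np$ fixed while $n$ grows, and compares the binomial probability with the Poisson probability. The values of $\lambda$ and $k$ are arbitrary choices for illustration.

```python
import math

lam = 5  # expected number of events (arbitrary illustrative value)
k = 6    # number of events whose probability we compare

# Poisson probability of exactly k events
poisson_prob = math.exp(-lam) * lam**k / math.factorial(k)

differences = []
for n in [10, 100, 1000, 10000]:
    p = lam / n  # shrink p as n grows so that n*p stays equal to lam
    binomial_prob = math.comb(n, k) * p**k * (1 - p)**(n - k)
    differences.append(abs(binomial_prob - poisson_prob))
    print(n, binomial_prob, poisson_prob)
```

The gap between the two probabilities should shrink as $n$ increases.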

## Observing meteors

<img src="./images/AMS_TERMINOLOGY.png" width = 600>

From https://www.amsmeteors.org/meteor-showers/meteor-faq/:

> **How big are most meteoroids? How fast do they travel?** The majority of visible meteors are caused by particles ranging in size from about that of a small pebble down to a grain of sand, and generally weigh less than 1-2 grams. Those of asteroid origin can be composed of dense stony or metallic material (the minority) while those of cometary origin (the majority) have low densities and are composed of a “fluffy” conglomerate of material, frequently called a “dustball.” The brilliant flash of light from a meteor is not caused so much by the meteoroid’s mass, but by its high level of kinetic energy as it collides with the atmosphere.

> Meteors enter the atmosphere at speeds ranging from 11 km/sec (25,000 mph), to 72 km/sec (160,000 mph!). When the meteoroid collides with air molecules, its high level of kinetic energy rapidly ionizes and excites a long, thin column of atmospheric atoms along the meteoroid’s path, creating a flash of light visible from the ground below. This column, or meteor trail, is usually less than 1 meter in diameter, but will be tens of kilometers long.

> The wide range in meteoroid speeds is caused partly by the fact that the Earth itself is traveling at about 30 km/sec (67,000 mph) as it revolves around the sun. On the evening side, or trailing edge of the Earth, meteoroids must catch up to the earth’s atmosphere to cause a meteor, and tend to be slow. On the morning side, or leading edge of the earth, meteoroids can collide head-on with the atmosphere and tend to be fast.

> **What is a meteor shower? Does a shower occur “all at once” or over a period of time?** Most meteor showers have their origins with comets. Each time a comet swings by the sun, it produces copious amounts of meteoroid sized particles which will eventually spread out along the entire orbit of the comet to form a meteoroid “stream.” If the Earth’s orbit and the comet’s orbit intersect at some point, then the Earth will pass through this stream for a few days at roughly the same time each year, encountering a meteor shower. The only major shower clearly shown to be non-cometary is the Geminid shower, which share an orbit with the asteroid (3200 Phaethon): one that comes unusually close to the sun as well as passing through the earth’s orbit. Most shower meteoroids appear to be “fluffy”, but the Geminids are much more durable as might be expected from asteroid fragments.

## Observing the Southern Taurids meteor shower

Let's say you are planning to go out and try to see shooting stars tonight in a rural location. You might be in luck because there is an active shower:

> **Southern Taurids**

> *Active from September 10th to November 20th, 2019*

> The Southern Taurids are a long-lasting shower that reaches a barely noticeable maximum on October 9 or 10. The shower is active for more than two months but rarely produces more than five shower members per hour, even at maximum activity. The Taurids (both branches) are rich in fireballs and are often responsible for increased number of fireball reports from September through November. https://www.amsmeteors.org/meteor-showers/meteor-shower-calendar/

At a rate of 5 observed meteors per hour, what is the probability of observing exactly 6 in a given hour?

We can use the Poisson probability function to answer this question:

In [None]:
lamb = 5
k = 6
prob = poisson_probability(k,lamb)
print(prob)

So that result tells us that there is a 14.6% chance of observing exactly 6 meteors in an hour, but it would be much more helpful to be able to visualize the probability distribution. So let's go through and calculate the probability of seeing any number between 0 and 10. First, we can make an array of the integers from 0 to 10:

In [None]:
number_meteors_seen = np.arange(0,11)
number_meteors_seen

In [None]:
taurid_meteor_sighting_probability = []
taurid_meteor_rate = 5

for n in number_meteors_seen:
    # n is already the number of meteors, so it can be passed directly
    prob = poisson_probability(n,taurid_meteor_rate)
    taurid_meteor_sighting_probability.append(prob)

In [None]:
taurid_meteor_sighting_probability

In [None]:
plt.plot(number_meteors_seen,taurid_meteor_sighting_probability,label=r'Southern Taurids ($\lambda = 5$)')
plt.legend()
plt.show()
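
The loop above can be cross-checked against SciPy, which evaluates the Poisson probability for a whole array of counts at once. This is a sketch of an alternative, not the notebook's method, and the variable names are illustrative.

```python
import numpy as np
import scipy.stats as sps

counts = np.arange(0, 11)  # 0 through 10 meteors in an hour
rate = 5                   # Southern Taurids rate used in the text

# Poisson probability of each count, computed in one call
probabilities = sps.poisson.pmf(counts, rate)

print(probabilities[6])     # probability of exactly 6 meteors, about 0.146
print(probabilities.sum())  # less than 1: counts above 10 are possible but not included
```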

When there is not an active shower the background meteor rate is about 2 an hour (although it is variable depending on time of night and season; see more here: https://www.amsmeteors.org/meteor-showers/meteor-faq/).

<font color=goldenrod>**_Code for you to write_**</font>

- **Calculate the probability of seeing different numbers of meteors when the background rate is 2 an hour (lambda = 2).**
- **Plot that probability alongside the probability of seeing those same numbers during the Southern Taurids shower.**

## Simulate meteor observing

There are many cases where it can be useful to simulate data sets. In this case, we can simulate what a stretch of observing might be like: of the hours spent looking at the night sky, in how many would you see 1 meteor or more on a normal night vs. a night with the Southern Taurids shower ongoing?

We can use the `np.random.poisson` function to simulate 'realistic' data.  

`np.random.poisson( )` takes a rate parameter `lam` and an optional parameter `size`. Each call to `np.random.poisson( )` returns `size` draws from a Poisson distribution with $\lambda =$ `lam` (a single draw if `size` is not given).

Let's try it with $\lambda = 2$ (the background rate).

In [None]:
lam = 2
number_hours_watched = 100
number_hours_w_meteor_sighting = []

for n in np.arange(0,number_hours_watched):
    number_meteors = np.random.poisson(lam)
    if number_meteors >= 1:
        number_hours_w_meteor_sighting.append(1)
        
fraction_w_sighting = len(number_hours_w_meteor_sighting)/number_hours_watched
print('percent of hours watched with a meteor sighting')
print(fraction_w_sighting*100)
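
For comparison, the probability of at least one meteor in an hour follows directly from the Poisson formula, since $P(k \geq 1) = 1 - P(0) = 1 - e^{-\lambda}$. The sketch below computes that value for $\lambda = 2$ and repeats the simulation using the `size` argument instead of a loop. The number of simulated hours is an arbitrary choice.

```python
import numpy as np

lam = 2
number_hours_watched = 100000  # arbitrary, chosen large so the simulation settles down

# one draw per hour watched: the number of meteors seen in that hour
meteors_per_hour = np.random.poisson(lam, size=number_hours_watched)

# fraction of hours with at least one meteor
simulated_fraction = np.mean(meteors_per_hour >= 1)

# theoretical value: 1 minus the probability of seeing zero meteors
theoretical_fraction = 1 - np.exp(-lam)

print(simulated_fraction, theoretical_fraction)
```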

<font color=goldenrod>**_Code for you to write_**</font>

- **Do the same meteor watching simulation with $\lambda = 5$ (the Southern Taurids rate). Do it for 10 hours, 100 hours, and 100,000 hours.** 

**Export the notebook as .html and upload to bCourses**