# Lec 8: Probability Simulations
***

In this notebook you'll see how we can use Numpy to run probability simulations to estimate probabilities, gain intuition about random processes, and check your pencil-and-paper work.

We'll need Numpy and Matplotlib for this notebook, so let's load and set up those libraries.

In [3]:
import numpy as np 
import matplotlib.pyplot as plt
%matplotlib inline

### Estimating Simple Probabilities 
*** 

In this example we'll see how we can use the Numpy function [np.random.choice](https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html) to make random draws from a sample space and estimate the probability of certain random events. 

In [None]:
select_from = ["A", "B", "C", "D", "E", "F"]

We can simulate randomly selecting from this list using `np.random.choice`, which returns a randomly selected entry from a 1-D list or Numpy array.  
If no optional parameters are passed in, `np.random.choice` assigns an **equal probability** to each entry of the array.   
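As a rough check of the equal-probability claim, the sketch below (an illustration added here, separate from the exercise cell) draws many times from `select_from` and tallies the results with `np.unique(..., return_counts=True)`. Each letter should come up about $1/6$ of the time. The seed value and the number of draws are arbitrary choices.

```python
import numpy as np

select_from = ["A", "B", "C", "D", "E", "F"]

np.random.seed(0)  # arbitrary seed so the run is reproducible
draws = np.random.choice(select_from, size=60000)

# Count how often each letter was drawn (np.unique returns the letters sorted)
letters, counts = np.unique(draws, return_counts=True)
fractions = counts / len(draws)

for letter, frac in zip(letters, fractions):
    print(letter, round(frac, 3))  # each fraction should be close to 1/6 = 0.167
```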

In [None]:
...

We can simulate repeatedly selecting (with replacement) by passing the `size` parameter, which sets the number of draws:

In [None]:
np.random.choice(select_from, size=4)  # Selects WITH replacement (the default, replace=True)

In [None]:
np.random.choice(select_from, size=2, replace=False)  # Selects WITHOUT replacement
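A quick sketch of what `replace=False` guarantees: every selected entry is distinct, so drawing all six letters returns each exactly once, and asking for more entries than the list holds raises a `ValueError`. The variable names here are only for illustration.

```python
import numpy as np

select_from = ["A", "B", "C", "D", "E", "F"]

# Without replacement, every selected entry is distinct
no_repeats = np.random.choice(select_from, size=6, replace=False)
print("".join(sorted(no_repeats)))  # prints ABCDEF: each letter appears once

# Drawing 7 distinct entries from 6 is impossible, so NumPy raises ValueError
try:
    np.random.choice(select_from, size=7, replace=False)
    too_many_raised = False
except ValueError as err:
    too_many_raised = True
    print("ValueError:", err)
```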

**Simulating Coin Tosses**

As a simple example, consider a fair coin.  We can represent the sample space for this coin with a Numpy array with two entries: "H" and "T".

In [None]:
coin = np.array(["H","T"])
coin

As before, we can simulate flipping the coin using [`np.random.choice`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html). With no optional parameters it assigns an **equal probability** to each entry of the array, so this models a fair coin.

In [None]:

np.random.choice(coin)

We can simulate many flips of the coin and store the results in an array by passing the `size` parameter to `np.random.choice`.

In [None]:
flips = np.random.choice(coin, size=10)
print(flips)

We can also count the number of times we get heads:

In [None]:
flips = np.random.choice(coin, size=10)
print(flips)
np.sum(flips == 'H')
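The counting trick works in two steps: `flips == "H"` compares every entry with `"H"` and returns a boolean array, and summing that array counts the `True` values, because `True` counts as 1 and `False` as 0. The fixed array below is a made-up example so the result is predictable.

```python
import numpy as np

flips = np.array(["T", "H", "H", "T", "H"])  # a fixed example, not random

is_heads = flips == "H"   # elementwise comparison gives False, True, True, False, True
num_heads = np.sum(is_heads)  # True counts as 1 and False as 0
print(num_heads)          # prints 3
```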

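We can change the probability with which `np.random.choice` selects each entry of the sample space array by passing the optional `p` argument: an array of probabilities, listed in the same order as the entries, that sums to 1. For example, `p=[0.75, 0.25]` makes heads three times as likely as tails: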

In [None]:

biased_flips = np.random.choice(coin, p=[0.75, 0.25], size=10)

print(biased_flips)
np.sum(biased_flips == 'H')
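With only 10 flips the count of heads varies a lot from run to run. A sketch with many more biased flips shows the fraction of heads landing near $0.75$; it also shows that `np.random.choice` rejects a `p` whose entries do not sum to 1 by raising a `ValueError`. The seed and sample size are arbitrary.

```python
import numpy as np

coin = np.array(["H", "T"])

np.random.seed(1)  # arbitrary seed for reproducibility
biased = np.random.choice(coin, p=[0.75, 0.25], size=10000)
est_heads = np.mean(biased == "H")  # fraction of heads, expected to be near 0.75
print(round(est_heads, 3))

# The probabilities 0.7 + 0.2 = 0.9 do not sum to 1, so NumPy refuses them
try:
    np.random.choice(coin, p=[0.7, 0.2])
    bad_p_raised = False
except ValueError:
    bad_p_raised = True
```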

*** 
### Example 1: Rolling a Die
Simulate rolling a 6-sided die 10 times:

In [None]:
die = ...
roll = ...
roll

**Side note**: If you pass an integer `a` to `np.random.choice` instead of a list or array, the random sample is generated as if you had passed `np.arange(a)`, i.e. it is drawn from the integers `0, 1, ..., a-1`.

In [None]:
np.arange(5)

In [None]:
np.random.choice(5, size=10)
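One way to see the equivalence, assuming NumPy's legacy `np.random` functions used throughout this notebook: reset the seed before each call, and passing `5` or `np.arange(5)` should produce the same draws, all from `0` to `4`. The seed value is arbitrary.

```python
import numpy as np

np.random.seed(7)
from_int = np.random.choice(5, size=10)

np.random.seed(7)  # reset so both calls start from the same generator state
from_range = np.random.choice(np.arange(5), size=10)

print(from_int)
print(from_range)
print(np.array_equal(from_int, from_range))  # expected: True
```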

*** 
### Example 2: Flipping a Fair Coin

Now suppose we want to run a simple simulation to estimate the probability that the coin comes up heads (which we expect to be $0.5$ because the coin is fair). One way to do this is to flip the coin a large number of times and divide the number of flips that come up heads by the total number of flips. The following code flips the coin 50 times and computes this ratio:

In [None]:
# np.random.seed() initializes NumPy's global random number generator.
# The generator needs a starting number (a seed value) to produce its sequence of random numbers.
# If no seed is set, NumPy seeds the generator with fresh entropy from the operating system,
# so the results change every time the notebook runs.
# Setting a fixed seed makes the generator produce the same results each time.

np.random.seed(12345)

flips = np.random.choice(coin, size=50)
approx_prob_heads = ...

print("the probability of heads is approximately {:.3f}".format(approx_prob_heads))
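To see what the seed buys us, the sketch below draws 50 flips twice, resetting the seed to the same value before each draw; the two sequences come out identical. Note that the seed is not reset between later cells, so subsequent draws continue from wherever the generator left off.

```python
import numpy as np

coin = np.array(["H", "T"])

np.random.seed(12345)
first_run = np.random.choice(coin, size=50)

np.random.seed(12345)  # same seed, so the generator replays the same sequence
second_run = np.random.choice(coin, size=50)

print(np.array_equal(first_run, second_run))  # prints True
```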

OK, so the simulation estimated that the probability of the coin coming up heads is $0.36$, which is pretty far off from the $0.5$ that we expected.  This is likely because we didn't do very many coin flips.  Let's see what happens if we rerun the simulation with $500$ coin flips. 

In [None]:

flips = np.random.choice(coin, size=500)
approx_prob_heads = np.sum(flips == "H") / len(flips)
print("the probability of heads is approximately {:.3f}".format(approx_prob_heads))

With $500$ coin flips our estimate came out much closer to $0.5$.
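Why do more flips help? For $n$ flips of a coin with $P(H) = p$, the estimate has standard deviation $\sqrt{p(1-p)/n}$, so 10 times as many flips shrinks the typical error by a factor of $\sqrt{10} \approx 3.2$. The sketch below repeats the 50-flip and 500-flip experiments many times and compares the observed spread of the estimates with that formula; `num_repeats` and the seed are arbitrary choices.

```python
import numpy as np

np.random.seed(2024)  # arbitrary seed
num_repeats = 2000    # independent experiments run for each value of n
spreads = {}          # n -> (observed spread, predicted spread)

for n in [50, 500]:
    # Each row is one experiment of n fair-coin flips
    flips = np.random.choice(["H", "T"], size=(num_repeats, n))
    estimates = np.mean(flips == "H", axis=1)  # one estimate of P(H) per experiment

    observed_spread = np.std(estimates)
    predicted_spread = np.sqrt(0.5 * 0.5 / n)  # sqrt(p(1-p)/n) with p = 0.5
    spreads[n] = (observed_spread, predicted_spread)
    print(n, round(observed_spread, 4), round(predicted_spread, 4))
```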



It's an interesting exercise to plot the running estimate of the probability as the number of coin flips increases. The function below draws a fresh sequence of `n` flips and records the estimate after each flip.

We'll be using the Python `range` function (https://docs.python.org/3/library/functions.html#func-range) in the following code, so let's see how it works:


In [None]:
# Notice how the range function works: range(5) counts from 0 up to, but not including, 5

for ii in range(5):
    print(ii)

In [None]:

def plot_estimates(n):
    
    flips = np.random.choice(["H","T"], size=n)
    running_prob = []

    # Keep a "running estimate" of the probability of heads as the number of flips grows:
    for ii in range(n):
        num=...
        denom=...
        running_prob.append(num/denom)
        # running_prob[ii] is the fraction of heads among the first ii+1 flips.
        # For example, if the flips start T, H, T, ...
        # running_prob[0]=0/1   running_prob[1]=1/2   running_prob[2]=1/3   etc.
        # running_prob = [0, 0.5, 0.3333..., etc.]
    
    return running_prob



# Run the simulation with num_trials flips
num_trials=10000
p=plot_estimates(num_trials)

print("the probability of heads is approximately {:.3f}".format(p[-1]))
 
# Plot running estimate of probability of getting heads as num_trials gets larger:
fig, ax = plt.subplots(figsize=(12,6))

# plot the terms in p
ax.plot(p, color="steelblue",label="Simulated probability")

#Plot the theoretical probability
ax.axhline(y=0.5, color='r', linestyle='-', label="Theoretical probability")

# put labels on the axes and give the graph a title.
ax.set_title("Running Estimate of Probability of Heads", fontsize=20)
ax.set_xlabel("Number of Flips", fontsize=16)
ax.set_ylabel("Estimate of Probability", fontsize=16)
# fix the y-axis to be between 0 and 1:
ax.set_ylim(0,1)
# include a legend:
ax.legend()
# put a faded grid behind the graphic
ax.grid(True, alpha=0.25)
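The loop in `plot_estimates` builds the running estimate one flip at a time. As an alternative sketch (not the lecture's intended solution to the `...` blanks), `np.cumsum` computes the number of heads in every prefix of the flips at once, and dividing by `1, 2, ..., n` gives the whole running estimate in one step. The function name `running_estimate` is hypothetical.

```python
import numpy as np

def running_estimate(flips):
    """Fraction of heads after each flip, computed without a Python loop."""
    is_heads = np.asarray(flips) == "H"
    heads_so_far = np.cumsum(is_heads)               # heads among flips[:k+1]
    flips_so_far = np.arange(1, len(is_heads) + 1)   # k+1 flips seen at index k
    return heads_so_far / flips_so_far

# Same example as in the comments of plot_estimates: T, H, T
example = running_estimate(["T", "H", "T"])
print(example)  # values 0, 0.5 and 0.333...
```

Besides being shorter, this avoids re-counting the earlier flips at every step, which matters when `num_trials` is large.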


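Notice that with only a few flips the estimate of the probability is understandably poor, but as the number of flips increases the estimate settles close to the expected $0.5$. This is the law of large numbers at work: the fraction of heads converges to the true probability as the number of flips grows.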

***

### Example 3: Drawing Cards

If we draw 2 cards from a standard 52-card deck, what is the probability that both are hearts?

a)  P(2 hearts) = ...

b)  Write a simulation to verify your result.



In [None]:
# Create a 52-card deck (use the string 'D2' to represent the 2 of Diamonds)


suits=['D','H','C','S']
ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']


# Build a deck
deck = ...

print(deck)


In [None]:

#Write a function that takes a list of 2 cards as input and returns True if both are hearts. Otherwise it returns False.

def check_two_hearts(twocards):   
       
    #Check if they're both hearts
    return ...
       
    

Test your function:

In [None]:
check_two_hearts(['H7','S8']) 

#assert check_two_hearts(['H7','S7']) == False


Notice: in this scenario we want to draw the two cards **WITHOUT REPLACEMENT**.


**OPTION 1:** 
`np.random.choice(deck, size=2, replace=False)`


**OPTION 2:** [`np.random.shuffle`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.shuffle.html)
This randomly shuffles the items of a list or array in place; after shuffling, the first two cards form a random draw without replacement.




 

In [None]:
print(deck[:5])

np.random.shuffle(deck)

print(deck[:5])
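Note that `np.random.shuffle` changes the list itself and returns `None`, so writing `deck = np.random.shuffle(deck)` would lose the deck. If you want to keep the original order, `np.random.permutation` returns a shuffled copy instead. The small `hand` list below is made up for illustration.

```python
import numpy as np

hand = ["H2", "H3", "S4", "D5", "C6"]  # a small made-up list of cards

np.random.seed(5)  # arbitrary seed
shuffled_copy = np.random.permutation(hand)  # new shuffled array; hand is untouched
hand_before_shuffle = list(hand)
print(shuffled_copy)

result = np.random.shuffle(hand)  # reorders hand itself, in place
print(result)  # prints None: shuffle does not return the shuffled list
print(hand)    # the same five cards, possibly in a new order
```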

In [None]:
# Function to simulate a single draw of 2 cards (WITHOUT REPLACEMENT) and check if both draws are hearts.

def single_draw():
    ...
    
    
    return check_two_hearts(first_two)
  

out = single_draw()
out

In [None]:
def probability_of_two_hearts(num_samples=100000):
    # simulate random draws   
    
    hearts = np.array([single_draw() for ii in range(num_samples)])
    # compute fraction of draws that resulted in 2 hearts
    #print(hearts)
    return hearts.sum()/num_samples


probability_of_two_hearts()
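How close is "very close"? A simulated probability $\hat{p}$ from $n$ independent trials comes with an approximate 95% interval $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$ (the normal approximation). The helper below is a hypothetical sketch that takes an array of True/False trial outcomes; it is demonstrated on die rolls, and once `single_draw` is completed its results could be passed in the same way.

```python
import numpy as np

def estimate_with_interval(outcomes):
    """Estimate a probability from True/False outcomes of independent trials.

    Returns the estimate and an approximate 95% interval based on the normal
    approximation: p_hat +/- 1.96 * sqrt(p_hat * (1 - p_hat) / n).
    """
    outcomes = np.asarray(outcomes, dtype=bool)
    n = len(outcomes)
    p_hat = outcomes.mean()
    half_width = 1.96 * np.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, (p_hat - half_width, p_hat + half_width)

# Usage: estimate the probability that a fair six-sided die shows a 6
np.random.seed(11)  # arbitrary seed
rolls = np.random.choice(np.arange(1, 7), size=100000)
p_hat, (low, high) = estimate_with_interval(rolls == 6)
print(round(p_hat, 4), round(low, 4), round(high, 4))
```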

This is very close to the actual answer we analytically derived in part (a).