# Probability Simulations
***

In this notebook you'll see how we can use NumPy to run simulations that estimate probabilities, build intuition about random processes, and check your pencil-and-paper work.

We'll need NumPy and Matplotlib for this notebook, so let's load and set up those libraries.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## Simulating Probabilities

As we discussed in lecture, the frequentist definition of probability defines it in terms of a limit:


$$P(\text{Event}) = \lim_{n \to \infty} \frac{\text{count}(\text{Event})}{n}$$



In English this reads: perform $n$ trials of an "experiment" that could result in a particular "Event" occurring.

The ratio of the number of trials that result in the event, written $\text{count}(\text{Event})$, to the number of trials performed, $n$, is an estimate of the probability $P(\text{Event})$.

In the limit, as the number of trials approaches infinity, this ratio converges to the true probability.


One of the big payoffs of simulation is that it lets us answer probability questions that are otherwise quite difficult to solve analytically. Instead of deriving an exact answer, we can **simulate the experiment a large number of times and get approximate results based on the simulation.**


There are several ways to draw random samples using Python:


#### Random Sampling in Pandas: 
- `df.sample(n)` (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sample.html) draws a random sample of `n` rows from the DataFrame `df`. The output is a DataFrame consisting of the sampled rows. 
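
As a small illustration of `df.sample`, the sketch below builds a toy DataFrame and draws three of its rows. The DataFrame `toy_df`, its column names, and the `random_state` value are made up for this example; they are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical toy DataFrame, used only for illustration
toy_df = pd.DataFrame({"letter": list("ABCDEF"), "value": range(6)})

# Draw 3 rows at random, without replacement (the default);
# random_state makes the draw repeatable
sampled = toy_df.sample(n=3, random_state=0)
print(sampled)
```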



#### Random Sampling in Numpy:  https://numpy.org/doc/stable/reference/random/index.html

The `numpy.random` module implements pseudo-random number generators (PRNGs or RNGs, for short) that can draw samples from a variety of probability distributions. In general, you create a `Generator` instance with `default_rng` and call its methods to obtain samples from different distributions.


In [None]:
# Create a Generator instance:
rng = np.random.default_rng() 


- `rng.choice(a)` (https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.choice.html) draws a random sample from a population whose elements are in an array `a`. With no `size` argument the output is a single element; with `size` it is an array consisting of the sampled elements.

- `rng.shuffle(a)` (https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.shuffle.html) shuffles the contents of an array or sequence in place and returns `None`.
  
Warning: The pseudo-random number generators implemented in the `numpy.random` module are designed for statistical modeling and simulation. They are not suitable for security or cryptographic purposes. See the `secrets` module from the standard library for such use cases.




### Estimating Simple Probabilities 
*** 

Suppose we wanted to randomly sample from the first 6 letters in the alphabet:

In [None]:
select_from =["A", "B", "C", "D", "E", "F"]

We can simulate randomly selecting from this list using `rng.choice`, which returns a randomly selected entry from a list or NumPy array.
If no optional parameters are passed in, `rng.choice` assigns an **equal probability** to each entry.

In [None]:
rng.choice(select_from)

We can simulate repeatedly selecting (with replacement) by passing the `size` parameter, which sets how many draws to make:

In [None]:
#Write code to randomly sample 10 times WITH replacement
...

In [None]:
#Write code to randomly sample 5 times WITHOUT replacement
...
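
One possible way to fill in the two cells above, offered as a sketch rather than the only solution: `size` sets the number of draws and `replace=False` turns off replacement. The generator name `sketch_rng` and the seed `0` are assumptions made so the example is repeatable.

```python
import numpy as np

sketch_rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily
select_from = ["A", "B", "C", "D", "E", "F"]

# 10 draws WITH replacement (the default): letters may repeat
with_repl = sketch_rng.choice(select_from, size=10)

# 5 draws WITHOUT replacement: each letter appears at most once
without_repl = sketch_rng.choice(select_from, size=5, replace=False)

print(with_repl)
print(without_repl)
```

Asking for more than 6 draws with `replace=False` would raise an error, since the population only has 6 letters.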

**Simulating Coin Tosses**

As a simple example, consider a fair coin. We can represent the sample space for this coin with a NumPy array with two entries: "H" and "T".

In [None]:
coin = np.array(["H","T"])
coin

In [None]:
rng.choice(coin)

We can simulate many flips of the coin and store the results in an array by passing the size parameter:

In [None]:
flips = rng.choice(coin, size=10)
print(flips)

We can also count the number of times we get heads:

In [None]:
flips = rng.choice(coin, size=10)
print(flips)
sum(flips == 'H')

We can change the probability with which `choice` selects each entry of the sample space array by passing in an optional array of probabilities, e.g. `p=[0.75, 0.25]`. The probabilities are listed in the same order as the entries and must sum to 1:

In [None]:

biased_flips = rng.choice(["H", "T"], p=[0.75, 0.25], size=10)

print(biased_flips)
sum(biased_flips == 'H')

*** 
###  Example 1: 
Simulate rolling a 6-sided die, 10 times:

In [None]:
die = ...
roll = ...
roll
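
A possible way to complete the cell above, as a sketch: represent the die as the array of faces 1 through 6 and draw 10 times with `choice`. The name `sketch_rng` and the seed `0` are assumptions for repeatability.

```python
import numpy as np

sketch_rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily

die = np.arange(1, 7)                   # faces 1, 2, ..., 6
roll = sketch_rng.choice(die, size=10)  # 10 independent rolls
print(roll)
```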

**Side note**: If you pass an int `a` to `rng.choice` instead of a list or array, the random sample is generated as if the population were `np.arange(a)`, that is, the integers `0` through `a - 1`.

In [None]:
np.arange(1,10,2)

In [None]:
rng.choice(5, size=10)

*** 
### Example 2- Flipping a Fair Coin

Now suppose we want to run a simple simulation to estimate the probability  that the coin comes up Heads (which we expect to be $0.5$ because the coin is fair).  One way to do this is to do a large number of coin flips and then divide the number of flips that come up Heads by the total number of flips. The following code flips the coin 50 times and computes the desired ratio: 

In [None]:
# Passing a seed to default_rng() initializes the random number generator.
# A pseudo-random number generator needs a number to start with (a seed value)
#   to be able to generate its sequence of random numbers.
# With no seed, default_rng() pulls fresh, unpredictable entropy from the operating system.
# We can pass a seed to customize the starting point of the random number generator
# so that we get the same results each time.

rng = np.random.default_rng(seed=152)


# Write code to simulate flipping a coin 50 times and approximating the probability of heads using your simulation:
flips = ...
approx_prob_heads = ...

print("the probability of heads is approximately {:.3f}".format(approx_prob_heads))
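
A sketch of how the 50-flip estimate could be computed, assuming the same `coin` array and seed as above (the name `sketch_rng` is hypothetical). Because `flips == "H"` is a Boolean array, its mean is the fraction of flips that came up heads. The last two lines check that rebuilding the generator with the same seed reproduces the same flips.

```python
import numpy as np

sketch_rng = np.random.default_rng(seed=152)
coin = np.array(["H", "T"])

flips = sketch_rng.choice(coin, size=50)
# Mean of a Boolean array = fraction of True entries = fraction of heads
approx_prob_heads = np.mean(flips == "H")
print("the probability of heads is approximately {:.3f}".format(approx_prob_heads))

# A fresh Generator with the same seed produces the same sequence
repeat = np.random.default_rng(seed=152).choice(coin, size=50)
print(np.array_equal(flips, repeat))
```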

OK, so the simulation's estimate of the probability of heads is pretty far off from the $0.5$ that we expected. This is likely because we didn't do very many coin flips. Let's see what happens if we rerun the simulation with $50{,}000$ coin flips.

In [None]:

flips = rng.choice(coin, size=50000)
approx_prob_heads = np.sum(flips == "H") / len(flips)
print("the probability of heads is approximately {:.3f}".format(approx_prob_heads))

With $50{,}000$ coin flips our estimate came out much closer to $0.5$.



It's an interesting exercise to make a plot of the running estimate of the probability as the number of coin flips increases.    This is one way to visualize a simulation of the frequentist definition of probability. 


The function below draws a fresh sequence of coin flips and records the estimate of the probability after each flip.

We'll be using the Python `range` function (https://docs.python.org/3/library/functions.html#func-range) in the following code, so let's see how it works:


In [None]:
# Notice how the range function works: range(5) counts from 0 up to, but not including, 5

for ii in range(5):
    print(ii)

In [None]:

def plot_estimates(n):
    """Simulate flipping a fair coin n times and return the running
    estimate of the probability of heads after each flip."""

    # Ex: running_prob is a list holding, after each flip, the fraction of flips so far that were 'H'.
    # Suppose we got T, H, T, ...
    # running_prob[0]=0/1   running_prob[1]=1/2   running_prob[2]=1/3   etc
    # running_prob = [0, 0.5, 0.3333..., etc]
    
    flips = rng.choice(["H","T"], size=n)
    running_prob = []

    for ii in range(n):
        num= ...
        denom= ...
        running_prob.append(num/denom)
         
    
    return running_prob

# Run code for num trials
num_trials=10000

p = ...
x = ...

print("the probability of heads is approximately {:.3f}".format(p[num_trials-1]))
 
# Plot running estimate of probability of getting heads as num_trials gets larger:
fig, ax = plt.subplots(figsize=(12,6))

# plot the terms in p
ax.plot(..., ...  , color="steelblue",label="Simulated probability")

#Plot the theoretical probability
ax.axhline(y = ..., color = 'r', linestyle = '-', label = "Theoretical probability")

# put labels on the axes and give the graph a title.
ax.set_title("Running Estimate of Probability of Heads", fontsize=20)
ax.set_xlabel("Number of Flips", fontsize=16)
ax.set_ylabel("Estimate of Probability", fontsize=16)
# zoom the y-axis in on the range 0.4 to 0.6, around the expected 0.5:
ax.set_ylim(.4,.6)
# include a legend:
ax.legend()
# put a faded grid behind the graphic
ax.grid(True, alpha=0.25)


Notice that for very few flips the estimate of the probability is understandably poor. But as the number of flips increases, the estimate settles down very close to the expected $0.5$.
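
One way to compute the running estimate without an explicit loop is a cumulative sum, sketched below. The function name `running_estimate`, the extra `rng` argument, and the seed `0` are assumptions for illustration; the loop-based `plot_estimates` above is meant to produce the same kind of sequence.

```python
import numpy as np

def running_estimate(n, rng):
    """Return the fraction of heads after each of n simulated fair coin flips."""
    flips = rng.choice(["H", "T"], size=n)
    heads_so_far = np.cumsum(flips == "H")  # heads among the first k flips
    num_flips = np.arange(1, n + 1)         # k = 1, 2, ..., n
    return heads_so_far / num_flips

sketch_rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily
num_trials = 10000
p = running_estimate(num_trials, sketch_rng)
x = np.arange(1, num_trials + 1)

print("the probability of heads is approximately {:.3f}".format(p[-1]))
```

The arrays `x` and `p` could then be passed to `ax.plot`, with a horizontal reference line at the theoretical value $0.5$.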

***

### Example 3- Drawing Cards

You randomly draw 2 cards (without replacement) from a standard 52-card deck.  What's the probability that you draw 2 hearts?


a).  Compute the probability by hand

b).  Write a simulation to verify your results.
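
For part (a), one way to organize the by-hand computation is to multiply the probability that the first card is a heart by the probability that the second card is a heart given that the first one was:

$$P(\text{2 hearts}) = \frac{13}{52} \cdot \frac{12}{51} = \frac{156}{2652} = \frac{1}{17} \approx 0.0588$$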



**Simulation**:

In [None]:
# Create a 52 card deck (use string 'D2' to represent the 2 of Diamonds)


suits=['D','H','C','S']
ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']


# Build a deck
deck = ...

print(deck)
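
One possible way to build the deck, as a sketch: a list comprehension that pairs every suit with every rank, putting the suit letter first so that `'D2'` is the 2 of Diamonds.

```python
suits = ['D', 'H', 'C', 'S']
ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']

# Suit letter first, then rank: 4 suits x 13 ranks = 52 cards
deck = [suit + rank for suit in suits for rank in ranks]

print(len(deck))
print(deck[:5])
```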


In [None]:

def check_two_hearts(twocards):
    """ Function that takes a sequence of 2 cards (e.g. ['DA', 'H8'])
        and checks if both cards are hearts. Returns a Boolean."""
    ...
    
    return ...
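
A minimal sketch of such a check, assuming each card is a string whose first character is the suit letter (as in `'H8'`). The name `check_two_hearts_sketch` is hypothetical and is used only to keep this example separate from the exercise.

```python
def check_two_hearts_sketch(twocards):
    """Return True if both cards in twocards are hearts."""
    # The first character of each card string is its suit letter
    return all(card[0] == 'H' for card in twocards)

print(check_two_hearts_sketch(['DA', 'H8']))
print(check_two_hearts_sketch(['H7', 'HK']))
```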
       
    

Test your function:

In [None]:
check_two_hearts(['DA','H8']) 

#assert check_two_hearts(['H7','S7']) == False


Notice: In this scenario we want to draw two cards **WITHOUT REPLACEMENT**


**OPTION 1:** `rng.choice(deck, replace=False, size=2)`


**OPTION 2:** `rng.shuffle(deck)` and then choose the first 2 cards in the shuffled list




 

In [None]:
print(deck[:5])

rng.shuffle(deck)

print("Shuffled Deck", deck[:5])

In [None]:

def single_draw(deck):

    """ Function that takes in a card deck, randomly draws 2 cards without replacement
    and checks if both are hearts. Returns a Boolean. """

    rng.shuffle(deck)
    first_two=deck[:2]
    #print(first_two)
    
    return check_two_hearts(first_two)
  

out = single_draw(deck)
out

In [None]:
def probability_of_two_hearts(deck, num_sim):
    
    """Function that takes in a card deck and conducts num_sim simulations of drawing 2 cards (without replacement)
    and checking if they're both hearts. 
    Returns simulated probability out of num_sim simulations as well as a list with running probabilities."""
    
    ...
    
    return ...




With a large `num_sim`, the simulated probability should come out very close to the answer we derived analytically in part (a).
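
The sketch below shows one way such a simulation could be written. It is an assumption-laden illustration, not the intended solution: it uses Option 1 (`choice` with `replace=False`) instead of shuffling, takes the generator as an extra `rng` argument, and uses the hypothetical name `probability_of_two_hearts_sketch`.

```python
import numpy as np

def probability_of_two_hearts_sketch(deck, num_sim, rng):
    """Estimate P(two hearts) from num_sim simulated two-card draws."""
    cards = np.array(deck)  # convert once, outside the loop
    hits = np.zeros(num_sim, dtype=bool)
    for i in range(num_sim):
        # Draw 2 distinct cards (sampling without replacement)
        two = rng.choice(cards, size=2, replace=False)
        hits[i] = (two[0][0] == 'H') and (two[1][0] == 'H')
    # Fraction of successful draws after 1, 2, ..., num_sim simulations
    running = np.cumsum(hits) / np.arange(1, num_sim + 1)
    return running[-1], running

suits = ['D', 'H', 'C', 'S']
ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
deck = [suit + rank for suit in suits for rank in ranks]

sketch_rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily
prob, running_frac = probability_of_two_hearts_sketch(deck, 20000, sketch_rng)

print("Simulated probability", prob)
print("Exact probability", (13 / 52) * (12 / 51))
```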

In [None]:
# Write code to plot the running probability:

x = np.arange(1, 10001)

prob, running_frac = probability_of_two_hearts(deck, num_sim=10000)


print("Simulated probability", prob)

# Plot running estimate of probability of getting two hearts as num_trials gets larger:
fig, ax = plt.subplots(figsize=(12,6))

# plot the running estimates in running_frac against x, using ax.plot


ax.plot(..., ..., color="steelblue",label="Simulated probability")

#Plot the theoretical probability
ax.axhline(y = ..., color = 'r', linestyle = '-', label = "Theoretical probability")

# put labels on the axes and give the graph a title.
ax.set_title("Running Estimate of Probability of Two Hearts", fontsize=20)
ax.set_xlabel("Number of Draws", fontsize=16)
ax.set_ylabel("Estimate of Probability", fontsize=16)
# zoom the y-axis in on the range 0 to 0.1:
ax.set_ylim(0,0.1)
# include a legend:
ax.legend()
# put a faded grid behind the graphic
ax.grid(True, alpha=0.25)

*** 

### Example 4- Conditional Probabilities with Dice


Suppose you roll a fair die two times.  Let $A$ be the event "the sum of the throws equals 4" and $B$ be the event "at least one of the throws is a $3$".

a).   Compute (by hand) the probability that the sum of the throws equals 4 _given_ that at least one of the throws is a 3.  That is, compute $P(A \mid B)$. 

b).  Write a simulation to verify your results.
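
For part (a), one way to organize the by-hand computation uses the definition of conditional probability. Of the $36$ equally likely outcomes, $11$ contain at least one $3$, and only $(1,3)$ and $(3,1)$ both sum to $4$ and contain a $3$:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{2/36}{11/36} = \frac{2}{11} \approx 0.182$$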

**Simulation**:


**Part B**: Write code to simulate the conditional probability $P(A \mid B)$. **Hint**: Think about the definition of conditional probability.

*Hint:  the NumPy functions `np.logical_or` and `np.logical_and` are potentially useful.*

In [None]:
def simulateA_given_B(num_samples):
    """Write code that conducts num_samples simulations to estimate 
    the probability that when rolling two dice, the sum of the throws equals 4 
    given that at least one of the throws is a 3.  Return the estimated probability."""
    
    ...
    
    
    
    
    return ...
                            
    
cond_prob = simulateA_given_B(1000)
    
print(cond_prob)                            
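
A sketch of one way to estimate $P(A \mid B)$, following the definition of conditional probability: count the trials where both $A$ and $B$ occur and divide by the number of trials where $B$ occurs. The name `simulate_a_given_b_sketch`, the extra `rng` argument, the seed, and the sample size are assumptions for illustration.

```python
import numpy as np

def simulate_a_given_b_sketch(num_samples, rng):
    """Estimate P(sum is 4 | at least one 3) from num_samples pairs of rolls."""
    die = np.arange(1, 7)
    first = rng.choice(die, size=num_samples)
    second = rng.choice(die, size=num_samples)

    # B: at least one of the two throws is a 3
    b = np.logical_or(first == 3, second == 3)
    # A and B: the sum is 4 and at least one throw is a 3
    a_and_b = np.logical_and(first + second == 4, b)

    # P(A | B) is estimated by count(A and B) / count(B)
    return a_and_b.sum() / b.sum()

sketch_rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily
cond_prob = simulate_a_given_b_sketch(200000, sketch_rng)
print(cond_prob)
```

A larger sample size is used here than in the cell above because only the trials in which $B$ occurs (roughly 11 out of every 36) contribute to the estimate.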
    