In [5]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Topics Covered 

* Basics of randomness and simulation
* Simulation and probability
* Bootstrapping and resampling methods
* Advanced applications of simulation


# Random variables
* Continuous random variables
    * infinitely many possible values (any value within a range)
    * e.g. height, weight
* Discrete random variables
    * finite or countably infinite set of possible values
    * e.g. outcome of a six-sided die, number of emails received in an hour
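
To make the distinction concrete, here is a minimal sketch that draws a few values of each kind. It uses NumPy's `default_rng` Generator rather than the legacy `np.random.*` functions used elsewhere in this notebook; the height parameters (mean 170 cm, standard deviation 10 cm) are illustrative assumptions, not real data.

```python
import numpy as np

# Seeded Generator so the draws are reproducible
rng = np.random.default_rng(seed=0)

# Discrete: a six-sided die can only show one of six values.
# integers() excludes `high`, so high=7 yields values 1..6.
die_rolls = rng.integers(low=1, high=7, size=10)

# Continuous: hypothetical heights drawn from a normal distribution
# (mean 170 cm and standard deviation 10 cm are illustrative assumptions)
heights_cm = rng.normal(loc=170, scale=10, size=10)

print("die rolls:   ", die_rolls)
print("heights (cm):", np.round(heights_cm, 1))
```

Every die roll is one of the six integers 1 to 6, while the heights are floats that can take any value in a range (a normal model is only an approximation for real heights).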
    
# Probability distributions
* Continuous probability distributions
    * PDF (probability density function): gives the density of each possible value of a continuous random variable; the probability of an interval is the area under the PDF over that interval, and any single exact value has probability zero
* Discrete probability distributions
    * PMF (probability mass function): maps each possible outcome of a discrete random variable to the probability of observing it; often drawn as a bar plot with one bar per outcome
    * binomial and Poisson distributions are widely used for discrete variables
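
The PMF/PDF distinction can be checked numerically. The sketch below, assuming an arbitrary binomial(n=10, p=0.3) and a standard normal as examples, computes a PMF whose values are probabilities that sum to 1, and a PDF whose area over an interval gives a probability. The normal density is written out by hand so the example only needs NumPy, and the Riemann-sum integration is a simple approximation rather than how a statistics library would compute it.

```python
import math
import numpy as np

# PMF of a binomial(n=10, p=0.3) variable: one probability per outcome k = 0..10
n, p = 10, 0.3
pmf = np.array([math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)])
print("P(X = 3) =", round(pmf[3], 4))
print("PMF sums to", round(pmf.sum(), 6))

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: a relative likelihood, not a probability."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Probability of an interval = area under the PDF, approximated by a Riemann sum
x = np.linspace(-1, 1, 200_001)
dx = x[1] - x[0]
p_within_1sd = np.sum(normal_pdf(x)) * dx
print("P(-1 <= Z <= 1) ~", round(p_within_1sd, 4))
```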
        
# NumPy Random Module
np.random.choice(a, size=None, replace=True, p=None)

Parameters
----------
a : 1-D array-like or int
    If an ndarray, a random sample is generated from its elements.
    If an int, the random sample is generated as if `a` were `np.arange(a)`.

size : int or tuple of ints, optional
    Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
    ``m * n * k`` samples are drawn.  Default is None, in which case a
    single value is returned.

replace : boolean, optional
    Whether the sample is with or without replacement. Default is True,
    meaning a value of `a` can be selected more than once.

p : 1-D array-like, optional
    The probabilities associated with each entry in `a`; they must sum to 1.
    If not given, the sample assumes a uniform distribution over all
    entries in `a`.
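
A short sketch exercising the `replace` and `p` parameters documented above with the legacy `np.random.choice`. The coin labels and the 70/30 weighting are hypothetical values chosen for illustration.

```python
import numpy as np

np.random.seed(42)

# Integer `a`: sample from np.arange(5) without replacement,
# so the three values drawn are guaranteed to be distinct
no_repeats = np.random.choice(5, size=3, replace=False)

# Non-uniform `p`: a hypothetical biased coin that lands heads 70% of the time
flips = np.random.choice(["heads", "tails"], size=100_000, p=[0.7, 0.3])
heads_share = np.mean(flips == "heads")

print("three distinct values:", no_repeats)
print("share of heads:", round(heads_share, 3))

# The probabilities in `p` must sum to 1, otherwise NumPy raises ValueError
try:
    np.random.choice(3, p=[0.5, 0.5, 0.5])
except ValueError as err:
    print("ValueError:", err)
```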


# Poisson distributions
* used for modeling the number of events that occur in a fixed interval of time or space, when events happen independently at a known average rate
* purpose of this exercise:
    * draw samples from a Poisson distribution
    * see how the sample mean changes as you draw more samples
np.random.poisson(lam=1.0, size=None)

Draw samples from a Poisson distribution.

The Poisson distribution is the limit of the binomial distribution
for large N.

Note: new code should use the `poisson` method of a `default_rng()`
instance instead; see the quick-start section of the NumPy random sampling documentation.

Parameters
----------
lam : float or array_like of floats
    Expectation of interval, must be >= 0. A sequence of expectation
    intervals must be broadcastable over the requested size.

size : int or tuple of ints, optional
    Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
    ``m * n * k`` samples are drawn.  If size is ``None`` (default),
    a single value is returned if ``lam`` is a scalar. Otherwise,
    ``np.array(lam).size`` samples are drawn.

Returns
-------
out : ndarray or scalar
    Drawn samples from the parameterized Poisson distribution.

Notes
-----
The probability mass function of the Poisson distribution is

$f(k; \lambda)=\frac{\lambda^k e^{-\lambda}}{k!}$

where $\lambda$ is the expected number of events in the observed interval and $f(k; \lambda)$ is the probability of observing exactly $k$ events in that interval, for $k = 0, 1, 2, \ldots$

Because the output is limited to the range of the C int64 type, a
ValueError is raised when `lam` is within 10 sigma of the maximum
representable value.
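
As a check on the formula in the Notes, this sketch compares $f(k; \lambda)$ with the empirical frequencies of a large simulated sample. It uses a `default_rng()` Generator, as the note recommends for new code; $\lambda = 5$ and the sample size of 100,000 are arbitrary choices.

```python
import math
import numpy as np

rng = np.random.default_rng(seed=123)
lam = 5
samples = rng.poisson(lam, size=100_000)

for k in range(11):
    exact = lam**k * math.exp(-lam) / math.factorial(k)  # f(k; lambda) from the formula
    observed = np.mean(samples == k)                      # share of samples equal to k
    print(f"k={k:2d}  formula={exact:.4f}  simulated={observed:.4f}")
```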

In [7]:
# Initialize seed and parameters
np.random.seed(123) 
lam, size_1, size_2 = 5, 3, 1000  

# Draw samples & calculate absolute difference between lambda and sample mean
samples_1 = np.random.poisson(lam, size_1)
samples_2 = np.random.poisson(lam, size_2)
answer_1 = abs(lam-np.mean(samples_1))
answer_2 = abs(lam-np.mean(samples_2)) 

print("|Lambda - sample mean| with {} samples is {} and with {} samples is {}. ".format(size_1, answer_1, size_2, answer_2))

|Lambda - sample mean| with 3 samples is 0.33333333333333304 and with 1000 samples is 0.07699999999999996. 
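
The cell above compares only two sample sizes. A slightly extended sketch, using a `default_rng` Generator (so the numbers will not match the seeded output above), tracks the gap across several sizes alongside the standard error of the sample mean, $\sqrt{\lambda / n}$. The gap from a single run need not shrink at every step, but it tends to shrink roughly in proportion to $1/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(seed=123)
lam = 5
sizes = [10, 100, 1_000, 10_000, 100_000]

# Absolute gap between lambda and the sample mean for each sample size
gaps = [abs(lam - rng.poisson(lam, size=n).mean()) for n in sizes]

for n, gap in zip(sizes, gaps):
    # Standard error = sqrt(variance / n); a Poisson variable's variance equals lambda
    std_err = np.sqrt(lam / n)
    print(f"n={n:>7,}  |lambda - mean| = {gap:.4f}  (standard error {std_err:.4f})")
```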


In [8]:
# 52 cards as (suit, rank) tuples: Hearts, Clubs, Spades, Diamonds, each with ranks 0 to 12
deck_of_cards = [(suit, rank)
                 for suit in ['Heart', 'Club', 'Spade', 'Diamond']
                 for rank in range(13)]

# Shuffling a deck of cards

Often we are interested in randomizing the order of a set of items. Consider a game of cards where you first shuffle the deck, or a game of Scrabble where the letters are first mixed in a bag. As the final exercise of this section, you will learn another useful function: `np.random.shuffle()`, which randomly shuffles a sequence in place (it modifies the sequence and returns `None`). At the end of this exercise, you will know how to shuffle a deck of cards or any sequence of items.

Examine `deck_of_cards`, defined in the cell above.

In [11]:
# Shuffle the deck
np.random.shuffle(deck_of_cards)

# Print out the top three cards
card_choices_after_shuffle = deck_of_cards[0:3]
print(card_choices_after_shuffle)

[('Spade', 2), ('Diamond', 11), ('Diamond', 6)]
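
With the newer Generator API, one option is to shuffle indices and deal cards by sampling them without replacement. This sketch rebuilds the same deck in a hypothetical local variable `deck` so `deck_of_cards` is left untouched; the seed and the 5-card hand size are arbitrary.

```python
import numpy as np

# Same 52 (suit, rank) tuples as deck_of_cards, in a separate list
deck = [(suit, rank) for suit in ["Heart", "Club", "Spade", "Diamond"] for rank in range(13)]

rng = np.random.default_rng(seed=7)

# Shuffle by permuting indices; `deck` itself is not modified
order = rng.permutation(len(deck))
shuffled_deck = [deck[i] for i in order]

# Deal a 5-card hand: sampling indices without replacement means no card repeats
hand_idx = rng.choice(len(deck), size=5, replace=False)
hand = [deck[i] for i in hand_idx]

print("top three cards:", shuffled_deck[:3])
print("dealt hand:", hand)
```

Working with indices sidesteps a NumPy detail: passing the list of tuples directly to `choice` would make NumPy convert it to a 2-D array first, so you would not get `(suit, rank)` tuples back.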


# Simulation Basics
* Framework for modeling real-world events
    * characterized by repeated random sampling
    * gives us an approximate solution
    * can help solve complex problems

* simulation steps
    1. define the possible outcomes for random variables
    2. assign probabilities, the probability distribution
    3. define the relationship between multiple random variables
    4. draw samples from the probability distributions
    5. analyze the sample outcomes
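
The five steps map directly onto code. This sketch estimates the probability that two fair dice sum to 7, a question chosen for illustration because its exact answer, 6/36, is easy to check; the seed and number of simulations are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=2024)
n_sims = 100_000

# 1. Possible outcomes of each random variable (one die)
faces = np.arange(1, 7)
# 2. Probability distribution: a fair die
probs = np.full(6, 1 / 6)
# 3. Relationship between the random variables: the total is the sum of two dice
# 4. Draw samples from the distributions
die_1 = rng.choice(faces, size=n_sims, p=probs)
die_2 = rng.choice(faces, size=n_sims, p=probs)
total = die_1 + die_2
# 5. Analyze the outcomes: share of simulations where the total is 7
estimate = np.mean(total == 7)

print(f"Estimated P(sum = 7): {estimate:.4f} (exact value 6/36 = {6 / 36:.4f})")
```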


# Throwing a fair die

In [18]:
# Define die outcomes and probabilities
die, probabilities, throws = [1, 2, 3, 4, 5, 6], [1/6] * 6, 1

# Use np.random.choice to throw the die `throws` times and record the outcome
outcome = np.random.choice(die, size=throws, p=probabilities)
print("Outcome of the throw: {}".format(outcome[0]))

Outcome of the throw: 2


# Throwing two fair dice and checking whether they show the same number

In [19]:
# Initialize number of dice, simulate & record outcome
die, probabilities, num_dice = [1, 2, 3, 4, 5, 6], [1/6] * 6, 2
outcomes = np.random.choice(die, size=num_dice, p=probabilities)

# Win if the two dice show the same number
if outcomes[0] == outcomes[1]: 
    answer = 'win' 
else:
    answer = 'lose'

print("The dice show {} and {}. You {}!".format(outcomes[0], outcomes[1], answer))

The dice show 5 and 4. You lose!
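
Because the game is small, its exact win probability can also be worked out by enumeration, which gives a reference value for simulating many games. This sketch assumes both dice are fair, so all 36 ordered pairs are equally likely.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
outcomes = list(product(faces, repeat=2))  # all 36 ordered (die 1, die 2) pairs
doubles = [pair for pair in outcomes if pair[0] == pair[1]]

# With equally likely outcomes, P(win) = favourable outcomes / total outcomes
p_win = Fraction(len(doubles), len(outcomes))
print(f"{len(doubles)} of {len(outcomes)} outcomes are doubles, so P(win) = {p_win} = {float(p_win):.4f}")
```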


# Simulating the dice game

We now know how to implement the first three steps of a simulation. Now let's consider the next step: repeated random sampling.

Simulating an outcome once doesn't tell us much about how often we can expect to see that outcome. In the dice game from the previous exercise, a single game is either a win or a loss. To estimate how many times we can expect to win over many games, we need to repeat the random sampling process many times. Repeating the random sampling process helps us understand and visualize the inherent uncertainty and decide on next steps.

In [25]:
# Initialize model parameters & simulate dice throw
die, probabilities, num_dice = [1, 2, 3, 4, 5, 6], [1/6] * 6, 2
sims, wins = 100, 0

for _ in range(sims):
    outcomes = np.random.choice(die, size=num_dice, p=probabilities)
    # Increment `wins` by 1 if the dice show the same number
    if outcomes[0] == outcomes[1]:
        wins += 1

print("In {} games, you win {} times".format(sims, wins))

In 100 games, you win 19 times
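
The loop above plays one batch of 100 games. A vectorized sketch, assuming NumPy's `default_rng` Generator and its `integers` method in place of `np.random.choice`, plays 1,000 such batches at once to show how much the win count varies from batch to batch and how close the overall win rate gets to the exact value of 1/6. The batch counts and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=99)
n_batches, games_per_batch = 1_000, 100

# Shape (batches, games, 2): two fair dice per game, values 1..6
dice = rng.integers(1, 7, size=(n_batches, games_per_batch, 2))

# A game is won when both dice match; count wins within each batch of 100
wins_per_batch = np.sum(dice[:, :, 0] == dice[:, :, 1], axis=1)
p_hat = wins_per_batch.sum() / (n_batches * games_per_batch)

print("average wins per 100 games:", wins_per_batch.mean())
print("middle 95% of batches:", np.percentile(wins_per_batch, [2.5, 97.5]))
print("estimated P(win):", round(p_hat, 4))
```

With about 16.7 expected wins per 100 games and a batch-to-batch spread of several wins, a single batch showing 19 wins is well within ordinary variation.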


---

# Probability Basics
