### Simulations
Simulations are a class of computational algorithms that use the relatively simple idea of random sampling to solve increasingly complex problems. Although they have been around for ages, they have gained in popularity recently due to the rise in computational power and have seen applications in multiple domains including Artificial Intelligence, Physics, Computational Biology and Finance just to name a few. Students will use simulations to generate and analyze data over different probability distributions using the important NumPy package. This course will give students hands-on experience with simulations using simple, real-world applications.



### Introduction to random variables

A random variable is a quantity that can take on multiple values based on random chance. When the variable can take any value within a continuous range, it's called a continuous random variable. Think about the height of a person. Although heights fall within some reasonable limits, the actual value could be any of the infinitely many possibilities in that interval. That is why we call it a continuous random variable.

Similarly, if the variable can only take a finite (or countable) set of distinct values, it is called a discrete random variable. The roll of a six-sided die can have only one of six possible outcomes and is thus considered a discrete random variable. Next, let's look at probability distributions.
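As a minimal sketch of the two kinds of random variable, the cell below draws one value of each. The normal distribution for height, and its mean of 170 cm and standard deviation of 10 cm, are illustrative assumptions rather than values from the course.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the sketch is reproducible

# Discrete: a fair six-sided die can only land on one of six values
die_roll = np.random.choice([1, 2, 3, 4, 5, 6])

# Continuous: a height in cm, modelled here (as an assumption) by a normal
# distribution with mean 170 and standard deviation 10
height_cm = np.random.normal(loc=170, scale=10)

print("Die roll:", die_roll)
print("Height (cm):", height_cm)
```

The die roll is always one of six integers, while the height can be any real number near 170.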

### Probability distributions

A probability distribution is a mapping from the set of possible outcomes of a random variable to the probability of observing that outcome. It tells you how likely you are to observe a given outcome or a set of outcomes. Just like random variables, probability distributions are either discrete or continuous depending on the type of random variable they represent. For continuous random variables, the distribution is represented by a probability density function and probability is typically defined over an interval. The normal distribution is an example of a continuous distribution.

For discrete random variables, the distribution is represented by a probability mass function and probability can be defined at a single point or over an interval. Among discrete distributions, binomial and Poisson distributions are widely used. Python's numpy random module is a robust and flexible tool that lets us work with random variables.


In [1]:
import numpy as np

In [8]:
np.random.choice([1, 2, 3, 4, 5])

5
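To make the difference between a probability mass function and a probability density function more concrete, here is a small sketch. The choice of a binomial distribution with 10 fair coin flips and of a standard normal distribution is an assumption made purely for illustration.

```python
import math
import numpy as np

np.random.seed(0)

# Discrete case: binomial PMF for n = 10 fair coin flips.
# The PMF assigns a probability to every single outcome k.
n, p = 10, 0.5
pmf = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
print("P(X = 5) =", pmf[5])
print("Sum of all PMF values =", sum(pmf))

# Continuous case: standard normal distribution.
# Probability is defined over an interval, estimated here by sampling.
samples = np.random.normal(loc=0, scale=1, size=100000)
prob_interval = np.mean((samples > -1) & (samples < 1))
print("Estimated P(-1 < X < 1) =", prob_interval)
```

The PMF values sum to 1, and the interval estimate should land near 0.68, the familiar share of a normal distribution within one standard deviation of its mean.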

### Poisson distribution
Using np.random.poisson(), draw samples from a Poisson distribution with mean lam (lambda) and sample size size_1.
Repeat the above step, but this time use size_2.
For each of the above samples, calculate the absolute difference between their mean and lambda using np.mean() and abs().

In [10]:
# Initialize seed and parameters
np.random.seed(123) 
lam, size_1, size_2 = 5, 3, 1000  

# Draw samples & calculate absolute difference between lambda and sample mean
samples_1 = np.random.poisson(lam, size_1)
samples_2 = np.random.poisson(lam, size_2)
answer_1 = abs(samples_1.mean() - lam)
answer_2 = abs(samples_2.mean() - lam) 

print("|Lambda - sample mean| with {} samples is {} and with {} samples is {}. ".format(size_1, answer_1, size_2, answer_2))

|Lambda - sample mean| with 3 samples is 0.33333333333333304 and with 1000 samples is 0.07699999999999996. 
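The same comparison can be repeated over a wider range of sample sizes. The sketch below assumes the same lam = 5 and an illustrative list of sizes (the `sizes` list is not part of the exercise). The gap tends to shrink as the sample grows, although it need not shrink at every single step.

```python
import numpy as np

np.random.seed(123)
lam = 5
sizes = [10, 100, 1000, 10000, 100000]  # illustrative sample sizes

errors = []
for size in sizes:
    samples = np.random.poisson(lam, size)
    # Absolute difference between the sample mean and lambda
    errors.append(abs(samples.mean() - lam))
    print("size = {:>6}: |lambda - sample mean| = {:.4f}".format(size, errors[-1]))
```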


In [11]:
deck_of_cards = [('Diamond', 9), ('Spade', 9), ('Spade', 4), ('Club', 11), ('Club', 5), ('Heart', 11), ('Club', 8), ('Club', 0), 
                 ('Diamond', 4), ('Heart', 0), ('Heart', 5), ('Heart', 10), ('Spade', 10), ('Heart', 8), ('Heart', 12), ('Spade', 5),
                 ('Spade', 11), ('Heart', 1), ('Heart', 6), ('Spade', 3), ('Diamond', 1), ('Spade', 0), ('Diamond', 5), ('Club', 2), 
                 ('Spade', 1), ('Diamond', 10), ('Heart', 7), ('Club', 7), ('Diamond', 11), ('Heart', 3), ('Club', 10), ('Diamond', 7), 
                 ('Heart', 4), ('Club', 3), ('Diamond', 8), ('Club', 1), ('Diamond', 12), ('Club', 12), ('Diamond', 0), ('Diamond', 2), 
                 ('Heart', 9), ('Spade', 6), ('Spade', 7), ('Club', 9), ('Diamond', 3), ('Club', 6), ('Club', 4), ('Spade', 12), 
                 ('Spade', 8), ('Spade', 2), ('Heart', 2), ('Diamond', 6)]


In [12]:
# Shuffle the deck
np.random.shuffle(deck_of_cards) 

# Print out the top three cards
card_choices_after_shuffle = deck_of_cards[0:3]
print(card_choices_after_shuffle)

[('Club', 12), ('Heart', 5), ('Heart', 9)]


Simulation is a framework that allows us to model real-world systems and processes. It's a very popular tool that has been applied in multiple domains. Simulations are typically characterized by repeated random sampling, which means that we use the power of random variables to generate multiple outcomes.

Think of simulations as tossing a coin again and again to record the outcomes. Simulations typically give us an approximate solution. After recording the result of the coin tosses, you may find that after 100 tosses, you actually observe only 48 heads as opposed to 50. This is still a good enough approximation.

Finally, as we'll see later in the course, using very simple modeling techniques, we can use simulations to solve pretty complex problems. Simulations perform particularly well in some areas where traditional methods don't give us a clean solution.
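The coin-toss example above can be sketched in a few lines. The seed and the labels "H" and "T" are arbitrary choices made for this illustration.

```python
import numpy as np

np.random.seed(42)

num_tosses = 100
# Each toss is heads ("H") or tails ("T") with equal probability
tosses = np.random.choice(["H", "T"], size=num_tosses)
num_heads = np.sum(tosses == "H")

print("Heads in {} tosses: {}".format(num_tosses, num_heads))
print("Estimated P(heads): {}".format(num_heads / num_tosses))
```

The count of heads will usually be close to 50 but rarely exactly 50, which is what we mean by an approximate solution.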

### Simulation steps
Simulations typically involve the following steps:

1) Define the set of outcomes associated with a random variable.
2) Assign a probability to each of these outcomes (the probability distribution).
3) Define the relationship between multiple random variables.
4) Draw samples from the probability distributions.
5) Analyze the sample outcomes.

Steps 1 through 3 essentially describe our statistical model. This might seem daunting at first, but we'll work through an example in the exercises to make it clearer.

### Simulating the dice game
In the following exercises, you will be running your first simulation: a simple dice game. The dice game involves throwing two dice and winning if they show the same number. Thus, seeing 1 & 4 is a loss while 3 & 3 is a win. Let's see what we need for this simulation. For steps 1 and 2, we first define the outcomes of each die and assign a probability to each outcome. Since both die A and die B are fair, we can use identical probability distributions. Also, since the probability of seeing each outcome is the same, this is a uniform distribution.

<img src = simgame.png width ='800' height = '600'>

In step 3, we define the relationship between each of the dice. If they show the same number, we win, otherwise we lose. Steps 1, 2, and 3 give us the statistical model underlying the simulation. For any complex simulation, this is essentially the foundation - describing the model.

Finally in step 4, we generate multiple outcomes through repeated random sampling. Keep in mind that there is an additional step 5 where we'll need to analyze the outcomes but for now, let's focus on the first 4 steps and make sure we are comfortable with them.

In [13]:
die1 = np.random.choice([1, 2, 3, 4, 5, 6])
die2 = np.random.choice([1, 2, 3, 4, 5, 6])

In [46]:
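sim_range, counter = 100000, 0
# Play exactly sim_range games so that counter / sim_range is the win rate
for i in range(sim_range):
    die1 = np.random.choice([1, 2, 3, 4, 5, 6])
    die2 = np.random.choice([1, 2, 3, 4, 5, 6])
    # A game is won when both dice show the same number
    if die1 == die2:
        counter += 1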

In [47]:
counter/sim_range

0.16505

In [48]:
### The estimated probability of winning is about 16.5%, close to the exact value of 1/6

### Another implementation 

In [52]:
# Initialize model parameters & simulate dice throw
die, probabilities, num_dice = [1,2,3,4,5,6], [1/6, 1/6, 1/6, 1/6, 1/6, 1/6], 2
sims, wins = 100000, 0

for i in range(sims):
    outcomes = np.random.choice(die, size=num_dice, p=probabilities) 
    # Increment `wins` by 1 if the dice show same number
    if outcomes[0] == outcomes[1]:
        wins = wins + 1

print("In {} games, you win {} times".format(sims, wins))

In 100000 games, you win 16673 times


In [53]:
outcomes = np.random.choice(die, size=num_dice, p=probabilities)
outcomes

array([2, 1])
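As a possible third variant, the loop can be replaced by a single vectorized draw. This is a sketch rather than part of the original exercise. It assumes fair dice (so `p` is omitted and `np.random.choice` samples uniformly) and draws all games at once as a 2-D array.

```python
import numpy as np

np.random.seed(0)

die = [1, 2, 3, 4, 5, 6]
sims, num_dice = 100000, 2

# One row per game, one column per die
throws = np.random.choice(die, size=(sims, num_dice))

# A game is won when both dice in a row show the same number
wins = np.sum(throws[:, 0] == throws[:, 1])
win_rate = wins / sims

print("Estimated P(win) = {:.4f}, exact = {:.4f}".format(win_rate, 1 / 6))
```

Drawing all throws in one call avoids the Python-level loop, which usually makes large simulations noticeably faster.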

### Generalized version of a simple simulation workflow. 
- Here is a more generalized version of the simple simulation workflow you used in the previous exercise :
  
   -  Steps 1 and 2 involve defining the outcomes for random variables and assigning probabilities. We could also have constants   entering the model in this step.
   -  Step 3 involves defining the relationship between model parameters.
   -  In step 4, we repeatedly sample from the distributions of A & B to generate outcomes.
   -  Finally, in step 5, we analyze the outcomes.


- Simulations let us ask nuanced questions of the model, questions which might not necessarily have easy analytical solutions. This is
because we can easily modify model inputs and see how the final outcomes are affected. For example, we might want to ask how the 
outcome will change if the probability distribution of B changes. In this case, we would just simulate a new set of outcomes after 
changing B. Let's see what happens if we do that.


- We might find that as a result of the change in B, the distribution of outcomes has changed significantly. This might help us decide, 
for instance, that we should not have a particular stock in our portfolio. It helps us see how sensitive our model is to changes in B.


- Another case where we use simulations is when we might want to see what values of a particular input get us the desired output. For 
instance, we might want to find the lowest value of the constant C for which the mean of the outcome distribution is 5.


- In this case, all we need to do is just iteratively keep changing the value of the constant C and record the outcomes. We will do this 
many times and keep recording the outcomes.


- Finally, we can choose the value of C where the mean of the outcome distribution is 5. Using simulation for making decisions is as 
easy as that! Such a simulation could, for example, help us decide the right price at which it is 
still profitable to invest in a loan.
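The sensitivity check described above can be sketched with a toy model. Everything specific here is a hypothetical choice for illustration: the outcome is taken to be the sum A + B of two dice, and the "changed" distribution of B is a made-up loaded die.

```python
import numpy as np

np.random.seed(0)

die = [1, 2, 3, 4, 5, 6]
sims = 100000

def simulate_mean_outcome(probs_b):
    """Mean of A + B when die A is fair and die B follows probs_b."""
    a = np.random.choice(die, size=sims)
    b = np.random.choice(die, size=sims, p=probs_b)
    return np.mean(a + b)

fair_b = [1/6, 1/6, 1/6, 1/6, 1/6, 1/6]
loaded_b = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]  # hypothetical loaded die

mean_fair = simulate_mean_outcome(fair_b)
mean_loaded = simulate_mean_outcome(loaded_b)

print("Mean outcome with fair B:   {:.3f}".format(mean_fair))
print("Mean outcome with loaded B: {:.3f}".format(mean_loaded))
```

Only the distribution of B changes between the two runs, so the difference in the mean outcome shows how sensitive this toy model is to B.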

### Using simulation for decision making

<img src = 'simulations.png' width = "900" height = "700">
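A minimal sketch of searching for the constant C might look like the following. The model (outcome = A + B + C with two fair dice), the grid of candidate values, and the rule of picking the candidate whose mean is closest to the target are all assumptions made for this example.

```python
import numpy as np

np.random.seed(0)

die = [1, 2, 3, 4, 5, 6]
sims = 20000
target_mean = 5
candidate_cs = np.arange(-5, 5.5, 0.5)  # hypothetical grid of values for C

# Simulate the outcome distribution once per candidate value of C
mean_outcomes = {}
for c in candidate_cs:
    a = np.random.choice(die, size=sims)
    b = np.random.choice(die, size=sims)
    outcomes = a + b + c
    mean_outcomes[float(c)] = outcomes.mean()

# Pick the candidate whose simulated mean is closest to the target
best_c = min(mean_outcomes, key=lambda c: abs(mean_outcomes[c] - target_mean))
print("Chosen C = {}, simulated mean = {:.3f}".format(best_c, mean_outcomes[best_c]))
```

Since the sum of two fair dice averages 7, the search settles on C = -2 in this toy model. In a real problem the relationship between C and the outcome is rarely this transparent, which is where simulation earns its keep.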

### Simulating one lottery drawing
In the last three exercises of this chapter, we will be bringing together everything you've learned so far. We will run a complete simulation, take a decision based on our observed outcomes, and learn to modify inputs to the simulation model.

We will use simulations to figure out whether or not we want to buy a lottery ticket. Suppose you have the opportunity to buy a lottery ticket which gives you a shot at a grand prize of $10,000. 

Since there are 1000 tickets in total, your probability of winning is 1 in 1000. 
Each ticket costs $10. Let's use our understanding of basic simulations to first simulate one drawing of the lottery.
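One way to sketch a single drawing is shown below. The variable names are hypothetical, and the payoff is assumed to be net of the ticket cost: you either lose the $10 you paid or win $10,000 minus that $10.

```python
import numpy as np

np.random.seed(123)

ticket_cost, grand_prize, num_tickets = 10, 10000, 1000
chance_of_winning = 1 / num_tickets

# Net payoffs (assumption): lose the ticket cost, or win the prize minus the cost
payoffs = [-ticket_cost, grand_prize - ticket_cost]
probs = [1 - chance_of_winning, chance_of_winning]

# Simulate a single drawing of the lottery
outcome = np.random.choice(payoffs, size=1, p=probs)[0]
print("Outcome of one drawing:", outcome)

# Analytical expected payoff per ticket, for comparison with later simulations
expected_payoff = grand_prize * chance_of_winning - ticket_cost
print("Expected payoff per ticket:", expected_payoff)
```

Under these numbers the expected payoff per ticket is zero, so a single drawing (almost always a $10 loss) says very little on its own. That is the motivation for repeating the drawing many times.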