## Learning Objectives

We will learn how to build a random variable from a real-world situation. We will also learn how to use random variables in simulation (more on this in Foundations 4).

## Step away from the Whiteboard

This part is much easier now that we have done the hard foundational work. I want to take a short lesson to show you how to make a random variable, so let's get started.

To start, let's make a random variable that represents drawing a card from a deck, say a queen. The situation: we shuffle a deck of 52 cards, draw one off the top, and are happy if it is a queen. So let's build it:

In [2]:
import numpy as np

def queens():
    # number the cards 1 to 52 and treat cards 40 through 43 as the four queens
    # so first let's simulate drawing a card from the deck
    # (randint's upper bound is exclusive, so this gives 1 to 52):
    card_number = np.random.randint(1,53)
    
    # and if the card number is one of the queens' numbers
    # we will return: 'Got a Queen!'
    # otherwise we will return: 'Not a Queen!'
    if card_number >= 40 and card_number < 44:
        return 'Got a Queen!'
    else:
        return 'Not a Queen!'
    
    # Note: this is not yet a random variable because it returns a string, not a number

Done! We can sample from the above by calling it:

In [3]:
queens()

'Not a Queen!'

In [4]:
queens()

'Not a Queen!'

But is this a random variable? Not quite. A random variable must return a number, so one easy fix is to return 1 for a queen and 0 otherwise:

In [5]:
import numpy as np

def queens_rv():
    card_number = np.random.randint(1,53)
    if card_number >= 40 and card_number < 44:
        return 1
    else:
        return 0

In [6]:
[queens_rv() for _ in range(12)]

# Draw 12 cards (each from a freshly shuffled deck); how many of these are queens?

[0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0]

In [15]:
np.mean([queens_rv() for _ in range(9999)])

0.07910791079107911
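
As a sanity check, the long-run average of a 0/1 random variable like this one should approach the probability of success, here 4/52 (roughly 0.077). The sketch below assumes a seeded generator from `np.random.default_rng`, which the cells above do not use, purely so the run is repeatable; the name `queens_rv_seeded` is hypothetical.

```python
import numpy as np

# Assumption: a seeded generator so that the result is repeatable.
rng = np.random.default_rng(0)

def queens_rv_seeded():
    # Cards are numbered 1 to 52; cards 40 through 43 stand for the four queens.
    card_number = rng.integers(1, 53)  # upper bound is exclusive
    return 1 if 40 <= card_number < 44 else 0

n_draws = 100_000
sample_mean = np.mean([queens_rv_seeded() for _ in range(n_draws)])
theoretical = 4 / 52

print(f"sample mean: {sample_mean:.4f}")
print(f"theoretical: {theoretical:.4f}")
```

The two numbers will rarely match exactly, but the gap should shrink as `n_draws` grows.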

## Not so useful...

Well, the above is not super useful, right? Perhaps if you are playing a game and need to repeat this kind of action a thousand times it might be, but outside of games making a r.v. seems pretty useless, right?

Well, let's try a more useful example.

Let's say you work on a sales team. Each year your team reaches out to 500 customers, and you make a sale to a given customer 15% of the time. One more thing: you have two products, an expensive one that costs \$1,000K and a cheap one that costs \$5K, and 1% of your sales are expensive and the rest cheap. Your boss then asks you to project your earnings for next year. What do you do?

Well, this time we will cover the hard part: transforming your situation into a random variable. That way you can take some samples and get a good sense of what your sales might be. So let's make this a r.v.

In [16]:
def sales_rv():
    sales = 0 # running total, in thousands of dollars
    # 500 customers
    for _ in range(500):
        # 15% chance of success
        if np.random.rand() < .15: # this is a rate that has been historically observed
            # 1% chance of a big sale
            if np.random.rand() < .01: # this is a rate that has been historically observed
                sales += 1000
            else:
                sales += 5
    return sales

In [19]:
sales_rv()

1420

In [18]:
sales_rv()

310

## What is missing

Well, you may now be thinking: this is great! I can totally do sales forecasting now, but which sample do I use? If only there were other ways to extract information from my random variable. I don't get to see the distribution, or ask what the mean is, or do anything like what we did with the previous data. If only we could...

Well all I'll say is wait till next time.

Hint: gather a sample and call .describe() on it.
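
One way to gather `.describe()` on a sample, sketched under a few assumptions: draw many yearly totals, put them in a pandas `Series` (pandas is not imported anywhere in the cells above), and summarize. For speed the sketch swaps the customer-by-customer loop for two binomial draws, which give the same distribution of yearly sales. The name `sales_rv_fast` and the seed are hypothetical choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumption: seeded for repeatability

def sales_rv_fast():
    # Number of sales out of 500 customers at a 15% conversion rate.
    n_sales = rng.binomial(500, 0.15)
    # Each of those sales is the expensive product with probability 1%.
    n_big = rng.binomial(n_sales, 0.01)
    # Amounts are in thousands of dollars: 1000 per big sale, 5 per small one.
    return 1000 * n_big + 5 * (n_sales - n_big)

samples = pd.Series([sales_rv_fast() for _ in range(10_000)])
print(samples.describe())

# Expected yearly sales computed by hand, for comparison with the sample mean.
expected = 500 * 0.15 * (0.01 * 1000 + 0.99 * 5)
print(expected)
```

The hand-computed expectation is about 1121 (in thousands of dollars), so the `mean` row should land near it, while the `std`, `min`, and `max` rows show how much a single year can differ from that average.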


## Comprehension Questions

1.	How do you make a random variable?
    1. Create a function with:
        1. Some Range or starting point
        1. Some probability of arriving within an ideal range
        1. Return a number
        
2.	Why is it useful to make a random variable?
    1. This describes the population and allows us to generate imaginary samples so we can aggregate and draw conclusions
    
3.	Are there any problems with simulating future sales with random variables?
    1. Conversion rates are in flux
    1. Lots of variation is possible in the future
    1. Making big assumptions that sales will happen in a certain way
    
4.	How would you make a random variable that could help you determine how many jobs you should apply for?
    1. Determine the threshold you're willing to accept
    1. Use historical or external data to estimate the probability of success
    
    
5.	How would you use a random variable to determine the amount you should pay for a car?
    1. 

6.	Why are simulations a good/bad tool for decision making?


In [29]:
# job success = .08

def job_apps_rv():
    # 1 if a single application succeeds, 0 otherwise
    if np.random.rand() < .08: # this is a rate that has been historically observed
        return 1
    return 0

In [49]:
job_apps_rv()

0
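
To connect this to the question of how many jobs to apply for, one option is to count applications until the first success. This is only a sketch, assuming each application succeeds independently at the 8% rate used above; the name `apps_until_offer_rv` and the seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: seeded for repeatability

def apps_until_offer_rv(p_success=0.08):
    # Count applications sent until the first one succeeds.
    n_apps = 1
    while rng.random() >= p_success:
        n_apps += 1
    return n_apps

samples = [apps_until_offer_rv() for _ in range(10_000)]
print(np.mean(samples))            # average number of applications needed
print(np.percentile(samples, 90))  # roughly 90% of runs needed at most this many
```

With independent attempts the count follows a geometric distribution, so the average should land near 1 / 0.08 = 12.5 applications. The upper percentile is often the more useful planning number, since it shows how many applications cover most of the unlucky runs.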

In [56]:
def hydra_dead_rv():
    hydra_heads = 3 # start with 3 heads
    if np.random.rand() < .25: # probability head grows back
        if np.random.rand() < .25: # probability 2 heads grow back
            hydra_heads += 2
        elif np.random.rand() < .5:
            hydra_heads += 1
    return hydra_heads # return the count itself; hydra_heads is an int, not a function

In [57]:
hydra_dead_rv()

In [None]:
[hydra_dead_rv() for _ in range(12)]

In [None]:
np.mean([hydra_dead_rv() for _ in range(12)])

In [60]:
def knight_iter():
    # returns 1 if the knight removes every head, 0 if the hydra reaches 5 heads
    heads = 3
    
    while heads < 5:
        if heads <= 0:
            return 1
    
        swing = np.random.randint(4) # one of 0, 1, 2, 3 with equal probability
        # print(str(swing))
    
        if (swing == 0) or (swing==1):
            heads -= 1
        elif swing == 2:
            heads += 1
        else:
            pass # Number of heads stays the same
    return 0
    

In [61]:
# 100,000 runs; 1,000,000,000 would not finish in a reasonable time
outcomes = [knight_iter() for _ in range(100000)]
np.mean(outcomes)