# Lesson 2.01 Probability

## Probability Problems

We often interpret probability as a long-run frequency.
- If I run an experiment over and over again and an event (call it $A$) occurs frequently, we might say that $P(A)$ is high.
- If I run an experiment over and over again and $A$ occurs infrequently, we might say that $P(A)$ is low.

We can make this idea a bit more formal by assuming we can repeat an experiment a theoretically infinite number of times. Written out mathematically, this is:

$$
P(A) = \lim_{n \rightarrow \infty} \frac{\text{number of times } A \text{ occurs}}{n}
$$

If you're not familiar with limits, that's okay!
- The idea is that we can't actually run the experiment an infinite number of times, but we can run it 1,000 times, then 1,000,000 times, then 1,000,000,000 times. As $n$ grows, the fraction of experiments in which $A$ occurs settles closer and closer to a single value, and that value is $P(A)$.
- Limits are fundamental to *how* much of machine learning and statistics works, but we can almost always do our work without getting into the weeds.

In many cases, we can find probabilities exactly by hand, but that quickly gets complicated. Instead, let's *estimate* $P(A)$ by using Python to run a large number of experiments and counting how often $A$ occurs.

For example, if I roll one die and my event $A$ is rolling a 6, I can use Python to "roll the die" many times and divide the number of 6s by the total number of rolls.

Mathematically, we are estimating the probability of $A$ as:

$$
P(A) \approx \frac{\text{number of times } A \text{ occurs}}{n}
$$

If we "run our experiment" for a large number of trials $n$, our estimated probability should be close to the true probability, and it generally gets closer as $n$ increases.
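As a sketch of the die example above, where $A$ is rolling a 6, the snippet below rolls `n` dice at once and reports the fraction of sixes for a few increasing values of `n`; the estimates should drift toward $1/6 \approx 0.1667$. It assumes NumPy's newer `Generator` API (`np.random.default_rng`) rather than the `np.random.seed`/`np.random.choice` functions used in the rest of this lesson, and the helper name `estimate_six` is hypothetical.

```python
import numpy as np

def estimate_six(n, rng):
    """Estimate P(rolling a 6) from n simulated rolls of one fair die."""
    rolls = rng.integers(1, 7, size=n)   # n integers from 1 to 6 (the upper bound 7 is excluded)
    return np.mean(rolls == 6)           # fraction of rolls that came up 6

rng = np.random.default_rng(42)          # seeded so the run is reproducible
for n in [100, 10_000, 1_000_000]:
    print(n, estimate_six(n, rng))
```

Small `n` can land noticeably far from $1/6$; the larger runs should sit much closer to it.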

```python
import numpy as np
```

### Problem 1: Suppose I roll one die. What is the probability of rolling an odd number?

In this case, I want to estimate $P(A)$, where $A$ is rolling an odd number.

```python
dice = [1, 2, 3, 4, 5, 6]
```

```python
np.random.choice(dice)  # randomly pick one value from dice (an integer from 1 to 6)
```

```
1
```

```python
np.random.randint(1, 7)  # another way to simulate a die roll (the upper bound 7 is excluded)
```

```
6
```

```python
np.random.seed(42)  # set a seed so we can reproduce our results!
```

```python
np.random.choice(dice)  # randomly pick one value from dice (an integer from 1 to 6)
```

```
4
```
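To see what a seed buys us, here is a small sketch of the same idea using two independent generator objects instead of NumPy's global state (so it does not disturb the seeded results in this notebook). It assumes NumPy's `np.random.default_rng` API; the variable names are illustrative. Two generators created from the same seed produce identical rolls.

```python
import numpy as np

dice = [1, 2, 3, 4, 5, 6]

# Two independent generators created from the same seed.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

rolls_a = rng_a.choice(dice, size=5)   # five rolls from the first generator
rolls_b = rng_b.choice(dice, size=5)   # five rolls from the second generator

print(np.array_equal(rolls_a, rolls_b))  # True: same seed, same sequence of rolls
```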

```python
n = 10_000                               # number of times we'll roll the die
count = 0                                # where we'll store our count
for _ in range(n):                       # run our experiment (roll one die) n times
    if np.random.choice(dice) % 2 != 0:  # if the value is not divisible by 2 (is odd)
        count += 1                       # then add 1 to our count

print(count / n)                         # print the number of times A occurs divided by n
```

```
0.498
```

```python
def odd_roll(n):                            # define a function with one argument, n
    count = 0                               # where we'll store our count
    for _ in range(n):                      # run our experiment n times
        if np.random.choice(dice) % 2 != 0: # if the value is not divisible by 2 (is odd)
            count += 1                      # then add 1 to our count
    return count / n                        # return the number of times A occurs divided by n
```

```python
odd_roll(10_000)  # run our experiment 10,000 times
```

```
0.4981
```

```python
odd_roll(100_000)  # run our experiment 100,000 times
```

```
0.49956
```

```python
odd_roll(1_000_000)  # run our experiment 1,000,000 times
```

```
0.50076
```
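Exactly 3 of the 6 faces (1, 3, and 5) are odd, so the true probability is $3/6 = 0.5$, which is the value the estimates above approach. Calling `np.random.choice` once per trial inside a Python loop gets slow for millions of trials. A vectorized sketch rolls all `n` dice in one call; the name `odd_roll_vectorized` is hypothetical, and it assumes NumPy's `Generator` API instead of the global seed.

```python
import numpy as np

def odd_roll_vectorized(n, rng):
    """Estimate P(odd) for one fair die using n rolls generated in a single call."""
    rolls = rng.integers(1, 7, size=n)   # n rolls, each an integer from 1 to 6
    return np.mean(rolls % 2 != 0)       # fraction of rolls that are odd

rng = np.random.default_rng(42)
print(odd_roll_vectorized(1_000_000, rng))
```

The comparison `rolls % 2 != 0` produces an array of `True`/`False` values, and taking its mean gives the fraction of `True` values, which plays the role of `count / n` in the loop version.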

### Problem 2: Suppose I roll two dice. What is the probability that their sum is an odd number?

```python
def odd_two_rolls(n):
    # Where we'll store our count.
    count = 0

    # Run the experiment n times.
    for _ in range(n):
        # Roll two dice; see if the sum is odd.
        if (np.random.choice(dice) + np.random.choice(dice)) % 2 != 0:
            # If the sum is odd, add one to count.
            count += 1

    # Return the number of times A occurs divided by n.
    return count / n
```

```python
odd_two_rolls(10_000)  # run our experiment 10,000 times
```

```
0.4894
```
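Because two dice have only $6 \times 6 = 36$ equally likely outcomes, we can check the simulation against the exact answer by listing every outcome. This sketch uses the standard library's `itertools.product` and `fractions.Fraction`; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

dice = [1, 2, 3, 4, 5, 6]

outcomes = list(product(dice, repeat=2))                 # all 36 (first, second) pairs
odd_sums = [pair for pair in outcomes if sum(pair) % 2 != 0]

exact = Fraction(len(odd_sums), len(outcomes))
print(len(odd_sums), len(outcomes), exact)               # 18 36 1/2
```

A sum is odd exactly when one die is odd and the other is even, which happens in $3 \cdot 3 + 3 \cdot 3 = 18$ of the 36 outcomes. The estimate of 0.4894 from 10,000 trials is close to $1/2$, and running more trials should bring it closer.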

### Problem 3: There are 12 red and 12 black balls. If you draw one ball, then a second ball without replacing the first, what is the probability that they are the same color?

```python
# Set up a bag of 12 red balls and 12 black balls.
bag_of_balls = ['red'] * 12 + ['black'] * 12
```

```python
def same_color(n):
    # Set up a counter to see how many successes we get.
    count = 0

    # Run the experiment n times.
    for _ in range(n):
        # Pull two balls from the bag *without* replacement.
        draws = np.random.choice(bag_of_balls, size=2, replace=False)

        # Check whether the two chosen balls are the same color.
        if draws[0] == draws[1]:
            count += 1

    # Estimate the probability.
    return count / n
```

```python
same_color(10_000)
```

```
0.4795
```
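We can check this one by hand too. The first ball can be either color; the second must match it, and only 11 of the remaining 23 balls do, so $P(\text{same color}) = 11/23 \approx 0.478$. The sketch below confirms that count by enumerating every ordered pair of two different balls with the standard library's `itertools.permutations`, and, for comparison, also counts pairs drawn *with* replacement (via `itertools.product`), where the answer rises to $1/2$.

```python
from fractions import Fraction
from itertools import permutations, product

bag_of_balls = ['red'] * 12 + ['black'] * 12

# Without replacement: ordered pairs of two *different* balls (24 * 23 = 552 pairs).
without = list(permutations(bag_of_balls, 2))
p_without = Fraction(sum(a == b for a, b in without), len(without))

# With replacement: the same ball may be drawn twice (24 * 24 = 576 pairs).
with_repl = list(product(bag_of_balls, repeat=2))
p_with = Fraction(sum(a == b for a, b in with_repl), len(with_repl))

print(p_without, float(p_without))   # 11/23 and its decimal value
print(p_with)                        # 1/2
```

`permutations` treats the 24 balls as distinct positions even though many share a color, which is what we want: each physical ball is a separate possible draw.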

### Problem 4: Suppose you roll three dice. What is the probability that the three dice are rolled in strictly increasing order (each die higher than the one before)?

```python
# What is dice again?
dice
```

```
[1, 2, 3, 4, 5, 6]
```

```python
def three_dice(n):
    # Set up a counter to see how many successes we get.
    count = 0

    # Run the experiment n times.
    for _ in range(n):
        # Roll three dice.
        roll_1 = np.random.choice(dice)
        roll_2 = np.random.choice(dice)
        roll_3 = np.random.choice(dice)

        # Check whether the rolls are in strictly increasing order.
        if roll_1 < roll_2 < roll_3:
            count += 1

    # Return the estimated probability.
    return count / n
```

```python
three_dice(1_000_000)
```

```
0.092256
```
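For a strictly increasing roll, the three dice must show three different values, and each set of three distinct values can appear in increasing order in exactly one way. So the number of successes is $\binom{6}{3} = 20$ out of $6^3 = 216$ equally likely outcomes, giving $20/216 = 5/54 \approx 0.0926$, in line with the simulated 0.092256. A brute-force sketch using the standard library (`itertools.product`, `math.comb`, `fractions.Fraction`) confirms the count:

```python
from fractions import Fraction
from itertools import product
from math import comb

dice = [1, 2, 3, 4, 5, 6]

# Count outcomes where each die is strictly larger than the one before.
increasing = sum(1 for r1, r2, r3 in product(dice, repeat=3) if r1 < r2 < r3)
total = len(dice) ** 3

print(increasing, total, Fraction(increasing, total))   # 20 216 5/54
print(increasing == comb(6, 3))                        # True: one increasing order per 3-value subset
```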