## Simulation exercises

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(50)

### 1. How likely is it that you roll doubles when rolling two dice?

In [6]:
n_trials = 10**5
n_rolls = 2

rolls = np.random.choice([1, 2, 3, 4, 5, 6], n_trials * n_rolls).reshape(n_trials, n_rolls)
rolls[0:5]

array([[1, 3],
       [4, 1],
       [6, 1],
       [2, 2],
       [4, 1]])

In [7]:
# Compare the first and second die of every trial directly on the array
(rolls[:, 0] == rolls[:, 1]).mean()

0.16756
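
As a cross-check on the simulated value, the exact probability can be found by listing all 36 equally likely outcomes. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
outcomes = list(product(faces, repeat=2))            # all 36 ordered (die1, die2) pairs
doubles = [pair for pair in outcomes if pair[0] == pair[1]]

p_doubles = Fraction(len(doubles), len(outcomes))
print(p_doubles)  # 1/6
```

The simulated 0.16756 is close to 1/6, which is what we expect from 10**5 trials.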

### 2. If you flip 8 coins, what is the probability of getting exactly 3 heads? What is the probability of getting more than 3 heads?

In [38]:
n_flips = columns = 8
n_trials = rows = 10**5
# One row per trial, one column per coin flip
all_flips = np.random.choice(['Heads', 'Tails'], size=(n_trials, n_flips))
all_flips = pd.DataFrame(all_flips)

In [49]:
all_flips["heads_count"] = (all_flips == "Heads").sum(axis=1)
all_flips.head()

Unnamed: 0,0,1,2,3,4,5,6,7,heads_count
0,Heads,Heads,Tails,Heads,Heads,Heads,Heads,Heads,7
1,Tails,Heads,Heads,Heads,Tails,Heads,Tails,Tails,4
2,Tails,Heads,Heads,Tails,Heads,Tails,Tails,Heads,4
3,Tails,Tails,Heads,Heads,Heads,Heads,Tails,Tails,4
4,Tails,Tails,Tails,Tails,Heads,Tails,Tails,Tails,1


In [50]:
# The probability of getting exactly 3 heads is
(all_flips.heads_count == 3).mean()

0.22054

In [51]:
# The probability of getting more than 3 heads is
(all_flips.heads_count > 3).mean()

0.63452
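
The number of heads in 8 fair flips follows a binomial distribution, so both answers can be checked exactly. A small sketch, assuming a fair coin and independent flips:

```python
from math import comb

n = 8
total = 2 ** n                     # 256 equally likely head/tail sequences

p_exactly_3 = comb(n, 3) / total
p_more_than_3 = sum(comb(n, k) for k in range(4, n + 1)) / total

print(p_exactly_3)    # 0.21875
print(p_more_than_3)  # 0.63671875
```

Both simulated values above are within about 0.002 of these exact results.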

#### 3. There are approximately 3 web development cohorts for every 1 data science cohort at Codeup. Assuming that Codeup randomly selects an alumnus to put on a billboard, what are the odds that the two billboards I drive past both have data science students on them?

### Example with one trial (two picks)

In [62]:
n_picks = columns = 2
n_trials = rows = 1

In [64]:
picks = np.random.choice(["WD", "DS"], p=[.75, .25], size=(n_trials, n_picks))
picks = pd.DataFrame(picks)
picks

Unnamed: 0,0,1
0,WD,WD


In [69]:
picks["DS_picks"] = (picks == "DS").sum(axis=1)
(picks.DS_picks == 2).mean()

0.0624879

### Example with 10,000,000 trials

In [2]:
n_picks = columns = 2
n_trials = rows = 10**7

In [3]:
picks = np.random.choice(["WD", "DS"], p=[.75, .25], size=(n_trials, n_picks))
picks = pd.DataFrame(picks)
picks.head()

Unnamed: 0,0,1
0,WD,WD
1,WD,WD
2,WD,DS
3,WD,DS
4,DS,WD


In [4]:
picks["DS_picks"] = (picks == "DS").sum(axis=1)
(picks.DS_picks == 2).mean()
# The probability of seeing a DS student on both billboards is low (about 6%).
# With 10 million trials the estimate sits very close to the exact value 0.25 ** 2 = 0.0625

0.062521
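
The number of trials does not change the underlying probability, only how precisely the simulation estimates it. The sketch below uses the standard-library `random` module rather than NumPy; the helper name `estimate_both_ds` and the seed are assumptions made for illustration.

```python
import random

p_ds = 0.25                  # 1 data science cohort for every 3 web development cohorts
exact = p_ds ** 2            # both billboards independently show a DS alum

rng = random.Random(50)      # local generator; the seed value is arbitrary

def estimate_both_ds(n_trials):
    """Fraction of simulated drives where both billboards show a DS alum."""
    hits = 0
    for _ in range(n_trials):
        first = rng.random() < p_ds
        second = rng.random() < p_ds
        hits += first and second
    return hits / n_trials

for n_trials in (1, 100, 10**5):
    print(n_trials, estimate_both_ds(n_trials))
print("exact:", exact)
```

A single trial can only return 0.0 or 1.0, so a useful estimate needs many trials.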

#### 4. Codeup students buy, on average, 3 poptart packages with a standard deviation of 1.5 a day from the snack vending machine. If on Monday the machine is restocked with 17 poptart packages, how likely is it that I will be able to buy some poptarts on Friday afternoon? (Remember, if you have a mean and standard deviation, use np.random.normal)

In [37]:
# Likelihood of getting poptarts Friday afternoon
n_rows = 10**5
n_cols = 5
mean = 3
stdev = 1.5
num_pt = np.random.normal(mean, stdev, size=(n_rows, n_cols))
num_pt = pd.DataFrame(num_pt, columns=["Mon", "Tues", "Wed", "Thur", "Fri"])
num_pt.head()

Unnamed: 0,Mon,Tues,Wed,Thur,Fri
0,4.06489,1.18578,2.084999,4.243944,4.030554
1,1.721323,1.749108,2.232749,1.578581,4.308003
2,2.611411,3.228945,2.942351,6.566066,2.739899
3,4.319741,2.353475,3.461649,3.581399,2.380937
4,1.236008,3.526783,4.324256,6.84769,2.720869


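In [38]:
# Round to whole packages. Keep every simulated week; head() is only for the preview
num_pt = num_pt.round()
num_pt.head()

Unnamed: 0,Mon,Tues,Wed,Thur,Fri
0,4.0,1.0,2.0,4.0,4.0
1,2.0,2.0,2.0,2.0,4.0
2,3.0,3.0,3.0,7.0,3.0
3,4.0,2.0,3.0,4.0,2.0
4,1.0,4.0,4.0,7.0,3.0


In [43]:
# The likelihood of getting poptarts Friday afternoon:
# some are left if fewer than 17 packages were bought from Monday through Friday
num_pt['week_pt'] = num_pt.sum(axis='columns')
(num_pt['week_pt'] < 17).mean()
# Evaluated over all 10**5 simulated weeks (not just the 5 previewed rows),
# this comes out at roughly 0.7, so poptarts are more likely than not to be left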
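
A sum of independent normal variables is itself normal, which gives a quick analytic reference point for this exercise. The sketch below assumes five independent days of purchases and ignores the rounding to whole packages, so it is an approximation rather than the simulation's exact target.

```python
from math import sqrt
from statistics import NormalDist

daily = NormalDist(mu=3, sigma=1.5)   # packages bought per day
n_days = 5                            # Monday through Friday

# Means add; variances add, so the standard deviation scales with sqrt(n_days)
weekly = NormalDist(mu=n_days * daily.mean, sigma=sqrt(n_days) * daily.stdev)

# Poptarts remain if fewer than 17 packages were bought during the week
p_some_left = weekly.cdf(17)
print(p_some_left)
```

Under these assumptions the weekly total is roughly Normal(15, 3.35) and the probability is about 0.72. Rounding each day to whole packages makes the simulated figure somewhat lower.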

#### 5. Compare Heights

- Men have an average height of 178 cm and a standard deviation of 8 cm.
- Women have an average height of 170 cm and a standard deviation of 6 cm. Since you have means and standard deviations, you can use np.random.normal to generate observations.
- If a man and woman are chosen at random, what is the likelihood the woman is taller than the man?

In [46]:
m_height = np.random.normal(178, 8, size = 10**5)
f_height = np.random.normal(170, 6, size = 10**5)

In [47]:
# The likelihood that the picked woman is taller than the man
(f_height > m_height).mean()

0.21401
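
The difference of two independent normal heights is also normal, so the simulated value can be compared with a closed-form one. A minimal sketch, assuming the two heights are independent:

```python
from math import sqrt
from statistics import NormalDist

# Difference (woman - man): means subtract, variances add
diff = NormalDist(mu=170 - 178, sigma=sqrt(6**2 + 8**2))

# The woman is taller when the difference is positive
p_woman_taller = 1 - diff.cdf(0)
print(p_woman_taller)
```

The difference has mean -8 cm and standard deviation 10 cm, which gives a probability of about 0.212, in line with the simulation.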

#### 6. When installing anaconda on a student's computer, there's a 1 in 250 chance that the download is corrupted and the installation fails.
- What are the odds that after having 50 students download anaconda, no one has an installation issue? 100 students?

- What is the probability that we observe an installation issue within the first 150 students that download anaconda?

- How likely is it that 450 students all download anaconda without an issue?

In [71]:
n_downloads = columns = 50
n_trials = 10**5
fail_p = fail_probability = 1/250
s_prob = success_probability = 249/250

downloads = np.random.choice([0, 1], p = [fail_p, s_prob], size = (n_trials, n_downloads))
downloads

array([[1, 1, 1, ..., 1, 1, 1],
       [1, 1, 1, ..., 1, 1, 1],
       [1, 1, 1, ..., 1, 1, 1],
       ...,
       [1, 1, 1, ..., 1, 1, 1],
       [1, 1, 1, ..., 1, 1, 1],
       [1, 1, 1, ..., 1, 1, 1]])

In [72]:
# The probability that there is not an issue in 50 downloads
(downloads.sum(axis=1) == 50).mean()

0.81933

In [75]:
n_downloads = columns = 100
n_trials = rows = 10**5
fail_p = fail_probability = 1/250
s_prob = success_probability = 249/250

downloads = np.random.choice([0, 1], p = [fail_p, s_prob], size = (n_trials, n_downloads))
downloads

array([[1, 1, 1, ..., 1, 1, 1],
       [1, 1, 1, ..., 1, 1, 1],
       [1, 1, 1, ..., 1, 1, 1],
       ...,
       [1, 1, 1, ..., 1, 1, 1],
       [1, 1, 1, ..., 1, 1, 1],
       [1, 1, 1, ..., 1, 1, 1]])

In [76]:
# The probability that there is not an issue in 100 downloads
(downloads.sum(axis=1) == 100).mean()

0.67068
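
If each download fails independently with probability 1/250, every part of this exercise has a closed form: all `n` downloads succeed with probability `(249/250) ** n`. The sketch below covers the 150 and 450 student cases as well. The function name is illustrative.

```python
p_ok = 249 / 250   # probability that a single download succeeds

def p_all_succeed(n_students):
    """Probability that n independent downloads all succeed."""
    return p_ok ** n_students

for n in (50, 100, 450):
    print(f"no issue in {n} downloads: {p_all_succeed(n):.4f}")

# At least one issue within the first 150 students
p_issue_150 = 1 - p_all_succeed(150)
print(f"at least one issue in 150 downloads: {p_issue_150:.4f}")
```

The values for 50 and 100 students (about 0.818 and 0.670) agree with the simulations above. For 150 students an issue shows up with probability of about 0.45, and 450 clean downloads in a row happen only about 16% of the time.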

#### 7. There's a 70% chance on any given day that there will be at least one food truck at Travis Park. However, you haven't seen a food truck there in 3 days.
- How unlikely is this?
- How likely is it that a food truck will show up sometime this week?

In [88]:
n_trials = rows = 10**5
n_obs = columns = 3
fail_p = fail_probability = 3/10
s_prob = success_probability = 7/10

# Size the array from this exercise's own variables: one column per observed day
truck_obs = np.random.choice([0, 1], p=[fail_p, s_prob], size=(n_trials, n_obs))
truck_obs = pd.DataFrame(truck_obs)
truck_obs.head()

In [90]:
# Probability of no truck on any of the 3 days
(truck_obs.sum(axis=1) == 0).mean()
# Three truck-free days in a row are unlikely: the exact value is 0.3 ** 3 = 0.027

In [91]:
# Probability that a truck shows up at least once in the rest of the week (4 days left)
n_trials = rows = 10**5
n_obs = columns = 4
fail_p = fail_probability = 3/10
s_prob = success_probability = 7/10

# Size the array from this exercise's own variables: one column per remaining day
truck_obs = np.random.choice([0, 1], p=[fail_p, s_prob], size=(n_trials, n_obs))
truck_obs = pd.DataFrame(truck_obs)
truck_obs.head()

In [94]:
(truck_obs.sum(axis=1) > 0).mean()
# A truck is very likely to show up in the remaining 4 days: the exact value is 1 - 0.3 ** 4 = 0.9919
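
Treating days as independent, the exact answers are powers of 0.3, and they depend strongly on how many days are counted. The sketch below tabulates a few day counts, since "this week" could mean the 4 remaining days or a full 7. The function names are illustrative.

```python
p_truck = 0.7   # chance of at least one food truck on any given day

def p_no_truck(n_days):
    """Probability of seeing no truck on n independent days in a row."""
    return (1 - p_truck) ** n_days

def p_at_least_one(n_days):
    """Probability that a truck shows up on at least one of n days."""
    return 1 - p_no_truck(n_days)

print(f"no truck for 3 days: {p_no_truck(3):.4f}")
for n_days in (4, 5, 7):
    print(f"at least one truck in {n_days} days: {p_at_least_one(n_days):.4f}")
```

Three empty days have probability 0.027, while five empty days would have probability of only about 0.0024, so the number of columns in the simulated array matters.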

#### 8. If 23 people are in the same room, what are the odds that two of them share a birthday?
- What if it's 20 people?
- 40?

In [101]:
n_trials = rows = 10**5
n_obs = columns = 23

birthdays = np.random.choice(range(365), size=(n_trials, n_obs))
birthdays = pd.DataFrame(birthdays)
birthdays["b_unique"] = birthdays.nunique(axis=1)

(birthdays.b_unique != 23).mean()
# The odds that 2 people share a birthday in a room containing 23 people

0.50839

In [102]:
# The odds that 2 people share a birthday in a room containing 20 people
n_trials = rows = 10**5
n_obs = columns = 20

birthdays = np.random.choice(range(365), size=(n_trials, n_obs))
birthdays = pd.DataFrame(birthdays)
birthdays["b_unique"] = birthdays.nunique(axis=1)

(birthdays.b_unique != 20).mean()

0.40984

In [103]:
# The odds that 2 people share a birthday in a room containing 40 people
n_trials = rows = 10**5
n_obs = columns = 40

birthdays = np.random.choice(range(365), size=(n_trials, n_obs))
birthdays = pd.DataFrame(birthdays)
birthdays["b_unique"] = birthdays.nunique(axis=1)

# n_obs must match the number compared against here; with 40 people per room
# the exact probability of a shared birthday is about 0.891
(birthdays.b_unique != 40).mean()
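
The birthday problem also has an exact answer: multiply the chances that each successive person avoids every birthday already taken, then subtract from 1. A minimal sketch, assuming 365 equally likely birthdays and no leap years; the function name is illustrative.

```python
def p_shared_birthday(n_people, n_days=365):
    """Probability that at least two of n_people share a birthday."""
    p_all_distinct = 1.0
    for already_taken in range(n_people):
        # The next person must avoid every birthday taken so far
        p_all_distinct *= (n_days - already_taken) / n_days
    return 1 - p_all_distinct

for n_people in (20, 23, 40):
    print(n_people, round(p_shared_birthday(n_people), 4))
```

This gives about 0.411 for 20 people, 0.507 for 23, and 0.891 for 40, which is a handy sanity check for each simulated room size.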

## Bonus exercises

Let's use what we've learned to play a mage duel!

Imagine your wizard has 6d4 health points and you have spells that do 6d4 damage. "6d4" means rolling six 4-sided dice and summing the result.

Your opposing mage has 4d6 health points and spells that do 4d6. "4d6" means rolling four six-sided dice and summing the result.

### Exercises
Simulate mage duels to answer who is the more powerful mage.

- Before running simulations, do you have a hypothesis of which mage will win? Do you have a hunch? Write it down. This is your first exercise.
- Simulate 10 mage duels. Is there a clear winner? Run that 10 duel simulation again. Was the answer similar?
- Do the results change much at 100 duels?
- Now, simulate 10,000 mage duels. Is there a clear winner?

#### a. I feel like the second mage has more potential to win, but it also feels like they could tie. I don't feel like the first one has a better chance of winning.

#### b. Simulate 10 mage duels. Is there a clear winner? Run that 10 duel simulation again. Was the answer similar?

In [12]:
n_rolls = 6
n_trials = 10
rolls_mage1 = np.random.choice([1, 2, 3, 4], n_trials * n_rolls).reshape(n_trials, n_rolls)
rolls_mage1.sum()

165

In [13]:
n_rolls = 4
n_trials = 10
rolls_mage2 = np.random.choice([1, 2, 3, 4, 5, 6], n_trials * n_rolls).reshape(n_trials, n_rolls)
rolls_mage2.sum()

136

- The first round of ten duels was won by the first mage, 158 - 150 (totals summed over all ten duels)
- The second round as well, 165 - 136

In [16]:
n_rolls = 6
n_trials = 10**4
rolls_mage1 = np.random.choice([1, 2, 3, 4], n_trials * n_rolls).reshape(n_trials, n_rolls)
rolls_mage1.sum()

149936

In [17]:
n_rolls = 4
n_trials = 10**4
rolls_mage2 = np.random.choice([1, 2, 3, 4, 5, 6], n_trials * n_rolls).reshape(n_trials, n_rolls)
rolls_mage2.sum()

140595

- The results don't change at 100 duels: the first mage leads 1538 - 1457 in summed totals
- At 10,000 duels the first mage still wins, 149936 - 140595
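
Summing every roll across all duels compares grand totals rather than counting who wins each duel. One possible reading of a duel, assumed here for illustration, is that each mage rolls once and the higher total wins, with equal totals counted as ties. The sketch uses the standard-library `random` module; the helper names and the seed are hypothetical.

```python
import random

rng = random.Random(50)   # local generator; the seed value is arbitrary

def roll(n_dice, n_sides):
    """Sum of n_dice rolls of an n_sides-sided die."""
    return sum(rng.randint(1, n_sides) for _ in range(n_dice))

def simulate_duels(n_duels):
    """Return (wins for the 6d4 mage, wins for the 4d6 mage, ties)."""
    wins_6d4 = wins_4d6 = ties = 0
    for _ in range(n_duels):
        mage1 = roll(6, 4)   # six 4-sided dice
        mage2 = roll(4, 6)   # four 6-sided dice
        if mage1 > mage2:
            wins_6d4 += 1
        elif mage2 > mage1:
            wins_4d6 += 1
        else:
            ties += 1
    return wins_6d4, wins_4d6, ties

for n_duels in (10, 100, 10_000):
    print(n_duels, simulate_duels(n_duels))
```

The expected totals are 15 for 6d4 and 14 for 4d6, so the first mage should come out ahead once enough duels are simulated, while a run of only 10 duels can go either way.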