## Simulation Exercises

https://ds.codeup.com/stats/simulation/

<h2 id="exercises">Exercises</h2>
<p>Within your <code>codeup-data-science</code> directory, create a directory named <code>statistics-exercises</code>. This will be where you do your work for this module. Create a repository on GitHub with the same name, and link your local repository to GitHub.</p>
<p>Do your work for this exercise in either a Python file named <code>simulation.py</code> or a Jupyter notebook named <code>simulation.ipynb</code>.</p>

In [2]:
%matplotlib inline
import numpy as np
import pandas as pd
import random

import viz # curriculum example visualizations

np.random.seed(29)

1. How likely is it that you roll doubles when rolling two dice?

In [5]:
ntrials = nrows = 100_000 # number of trials
ndice = ncols = 2       # number of random events in each trial
die = [1, 2, 3, 4, 5, 6]

rolls = np.random.choice(die, ntrials * ndice).reshape(nrows, ncols)
rolls = pd.DataFrame(rolls)
rolls.head()


Unnamed: 0,0,1
0,2,6
1,6,1
2,5,2
3,5,2
4,6,1


In [9]:
odd_doubles_two = rolls.apply(lambda row: row[0] == row[1], axis=1).mean()

The odds of rolling doubles when rolling two six-sided dice are 17%
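As a cross-check on the simulated figure, the exact probability can be found by enumerating all 36 equally likely outcomes. This is a minimal sketch using only the standard library; the names `outcomes`, `doubles`, and `p_doubles` are illustrative and not part of the original notebook.

```python
from fractions import Fraction
from itertools import product

die = [1, 2, 3, 4, 5, 6]

# Every ordered pair of faces for two fair dice (36 outcomes).
outcomes = list(product(die, repeat=2))

# Keep only the pairs where both dice show the same face.
doubles = [pair for pair in outcomes if pair[0] == pair[1]]

p_doubles = Fraction(len(doubles), len(outcomes))
print(f"Exact probability of doubles: {p_doubles} = {float(p_doubles):.4f}")
```

The exact value is 1/6, about 16.7%, which agrees with the simulated 17% after rounding.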


2. If you flip 8 coins, what is the probability of getting exactly 3 heads? 

In [10]:
coin = [0, 1]
ntrials = nrows = 100_000
nflips = ncols = 8

In [11]:
np.random.seed(7)
flips = np.random.choice(coin, ntrials * nflips).reshape(nrows, ncols)
flips = pd.DataFrame(flips)
flips.head()

Unnamed: 0,0,1,2,3,4,5,6,7
0,1,0,1,0,1,1,1,1
1,0,1,0,1,0,1,0,0
2,0,0,1,0,0,0,1,1
3,0,0,1,1,0,0,1,0
4,0,1,1,0,0,1,1,0


In [13]:
three_heads = flips.apply(lambda row: row.values.sum() == 3, axis=1).mean()

print(f'The odds of flipping exactly three heads when flipping eight fair coins are: {round(three_heads*100)}%')

The odds of flipping exactly three heads when flipping eight fair coins are: 22%


- What is the probability of getting more than 3 heads? 

In [15]:
more_three = flips.apply(lambda row: row.values.sum() > 3, axis=1).mean()
print(f'The odds of flipping more than three heads when flipping eight fair coins are: {round(more_three*100)}%')

The odds of flipping more than three heads when flipping eight fair coins are: 64%
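Both coin-flip results can be checked against the binomial formula, assuming fair and independent coins as in the simulation. The sketch below counts favorable outcomes with `math.comb` (Python 3.8+); the variable names are illustrative.

```python
from math import comb

n_flips = 8
total_outcomes = 2 ** n_flips  # 256 equally likely head/tail sequences

# Exactly 3 heads: choose which 3 of the 8 flips land heads.
p_exactly_3 = comb(n_flips, 3) / total_outcomes

# More than 3 heads: add up the counts for 4, 5, 6, 7 and 8 heads.
p_more_than_3 = sum(comb(n_flips, k) for k in range(4, n_flips + 1)) / total_outcomes

print(f"P(exactly 3 heads)   = {p_exactly_3:.4f}")
print(f"P(more than 3 heads) = {p_more_than_3:.4f}")
```

The exact values are 56/256 and 163/256, which round to the simulated 22% and 64%.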


3. <p>There are approximately 3 web development cohorts for every 1 data science cohort at Codeup. Assuming that Codeup randomly selects an alumnus to put on each billboard, what are the odds that the two billboards I drive past both have data science students on them?</p>

In [21]:
cohorts = ['WD', 'WD', 'WD', 'DS']
ntrials = nrows = 100_000
npicks = ncols = 2
students = np.random.choice(cohorts, ntrials * npicks).reshape(nrows, ncols)
students = pd.DataFrame(students)
students.head()

Unnamed: 0,0,1
0,WD,WD
1,DS,DS
2,DS,WD
3,WD,WD
4,WD,DS


In [23]:
both_ds = students.apply(lambda row: (row.values[0] == "DS") and (row.values[1] == "DS"), axis=1).mean()
print(f'The odds of both billboards being data science students are: {round(both_ds*100)}%')

The odds of both billboards being data science students are: 6%
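The row-wise `apply` above can also be written as a vectorized comparison, which is usually faster on 100,000 rows. This is a sketch rather than the notebook's own method: it uses NumPy's `default_rng` generator with an arbitrary seed, so the draws differ from the cell above.

```python
import numpy as np

rng = np.random.default_rng(29)  # seed chosen arbitrarily for repeatability

# 3 web development cohorts for every 1 data science cohort.
cohorts = np.array(['WD', 'WD', 'WD', 'DS'])

# One row per trial, one column per billboard.
picks = rng.choice(cohorts, size=(100_000, 2))

# A trial counts only when every billboard in the row shows a DS student.
both_ds_vectorized = (picks == 'DS').all(axis=1).mean()

print(f"Simulated P(both DS) = {both_ds_vectorized:.4f}")
print(f"Exact P(both DS)     = {0.25 ** 2:.4f}")
```

With independent picks the exact answer is 0.25 × 0.25 = 0.0625, in line with the simulated 6%.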


4. <p>On average, Codeup students buy 3 poptart packages a day from the snack vending machine, with a standard deviation of 1.5. If the machine is restocked with 17 poptart packages on Monday, how likely is it that I will be able to buy some poptarts on Friday afternoon? (Remember, if you have a mean and standard deviation, use <code>np.random.normal</code>.)</p>

In [24]:
nrows = 100_000
ncols = 5 

tarts = np.random.normal(3, 1.5, nrows * ncols).astype(int).reshape(nrows, ncols)

# convert to data frame
tarts = pd.DataFrame(tarts)
tarts.head()

Unnamed: 0,0,1,2,3,4
0,4,2,4,3,3
1,3,2,6,4,5
2,1,5,5,0,4
3,5,0,3,0,4
4,0,2,2,4,4


In [26]:
odds_tarts = tarts.apply(lambda row: row.values.sum() < 17, axis=1).mean()
print(f'The odds of Poptarts being available on a Friday afternoon are {round(odds_tarts*100)}%')

The odds of Poptarts being available on a Friday afternoon are 88%
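One detail worth checking is how the normal draws are turned into whole packages. `.astype(int)` truncates toward zero, which lowers each day's count by about half a package on average and so raises the estimated chance that stock remains. The sketch below compares truncation with rounding to the nearest whole package. The rounding-and-clipping variant is an assumption introduced here for comparison and is not part of the original solution.

```python
import numpy as np

rng = np.random.default_rng(29)  # arbitrary seed, not the notebook's

# Five days of purchases per trial, drawn from Normal(mean=3, sd=1.5).
daily = rng.normal(3, 1.5, size=(100_000, 5))

# Same conversion as the cell above: truncate toward zero.
p_truncated = (daily.astype(int).sum(axis=1) < 17).mean()

# Alternative (assumption): round to the nearest package, never below zero.
rounded = np.clip(daily.round(), 0, None)
p_rounded = (rounded.sum(axis=1) < 17).mean()

print(f"P(stock left), truncating: {p_truncated:.3f}")
print(f"P(stock left), rounding:   {p_rounded:.3f}")
```

The two estimates differ noticeably, so the reported probability depends on this modeling choice as much as on the simulation itself.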


5. <p>Compare Heights</p>
<ul>
<li>Men have an average height of 178 cm and a standard deviation of 8 cm.</li>
<li>Women have an average height of 170 cm and a standard deviation of 6 cm.</li>
<li>Since you have means and standard deviations, you can use <code>np.random.normal</code> to generate observations.</li>


In [27]:
male_heights = np.random.normal(178, 8, 100_000)
female_heights = np.random.normal(170, 6, 100_000)

heights = pd.DataFrame({"male_heights": male_heights,
                        "female_heights": female_heights})

# True for each simulated pair in which the woman is taller than the man
heights['female_taller'] = heights.female_heights > heights.male_heights
heights.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
male_heights,100000.0,177.991698,7.982052,140.1681,172.629426,177.985349,183.410168,211.105757
female_heights,100000.0,170.013496,5.996613,140.753171,165.997618,170.009912,174.068662,196.433181


<li>If a man and woman are chosen at random, P(woman taller than man)?</li>


In [29]:
taller_female = heights['female_taller'].mean()

print(f"The probability that a randomly chosen woman is taller than a randomly chosen man is {round(taller_female*100)}%")

The probability that a randomly chosen woman is taller than a randomly chosen man is 21%
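This result has a closed form, because the difference of two independent normal variables is itself normal. A minimal sketch, assuming the two heights are independent (as the simulation also does) and using `statistics.NormalDist` from the standard library (Python 3.8+):

```python
from math import hypot
from statistics import NormalDist

# difference = woman's height - man's height
mean_diff = 170 - 178   # -8 cm
sd_diff = hypot(6, 8)   # sqrt(6**2 + 8**2) = 10 cm

difference = NormalDist(mu=mean_diff, sigma=sd_diff)

# The woman is taller whenever the difference is positive.
p_woman_taller = 1 - difference.cdf(0)

print(f"P(woman taller than man) = {p_woman_taller:.4f}")
```

This gives roughly 0.212, consistent with the simulated 21%.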


6. <p>When installing anaconda on a student's computer, there's a 1 in 250 chance
   that the download is corrupted and the installation fails. What are the odds
   that after having 50 students download anaconda, no one has an installation
   issue?  

In [32]:
nrows = 100_000 
ncols = 50 

problem_range = [False for r in range(1,250)]
problem_range.append(True)

installs = np.random.choice(problem_range, nrows * ncols).reshape(nrows, ncols)
installs = pd.DataFrame(installs)
installs.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,40,41,42,43,44,45,46,47,48,49
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,True,False,False,False


In [33]:
# share of trials in which none of the 50 installs failed
no_issues = installs.apply(lambda row: row.values.sum() == 0, axis=1).mean()

print(f'The odds of 50 students downloading Anaconda without a problem are {round(no_issues*100)}%')

The odds of 50 students downloading Anaconda without a problem are 82%


- 100 students?</p>

In [34]:
nrows = 100_000 
ncols = 100

problem_range = [False for r in range(1,250)]
problem_range.append(True)

installs = np.random.choice(problem_range, nrows * ncols).reshape(nrows, ncols)
installs = pd.DataFrame(installs)

# share of trials in which none of the 100 installs failed
no_issues = installs.apply(lambda row: row.values.sum() == 0, axis=1).mean()

print(f'The odds of 100 students downloading Anaconda without a problem are {round(no_issues*100)}%')

The odds of 100 students downloading Anaconda without a problem are 67%


- <p>What is the probability that we observe an installation issue within the first
150 students that download anaconda?</p>

In [37]:
nrows = 100_000
ncols = 150

problem_range = [False for r in range(1,250)]
problem_range.append(True)

installs = np.random.choice(problem_range, nrows * ncols).reshape(nrows, ncols)
installs = pd.DataFrame(installs)

problems = installs.apply(lambda row: row.values.sum() > 0, axis=1).mean()

print(f'The odds of at least one installation issue among 150 students downloading Anaconda are {round(problems*100)}%')

The odds of at least one installation issue among 150 students downloading Anaconda are 45%


- <p>How likely is it that 450 students all download anaconda without an issue?</p>


In [38]:
nrows = 100_000
ncols = 450

problem_range = [False for r in range(1,250)]
problem_range.append(True)

installs = np.random.choice(problem_range, nrows * ncols).reshape(nrows, ncols)
installs = pd.DataFrame(installs)

no_problems = installs.apply(lambda row: row.values.sum() == 0, axis=1).mean()

print(f'The odds of 450 students downloading Anaconda without a problem are {round(no_problems*100)}%')


The odds of 450 students downloading Anaconda without a problem are 16%
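The Anaconda questions all reduce to one formula: if each install succeeds independently with probability 249/250, the chance that all n succeed is (249/250)^n, and the chance of at least one failure is the complement. The sketch below tabulates both for the four class sizes; independence between installs is an assumption shared with the simulation, and the names are illustrative.

```python
# Chance that a single install succeeds, from the stated 1-in-250 failure rate.
p_ok = 1 - 1 / 250

# Probability that all n independent installs succeed, for each class size.
p_no_issue = {n: p_ok ** n for n in (50, 100, 150, 450)}

for n, p in p_no_issue.items():
    print(f"{n:>3} students: P(no issue) = {p:.3f}, "
          f"P(at least one issue) = {1 - p:.3f}")
```

These values give a quick sanity check on which quantity each cell is reporting: "no issue" and "at least one issue" must always add up to 100%.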


7. <p>There's a 70% chance on any given day that there will be at least one food
   truck at Travis Park. However, you haven't seen a food truck there in 3 days.
   How unlikely is this?</p>

In [39]:
nrows = 100_000 
ncols = 3

truck = np.random.choice([True, False], nrows * ncols, p=[.7,.3]).reshape(nrows, ncols)
truck = pd.DataFrame(truck)

no_truck = truck.apply(lambda row: row.values.sum() == 0, axis=1).mean()
print(f'The odds of seeing no truck for 3 days are {round(no_truck*100)}%')


The odds of seeing no truck for 3 days are 3%


- <p>How likely is it that a food truck will show up sometime this week?</p>


In [41]:
nrows = 100_000 
ncols = 7

truck = np.random.choice([True, False], nrows * ncols, p=[.7,.3]).reshape(nrows, ncols)
truck = pd.DataFrame(truck)

yes_truck = truck.apply(lambda row: row.values.sum() > 0, axis=1).mean()
print(f'The odds of at least one truck in 7 days are {round(yes_truck*100)}%')

The odds of at least one truck in 7 days are 100%
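Both food truck answers can be verified directly, assuming each day is independent with a 70% chance of a truck. The sketch below also shows that the 100% figure is a rounding effect rather than a certainty; the variable names are illustrative.

```python
p_truck = 0.7            # chance of at least one truck on any given day
p_no_truck = 1 - p_truck

# No truck on three consecutive days.
p_none_in_3 = p_no_truck ** 3

# At least one truck in seven days is the complement of seven empty days.
p_any_in_7 = 1 - p_no_truck ** 7

print(f"P(no truck for 3 days)          = {p_none_in_3:.4f}")
print(f"P(at least one truck in 7 days) = {p_any_in_7:.6f}")
```

The weekly probability is about 99.98%, which `round()` displays as 100%.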


8. <p>If 23 people are in the same room, what are the odds that two of them share a birthday? </p>

In [42]:
#from exercise review...

nrows = 100_000
ncols = 23

bdays = np.random.choice([r for r in range(366)], nrows * ncols).reshape(nrows, ncols)
bdays = pd.DataFrame(bdays)

bdays['same'] = bdays.nunique(axis = 1) < ncols

match = bdays['same'].mean()

print(f'The odds of at least two of 23 people sharing a birthday are {round(match*100)}%')

The odds of at least two of 23 people sharing a birthday are 51%


- What if it's 20 people? 

- 40?
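The 20- and 40-person cases can reuse the same approach with the group size as a parameter. This is a sketch under stated assumptions: `p_shared_birthday` is a hypothetical helper not found in the original notebook, it draws birthdays uniformly from 366 days to match `range(366)` in the cell above, and it uses NumPy's `default_rng` with an arbitrary seed.

```python
import numpy as np

def p_shared_birthday(n_people, n_trials=100_000, n_days=366, seed=29):
    """Estimate P(at least two of n_people share a birthday) by simulation."""
    rng = np.random.default_rng(seed)
    # One row per simulated room, one column per person.
    bdays = rng.integers(0, n_days, size=(n_trials, n_people))
    # After sorting each row, any repeated birthday sits next to its twin.
    bdays.sort(axis=1)
    has_match = (np.diff(bdays, axis=1) == 0).any(axis=1)
    return has_match.mean()

results = {n: p_shared_birthday(n) for n in (20, 23, 40)}

for n, p in results.items():
    print(f"{n} people: P(shared birthday) is about {p:.2f}")
```

Sorting each row and comparing neighbors avoids a row-wise `apply` or `nunique`, which keeps the run time manageable at the larger group size.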

---

<h4 id="bonus-exercises">Bonus Exercises</h4>
<ul>
<li><a href="https://gist.github.com/ryanorsinger/2996446f02c1bf30fcb3f8fdb88bd51d">Mage Duel</a></li>
<li><a href="https://gist.github.com/ryanorsinger/eac1d7b7e978f90b8390bdc056312123">Chuck a Luck</a></li>
</ul>