## Simulation Exercises

Using the repo setup directions, set up a new local and remote repository named statistics-exercises. The local version of your repo should live inside ~/codeup-data-science.

Do your work for this exercise in either a Python file named simulation.py or a Jupyter notebook named simulation.ipynb.

## Generating Random Numbers with Numpy

The `numpy.random` module provides a number of functions for generating random numbers.

- `np.random.choice`: selects random options from a list
- `np.random.random`: generates floats in the half-open interval [0, 1)
- `np.random.uniform`: generates floats between a given lower bound (inclusive) and upper bound (exclusive)
- `np.random.randn`: generates numbers from the standard normal distribution
- `np.random.normal`: generates numbers from a normal distribution with a specified mean and standard deviation
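The five functions listed above can be tried side by side in a quick sketch like the one below. The seed, the list of options and the parameter values are arbitrary choices for illustration; what matters is the shape and range of each result. Note that `np.random.randn` takes its dimensions as separate arguments rather than a `size` tuple.

```python
import numpy as np

np.random.seed(29)  # fix the seed so the draws below are repeatable

# pick 5 items (with replacement) from a list of options
picks = np.random.choice(['heads', 'tails'], size=5)

# 3 floats drawn uniformly from the half-open interval [0, 1)
fractions = np.random.random(3)

# 3 floats drawn uniformly from [10, 20)
between = np.random.uniform(10, 20, size=3)

# a 2 x 3 array from the standard normal distribution (mean 0, sd 1)
std_normal = np.random.randn(2, 3)

# 4 draws from a normal distribution with mean 100 and standard deviation 15
scores = np.random.normal(loc=100, scale=15, size=4)

print(picks, fractions, between, std_normal, scores, sep='\n')
```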

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%config InlineBackend.figure_format = 'retina'

import viz # curriculum example visualizations (must live in same dir)

np.random.seed(29)

1. How likely is it that you roll doubles when rolling two dice?

In [2]:
n_trials = nrows = 10_000
n_dice = ncols = 1

roll1 = np.random.choice([1, 2, 3, 4, 5, 6], size = (n_trials, n_dice))
roll2 = np.random.choice([1, 2, 3, 4, 5, 6], size = (n_trials, n_dice))
roll = (roll1, roll2)
roll

(array([[6],
        [4],
        [5],
        ...,
        [2],
        [3],
        [3]]),
 array([[2],
        [1],
        [2],
        ...,
        [4],
        [4],
        [6]]))

In [3]:
doubles = (roll1 == roll2)
doubles

array([[False],
       [False],
       [False],
       ...,
       [False],
       [False],
       [False]])

In [4]:
sums_by_trial = doubles.sum(axis=1)
sums_by_trial

array([0, 0, 0, ..., 0, 0, 0])

In [5]:
win_rate = doubles.mean()
win_rate

0.1613
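A possible alternative is to keep both dice in one array, one column per die, so a single comparison between columns flags the doubles. Since 6 of the 36 equally likely outcomes are doubles, the exact answer is 1/6 ≈ 0.1667, and a simulated rate like the 0.1613 above should sit close to it. The variable names here are new and chosen so they do not overwrite the notebook's `roll1`, `roll2` or `doubles`.

```python
import numpy as np

n_trials = 10_000

# one row per trial, one column per die
rolls_2d = np.random.choice([1, 2, 3, 4, 5, 6], size=(n_trials, 2))

# a trial is a double when both columns show the same face
is_double = rolls_2d[:, 0] == rolls_2d[:, 1]
sim_rate = is_double.mean()

exact_rate = 6 / 36  # (1,1), (2,2), ..., (6,6) out of 36 outcomes
print(f'simulated: {sim_rate:.4f}, exact: {exact_rate:.4f}')
```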

2. If you flip 8 coins, what is the probability of getting exactly 3 heads? What is the probability of getting more than 3 heads?

In [6]:
n_flips = nrows = 10_000
n_coins = ncols = 8

toss = np.random.choice([True, False], size = (n_flips, n_coins))
toss

array([[ True,  True,  True, ...,  True, False, False],
       [ True,  True, False, ..., False, False,  True],
       [False,  True, False, ..., False, False, False],
       ...,
       [ True, False,  True, ..., False, False,  True],
       [False,  True,  True, ..., False, False, False],
       [False,  True,  True, ...,  True, False,  True]])

In [7]:
toss.sum(axis=1)
# counts the heads (True values) among the 8 coins in each trial

array([5, 3, 2, ..., 5, 2, 6])

In [8]:
(toss.sum(axis=1) == 3).sum()

2203

In [9]:
print(f'Probability of getting exactly 3 Heads: {(toss.sum(axis=1) == 3).sum() / n_flips}')

Probability of getting exactly 3 Heads: 0.2203


In [10]:
print(f'Probability of getting more than 3 Heads: {(toss.sum(axis=1) > 3).sum() / n_flips}')

Probability of getting more than 3 Heads: 0.634
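Because each coin is a fair, independent flip, the simulated values can be checked against the binomial distribution: C(8, 3) = 56 of the 2^8 = 256 head/tail sequences contain exactly 3 heads. This sketch uses `math.comb`, which needs Python 3.8 or newer.

```python
from math import comb

n_coins = 8
total_outcomes = 2 ** n_coins  # 256 equally likely head/tail sequences

# the number of sequences with exactly k heads is C(8, k)
p_exactly_3 = comb(n_coins, 3) / total_outcomes
p_more_than_3 = sum(comb(n_coins, k) for k in range(4, n_coins + 1)) / total_outcomes

print(f'exactly 3 heads: {p_exactly_3:.4f}')      # 56/256
print(f'more than 3 heads: {p_more_than_3:.4f}')  # 163/256
```

The simulated 0.2203 and 0.634 agree with these exact values (0.21875 and about 0.6367) to within normal sampling noise.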


3. There are approximately 3 web development cohorts for every 1 data science cohort at Codeup. Assuming that Codeup randomly selects an alumnus to put on a billboard, what are the odds that the two billboards I drive past both have data science students on them?

In [None]:
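# WD = 0 (False), DS = 1 (True); one row per drive, one column per billboard
bb = np.random.choice([0, 1], size = (100_000, 2), p = [0.75, 0.25])

# proportion of drives where both billboards show data science alumni
(bb.sum(axis=1) == 2).mean()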
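If the two billboards are chosen independently and each alumnus is a data science graduate with probability 0.25, the exact answer is 0.25 × 0.25 = 0.0625. That 0.25 assumes cohorts are roughly the same size, so 1 DS cohort in every 4 translates into 1/4 of alumni. The sketch below sets a simulation similar to the cell above next to that value.

```python
import numpy as np

p_ds = 0.25         # ASSUMPTION: equal cohort sizes, so 1 of every 4 alumni is DS
n_trials = 100_000

# 1 = data science alumnus, 0 = web development alumnus
billboards = np.random.choice([0, 1], size=(n_trials, 2), p=[1 - p_ds, p_ds])

# both billboards show data science students when a row sums to 2
sim_both_ds = (billboards.sum(axis=1) == 2).mean()
exact_both_ds = p_ds ** 2  # independent picks

print(f'simulated: {sim_both_ds:.4f}, exact: {exact_both_ds:.4f}')
```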

4. Codeup students buy, on average, 3 Pop-Tart packages a day from the snack vending machine, with a standard deviation of 1.5. If the machine is restocked with 17 Pop-Tart packages on Monday, how likely is it that I will be able to buy some Pop-Tarts on Friday afternoon? (Remember, if you have a mean and standard deviation, use `np.random.normal`.) You'll need to make a judgement call on how to handle some of your values.
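One way to make the judgement calls is sketched here, though it is not the only reasonable one. It assumes four full days of sales (Monday through Thursday) happen before Friday afternoon. Each day's draw is rounded to a whole number of packages, and negative draws are clipped to zero because the machine cannot gain packages from a sale. Changing `n_days` to 5 would model also counting Friday morning as a full day.

```python
import numpy as np

n_trials = 100_000
n_days = 4      # ASSUMPTION: Monday-Thursday sales happen before Friday afternoon
restock = 17

# daily Pop-Tart purchases: one row per simulated week, one column per day
daily = np.random.normal(loc=3, scale=1.5, size=(n_trials, n_days))

# judgement calls: whole packages only, and no negative sales
daily = np.clip(daily.round(), 0, None)

sold_before_friday = daily.sum(axis=1)

# at least one package is left if fewer than 17 were bought
p_some_left = (sold_before_friday < restock).mean()
print(f'probability of Pop-Tarts left on Friday afternoon: {p_some_left:.4f}')
```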

5. Compare Heights
* Men have an average height of 178 cm and a standard deviation of 8 cm.
* Women have an average height of 170 cm and a standard deviation of 6 cm.
* Since you have means and standard deviations, you can use np.random.normal to generate observations.
* If a man and woman are chosen at random, what is the likelihood the woman is taller than the man?

In [11]:
# np.random.normal(loc=mean, scale=std_dev, size=shape); a 2-D shape is (rows, columns)
np.random.normal(size = 1000, loc = 50, scale = 100)

array([-1.39950818e+02,  7.26099213e+01,  7.90144036e+01,  2.99922036e+01,
        1.61879175e+02,  1.46141418e+02,  1.63613075e+01,  1.22348052e+02,
       -2.62081639e+01,  1.34586370e+01, -5.02435608e+01, -1.60639509e+02,
       ...])

In [12]:
male = np.random.normal(size = 10_000, loc = 178, scale = 8)
female = np.random.normal(size = 10_000, loc = 170, scale = 6)

In [13]:
male

array([181.38064561, 187.92426727, 171.43934312, ..., 187.07021974,
       167.07207298, 178.96254555])

In [14]:
female

array([164.17709879, 168.05938981, 167.27112655, ..., 165.32680489,
       175.3458059 , 172.30014181])

In [15]:
(female > male).mean()

0.209
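The simulated 0.209 can be checked analytically. If heights are independent and normal, the difference D = woman − man is also normal, with mean 170 − 178 = −8 cm and standard deviation √(6² + 8²) = 10 cm. The woman is taller when D > 0, which is P(Z > 0.8) for a standard normal Z. The sketch below computes that with the standard library's complementary error function.

```python
from math import erfc, sqrt

# D = woman - man is normal when both heights are independent normals
mean_diff = 170 - 178             # -8 cm
sd_diff = sqrt(6 ** 2 + 8 ** 2)   # 10 cm, because independent variances add

# P(D > 0) = P(Z > z) with z = 0.8; for a standard normal, P(Z > z) = erfc(z / sqrt(2)) / 2
z = (0 - mean_diff) / sd_diff
p_woman_taller = 0.5 * erfc(z / sqrt(2))
print(f'exact: {p_woman_taller:.4f}')
```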

6. When installing Anaconda on a student's computer, there's a 1 in 250 chance that the download is corrupted and the installation fails.
* What is the probability that, after 50 students download Anaconda, no one has an installation issue? What about 100 students?
* What is the probability that we observe an installation issue within the first 150 students who download Anaconda?
* How likely is it that 450 students all download Anaconda without an issue?
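All of the questions above reduce to "no failures among n independent downloads", which has the exact probability (249/250)^n. The sketch below puts a simulation next to that formula. It assumes each download fails independently. `simulate_no_issues` and `exact_no_issues` are hypothetical helper names introduced only for this example. The 150-student question is handled as the complement of "no issues among 150".

```python
import numpy as np

p_fail = 1 / 250   # chance that a single download is corrupted
n_trials = 10_000  # number of simulated groups of students

def simulate_no_issues(n_students):
    """Fraction of simulated groups in which none of the n_students had a failed install."""
    # one row per simulated group; True marks a corrupted download
    failed = np.random.random((n_trials, n_students)) < p_fail
    return (~failed.any(axis=1)).mean()

def exact_no_issues(n_students):
    # every install must succeed independently: (249/250) ** n
    return (1 - p_fail) ** n_students

for n in (50, 100, 450):
    print(f'{n} students, no issues: simulated {simulate_no_issues(n):.4f}, '
          f'exact {exact_no_issues(n):.4f}')

# "an issue within the first 150" is the complement of "no issues among 150"
print(f'issue within first 150: simulated {1 - simulate_no_issues(150):.4f}, '
      f'exact {1 - exact_no_issues(150):.4f}')
```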

7. There's a 70% chance on any given day that there will be at least one food truck at Travis Park. However, you haven't seen a food truck there in 3 days. How unlikely is this?
* How likely is it that a food truck will show up sometime this week?
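Treating each day as independent, three empty days in a row has probability 0.3³ = 0.027. "Sometime this week" is open to interpretation. The sketch below assumes, hypothetically, that 4 days remain in a 7-day week after the 3 empty ones, and the answer changes with that choice. Each simulated day is True when at least one truck shows up.

```python
import numpy as np

p_truck = 0.7      # chance of at least one truck on a given day
n_trials = 100_000

# ASSUMPTION: days are independent; True means a truck showed up that day
last_3_days = np.random.random((n_trials, 3)) < p_truck
p_no_truck_3 = (~last_3_days.any(axis=1)).mean()
exact_no_truck_3 = (1 - p_truck) ** 3

days_left = 4  # HYPOTHETICAL: days remaining in a 7-day week after the 3 empty days
rest_of_week = np.random.random((n_trials, days_left)) < p_truck
p_truck_this_week = rest_of_week.any(axis=1).mean()
exact_truck_this_week = 1 - (1 - p_truck) ** days_left

print(f'no truck 3 days in a row: simulated {p_no_truck_3:.4f}, exact {exact_no_truck_3:.4f}')
print(f'truck in the next {days_left} days: simulated {p_truck_this_week:.4f}, '
      f'exact {exact_truck_this_week:.4f}')
```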

8. If 23 people are in the same room, what are the odds that at least two of them share a birthday?
* What if it's 20 people? 40?
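A sketch of the birthday problem follows. It assumes 365 equally likely birthdays and ignores leap days. The simulation sorts each room's birthdays so that any shared birthday shows up as two equal neighbours. The exact value is the complement of "all birthdays distinct", 1 − (365/365)(364/365)⋯((365 − n + 1)/365). `simulate_shared_birthday` and `exact_shared_birthday` are hypothetical names used only here.

```python
import numpy as np

n_trials = 10_000

def simulate_shared_birthday(n_people):
    # ASSUMPTION: 365 equally likely birthdays (0-364), no leap day
    bdays = np.random.choice(365, size=(n_trials, n_people))
    bdays.sort(axis=1)
    # after sorting, a shared birthday appears as two equal neighbours in a row
    shared = (np.diff(bdays, axis=1) == 0).any(axis=1)
    return shared.mean()

def exact_shared_birthday(n_people):
    # complement of "every birthday is different"
    p_distinct = np.prod((365 - np.arange(n_people)) / 365)
    return 1 - p_distinct

for n in (20, 23, 40):
    print(f'{n} people: simulated {simulate_shared_birthday(n):.4f}, '
          f'exact {exact_shared_birthday(n):.4f}')
```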