In [1]:
# Initialize Otter
import otter
grader = otter.Notebook("JHW2.ipynb")

<h2> Jupyter Homework Week 2 </h2>

### Simulation

<img src="images/dice.jpg" style="float: right; width: 12%">

One of the most powerful tools we have in understanding probability is to *simulate* an experiment many times; this gives us a lot of information about the long-term behavior of a model.

In this notebook, we'll explore how we can simulate dice rolls using Python and generate and analyze random data for this purpose. The code blocks below will simulate throwing multiple dice, recording their outcomes, and doing computations with them.

Let's make a function that simulates _many_ (as many as we want) rolls of one die. We'll use [**numpy**](https://numpy.org)'s module [np.random](https://numpy.org/doc/stable/reference/random/index.html), which is used for **random sampling** and will help us simulate random variables. We will use this module extensively in this class.

In [13]:
# just run me
import numpy as np

In [14]:
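def roll_die(num_sides,num_trials):
    """Simulate num_trials rolls of a fair die with num_sides sides."""
    # randint excludes `high`, so num_sides+1 makes num_sides the largest possible roll
    simulated_rolls = np.random.randint(low = 1, high = num_sides+1, size = num_trials)
    return simulated_rolls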
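
One detail worth checking is that `np.random.randint` treats `high` as exclusive, which is why `roll_die` passes `num_sides+1`. The sketch below is one possible seeded variant built on NumPy's `Generator` API, whose `integers` method also excludes `high` by default. The name `roll_die_seeded` and the `seed` parameter are illustrative assumptions and are not part of this homework.

```python
import numpy as np

def roll_die_seeded(num_sides, num_trials, seed=None):
    """Hypothetical variant of roll_die that accepts a seed for reproducibility."""
    rng = np.random.default_rng(seed)
    # Generator.integers excludes `high` by default, so we still add 1
    return rng.integers(low=1, high=num_sides + 1, size=num_trials)

rolls = roll_die_seeded(6, 10_000, seed=0)

# every roll lies between 1 and 6 inclusive
print(rolls.min(), rolls.max())

# the same seed reproduces the same rolls
same_again = np.array_equal(rolls, roll_die_seeded(6, 10_000, seed=0))
print(same_again)
```

Fixing a seed is handy when you want to rerun a cell and get the same numbers; leaving `seed=None` gives fresh rolls each time.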

We're now going to roll the die ten times.

In [15]:
# set the number of trials and sides
num_trials = 10
num_sides =  6

# simulate
results = roll_die(num_sides,num_trials)

print(results)

[3 2 5 2 6 2 4 3 5 5]


Now suppose we want to roll **two** dice, *add their rolls*, and count how many times their sum is eight. Let's also do $10{,}000$ experiments.

In [22]:
num_trials = 10000
num_sides =  6

# simulate each die's rolls
first_die = roll_die(num_sides,num_trials)
second_die = roll_die(num_sides,num_trials)

# add the two vectors
sum_of_dice = first_die+second_die

# count the number of times the sum is equal to 8
num = (sum_of_dice == 8).sum()

print('The dice added to 8 in ',num,' trials out of ',num_trials)

The dice added to 8 in  1375  trials out of  10000


## Explaining the code

To be clear as to what's going on, let's look more closely at the variables we've created.

In [23]:
first_die[:5] # first 5 rolls of die 1

array([1, 6, 6, 6, 3])

In [24]:
second_die[:5] # first 5 rolls of die 2

array([4, 4, 1, 2, 3])

In [25]:
sum_of_dice[:5] # sum of first five rolls

array([ 5, 10,  7,  8,  6])

Notice that **sum_of_dice** is the *entry-by-entry* sum of **first_die** and **second_die**. Now let's check how often the sum is equal to **8**.

In [26]:
# check each entry if it is equal to 8
(sum_of_dice == 8)[:5]

array([False, False, False,  True, False])

The above creates what is called a boolean array. This is an array, or vector, of *True* or *False* values (in our case, *True* means the sum was 8, and *False* means the sum was not 8). If we use the **.sum()** method on this array, each *True* counts as 1 and each *False* as 0, so we get the total number of times we rolled an 8.

In [27]:
# count how many of the first 5 sums are equal to 8
(sum_of_dice == 8)[:5].sum()

1
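
To see how counting and averaging a boolean array relate, here is a minimal sketch that uses the five sums displayed earlier. The variable names `sums`, `is_eight`, `count`, and `fraction` are chosen only for illustration.

```python
import numpy as np

# the five sums displayed earlier in this notebook
sums = np.array([5, 10, 7, 8, 6])

is_eight = (sums == 8)      # boolean array: True where the sum is 8
count = is_eight.sum()      # True counts as 1, False as 0
fraction = is_eight.mean()  # count divided by the number of entries

print(count, fraction)  # 1 0.2
```

So **.sum()** gives a count, while **.mean()** gives the fraction of entries that are *True*, which is the estimated probability.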

## Making this into a function

We can make the above into our own function, so we can use it again.

In [28]:
def calculate_frequency(num_sides,num_trials,r):
    
    # simulate each die's rolls
    first_die = roll_die(num_sides,num_trials)
    second_die = roll_die(num_sides,num_trials)
    
    # add the two vectors
    sum_of_dice = first_die+second_die
    
    # calculate the frequency at which the sum is equal to r
    freq = (sum_of_dice == r).mean()

    return freq


## Analyzing our results

Running $10000$ trials, I had $1375$ successes -- meaning that the estimated probability of the two dice adding to $8$ is about $0.1375$. On the other hand, using the ideas from class, there are $5$ possible dice rolls out of the $36$ total which have a sum of $8$ (2-6, 3-5, 4-4, 5-3, and 6-2). This leads to a probability of $5/36 \approx 0.139$; so our simulation was only about $0.0014$ off from the truth. This is pretty great!
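
The count of $5$ favorable rolls out of $36$ can also be checked by brute force. The following is a minimal sketch that lists every ordered pair of faces and counts those summing to $8$; the names `faces` and `all_sums` are illustrative.

```python
import numpy as np

num_sides = 6
faces = np.arange(1, num_sides + 1)

# 6x6 table of all sums: entry (i, j) is faces[i] + faces[j]
all_sums = faces[:, None] + faces[None, :]

favorable = (all_sums == 8).sum()   # number of ordered pairs summing to 8
exact_probability = favorable / all_sums.size

print(favorable, all_sums.size, round(exact_probability, 4))  # 5 36 0.1389
```

Unlike the simulation, this enumeration involves no randomness, so it gives the same answer every time.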

We can build a more elaborate table that counts the frequency of *all possible sums*.

In [29]:
num_trials = 1000
num_sides = 6
for r in range(2, 13):
    frequency = calculate_frequency(num_sides,num_trials,r)
    print('Probability of summing to ', r,' = ', frequency)

Probability of summing to  2  =  0.024
Probability of summing to  3  =  0.057
Probability of summing to  4  =  0.079
Probability of summing to  5  =  0.109
Probability of summing to  6  =  0.147
Probability of summing to  7  =  0.154
Probability of summing to  8  =  0.157
Probability of summing to  9  =  0.117
Probability of summing to  10  =  0.097
Probability of summing to  11  =  0.062
Probability of summing to  12  =  0.034


Based on this, the most likely outcomes were $6$, $7$, and $8$. We would need to do more trials to get more certainty -- this is a pretty small simulation!
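
Note that the loop above calls `calculate_frequency` once per value of `r`, so every row of the table comes from a fresh simulation and the printed frequencies need not add up to $1$ (the ones shown total $1.037$). One possible alternative, sketched here with a hypothetical helper `sum_frequencies` and an assumed `seed` parameter, tallies all sums from a single set of rolls using `np.bincount`.

```python
import numpy as np

def sum_frequencies(num_sides, num_trials, seed=None):
    """Hypothetical helper: frequencies of every two-dice sum from one simulation."""
    rng = np.random.default_rng(seed)
    first_die = rng.integers(1, num_sides + 1, size=num_trials)
    second_die = rng.integers(1, num_sides + 1, size=num_trials)
    sum_of_dice = first_die + second_die

    # counts[r] is the number of trials whose sum was r (indices 0 and 1 stay 0)
    counts = np.bincount(sum_of_dice, minlength=2 * num_sides + 1)
    return counts / num_trials

freqs = sum_frequencies(6, 1000, seed=0)
for r in range(2, 13):
    print('Probability of summing to ', r, ' = ', freqs[r])

# every trial is counted exactly once, so the total is 1 up to floating-point rounding
print(freqs.sum())
```

Either approach estimates the same probabilities; this one just guarantees that the rows describe the same set of rolls.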

<h3> Questions </h3>


### Question 1

If you have *three six-sided dice*, use a simulation to estimate the probability that their sum equals 7.

<!-- BEGIN QUESTION -->



In [34]:
# Fill in the ... with your answers

def sum_of_seven(num_trials):
    """Estimate the probability that the sum of three dice rolls equals 7"""
    num_sides = 6 # SOLUTION
    first_die = roll_die(num_sides,num_trials) # SOLUTION
    second_die = roll_die(num_sides,num_trials) # SOLUTION
    third_die = roll_die(num_sides,num_trials) # SOLUTION
    sum_of_dice = first_die+second_die+third_die # SOLUTION
    frequency_equal_to_seven = (sum_of_dice == 7).mean() # SOLUTION
    return frequency_equal_to_seven

In [33]:
grader.check("q1")

<!-- END QUESTION -->


### Question 2

What is/are the most common sum(s) of the three six-sided dice?


<!-- BEGIN QUESTION -->



In [None]:
# Fill in the ... with your answer
most_common_sum = ...

In [None]:
grader.check("q2")

<!-- END QUESTION -->

### Question 3

Let's change the experiment: roll two dice and multiply their values instead of adding. Make a table of the estimated probabilities for each possible product in $\{1, 2, \ldots, 36\}$.

<!-- BEGIN QUESTION -->



In [None]:
# Fill in the ... with your answers

def simulate_product(num_sides,num_trials):
    """Return an array of simulations of the product of two dice rolls"""
    first_die = ...
    second_die = ...
    product = ...
    return product


product_of_dice = simulate_product(6,10000)
for r in range(1, 37):
    frequency = (product_of_dice == r).mean()
    print('Probability of product equal to ', r,' = ', frequency)

In [None]:
grader.check("q3")

<!-- END QUESTION -->



## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(run_tests=True)