# Probability 2: Loaded dice 

In this assignment you will reinforce your intuition about the concepts covered in the lectures by taking the dice example to the next level.

This assignment will not evaluate your coding skills but rather your intuition and analytical skills. You can answer the exercise questions by any means you like: you can take the analytical route and compute the exact values, or you can write code that simulates the situations at hand and provides approximate values (grading has some tolerance to allow for approximate solutions). It is up to you which route you take!

Note that every exercise has a blank cell that you can use for your calculations. This cell is there for your convenience but **will not be graded**, so you can leave it empty if you want to.

In [98]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import utils

## Some concept clarifications 🎲🎲🎲

During this assignment you will be presented with various scenarios that involve dice. Usually dice can have different numbers of sides and can be either fair or loaded.

- A fair dice has equal probability of landing on every side.
- A loaded dice does not have equal probability of landing on every side. Usually one (or more) sides have a greater probability of showing up than the rest.

Let's get started!

## Exercise 1:



Given a 6-sided fair dice (all of the sides have equal probability of showing up), compute the mean and variance of the probability distribution that models said dice. The next figure shows a visual representation of said distribution:

<img src="./images/fair_dice.png" style="height: 300px;"/>

**Submission considerations:**
- Submit your answers as floating point numbers with three digits after the decimal point
- Example: To submit the value of 1/4 enter 0.250

Hints: 
- You can use [np.random.choice](https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html) to simulate a fair dice.
- You can use [np.mean](https://numpy.org/doc/stable/reference/generated/numpy.mean.html) and [np.var](https://numpy.org/doc/stable/reference/generated/numpy.var.html) to compute the mean and variance of a numpy array.
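
As a minimal sketch of the simulation route suggested in the hints, the code below draws many throws of a fair 6-sided dice and compares the sample mean and variance with the exact values obtained from the definitions. The names `rng`, `sides` and `n_throws` are illustrative and not part of the assignment, and a seeded `np.random.default_rng` generator is assumed here (instead of the legacy `np.random.choice` call) only so that runs are repeatable.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seeded generator (assumption, for repeatability)
sides = np.arange(1, 7)               # faces 1..6
n_throws = 200_000                    # illustrative sample size

throws = rng.choice(sides, size=n_throws)   # uniform by default, i.e. a fair dice
sim_mean = np.mean(throws)
sim_var = np.var(throws)

# Exact values from the definitions E[X] = sum(x * p) and Var[X] = E[X^2] - E[X]^2
p = np.full(6, 1 / 6)
exact_mean = np.sum(sides * p)
exact_var = np.sum(sides**2 * p) - exact_mean**2

print(f"simulated: mean={sim_mean:.3f}, variance={sim_var:.3f}")
print(f"exact:     mean={exact_mean:.3f}, variance={exact_var:.3f}")
```

The simulated values should agree with the exact ones to roughly two decimal places at this sample size, which is the kind of approximation the grading tolerance is meant to allow.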

In [100]:
# Run this cell to submit your answer
utils.exercise_1()

FloatText(value=0.0, description='Mean:')

FloatText(value=0.0, description='Variance:')

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 2:

Now suppose you throw the dice (the same dice as in the previous exercise) two times and record the sum of the two throws. Which of the following `probability mass functions` is the one you should get?

<table><tr>
<td> <img src="./images/hist_sum_6_side.png" style="height: 300px;"/> </td>
<td> <img src="./images/hist_sum_5_side.png" style="height: 300px;"/> </td>
<td> <img src="./images/hist_sum_6_uf.png" style="height: 300px;"/> </td>
</tr></table>


Hints: 
- You can use numpy arrays to hold the results of many throws.
- You can sum two numpy arrays by using the `+` operator like this: `sum = first_throw + second_throw`
- To simulate multiple throws of a dice you can use list comprehension or a for loop
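
The hints above can be sketched as follows. This is one possible simulation, not a reference solution: it draws both throws as whole arrays (so no loop is needed), adds them element-wise and estimates the PMF of the sum from relative frequencies. The variable names and the sample size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # seeded only for repeatability
sides = np.arange(1, 7)
n_throws = 100_000

first_throw = rng.choice(sides, size=n_throws)
second_throw = rng.choice(sides, size=n_throws)
total = first_throw + second_throw      # element-wise sum of the two arrays

# Empirical PMF: relative frequency of each observed sum
values, counts = np.unique(total, return_counts=True)
pmf = counts / n_throws

for value, prob in zip(values, pmf):
    print(f"sum={value:2d}  P={prob:.3f}")
```

Comparing the printed shape (which sums are possible and where the peak sits) with the three candidate plots is usually enough to pick the right one.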

In [102]:
# Run this cell to submit your answer
utils.exercise_2()

ToggleButtons(description='Your answer:', options=('left', 'center', 'right'), value='left')

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 3:

Given a fair 4-sided dice, you throw it two times and record the sum. The figure on the left shows the probabilities of the dice landing on each side, and the figure on the right shows the histogram of the sum. Fill in the probability of each sum (notice that the distribution of the sum is symmetrical, so you only need to input 4 values in total):

<img src="./images/4_side_hists.png" style="height: 300px;"/>

**Submission considerations:**
- Submit your answers as floating point numbers with three digits after the decimal point
- Example: To submit the value of 1/4 enter 0.250
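
For the analytical route, the PMF of the sum of two independent throws is the convolution of the single-throw PMF with itself. A minimal sketch using `np.convolve` follows; the variable names are illustrative.

```python
import numpy as np

p = np.full(4, 1 / 4)          # fair 4-sided dice: P(1) = ... = P(4) = 1/4
sum_pmf = np.convolve(p, p)    # index k holds P(sum = k + 2), for sums 2..8
sums = np.arange(2, 9)

for s, prob in zip(sums, sum_pmf):
    print(f"P(sum={s}) = {prob:.3f}")

# The distribution is symmetric around 5, so P(2)=P(8), P(3)=P(7), P(4)=P(6)
assert np.allclose(sum_pmf, sum_pmf[::-1])
```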

In [103]:
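# You can use this cell for your calculations (not graded)

# `dice` was undefined in the original cell; it is assumed here to hold the faces 1..6
dice = np.arange(1, 7)

# Simulate 5000 pairs of throws of a fair 4-sided dice
probs = [0.250, 0.250, 0.250, 0.250]
throws = np.random.choice(dice[0:4], size=(5000, 2), p=probs)
sum_throws = np.sum(throws, axis=1)

# Empirical PMF of the sum (sums 2..8)
unique, n = np.unique(sum_throws, return_counts=True)
pmf = n / len(throws)
sns.barplot(x=unique, y=pmf)
print(pmf)

# The PMF is symmetric, so average each mirrored pair to reduce sampling noise
print((pmf[0] + pmf[-1]) / 2)  # P(sum=2) = P(sum=8)
print((pmf[1] + pmf[5]) / 2)   # P(sum=3) = P(sum=7)
print((pmf[2] + pmf[4]) / 2)   # P(sum=4) = P(sum=6)
print(pmf[3])                  # P(sum=5)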

In [104]:
# Run this cell to submit your answer
utils.exercise_3()

FloatText(value=0.0, description='P for sum=2|8', style=DescriptionStyle(description_width='initial'))

FloatText(value=0.0, description='P for sum=3|7:', style=DescriptionStyle(description_width='initial'))

FloatText(value=0.0, description='P for sum=4|6:', style=DescriptionStyle(description_width='initial'))

FloatText(value=0.0, description='P for sum=5:', style=DescriptionStyle(description_width='initial'))

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 4:

Using the same scenario as in the previous exercise, compute the mean and variance of the sum of the two throws and the covariance between the first and the second throw:

<img src="./images/4_sided_hist_no_prob.png" style="height: 300px;"/>


Hints:
- You can use [np.cov](https://numpy.org/doc/stable/reference/generated/numpy.cov.html) to compute the covariance of two numpy arrays (this may not be needed for this particular exercise).
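
A sketch of the simulation route for this exercise, assuming the two throws are drawn independently. When `np.cov` is called with two 1-D arrays it returns a 2×2 matrix whose off-diagonal entry is the covariance between them. The names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=2)     # seeded only for repeatability
sides = np.arange(1, 5)                 # fair 4-sided dice
n_throws = 200_000

first_throw = rng.choice(sides, size=n_throws)
second_throw = rng.choice(sides, size=n_throws)
total = first_throw + second_throw

mean_sum = np.mean(total)
var_sum = np.var(total)

# np.cov returns [[Var(first), Cov], [Cov, Var(second)]] (sample estimates, ddof=1)
cov_matrix = np.cov(first_throw, second_throw)
cov_throws = cov_matrix[0, 1]

# Sanity check: Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
var_check = cov_matrix[0, 0] + cov_matrix[1, 1] + 2 * cov_throws

print(f"mean={mean_sum:.3f}, variance={var_sum:.3f}, covariance={cov_throws:.3f}")
```

The identity in the sanity check is also a shortcut for the analytical route: once the covariance is known, the variance of the sum follows from the variance of a single throw.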

In [105]:
# You can use this cell for your calculations (not graded)



In [106]:
# Run this cell to submit your answer
utils.exercise_4()

FloatText(value=0.0, description='Mean:')

FloatText(value=0.0, description='Variance:')

FloatText(value=0.0, description='Covariance:')

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 5:


Now suppose you have a loaded 4-sided dice (it is loaded so that it lands twice as often on side 2 as on each of the other sides):


<img src="./images/4_side_uf.png" style="height: 300px;"/>

You throw it two times and record the sum of the two throws. Which of the following `probability mass functions` is the one you should get?

<table><tr>
<td> <img src="./images/hist_sum_4_4l.png" style="height: 300px;"/> </td>
<td> <img src="./images/hist_sum_4_3l.png" style="height: 300px;"/> </td>
<td> <img src="./images/hist_sum_4_uf.png" style="height: 300px;"/> </td>
</tr></table>

Hints: 
- You can use the `p` parameter of [np.random.choice](https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html) to simulate a loaded dice.
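
As a sketch of how the `p` parameter can model the loading described above, the weights `[1, 2, 1, 1]` are normalised into probabilities and passed to the sampler. The simulated PMF of the sum is then compared with the exact one from `np.convolve`. All names are illustrative, and the seeded generator is an assumption made for repeatability.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
sides = np.arange(1, 5)
weights = np.array([1, 2, 1, 1])        # side 2 is twice as likely as each other side
p = weights / weights.sum()             # normalise so the probabilities sum to 1

throws = rng.choice(sides, size=(100_000, 2), p=p)   # each row is one pair of throws
total = throws.sum(axis=1)

values, counts = np.unique(total, return_counts=True)
sim_pmf = counts / len(total)
exact_pmf = np.convolve(p, p)           # exact PMF of the sum, for sums 2..8

for value, sim, exact in zip(values, sim_pmf, exact_pmf):
    print(f"sum={value}  simulated={sim:.3f}  exact={exact:.3f}")
```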

In [107]:
# You can use this cell for your calculations (not graded)



In [108]:
# Run this cell to submit your answer
utils.exercise_5()

ToggleButtons(description='Your answer:', options=('left', 'center', 'right'), value='left')

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 6:

You have a 6-sided dice that is loaded so that it lands twice as often on side 3 compared to the other sides:

<img src="./images/loaded_6_side.png" style="height: 300px;"/>

You record the sum of throwing it twice. What is the highest value (of the sum) that yields a cumulative probability lower than or equal to 0.5?

<img src="./images/loaded_6_cdf.png" style="height: 300px;"/>

Hints:
- The probability of side 3 is equal to $\frac{2}{7}$
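
One way to approach this analytically, sketched under the assumption that the two throws are independent: build the single-throw PMF from the hint, convolve it with itself to get the PMF of the sum, and accumulate it into a CDF. The variable names are illustrative.

```python
import numpy as np

weights = np.array([1, 1, 2, 1, 1, 1])  # side 3 is twice as likely as each other side
p = weights / weights.sum()             # P(3) = 2/7, every other side 1/7

sums = np.arange(2, 13)
sum_pmf = np.convolve(p, p)             # exact PMF of the sum of two independent throws
sum_cdf = np.cumsum(sum_pmf)            # P(sum <= s) for s = 2..12

for s, c in zip(sums, sum_cdf):
    print(f"P(sum <= {s:2d}) = {c:.3f}")

# Largest sum whose cumulative probability does not exceed 0.5
largest = sums[sum_cdf <= 0.5].max()
print("largest sum with CDF <= 0.5:", largest)
```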

In [109]:
# You can use this cell for your calculations (not graded)



In [110]:
# Run this cell to submit your answer
utils.exercise_6()

IntSlider(value=2, continuous_update=False, description='Sum:', max=12, min=2)

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 7:

Given a 6-sided fair dice, you try a new game: you only throw the dice a second time if the result of the first throw is **lower** than or equal to 3. Which of the following `probability mass functions` is the one you should get given this new constraint?

<table><tr>
<td> <img src="./images/6_sided_cond_green.png" style="height: 250px;"/> </td>
<td> <img src="./images/6_sided_cond_blue.png" style="height: 250px;"/> </td>
<td> <img src="./images/6_sided_cond_red.png" style="height: 250px;"/> </td>
<td> <img src="./images/6_sided_cond_brown.png" style="height: 250px;"/> </td>

</tr></table>

Hints:
- You can simulate the second throws as a numpy array and then set the values that meet a certain criterion to 0 by using [np.where](https://numpy.org/doc/stable/reference/generated/numpy.where.html)
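
The hint can be sketched as follows. This assumes that the recorded value is the sum of the throws actually made, so a skipped second throw contributes 0; the exercise text does not spell this out. Names and sample size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=4)     # seeded only for repeatability
sides = np.arange(1, 7)
n_games = 200_000

first_throw = rng.choice(sides, size=n_games)
second_throw = rng.choice(sides, size=n_games)

# Keep the second throw only where the first one is <= 3; otherwise it counts as 0
second_throw = np.where(first_throw <= 3, second_throw, 0)
total = first_throw + second_throw

values, counts = np.unique(total, return_counts=True)
pmf = counts / n_games
for value, prob in zip(values, pmf):
    print(f"total={value}  P={prob:.3f}")
```

Changing the condition inside `np.where` (for example to `first_throw >= 3`) adapts the same sketch to other rules for when the second throw happens.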

In [111]:
# You can use this cell for your calculations (not graded)



In [112]:
# Run this cell to submit your answer
utils.exercise_7()

ToggleButtons(description='Your answer:', options=('left-most', 'left-center', 'right-center', 'right-most'), …

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 8:

Consider the same scenario as in the previous exercise, but with a twist: you only throw the dice a second time if the result of the first throw is **greater** than or equal to 3. Which of the following `probability mass functions` is the one you should get given this new constraint?

<table><tr>
<td> <img src="./images/6_sided_cond_green2.png" style="height: 250px;"/> </td>
<td> <img src="./images/6_sided_cond_blue2.png" style="height: 250px;"/> </td>
<td> <img src="./images/6_sided_cond_red2.png" style="height: 250px;"/> </td>
<td> <img src="./images/6_sided_cond_brown2.png" style="height: 250px;"/> </td>

</tr></table>


In [113]:
# You can use this cell for your calculations (not graded)



In [114]:
# Run this cell to submit your answer
utils.exercise_8()

ToggleButtons(description='Your answer:', options=('left-most', 'left-center', 'right-center', 'right-most'), …

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 9:

Given an n-sided fair dice, you throw it twice and record the sum. How does increasing the number of sides `n` of the dice impact the mean and variance of the sum and the covariance of the joint distribution?

In [115]:
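# You can use this cell for your calculations (not graded)
# Note: np.random.choice(n, ...) draws from 0..n-1 rather than 1..n, which shifts
# the mean of the sum by 2 but leaves the variance and the trend in n unchanged.
# Note: np.cov on a single 1-D array returns its sample variance (ddof=1), so the
# third value printed is not the covariance between the first and second throw.
t1 = np.random.choice(100, (10000, 2))  # 10000 pairs of throws, n = 100
sums_1 = np.sum(t1, axis=1)
print(np.mean(sums_1), np.var(sums_1), np.cov(sums_1))

t2 = np.random.choice(200, (10000, 2))  # n = 200
sums_2 = np.sum(t2, axis=1)
print(np.mean(sums_2), np.var(sums_2), np.cov(sums_2))

t3 = np.random.choice(300, (10000, 2))  # n = 300
sums_3 = np.sum(t3, axis=1)
print(np.mean(sums_3), np.var(sums_3), np.cov(sums_3))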

98.9088 1659.10988256 1659.275810141014
197.6631 6700.944198390001 6701.614359825982
299.4899 15153.06229799 15154.577755765577
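
In the cell above, `np.cov` receives a single 1-D array, so the third printed value is the sample variance of the sums rather than the covariance between the two throws. As a complementary sketch, the hypothetical helper `two_throw_stats` below computes all three quantities exactly from the joint distribution, assuming independent throws of a fair dice with faces 1..n.

```python
import numpy as np

def two_throw_stats(n):
    """Exact mean and variance of the sum, and covariance between the two
    throws, for two independent throws of a fair n-sided dice (faces 1..n)."""
    faces = np.arange(1, n + 1)
    p = np.full(n, 1 / n)
    joint = np.outer(p, p)                      # independence: P(x, y) = P(x) P(y)
    x, y = np.meshgrid(faces, faces, indexing="ij")

    mean_x = np.sum(x * joint)
    mean_y = np.sum(y * joint)
    cov_xy = np.sum((x - mean_x) * (y - mean_y) * joint)
    mean_sum = np.sum((x + y) * joint)
    var_sum = np.sum((x + y - mean_sum) ** 2 * joint)
    return mean_sum, var_sum, cov_xy

for n in (4, 6, 20, 100):
    mean_sum, var_sum, cov_xy = two_throw_stats(n)
    print(f"n={n:3d}  mean={mean_sum:.3f}  variance={var_sum:.3f}  covariance={cov_xy:.3f}")
```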


In [116]:
# Run this cell to submit your answer
utils.exercise_9()

As the number of sides in the die increases:


ToggleButtons(description='The mean of the sum:', options=('stays the same', 'increases', 'decreases'), value=…

ToggleButtons(description='The variance of the sum:', options=('stays the same', 'increases', 'decreases'), va…

ToggleButtons(description='The covariance of the joint distribution:', options=('stays the same', 'increases',…

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 10:

Given a 6-sided loaded dice, you throw it twice and record the sum. Which of the following statements is true?

In [92]:
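# You can use this cell for your calculations (not graded)

# Faces are drawn from 0..5 here, so index 2 (probability 0.286) plays the role of side 3
t = np.random.choice(6, (10000, 2), p=[0.143, 0.143, 0.286, 0.143, 0.143, 0.142])
sums = np.sum(t, axis=1)
# np.cov on a single 1-D array returns its sample variance, not a between-throw covariance
print(np.mean(sums), np.var(sums), np.cov(sums))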

4.8723 4.99239271 4.992891999199911
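
The answer options are truncated in this export, so the sketch below does not target any one of them. It only tabulates the exact mean and variance of the sum for each choice of loaded side, assuming the same kind of loading as in Exercise 6 (one side twice as likely as each of the others) and independent throws. The helper name `loaded_sum_stats` is hypothetical.

```python
import numpy as np

def loaded_sum_stats(loaded_side, n_sides=6):
    """Exact mean and variance of the sum of two independent throws of a dice
    whose `loaded_side` is twice as likely as each of the other sides."""
    faces = np.arange(1, n_sides + 1)
    weights = np.ones(n_sides)
    weights[loaded_side - 1] = 2
    p = weights / weights.sum()

    mean_one = np.sum(faces * p)
    var_one = np.sum(faces**2 * p) - mean_one**2
    # For independent throws, means and variances add
    return 2 * mean_one, 2 * var_one

for side in range(1, 7):
    mean_sum, var_sum = loaded_sum_stats(side)
    print(f"loaded side {side}: mean={mean_sum:.3f}, variance={var_sum:.3f}")
```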


In [93]:
# Run this cell to submit your answer
utils.exercise_10()

RadioButtons(layout=Layout(width='max-content'), options=('the mean and variance is the same regardless of whi…

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 11:

Given an n-sided dice (which could be fair or not), you throw it twice and record the sum (there is no dependence between the throws). If you are only given the histogram of the sums, can you use it to determine the probabilities of the dice landing on each side?

In other words, if you are provided with only the histogram of the sums like this one:
<img src="./images/hist_sum_6_side.png" style="height: 300px;"/>

Could you use it to determine the probabilities of the dice landing on each side? This would be equivalent to finding this histogram:
<img src="./images/fair_dice.png" style="height: 300px;"/>
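
One way to explore this question numerically is to try inverting the convolution. The sketch below is an illustration under stated assumptions, not the assignment's intended method: it assumes the exact PMF of the sum is available (a noisy histogram would only give approximate results), that both throws use the same dice, and that side 1 has non-zero probability. The helper name `side_probs_from_sum_pmf` is hypothetical.

```python
import numpy as np

def side_probs_from_sum_pmf(sum_pmf):
    """Recover single-throw probabilities p from the exact PMF of the sum of two
    independent throws of the same dice, i.e. from sum_pmf = np.convolve(p, p).
    Assumes side 1 has non-zero probability."""
    sum_pmf = np.asarray(sum_pmf, dtype=float)
    n_sides = (len(sum_pmf) + 1) // 2
    p = np.zeros(n_sides)
    p[0] = np.sqrt(sum_pmf[0])                   # P(sum=2) = p1 * p1
    for k in range(1, n_sides):
        # sum_pmf[k] = 2 * p[0] * p[k] + sum_{i=1}^{k-1} p[i] * p[k-i]
        known = np.dot(p[1:k], p[1:k][::-1])
        p[k] = (sum_pmf[k] - known) / (2 * p[0])
    return p

true_p = np.array([1, 1, 2, 1, 1, 1]) / 7        # loaded dice from Exercise 6
recovered = side_probs_from_sum_pmf(np.convolve(true_p, true_p))
print(np.round(recovered, 3))
```

Each step uses only values of the sum's PMF and probabilities already recovered, which is why the lowest sum is the natural starting point.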


In [94]:
# You can use this cell for your calculations (not graded)



In [97]:
# Run this cell to submit your answer
utils.exercise_11()

RadioButtons(layout=Layout(width='max-content'), options=('yes, but only if one of the sides is loaded', 'no, …

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Before Submitting Your Assignment

Run the next cell to check that you have answered all of the exercises.

In [117]:
utils.check_submissions()

All answers saved, you can submit the assignment for grading!


**Congratulations on finishing this assignment!**

During this assignment you tested your knowledge of probability distributions, descriptive statistics and the visual interpretation of these concepts. You had the choice to compute everything analytically or to create simulations to help you get the right answer. You probably also realized that some exercises could be answered without any computations, just by looking at certain hidden cues that the visualizations revealed.

**Keep up the good work!**
