# Probability 2: Loaded dice 

In this assignment you will reinforce your intuition about the concepts covered in the lectures by taking the dice example to the next level.

This assignment will not evaluate your coding skills but rather your intuition and analytical skills. You can answer any of the exercise questions by any means necessary: you can take the analytical route and compute the exact values, or you can write code that simulates the situations at hand and provides approximate values (grading has some tolerance to allow approximate solutions). It is up to you which route you want to take!

Note that every exercise has a blank cell that you can use for your calculations. This cell has been placed there for your convenience but **will not be graded**, so you can leave it empty if you want to.

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import utils

## Some concept clarifications 🎲🎲🎲

During this assignment you will be presented with various scenarios that involve dice. Usually dice can have different numbers of sides and can be either fair or loaded.

- A fair dice has equal probability of landing on every side.
- A loaded dice does not have equal probability of landing on every side. Usually one (or more) sides have a greater probability of showing up than the rest.

Let's get started!

## Exercise 1:



Given a 6-sided fair dice (all of the sides have equal probability of showing up), compute the mean and variance of the probability distribution that models said dice. The next figure shows a visual representation of said distribution:

<img src="./images/fair_dice.png" style="height: 300px;"/>

**Submission considerations:**
- Submit your answers as floating point numbers with three digits after the decimal point
- Example: To submit the value of 1/4 enter 0.250

Hints: 
- You can use [np.random.choice](https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html) to simulate a fair dice.
- You can use [np.mean](https://numpy.org/doc/stable/reference/generated/numpy.mean.html) and [np.var](https://numpy.org/doc/stable/reference/generated/numpy.var.html) to compute the mean and variance of a numpy array.

In [44]:
# You can use this cell for your calculations (not graded)

import numpy as np

# Simulate throwing the fair dice many times
throws = np.random.choice([1, 2, 3, 4, 5, 6], size=1000000)

# Compute the mean and variance
mean = np.mean(throws)
variance = np.var(throws)

# Print the results
print(f"Mean: {mean:.3f}")
print(f"Variance: {variance:.3f}")

Mean: 3.503
Variance: 2.918
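The simulated values above can be cross-checked against the exact ones. The following is a minimal sketch that applies the definitions $E[X] = \sum_x x\,P(x)$ and $Var(X) = E[X^2] - E[X]^2$ directly; the variable names are chosen here for illustration.

```python
import numpy as np

# Faces of a fair 6-sided dice and their (equal) probabilities
faces = np.arange(1, 7)
probs = np.full(6, 1 / 6)

# E[X] = sum of x * P(x); Var(X) = E[X^2] - E[X]^2
exact_mean = np.sum(faces * probs)
exact_variance = np.sum(faces**2 * probs) - exact_mean**2

print(f"Exact mean: {exact_mean:.3f}")
print(f"Exact variance: {exact_variance:.3f}")
```

The simulation should approach these values as the number of throws grows.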


In [57]:
# Run this cell to submit your answer
utils.exercise_1()

FloatText(value=0.0, description='Mean:')

FloatText(value=0.0, description='Variance:')

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 2:

Now suppose you throw the dice (the same dice as in the previous exercise) two times and record the sum of the two throws. Which of the following `probability mass functions` should you get?

<table><tr>
<td> <img src="./images/hist_sum_6_side.png" style="height: 300px;"/> </td>
<td> <img src="./images/hist_sum_5_side.png" style="height: 300px;"/> </td>
<td> <img src="./images/hist_sum_6_uf.png" style="height: 300px;"/> </td>
</tr></table>


Hints: 
- You can use numpy arrays to hold the results of many throws.
- You can sum two numpy arrays by using the `+` operator like this: `sum = first_throw + second_throw`
- To simulate multiple throws of a dice you can use list comprehension or a for loop

In [42]:
# You can use this cell for your calculations (not graded)

import numpy as np

# Simulate throwing the dice two times
num_throws = 100000
first_throw = np.random.choice([1, 2, 3, 4, 5, 6], size=num_throws)
second_throw = np.random.choice([1, 2, 3, 4, 5, 6], size=num_throws)

# Calculate the sums
sums = first_throw + second_throw

# Calculate the frequencies of each sum
pmf = np.bincount(sums, minlength=13) / num_throws

# Print the PMF
for i, probability in enumerate(pmf):
    print(f"Sum {i}: {probability:.3f}")

Sum 0: 0.000
Sum 1: 0.000
Sum 2: 0.027
Sum 3: 0.057
Sum 4: 0.084
Sum 5: 0.111
Sum 6: 0.139
Sum 7: 0.166
Sum 8: 0.138
Sum 9: 0.111
Sum 10: 0.083
Sum 11: 0.056
Sum 12: 0.027
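As an alternative to simulation, the PMF of the sum of two independent throws is the convolution of the single-throw PMF with itself. A minimal sketch, assuming independent throws and using `np.convolve`:

```python
import numpy as np

# PMF of a single fair 6-sided dice (index 0 corresponds to side 1)
die = np.full(6, 1 / 6)

# Convolving the PMF with itself gives the PMF of the sum.
# The result has 11 entries: index 0 corresponds to a sum of 2.
pmf_sum = np.convolve(die, die)

for total, prob in zip(range(2, 13), pmf_sum):
    print(f"Sum {total}: {prob:.3f}")
```

The exact values form a symmetric triangle that peaks at a sum of 7, which is the shape the simulated frequencies above approximate.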


In [58]:
# Run this cell to submit your answer
utils.exercise_2()

ToggleButtons(description='Your answer:', options=('left', 'center', 'right'), value='left')

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 3:

Given a fair 4-sided dice, you throw it two times and record the sum. The figure on the left shows the probabilities of the dice landing on each side, and the figure on the right shows the histogram of the sum. Fill out the probabilities of each sum (notice that the distribution of the sum is symmetrical, so you only need to input 4 values in total):

<img src="./images/4_side_hists.png" style="height: 300px;"/>

**Submission considerations:**
- Submit your answers as floating point numbers with three digits after the decimal point
- Example: To submit the value of 1/4 enter 0.250

In [39]:
# You can use this cell for your calculations (not graded)

import numpy as np

# Define the probabilities of each side of the dice
probabilities = [1/4, 1/4, 1/4, 1/4]

# Calculate the probabilities of each sum
sum_probabilities = []
for i in range(2, 9):
    count = 0
    for j in range(1, 5):
        if i - j >= 1 and i - j <= 4:
            count += probabilities[j-1] * probabilities[i-j-1]
    sum_probabilities.append(count)

# Print the probabilities of each sum
for i, prob in enumerate(sum_probabilities):
    print(f"Probability of sum {i+2}: {prob:.3f}")

Probability of sum 2: 0.062
Probability of sum 3: 0.125
Probability of sum 4: 0.188
Probability of sum 5: 0.250
Probability of sum 6: 0.188
Probability of sum 7: 0.125
Probability of sum 8: 0.062


In [41]:
# You can use this cell for your calculations (not graded)

import numpy as np

# Define the probabilities of each side of the dice
# Experiment only: these values do not sum to 1, so the results below are not valid probabilities
probabilities = [1/4, 3/7, 2/3, 5]

# Calculate the probabilities of each sum
sum_probabilities = []
for i in range(2, 9):
    count = 0
    for j in range(1, 5):
        if i - j >= 1 and i - j <= 4:
            count += probabilities[j-1] * probabilities[i-j-1]
    sum_probabilities.append(count)

# Print the probabilities of each sum
for i, prob in enumerate(sum_probabilities):
    print(f"Probability of sum {i+2}: {prob:.3f}")

Probability of sum 2: 0.062
Probability of sum 3: 0.214
Probability of sum 4: 0.517
Probability of sum 5: 3.071
Probability of sum 6: 4.730
Probability of sum 7: 6.667
Probability of sum 8: 25.000


In [60]:
# Run this cell to submit your answer
utils.exercise_3()

FloatText(value=0.0, description='P for sum=2|8', style=DescriptionStyle(description_width='initial'))

FloatText(value=0.0, description='P for sum=3|7:', style=DescriptionStyle(description_width='initial'))

FloatText(value=0.0, description='P for sum=4|6:', style=DescriptionStyle(description_width='initial'))

FloatText(value=0.0, description='P for sum=5:', style=DescriptionStyle(description_width='initial'))

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 4:

Using the same scenario as in the previous exercise, compute the mean and variance of the sum of the two throws and the covariance between the first and the second throw:

<img src="./images/4_sided_hist_no_prob.png" style="height: 300px;"/>


Hints:
- You can use [np.cov](https://numpy.org/doc/stable/reference/generated/numpy.cov.html) to compute the covariance of two numpy arrays (this may not be needed for this particular exercise).
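These quantities can also be derived by hand. The following worked derivation writes $X_1$ and $X_2$ for the two throws and $S = X_1 + X_2$ for their sum (notation introduced here for illustration) and assumes the two throws are independent:

$$
\begin{aligned}
E[X_1] &= \tfrac{1}{4}(1 + 2 + 3 + 4) = 2.5, \\
\operatorname{Var}(X_1) &= \tfrac{1}{4}(1^2 + 2^2 + 3^2 + 4^2) - 2.5^2 = 7.5 - 6.25 = 1.25, \\
E[S] &= E[X_1] + E[X_2] = 5, \\
\operatorname{Cov}(X_1, X_2) &= 0 \quad \text{(independent throws)}, \\
\operatorname{Var}(S) &= \operatorname{Var}(X_1) + \operatorname{Var}(X_2) + 2\operatorname{Cov}(X_1, X_2) = 2.5.
\end{aligned}
$$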

In [37]:
# You can use this cell for your calculations (not graded)

import numpy as np

# Define the probabilities of each side of the dice
probabilities = [1/4, 1/4, 1/4, 1/4]

# Create an array to represent the possible outcomes of a single throw
outcomes = np.arange(1, 5)

# Compute the sum of two throws and store the results in an array
sum_of_throws = np.add.outer(outcomes, outcomes).flatten()

# Compute the mean of the sum
mean_sum = np.mean(sum_of_throws)

# Compute the variance of the sum
variance_sum = np.var(sum_of_throws)

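# Compute the covariance between the first and second throw.
# Enumerate all 16 equally likely (first, second) pairs; np.cov(outcomes, outcomes)
# would instead return the variance of a single throw.
first_throw = np.repeat(outcomes, len(outcomes))
second_throw = np.tile(outcomes, len(outcomes))
covariance = np.cov(first_throw, second_throw, bias=True)[0, 1]

# Print the mean, variance, and covariance
print(f"Mean of the sum: {mean_sum:.3f}")
print(f"Variance of the sum: {variance_sum:.3f}")
print(f"Covariance between the first and second throw: {covariance:.3f}")

Mean of the sum: 5.000
Variance of the sum: 2.500
Covariance between the first and second throw: 0.000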


In [59]:
# Run this cell to submit your answer
utils.exercise_4()

FloatText(value=0.0, description='Mean:')

FloatText(value=0.0, description='Variance:')

FloatText(value=0.0, description='Covariance:')

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 5:


Now suppose you have a loaded 4-sided dice (it is loaded so that it lands twice as often on side 2 compared to the other sides):


<img src="./images/4_side_uf.png" style="height: 300px;"/>

You throw it two times and record the sum of the two throws. Which of the following `probability mass functions` should you get?

<table><tr>
<td> <img src="./images/hist_sum_4_4l.png" style="height: 300px;"/> </td>
<td> <img src="./images/hist_sum_4_3l.png" style="height: 300px;"/> </td>
<td> <img src="./images/hist_sum_4_uf.png" style="height: 300px;"/> </td>
</tr></table>

Hints: 
- You can use the `p` parameter of [np.random.choice](https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html) to simulate a loaded dice.

In [35]:
# You can use this cell for your calculations (not graded)

import numpy as np

# Define the adjusted probabilities of each side of the loaded dice
probabilities = [1, 2, 1, 1]

# Normalize the probabilities
probabilities = np.array(probabilities) / np.sum(probabilities)

# Create an array to represent the possible outcomes of a single throw
outcomes = np.arange(1, 5)

# Simulate throwing the loaded dice two times and record the sum of each throw
throws = np.random.choice(outcomes, size=(10000, 2), p=probabilities)
sum_of_throws = np.sum(throws, axis=1)

# Calculate the probability mass function (PMF) for the sum of two throws
pmf, _ = np.histogram(sum_of_throws, bins=np.arange(2, 10), density=True)

# Print the calculated PMF
for i, value in enumerate(pmf):
    print(f"PMF of sum {i+2}: {value:.3f}")

PMF of sum 2: 0.041
PMF of sum 3: 0.161
PMF of sum 4: 0.235
PMF of sum 5: 0.242
PMF of sum 6: 0.196
PMF of sum 7: 0.083
PMF of sum 8: 0.043
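The simulated PMF above is noisy, so it can help to compare it with exact values. A minimal sketch, assuming independent throws, that enumerates every pair of outcomes and uses `fractions.Fraction` to keep the probabilities as exact rationals (the names below are illustrative):

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Relative weights of each side: side 2 is twice as likely as the others
weights = {1: 1, 2: 2, 3: 1, 4: 1}
total_weight = sum(weights.values())
probs = {face: Fraction(w, total_weight) for face, w in weights.items()}

# Accumulate P(first) * P(second) for every pair of outcomes, keyed by their sum
pmf_sum = defaultdict(Fraction)
for (a, p_a), (b, p_b) in product(probs.items(), repeat=2):
    pmf_sum[a + b] += p_a * p_b

for total in sorted(pmf_sum):
    print(f"P(sum = {total}) = {pmf_sum[total]} (about {float(pmf_sum[total]):.3f})")
```

Unlike the fair case, the exact distribution is not symmetric: the extra weight on side 2 shifts probability towards the lower sums.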


In [61]:
# Run this cell to submit your answer
utils.exercise_5()

ToggleButtons(description='Your answer:', options=('left', 'center', 'right'), value='left')

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 6:

You have a 6-sided dice that is loaded so that it lands twice as often on side 3 compared to the other sides:

<img src="./images/loaded_6_side.png" style="height: 300px;"/>

You record the sum of throwing it twice. What is the highest value (of the sum) that will yield a cumulative probability lower than or equal to 0.5?

<img src="./images/loaded_6_cdf.png" style="height: 300px;"/>

Hints:
- The probability of side 3 is equal to $\frac{2}{7}$
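Starting from the hint, the exact cumulative distribution of the sum can be computed without simulation. A minimal sketch, assuming independent throws: convolve the single-throw PMF with itself, take the cumulative sum, and keep the largest total whose cumulative probability does not exceed 0.5.

```python
import numpy as np

# Side 3 has weight 2, every other side has weight 1, so P(3) = 2/7
weights = np.array([1, 1, 2, 1, 1, 1])
probs = weights / weights.sum()

# Exact PMF and CDF of the sum of two throws (index 0 corresponds to a sum of 2)
pmf_sum = np.convolve(probs, probs)
cdf_sum = np.cumsum(pmf_sum)
totals = np.arange(2, 13)

# Largest total whose cumulative probability is still <= 0.5
highest = totals[cdf_sum <= 0.5].max()

for total, c in zip(totals, cdf_sum):
    print(f"P(sum <= {total}) = {c:.3f}")
print(f"Highest sum with cumulative probability <= 0.5: {highest}")
```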

In [33]:
# You can use this cell for your calculations (not graded)

import numpy as np

# Define the probabilities of each side of the loaded dice
probabilities = [1, 1, 2, 1, 1, 1]  # side 3 is twice as likely as each other side

# Normalize the probabilities
probabilities = np.array(probabilities) / np.sum(probabilities)

# Create an array to represent the possible outcomes of a single throw
outcomes = np.arange(1, 7)

# Simulate throwing the loaded dice two times and record the sum of each throw
throws = np.random.choice(outcomes, size=(10000, 2), p=probabilities)
sum_of_throws = np.sum(throws, axis=1)

# Calculate the cumulative probability distribution for the sum of two throws
cumulative_prob = np.cumsum(np.histogram(sum_of_throws, bins=np.arange(2, 14), density=True)[0])

# Find the highest value (sum) with a cumulative probability lower than or equal to 0.5.
# np.argmax would return the first match, so select all matching sums and take the largest.
possible_sums = np.arange(2, 13)
highest_sum = possible_sums[cumulative_prob <= 0.5].max()

print(f"Highest sum with cumulative probability <= 0.5: {highest_sum}")

Highest sum with cumulative probability <= 0.5: 6


In [62]:
# Run this cell to submit your answer
utils.exercise_6()

IntSlider(value=2, continuous_update=False, description='Sum:', max=12, min=2)

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 7:

Given a 6-sided fair dice, you try a new game. You only throw the dice a second time if the result of the first throw is **lower** than or equal to 3. Which of the following `probability mass functions` should you get given this new constraint?

<table><tr>
<td> <img src="./images/6_sided_cond_green.png" style="height: 250px;"/> </td>
<td> <img src="./images/6_sided_cond_blue.png" style="height: 250px;"/> </td>
<td> <img src="./images/6_sided_cond_red.png" style="height: 250px;"/> </td>
<td> <img src="./images/6_sided_cond_brown.png" style="height: 250px;"/> </td>

</tr></table>

Hints:
- You can simulate the second throws as a numpy array and then set the values that meet a certain criterion to 0 by using [np.where](https://numpy.org/doc/stable/reference/generated/numpy.where.html)
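The PMF of the recorded total can also be computed exactly by enumerating the outcomes. The sketch below assumes, in line with the hint, that a second throw that does not happen contributes 0 to the total; `conditional_total_pmf` and its `rethrow` argument are hypothetical names introduced here.

```python
import numpy as np

def conditional_total_pmf(rethrow, n_sides=6):
    """Exact PMF of the total when a fair dice is thrown a second
    time only if rethrow(first_throw) is True."""
    pmf = np.zeros(2 * n_sides + 1)  # index = total
    p = 1 / n_sides
    for first in range(1, n_sides + 1):
        if rethrow(first):
            # Second throw happens: each (first, second) pair has probability p * p
            for second in range(1, n_sides + 1):
                pmf[first + second] += p * p
        else:
            # No second throw: the total is just the first throw
            pmf[first] += p
    return pmf

pmf = conditional_total_pmf(lambda first: first <= 3)
for total, prob in enumerate(pmf):
    if prob > 0:
        print(f"P(total = {total}) = {prob:.3f}")
```

Passing a different condition, such as `lambda first: first >= 3`, gives the PMF for other versions of the game.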

In [31]:
# You can use this cell for your calculations (not graded)

import numpy as np

# Define the probabilities of each side of the fair dice
probabilities = [1, 1, 1, 1, 1, 1]

# Normalize the probabilities
probabilities = np.array(probabilities) / np.sum(probabilities)

# Create an array to represent the possible outcomes of a single throw
outcomes = np.arange(1, 7)

# Simulate throwing the dice twice
first_throw = np.random.choice(outcomes, size=10000, p=probabilities)
second_throw = np.where(first_throw <= 3, np.random.choice(outcomes, size=10000, p=probabilities), 0)

# Calculate the PMF for the second throw given the constraint
pmf = np.histogram(second_throw, bins=np.arange(1, 8), density=True)[0]

print(f"Probability mass function (PMF) for the second throw: {pmf}")

Probability mass function (PMF) for the second throw: [0.17508553 0.17850674 0.16140068 0.16260817 0.16119944 0.16119944]


In [63]:
# Run this cell to submit your answer
utils.exercise_7()

ToggleButtons(description='Your answer:', options=('left-most', 'left-center', 'right-center', 'right-most'), …

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 8:

Given the same scenario as in the previous exercise, but with the twist that you only throw the dice a second time if the result of the first throw is **greater** than or equal to 3. Which of the following `probability mass functions` should you get given this new constraint?

<table><tr>
<td> <img src="./images/6_sided_cond_green2.png" style="height: 250px;"/> </td>
<td> <img src="./images/6_sided_cond_blue2.png" style="height: 250px;"/> </td>
<td> <img src="./images/6_sided_cond_red2.png" style="height: 250px;"/> </td>
<td> <img src="./images/6_sided_cond_brown2.png" style="height: 250px;"/> </td>

</tr></table>


In [29]:
# You can use this cell for your calculations (not graded)

import numpy as np

# Define the probabilities of each side of the fair dice
probabilities = [1, 1, 1, 1, 1, 1]

# Normalize the probabilities
probabilities = np.array(probabilities) / np.sum(probabilities)

# Create an array to represent the possible outcomes of a single throw
outcomes = np.arange(1, 7)

# Simulate throwing the dice twice
first_throw = np.random.choice(outcomes, size=10000, p=probabilities)
second_throw = np.where(first_throw >= 3, np.random.choice(outcomes, size=10000, p=probabilities), 0)

# Calculate the PMF for the second throw given the constraint
pmf = np.histogram(second_throw, bins=np.arange(1, 8), density=True)[0]

print(f"Probability mass function (PMF) for the second throw: {pmf}")

Probability mass function (PMF) for the second throw: [0.16769532 0.16844799 0.16859852 0.17070601 0.16137287 0.16317929]
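The cell above looks only at the second throw, whose values are uniform whenever it happens. To compare against the candidate figures it may be more useful to look at the recorded total. A minimal simulation sketch, assuming a skipped second throw counts as 0 and using NumPy's `default_rng` with an arbitrarily chosen seed:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for reproducibility
num_games = 100_000

first_throw = rng.integers(1, 7, size=num_games)
second_throw = rng.integers(1, 7, size=num_games)

# The second throw only counts when the first throw is >= 3; otherwise it adds 0
totals = first_throw + np.where(first_throw >= 3, second_throw, 0)

# Empirical PMF of the total (index = total, possible totals go up to 12)
pmf_total = np.bincount(totals, minlength=13) / num_games

for total in range(1, 13):
    print(f"Total {total}: {pmf_total[total]:.3f}")
```

Under this rule a total of 3 cannot occur: a first throw of 1 or 2 ends the game, and a first throw of 3 or more always adds at least 1.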


In [64]:
# Run this cell to submit your answer
utils.exercise_8()

ToggleButtons(description='Your answer:', options=('left-most', 'left-center', 'right-center', 'right-most'), …

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 9:

Given an n-sided fair dice, you throw it twice and record the sum. How does increasing the number of sides `n` of the dice impact the mean and variance of the sum and the covariance of the joint distribution?

In [27]:
# You can use this cell for your calculations (not graded)

import numpy as np

def throw_dice(n, num_throws):
    throws = np.random.randint(1, n+1, size=(num_throws, 2))
    sum_of_throws = np.sum(throws, axis=1)
    return throws, sum_of_throws

# Parameters
num_sides = [4, 6, 8, 10]  # Different number of sides for the dice
num_throws = 10000

# Compute mean, variance, and covariance for each number of sides
for sides in num_sides:
    throws, sum_of_throws = throw_dice(sides, num_throws)
    mean = np.mean(sum_of_throws)
    variance = np.var(sum_of_throws)
    covariance = np.cov(throws.T)[0, 1]

    print(f"Number of Sides: {sides}")
    print(f"Mean of the Sum: {mean:.3f}")
    print(f"Variance of the Sum: {variance:.3f}")
    print(f"Covariance of the Joint Distribution: {covariance:.3f}")
    print()

Number of Sides: 4
Mean of the Sum: 4.964
Variance of the Sum: 2.429
Covariance of the Joint Distribution: -0.019

Number of Sides: 6
Mean of the Sum: 6.966
Variance of the Sum: 5.879
Covariance of the Joint Distribution: 0.007

Number of Sides: 8
Mean of the Sum: 9.033
Variance of the Sum: 10.443
Covariance of the Joint Distribution: 0.006

Number of Sides: 10
Mean of the Sum: 11.033
Variance of the Sum: 16.522
Covariance of the Joint Distribution: -0.017
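The simulated numbers above can be compared with closed-form values. For a fair n-sided dice the single-throw mean is $(n+1)/2$ and the variance is $(n^2-1)/12$, so for two independent throws the sum has mean $n+1$ and variance $(n^2-1)/6$, while the covariance between the throws is 0. A minimal sketch (`exact_sum_stats` is a name introduced here) that evaluates these formulas and checks them against a direct enumeration:

```python
import numpy as np

def exact_sum_stats(n):
    """Closed-form mean and variance of the sum of two independent fair n-sided throws."""
    return n + 1, (n**2 - 1) / 6

for n in [4, 6, 8, 10]:
    mean_sum, var_sum = exact_sum_stats(n)

    # Cross-check by enumerating all n * n equally likely pairs of outcomes
    faces = np.arange(1, n + 1)
    all_sums = np.add.outer(faces, faces).ravel()
    assert np.isclose(all_sums.mean(), mean_sum)
    assert np.isclose(all_sums.var(), var_sum)

    print(f"n = {n}: mean of sum = {mean_sum}, variance of sum = {var_sum:.3f}, covariance = 0")
```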



In [65]:
# Run this cell to submit your answer
utils.exercise_9()

As the number of sides in the die increases:


ToggleButtons(description='The mean of the sum:', options=('stays the same', 'increases', 'decreases'), value=…

ToggleButtons(description='The variance of the sum:', options=('stays the same', 'increases', 'decreases'), va…

ToggleButtons(description='The covariance of the joint distribution:', options=('stays the same', 'increases',…

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 10:

Given a 6-sided loaded dice, you throw it twice and record the sum. Which of the following statements is true?
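One way to explore this question is to compute the mean and variance of the sum for each possible loaded side. The sketch below assumes, as in Exercise 6, that exactly one side is twice as likely as each of the others and that the throws are independent; other kinds of loading would give different numbers.

```python
import numpy as np

faces = np.arange(1, 7)
results = {}

for loaded_side in range(1, 7):
    # Assumption: the loaded side has weight 2, every other side has weight 1
    weights = np.ones(6)
    weights[loaded_side - 1] = 2
    probs = weights / weights.sum()

    # Single-throw mean and variance
    mean_one = np.sum(faces * probs)
    var_one = np.sum(faces**2 * probs) - mean_one**2

    # For two independent throws, the means add and the variances add
    results[loaded_side] = (2 * mean_one, 2 * var_one)

for side, (mean_sum, var_sum) in results.items():
    print(f"Loaded side {side}: mean of sum = {mean_sum:.3f}, variance of sum = {var_sum:.3f}")
```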

In [66]:
# Run this cell to submit your answer
utils.exercise_10()

RadioButtons(layout=Layout(width='max-content'), options=('the mean and variance is the same regardless of whi…

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 11:

Given a fair n-sided dice, you throw it twice and record the sum, but the second throw depends on the result of the first one, as in exercises 7 and 8. Which of the following statements is true?

In [67]:
# Run this cell to submit your answer
utils.exercise_11()

RadioButtons(layout=Layout(width='max-content'), options=('changing the direction of the inequality will chang…

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Exercise 12:

Given an n-sided dice (which could be fair or not), you throw it twice and record the sum (there is no dependence between the throws). If you are only given the histogram of the sums, can you use it to determine the probabilities of the dice landing on each side?
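One way to explore this question numerically: with independent throws of the same dice, the PMF of the sum is the single-throw PMF convolved with itself, and that relation can be unwound one side at a time. The sketch below is an illustration under stated assumptions, not a general-purpose method: it assumes the exact PMF of the sum is available (a histogram from a finite sample would add noise) and that side 1 has non-zero probability. `recover_die` is a hypothetical helper name.

```python
import numpy as np

def recover_die(pmf_sum):
    """Recover single-throw probabilities from the exact PMF of the sum of
    two independent throws (index 0 of pmf_sum corresponds to a sum of 2)."""
    n = (len(pmf_sum) + 1) // 2
    p = np.zeros(n)
    # P(sum = 2) = p[0] ** 2
    p[0] = np.sqrt(pmf_sum[0])
    for k in range(1, n):
        # pmf_sum[k] = 2 * p[0] * p[k] + sum of p[i] * p[k - i] for 0 < i < k
        cross = sum(p[i] * p[k - i] for i in range(1, k))
        p[k] = (pmf_sum[k] - cross) / (2 * p[0])
    return p

# Example: the loaded 4-sided dice from Exercise 5
true_probs = np.array([1, 2, 1, 1]) / 5
pmf_sum = np.convolve(true_probs, true_probs)

recovered = recover_die(pmf_sum)
print(np.round(recovered, 3))
```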

In [None]:
# You can use this cell for your calculations (not graded)



In [68]:
# Run this cell to submit your answer
utils.exercise_12()

RadioButtons(layout=Layout(width='max-content'), options=('yes, but only if one of the sides is loaded', 'no, …

Button(button_style='success', description='Save your answer!', style=ButtonStyle())

Output()

## Before Submitting Your Assignment

Run the next cell to check that you have answered all of the exercises

In [69]:
utils.check_submissions()

All answers saved, you can submit the assignment for grading!


**Congratulations on finishing this assignment!**

During this assignment you tested your knowledge of probability distributions, descriptive statistics and the visual interpretation of these concepts. You had the choice to compute everything analytically or to create simulations to help you get the right answer. You probably also realized that some exercises could be answered without any computations, just by looking at certain hidden cues that the visualizations revealed.

**Keep up the good work!**
