# lab_simulation : Into the Matrix

As you learned in lecture, **simulation** is an extremely powerful tool to estimate the probability of an event occurring by simulating many observations an event and determining the successful observations.

This lab will have you build increasingly interesting simulations and find the results.

## Simulation 1: Pre-Quiz Dice Rolls

The pre-quiz question that was asked by Karle and Wade in lecture was as follows:

> You roll two different fair six-sided dice at the same time.  One die is colored blue, one is colored red.  What is the probability that the blue die lands on 4 or the red die lands on 2?

Simulate the above problem 1,000 times and storing your observations of the value of the red die and blue die into `df1`.

In [1]:
# Step 0: Import any libraries you need:
import pandas as pd
import random as rd



In [2]:
# Step 1: Always start with an empty list to store our simulation data:
data = []

# Step 2: Write the simulation inside of a for-loop
for i in range(1000):
    blue = rd.randint(1,6)
    red = rd.randint(1,6)
    d = {"blue":blue, "red":red}
    data.append(d)
    
# Step 3: Store the simulation data into a DataFrame
df1 = pd.DataFrame(data)



# ...and show a few random rows:
df1.sample(5)

Unnamed: 0,blue,red
818,1,5
842,3,6
141,6,4
267,1,4
280,2,4


### Puzzle 1.1: Probability Calculations

Find our estimation of the probability that the blue die lands on 4 or the red die lands on 2?

- To do this, create a `df1_success` DataFrame with only the rows that were successful.
- Use `df1` and `df1_success` to find the probability of success using your simulation and store that value in `P_puzzle1` below.

In [3]:
# Create a DataFrame that contains only the subset of observations that were successful:
df1_success = df1[(df1["blue"]==4) | (df1["red"]==2)]



# ...and show a few random rows:
df1_success.sample(5)

Unnamed: 0,blue,red
332,4,5
533,4,6
67,4,6
914,1,2
158,2,2


In [4]:
# Find the value of P_puzzle1, the probability of success:
P_puzzle1 = len(df1_success)/len(df1)
P_puzzle1



0.337

### Puzzle 1.2: Finding the Exact Answer

This simulation simulated a pretty easy example that you can find an exact answer to!  Using the probability learned from lecture, calculate `P_puzzle1_exact`, the **exact** probability of the blue die landing on a 4 **or** the red die landing on a 2.

In [9]:
P_red = 1/6
P_blue = 1/6
P_both = 1/36
P_puzzle1_exact = P_red + P_blue - P_both
P_puzzle1_exact



0.3055555555555555

### Puzzle 1.3: Finding the Error

The **error** in a simulation is the difference between the exact value and value found from the simulation.  Subtract the estimated value (`P_puzzle1`) from the exact value (`P_puzzle1_exact`) to find the total error and store it in `puzzle1_error`.

In [10]:
puzzle1_error = P_puzzle1_exact - P_puzzle1
puzzle1_error



-0.0314444444444445

In [11]:
## == TEST CASES for Simulation 1 ==
# - This read-only cell contains test cases for your previous cell.
# - If this cell runs without any error our output, you PASSED all test cases!
# - If this cell results in any errors, check you previous cell, make changes, and RE-RUN your code and then this cell.

assert(len(df1) == 1000), "Make sure your df1 has exactly 1,000 observations"
assert(len(df1_success) < 1000), "Make sure your df1_success only has successes"
assert(P_puzzle1 > 0 and P_puzzle1 < 1), "Make sure your P_puzzle1 is a probability of success"
assert(round(P_puzzle1_exact, 3) == 0.306), "Make sure your P_puzzle1_exact contains the exact probability of success"
assert(puzzle1_error < 1), "Make sure to calculate the error by subtraction"

## == SUCCESS MESSAGE ==
# You will only see this message (with the emoji showing) if you passed all test cases:
tada = "\N{PARTY POPPER}"
print(f"{tada} All tests passed! {tada}")
print()
print(f"Simulated Probability: P(blue == 4 | red == 2) = {round(100 * P_puzzle1, 2)}%")
print(f"    Exact Probability:                         = {round(100 * P_puzzle1_exact, 2)}%")

🎉 All tests passed! 🎉

Simulated Probability: P(blue == 4 | red == 2) = 33.7%
    Exact Probability:                         = 30.56%


## Simulation 2: Rolling Three Die

Let's add another die into the mix.  Suppose we roll three dice: a **white**, a **red**, and a **blue** die.

Write a 1,000-run simulation of that event and store the observations in `df2`:

In [12]:
# Step 1: Always start with an empty list to store our simulation data:
data = []

# Step 2: Write the simulation inside of a for-loop
for i in range(1000):
    blue = rd.randint(1,6)
    red = rd.randint(1,6)
    white = rd.randint(1,6)
    d = {"blue":blue, "red":red, "white":white}
    data.append(d)
    
# Step 3: Store the simulation data into a DataFrame
df2 = pd.DataFrame(data)



# ...and show a few random rows:
df2.sample(5)

Unnamed: 0,blue,red,white
810,3,5,1
973,4,1,4
656,3,3,2
525,6,6,3
880,5,5,2


### Puzzle 2.1: Probability Calculations

Find our estimation of the probability that the **sum of all three die** is equal to exactly 9.

- To do this, create a `df2_success` DataFrame with only the rows that were successful.
- Store our estimation of the probability of success in `P_puzzle2` below.

In [13]:
# Create a DataFrame that contains only the subset of observations that were successful:
df2["nine"] = df2["blue"]+df2["red"]+df2["white"]
df2_success = df2[df2["nine"]==9]



# ...and show a few random rows:
df2_success.sample(5)

Unnamed: 0,blue,red,white,nine
840,3,5,1,9
310,2,5,2,9
784,5,3,1,9
480,1,4,4,9
365,4,4,1,9


In [14]:
# Find the value of P_puzzle2, the probability of success:
P_puzzle2 = len(df2_success)/len(df2)
P_puzzle2



0.121

In [15]:
## == TEST CASES for Simulation 2 ==
# - This read-only cell contains test cases for your previous cell.
# - If this cell runs without any error our output, you PASSED all test cases!
# - If this cell results in any errors, check you previous cell, make changes, and RE-RUN your code and then this cell.

assert(len(df2) == 1000), "Make sure your df2 has exactly 1,000 observations"
assert(len(df2_success) < 1000), "Make sure your df2_success only has successes"
assert(P_puzzle2 > 0 and P_puzzle2 < 1), "Make sure your P_puzzle2 is a probability of success"

## == SUCCESS MESSAGE ==
# You will only see this message (with the emoji showing) if you passed all test cases:
tada = "\N{PARTY POPPER}"
print(f"{tada} All tests passed! {tada}")

🎉 All tests passed! 🎉


## Simulation 3: Fliping Four Coins

Supoose we flip **four coins**, one coin at a time, one after another.  Each coin has two sides, "Heads" and "Tails".

Write a simulation of that event and store the observations in `df3` and run the simulation **50,000** times:

In [16]:
# Refer to the previous simulations if needed, but write the code yourself (don't just copy/paste and edit it)L
data = []

for i in range(50000):
    one = rd.randint(1,2)
    two = rd.randint(1,2)
    three = rd.randint(1,2)
    four = rd.randint(1,2)
    d = {"one":one, "two":two, "three":three, "four":four}
    data.append(d)



df3 = pd.DataFrame(data)



# ...and show a few random rows:
df3.sample(5)

Unnamed: 0,four,one,three,two
16578,1,2,1,1
30657,1,1,1,2
8492,2,2,2,2
15349,1,2,2,2
18090,1,1,1,1


### Puzzle 3.1: Probability Calculations

Find our estimation of the probability that your first two coin flips were both heads and your last two coin flips were both tails?

- To do this, create a `df3_success` DataFrame with only the rows that were successful.
- Store our estimation of the probability of success in `P_puzzle3` below.

In [17]:
# Create a DataFrame that contains only the subset of observations that were successful:
df3_success = df3[(df3["one"]==1)&(df3["two"]==1)&(df3["three"]==2)&(df3["four"]==2)]



# ...and show a few random rows:
df3_success.sample(5)

Unnamed: 0,four,one,three,two
31785,2,1,2,1
21009,2,1,2,1
4178,2,1,2,1
37259,2,1,2,1
30506,2,1,2,1


In [18]:
# Find the value of P_puzzle3, the probability of success:
P_puzzle3 = len(df3_success)/len(df3)
P_puzzle3



0.0637

In [19]:
## == TEST CASES for Simulation 3 ==
# - This read-only cell contains test cases for your previous cell.
# - If this cell runs without any error our output, you PASSED all test cases!
# - If this cell results in any errors, check you previous cell, make changes, and RE-RUN your code and then this cell.

assert(len(df3) == 50000), "Make sure your df3 has exactly 50,000 observations"
assert(len(df3_success) < 10000), "Make sure your df3_success only has successes"
assert(P_puzzle3 > 0.03 and P_puzzle3 < 0.125), "Make sure your P_puzzle3 is a probability of success"

## == SUCCESS MESSAGE ==
# You will only see this message (with the emoji showing) if you passed all test cases:
tada = "\N{PARTY POPPER}"
print(f"{tada} All tests passed! {tada}")

🎉 All tests passed! 🎉


## Simulation 4: A short exam

Suppose you take a short exam with **four quesitons**:

- Two multiple choice questions with five possible responses, AND
- Two true/false questions

Write a simulation of that randomly guess on each question and store the observations in `df4`.  Run the simulation **107,000** times:

In [23]:
data = []
for i in range(107000):
    score = 0
    mult1 = rd.randint(1,5)
    if(mult1 == 2):
        score += 1
    mult2 = rd.randint(1,5)
    if(mult2 == 5):
        score += 1
    tf1 = rd.randint(1,2)
    if(tf1 == 2):
        score += 1
    tf2 = rd.randint(1,2)
    if(tf2 == 1):
        score += 1
    d = {"mult1":mult1, "mult2":mult2, "tf1":tf1, "tf2":tf2, "score":score}
    data.append(d)




df4 = pd.DataFrame(data)



# ...and show a few random rows:
df4.sample(5)

Unnamed: 0,mult1,mult2,score,tf1,tf2
76338,1,2,2,2,1
104968,2,2,3,2,1
93712,5,3,0,1,2
50384,5,2,1,2,2
49715,4,3,1,1,1


### Puzzle 4.1: Probability Calculations

Suppose you have a solution for the exam (the solution itself can be anything, you just need to make sure each question only has one correct answer).  Find an estimation of the probability that a student, who randomly guesses on each question, **earned a 100%** on the exam.

- To do this, create a `df4_success` DataFrame with only the rows that were successful.
- Store our estimation of the probability of success in `P_puzzle4` below.

In [24]:
# Create a DataFrame that contains only the subset of observations that were successful:
#Solution: Mult1 = 2, Mult2 = 5, TF1 = 2, TF2 = 1
df4_success = df4[(df4["mult1"]==2)&(df4["mult2"]==5)&(df4["tf1"]==2)&(df4["tf2"]==1)]



# ...and show a few random rows:
df4_success.sample(5)

Unnamed: 0,mult1,mult2,score,tf1,tf2
93456,2,5,4,2,1
18150,2,5,4,2,1
27014,2,5,4,2,1
6152,2,5,4,2,1
84696,2,5,4,2,1


In [26]:
# Find the value of P_puzzle4, the probability of success:
P_puzzle4 = len(df4_success)/len(df4)
P_puzzle4



0.00994392523364486

### Puzzle 4.2: Probability Calculations

Suppose you have a solution for the exam (the solution itself can be anything, you just need to make sure each question only has one correct answer).  Find an estimation of the probability that a student, who randomly guesses on each question, **earned a passing grade** on the exam.  *(Each question is worth the same amount, so a passing grade means you got at least 3 of the four quesitons correct.)*

- To do this, create a `df4_passing` DataFrame with only the rows that were successful.
- Store our estimation of the probability of success in `P_puzzle4_passing` below.

In [27]:
# Create a DataFrame that contains only the subset of observations that were successful:
df4_passing = df4[df4["score"]>=3]



# ...and show a few random rows:
df4_passing.sample(5)

Unnamed: 0,mult1,mult2,score,tf1,tf2
103489,2,4,3,2,1
13096,2,5,3,1,1
10926,2,3,3,2,1
12665,2,5,4,2,1
2222,2,5,3,2,2


In [28]:
# Find the value of P_puzzle4_passing, the probability of success:
P_puzzle4_passing = len(df4_passing)/len(df4)
P_puzzle4_passing



0.11042990654205607

In [29]:
## == TEST CASES for Simulation 4 ==
# - This read-only cell contains test cases for your previous cell.
# - If this cell runs without any error our output, you PASSED all test cases!
# - If this cell results in any errors, check you previous cell, make changes, and RE-RUN your code and then this cell.

assert(len(df4) == 107000), "Make sure your df4 has exactly 107,000 observations"
assert(len(df4_success) < (107000 * 0.05)), "Make sure your df4_success only has students scoring 100%"
assert(len(df4_passing) < (107000 * 0.2)), "Make sure your df4_passing has all students passing"

assert(P_puzzle4 > 0 and P_puzzle4 < 0.05), "Make sure your P_puzzle4 is a probability of earning a 100%"
assert(P_puzzle4_passing > 0.05 and P_puzzle4_passing < 0.2), "Make sure your P_puzzle4_passing is a probability of earning a passing grade"

## == SUCCESS MESSAGE ==
# You will only see this message (with the emoji showing) if you passed all test cases:
tada = "\N{PARTY POPPER}"
print(f"{tada} All tests passed! {tada}")

🎉 All tests passed! 🎉


## Simulation 5: Marbles in a Bag

Suppose you have a bag of 12 marbles.  The bag contains:

- Three red marbles,
- Four blue marbles, and
- Five clear marbles

Write a simulation of that randomly draws a total of two marbles from the bag **with replacement** after drawing each one.  Run the simulation **50,000** times and store your observations in `df5`:

In [35]:
data=[]
for i in range(50000):
    marbles = ["red", "red", "red", "blue", "blue", "blue", "blue", "clear", "clear", "clear", "clear", "clear"]
    rando1 = rd.randint(0,11)
    rando2 = rd.randint(0,10)
    pick1 = marbles[rando1]
    marbles.remove(pick1)
    pick2 = marbles[rando2]
    d = {"Marble1":pick1, "Marble2":pick2}
    data.append(d)

df5 = pd.DataFrame(data)



# ...and show a few random rows:
df5.sample(5)

Unnamed: 0,Marble1,Marble2
27695,blue,clear
30350,blue,clear
16545,red,blue
45225,clear,red
26579,red,blue


### Puzzle 5.1: Probability Calculations

Find an estimation of the probability that you draw exactly one red marble and exactly one blue marble.

- To do this, create a `df5_success` DataFrame with only the rows that were successful.
- Store our estimation of the probability of success in `P_puzzle5` below.

In [36]:
# Create a DataFrame that contains only the subset of observations that were successful:
df5_success = df5[((df5["Marble1"]=="red")^(df5["Marble2"]=="red")) & ((df5["Marble1"]=="blue")^(df5["Marble2"]=="blue"))]



# ...and show a few random rows:
df5_success.sample(5)

Unnamed: 0,Marble1,Marble2
29496,red,blue
38925,red,blue
35555,blue,red
31162,red,blue
31861,blue,red


In [37]:
# Find the value of P_puzzle5, the probability of success:
P_puzzle5 = len(df5_success)/len(df5)
P_puzzle5



0.17836

In [38]:
## == TEST CASES for Simulation 5 ==
# - This read-only cell contains test cases for your previous cell.
# - If this cell runs without any error our output, you PASSED all test cases!
# - If this cell results in any errors, check you previous cell, make changes, and RE-RUN your code and then this cell.

assert(len(df5) == 50000), "Make sure your df5 has exactly 50,000 observations"
assert(len(df5_success) < (107000 * 0.2)), "Make sure your df5_success only has students scoring 100%"

assert(P_puzzle5 > 0.02 and P_puzzle5 < 0.2), "Make sure your P_puzzle5 is a probability of earning a 100%"

## == SUCCESS MESSAGE ==
# You will only see this message (with the emoji showing) if you passed all test cases:
tada = "\N{PARTY POPPER}"
print(f"{tada} All tests passed! {tada}")

🎉 All tests passed! 🎉


## Submit Your Work!

Make sure to **Save and Checkpoint** your notebook, exit Jupyter, and submit your work! :)