# lab_simulation : Into the Matrix

As you learned in lecture, **simulation** is an extremely powerful tool for estimating the probability of an event: you simulate many observations of the event and count how many of them were successful.

This lab will have you build increasingly interesting simulations and find the results.

## Simulation 1: Pre-Quiz Dice Rolls

The pre-quiz question that was asked by Karle and Wade in lecture was as follows:

> You roll two different fair six-sided dice at the same time.  One die is colored blue, one is colored red.  What is the probability that the blue die lands on 4 or the red die lands on 2?

Simulate the above problem 1,000 times and store your observations of the values of the red die and the blue die in `df1`.

In [10]:
# Step 0: Import any libraries you need:
import pandas as pd
import random


In [11]:
# Step 1: Always start with an empty list to store our simulation data:
data = []

# Step 2: Write the simulation inside of a for-loop
for i in range(1000):
    blue = random.randint(1,6)
    red = random.randint(1,6)
    d = { "blue": blue, "red": red}
    data.append(d)
    
# Step 3: Store the simulation data into a DataFrame
df1 = pd.DataFrame(data)



# ...and show a few random rows:
df1.sample(5)

Unnamed: 0,blue,red
399,6,6
803,5,6
729,4,5
469,2,6
494,6,6


### Puzzle 1.1: Probability Calculations

Find our estimate of the probability that the blue die lands on 4 or the red die lands on 2.

- To do this, create a `df1_success` DataFrame with only the rows that were successful.
- Use `df1` and `df1_success` to find the probability of success using your simulation and store that value in `P_puzzle1` below.

In [12]:
# Create a DataFrame that contains only the subset of observations that were successful:
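# A roll is a success when the blue die shows 4 OR the red die shows 2.
# Each comparison gets its own parentheses because | binds tighter than ==.
df1_success = df1[(df1["blue"] == 4) | (df1["red"] == 2)]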


# ...and show a few random rows:
df1_success.sample(5)

Unnamed: 0,blue,red
214,1,2
298,4,2
928,4,4
50,4,5
351,4,5


In [13]:
# Find the value of P_puzzle1, the probability of success:
P_puzzle1 = len(df1_success) / len(df1)
P_puzzle1



0.291

### Puzzle 1.2: Finding the Exact Answer

This simulation models a simple example that you can find an exact answer to!  Using the probability rules you learned in lecture, calculate `P_puzzle1_exact`, the **exact** probability of the blue die landing on a 4 **or** the red die landing on a 2.

In [23]:
P_puzzle1_exact = 11/36 #(6/36 + 6/36) - 1/36 
P_puzzle1_exact



0.3055555555555556
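
One way to double-check the addition rule used above is to list every possible roll and count the successes. The sketch below uses only the standard library; the names `outcomes`, `successes`, and `exact` are illustrative and not part of the lab.

```python
from fractions import Fraction
from itertools import product

# Every (blue, red) pair is equally likely, so list all 36 of them.
outcomes = list(product(range(1, 7), repeat=2))

# Keep the pairs where the blue die shows 4 or the red die shows 2.
successes = [(blue, red) for blue, red in outcomes if blue == 4 or red == 2]

exact = Fraction(len(successes), len(outcomes))
print(exact, float(exact))
```

The count of 11 successes matches the arithmetic 6 + 6 - 1: the roll (blue 4, red 2) appears in both groups of six and must only be counted once.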

### Puzzle 1.3: Finding the Error

The **error** in a simulation is the difference between the exact value and value found from the simulation.  Subtract the estimated value (`P_puzzle1`) from the exact value (`P_puzzle1_exact`) to find the total error and store it in `puzzle1_error`.

In [24]:
# Subtract the estimated value from the exact value, as described above:
puzzle1_error = P_puzzle1_exact - P_puzzle1
puzzle1_error



0.0145555555555556
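
To get a feel for how the error behaves, the sketch below repeats the dice simulation with different numbers of runs and reports the exact value minus the estimate each time. The function name `estimate_blue4_or_red2`, the seed, and the list of run counts are assumptions chosen for illustration. The error usually shrinks as the number of runs grows, though any single run can be lucky or unlucky.

```python
import random

def estimate_blue4_or_red2(runs, rng):
    """Estimate P(blue == 4 or red == 2) from `runs` simulated rolls."""
    successes = 0
    for _ in range(runs):
        blue = rng.randint(1, 6)
        red = rng.randint(1, 6)
        if blue == 4 or red == 2:
            successes += 1
    return successes / runs

rng = random.Random(107)  # arbitrary fixed seed so the sketch is repeatable
exact = 11 / 36

errors = {}
for runs in [100, 1_000, 10_000, 100_000]:
    # Error = exact value minus the simulated estimate.
    errors[runs] = exact - estimate_blue4_or_red2(runs, rng)
    print(f"{runs:>7} runs: error = {errors[runs]:+.4f}")
```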

In [25]:
## == TEST CASES for Simulation 1 ==
# - This read-only cell contains test cases for your previous cell.
# - If this cell runs without any error in the output, you PASSED all test cases!
# - If this cell results in any errors, check your previous cell, make changes, RE-RUN your code, and then re-run this cell.

assert(len(df1) == 1000), "Make sure your df1 has exactly 1,000 observations"
assert(len(df1_success) < 1000), "Make sure your df1_success only has successes"
assert(P_puzzle1 > 0 and P_puzzle1 < 1), "Make sure your P_puzzle1 is a probability of success"
assert(P_puzzle1_exact == (11/36)), "Make sure your P_puzzle1_exact contains the exact probability of success"
assert(puzzle1_error < 1), "Make sure to calculate the error by subtraction"

## == SUCCESS MESSAGE ==
# You will only see this message (with the emoji showing) if you passed all test cases:
tada = "\N{PARTY POPPER}"
print(f"{tada} All tests passed! {tada}")
print()
print(f"Simulated Probability: P(blue == 4 | red == 2) = {round(100 * P_puzzle1, 2)}%")
print(f"    Exact Probability:                         = {round(100 * P_puzzle1_exact, 2)}%")

🎉 All tests passed! 🎉

Simulated Probability: P(blue == 4 | red == 2) = 29.1%
    Exact Probability:                         = 30.56%


## Simulation 2: Rolling Three Dice

Let's add another die into the mix.  Suppose we roll three dice: a **white**, a **red**, and a **blue** die.

Write a 1,000-run simulation of that event and store the observations in `df2`:

In [26]:
# Step 1: Always start with an empty list to store our simulation data:
data2 = []

# Step 2: Write the simulation inside of a for-loop
for i in range(1000):
    blue = random.randint(1,6)
    red = random.randint(1,6)
    white = random.randint(1,6)
    d = { "blue": blue, "red": red, "white": white}
    data2.append(d)
    
# Step 3: Store the simulation data into a DataFrame
df2 = pd.DataFrame(data2)



# ...and show a few random rows:
df2.sample(5)

Unnamed: 0,blue,red,white
93,4,4,3
261,4,3,1
481,6,3,3
855,1,3,5
812,3,2,4


### Puzzle 2.1: Probability Calculations

Find our estimate of the probability that the **sum of all three dice** is equal to exactly 9.

- To do this, create a `df2_success` DataFrame with only the rows that were successful.
- Store our estimation of the probability of success in `P_puzzle2` below.

In [27]:
# Create a DataFrame that contains only the subset of observations that were successful:
df2_success = df2[df2["blue"] + df2["red"] + df2["white"] == 9]



# ...and show a few random rows:
df2_success.sample(5)

Unnamed: 0,blue,red,white
681,1,5,3
658,4,3,2
266,6,1,2
291,2,4,3
495,5,3,1


In [28]:
# Find the value of P_puzzle2, the probability of success:
P_puzzle2 = len(df2_success) / len(df2)
P_puzzle2



0.114
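
The lab does not ask for an exact answer here, but the estimate can be sanity-checked the same way as in Puzzle 1.2. This is a minimal sketch that enumerates all 216 equally likely rolls; the name `exact_sum9` is illustrative.

```python
from fractions import Fraction
from itertools import product

# All 6 * 6 * 6 = 216 equally likely (blue, red, white) rolls.
outcomes = list(product(range(1, 7), repeat=3))

# Keep the rolls whose three dice add up to exactly 9.
successes = [roll for roll in outcomes if sum(roll) == 9]

exact_sum9 = Fraction(len(successes), len(outcomes))
print(exact_sum9, round(float(exact_sum9), 4))
```

A 1,000-run estimate that lands within a few percentage points of this value is behaving as expected.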

In [29]:
## == TEST CASES for Simulation 2 ==
# - This read-only cell contains test cases for your previous cell.
# - If this cell runs without any error in the output, you PASSED all test cases!
# - If this cell results in any errors, check your previous cell, make changes, RE-RUN your code, and then re-run this cell.

assert(len(df2) == 1000), "Make sure your df2 has exactly 1,000 observations"
assert(len(df2_success) < 1000), "Make sure your df2_success only has successes"
assert(P_puzzle2 > 0 and P_puzzle2 < 1), "Make sure your P_puzzle2 is a probability of success"

## == SUCCESS MESSAGE ==
# You will only see this message (with the emoji showing) if you passed all test cases:
tada = "\N{PARTY POPPER}"
print(f"{tada} All tests passed! {tada}")

🎉 All tests passed! 🎉


## Simulation 3: Flipping Four Coins

Suppose we flip **four coins**, one coin at a time, one after another.  Each coin has two sides, "Heads" and "Tails".

Write a simulation of that event, run it **50,000** times, and store the observations in `df3`:

In [30]:
# Refer to the previous simulations if needed, but write the code yourself (don't just copy/paste and edit it)
data3 = []

for i in range(50000):
    coin1 = random.randint(0,1) #1 is heads
    coin2 = random.randint(0,1) 
    coin3 = random.randint(0,1) 
    coin4 = random.randint(0,1) 
    d = { "Coin 1": coin1, "Coin 2": coin2, "Coin 3": coin3, "Coin 4": coin4,}
    data3.append(d)

df3 = pd.DataFrame(data3)



# ...and show a few random rows:
df3.sample(5)

Unnamed: 0,Coin 1,Coin 2,Coin 3,Coin 4
37152,1,1,0,0
36810,1,1,0,0
41144,0,1,0,0
4348,1,1,0,0
10140,1,1,0,1


### Puzzle 3.1: Probability Calculations

Find our estimate of the probability that your first two coin flips were both heads and your last two coin flips were both tails.

- To do this, create a `df3_success` DataFrame with only the rows that were successful.
- Store our estimation of the probability of success in `P_puzzle3` below.

In [31]:
# Create a DataFrame that contains only the subset of observations that were successful:
df3_success = df3[ (df3["Coin 1"] == 1) & (df3["Coin 2"] == 1) & (df3["Coin 3"] == 0) & (df3["Coin 4"] == 0)]



# ...and show a few random rows:
df3_success.sample(5)

Unnamed: 0,Coin 1,Coin 2,Coin 3,Coin 4
46387,1,1,0,0
764,1,1,0,0
36875,1,1,0,0
1040,1,1,0,0
20215,1,1,0,0


In [32]:
# Find the value of P_puzzle3, the probability of success:
P_puzzle3 = len(df3_success) / len(df3)
P_puzzle3



0.06288
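
Because each flip is independent and fair, this estimate can be compared against an exact value. The sketch below enumerates the 16 equally likely sequences, using the same encoding as the simulation (1 for heads, 0 for tails); the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 2 * 2 * 2 * 2 = 16 equally likely sequences of four flips.
sequences = list(product([0, 1], repeat=4))

# Success: heads, heads, tails, tails (1 is heads, 0 is tails).
successes = [seq for seq in sequences if seq == (1, 1, 0, 0)]

exact_hhtt = Fraction(len(successes), len(sequences))
print(exact_hhtt, float(exact_hhtt))
```

Only one of the 16 sequences matches, which agrees with multiplying 1/2 four times.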

In [33]:
## == TEST CASES for Simulation 3 ==
# - This read-only cell contains test cases for your previous cell.
# - If this cell runs without any error in the output, you PASSED all test cases!
# - If this cell results in any errors, check your previous cell, make changes, RE-RUN your code, and then re-run this cell.

assert(len(df3) == 50000), "Make sure your df3 has exactly 50,000 observations"
assert(len(df3_success) < 10000), "Make sure your df3_success only has successes"
assert(P_puzzle3 > 0.03 and P_puzzle3 < 0.125), "Make sure your P_puzzle3 is a probability of success"

## == SUCCESS MESSAGE ==
# You will only see this message (with the emoji showing) if you passed all test cases:
tada = "\N{PARTY POPPER}"
print(f"{tada} All tests passed! {tada}")

🎉 All tests passed! 🎉


## Simulation 4: A short exam

Suppose you take a short exam with **four questions**:

- Two multiple choice questions with five possible responses, AND
- Two true/false questions

Write a simulation that randomly guesses on each question and store the observations in `df4`.  Run the simulation **107,000** times:

In [34]:
data4 = []

for i in range(107000):
    mc1 = random.randint(0,4) #1 is success
    mc2 = random.randint(0,4) 
    tf1 = random.randint(0,1) 
    tf2 = random.randint(0,1) 
    d = { "MC 1": mc1, "MC 2": mc2, "TF 1": tf1, "TF 2": tf2,}
    data4.append(d)




df4 = pd.DataFrame(data4)



# ...and show a few random rows:
df4.sample(5)

Unnamed: 0,MC 1,MC 2,TF 1,TF 2
79089,1,3,1,0
69781,2,2,1,0
55209,2,4,0,0
81899,1,0,1,0
12479,4,4,0,0


### Puzzle 4.1: Probability Calculations

Suppose you have a solution for the exam (the solution itself can be anything, you just need to make sure each question only has one correct answer).  Find an estimation of the probability that a student, who randomly guesses on each question, **earned a 100%** on the exam.

- To do this, create a `df4_success` DataFrame with only the rows that were successful.
- Store our estimation of the probability of success in `P_puzzle4` below.

In [35]:
# Create a DataFrame that contains only the subset of observations that were successful:
df4_success = df4[(df4["MC 1"] == 1) & (df4["MC 2"] == 1) & (df4["TF 1"] == 1) & (df4["TF 2"] == 1)]



# ...and show a few random rows:
df4_success.sample(5)

Unnamed: 0,MC 1,MC 2,TF 1,TF 2
19079,1,1,1,1
52531,1,1,1,1
61191,1,1,1,1
34217,1,1,1,1
79198,1,1,1,1


In [36]:
# Find the value of P_puzzle4, the probability of success:
P_puzzle4 = len(df4_success) / len(df4)
P_puzzle4



0.009757009345794392

### Puzzle 4.2: Probability Calculations

Suppose you have a solution for the exam (the solution itself can be anything, you just need to make sure each question only has one correct answer).  Find an estimation of the probability that a student, who randomly guesses on each question, **earned a passing grade** on the exam.  *(Each question is worth the same amount, so a passing grade means you got at least 3 of the four questions correct.)*

- To do this, create a `df4_passing` DataFrame with only the rows that were successful.
- Store our estimation of the probability of success in `P_puzzle4_passing` below.

In [37]:
# Create a DataFrame that contains only the subset of observations that were successful:
df4_passing = df4[((df4["MC 1"] == 1) & (df4["MC 2"] == 1) & (df4["TF 1"] == 1) & (df4["TF 2"] == 1)) |
                 ((df4["MC 1"] != 1) & (df4["MC 2"] == 1) & (df4["TF 1"] == 1) & (df4["TF 2"] == 1)) |
                 ((df4["MC 1"] == 1) & (df4["MC 2"] != 1) & (df4["TF 1"] == 1) & (df4["TF 2"] == 1)) |
                 ((df4["MC 1"] == 1) & (df4["MC 2"] == 1) & (df4["TF 1"] == 0) & (df4["TF 2"] == 1)) |
                 ((df4["MC 1"] == 1) & (df4["MC 2"] == 1) & (df4["TF 1"] == 1) & (df4["TF 2"] == 0))]  

# Cases above, in order: all 4 correct, then exactly 3 correct with only MC 1, MC 2, TF 1, or TF 2 wrong



# ...and show a few random rows:
df4_passing.sample(15)

Unnamed: 0,MC 1,MC 2,TF 1,TF 2
10504,3,1,1,1
103151,1,1,0,1
67469,1,3,1,1
30756,1,1,1,0
106580,4,1,1,1
79394,0,1,1,1
4098,1,4,1,1
59821,0,1,1,1
92927,1,2,1,1
99025,1,0,1,1


In [38]:
# Find the value of P_puzzle4_passing, the probability of success:
P_puzzle4_passing = len(df4_passing) / len(df4)
P_puzzle4_passing



0.11061682242990654
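
Both exam estimates can be checked against exact values by enumerating every way to guess. The sketch below assumes the same answer key as the filters above (response 1 is correct on every question); `answer_key` and `score` are hypothetical names introduced for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumed answer key, matching the filters above: response 1 is correct everywhere.
answer_key = (1, 1, 1, 1)

# 5 * 5 * 2 * 2 = 100 equally likely ways to guess (MC 1, MC 2, TF 1, TF 2).
guesses = list(product(range(5), range(5), range(2), range(2)))

def score(guess):
    """Number of questions where the guess matches the answer key."""
    return sum(g == k for g, k in zip(guess, answer_key))

exact_perfect = Fraction(sum(score(g) == 4 for g in guesses), len(guesses))
exact_passing = Fraction(sum(score(g) >= 3 for g in guesses), len(guesses))
print(exact_perfect, exact_passing)
```

Counting the number of correct answers per guess, rather than spelling out each passing pattern, keeps the condition short and scales to longer exams.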

In [39]:
## == TEST CASES for Simulation 4 ==
# - This read-only cell contains test cases for your previous cell.
# - If this cell runs without any error in the output, you PASSED all test cases!
# - If this cell results in any errors, check your previous cell, make changes, RE-RUN your code, and then re-run this cell.

assert(len(df4) == 107000), "Make sure your df4 has exactly 107,000 observations"
assert(len(df4_success) < (107000 * 0.05)), "Make sure your df4_success only has students scoring 100%"
assert(len(df4_passing) < (107000 * 0.2)), "Make sure your df4_passing has all students passing"

assert(P_puzzle4 > 0 and P_puzzle4 < 0.05), "Make sure your P_puzzle4 is a probability of earning a 100%"
assert(P_puzzle4_passing > 0.05 and P_puzzle4_passing < 0.2), "Make sure your P_puzzle4_passing is a probability of earning a passing grade"

## == SUCCESS MESSAGE ==
# You will only see this message (with the emoji showing) if you passed all test cases:
tada = "\N{PARTY POPPER}"
print(f"{tada} All tests passed! {tada}")

🎉 All tests passed! 🎉


## Simulation 5: Marbles in a Bag

Suppose you have a bag of 12 marbles.  The bag contains:

- Three red marbles,
- Four blue marbles, and
- Five clear marbles

Write a simulation that randomly draws a total of two marbles from the bag **with replacement** after drawing each one.  Run the simulation **50,000** times and store your observations in `df5`:

In [40]:
data5 = []

for i in range(50000):
    marble1 = random.randint(1,12) # 1-3 is red, 4-7 is blue, 8-12 is clear
    marble2 = random.randint(1,12) 
    d = { "Marble 1": marble1, "Marble 2": marble2}
    data5.append(d)



df5 = pd.DataFrame(data5)



# ...and show a few random rows:
df5.sample(5)

Unnamed: 0,Marble 1,Marble 2
739,5,5
10239,9,7
7565,5,11
15514,10,3
45169,3,11
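
The integer encoding used above (1-3 red, 4-7 blue, 8-12 clear) works, but the same draw can also be written with explicit color labels, which some readers find easier to check. This is an alternative sketch, not the lab's required approach; the names `bag` and `data_labeled` and the seed are assumptions.

```python
import random
from collections import Counter

# The bag from the problem statement: 3 red, 4 blue, 5 clear marbles.
bag = ["red"] * 3 + ["blue"] * 4 + ["clear"] * 5

rng = random.Random(12)  # arbitrary fixed seed so the sketch is repeatable
data_labeled = []
for _ in range(50_000):
    # Choosing from the full bag both times models drawing with replacement.
    marble1 = rng.choice(bag)
    marble2 = rng.choice(bag)
    data_labeled.append({"Marble 1": marble1, "Marble 2": marble2})

# Check that the first draw follows the bag's proportions (3/12, 4/12, 5/12).
first_draws = Counter(d["Marble 1"] for d in data_labeled)
for color in ["red", "blue", "clear"]:
    print(color, first_draws[color] / len(data_labeled))
```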


### Puzzle 5.1: Probability Calculations

Find an estimation of the probability that you draw exactly one red marble and exactly one blue marble.

- To do this, create a `df5_success` DataFrame with only the rows that were successful.
- Store our estimation of the probability of success in `P_puzzle5` below.

In [41]:
# Create a DataFrame that contains only the subset of observations that were successful:
df5_success = df5[ ((df5["Marble 1"] <= 3) & (df5["Marble 2"] >=4) & (df5["Marble 2"] <=7)) |
                   ((df5["Marble 2"] <= 3) & (df5["Marble 1"] >=4) & (df5["Marble 1"] <=7))]

# Case 1: Marble 1 is red (1-3) and Marble 2 is blue (4-7)
# Case 2: Marble 2 is red (1-3) and Marble 1 is blue (4-7)

# ...and show a few random rows:
df5_success.sample(5)

Unnamed: 0,Marble 1,Marble 2
23469,3,4
48215,5,2
22702,4,1
25309,5,2
38573,3,6


In [42]:
# Find the value of P_puzzle5, the probability of success:
P_puzzle5 = len(df5_success) / len(df5)
P_puzzle5

#bad copy paste on the test case comments, wade. 


0.1656
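
As with the earlier puzzles, the estimate can be compared with an exact value. The sketch below enumerates all 144 equally likely ordered pairs of draws with replacement; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

bag = ["red"] * 3 + ["blue"] * 4 + ["clear"] * 5

# With replacement, both draws come from the full bag: 12 * 12 = 144 ordered pairs.
outcomes = list(product(bag, repeat=2))

# Success: one red and one blue marble, in either order.
successes = [pair for pair in outcomes if sorted(pair) == ["blue", "red"]]

exact_red_blue = Fraction(len(successes), len(outcomes))
print(exact_red_blue, round(float(exact_red_blue), 4))
```

The 24 successes are the 3 * 4 red-then-blue pairs plus the 4 * 3 blue-then-red pairs, which is why the filter above needs both cases.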

In [43]:
## == TEST CASES for Simulation 5 ==
# - This read-only cell contains test cases for your previous cell.
# - If this cell runs without any error in the output, you PASSED all test cases!
# - If this cell results in any errors, check your previous cell, make changes, RE-RUN your code, and then re-run this cell.

assert(len(df5) == 50000), "Make sure your df5 has exactly 50,000 observations"
assert(len(df5_success) < (50000 * 0.2)), "Make sure your df5_success only has draws with exactly one red and one blue marble"

assert(P_puzzle5 > 0.02 and P_puzzle5 < 0.2), "Make sure your P_puzzle5 is a probability of drawing exactly one red and one blue marble"

## == SUCCESS MESSAGE ==
# You will only see this message (with the emoji showing) if you passed all test cases:
tada = "\N{PARTY POPPER}"
print(f"{tada} All tests passed! {tada}")

🎉 All tests passed! 🎉


## Submit Your Work!

Make sure to **Save and Checkpoint** your notebook, exit Jupyter, and submit your work! :)