In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab10.ipynb")

# Lab 10 - Randomness and Simulation

In [1]:
# Just run this cell to load in the relevant dependencies
from IPython.display import display, HTML
from datascience import *
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')

## Part 1: Randomness

As you've seen in lecture, computers give the illusion of randomness by using **pseudo-random** processes. We, as data scientists, can *control* the randomness of our computer by using a *seed*. 

Run the following cell **multiple times** to set a random seed value of 42 and then call the `np.random.randint` function. You'll quickly notice that the cell will always output the same integer every time.

In [2]:
np.random.seed(42)
np.random.randint(0, 100)
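A minimal sketch of what the seed does, assuming nothing beyond NumPy: setting the same seed again restarts the pseudo-random sequence, so the same draws come out in the same order. The names `first_run`, `second_run`, and `third_run` are illustrative and not part of the lab.

```python
import numpy as np

# Seed the generator, then draw five integers in [0, 100).
np.random.seed(42)
first_run = np.random.randint(0, 100, 5)

# Reseeding with the same value restarts the sequence, so the draws repeat.
np.random.seed(42)
second_run = np.random.randint(0, 100, 5)

# Without reseeding, the generator simply continues along the sequence.
third_run = np.random.randint(0, 100, 5)

print(np.array_equal(first_run, second_run))  # True
```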

**Question 1:** Play around with the seed value assigned to the `my_seed_val` variable. What happens to the resulting call to `randint` when you change the seed value? What stays the same?

In [3]:
my_seed_val = 500
np.random.seed(my_seed_val)
np.random.randint(0, 100)

<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q1
points: 0
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

### `np.random.randint`

The `randint` function takes in three arguments: a **start**, a **stop**, and an optional **size**. The start is included but the stop is excluded, so in the cells above we generated a single integer in the range $[0, 100)$, that is, from 0 to 99 inclusive. For more detail on the `randint` function, visit the Lecture 25 Slides.
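As a hedged illustration of the excluded stop value, here is a sketch that simulates die rolls. The dice framing and the name `rolls` are assumptions made for this example only.

```python
import numpy as np

# Simulate 1,000 rolls of a six-sided die.
# The stop value 7 is excluded, so the possible results are 1 through 6.
rolls = np.random.randint(1, 7, size=1000)

# Both checks print True: no roll falls outside 1..6.
print(rolls.min() >= 1, rolls.max() <= 6)
```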

**Question 2**: In the following cell, generate an array of 100 integers between -500 and 500, inclusive on **both ends**. Assign the resulting array to `hundo_ints`.

<!--
BEGIN QUESTION
name: q2
points: 0
manual: true
-->

In [4]:
# Seed set for testing purposes -- don't touch!
np.random.seed(42)

hundo_ints = np.random.randint(-500,501, 100)
hundo_ints

In [None]:
grader.check("q2")

<!-- END QUESTION -->

**Question 3 (Up or Down)**: In the world of tennis, it's commonplace to start a friendly match with the question: "*Up or down?*". One participant will spin their tennis racket on its head and let go -- whichever way the tennis racket is facing (i.e. up or down) determines who will serve first. The following is considered **up**.
<figure>
    <img src="images/up.jpeg" alt='missing' width=150 height=150/>
</figure>

**Task:** Using `np.random.choice`, create an array called `ups_and_downs` containing the results of 1,000 spins of the tennis racket, assuming that *up* and *down* are both equally likely outcomes.
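For reference, `np.random.choice` draws items from an array, each equally likely by default, and its second argument sets how many draws to make. A small sketch on an unrelated, hypothetical example (rock-paper-scissors, with illustrative names `moves` and `rounds`):

```python
import numpy as np

# Hypothetical example: 10 rounds where each move is equally likely.
moves = np.array(["rock", "paper", "scissors"])
rounds = np.random.choice(moves, 10)

print(rounds)
```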

<!--
BEGIN QUESTION
name: q3
points: 0
-->

In [9]:
# Seed set for testing purposes -- don't touch!
np.random.seed(42)

choices = ...
ups_and_downs = ...
ups_and_downs

In [None]:
grader.check("q3")

## Part 2: The Monty Hall Problem
<br>

<img src="images/monty_hall.png" width=500 height=400>

You may already be familiar with the classic conundrum of the *Monty Hall Problem*, but here's a brief explanation of it. From [Wolfram](https://mathworld.wolfram.com/MontyHallProblem.html):
> Assume that a room is equipped with **three doors**. Behind two are goats, and behind the third is a **shiny new car**. You are asked to pick a door, and will win whatever is behind it. Let's say you pick door 1. Before the door is opened, however, someone who knows what's behind the doors (Monty Hall) opens one of the other two doors, revealing a goat, and asks you if you wish to change your selection to the third door (i.e., the door which neither you picked nor he opened). The Monty Hall problem is deciding **whether or not you should switch doors**.

In this section of the lab, you'll run a **simulation** to answer the question raised above.

### Playing the Game

Before we dive into the simulation itself, let's get a feel for the actual game as described above. Below, we've defined the `monty_hall` function which simulates an actual game of the **Monty Hall Problem**. 

In [14]:
def monty_hall():
    choice = int(input("What door would you like to choose?\n"))
    
    new_order_doors = np.random.permutation(["goat", "goat", "car"])
    picked = new_order_doors.item(choice-1)
    new_options = ["goat", "car"]
    print("Monty reveals one of the goats!")
    second_choice = input("Would you like to 'stay' or 'switch'?\n")
    
    if second_choice == "stay":
        if picked == "car":
            display(HTML("<h1>🎉 You Win! 🎉</h1>"))
        else:
            display(HTML("<h1>👎 You Lose. 👎</h1>"))
        return picked
    elif second_choice == "switch":
        new_options.remove(picked)
        if new_options[0] == "car":
            display(HTML("<h1>🎉 You Win! 🎉</h1>"))
        else:
            display(HTML("<h1>👎 You Lose. 👎</h1>"))
        return new_options[0]

In [15]:
# Just run this cell and follow the prompts
monty_hall()

## What's the Optimal Strategy?

To run a simulation, we'll provide you with several helpful variables. The first is `goats`, an array of the two goats. The second is a function, `other_goat`, which takes one goat and returns a string representing the **other goat**. This lets us distinguish the two goats from one another.

In [16]:
goats = make_array('first goat', 'second goat')

In [17]:
def other_goat(x):
    if x == 'first goat':
        return 'second goat'
    elif x == 'second goat':
        return 'first goat'

Run the following cells to see the `other_goat` function in action:

In [18]:
other_goat('first goat')

In [19]:
other_goat('second goat')

### All Three Items

Let's store both goats and the car in a new array called `hidden_behind_doors`.

In [20]:
hidden_behind_doors = np.append(goats, 'car')
hidden_behind_doors
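Note that `np.append` returns a new array rather than changing the one passed in. A short sketch with stand-in names (`pair` and `trio` are illustrative, mirroring `goats` and `hidden_behind_doors`):

```python
import numpy as np

pair = np.array(["first goat", "second goat"])

# np.append builds a new array; `pair` itself is left unchanged.
trio = np.append(pair, "car")

print(len(pair))  # 2
print(len(trio))  # 3
```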

### The Investigation

Now it's time for the actual game. We'll define another helper function, `run_monty_hall_simulation`, which simulates **one game** in a slightly different way than above. The function returns an **array of three items**: the contestant's guess, the item Monty Hall reveals, and the item behind the remaining door.

In [21]:
def run_monty_hall_simulation():
    """Return 
    [contestant's guess, what Monty reveals, what remains behind the other door]"""
    
    contestant_guess = np.random.choice(hidden_behind_doors)
    
    if contestant_guess == 'first goat':
        return make_array(contestant_guess, 'second goat', 'car')
    
    if contestant_guess == 'second goat':
        return make_array(contestant_guess, 'first goat', 'car')
    
    if contestant_guess == 'car':
        revealed = np.random.choice(goats)
        return make_array(contestant_guess, revealed, other_goat(revealed))

Run the following cell to simulate one game (you don't need to pass any arguments into the `run_monty_hall_simulation` function):

In [22]:
run_monty_hall_simulation()

For example, an output of `['first goat', 'second goat', 'car']` means that the original choice of door had the `"first goat"` behind it, Monty revealed the `"second goat"`, and the remaining door had the `"car"` behind it. Because each game is random, your output may differ.

### Collecting the Data

To answer the question of *switching* accurately, the best course of action is to simulate *many, many games* and look at the results in aggregate. Then we'll be able to analyze the results and make our decision. The following cell runs 10,000 games and stores the results in a new table called `games`.

In [23]:
games = Table(make_array('Guess', 'Revealed', 'Remaining'))

for i in np.arange(10000):
    games.append(run_monty_hall_simulation())

games.show(5)
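The same experiment can be sketched with NumPy alone, without building a table. This is an alternative illustration, not the lab's method, and the helper `simulate_switch_win_rate` is hypothetical. It relies on one observation from the simulation function above: the remaining door hides the car exactly when the contestant's first guess was a goat.

```python
import numpy as np

def simulate_switch_win_rate(num_games=10000):
    """Estimate how often the remaining door hides the car (hypothetical helper)."""
    items = np.array(["first goat", "second goat", "car"])
    guesses = np.random.choice(items, num_games)
    # Monty always reveals a goat, so the remaining door hides the car
    # exactly when the contestant's first guess was a goat.
    remaining_is_car = guesses != "car"
    return np.mean(remaining_is_car)

np.random.seed(0)  # arbitrary seed so the estimate is reproducible
rate = simulate_switch_win_rate()
print(rate)
```

The printed value estimates the proportion of games in which switching would win.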

### What's in the Remaining Door?

Let's now aggregate the results by our **original choice of door** and the item in the **remaining door**. You'll notice that we originally chose each door around the same number of times (~1/3 for each door), but the car is in the remaining door ~2/3rds of the time.

In [24]:
original_choice = games.group('Guess')
original_choice

In [25]:
remaining_door = games.group('Remaining')
remaining_door
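Here `group` counts how many rows share each value in a column. A rough NumPy analogue, shown on a tiny hypothetical array rather than the simulated table, is `np.unique` with `return_counts=True`:

```python
import numpy as np

# Hypothetical stand-in for one column of a table.
remaining = np.array(["car", "first goat", "car", "second goat", "car"])

# np.unique returns the sorted distinct values and, with
# return_counts=True, how many times each one occurs.
items, counts = np.unique(remaining, return_counts=True)

for item, count in zip(items, counts):
    print(item, count)
# car 3
# first goat 1
# second goat 1
```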

Let's combine our results into one, cleanly-formatted table:

In [26]:
joined = original_choice.join('Guess', remaining_door, 'Remaining')
combined = joined.relabeled(0, 'Item').relabeled(1, 'Original Door').relabeled(2, 'Remaining Door')
combined

<!-- BEGIN QUESTION -->

### Visualizing Our Results

While staring at the numbers in the table may be helpful, the most effective way to understand the results of our simulation is via a visualization.

**Question 4:** In the following cell, produce a horizontal bar chart which plots each of **car**, **first goat**, and **second goat** as categories, each with two bars: one representing the count of *Original Door* from the simulation and one representing the count of *Remaining Door* from the simulation.

<!--
BEGIN QUESTION
name: q4
points: 0
manual: true
-->

In [27]:
combined.barh("Item")

<!-- END QUESTION -->



Assuming you've plotted the bar chart above correctly, you should see clear evidence that **switching doors** is the better move. Switching wins the car about **2/3 of the time (roughly 67%)**, while staying wins only about 1/3 of the time.
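The simulated proportions agree with a short calculation. Monty always reveals a goat, so switching wins exactly when the first guess was a goat, and staying wins exactly when the first guess was the car:

$$P(\text{win by switching}) = P(\text{first guess is a goat}) = \frac{2}{3} \approx 0.667$$

$$P(\text{win by staying}) = P(\text{first guess is the car}) = \frac{1}{3} \approx 0.333$$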

## Done! 😇

That's it! There's nowhere for you to submit this, as labs are not assignments. However, please ask any questions you have about this notebook in lab or on Ed.

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False)