## The Monty Hall Problem


Here's a fun, and perhaps surprising, statistical riddle that also makes an excellent exercise in writing Python functions.

In a game show, contestants try to guess which of three closed doors hides a cash prize (goats are behind the other two doors). Of course, the odds of choosing the correct door are 1 in 3.

However, there is a twist! After a contestant makes their choice, the host of the show, Mr. Monty Hall, who knows where the prize is, always opens one of the two doors the contestant did not pick, and behind it there is always a goat. This leaves two unopened doors, one of which is the contestant's original pick.

The host then offers the contestant a choice: keep their original door, or switch to the other unopened door.

The question is, should the contestant switch to the other door? Is there any statistical benefit? 

------------

We can answer the problem by running simulations in Python. We'll do it in several parts.

First, write a function called `simulate_prizedoor` that simulates the location of the prize in many games; see the detailed specification below:

```python
# Each function below takes the number of games (simulations) to play;
# NumPy supplies the random number generator they use.
import numpy as np
```

```python
def simulate_prizedoor(nsim):
    """
    Function
    --------
    Generate a random list of 0s, 1s, and 2s, representing
    which door (0, 1, or 2) hides the prize in each game

    Parameters
    ----------
    nsim : int
        The number of simulations to run

    Returns
    -------
    answers : list of int
        Random list of 0s, 1s, and 2s

    Example
    -------
    >>> simulate_prizedoor(3)  # output is random
    [0, 0, 2]
    """
    # np.random.randint(0, 3) draws 0, 1, or 2 with equal probability
    answers = [np.random.randint(0, 3) for _ in range(nsim)]
    return answers
```
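Because every game is independent, the per-game loop can be replaced by a single NumPy call. The sketch below is an optional alternative, not part of the original assignment: `simulate_doors_vectorized` is a hypothetical name, and it uses NumPy's `Generator` API (`np.random.default_rng`) rather than the legacy `np.random.randint`. It returns a NumPy array instead of a list, and would serve equally well for `guess_doors`, since both functions draw uniformly from doors 0, 1, and 2.

```python
import numpy as np


def simulate_doors_vectorized(nsim, rng=None):
    """Return a NumPy array of nsim random doors, each 0, 1, or 2.

    Hypothetical helper (not part of the original assignment): it draws
    every game at once instead of appending one draw per loop iteration.
    """
    if rng is None:
        rng = np.random.default_rng()
    # integers(low, high) excludes high, so this draws from {0, 1, 2}
    return rng.integers(0, 3, size=nsim)


# Passing a seeded Generator makes a run reproducible
rng = np.random.default_rng(seed=42)
prizes = simulate_doors_vectorized(10, rng)
guesses = simulate_doors_vectorized(10, rng)
print(prizes)
print(guesses)
```

Seeding the generator is also a convenient way to make the printed outputs in this notebook repeat from run to run.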

```python
def guess_doors(nguess):
    """
    Function
    --------
    Generate a random list of 0s, 1s, and 2s, representing
    a contestant's guesses between door 0, door 1, and door 2

    Parameters
    ----------
    nguess : int
        The number of guesses to run

    Returns
    -------
    guesses : list of int
        Random list of 0s, 1s, and 2s

    Example
    -------
    >>> guess_doors(3)  # output is random
    [0, 0, 2]
    """
    # each guess is drawn independently and uniformly from doors 0, 1, and 2
    guesses = [np.random.randint(0, 3) for _ in range(nguess)]
    return guesses
```

```python
number_of_simulations = 10

print(simulate_prizedoor(number_of_simulations))
print(guess_doors(number_of_simulations))
```

```
[1, 1, 2, 2, 2, 0, 1, 1, 2, 2]
[0, 1, 1, 2, 1, 2, 2, 2, 0, 2]
```


Next, write a function, `goat_door`, that simulates the host opening one of the goat doors the contestant didn't pick, choosing at random when two such doors exist.

```python
def goat_door(prizedoors, guesses):
    """
    Function
    --------
    Simulate the opening of a "goat door" that doesn't contain the prize
    and is different from the contestant's guess

    Parameters
    ----------
    prizedoors : array or list
        The door that the prize is behind in each simulation
    guesses : array or list
        The door that the contestant guessed in each simulation

    Returns
    -------
    goats : list of int
        The goat door that is opened in each simulation. Each item is 0, 1, or 2,
        and is different from both prizedoors and guesses

    Examples
    --------
    >>> goat_door(np.array([0, 1, 2]), np.array([1, 1, 1]))  # middle entry is 0 or 2
    [2, 2, 0]
    """
    goats = []

    for prize, guess in zip(prizedoors, guesses):
        # enumerate the doors
        doors = [0, 1, 2]

        if prize != guess:
            # guess is incorrect: exactly one door is neither the prize
            # nor the guess, so the host must open that one
            doors.remove(prize)
            doors.remove(guess)
        else:
            # guess is correct: both remaining doors hide goats, so discard
            # one of the two at random and open the other
            doors.remove(prize)
            del doors[np.random.randint(0, 2)]  # upper bound is exclusive
        goats.extend(doors)

    return goats
```
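The loop in `goat_door` can also be vectorized. This hedged sketch assumes the doors are labelled 0, 1, and 2, so the three labels sum to 3: when the guess is wrong, the goat door is forced to be `3 - prize - guess`; when it is right, adding a random offset of 1 or 2 to the prize door (mod 3) picks one of the two goat doors with equal probability. `goat_door_vectorized` is a hypothetical helper name.

```python
import numpy as np


def goat_door_vectorized(prizedoors, guesses, rng=None):
    """Open a goat door for every game at once (hypothetical helper).

    Relies on the doors being labelled 0, 1, 2, so the three labels sum to 3.
    """
    if rng is None:
        rng = np.random.default_rng()
    prizedoors = np.asarray(prizedoors)
    guesses = np.asarray(guesses)

    # Wrong guess: the only door that is neither the prize nor the guess
    forced = 3 - prizedoors - guesses

    # Correct guess: step 1 or 2 doors away from the prize (mod 3) at random,
    # which picks one of the two goat doors with equal probability
    offsets = rng.integers(1, 3, size=prizedoors.shape)
    random_goat = (prizedoors + offsets) % 3

    # np.where keeps the random goat only where the guess was correct
    return np.where(prizedoors == guesses, random_goat, forced)


print(goat_door_vectorized([0, 1, 2], [1, 1, 1]))  # middle entry is 0 or 2
```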

```python
goat_door(
    simulate_prizedoor(number_of_simulations),
    guess_doors(number_of_simulations)
)
```

```
[0, 1, 1, 2, 1, 1, 1, 2, 2, 1]
```

Write a function, `switch_guess`, that represents the strategy of always switching to the other unopened door after the goat door is opened.

```python
def switch_guess(guesses, goatdoors):
    """
    Function
    --------
    The strategy that always switches a guess after the goat door is opened

    Parameters
    ----------
    guesses : array or list
        Array of original guesses, for each simulation
    goatdoors : array or list
        Array of revealed goat doors for each simulation

    Returns
    -------
    new_guess : list of int
        The new door after switching. Should be different from both guesses and goatdoors

    Examples
    --------
    >>> switch_guess(np.array([0, 1, 2]), np.array([1, 2, 1]))
    [2, 0, 0]
    """
    new_guess = []

    for guess, goat in zip(guesses, goatdoors):
        # the only door left is neither the original guess nor the goat door
        doors = [0, 1, 2]
        doors.remove(guess)
        doors.remove(goat)
        new_guess.extend(doors)

    return new_guess
```
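The same labelling trick gives the switched door directly. Assuming the doors are labelled 0, 1, and 2, the unopened door that is neither the original guess nor the goat door is `3 - guess - goat`. `switch_guess_vectorized` is a hypothetical name for this sketch.

```python
import numpy as np


def switch_guess_vectorized(guesses, goatdoors):
    """Return the door a switching contestant ends up with (hypothetical helper).

    With doors labelled 0, 1, 2, the unopened door that is neither the
    original guess nor the goat door is 3 - guess - goat.
    """
    return 3 - np.asarray(guesses) - np.asarray(goatdoors)


print(switch_guess_vectorized([0, 1, 2], [1, 2, 1]))  # prints [2 0 0]
```

This relies on the guess and the goat door always differing, which `goat_door` guarantees by construction.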

```python
prizes = simulate_prizedoor(number_of_simulations)
guesses = guess_doors(number_of_simulations)

goats = goat_door(prizes, guesses)

print("Guesses: \n", guesses)
print("Goats: \n", goats)
print("New Doors: \n", switch_guess(guesses, goats))
```

```
Guesses: 
 [0, 0, 1, 0, 0, 1, 2, 1, 2, 0]
Goats: 
 [1, 2, 2, 2, 2, 2, 1, 2, 1, 2]
New Doors: 
 [2, 1, 0, 1, 1, 0, 0, 0, 0, 1]
```


The last function: write a `win_percentage` function that takes a list of `guesses` and `prizedoors` and returns the percentage of correct guesses.

```python
def win_percentage(guesses, prizedoors):
    """
    Function
    --------
    Calculate the percent of times that a simulation of guesses is correct

    Parameters
    -----------
    guesses : array or list
        Guesses for each simulation
    prizedoors : array or list
        Location of prize for each simulation

    Returns
    --------
    percentage : number between 0 and 100
        The win percentage

    Examples
    ---------
    >>> win_percentage(np.array([0, 1, 2]), np.array([0, 0, 0]))
    33.333333333333336
    """
    wins = 0
    for guess, prize in zip(guesses, prizedoors):
        if guess == prize:
            wins += 1

    # scale the fraction of wins to a percentage, as the docstring promises
    return 100 * wins / len(guesses)
```

----------

Now, put it all together. Simulate 1000 games and compare two strategies on the same games: keeping the original guess, and switching doors after the goat door is revealed.

Compute the percentage of time the contestant wins under each strategy. Is one strategy better than the other?

```python
number_of_simulations = 1000

# original state: prize locations and the contestant's first picks
prizes = simulate_prizedoor(number_of_simulations)
guesses = guess_doors(number_of_simulations)

print("Win percentage for original guess: ", win_percentage(guesses, prizes), "%")

# the host reveals a goat door in each game
goats = goat_door(prizes, guesses)

# switch to the remaining unopened door and score the same games
switch = switch_guess(guesses, goats)
print("Win percentage for new door: ", win_percentage(switch, prizes), "%")
```

```
Win percentage for original guess:  35.5 %
Win percentage for new door:  64.5 %
```
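Putting the vectorized pieces together gives a self-contained version of the whole experiment that can cheaply run many more games, for example 10,000. It is a sketch under the same 0/1/2 labelling assumption; `monty_hall_vectorized` is a hypothetical name, and the exact percentages vary with the seed.

```python
import numpy as np


def monty_hall_vectorized(nsim, rng=None):
    """Return (stay_pct, switch_pct) for nsim simulated games.

    Hypothetical all-NumPy version of the notebook's pipeline; it uses the
    0 + 1 + 2 = 3 labelling trick to find goat and switch doors.
    """
    if rng is None:
        rng = np.random.default_rng()
    prizes = rng.integers(0, 3, size=nsim)
    guesses = rng.integers(0, 3, size=nsim)

    # Host opens a goat door: forced if the guess is wrong, random if right
    random_goat = (prizes + rng.integers(1, 3, size=nsim)) % 3
    goats = np.where(prizes == guesses, random_goat, 3 - prizes - guesses)

    # Switching means taking the one door that is neither guessed nor opened
    switches = 3 - guesses - goats

    # The mean of a boolean array is the fraction of True values
    stay_pct = 100 * np.mean(guesses == prizes)
    switch_pct = 100 * np.mean(switches == prizes)
    return stay_pct, switch_pct


stay, switch = monty_hall_vectorized(10_000, np.random.default_rng(2024))
print(f"Stay: {stay:.1f} %   Switch: {switch:.1f} %")
```

Because switching wins exactly when the first pick was wrong, the two percentages always add up to 100 in this model.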


-------------
Switching wins about twice as often as staying (roughly 2/3 versus 1/3 of the games). Many people find this answer counter-intuitive (famously, PhD mathematicians have insisted the result must be wrong. Clearly, none of them knew Python).
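The simulated split matches a short probability argument. Let $G$ be the contestant's first pick and $D$ the prize door, with $D$ uniform over the three doors and independent of $G$. Staying wins only if the first pick was right. Switching wins exactly when the first pick was wrong, because the host, who knows where the prize is, can never open the prize door, so it must be the other unopened door:

$$
\begin{aligned}
\Pr(\text{stay wins}) &= \Pr(G = D) = \tfrac{1}{3}, \\
\Pr(\text{switch wins}) &= \Pr(G \neq D) = 1 - \tfrac{1}{3} = \tfrac{2}{3}.
\end{aligned}
$$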

One of the best ways to build intuition about why opening a goat door affects the odds is to re-run the experiment with 100 doors and one prize. If the game show host opens 98 goat doors after you make your initial selection, would you want to keep your first pick or switch? Can you generalize your simulation code to handle the case with `n` doors?
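One way to approach the closing question is a loop-based sketch that generalizes the game to `n_doors` doors. It assumes, as in the 100-door scenario, that the host knowingly opens every goat door except one, leaving the contestant's pick and exactly one other door closed; `monty_hall_n_doors` is a hypothetical name. Under that assumption, switching should win with probability $(n-1)/n$, about 99% for 100 doors.

```python
import numpy as np


def monty_hall_n_doors(nsim, n_doors=3, rng=None):
    """Simulate nsim games with n_doors doors and one prize (hypothetical helper).

    The host, who knows where the prize is, opens n_doors - 2 goat doors,
    leaving the contestant's pick and exactly one other door closed.
    Returns (stay_pct, switch_pct).
    """
    if rng is None:
        rng = np.random.default_rng()
    stay_wins = 0
    switch_wins = 0
    for _ in range(nsim):
        prize = rng.integers(n_doors)
        guess = rng.integers(n_doors)
        if guess == prize:
            # Any other door may be left closed; all of them hide goats
            others = [d for d in range(n_doors) if d != guess]
            remaining = others[rng.integers(len(others))]
        else:
            # The host cannot open the prize door, so it is the one left closed
            remaining = prize
        stay_wins += int(guess == prize)
        switch_wins += int(remaining == prize)
    return 100 * stay_wins / nsim, 100 * switch_wins / nsim


for n in (3, 100):
    stay, switch = monty_hall_n_doors(10_000, n, np.random.default_rng(n))
    print(f"{n} doors -> stay: {stay:.1f} %, switch: {switch:.1f} %")
```

With three doors this reproduces the original game, so the same function can be used to check the earlier results.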