## The Monty Hall Problem


Here's a fun and perhaps surprising statistical riddle, and a good way to get some practice writing Python functions.

In a game show, contestants try to guess which of 3 closed doors contains a cash prize (goats are behind the other two doors). Of course, the odds of choosing the correct door are 1 in 3. As a twist, the host of the show occasionally opens a door after a contestant makes his or her choice. This door is always one of the two the contestant did not pick, and is also always one of the goat doors (note that it is always possible to do this, since there are two goat doors). At this point, the contestant has the option of keeping his or her original choice, or switching to the other unopened door. The question is: is there any benefit to switching doors? The answer surprises many people who haven't heard the question before.

We can answer the problem by running simulations in Python. We'll do it in several parts.

First, write a function called `simulate_prizedoor`. This function will simulate the location of the prize in many games -- see the detailed specification below:

In [1]:
import pandas as pd
import numpy as np
# package with hypothesis tests
import scipy.stats as st
import matplotlib.pyplot as plt

import random
from numpy.random import seed
from numpy.random import randint
from numpy import mean

In [11]:
nsim = 5

In [22]:
def simulate_prizedoor(nsim):
    prize_door = [random.randint(0, 2) for x in range(nsim)]
    return prize_door

prize = simulate_prizedoor(nsim)

Next, write a function that simulates the contestant's guesses for `nsim` simulations. Call this function `simulate_guess`. The specs:

In [23]:
def guess(nsim):
    random_guess = [random.randint(0, 2) for x in range(nsim)]
    #static_guess = [0] * nsim
    return(random_guess)

guesses = guess(nsim)

Next, write a function, `goat_door`, to simulate randomly revealing one of the goat doors that a contestant didn't pick.

In [29]:
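def goat(prize, guesses):
    """Pick a goat door to reveal: neither the prize nor the contestant's guess."""
    goat_doors = []
    # Iterate over the inputs themselves rather than relying on the global nsim
    for prize_door, guess_door in zip(prize, guesses):
        doors = [d for d in (0, 1, 2) if d != prize_door and d != guess_door]
        # Two candidates remain when the guess is correct, so choose one at random
        goat_doors.append(random.choice(doors))
    return goat_doors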

goats = goat(prize, guesses)

Write a function, `switch_guess`, that represents the strategy of always switching a guess after the goat door is opened.

In [34]:
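def switch(guesses, goats):
    """Switch each guess to the one door that is neither the guess nor the revealed goat."""
    switched = []
    # Iterate over the inputs themselves rather than relying on the global nsim
    for guess_door, goat_door in zip(guesses, goats):
        doors = [0, 1, 2]
        doors.remove(goat_door)
        doors.remove(guess_door)
        switched.append(doors[0])
    return switched

switched = switch(guesses, goats)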

Last function: write a `win_percentage` function that takes an array of `guesses` and `prizedoors`, and returns the percent of correct guesses.

In [36]:
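def win_percentage(guesses, prizedoors):
    """Return the share of correct guesses as a fraction between 0 and 1."""
    correct = 0
    for guess_door, prize_door in zip(guesses, prizedoors):
        if guess_door == prize_door:
            correct += 1
    # Divide by the number of games passed in, not the global nsim
    return correct / len(guesses)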

win_percent = win_percentage(switched, prize)

0.6

Now, put it together. Simulate 10000 games where the contestant keeps the original guess, and 10000 games where the contestant switches doors after a goat door is revealed. Compute the percentage of time the contestant wins under each strategy. Is one strategy better than the other?

In [40]:
nsim = 10000
# don't switch
prize = simulate_prizedoor(nsim)
guesses = guess(nsim)
goats = goat(prize, guesses)
win_percentage(guesses, prize)

0.3391

In [41]:
# switch
prize = simulate_prizedoor(nsim)
guesses = guess(nsim)
goats = goat(prize, guesses)
switched = switch(guesses, goats)
win_percentage(switched, prize)

0.6663
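To judge whether the gap between the two win rates above is more than sampling noise, one rough check is the standard error of each proportion. The sketch below uses the normal approximation and treats the two runs as independent samples of 10000 games each; the helper names `proportion_standard_error` and `z_score_for_gap` are illustrative and not part of the original exercise.

```python
import math


def proportion_standard_error(p, n):
    """Standard error of a proportion p estimated from n independent games."""
    return math.sqrt(p * (1 - p) / n)


def z_score_for_gap(p1, n1, p2, n2):
    """Gap between two independent proportions, in units of its standard error."""
    se_gap = math.sqrt(proportion_standard_error(p1, n1) ** 2
                       + proportion_standard_error(p2, n2) ** 2)
    return (p2 - p1) / se_gap


# Win rates taken from the two simulation outputs above
stay_rate, switch_rate, games = 0.3391, 0.6663, 10000

print("standard error (stay):  ", proportion_standard_error(stay_rate, games))
print("standard error (switch):", proportion_standard_error(switch_rate, games))
print("gap in standard errors: ", z_score_for_gap(stay_rate, games, switch_rate, games))
```

Each standard error is well under one percentage point, so a gap of roughly 33 percentage points is far larger than chance variation between runs would be expected to produce.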

In [None]:
# for debugging
#print('prize:', prize[:10])
#print('guesses:', guesses[:10])
#print('goats:', goats[:10])
#print('switched:', switched[:10])

Many people find this answer counter-intuitive (famously, PhD mathematicians have incorrectly claimed the result must be wrong; clearly, none of them knew Python).
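The simulated win rates can also be cross-checked without any randomness by enumerating every case of the 3-door game and weighting each by its probability. The following is a minimal sketch, assuming the prize and the guess are each uniform over the three doors and that the host picks uniformly when two goat doors are available; the function name `exact_win_probabilities` is made up for this illustration.

```python
from fractions import Fraction
from itertools import product


def exact_win_probabilities():
    """Return exact (stay, switch) win probabilities for the 3-door game."""
    doors = [0, 1, 2]
    stay = Fraction(0)
    switch = Fraction(0)
    # Each (prize, guess) pair has probability 1/9 under the uniform assumption
    for prize, guess in product(doors, repeat=2):
        pair_probability = Fraction(1, 9)
        # The host may only open a door that is neither the prize nor the guess
        host_options = [d for d in doors if d != prize and d != guess]
        for opened in host_options:
            probability = pair_probability / len(host_options)
            # After switching, the contestant holds the one remaining closed door
            switched = next(d for d in doors if d != guess and d != opened)
            if guess == prize:
                stay += probability
            if switched == prize:
                switch += probability
    return stay, switch


stay, switch = exact_win_probabilities()
print("stay:", stay, "switch:", switch)
```

Under these assumptions the enumeration gives exactly 1/3 for staying and 2/3 for switching, which agrees with the simulated values of about 0.34 and 0.67.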

One of the best ways to build intuition about why opening a goat door affects the odds is to re-run the experiment with 100 doors and one prize. If the game show host opens 98 goat doors after you make your initial selection, would you want to keep your first pick or switch? Can you generalize your simulation code to handle the case of `n` doors?
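One possible way to generalize is sketched below. It assumes the host always opens `n - 2` goat doors, so exactly one door besides the contestant's pick stays closed: the prize door if the first guess was wrong, or a randomly chosen goat door if it was right. The function name `simulate_n_doors` and its `rng` parameter are illustrative choices, not part of the original exercise.

```python
import random


def simulate_n_doors(n_doors, nsim, rng=None):
    """Estimate (stay, switch) win rates when the host opens n_doors - 2 goat doors."""
    rng = rng or random.Random()
    stay_wins = 0
    switch_wins = 0
    for _ in range(nsim):
        prize = rng.randrange(n_doors)
        guess = rng.randrange(n_doors)
        if guess == prize:
            # Every other door hides a goat; the host leaves one of them closed
            others = [d for d in range(n_doors) if d != guess]
            remaining = rng.choice(others)
        else:
            # The host cannot open the prize door, so it is the one left closed
            remaining = prize
        stay_wins += guess == prize
        switch_wins += remaining == prize
    return stay_wins / nsim, switch_wins / nsim


stay_rate, switch_rate = simulate_n_doors(100, 10000)
print("stay:", stay_rate, "switch:", switch_rate)
```

With 100 doors the first pick is right only about 1 time in 100, so under this setup switching should win in roughly 99% of games; calling `simulate_n_doors(3, 10000)` should give rates close to the 3-door results above.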