<img src="./ccsf.png" alt="CCSF Logo" width=200px style="margin:0px -5px">

# Lecture 15: Chance

Associated Textbook Sections: [9.4, 9.5](https://ccsf-math-108.github.io/textbook/chapters/09/4/Monty_Hall_Problem.html)

---

## Overview

* [The Monty Hall Problem](#The-Monty-Hall-Problem)
* [Probability](#Probability)
* [Problem-Solving Method](#Problem-Solving-Method)

---

## Set Up the Notebook

In [None]:
from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

import ipywidgets as widgets
from IPython.display import display, clear_output

---

## The Monty Hall Problem

---

### Summary of the Problem

* There are 3 closed doors.
* One door has a prize and two doors have what is considered not to be a prize.
* The contestant selects a door.
* The host opens one of the two doors the contestant did not select, always revealing a door without the prize.
* The contestant has the chance to change doors.
* Is the contestant's chance of winning improved by switching doors?

<a href="https://en.wikipedia.org/wiki/Monty_Hall_problem" title="Wikipedia - Monty Hall Problem"><img src="./Monty_open_door.png" width="40%"></a>
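
The rules of the game can be checked with a short case analysis. The sketch below is an illustration rather than part of the original demo: it assumes the contestant's first pick is equally likely to be any of the three doors, and the names `stay_wins` and `switch_wins` are hypothetical.

```python
from fractions import Fraction

doors = ['car', 'first goat', 'second goat']

# Assumption: the contestant's first pick is equally likely to be any door.
pick_chance = Fraction(1, len(doors))

stay_wins = Fraction(0)
switch_wins = Fraction(0)

for first_pick in doors:
    if first_pick == 'car':
        # The host reveals one goat; the remaining door hides the other goat.
        stay_wins += pick_chance
    else:
        # The host must reveal the other goat, so the remaining door hides the car.
        switch_wins += pick_chance

print('P(win by staying)   =', stay_wins)
print('P(win by switching) =', switch_wins)
```

Under these assumptions, switching wins whenever the first pick was a goat, which happens in two of the three equally likely cases.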

---

### Demo: Monty Hall

Create a simulation of the Monty Hall game.

* Optionally, to learn more about the case analysis for this problem, review [part 1](https://youtu.be/e7c6_h0Zf6U) and [part 2](https://youtu.be/e3cCUAGHIOI) from the OLI Probability and Statistics course materials.
* For more insight into simulating the game play, try [a simulation produced by rossmanchance.com](https://www.rossmanchance.com/applets/2021/montyhall/Monty.html).

In [None]:
def other_goat(a_goat):
    '''other_goat accepts either the string 'first goat' or the string 'second goat' and returns the other goat as a string.'''
    if a_goat == 'first goat':
        return 'second goat'
    elif a_goat == 'second goat':
        return 'first goat'
    else:
        print("a_goat should name 'first goat' or 'second goat'.")

In [None]:
...

In [None]:
def monty_hall():
    '''
    monty_hall runs one simulation of the Monty Hall game,
    where the three doors are represented as the strings 'first goat',
    'second goat', and 'car'. This function returns a list containing the contestant's
    random choice, the goat revealed by the host, and the item behind the remaining door.
    '''
    doors = make_array('car', 'first goat', 'second goat')
    goats = make_array('first goat', 'second goat')
    contestant_choice = np.random.choice(doors)
    
    if contestant_choice == 'first goat':
        monty_choice = 'second goat'
        remaining_door = 'car'
        
    elif contestant_choice == 'second goat':
        monty_choice = 'first goat'
        remaining_door = 'car'
        
    elif contestant_choice == 'car':
        monty_choice = np.random.choice(goats)
        remaining_door = other_goat(monty_choice)
        
    return [contestant_choice, monty_choice, remaining_door]

In [None]:
...

---

Store the results of several random simulations of the Monty Hall game in a Table.

In [None]:
games = Table(['Guess', 'Revealed', 'Remaining'])

In [None]:
...

In [None]:
games = Table(['Guess', 'Revealed', 'Remaining'])

for _ in np.arange(3000):
    games = games.with_row(monty_hall())
    
games

---

Determine the proportion of times that the player would have won if they switched doors.

In [None]:
def switch_to_win(remaining):
    '''Return True if the remaining door hides the car, meaning that switching wins.'''
    return remaining == 'car'

In [None]:
games = games.with_column(
    'Switch to Win', 
    games.apply(switch_to_win, 'Remaining')
)
games

In [None]:
switch_to_win_prob = ...
switch_to_win_prob

---

## Probability

---

### Basics

* Lowest value: 0 (or 0%) --- Chance of event that is impossible.
* Highest value: 1 (or 100%) --- Chance of event that is certain.
* Complement: If an event has chance 70%, then the chance that it doesn’t happen is:
    * 100% - 70% = 30%
    * 1 - 0.7 = 0.3


---

### Equally Likely Outcomes

Assuming all outcomes are equally likely, the probability of an event $A$ is:
                
$$P(A) = \frac{\text{number of outcomes that make $A$ happen}}{\text{total number of outcomes}}$$
                             
The terms chance and probability are often interchangeable.
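
As a small illustration of the formula, the sketch below counts outcomes directly. The fair six-sided die and the event "the roll is even" are assumptions chosen for the example, not part of the lecture.

```python
from fractions import Fraction

# Illustrative assumption: one roll of a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]

# Event A: the roll is an even number.
favorable = [face for face in outcomes if face % 2 == 0]

# P(A) = (number of outcomes that make A happen) / (total number of outcomes)
prob_even = Fraction(len(favorable), len(outcomes))
print(prob_even)
```

Counting works here only because every face is assumed to be equally likely.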

---

### A Question

Set Up:
* There are three cards: ace of hearts, king of diamonds, and queen of spades.
* The cards are shuffled and two cards are drawn at random without replacement.

What is the chance that I get the Queen followed by the King?

In [None]:
prob_QK = ...
prob_QK

---

### An Interpretation

One way to interpret the approximately 16.67% chance of selecting a Queen followed by a King is through a [frequentist perspective](https://en.wikipedia.org/wiki/Frequentist_probability): if the situation were repeated many times, the proportion of Queen-King outcomes would get closer and closer to 0.1667.

1. **Run** the following code cell.  
2. **Click** *Generate New Data* to simulate the scenario.  
3. **Adjust** the *Reveal Proportion* slider to see how the proportion of Queen-King outcomes changes as repetitions increase.  
4. **Regenerate** new data and continue to explore the trend in the proportion.

In [None]:
cards = ['A', 'K', 'Q']
steps = 25

simulation_data = None

def run_simulation(max_reps):
    reps_array = np.arange(1, max_reps + 1, steps)
    successes = np.zeros_like(reps_array, dtype=float)
    
    for i, reps in enumerate(reps_array):
        count = 0
        for _ in range(reps):
            first, second = np.random.choice(cards, 2, replace=False)
            if first=='Q' and second=='K':
                count += 1
        successes[i] = count / reps
    return reps_array, successes

def plot_simulation(change=None):
    if simulation_data is None:
        return
    
    reps_array, successes = simulation_data
    cutoff = max(1, int(len(reps_array) * reveal_slider.value))
    
    out_plot.clear_output(wait=True)
    with out_plot:
        plt.figure(figsize=(10,5))
        plt.plot(reps_array[:cutoff], successes[:cutoff], label='Simulated Proportion')
        plt.plot([0, reps_array[-1]], [1/6, 1/6], 'r--', linewidth=2, label='Expected 1/6')
        plt.xlabel('Repetitions')
        plt.ylabel('Queen-King Proportion')
        plt.ylim(0, 0.5)
        plt.title('Simulation of Queen-King Proportion')
        plt.legend()
        plt.show()

def generate_new_data(b):
    global simulation_data
    simulation_data = run_simulation(max_reps_slider.value)
    reveal_slider.value = 0.0
    plot_simulation()

max_reps_slider = widgets.IntSlider(
    value=2500, min=100, max=5000, step=100, 
    description="Max Reps:",
    style={'description_width': '120px'}, 
    layout=widgets.Layout(width='300px')
)

reveal_slider = widgets.FloatSlider(
    value=0.0, min=0.0, max=1.0, step=0.01, 
    description="Reveal Proportion:",
    style={'description_width': '120px'},  
    layout=widgets.Layout(width='300px')
)

generate_button = widgets.Button(description="Generate New Data")
generate_button.on_click(generate_new_data)
reveal_slider.observe(plot_simulation, names='value')
out_plot = widgets.Output()
ui = widgets.VBox([max_reps_slider, reveal_slider, generate_button])
display(ui, out_plot)

---

### Multiplication Rule

* Chance that two events $A$ and $B$ both happen is $P(\text{$A$ happens}) \times P(\text{$B$ happens given that $A$ has happened})$
* The answer is less than or equal to each of the two chances being multiplied
* The more conditions you have to satisfy, the less likely you are to satisfy them all
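
The rule can be checked against the three-card example, where the chance of a Queen followed by a King is about 16.67%. The sketch below is one way to do that, using hypothetical variable names; it multiplies the two chances and then cross-checks the result by listing every ordered draw.

```python
from fractions import Fraction
from itertools import permutations

cards = ['A', 'K', 'Q']

# Multiplication rule: P(Q first) * P(K second, given Q was drawn first).
p_q_first = Fraction(1, 3)      # one Queen among three cards
p_k_given_q = Fraction(1, 2)    # one King among the two cards left
prob_q_then_k = p_q_first * p_k_given_q

# Cross-check: list all ordered draws of two cards without replacement.
draws = list(permutations(cards, 2))
by_counting = Fraction(draws.count(('Q', 'K')), len(draws))

print(prob_q_then_k, by_counting)
```

Both approaches give 1/6, which is smaller than each of the two chances that were multiplied.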


---

### Another Question

* Set up:
    * There are three cards: ace of hearts, king of diamonds, and queen of spades.
    * The cards are shuffled and two cards are drawn at random without replacement.
* What is the chance that one of the cards I draw is a King and the other is a Queen?


---

### Demo: Addition Rule

In [None]:
...

In [None]:
outcomes = make_array('AK', 'AQ', 'KQ', 'KA', 'QA', 'QK')
first_card = make_array('A', 'A', 'K', 'K', 'Q', 'Q')
second_card = make_array('K', 'Q', 'Q', 'A', 'A', 'K')
Table().with_columns('First Card', first_card,
                     'First Card Chance', np.ones(6) / 3,
                     'Second Card', second_card,
                     'Second Card Chance', np.ones(6) / 2,
                     'Outcome', outcomes,
                     'Outcome Chance', np.ones(6) / 6
                    )

---

Notice that there are two rows (possibilities) with the outcome of interest.

In [None]:
prob_KQ_QK = ...
prob_KQ_QK

---

### Addition Rule

* If event $A$ can happen in exactly one of two ways, then $P(A)  =   P(\text{first way})  +  P(\text{second way})$
* The answer is greater than or equal to the chance of each individual way
* Note: There is a more general version of this formula that covers other cases, but you won't use it in this course.
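
Applied to the three-card example, the event "one King and one Queen" can happen in exactly two ways: King then Queen, or Queen then King. The sketch below, with hypothetical variable names, finds the chance of each way with the multiplication rule and then adds them.

```python
from fractions import Fraction

# The two ways cannot both happen on the same draw of two cards.
p_k_then_q = Fraction(1, 3) * Fraction(1, 2)
p_q_then_k = Fraction(1, 3) * Fraction(1, 2)

# Addition rule: add the chances of the two distinct ways.
prob_one_king_one_queen = p_k_then_q + p_q_then_k
print(prob_one_king_one_queen)
```

The sum, 1/3, is at least as large as the chance of either way alone.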

---

### Complement: At Least ...

What is the chance of getting at least one head in a certain number of flips of a fair coin?
* In 3 tosses: at least one head means any outcome except $TTT$.
    * $P(TTT)  =  (1/2) \times (1/2) \times (1/2)  =  (1/2)^{3}$
    * $P(\text{at least one head}) = 1 - P(TTT) = 1 - (1/2)^{3} = 87.5\%$
* In 10 tosses: $P(\text{at least one head}) = 1 - (1/2)^{10} \approx 99.9\%$
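
The 3-toss result can be confirmed by listing all eight equally likely sequences and counting those that contain a head. The sketch below is an illustrative check with hypothetical variable names, not the notebook's own solution.

```python
from fractions import Fraction
from itertools import product

# All equally likely sequences of 3 tosses: HHH, HHT, ..., TTT.
tosses = list(product('HT', repeat=3))

# Direct count of the sequences that contain at least one head.
with_a_head = [seq for seq in tosses if 'H' in seq]
direct = Fraction(len(with_a_head), len(tosses))

# Complement rule: 1 - P(TTT).
via_complement = 1 - Fraction(1, 2) ** 3

print(direct, via_complement)
```

Listing outcomes stops being practical as the number of tosses grows, which is why the complement is the easier route for 10 tosses.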


In [None]:
prob_at_least_one_head_in_ten = ...
prob_at_least_one_head_in_ten 

---

## Problem-Solving Method

---

Ask yourself what event must happen on the first trial. 
* If there's a clear answer (e.g. "not a six") whose probability you know, you can most likely use the **multiplication rule**.
* If there's no clear answer (e.g. "could be K or Q, but then the next one would have to be Q or K ..."), list all the **distinct ways** your event could occur and **add up their chances**.
* If the list above is long and complicated, look at the **complement**. If the complement is simpler (e.g. the complement of "at least one" is "none"), you can find its chance and subtract that from 1.
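
The method can be sketched on a made-up question: what is the chance of at least one six in 4 rolls of a fair die? The number of rolls and the variable names are assumptions for illustration. Listing every way to get at least one six would be long, so the sketch uses the complement, "no six on any roll", which has the clear per-trial answer "not a six".

```python
from fractions import Fraction

# Illustrative assumption: a fair six-sided die rolled 4 times.
rolls = 4

# Clear answer for each trial of the complement event: "not a six".
p_not_six = Fraction(5, 6)

# Multiplication rule: every one of the rolls must be "not a six".
p_no_six = p_not_six ** rolls

# Complement rule: "at least one six" is everything except "no six".
p_at_least_one_six = 1 - p_no_six

print(p_no_six, p_at_least_one_six)
```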

--- 

## Attribution

This content is licensed under the <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)</a> and derived from the <a href="https://www.data8.org/">Data 8: The Foundations of Data Science</a> offered by the University of California, Berkeley.

<img src="./by-nc-sa.png" width=100px>