# Chapter 2: Conditional probability
 
This Jupyter notebook is the Python equivalent of the R code in section 2.10 R, pp. 80 - 83, [Introduction to Probability, Second Edition](https://www.crcpress.com/Introduction-to-Probability-Second-Edition/Blitzstein-Hwang/p/book/9781138369917), Blitzstein & Hwang.

In [2]:
import numpy as np

## Simulating the frequentist interpretation 

Recall that the frequentist interpretation of conditional probability based on a large number `n` of repetitions of an experiment is $P(A|B) \approx n_{AB}/n_{B}$, where $n_{AB}$ is the number of times that $A \cap B$ occurs and $n_{B}$ is the number of times that $B$ occurs. Let's try this out by simulation and verify the results of Example 2.2.5. We use [`numpy.random.choice`](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.choice.html) to simulate `n` families, each with two children.

In [3]:
np.random.seed(42)

n = 10**5

child1 = np.random.choice([1,2],n, replace = True)
child2 = np.random.choice([1,2],n, replace = True)

print('child1:\n{}\n'.format(child1))
print('child2:\n{}\n'.format(child2))

child1:
[1 2 1 ... 1 2 2]

child2:
[1 2 2 ... 2 1 2]



Here `child1` is a NumPy `array` of length `n`, where each element is a 1 or a 2. Letting 1 stand for "girl" and 2 stand for "boy", this `array` represents the gender of the elder child in each of the `n` families. Similarly, `child2` represents the gender of the younger child in each family. 

Alternatively, we could have used

In [4]:
np.random.choice(['girl','boy'],n, replace = True)

array(['boy', 'boy', 'boy', ..., 'boy', 'boy', 'boy'], dtype='<U4')

but it is more convenient to work with numerical values.

Let $A$ be the event that both children are girls and $B$ the event that the elder is a girl. Following the frequentist interpretation, we count the number of repetitions where $B$ occurred and name it `n_b`, and we also count the number of repetitions where $A \cap B$ occurred and name it `n_ab`. Finally, we divide `n_ab` by `n_b` to approximate $P(A|B)$.

In [5]:
n_b = np.sum(child1==1)
n_ab = np.sum((child1==1) & (child2==1))
print('P(both girls| elder is a girl) = {:0.2F} '.format(n_ab/n_b))

P(both girls| elder is a girl) = 0.50 


The ampersand `&` is an elementwise $AND$, so `n_ab` is the number of families where both the first child and the second child are girls. When we ran this code, we got 0.50, confirming our answer $P(\text{both girls | elder is a girl}) = 1/2$. 

Now let $A$ be the event that both children are girls and $B$ the event that at least one of the children is a girl. Then $A \cap B$ is the same, but `n_b` needs to count the number of families where at least one child is a girl. This is accomplished with the elementwise $OR$ operator `|` (this is not a conditioning bar; it is an inclusive $OR$, returning `True` if at least one element is `True`).
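
Before applying these operators to the full simulation, it may help to see them on a toy example. The sketch below uses four hypothetical families (the arrays `elder` and `younger` are made up for illustration, one family per possible outcome) and shows that `&` and `|` act position by position. The parentheses around each comparison are needed because `&` and `|` bind more tightly than `==` in Python.

```python
import numpy as np

# Four hypothetical families, one per possible outcome; 1 = girl, 2 = boy
elder = np.array([1, 1, 2, 2])
younger = np.array([1, 2, 1, 2])

# Elementwise AND: True only where both children are girls
both_girls = (elder == 1) & (younger == 1)

# Elementwise (inclusive) OR: True where at least one child is a girl
at_least_one_girl = (elder == 1) | (younger == 1)

print(both_girls.tolist())         # [True, False, False, False]
print(at_least_one_girl.tolist())  # [True, True, True, False]

# Summing a boolean array counts its True entries
print(np.sum(both_girls), np.sum(at_least_one_girl))  # 1 3
```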

In [6]:
n_b = np.sum((child1==1) | (child2==1))
n_ab = np.sum((child1==1) & (child2==1))
print('P(both girls | at least one is a girl) = {:0.5F}'.format(n_ab/n_b))

P(both girls | at least one is a girl) = 0.33508


For us, the result was 0.335, in agreement with $P(\text{both girls | at least one girl}) = 1/3$.
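
The two simulated values can also be checked exactly by enumerating the sample space. The following is a minimal sketch, not taken from the book: it assumes, as the simulation does, that each child is equally likely to be a girl or a boy, independently of the other, so the four (elder, younger) outcomes are equally likely. The helper `cond_prob` and the event functions are names introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space of (elder, younger) pairs; all four outcomes assumed equally likely
outcomes = list(product("GB", repeat=2))

def cond_prob(event_a, event_b):
    """Return P(A|B) as an exact fraction by counting equally likely outcomes."""
    n_b = sum(1 for w in outcomes if event_b(w))
    n_ab = sum(1 for w in outcomes if event_a(w) and event_b(w))
    return Fraction(n_ab, n_b)

def both_girls(w):
    return w == ("G", "G")

def elder_is_girl(w):
    return w[0] == "G"

def at_least_one_girl(w):
    return "G" in w

print(cond_prob(both_girls, elder_is_girl))      # 1/2
print(cond_prob(both_girls, at_least_one_girl))  # 1/3
```

Counting outcomes this way mirrors the ratio $n_{AB}/n_{B}$ used in the simulation, with the counts taken over the sample space instead of over repetitions.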

## Monty Hall simulation

Many long, bitter debates about the Monty Hall problem could have been averted by trying it out with a simulation. To study how well the never-switch strategy performs, let's generate 10<sup>5</sup> runs of the Monty Hall game. To simplify notation, assume the contestant always chooses door 1. Then we can generate a vector specifying which door has the car for each repetition:

In [8]:
np.random.seed(42)

n = 10**5
cardoor = np.random.choice([1,2,3],n, replace = True)
print('success rate of the never-switch strategy: {:.3F}'.format(np.sum(cardoor == 1)/n))

success rate of the never-switch strategy: 0.334


At this point we could generate the vector specifying which doors Monty opens, but that's unnecessary since the never-switch strategy succeeds if and only if door 1 has the car! So the fraction of times when the never-switch strategy succeeds is `np.sum(cardoor==1)/n`, which was 0.334 in our simulation. This is very close to 1/3.
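
Generating Monty's door is not needed for the never-switch strategy, but doing so makes it possible to check the always-switch strategy with the same vectorized approach. The following is a minimal sketch rather than the book's code. It assumes the contestant always picks door 1, that Monty picks at random when both remaining doors hide goats, and it uses `numpy.random.default_rng` (available in NumPy 1.17 and later) with an arbitrary seed. The names `montydoor` and `switchdoor` are introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility
n = 10**5

# Door hiding the car in each repetition: 1, 2 or 3 (upper bound is exclusive)
cardoor = rng.integers(1, 4, size=n)

# The contestant always picks door 1, so Monty opens a goat door among 2 and 3:
# car behind 2 -> opens 3; car behind 3 -> opens 2; car behind 1 -> 2 or 3 at random
random_goat = rng.integers(2, 4, size=n)
montydoor = np.where(cardoor == 2, 3, np.where(cardoor == 3, 2, random_goat))

# The doors sum to 6, so the door left after removing door 1 and Monty's door is:
switchdoor = 6 - 1 - montydoor

stay_rate = np.mean(cardoor == 1)
switch_rate = np.mean(switchdoor == cardoor)
print('never-switch success rate:  {:.3f}'.format(stay_rate))
print('always-switch success rate: {:.3f}'.format(switch_rate))
```

In every repetition exactly one of the two strategies wins, so the two rates add up to 1.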

What if we want to play the Monty Hall game interactively? We can do this by programming a Python class that would let us play interactively or let us run a simulation across many trials.

In [9]:
class Monty():
    
    def __init__(self):
        """ Object creation function. """
        self.state = 0
        self.doors = np.array([1,2,3])
        self.prepare_game()
        
    def get_success_rate(self):
        """ Returns the success rate in this series of plays: num. wins/ num. games."""
        if self.num_plays > 0:
            return 1.0 * self.num_wins / self.num_plays
        else:
            return 0.0
        
    def prepare_game(self):
        """ Prepare initial values for game play, and randomly choose the door with the car"""
        self.num_plays = 0
        self.num_wins = 0
        self.cardoor = np.random.choice(self.doors)
        self.players_choice = None
        self.montys_choice = None 
        
    def choose_door(self, door):
        """ Player chooses a door at state 0; Monty then opens one of the remaining doors to reveal a goat."""
        self.state = 1 
        self.players_choice = door
        self.montys_choice = np.random.choice(self.doors[(self.doors != self.players_choice) & (self.doors != self.cardoor)])
        
    def switch_door(self, do_switch):
        """ Player has the choice to switch from the originally chosen door to the remaining unopened door.
        
            If the door the player selected is the same as the cardoor, then num. of wins is incremented.
            
            Finally, number of plays will be incremented.
        """
        self.state = 2
        if do_switch:
            self.players_choice = self.doors[(self.doors != self.players_choice) & (self.doors != self.montys_choice)][0]
        if self.players_choice == self.cardoor:
            self.num_wins += 1
        self.num_plays += 1
    
    def continue_play(self):
        """ Player opts to continue playing the game.
        
            The game returns to state 0, but the num. wins and num. plays stay untouched.
            
            A new cardoor is randomly chosen.
        """
        self.state = 0
        self.cardoor = np.random.choice(self.doors)
        self.players_choice = None
        self.montys_choice = None
        
    def reset(self):
        """ The entire game state is returned to its initial state.
        
            All counters and variables holding state are re-initialized.
        
        """
        self.state = 0
        self.prepare_game()
        

In brief:
* The `Monty` class represents a simple state model for the game.
* When an instance of the `Monty` game is created, the state-holding variables are initialized and a `cardoor` is randomly chosen.
* After the player initially picks a door, `Monty` opens one of the remaining doors that does not have the car behind it.
* The player can then choose to switch to the remaining unopened door or stick with her initial choice.
* `Monty` then checks whether the player has won and updates the state-holding variables for num. wins and num. plays.
* The player can continue playing, or stop and reset the game to its original state.

### As a short simulation program

Here is an example showing how to use the `Monty` class above to run a simulation to see how often the switching strategy succeeds.

In [11]:
np.random.seed(42)

trials = 10 ** 5

game = Monty()
for _ in range(trials):
    game.choose_door(np.random.choice([1,2,3]))
    game.switch_door(True)
    game.continue_play()
    
print('In {} trials, the switching strategy won {} times.'.format(game.num_plays, game.num_wins))
print('Success rate is: {:.3F}'.format(game.get_success_rate()))

In 100000 trials, the switching strategy won 66532 times.
Success rate is: 0.665
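
Since switching wins exactly when staying loses, the switching strategy succeeds with probability $1 - 1/3 = 2/3$. As a rough check on how close 0.665 is to that value, the sketch below compares the gap with the standard error of a simulated proportion, $\sqrt{p(1-p)/n}$. This check is an addition for illustration and is not part of the book's section; the counts are copied from the output above.

```python
import math

n = 100_000       # number of trials reported in the output above
wins = 66_532     # switching wins reported in the output above
p_hat = wins / n  # observed success rate
p = 2 / 3         # theoretical success probability of always switching

# Standard error of a proportion estimated from n independent trials
se = math.sqrt(p * (1 - p) / n)

# Distance between observed and theoretical values, in standard errors
z = (p_hat - p) / se

print('standard error: {:.4f}'.format(se))  # standard error: 0.0015
print('z-score: {:.2f}'.format(z))          # z-score: -0.90
```

The observed rate is less than one standard error below 2/3, which is well within the variation expected from $10^5$ trials.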


----

Joseph K. Blitzstein and Jessica Hwang, Harvard University and Stanford University, &copy; 2019 by Taylor and Francis Group, LLC