# Chapter 6 - SCM Modeling the Monty Hall Problem

This notebook is a code companion to chapter 6 of the book [Causal AI](https://www.manning.com/books/causal-ai) by [Robert Osazuwa Ness](https://www.linkedin.com/in/osazuwa/).

<a href="https://colab.research.google.com/github/altdeep/causalML/blob/master/book/chapter%206/Monte_Hall.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This tutorial walks through an implementation of the Monty Hall problem as a structural causal model (SCM), built as a probabilistic model in pgmpy. Once the model is implemented, we'll use it only for basic probabilistic inference. In chapter 9, we'll use this SCM to do more advanced inferences.

In [None]:
#!pip install pgmpy==0.1.19

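An SCM has a set of exogenous variables, a set of endogenous variables, and "assignment functions" that deterministically set each endogenous variable from its parents, so that ultimately every endogenous variable is determined by the exogenous variables. Those assignment functions induce a causal DAG in which the exogenous variables are the root nodes. In this model, the exogenous variables are Host Inclination, Door with Car, Player First Choice, and Strategy; the endogenous variables are Host Door Selection, Player Second Choice, and Win or Lose. Knowing this, we'll use pgmpy to implement the model as a causal DAG, starting with the DAG itself.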
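As a minimal pure-Python sketch of this idea (not using pgmpy), the same SCM can be written as a sampler: the exogenous variables are drawn from uniform priors, and each endogenous variable is computed by a deterministic function of its parents. The function names are hypothetical, and the door list is assumed to be ordered from left-most to right-most, which is how "left"/"right" host inclination is interpreted here.

```python
import random

DOORS = ['1st', '2nd', '3rd']  # assumed ordering: left-most to right-most

def host_door_selection(inclination, car, first):
    """Host opens a door that hides no car and was not picked by the player."""
    options = [d for d in DOORS if d not in (car, first)]
    # Two options exist only when the player picked the car; inclination breaks the tie.
    return options[0] if inclination == 'left' else options[-1]

def player_second_choice(strategy, host, first):
    """Stay with the first pick, or switch to the one other unopened door."""
    if strategy == 'stay':
        return first
    return next(d for d in DOORS if d not in (host, first))

def win_or_lose(second, car):
    """The player wins exactly when the second choice hides the car."""
    return 'win' if second == car else 'lose'

def sample(rng):
    # Exogenous variables: independent draws from uniform priors.
    inclination = rng.choice(['left', 'right'])
    car = rng.choice(DOORS)
    first = rng.choice(DOORS)
    strategy = rng.choice(['stay', 'switch'])
    # Endogenous variables: fully determined by the exogenous draws.
    host = host_door_selection(inclination, car, first)
    second = player_second_choice(strategy, host, first)
    return strategy, win_or_lose(second, car)

rng = random.Random(0)
draws = [sample(rng) for _ in range(30_000)]
rates = {}
for s in ('stay', 'switch'):
    outcomes = [result for strategy, result in draws if strategy == s]
    rates[s] = outcomes.count('win') / len(outcomes)
    print(s, round(rates[s], 3))
```

With 30,000 draws, the simulated win rates should land near 1/3 for staying and 2/3 for switching.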

In [1]:
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete.CPD import TabularCPD
from pgmpy.inference import VariableElimination

monty_hall_model = BayesianNetwork([
    ('Host Inclination', 'Host Door Selection'),
    ('Door with Car', 'Host Door Selection'),
    ('Player First Choice', 'Host Door Selection'),
    ('Player First Choice', 'Player Second Choice'),
    ('Host Door Selection', 'Player Second Choice'),
    ('Strategy', 'Player Second Choice'),
    ('Player Second Choice', 'Win or Lose'),
    ('Door with Car', 'Win or Lose')
])

Next, we use the `TabularCPD` ("tabular conditional probability distribution") object to specify the probability distribution of each exogenous variable.

First, we build a CPD for Host Inclination. When the player's first choice is the door with the car, the host can open either of the two other doors. This variable is "left" when the host is inclined to open the left-most of those two doors, and "right" when the host is inclined to open the right-most one.

In [2]:
p_host_inclination = TabularCPD(
    variable='Host Inclination',
    variable_card=2,
    values=[[.5], [.5]],
    state_names={'Host Inclination': ['left', 'right']}
)

Next, we build a CPD for the variable representing which door hides the prize car. We assume each door is equally likely to have the car.

In [3]:
p_door_with_car = TabularCPD(
    variable='Door with Car',
    variable_card=3,
    values=[[1/3], [1/3], [1/3]],
    state_names={'Door with Car': ['1st', '2nd', '3rd']}
)

Next, a CPD for the variable representing the player's first door choice. Each door is equally likely to be chosen.

In [4]:
p_player_first_choice = TabularCPD(
    variable='Player First Choice',
    variable_card=3,
    values=[[1/3], [1/3], [1/3]],
    state_names={'Player First Choice': ['1st', '2nd', '3rd']}
)

Next is a CPD for the variable representing the player's strategy. "Stay" is the strategy of staying with the first choice, and "switch" is the strategy of switching doors. Both strategies are equally likely.

In [5]:
p_host_strategy = TabularCPD(
    variable='Strategy',
    variable_card=2,
    values=[[.5], [.5]],
    state_names={'Strategy': ['stay', 'switch']}
)

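To implement the deterministic assignment functions, we'll use `TabularCPD` with only values of 0 and 1, i.e., every outcome has probability either 1 or 0. This turns `TabularCPD` into a look-up table: each column corresponds to one combination of the parents' values (with the last variable in `evidence` changing fastest), and the single 1 in that column marks the value the assignment function returns.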
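A sketch of how such a look-up table can be generated rather than typed by hand: the hypothetical helper `deterministic_table` below enumerates every combination of parent values with `itertools.product` (which varies its last argument fastest, assumed here to match `TabularCPD`'s column order) and places a single 1 in the row of the function's output. As an example, it builds the table for the win/lose rule used later in this notebook.

```python
from itertools import product

def deterministic_table(func, child_states, evidence_states):
    """Return TabularCPD-style `values`: one row per child state, one column per
    combination of parent states, with a single 1 in each column."""
    # product() varies its last argument fastest (assumed TabularCPD column order).
    combos = list(product(*evidence_states))
    table = [[0] * len(combos) for _ in child_states]
    for col, parents in enumerate(combos):
        table[child_states.index(func(*parents))][col] = 1
    return table

doors = ['1st', '2nd', '3rd']
win_values = deterministic_table(
    lambda second, car: 'win' if second == car else 'lose',  # win iff second pick hides the car
    ['win', 'lose'],
    [doors, doors],  # evidence order: Player Second Choice, Door with Car
)
for row in win_values:
    print(row)
# [1, 0, 0, 0, 1, 0, 0, 0, 1]
# [0, 1, 1, 1, 0, 1, 1, 1, 0]
```

Generating tables this way keeps the rule readable and avoids hand-typed 0/1 mistakes in larger tables.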

We start with an assignment function for the host's door selection.

In [6]:
f_host_door_selection = TabularCPD(
    variable='Host Door Selection',
    variable_card=3,
    values=[
        [0,0,0,0,1,1,0,1,1,0,0,0,0,0,1,0,1,0],
        [1,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,1],
        [0,1,0,1,0,0,0,0,0,1,1,0,1,1,0,0,0,0]
    ],
    evidence=['Host Inclination', 'Door with Car', 'Player First Choice'],
    evidence_card=[2, 3, 3],
    state_names={
        'Host Door Selection':['1st', '2nd', '3rd'],
        'Host Inclination': ['left', 'right'],
        'Door with Car': ['1st', '2nd', '3rd'],
        'Player First Choice': ['1st', '2nd', '3rd']
    }
)
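The 18 columns above are easy to get wrong by hand. One way to check them, sketched here with the standard library only, is to write the host's rule as an ordinary function and regenerate the table in the same column order (last evidence variable fastest, so `Host Inclination` is the outermost loop). The function name `host_rule` is hypothetical, and "left"/"right" is assumed to mean the lower-/higher-numbered of the two doors the host may open.

```python
from itertools import product

doors = ['1st', '2nd', '3rd']
inclinations = ['left', 'right']

def host_rule(inclination, car, first):
    # The host never opens the player's door or the door with the car.
    options = [d for d in doors if d not in (car, first)]
    # Two options only when the player picked the car; inclination breaks the tie.
    return options[0] if inclination == 'left' else options[-1]

# Values hard-coded in f_host_door_selection above.
expected = [
    [0,0,0,0,1,1,0,1,1,0,0,0,0,0,1,0,1,0],
    [1,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,1],
    [0,1,0,1,0,0,0,0,0,1,1,0,1,1,0,0,0,0],
]

generated = [[0] * 18 for _ in doors]
for col, (inc, car, first) in enumerate(product(inclinations, doors, doors)):
    generated[doors.index(host_rule(inc, car, first))][col] = 1

print(generated == expected)  # True
```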

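Next, we have the structural assignment function for the player's second choice. With "stay", the second choice equals the first choice; with "switch", it is the one remaining door that is neither the first choice nor the door the host opened.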

In [7]:
f_second_choice = TabularCPD(
    variable='Player Second Choice',
    variable_card=3,
    values=[
        [1,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,1,0],
        [0,1,0,0,1,0,0,1,0,1,0,1,0,1,0,1,0,1],
        [0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,0,0,0]
    ],
    evidence=['Strategy', 'Host Door Selection', 'Player First Choice'],
    evidence_card=[2, 3, 3],
    state_names={
        'Player Second Choice': ['1st', '2nd', '3rd'],
        'Strategy': ['stay', 'switch'],
        'Host Door Selection': ['1st', '2nd', '3rd'],
        'Player First Choice': ['1st', '2nd', '3rd']
    }
)
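The same check works for the player's second choice, with `Strategy` now the outermost loop. Columns where the host's door equals the player's first choice can never occur (the host never opens the player's door), so their contents do not affect any inference; the sketch below compares only the feasible columns. The name `second_choice_rule` is hypothetical.

```python
from itertools import product

doors = ['1st', '2nd', '3rd']
strategies = ['stay', 'switch']

def second_choice_rule(strategy, host, first):
    if strategy == 'stay':
        return first
    # Switch to the one door that is neither opened by the host nor the first pick.
    return next(d for d in doors if d not in (host, first))

# Values hard-coded in f_second_choice above.
expected = [
    [1,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,1,0],
    [0,1,0,0,1,0,0,1,0,1,0,1,0,1,0,1,0,1],
    [0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,0,0,0],
]

mismatches = []
for col, (strategy, host, first) in enumerate(product(strategies, doors, doors)):
    if host == first:
        continue  # impossible: the host never opens the player's chosen door
    row = doors.index(second_choice_rule(strategy, host, first))
    if expected[row][col] != 1:
        mismatches.append(col)

print(mismatches)  # []
```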

Next is the assignment function for whether the player wins or loses. The player wins exactly when the second choice is the door with the car.

In [8]:
f_win_or_lose = TabularCPD(
    variable='Win or Lose',
    variable_card=2,
    values=[
        [1,0,0,0,1,0,0,0,1],
        [0,1,1,1,0,1,1,1,0],
    ],
    evidence=['Player Second Choice', 'Door with Car'],
    evidence_card=[3, 3],
    state_names={
        'Win or Lose': ['win', 'lose'],
        'Player Second Choice': ['1st', '2nd', '3rd'],
        'Door with Car': ['1st', '2nd', '3rd']
    }
)

Finally, we add the probability distributions and the assignment functions to the causal DAG to build the causal graphical model.

In [9]:
monty_hall_model.add_cpds(
    p_host_inclination,
    p_door_with_car,
    p_player_first_choice,
    p_host_strategy,
    f_host_door_selection,
    f_second_choice,
    f_win_or_lose
)

Since this is a probabilistic graphical model, we can run standard graphical-model inference on it. For example, we can query the probability of winning or losing under the stay strategy and under the switch strategy.

In [10]:
inference_engine = VariableElimination(monty_hall_model)
print(inference_engine.query(['Win or Lose'], evidence={'Strategy': 'stay'}))
print(inference_engine.query(['Win or Lose'], evidence={'Strategy': 'switch'}))

+-------------------+--------------------+
| Win or Lose       |   phi(Win or Lose) |
| Win or Lose(win)  |             0.3333 |
+-------------------+--------------------+
| Win or Lose(lose) |             0.6667 |
+-------------------+--------------------+


+-------------------+--------------------+
| Win or Lose       |   phi(Win or Lose) |
| Win or Lose(win)  |             0.6667 |
+-------------------+--------------------+
| Win or Lose(lose) |             0.3333 |
+-------------------+--------------------+
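These two numbers can be checked by hand. Because the host always opens a door that hides a goat and was not the player's pick, staying wins exactly when the first pick was the car, and switching wins exactly when it was not, whatever the host's inclination. The sketch below (plain Python, not pgmpy; the `wins` helper is hypothetical) enumerates the nine equally likely (car, first choice) pairs with exact fractions.

```python
from fractions import Fraction
from itertools import product

doors = ['1st', '2nd', '3rd']

def wins(strategy, car, first):
    # The host always reveals a goat, so staying wins iff the first pick was right
    # and switching wins iff it was wrong.
    return (first == car) if strategy == 'stay' else (first != car)

# Car location and first pick are independent and uniform: each pair has probability 1/9.
# Host Inclination never changes the outcome, so it sums out.
p_win = {
    strategy: sum(Fraction(1, 9) for car, first in product(doors, doors) if wins(strategy, car, first))
    for strategy in ('stay', 'switch')
}
print(p_win['stay'], p_win['switch'])  # 1/3 2/3
```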


We can also reason in the other direction: given that someone won the car, we infer that they more likely used the switch strategy.
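This follows from Bayes' rule, using the uniform prior on Strategy and the win probabilities of 1/3 (stay) and 2/3 (switch) from the previous query:

$$
P(\text{switch} \mid \text{win}) = \frac{P(\text{win} \mid \text{switch})\,P(\text{switch})}{P(\text{win} \mid \text{stay})\,P(\text{stay}) + P(\text{win} \mid \text{switch})\,P(\text{switch})} = \frac{\tfrac{2}{3}\cdot\tfrac{1}{2}}{\tfrac{1}{3}\cdot\tfrac{1}{2} + \tfrac{2}{3}\cdot\tfrac{1}{2}} = \frac{1/3}{1/2} = \frac{2}{3}
$$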

In [11]:
print(inference_engine.query(['Strategy'], evidence={'Win or Lose': 'win'}))

+------------------+-----------------+
| Strategy         |   phi(Strategy) |
| Strategy(stay)   |          0.3333 |
+------------------+-----------------+
| Strategy(switch) |          0.6667 |
+------------------+-----------------+


In chapter 9, we'll use this SCM model to do more advanced inferences.