# Causal Inference

Pearl, Glymour, and Jewell (2016). Causal Inference in Statistics: A Primer.

In [1]:
import numpy as np
import matplotlib

import umucv.prob as pr

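# Selector: returns a function that picks component k of an outcome tuple
# (used below with >> to obtain a marginal distribution).
def S(k):
    return lambda x: x[k]

# Predicate: returns a function that tests whether component k equals v
# (used below with | to condition and with .prob() to query an event).
def equal(k,v):
    return lambda x: x[k] == v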

## Simpson's Paradox

In [2]:
exper = pr.P({
    ('Men','Drug','Recover'):81,
    ('Men','Drug','Fail'):    87-81,
    ('Men','NoDrug','Recover'): 234,
    ('Men','NoDrug','Fail'): 270-234,
    ('Women','Drug','Recover'):192,
    ('Women','Drug','Fail'):    263-192,
    ('Women','NoDrug','Recover'): 55,
    ('Women','NoDrug','Fail'): 80-55})

In [3]:
exper

 11.6%  ('Men', 'Drug', 'Recover')
  0.9%  ('Men', 'Drug', 'Fail')
 33.4%  ('Men', 'NoDrug', 'Recover')
  5.1%  ('Men', 'NoDrug', 'Fail')
 27.4%  ('Women', 'Drug', 'Recover')
 10.1%  ('Women', 'Drug', 'Fail')
  7.9%  ('Women', 'NoDrug', 'Recover')
  3.6%  ('Women', 'NoDrug', 'Fail')

In [4]:
exper >> S(0)

 51.0%  Men
 49.0%  Women

In [5]:
exper >> S(1)

 50.0%  Drug
 50.0%  NoDrug

In [6]:
exper >> S(2)

 80.3%  Recover
 19.7%  Fail
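
Judging from the outputs above, `>>` marginalizes the joint distribution through a function of the outcome, and `|` conditions it on a predicate. The following is a minimal plain-Python sketch of those two operations, written independently of `umucv.prob`; the names `joint`, `normalize`, `marginal`, and `condition` are hypothetical and are not part of that library.

```python
from collections import defaultdict

# Joint counts over (gender, treatment, outcome), same data as `exper`.
joint = {
    ('Men', 'Drug', 'Recover'): 81, ('Men', 'Drug', 'Fail'): 6,
    ('Men', 'NoDrug', 'Recover'): 234, ('Men', 'NoDrug', 'Fail'): 36,
    ('Women', 'Drug', 'Recover'): 192, ('Women', 'Drug', 'Fail'): 71,
    ('Women', 'NoDrug', 'Recover'): 55, ('Women', 'NoDrug', 'Fail'): 25,
}

def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}

def marginal(dist, f):
    """Distribution of f(outcome): add up the weight of outcomes mapped to the same value."""
    out = defaultdict(float)
    for outcome, w in dist.items():
        out[f(outcome)] += w
    return normalize(out)

def condition(dist, pred):
    """Keep only the outcomes that satisfy pred, then renormalize."""
    return normalize({o: w for o, w in dist.items() if pred(o)})

gender = marginal(joint, lambda x: x[0])
men_drug = condition(joint, lambda x: x[0] == 'Men' and x[1] == 'Drug')
print(gender)
print(men_drug)
```

Under these assumptions, `gender` reproduces the 51% / 49% split shown in cell 4, and `men_drug` corresponds to the conditional distribution computed later in the notebook for men who took the drug.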

Among men, the recovery rate is higher with the drug than without it:

In [7]:
exper | equal(1,'Drug') | equal(0,'Men')

 93.1%  ('Men', 'Drug', 'Recover')
  6.9%  ('Men', 'Drug', 'Fail')

In [8]:
exper | equal(1,'NoDrug') | equal(0,'Men')

 86.7%  ('Men', 'NoDrug', 'Recover')
 13.3%  ('Men', 'NoDrug', 'Fail')

Among women, the recovery rate is also higher with the drug:

In [9]:
exper | equal(1,'Drug') | equal(0,'Women')

 73.0%  ('Women', 'Drug', 'Recover')
 27.0%  ('Women', 'Drug', 'Fail')

In [10]:
exper | equal(1,'NoDrug') | equal(0,'Women')

 68.8%  ('Women', 'NoDrug', 'Recover')
 31.2%  ('Women', 'NoDrug', 'Fail')

But when gender is ignored and the two groups are pooled, the drug appears to be worse:

In [11]:
exper | equal(1,'Drug')

 23.1%  ('Men', 'Drug', 'Recover')
  1.7%  ('Men', 'Drug', 'Fail')
 54.9%  ('Women', 'Drug', 'Recover')
 20.3%  ('Women', 'Drug', 'Fail')

In [12]:
(exper | equal(1,'Drug')) >> S(2)

 78.0%  Recover
 22.0%  Fail

In [13]:
(exper | equal(1,'NoDrug')) >> S(2)

 82.6%  Recover
 17.4%  Fail
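
The reversal can be cross-checked directly from the raw counts, without the probability helper. This is a minimal sketch in plain Python; the `counts` dictionary and the `recovery_rate` function are names introduced here for illustration only.

```python
# (recovered, total) per (gender, treatment), taken from the table in cell 2.
counts = {
    ('Men', 'Drug'): (81, 87),
    ('Men', 'NoDrug'): (234, 270),
    ('Women', 'Drug'): (192, 263),
    ('Women', 'NoDrug'): (55, 80),
}

def recovery_rate(treatment, gender=None):
    """Recovery rate for a treatment, pooled over genders unless one is given."""
    rows = [rt for (g, t), rt in counts.items()
            if t == treatment and (gender is None or g == gender)]
    recovered = sum(r for r, _ in rows)
    total = sum(n for _, n in rows)
    return recovered / total

for gender in ('Men', 'Women', None):
    label = gender or 'All'
    print(f"{label:6s} Drug={recovery_rate('Drug', gender):.3f} "
          f"NoDrug={recovery_rate('NoDrug', gender):.3f}")
```

The drug wins within each gender but loses in the pooled row because the groups are unbalanced: most drug takers are women (263 of 350), whose recovery rate is lower regardless of treatment.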

## Adjustment formula

See Section 3.2 of Pearl, Glymour, and Jewell (2016).
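
For reference, the adjustment formula can be written as follows. The symbol names are a notational choice made here: $X$ is the treatment, $Y$ the outcome, and $Z$ the variable being adjusted for (gender in this example).

$$
P(Y = y \mid do(X = x)) = \sum_{z} P(Y = y \mid X = x, Z = z)\, P(Z = z)
$$

The average causal effect (ACE) is then the difference $P(Y = 1 \mid do(X = 1)) - P(Y = 1 \mid do(X = 0))$.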

In the causal model, gender influences both who takes the drug and who recovers, so we must adjust for gender:

In [14]:
# P(recover | drug and man)
p11 = (exper | equal(1,'Drug') | equal(0,'Men') ).prob(equal(2,'Recover'))
p11

0.9310344827586207

In [15]:
# P(recover | drug and woman)
p21 = (exper | equal(1,'Drug') | equal(0,'Women') ).prob(equal(2,'Recover'))
p21

0.7300380228136882

In [16]:
# P(recover | nodrug and man)
p12 = (exper | equal(1,'NoDrug') | equal(0,'Men') ).prob(equal(2,'Recover'))
p12

0.8666666666666667

In [17]:
# P(recover | nodrug and woman)
p22 = (exper | equal(1,'NoDrug') | equal(0,'Women') ).prob(equal(2,'Recover'))
p22

0.6875

In [18]:
# P(man)
p1 = exper.prob(equal(0,'Men'))
p1

0.51

In [19]:
# P(woman)
p2 = exper.prob(equal(0,'Women'))
p2

0.49

In [20]:
# P(recover | do(Drug))
d1 = p11*p1 + p21*p2
d1

0.8325462173856037

In [21]:
# P(recover | do(NoDrug))
d2 = p12*p1 + p22*p2
d2

0.778875

In [22]:
print(f"ACE = {100*(d1-d2):.0f}%")

ACE = 5%


> "A more informal interpretation of ACE here is that it is simply the difference in the fraction of the population that would recover if everyone took the drug compared to when no one takes the drug."
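
The step-by-step computation of `d1`, `d2`, and the ACE can be condensed into one function that weights each gender-specific recovery rate by that gender's share of the population. This is a sketch in plain Python under the assumption that gender is the only variable to adjust for; `counts`, `p_gender`, and `p_recover_do` are hypothetical names introduced for illustration.

```python
# (recovered, total) per (gender, treatment), taken from the table in cell 2.
counts = {
    ('Men', 'Drug'): (81, 87),
    ('Men', 'NoDrug'): (234, 270),
    ('Women', 'Drug'): (192, 263),
    ('Women', 'NoDrug'): (55, 80),
}

def p_gender(gender):
    """P(gender): share of the whole study population."""
    total = sum(n for _, n in counts.values())
    return sum(n for (g, _), (_, n) in counts.items() if g == gender) / total

def p_recover_do(treatment):
    """Adjustment formula: sum over genders of P(recover | treatment, gender) * P(gender)."""
    genders = sorted({g for g, _ in counts})
    result = 0.0
    for g in genders:
        recovered, n = counts[(g, treatment)]
        result += (recovered / n) * p_gender(g)
    return result

ace = p_recover_do('Drug') - p_recover_do('NoDrug')
print(f"ACE = {100 * ace:.0f}%")
```

Because the weights are the population shares of each gender rather than the shares among those who happened to take the drug, the imbalance between the treatment groups no longer distorts the comparison.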