# Introduction to Causal Inference

```python
import pandas as pd
import numpy as np
%matplotlib inline
```

Why Study Causal Inference? 
- What are some mistakes that could happen if we do a naive analysis?
  - Confounding example
  - The definition of the do-operator

RCTs
  - To find causal effects, one could randomize the treatment assignment.

Chain

Fork

Collider

Descendant of a collider



Backdoor Criterion
  - When is seeing = doing?
  - assumes you know the structure of the problem well enough
  
do-calculus

Frontdoor Criterion
  - lets us deal with partial knowledge of the graph / unmeasured variables
  
Crazy example

Transportability

Resources

Book Club

### Hypothetical Scenario

We want to improve students' graduation rates. Suppose we have some treatment $T$ and we want to analyze its efficacy. We find that, on average, the graduation rate of students who get treatment $T$ is about 35 percentage points lower than that of students who don't. In other words,
\begin{equation}
\begin{aligned}
    P(G=1 \mid T=1) - P(G=1 \mid T=0) \approx -35\%
\end{aligned}
\end{equation}

### Question:
Should we recommend treatment T to students?

### Answer:

It depends on the data-generating mechanism!

## Correlation Does Not Necessarily Imply Causation
Here's an example where looking at associations naively leads to the wrong conclusion. 

Treatment is negatively associated with graduation rate, even though the treatment is actually beneficial, on average!

How could that be? Suppose that, in a hypothetical world, only three variables matter: the treatment $T$ (here, tutoring), graduation $G$, and whether a student is classified as "at-risk" $R$. At-risk students are more likely to be sent to tutoring and, by definition, are less likely to graduate. In other words, $R$ is a common cause of $T$ and $G$; equivalently, $T$ and $G$ are **confounded** by $R$. So if many of the students who receive the treatment are at risk, the treatment can be negatively associated with the outcome *even though it actually improves the outcome for most or all individuals*.

![Risk is a common cause of treatment and graduation](./img/risk-tutoring-graduate.png)
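The graph explains the sign flip. Conditioning on $T$ averages the risk-specific graduation rates with weights $P(R=r \mid T=t)$, which differ between the treated and untreated groups because $R$ drives $T$. Intervening on $T$ (the do-operator defined below) instead uses the same weights $P(R=r)$ for both groups. Assuming, as in the graph, that $R$ is the only common cause of $T$ and $G$:

\begin{equation}
\begin{aligned}
    P(G=1 \mid T=t) &= \sum_{r \in \{0,1\}} P(G=1 \mid T=t, R=r)\, P(R=r \mid T=t) \\
    P(G=1 \mid do(T=t)) &= \sum_{r \in \{0,1\}} P(G=1 \mid T=t, R=r)\, P(R=r)
\end{aligned}
\end{equation}

When most treated students are at risk, the first average is pulled toward the low at-risk graduation rates for $t=1$ but not for $t=0$, which is how the associational difference can turn negative while every stratum-specific effect is positive.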

### Confounding Example Code

```python
sample_size = 1000000
```

```python
def risk(sample_size=10000):
    """
    Generate two types of people: high-risk (1) vs. low-risk (0).
    Risk could, for example, refer to the risk of dropping out of high school.
    About 30% of people are high-risk.
    """
    return np.random.binomial(n=1, p=0.3, size=sample_size)
```

```python
def tutoring(riskiness, proba_tutor_given_not_risky=0.1, proba_tutor_given_risky=0.9):
    """
    Non-risky people (riskiness == 0) have a 10% chance of receiving tutoring.
    Risky people (riskiness == 1) have a 90% chance of receiving tutoring.
    """
    probability_of_receiving_tutoring = \
        (riskiness == 0) * proba_tutor_given_not_risky + \
        (riskiness == 1) * proba_tutor_given_risky

    return np.random.binomial(n=1, p=probability_of_receiving_tutoring)
```

```python
def graduate(riskiness, tutored):
    """
        If risky and tutoring, graduation rate = 0.3
        If risky and not tutoring, graduation rate = 0.2
        If not-risky and tutoring, graduation rate = 0.9
        If not-risky and not tutoring, graduation rate = 0.8

        Notice that, within each risk group, tutoring increases graduation rates by 10 percentage points.
    """

    risky_and_tutored_grad_rate = 0.3
    risky_and_not_tutored_grad_rate = 0.2
    not_risky_and_tutored_grad_rate = 0.9
    not_risky_and_not_tutored_grad_rate = 0.8

    graduation_probas = (riskiness == 1) * (tutored == 1) * risky_and_tutored_grad_rate + \
        (riskiness == 1) * (tutored == 0) * risky_and_not_tutored_grad_rate + \
        (riskiness == 0) * (tutored == 1) * not_risky_and_tutored_grad_rate + \
        (riskiness == 0) * (tutored == 0) * not_risky_and_not_tutored_grad_rate

    return np.random.binomial(n=1, p=graduation_probas)
```

```python
# Observational data: risk influences both who gets tutored and who graduates.
riskiness = risk(sample_size)
tutored = tutoring(riskiness)
graduated = graduate(riskiness, tutored)
```

```python
df = pd.DataFrame({
    'risk': riskiness,
    'tutored': tutored,
    'graduated': graduated
})
```

Associational Risk Difference: $P(G=1 \mid T=1) - P(G=1 \mid T=0) \approx -35\%$

```python
round(
    df[df['tutored'] == 1].graduated.mean() - df[df['tutored'] == 0].graduated.mean(),
    2
)
```

-0.35
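With a million simulated students, the sample estimate should sit very close to the exact population value implied by the model. As a check, the sketch below recomputes both the associational difference and the interventional (do) difference exactly, from the probabilities hard-coded in `risk()`, `tutoring()` and `graduate()`, using Python's `fractions` module. The helper names (`p_grad_given_t`, `p_grad_do_t`, etc.) are illustrative and not part of the original notebook.

```python
from fractions import Fraction

# Model parameters copied from risk(), tutoring() and graduate().
# Helper names below are illustrative, not from the original notebook.
P_RISKY = Fraction(3, 10)                              # P(R=1)
P_TUTOR = {0: Fraction(1, 10), 1: Fraction(9, 10)}     # P(T=1 | R=r)
P_GRAD = {                                             # P(G=1 | R=r, T=t), keyed by (r, t)
    (1, 1): Fraction(3, 10), (1, 0): Fraction(2, 10),
    (0, 1): Fraction(9, 10), (0, 0): Fraction(8, 10),
}


def p_risk(r):
    """P(R=r)."""
    return P_RISKY if r == 1 else 1 - P_RISKY


def p_tutor(t, r):
    """P(T=t | R=r)."""
    return P_TUTOR[r] if t == 1 else 1 - P_TUTOR[r]


def p_grad_given_t(t):
    """Observational P(G=1 | T=t): strata weighted by P(R=r | T=t) (Bayes' rule)."""
    joint = {r: p_risk(r) * p_tutor(t, r) for r in (0, 1)}  # P(R=r, T=t)
    p_t = sum(joint.values())                               # P(T=t)
    return sum(P_GRAD[(r, t)] * joint[r] / p_t for r in (0, 1))


def p_grad_do_t(t):
    """Interventional P(G=1 | do(T=t)): strata weighted by P(R=r) instead."""
    return sum(P_GRAD[(r, t)] * p_risk(r) for r in (0, 1))


associational = p_grad_given_t(1) - p_grad_given_t(0)
causal = p_grad_do_t(1) - p_grad_do_t(0)

print(float(associational))  # about -0.349
print(float(causal))         # 0.1
```

Exact fractions avoid any sampling noise, so the two numbers can be compared directly with the simulated estimates.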

However, if we run a randomized controlled trial (i.e. we randomize the assignment of the treatment), we see that the treatment actually improves graduation rates by 10 percentage points, on average!

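```python
# Randomized trial: treatment is assigned without regard to risk, so the treated
# and untreated arms are independent random samples from the same population.
treated_sample = risk(sample_size)
untreated_sample = risk(sample_size)

round(
    graduate(treated_sample, tutored=1).mean() - graduate(untreated_sample, tutored=0).mean(),
    2
)
```

0.1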

Causal Risk Difference: $P(G=1 \mid do(T=1)) - P(G=1 \mid do(T=0)) = 10\%$

If we looked only at the associational risk difference (i.e. compared the graduation rates of those who were treated with those who weren't), we would erroneously conclude that the treatment is harmful on average (i.e. that it lowers graduation rates), **but in actuality, it boosts graduation rates, on average, by 10 percentage points!**
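Randomization is not the only way out. The outline lists the backdoor criterion as the tool for deciding when seeing equals doing; in this model $R$ is the only common cause of $T$ and $G$, so adjusting for it should recover the causal effect from observational data alone. A minimal sketch, assuming $R$ is measured and is the only confounder: it re-simulates the confounded data with a seeded `numpy` generator and applies the adjustment formula $\sum_r P(G=1 \mid T=t, R=r)\,P(R=r)$. Variable names such as `obs` and `adjusted` are hypothetical and chosen so they don't overwrite the notebook's `risk()` function.

```python
import numpy as np
import pandas as pd

# Hypothetical names throughout; parameters mirror risk(), tutoring() and graduate().
rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 1_000_000

risk_obs = rng.binomial(n=1, p=0.3, size=n)
tutored_obs = rng.binomial(n=1, p=np.where(risk_obs == 1, 0.9, 0.1))
# Base rate 0.2 (risky) or 0.8 (not risky), plus 0.1 if tutored.
grad_rate = np.where(risk_obs == 1, 0.2, 0.8) + 0.1 * tutored_obs
graduated_obs = rng.binomial(n=1, p=grad_rate)
obs = pd.DataFrame({'risk': risk_obs, 'tutored': tutored_obs, 'graduated': graduated_obs})

# Naive (associational) difference: compares the groups as observed.
naive = (obs.loc[obs['tutored'] == 1, 'graduated'].mean()
         - obs.loc[obs['tutored'] == 0, 'graduated'].mean())

# Backdoor adjustment: estimate each stratum, then weight the strata by P(R=r).
grad_by_stratum = obs.groupby(['risk', 'tutored'])['graduated'].mean()  # P(G=1 | R=r, T=t)
p_r = obs['risk'].value_counts(normalize=True)                          # P(R=r)


def adjusted(t):
    """Estimate of P(G=1 | do(T=t)) via the adjustment formula."""
    return sum(grad_by_stratum[(r, t)] * p_r[r] for r in (0, 1))


adjusted_diff = adjusted(1) - adjusted(0)
print(round(naive, 2), round(adjusted_diff, 2))
```

The naive difference should land near the associational value computed earlier, while the adjusted difference should land near the 10-percentage-point causal effect, without any randomization.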

## Randomized Controlled Trials (A/B Testing)


Pros:
- Because treatment is assigned at random, the treatment and control groups differ, in expectation, only in whether they received the treatment. Therefore, any systematic difference in outcomes between the two can be attributed to the treatment!
- Good [*internal validity*](https://en.wikipedia.org/wiki/Internal_validity).
- Easy to understand, easy to interpret.

Fig 1: Treatment Group in RCT (one square = 100k of that animal)

|*|*|*|
|-|-|-|
| ![Apple-dog](https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/225/dog-face_1f436.png) | ![Apple-cat](https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/225/cat_1f408.png) | ![Apple-dog](https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/225/dog-face_1f436.png) |
| ![Apple-cat](https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/225/cat_1f408.png) | ![Apple-dog](https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/225/dog-face_1f436.png) | ![Apple-cat](https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/225/cat_1f408.png) |
| ![Apple-cat](https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/225/cat_1f408.png) | ![Apple-dog](https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/225/dog-face_1f436.png) | ![Apple-cat](https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/225/cat_1f408.png) |

Fig 2: Control Group in RCT (one square = 100k of that animal)

|*|*|*|
|-|-|-|
| ![Apple-dog](https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/225/dog-face_1f436.png) | ![Apple-cat](https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/225/cat_1f408.png) | ![Apple-dog](https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/225/dog-face_1f436.png) |
| ![Apple-cat](https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/225/cat_1f408.png) | ![Apple-dog](https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/225/dog-face_1f436.png) | ![Apple-cat](https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/225/cat_1f408.png) |
| ![Apple-cat](https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/225/cat_1f408.png) | ![Apple-dog](https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/225/dog-face_1f436.png) | ![Apple-cat](https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/225/cat_1f408.png) |
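The figures show the key property: randomization gives both groups the same mix. In the tutoring example, the same should hold for risk. A minimal sketch (hypothetical variable names, seeded `numpy` generator) that compares the share of at-risk students in each arm when tutoring is assigned as in `tutoring()` versus by a fair coin flip; the 50/50 split is an assumption, and any assignment probability that ignores risk would do.

```python
import numpy as np

# Hypothetical names; the 30% at-risk rate and 90%/10% tutoring rates mirror the notebook.
rng = np.random.default_rng(1)  # fixed seed for reproducibility
n = 1_000_000
at_risk = rng.binomial(n=1, p=0.3, size=n)

# Observational assignment: at-risk students are far more likely to be tutored.
t_observed = rng.binomial(n=1, p=np.where(at_risk == 1, 0.9, 0.1))
# RCT assignment (assumed 50/50): a fair coin flip that ignores risk entirely.
t_randomized = rng.binomial(n=1, p=0.5, size=n)


def share_at_risk(assignment, arm):
    """Fraction of students in the given arm (0 or 1) who are at risk."""
    return at_risk[assignment == arm].mean()


for label, assignment in [('observational', t_observed), ('randomized', t_randomized)]:
    print(f"{label:>13}: P(R=1 | T=1) = {share_at_risk(assignment, 1):.3f}, "
          f"P(R=1 | T=0) = {share_at_risk(assignment, 0):.3f}")
```

Under observational assignment the treated arm is mostly at-risk students while the untreated arm is mostly not; under coin-flip assignment both arms should show roughly the population's 30% at-risk share.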


Cons: 

- Might be unethical or very expensive.
- Still susceptible to selection bias.
- Questionable transportability (e.g. if an RCT is run on mice, would its results apply to humans?).


## Confounding

### Chains

### Forks

### Colliders

### Descendants of Colliders

### Backdoor Criterion

### $do$-calculus

### Front-door Criterion

### Complex Arbitrary DAG with Unmeasured Variables Example

### Mediation & Direct Effects

### Transportability

# Resources


| Image | Notes | Link |
| - | - | - |
| <img src='https://prodimage.images-bn.com/pimages/9781541698963_p0_v1_s600x595.jpg' alt='Book of Why: The New Science of Cause & Effect cover' width=500> | An introduction meant for the general public. It is still technical and has some math, but it focuses more on stories and anecdotes than on derivations. | [Book of Why: The New Science of Cause & Effect](https://www.amazon.com/Book-Why-Science-Cause-Effect/dp/046509760X) |
| <img alt='Causal Inference in Statistics: A Primer' src='https://s3.amazonaws.com/vh-woo-images/causal-inference-in-statistics-a-primer-1st-edition.jpg' width=500> | Recommended by Pearl to be read after the Book of Why. Dives more into the math. Has end-of-chapter exercises. *Note: I have the solutions manual! I told Pearl I was self-studying and he graciously gave me a copy!* | [Causal Inference in Statistics: A Primer](https://www.amazon.com/Causal-Inference-Statistics-Judea-Pearl/dp/1119186846) |
| <img src='https://images-na.ssl-images-amazon.com/images/I/511aGcbGLyL._SX343_BO1,204,203,200_.jpg' alt='Causality' width=500> | Goes more in-depth than the Primer book. | [Causality](https://www.amazon.com/Causality-Reasoning-Inference-Judea-Pearl/dp/052189560X) |
| <img alt='Causal Inference: The Mixtape' src='https://i.gr-assets.com/images/S/compressed.photo.goodreads.com/books/1566276665i/47867837._UY630_SR1200,630_.jpg' width=500> | Has a section on DAGs, but focuses more on causal inference techniques commonly used in economics. *FREELY AVAILABLE*. | [Causal Inference: The Mixtape](https://www.scunning.com/mixtape.html) |
| <img alt='Causal Diagrams: Draw Your Assumptions Before Your Conclusions' src='./img/causal-diagrams-draw-your-assumptions.png' width=500> | *FREE* course on edX. Uses epidemiological case studies to show why drawing your assumptions is important. | [Causal Diagrams: Draw Your Assumptions Before Your Conclusions](https://online-learning.harvard.edu/course/causal-diagrams-draw-your-assumptions-your-conclusions) |
| not applicable | *Causal Inference: What If* is a book that dives into Hernán and Robins' approach, which combines potential outcomes with DAGs. *FREE*. | [Causal Inference: What If](https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/) |