# Getting started with causal inference: A more advanced example

We will load a sample dataset and estimate the causal effect of a (pre-specified) treatment variable on a (pre-specified) outcome variable.

First, make sure Python can find the DoWhy code (the `datasets` module imported below), then load the required packages.

In [1]:
import numpy as np
import pandas as pd
from collections import namedtuple
import datasets 

Let us first load a dataset. For simplicity, we simulate a dataset with linear relationships between common causes and treatment, and common causes and outcome. 

`beta` is the true causal effect, set to 10 here.

In [2]:
data = datasets.linear_dataset(beta=10,
        num_common_causes=5,
        num_instruments=2,
        num_samples=10000, 
        treatment_is_binary=True)
df = data["df"]
print(df[["v", "y", "Z0"]].head())  

     v         y   Z0
0  0.0 -4.023765  1.0
1  0.0 -2.948458  1.0
2  0.0 -7.349936  1.0
3  0.0 -2.757042  1.0
4  0.0 -3.890549  1.0


Note that the data is stored in a pandas DataFrame.
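The `datasets.linear_dataset` helper itself is not shown in this notebook. As a rough illustration of the kind of data described above, the following sketch simulates common causes that linearly affect both treatment and outcome, plus binary instruments that affect only the treatment. The function name `simulate_linear_dataset`, the coefficient ranges, and the median-threshold rule for the binary treatment are assumptions made for this sketch, not DoWhy's implementation.

```python
import numpy as np
import pandas as pd

def simulate_linear_dataset(beta, num_common_causes, num_instruments,
                            num_samples, seed=0):
    """Hypothetical stand-in for datasets.linear_dataset (not the DoWhy code)."""
    rng = np.random.default_rng(seed)
    # Common causes X and binary instruments Z.
    X = rng.normal(size=(num_samples, num_common_causes))
    Z = rng.binomial(1, 0.5, size=(num_samples, num_instruments)).astype(float)
    # Assumed coefficient ranges, chosen only for illustration.
    c_treat = rng.uniform(0, 1, size=num_common_causes)
    c_out = rng.uniform(0, 1, size=num_common_causes)
    c_inst = rng.uniform(1, 2, size=num_instruments)
    # Linear treatment score, thresholded at its median to get a binary treatment.
    score = X @ c_treat + Z @ c_inst + rng.normal(size=num_samples)
    v = (score > np.median(score)).astype(float)
    # Outcome depends linearly on treatment (effect beta) and common causes.
    y = beta * v + X @ c_out + rng.normal(size=num_samples)
    columns = {"v": v, "y": y}
    columns.update({f"X{i}": X[:, i] for i in range(num_common_causes)})
    columns.update({f"Z{i}": Z[:, i] for i in range(num_instruments)})
    return pd.DataFrame(columns)

sim_df = simulate_linear_dataset(beta=10, num_common_causes=5,
                                 num_instruments=2, num_samples=10000)
print(sim_df[["v", "y", "Z0"]].head())
```

Because the instruments enter only the treatment equation, they satisfy the exclusion restriction by construction in this sketch.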

## Step 1, Model: Input causal graph

We now input a causal graph in the DOT graph format.

<img src="advanced_causal_model.png">

The result is a more complicated causal graph. Identification and estimation remain to be done.
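The DOT source for the graph is not included in this notebook. As a sketch of what such an input might look like, the snippet below builds a DOT string for the variables in the dataset. The edge list is an assumption based on how the data is described (common causes point to treatment and outcome, instruments point to treatment only, and `U` is an unobserved confounder); the actual graph in the image may differ.

```python
# Assumed variable names, matching the columns of the simulated dataset.
common_causes = [f"X{i}" for i in range(5)]
instruments = ["Z0", "Z1"]

# Assumed edges: treatment -> outcome, common causes -> both,
# instruments -> treatment only, unobserved U -> both.
edges = [("v", "y")]
edges += [(x, "v") for x in common_causes]
edges += [(x, "y") for x in common_causes]
edges += [(z, "v") for z in instruments]
edges += [("U", "v"), ("U", "y")]

# Serialize the edge list in DOT syntax.
dot_graph = "digraph { " + " ".join(f"{a} -> {b};" for a, b in edges) + " }"
print(dot_graph)
```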

## Step 2, Identify: Use graph criteria

Estimand type: ate
### Estimand : 1
**Estimand name**: backdoor

**Estimand expression**:

d/dv (Expectation(y|X2,Z0,X1,X4,X0,Z1,X3))

**Estimand assumption 1**, 

Unconfoundedness: If U→v and U→y then P(y|v,X2,Z0,X1,X4,X0,Z1,X3,U) = P(y|v,X2,Z0,X1,X4,X0,Z1,X3)

### Estimand : 2
**Estimand name**: iv

**Estimand expression**:

Expectation(Derivative(y, Z1)/Derivative(v, Z1))

**Estimand assumption 1**, 

Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)

**Estimand assumption 2**, 

As-if-random: If U→→y then ¬(U →→Z1,Z0)
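The backdoor estimand listed above is the derivative of the conditional expectation of `y` with respect to `v`, holding the adjustment set fixed. Under a linear model, one simple way to estimate it is the coefficient on `v` in a least-squares regression of `y` on `v` and the adjustment variables. The sketch below illustrates this on toy data generated inline; the data-generating coefficients are assumptions, and the approach is only valid when unconfoundedness holds, as it does here by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
true_effect = 10.0

# Observed common causes, binary treatment, and outcome (toy data).
X = rng.normal(size=(n, 5))
c_treat = np.full(5, 0.5)
c_out = np.full(5, 1.0)
v = (X @ c_treat + rng.normal(size=n) > 0).astype(float)
y = true_effect * v + X @ c_out + rng.normal(size=n)

# Regress y on [intercept, v, X]; the coefficient on v is the backdoor estimate.
design = np.column_stack([np.ones(n), v, X])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
backdoor_estimate = coefs[1]
print("Backdoor (regression adjustment) estimate:", backdoor_estimate)
```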

## Step 3, Estimate: Use instrumental variable estimator

In [3]:
def instrumental_variables_estimator(df, treatment_name="v"):
    # Use Z0 as the instrument.
    instrument = df["Z0"]
    num_unique_values = len(np.unique(instrument))
    instrument_is_binary = (num_unique_values <= 2)
    if instrument_is_binary:
        # Obtain estimate by Wald estimator: difference in mean outcome
        # divided by difference in mean treatment across instrument values.
        y1_z = np.mean(df["y"][instrument == 1])
        y0_z = np.mean(df["y"][instrument == 0])
        x1_z = np.mean(df[treatment_name][instrument == 1])
        x0_z = np.mean(df[treatment_name][instrument == 0])
        num = y1_z - y0_z
        deno = x1_z - x0_z
        iv_est = num / deno
    else:
        # Obtain estimate by Pearl (1995) ratio estimator.
        # y = beta*x + u; multiply both sides by z and take expectation.
        num_yz = np.dot(df["y"], instrument)
        deno_xz = np.dot(df[treatment_name], instrument)
        iv_est = num_yz / deno_xz

    CausalEstimate = namedtuple('CausalEstimate', ['value'])
    estimate = CausalEstimate(value=iv_est)
    return estimate

causal_estimate = instrumental_variables_estimator(df)
print("Causal Estimate is " + str(causal_estimate.value))

Causal Estimate is 4.72282685699
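To see why the Wald ratio is useful, it can help to try it on data where the true effect is known and an unobserved confounder is present. The sketch below generates such toy data inline (all coefficients are assumptions chosen for illustration) and compares a naive difference in means with the Wald estimate computed from a binary instrument.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
true_effect = 10.0

u = rng.normal(size=n)              # unobserved confounder
z = rng.binomial(1, 0.5, size=n)    # binary instrument, independent of u
# Treatment depends on the instrument and on the confounder.
v = (2.0 * z + u + rng.normal(size=n) > 1.0).astype(float)
# Outcome depends on treatment and confounder, but not directly on z.
y = true_effect * v + 3.0 * u + rng.normal(size=n)

# Naive comparison of treated vs. untreated is biased by u.
naive = y[v == 1].mean() - y[v == 0].mean()
# Wald estimator: effect of z on y divided by effect of z on v.
wald = (y[z == 1].mean() - y[z == 0].mean()) / (v[z == 1].mean() - v[z == 0].mean())

print("Naive difference in means:", naive)
print("Wald IV estimate:", wald)
```

In this toy setup the naive difference overstates the effect because `u` raises both treatment and outcome, while the Wald ratio stays close to the true value because the instrument is independent of `u`.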


## Step 4, Refute: Sensitivity analysis

We now try to refute the obtained estimate.

### Adding a random common cause variable

In [4]:
def refute_estimate_random(df, causal_estimate):
    num_rows = df.shape[0]
    # Add a random column that is independent of everything else.
    # The estimator above only reads Z0, the treatment and y, so this
    # extra column cannot change its result.
    new_data = df.assign(w_random=np.random.randn(num_rows))
    new_effect = instrumental_variables_estimator(new_data)
    refute = new_effect.value
    return refute

res_random = refute_estimate_random(df, causal_estimate)
print("Causal estimate was: {0}".format(causal_estimate.value))
print("New refuted estimate is: {0}".format(res_random))

Causal estimate was: 4.722826856987969
New refuted estimate is: 4.722826856987969


### Replacing treatment with a random (placebo) variable

In [5]:
def refute_estimate_placebo(df, estimate, placebo_type="permute", sims=100):
    num_rows = df.shape[0]
    refute = 0
    for i in range(sims):
        if placebo_type == "permute":
            # Shuffle the observed treatment values across rows.
            new_treatment = df["v"].sample(frac=1).values
        else:
            # Replace the treatment with random noise.
            new_treatment = np.random.randn(num_rows)
        new_data = df.assign(placebo=new_treatment)

        new_estimate = instrumental_variables_estimator(new_data, treatment_name="placebo")

        refute += new_estimate.value
    # Average the placebo estimates over all simulations.
    refute = refute / sims
    return refute

res_placebo = refute_estimate_placebo(df, causal_estimate)
print("Causal estimate was: {0}".format(causal_estimate.value))
print("New refuted estimate is: {0}".format(res_placebo))

Causal estimate was: 4.722826856987969
New refuted estimate is: -2.556359767707989


### Removing a random subset of the data

In [6]:
def refute_estimate_subset(df, causal_estimate, subset_fraction=0.8):
    # Re-estimate the effect on a random subset of the rows.
    new_data = df.sample(frac=subset_fraction)
    new_effect = instrumental_variables_estimator(new_data)
    refute = new_effect.value
    return refute

res_subset = refute_estimate_subset(df, causal_estimate)
print("Causal estimate was: {0}".format(causal_estimate.value))
print("New refuted estimate is: {0}".format(res_subset))


Causal estimate was: 4.722826856987969
New refuted estimate is: 9.706927626426125


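As you can see, our estimate is somewhat sensitive to simple refutations: adding a random common cause leaves it unchanged at about 4.72, but the placebo treatment yields about -2.56 instead of a value near zero, and an 80% random subset yields about 9.71.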
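A single random subset gives only one draw, so it can be hard to judge how much of the change is sampling noise. One possible extension is to repeat the subset refutation many times and look at the spread of the estimates. The sketch below does this on toy data generated inline; the helper names `wald_estimate` and `subset_estimates`, the toy coefficients, and the choice of 50 repetitions are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def wald_estimate(df, treatment_name="v", outcome_name="y", instrument_name="Z0"):
    """Wald ratio for a binary instrument (hypothetical helper)."""
    z = df[instrument_name]
    num = df.loc[z == 1, outcome_name].mean() - df.loc[z == 0, outcome_name].mean()
    deno = df.loc[z == 1, treatment_name].mean() - df.loc[z == 0, treatment_name].mean()
    return num / deno

def subset_estimates(df, subset_fraction=0.8, sims=50, seed=0):
    """Re-estimate the effect on `sims` random subsets of the rows."""
    return np.array([
        wald_estimate(df.sample(frac=subset_fraction, random_state=seed + i))
        for i in range(sims)
    ])

# Toy data with a binary instrument and an unobserved confounder u.
rng = np.random.default_rng(0)
n = 20_000
u = rng.normal(size=n)
z0 = rng.binomial(1, 0.5, size=n).astype(float)
v = (2.0 * z0 + u + rng.normal(size=n) > 1.0).astype(float)
y = 10.0 * v + 3.0 * u + rng.normal(size=n)
toy_df = pd.DataFrame({"v": v, "y": y, "Z0": z0})

estimates = subset_estimates(toy_df)
print("Mean over subsets:", estimates.mean())
print("Std over subsets:", estimates.std())
```

A small standard deviation relative to the estimate suggests the result is stable under subsampling; a large one suggests a single subset run should not be over-interpreted.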