## DoWhyWrapper
This notebook exercises the DoWhy single-estimator pipeline.
A causal inference pipeline built with DoWhy follows four steps:

Step 1: Model the problem as a causal graph <br>
Step 2: Identify causal effect using properties of the causal graph <br>
Step 3: Estimate the causal effect <br>
Step 4: Refute the estimate <br>

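This single-estimator pipeline builds the model and identifies the causal effect in its `__init__` function (Steps 1 and 2), estimates the causal effect when `estimate_effect()` is called (Step 3), and refutes the estimate when `refute_estimate()` is called (Step 4).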
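The division of work between the constructor and the two methods can be sketched as a thin class around DoWhy's `CausalModel`. This is not the actual `DoWhyWrapper` source, which is not shown in this notebook; the class name `SingleEstimatorPipeline` and its attributes are hypothetical. Only the `CausalModel` calls (`identify_effect`, `estimate_effect`, `refute_estimate`) come from DoWhy itself.

```python
class SingleEstimatorPipeline:
    """Hypothetical sketch of a single-estimator wrapper; not the real DoWhyWrapper."""

    def __init__(self, df, treatment, outcome, graph):
        # Imported lazily so the class can be defined without DoWhy installed.
        from dowhy import CausalModel

        # Step 1: model the problem as a causal graph.
        self.model = CausalModel(data=df, treatment=treatment,
                                 outcome=outcome, graph=graph)
        # Step 2: identify the causal effect from the graph.
        self.estimand = self.model.identify_effect(
            proceed_when_unidentifiable=True)
        self.estimates = {}  # estimates keyed by estimator method name

    def estimate_effect(self, method_name="backdoor.linear_regression", **kwargs):
        # Step 3: estimate the identified effect with a single estimator.
        estimate = self.model.estimate_effect(
            self.estimand, method_name=method_name, **kwargs)
        self.estimates[method_name] = estimate
        return estimate

    def refute_estimate(self, estimator_name, method_name, **kwargs):
        # Step 4: refute an estimate that was computed earlier.
        return self.model.refute_estimate(
            self.estimand, self.estimates[estimator_name],
            method_name=method_name, **kwargs)
```

Keeping the estimates in a dictionary keyed by method name is one possible way to let `refute_estimate` look up an earlier result by `estimator_name`, as the calls later in this notebook do.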

In [9]:
import numpy as np
import pandas as pd
import logging
import dowhy.datasets
import warnings
warnings.filterwarnings('ignore')

import DoWhyWrapper as dw

In [6]:
BETA = 10
data = dowhy.datasets.linear_dataset(BETA, num_common_causes=4, num_samples=10000,
                                     num_instruments=2, num_effect_modifiers=2,
                                     num_treatments=1,
                                     treatment_is_binary=False,
                                     num_discrete_common_causes=2,
                                     num_discrete_effect_modifiers=0,
                                     one_hot_encode=False)
df = data['df']
print(df.head())
print("True causal estimate is", data["ate"])

         X0        X1   Z0        Z1        W0        W1 W2 W3         v0  \
0  0.622146  0.313640  0.0  0.728865 -0.744456 -0.870246  1  0   6.730920   
1  0.323380 -0.480560  0.0  0.344802 -0.114414 -1.345009  0  1  -0.498094   
2  1.513350 -1.954286  1.0  0.322154 -0.735807 -0.773046  0  0   7.762403   
3  0.090779  0.114228  0.0  0.002952 -0.893687 -1.831622  0  2  -6.740592   
4 -0.170646 -1.859661  0.0  0.551105 -0.934392 -0.097023  1  2  10.350205   

           y  
0  84.793741  
1  -5.022070  
2  95.102932  
3 -69.694987  
4  78.979495  
True causal estimate is 11.679071359688782


In [7]:
data["treatment_name"]

['v0']

In [10]:
singlePipeline = dw.DoWhyWrapper(df=df, treatment=data["treatment_name"],
                                 outcome=data["outcome_name"], graph=data["gml_graph"])

------Step 2: Identifying causal effect------
Estimand type: nonparametric-ate

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d                                    
─────(Expectation(y|W2,X0,X1,W1,W3,W0))
d[v₀]                                  
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W2,X0,X1,W1,W3,W0,U) = P(y|v0,W2,X0,X1,W1,W3,W0)

### Estimand : 2
Estimand name: iv
Estimand expression:
Expectation(Derivative(y, [Z0, Z1])*Derivative([v0], [Z0, Z1])**(-1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→{Z0,Z1})
Estimand assumption 2, Exclusion: If we remove {Z0,Z1}→{v0}, then ¬({Z0,Z1}→y)

### Estimand : 3
Estimand name: frontdoor
No such variable found!



### Linear Model as the default estimator

In [11]:
_ = singlePipeline.estimate_effect()

------Step 3: Estimating causal effect------
Using method: backdoor.linear_regression
DoWhy Causal Estimate is 11.678987415622561


### Estimating CATE using Linear Model
The cell below estimates the effect of changing the treatment from 0 to 1.

In [12]:
_ = singlePipeline.estimate_effect(method_name="backdoor.linear_regression",
                                   control_value=0,
                                   treatment_value=1)

------Step 3: Estimating causal effect------
Using method: backdoor.linear_regression
DoWhy Causal Estimate is 11.678987415622561
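To illustrate the idea behind a backdoor linear-regression estimate, the sketch below regresses a simulated outcome on the treatment and a common cause using only NumPy. It is not DoWhy's implementation: the data, the coefficients, and the helper `ols_coefficients` are assumptions made for the example, and effect modifiers are left out. For a continuous treatment in a linear model, the effect of moving from `control_value` to `treatment_value` is the treatment coefficient times their difference.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
true_effect = 10.0  # assumed value for this simulation only

w = rng.normal(size=n)                    # common cause (confounder)
t = 2.0 * w + rng.normal(size=n)          # continuous treatment, driven by w
y = true_effect * t + 3.0 * w + rng.normal(size=n)


def ols_coefficients(columns, target):
    """Least-squares coefficients of target on an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(target)), *columns])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


naive_slope = ols_coefficients([t], y)[1]         # ignores w, so it is biased
adjusted_slope = ols_coefficients([t, w], y)[1]   # adjusts for the backdoor variable w

control_value, treatment_value = 0, 1
effect = adjusted_slope * (treatment_value - control_value)
print(f"naive: {naive_slope:.3f}, adjusted: {adjusted_slope:.3f}, effect 0->1: {effect:.3f}")
```

Because the treatment changes by exactly one unit here, the effect equals the adjusted slope, which is consistent with the default estimate and the 0-to-1 estimate above being identical.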


### Estimating CATE using EconML

In [13]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LassoCV
from sklearn.ensemble import GradientBoostingRegressor


_ = singlePipeline.estimate_effect(
    method_name="backdoor.econml.dml.DML",
    control_value=0,
    treatment_value=1,
    target_units=lambda df: df["X0"] > 1,  # condition used for CATE
    confidence_intervals=False,
    method_params={
        "init_params": {
            "model_y": GradientBoostingRegressor(),
            "model_t": GradientBoostingRegressor(),
            "model_final": LassoCV(fit_intercept=False),
            "featurizer": PolynomialFeatures(degree=1, include_bias=False),
        },
        "fit_params": {},
    },
)

------Step 3: Estimating causal effect------
Using method: backdoor.econml.dml.DML
DoWhy Causal Estimate is 15.22819072957848


### CATE and confidence intervals
EconML provides its own methods to compute confidence intervals. The example below uses `BootstrapInference`.

In [14]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LassoCV
from sklearn.ensemble import GradientBoostingRegressor
from econml.inference import BootstrapInference
_ = singlePipeline.estimate_effect(
    method_name="backdoor.econml.dml.DML",
    target_units="ate",
    confidence_intervals=True,
    method_params={
        "init_params": {
            "model_y": GradientBoostingRegressor(),
            "model_t": GradientBoostingRegressor(),
            "model_final": LassoCV(fit_intercept=False),
            "featurizer": PolynomialFeatures(degree=1, include_bias=True),
        },
        "fit_params": {
            "inference": BootstrapInference(n_bootstrap_samples=100, n_jobs=-1),
        },
    },
)

------Step 3: Estimating causal effect------
Using method: backdoor.econml.dml.DML
DoWhy Causal Estimate is 11.61936772623915


### Refuting the estimate

In [15]:
# Adding a random common cause variable
_ = singlePipeline.refute_estimate(estimator_name="backdoor.econml.dml.DML", method_name="random_common_cause")

------Step 4: Refuting the estimate------
Refute: Add a Random Common Cause
Estimated effect:11.61936772623915
New effect:11.602466316417575
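The logic of this check can be sketched with NumPy: add a variable that is independent of everything else as an extra common cause and re-estimate. If the estimator is behaving sensibly, the effect should barely move, as it does in the output above. The simulated data and the helper `treatment_slope` are assumptions for illustration, and a plain regression stands in for the DML estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
w = rng.normal(size=n)                       # observed common cause
t = 2.0 * w + rng.normal(size=n)             # continuous treatment
y = 10.0 * t + 3.0 * w + rng.normal(size=n)  # assumed true effect of 10


def treatment_slope(t, controls, y):
    """Coefficient on t when regressing y on an intercept, t and the controls."""
    design = np.column_stack([np.ones(len(y)), t, controls])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


original_effect = treatment_slope(t, w[:, None], y)

# Add a common cause that is pure noise and re-estimate.
random_cause = rng.normal(size=n)
new_effect = treatment_slope(t, np.column_stack([w, random_cause]), y)

print(f"Estimated effect: {original_effect:.4f}")
print(f"New effect: {new_effect:.4f}")
```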



In [None]:
# Removing a random subset of the data
_ = singlePipeline.refute_estimate(estimator_name="backdoor.econml.dml.DML", method_name="data_subset_refuter",
                                   subset_fraction=0.8, num_simulations=10)

In [None]:
# Replacing treatment with a random (placebo) variable
_ = singlePipeline.refute_estimate(estimator_name="backdoor.econml.dml.DML", method_name="placebo_treatment_refuter",
                                   placebo_type="permute", num_simulations=10)
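What these two checks are expected to show can be sketched with NumPy on simulated data. Re-estimating on random 80% subsets should give roughly the same effect as the full data, while replacing the treatment with a permuted copy of itself should drive the estimated effect toward zero. The data, the helper `treatment_slope`, and the use of plain regression instead of DML are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
w = rng.normal(size=n)                       # common cause
t = 2.0 * w + rng.normal(size=n)             # continuous treatment
y = 10.0 * t + 3.0 * w + rng.normal(size=n)  # assumed true effect of 10


def treatment_slope(t, w, y):
    """Coefficient on t when regressing y on an intercept, t and w."""
    design = np.column_stack([np.ones(len(y)), t, w])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


original_effect = treatment_slope(t, w, y)
subset_fraction, num_simulations = 0.8, 10

# Data subset check: re-estimate on random subsets drawn without replacement.
subset_effects = []
for _ in range(num_simulations):
    idx = rng.choice(n, size=int(subset_fraction * n), replace=False)
    subset_effects.append(treatment_slope(t[idx], w[idx], y[idx]))
subset_mean = float(np.mean(subset_effects))

# Placebo check: permuting the treatment breaks its link to the outcome.
placebo_effects = [treatment_slope(rng.permutation(t), w, y)
                   for _ in range(num_simulations)]
placebo_mean = float(np.mean(placebo_effects))

print(f"original: {original_effect:.3f}")
print(f"subset mean: {subset_mean:.3f}, placebo mean: {placebo_mean:.3f}")
```

A subset estimate far from the original, or a placebo estimate far from zero, would be a reason to doubt the original estimate.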