# Example Code: Estimating Causal Effect of Grad School on Income



DoWhy Library: https://microsoft.github.io/dowhy/ <br>
Data from: https://archive.ics.uci.edu/ml/datasets/census+income

### Import modules

In [14]:
%pip install dowhy



In [15]:
%pip install econml



In [16]:
import pickle

import econml
import dowhy

from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

### Load data

In [17]:
import pandas as pd

# Load the pickle file
file_path = "/content/df_causal_effects.p"  # Replace with the correct file path
df = pd.read_pickle(file_path)

# Display the first few rows of the DataFrame
print(df.head())


   age  hasGraduateDegree  greaterThan50k
0   39              False           False
1   50              False           False
2   38              False           False
3   53              False           False
5   37               True           False


### Define causal model

In [None]:
# Age is treated as the only common cause of the treatment and the outcome
model = dowhy.CausalModel(
    data=df,
    treatment="hasGraduateDegree",
    outcome="greaterThan50k",
    common_causes=["age"],
)



#### Linear Regression
First, we try linear regression, adjusting for age.

In [18]:
estimand = model.identify_effect(proceed_when_unidentifiable=True)

LR_estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")

Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`


In [19]:
print(LR_estimate)

*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
         d                                 
────────────────────(E[greaterThan50k|age])
d[hasGraduateDegree]                       
Estimand assumption 1, Unconfoundedness: If U→{hasGraduateDegree} and U→greaterThan50k then P(greaterThan50k|hasGraduateDegree,age,U) = P(greaterThan50k|hasGraduateDegree,age)

## Realized estimand
b: greaterThan50k~hasGraduateDegree+age
Target units: ate

## Estimate
Mean value: 0.2976051357033036
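
To make the realized estimand `greaterThan50k~hasGraduateDegree+age` concrete, here is a minimal sketch of the same kind of adjustment done by hand. It is an illustration under stated assumptions, not DoWhy's internal code: the data are synthetic (not the census data), the true effect is set to 0.3 by construction, and all variable names are hypothetical. The coefficient on the treatment column plays the role of the estimate, and the naive difference in means shows what happens when age is ignored.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Synthetic data (an assumption for illustration): age raises both the
# chance of holding a graduate degree and the chance of a high income.
age = rng.uniform(20, 65, size=n)
treatment = (rng.uniform(size=n) < 0.1 + 0.01 * (age - 20)).astype(float)
true_effect = 0.3
p_outcome = 0.05 + 0.005 * (age - 20) + true_effect * treatment
outcome = (rng.uniform(size=n) < p_outcome).astype(float)

# Design matrix: intercept, treatment, and the confounder (age).
X = np.column_stack([np.ones(n), treatment, age])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
adjusted_effect = coef[1]  # coefficient on the treatment column

# Naive difference in means, which does not adjust for age.
naive_effect = outcome[treatment == 1].mean() - outcome[treatment == 0].mean()

print(f"adjusted: {adjusted_effect:.3f}, naive: {naive_effect:.3f}")
```

In this synthetic setup the naive estimate is biased upward because treated individuals tend to be older, while the adjusted coefficient should land near the chosen true effect.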



#### Double Machine Learning

Next, we try Double ML, which is overkill for this simple example, especially since the treatment and outcome variables only take the values 0 or 1.

Note that all the models used in the DML process here are linear regressions; more sophisticated techniques can be used for more complex problems.

In [None]:
DML_estimate = model.estimate_effect(estimand,
                                     method_name="backdoor.econml.dml.DML",
                                     method_params={"init_params":{
                                         'model_y':LinearRegression(),
                                         'model_t':LinearRegression(),
                                         'model_final':LinearRegression()
                                                                  },
                                                   "fit_params":{}
                                              })

The final model has a nonzero intercept for at least one outcome; it will be subtracted, but consider fitting a model without an intercept if possible.


In [None]:
print(DML_estimate)

*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
         d                                 
────────────────────(E[greaterThan50k|age])
d[hasGraduateDegree]                       
Estimand assumption 1, Unconfoundedness: If U→{hasGraduateDegree} and U→greaterThan50k then P(greaterThan50k|hasGraduateDegree,age,U) = P(greaterThan50k|hasGraduateDegree,age)

## Realized estimand
b: greaterThan50k~hasGraduateDegree+age | 
Target units: ate

## Estimate
Mean value: 0.29738715968521423
Effect estimates: [[0.29738716]]
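
The residual-on-residual idea behind Double ML can also be sketched by hand. The block below is a rough illustration, not EconML's implementation: it assumes linear nuisance models fitted with NumPy least squares, two-fold cross-fitting, and synthetic data whose true effect is set to 0.3. The helper names (`make_data`, `fit_linear`, `predict_linear`, `dml_effect`) are hypothetical.

```python
import numpy as np

def make_data(n=20000, seed=0):
    """Synthetic data (an assumption): age confounds treatment and outcome."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(20, 65, size=n)
    treatment = (rng.uniform(size=n) < 0.1 + 0.01 * (age - 20)).astype(float)
    p_outcome = 0.05 + 0.005 * (age - 20) + 0.3 * treatment  # true effect 0.3
    outcome = (rng.uniform(size=n) < p_outcome).astype(float)
    return age, treatment, outcome

def fit_linear(x, y):
    """Least-squares fit of y on [1, x]; returns (intercept, slope)."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_linear(coef, x):
    return coef[0] + coef[1] * x

def dml_effect(age, treatment, outcome, n_folds=2, seed=0):
    """Cross-fitted residual-on-residual estimate of a constant effect."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(age)) % n_folds  # random, balanced folds
    res_t = np.empty(len(age))
    res_y = np.empty(len(age))
    for k in range(n_folds):
        train, held_out = folds != k, folds == k
        # Nuisance models: predict treatment and outcome from age alone.
        coef_t = fit_linear(age[train], treatment[train])
        coef_y = fit_linear(age[train], outcome[train])
        # Residuals are computed on data the nuisance models did not see.
        res_t[held_out] = treatment[held_out] - predict_linear(coef_t, age[held_out])
        res_y[held_out] = outcome[held_out] - predict_linear(coef_y, age[held_out])
    # Final stage: regress outcome residuals on treatment residuals (no intercept).
    return float(res_t @ res_y / (res_t @ res_t))

age, treatment, outcome = make_data()
effect = dml_effect(age, treatment, outcome)
print(f"DML-style estimate: {effect:.3f}")
```

With linear nuisance models and a single confounder, this estimate should sit close to the plain regression coefficient, which is consistent with the two nearly identical values reported above.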



#### X-learner
Finally, we try the X-learner, using decision trees for the sub-models.

In [None]:
Xlearner_estimate = model.estimate_effect(estimand,
                                method_name="backdoor.econml.metalearners.XLearner",
                                method_params={"init_params":{
                                                    'models': DecisionTreeRegressor()
                                                    },
                                               "fit_params":{}
                                              })

A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().


In [None]:
print(Xlearner_estimate)

*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
         d                                 
────────────────────(E[greaterThan50k|age])
d[hasGraduateDegree]                       
Estimand assumption 1, Unconfoundedness: If U→{hasGraduateDegree} and U→greaterThan50k then P(greaterThan50k|hasGraduateDegree,age,U) = P(greaterThan50k|hasGraduateDegree,age)

## Realized estimand
b: greaterThan50k~hasGraduateDegree+age
Target units: ate

## Estimate
Mean value: 0.20232049378002753
Effect estimates: [[ 0.31037666]
 [ 0.21099013]
 [ 0.36363636]
 ...
 [ 0.16049383]
 [-0.00342775]
 [ 0.2008029 ]]
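
The X-learner returns one effect estimate per row, and the reported mean value is their average. The stages can be sketched as follows. This is a simplified illustration, not EconML's `XLearner`: to stay dependency-free it swaps in linear fits for the decision trees used above, uses a clipped linear probability model as the propensity estimate, and runs on synthetic data with a true effect of 0.3. All helper names are hypothetical.

```python
import numpy as np

def make_data(n=20000, seed=0):
    """Synthetic data (an assumption): age confounds treatment and outcome."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(20, 65, size=n)
    treatment = (rng.uniform(size=n) < 0.1 + 0.01 * (age - 20)).astype(float)
    p_outcome = 0.05 + 0.005 * (age - 20) + 0.3 * treatment  # true effect 0.3
    outcome = (rng.uniform(size=n) < p_outcome).astype(float)
    return age, treatment, outcome

def fit_linear(x, y):
    """Least-squares fit of y on [1, x]; returns (intercept, slope)."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_linear(coef, x):
    return coef[0] + coef[1] * x

def x_learner(age, treatment, outcome):
    """Return per-unit effect estimates from a linear-base-learner X-learner."""
    treated, control = treatment == 1, treatment == 0
    # Stage 1: separate outcome models for the control and treated groups.
    mu0 = fit_linear(age[control], outcome[control])
    mu1 = fit_linear(age[treated], outcome[treated])
    # Stage 2: impute individual effects using the other group's model,
    # then fit an effect model on each set of imputed effects.
    d1 = outcome[treated] - predict_linear(mu0, age[treated])
    d0 = predict_linear(mu1, age[control]) - outcome[control]
    tau1 = fit_linear(age[treated], d1)
    tau0 = fit_linear(age[control], d0)
    # Stage 3: blend the two effect models with a propensity weight g(x).
    g = np.clip(predict_linear(fit_linear(age, treatment), age), 0.01, 0.99)
    return g * predict_linear(tau0, age) + (1 - g) * predict_linear(tau1, age)

age, treatment, outcome = make_data()
cate = x_learner(age, treatment, outcome)
ate = float(cate.mean())  # the average of per-unit effects
print(f"X-learner-style ATE: {ate:.3f}")
```

Flexible base learners such as unpruned decision trees can produce noisier per-unit estimates than the linear fits in this sketch, which is one possible reason the X-learner value above differs from the other two estimates.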

