# Major assumptions of Causal Inference:
1. No unobserved confounders (ignorability): every factor that affects BOTH treatment and outcome is measured.
2. Overlap (positivity): every class of people (X) must appear in both the treatment and control groups, though not necessarily in equal numbers. Otherwise, the counterfactual for that class cannot be estimated.
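
The overlap assumption can be checked directly by counting treated and control units within each class. Below is a minimal sketch on a made-up toy table; the column names `group` and `treated` are illustrative and do not come from the dataset used later.

```python
import pandas as pd

# Toy data: "group" stands in for a class of people (X).
toy = pd.DataFrame({
    "group":   ["a", "a", "a", "b", "b", "c", "c"],
    "treated": [True, False, True, True, False, True, True],
})

# Count control (False) and treated (True) units per class.
counts = pd.crosstab(toy["group"], toy["treated"])

# A class violates overlap if either count is zero.
violations = counts[(counts == 0).any(axis=1)].index.tolist()
print(counts)
print("Classes without overlap:", violations)
```

Here class `c` has no control units, so no counterfactual can be built for it from this data.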

The fundamental problem of causal inference is that the same unit (e.g., a person, a group, or an entity) can never be observed in both the treated and untreated states at the same time. The workaround is to estimate a counterfactual for each unit. There are three common ways of building these counterfactuals:

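1. Covariate adjustment (fit a machine learning model of the outcome; whether it is parametric depends on the model used)
2. Propensity scores (non-parametric in how the score is used for weighting or stratification, although the score itself is often fit with a parametric model such as logistic regression)
3. Matching: exact or nearest neighbour.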
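
To make the propensity-score idea concrete, here is a hedged sketch of inverse-propensity weighting on synthetic data with a single binary confounder. All variable names and numbers are assumptions for illustration; the propensity score is estimated as the treated fraction within each class of `x`, not with a fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Synthetic data (not the dataset used below): x confounds t and y.
x = rng.binomial(1, 0.5, size=n)                   # binary confounder
t = rng.binomial(1, np.where(x == 1, 0.8, 0.2))    # treatment depends on x
y = 2.0 * t + 3.0 * x + rng.normal(0, 1, size=n)   # true effect of t is 2.0

# Naive comparison is biased because treated units more often have x = 1.
naive = y[t == 1].mean() - y[t == 0].mean()

# Propensity score e(x) = P(t = 1 | x), estimated per class of x.
e = np.where(x == 1, t[x == 1].mean(), t[x == 0].mean())

# Inverse-propensity-weighted estimate of the average treatment effect.
ipw = np.mean(t * y / e) - np.mean((1 - t) * y / (1 - e))

print(f"naive: {naive:.2f}, IPW: {ipw:.2f}")
```

The naive difference overstates the effect, while the weighted estimate should land close to the simulated value of 2.0.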

In [2]:
import dowhy
import pandas as pd
import numpy as np

In [5]:
df = pd.read_csv('lalonde.csv')

In [6]:
df

Unnamed: 0,ID,Training,Age,Education_years,Married,No_Degree,Wage_1974,Wage_1975,Wage_1978
0,0,False,23,10,0,1,0.00,0.00,0.000
1,1,False,26,12,0,0,0.00,0.00,12383.680
2,2,False,22,9,0,1,0.00,0.00,0.000
3,3,False,18,9,0,1,0.00,0.00,10740.080
4,4,False,45,11,0,1,0.00,0.00,11796.470
...,...,...,...,...,...,...,...,...,...
440,440,True,33,12,1,0,20279.95,10941.35,15952.600
441,441,True,25,14,1,0,35040.07,11536.57,36646.950
442,442,True,35,9,1,1,13602.43,13830.64,12803.970
443,443,True,35,8,1,1,13732.07,17976.15,3786.628


## Does completion of Training increase wages?

In [7]:
#Create DAG - Causal Model

g = """digraph {
Training;
No_Degree;
Age;
Education_years;
Married;
Wage_1974;
Wage_1978;

Training -> Wage_1978;
Education_years -> Wage_1978;
Married -> Wage_1978;
Wage_1974 -> Wage_1978;
No_Degree -> Wage_1978;
Age -> Wage_1978;


Education_years -> Training;
No_Degree -> Training;
Married -> Training;

}"""

In [6]:
from dowhy import CausalModel

In [10]:
treatment_col = "Training"
outcome_col = "Wage_1978"

model = CausalModel(
    data=df,
    treatment=treatment_col,
    outcome=outcome_col,
    graph=g,
)

In [11]:
identified_estimand = model.identify_effect(proceed_when_unidentifiable = True)
print(identified_estimand)

Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
     d                                                     
───────────(E[Wage_1978|Education_years,Married,No_Degree])
d[Training]                                                
Estimand assumption 1, Unconfoundedness: If U→{Training} and U→Wage_1978 then P(Wage_1978|Training,Education_years,Married,No_Degree,U) = P(Wage_1978|Training,Education_years,Married,No_Degree)

### Estimand : 2
Estimand name: iv
No such variable(s) found!

### Estimand : 3
Estimand name: frontdoor
No such variable(s) found!
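
The backdoor set reported above (`Education_years`, `Married`, `No_Degree`) can be reproduced by hand for this particular graph. The sketch below uses a simplified rule: take the variables that point directly at both the treatment and the outcome. This is an assumption that happens to fit this DAG; it is not the general backdoor criterion and not DoWhy's identification algorithm.

```python
# Edges copied from the DAG defined earlier.
edges = [
    ("Training", "Wage_1978"),
    ("Education_years", "Wage_1978"),
    ("Married", "Wage_1978"),
    ("Wage_1974", "Wage_1978"),
    ("No_Degree", "Wage_1978"),
    ("Age", "Wage_1978"),
    ("Education_years", "Training"),
    ("No_Degree", "Training"),
    ("Married", "Training"),
]

def parents(node, edges):
    """Return the set of nodes with an edge pointing into `node`."""
    return {src for src, dst in edges if dst == node}

# Direct common causes of treatment and outcome.
common_causes = sorted(parents("Training", edges) & parents("Wage_1978", edges))
print(common_causes)
```

`Age` and `Wage_1974` are left out because, in this graph, they affect only the outcome and therefore do not confound the treatment.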



In [13]:
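method = "backdoor.linear_regression"
desired_effect = "ate"  # Average Treatment Effect

# Note: "weighting_scheme" is an option of propensity-score weighting;
# the linear-regression estimator most likely ignores it.
estimate = model.estimate_effect(
    identified_estimand,
    method_name=method,
    target_units=desired_effect,
    method_params={"weighting_scheme": "ips_weight"},
)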

print("Causal Estimate is " + str(estimate.value))

Causal Estimate is 1627.257614677108
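
Conceptually, a linear-regression backdoor estimate regresses the outcome on the treatment plus the adjustment set and reads off the treatment coefficient. The sketch below illustrates that idea on synthetic data with plain least squares. The variables `educ`, `train` and `wage` are hypothetical stand-ins, and this is not DoWhy's internal code.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Synthetic stand-ins: one confounder, one binary treatment.
educ = rng.normal(10, 2, size=n)
train = (educ + rng.normal(0, 2, size=n) > 10).astype(float)
wage = 1500.0 * train + 400.0 * educ + rng.normal(0, 1000, size=n)

# Design matrix: intercept, treatment, confounder.
X = np.column_stack([np.ones(n), train, educ])
coef, *_ = np.linalg.lstsq(X, wage, rcond=None)

ate_hat = coef[1]  # coefficient on the treatment column
print(f"Estimated ATE: {ate_hat:.1f} (simulated true value: 1500)")
```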



In [14]:
#Refutation
refute_placebo_treatment = model.refute_estimate(
    identified_estimand,
    estimate,
    method_name="placebo_treatment_refuter",
    placebo_type="permute"
)

print(refute_placebo_treatment)

Refute: Use a Placebo Treatment
Estimated effect:1627.257614677108
New effect:-156.95214367672534
p value:0.8
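
In the output above, replacing the real treatment with a permuted one moves the estimate from about 1627 to about -157, and the p-value of 0.8 presumably indicates that the placebo effect is not distinguishable from zero. That is the behaviour expected of a sound estimate. The sketch below reproduces the idea of a permutation placebo on synthetic data; it is an illustration under stated assumptions, not DoWhy's refuter.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000

x = rng.normal(size=n)                           # confounder
t = (x + rng.normal(size=n) > 0).astype(float)   # treatment
y = 1.0 * t + 0.5 * x + rng.normal(size=n)       # true effect is 1.0

def treatment_coef(t, x, y):
    """OLS coefficient on t when regressing y on [1, t, x]."""
    X = np.column_stack([np.ones(len(y)), t, x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

original = treatment_coef(t, x, y)

# Placebo: shuffle the treatment so it can no longer cause the outcome.
placebo = np.array(
    [treatment_coef(rng.permutation(t), x, y) for _ in range(100)]
)

print(f"original: {original:.2f}, mean placebo: {placebo.mean():.2f}")
```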




# Counterfactuals

In [15]:
treatment_value_treated = True
treatment_value_control = False

dataset_copy = model._data.copy()  # copy because DoWhy adds columns to the data

mean_outcome = dataset_copy.loc[:, outcome_col].mean()
print(f'Actual mean outcome [all participants]: {mean_outcome}')

def get_cohort_outcome(
    df: pd.DataFrame,
    col_selector,
    cohort_value,
):
    """Return the mean of outcome_col over rows where col_selector == cohort_value."""
    cohort_rows = df[df[col_selector] == cohort_value]
    return cohort_rows[outcome_col].mean()

mean_outcome_control = get_cohort_outcome(
    dataset_copy,
    treatment_col,
    treatment_value_control,
)
print(f'Actual mean outcome [control group]: {mean_outcome_control}')

mean_outcome_treated = get_cohort_outcome(
    dataset_copy,
    treatment_col,
    treatment_value_treated,
)
print(f'Actual mean outcome [treated group]: {mean_outcome_treated}')

Actual mean outcome [all participants]: 5300.763698561798
Actual mean outcome [control group]: 4554.801126
Actual mean outcome [treated group]: 6349.143530270271
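
These raw group means give the naive (unadjusted) difference, which can be compared with the adjusted estimate printed earlier. A small arithmetic check using only the numbers shown in this notebook:

```python
mean_control = 4554.801126
mean_treated = 6349.143530270271
adjusted_ate = 1627.257614677108  # linear-regression estimate printed earlier

naive_diff = mean_treated - mean_control
print(f"Naive difference in means: {naive_diff:.2f}")
print(f"Gap to adjusted estimate:  {naive_diff - adjusted_ate:.2f}")
```

The naive difference (about 1794) is larger than the adjusted estimate (about 1627). That is consistent with part of the raw gap being due to the confounders in the graph instead of the training itself.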


In [18]:
# "do(x): Given a value x for the treatment, returns the 
# expected value of the outcome when the treatment is 
# intervened to a value x."

cf_estimate_control = estimate.estimator.do(
    x=treatment_value_control,
    data_df=dataset_copy,
)
print(f'Mean outcome if all control [all participants]: {cf_estimate_control}')

cf_estimate_treated = estimate.estimator.do(
    x=treatment_value_treated,
    data_df=dataset_copy,
)

print(f'Mean outcome if all treated [all participants]: {cf_estimate_treated}')

AttributeError: 'bool' object has no attribute 'copy'
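
The `do` call above fails in this environment, and the cause is not diagnosed here. As a fallback, the same counterfactual means can be computed by hand for a linear model: fit the regression, then predict for every row with the treatment column forced to a single value. The sketch below shows this on synthetic data; all names are hypothetical, and it does not use DoWhy.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000

x = rng.normal(size=n)                           # confounder
t = (x + rng.normal(size=n) > 0).astype(float)   # treatment
y = 2.0 * t + 1.0 * x + rng.normal(size=n)       # true effect is 2.0

# Fit outcome ~ intercept + treatment + confounder.
X = np.column_stack([np.ones(n), t, x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def mean_outcome_under(t_value):
    """Mean predicted outcome if every unit had treatment = t_value."""
    X_do = np.column_stack([np.ones(n), np.full(n, t_value), x])
    return float((X_do @ coef).mean())

cf_control = mean_outcome_under(0.0)
cf_treated = mean_outcome_under(1.0)
print(f"Mean outcome if all control: {cf_control:.3f}")
print(f"Mean outcome if all treated: {cf_treated:.3f}")
print(f"Difference: {cf_treated - cf_control:.3f}")
```

For a linear model without interaction terms, the difference between the two counterfactual means equals the treatment coefficient.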

# References
1. [Explainer - 2 Videos](https://www.youtube.com/watch?v=Od6oAz1Op2k)
2. [MIT-OCW Part1](https://www.youtube.com/watch?v=gRkUhg9Wb-I)
3. [MIT-OCW Part2](https://www.youtube.com/watch?v=g5v-NvNoJQQ)

## Does obesity affect coronary heart disease (chd)?

In [4]:
df = pd.read_csv("CORIS.txt")

# encode famhist with dummy 0-1 variable
df['famhist'] = pd.get_dummies(df['famhist'])['Present']
df['famhist'] = df['famhist'].astype(int)
target = 'chd'
features = ['sbp', 'tobacco', 'ldl', 'famhist', 'obesity', 'alcohol', 'age']
#features = ['famhist','ldl','age']

df.drop('row.names', axis=1, inplace=True)
df.head()

Unnamed: 0,sbp,tobacco,ldl,adiposity,famhist,typea,obesity,alcohol,age,chd
0,160,12.0,5.73,23.11,1,49,25.3,97.2,52,1
1,144,0.01,4.41,28.61,0,55,28.87,2.06,63,1
2,118,0.08,3.48,32.28,1,52,29.14,3.81,46,0
3,170,7.5,6.41,38.03,1,51,31.99,24.26,58,1
4,134,13.6,3.5,27.78,1,60,25.99,57.34,49,1


In [40]:
#Create DAG - Causal Model

g = """digraph {
sbp;
tobacco;
age;
adiposity;
famhist;
ldl;
typea;
obesity;
alcohol;
chd;

obesity -> chd;
sbp -> chd;
age -> chd;
tobacco -> chd;
ldl -> chd;
adiposity -> chd;
famhist -> chd;
typea -> chd;
alcohol -> chd;

alcohol -> obesity;
}"""

In [41]:
treatment_col = "obesity"
outcome_col = "chd"

model = CausalModel(
    data=df,
    treatment=treatment_col,
    outcome=outcome_col,
    graph=g,
)

In [42]:
identified_estimand = model.identify_effect(proceed_when_unidentifiable = True)
print(identified_estimand)

Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
    d                     
──────────(E[chd|alcohol])
d[obesity]                
Estimand assumption 1, Unconfoundedness: If U→{obesity} and U→chd then P(chd|obesity,alcohol,U) = P(chd|obesity,alcohol)

### Estimand : 2
Estimand name: iv
No such variable(s) found!

### Estimand : 3
Estimand name: frontdoor
No such variable(s) found!



In [43]:
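method = "backdoor.linear_regression"
desired_effect = "ate"  # Average Treatment Effect

# Note: "weighting_scheme" is an option of propensity-score weighting;
# the linear-regression estimator most likely ignores it.
estimate = model.estimate_effect(
    identified_estimand,
    method_name=method,
    target_units=desired_effect,
    method_params={"weighting_scheme": "ips_weight"},
)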


In [44]:
print("Causal Estimate is " + str(estimate.value))

Causal Estimate is -0.009994145695698675
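
Because `obesity` is continuous and `chd` is binary, this estimate is best read as the change in the probability of `chd` per one-unit increase in `obesity`, assuming the linear model holds. The arithmetic below scales the printed estimate to a few hypothetical increases. The deltas are illustrative, and the result depends entirely on the assumed graph, in which `alcohol` is the only confounder.

```python
ate_per_unit = -0.009994145695698675  # estimate printed above

# Implied change in P(chd), in percentage points, for hypothetical
# increases in obesity, assuming linearity over that range.
changes = {delta: 100 * ate_per_unit * delta for delta in (1, 5, 10)}

for delta, change_pp in changes.items():
    print(f"+{delta} units of obesity -> {change_pp:.1f} percentage points")
```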
