In [None]:
%pip install dowhy

# Example 1 - Applying DoWhy to a Simulated Dataset

In this example, we create a synthetic dataset with the DoWhy library and apply a `CausalModel` to it. Along the way, we walk through the four main steps of Causal AI with DoWhy: modeling, identification, estimation, and refutation.

## Importing the Libraries

In [None]:
import warnings
warnings.filterwarnings('ignore')  # silence library warnings to keep the output readable

import numpy as np
import pandas as pd
import dowhy.datasets  # synthetic data generators
from dowhy import CausalModel

## Generating a Sample Dataset With DoWhy

In [None]:
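data = dowhy.datasets.linear_dataset(
    beta=10,                  # true causal effect of the treatment on the outcome
    num_common_causes=1,      # one confounder (W0)
    num_instruments=1,        # one instrument (Z0)
    num_samples=10000,
    treatment_is_binary=True)
data  # dict holding the DataFrame ("df"), the variable names, and the causal graph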
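To build intuition for what `linear_dataset` produces, here is a minimal hand-rolled simulation with the same graph shape: one common cause `W0`, one instrument `Z0`, a binary treatment `v0`, and an outcome `y` whose true effect of `v0` is `beta = 10`. The helper name `simulate`, the coefficients, and the noise are assumptions for illustration only; DoWhy's own generator draws its coefficients differently, so the numbers will not match `data["df"]`. The point is that a naive difference in means is biased by the common cause.

```python
import numpy as np

def simulate(n=10_000, beta=10.0, seed=0):
    """Hypothetical simplified analogue of dowhy.datasets.linear_dataset."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)                    # common cause W0
    z = rng.binomial(1, 0.5, size=n)          # instrument Z0
    # Treatment depends on both the common cause and the instrument.
    logits = 1.5 * w + (z - 0.5)
    v = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))
    # Outcome depends on the treatment (true effect = beta) and on W0, not on Z0.
    y = beta * v + 5.0 * w + rng.normal(size=n)
    return w, z, v, y

w, z, v, y = simulate()
# Treated units tend to have larger W0, which also raises y, so this overstates the effect.
naive = y[v == 1].mean() - y[v == 0].mean()
print(f"naive difference in means: {naive:.2f} (true effect: 10)")
```

Because `W0` pushes units into treatment and also raises the outcome, the naive comparison mixes the treatment effect with the effect of `W0`; the identification and estimation steps below exist to remove that bias.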

## Step 1 Of Causal AI: Create a causal model from the data and given graph.

In [None]:
model = CausalModel(
            data=data["df"],
            treatment=data["treatment_name"],
            outcome=data["outcome_name"],
            graph=data["gml_graph"])

In [None]:
model.view_model()

## Step 2 Of Causal AI: Identify causal effect and return target estimands

In [None]:
identified_estimand = model.identify_effect()
print(identified_estimand)
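For this graph, conditioning on the common cause `W0` blocks the only backdoor path $v_0 \leftarrow W_0 \rightarrow y$, so the backdoor estimand corresponds to the standard adjustment formula below, and the instrument `Z0` additionally supports an instrumental-variable estimand. These are the textbook forms written with this example's variable names, not a transcript of DoWhy's output; the notation DoWhy prints may differ by version. The IV ratio is valid under a linear model with a constant treatment effect, which is the kind of model `linear_dataset` simulates.

$$
\text{ATE} = \mathbb{E}[y \mid do(v_0 = 1)] - \mathbb{E}[y \mid do(v_0 = 0)]
= \mathbb{E}_{W_0}\!\left[\, \mathbb{E}[y \mid v_0 = 1, W_0] - \mathbb{E}[y \mid v_0 = 0, W_0] \,\right]
$$

$$
\text{ATE}_{\text{IV}} = \frac{\operatorname{Cov}(y, Z_0)}{\operatorname{Cov}(v_0, Z_0)}
$$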

## Step 3 Of Causal AI: Estimate the target estimand using a statistical method.

In [None]:
estimate = model.estimate_effect(identified_estimand,
                                 method_name="backdoor.propensity_score_matching")
print(estimate)
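Propensity score matching estimates each unit's probability of treatment from the backdoor variables (here `W0`), then compares each unit's outcome with that of a unit from the other treatment group that has a similar score. The sketch below illustrates that idea in plain NumPy on a simplified simulation. The helper names (`simulate`, `fit_logistic`, `matched_outcomes`), the Newton-Raphson logistic fit, and one-nearest-neighbour matching with replacement are assumptions chosen for illustration, not DoWhy's internal code.

```python
import numpy as np

def simulate(n=10_000, beta=10.0, seed=0):
    """Hypothetical simplified analogue of linear_dataset (true effect = beta)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)                                  # common cause W0
    z = rng.binomial(1, 0.5, size=n)                        # instrument Z0
    v = rng.binomial(1, 1.0 / (1.0 + np.exp(-(1.5 * w + (z - 0.5)))))
    y = beta * v + 5.0 * w + rng.normal(size=n)
    return w, z, v, y

def fit_logistic(x, t, iters=25):
    """Propensity scores from a logistic regression of t on [1, x] (Newton-Raphson)."""
    X = np.column_stack([np.ones_like(x), x])
    coef = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ coef))
        grad = X.T @ (t - p)                                # gradient of the log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None])         # negative Hessian
        coef += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-X @ coef))

def matched_outcomes(src_scores, dst_scores, dst_y):
    """Outcome of the nearest destination unit (by score) for each source unit."""
    order = np.argsort(dst_scores)
    sorted_scores = dst_scores[order]
    idx = np.clip(np.searchsorted(sorted_scores, src_scores), 1, len(sorted_scores) - 1)
    # Step back one position where the left neighbour is closer than the right one.
    closer_left = (src_scores - sorted_scores[idx - 1]) < (sorted_scores[idx] - src_scores)
    idx = idx - closer_left.astype(int)
    return dst_y[order[idx]]

w, z, v, y = simulate()
score = fit_logistic(w, v)          # the instrument Z0 is deliberately left out
treated = v == 1

# Effect on the treated (ATT) and on the controls (ATC), each via matching.
att = np.mean(y[treated] - matched_outcomes(score[treated], score[~treated], y[~treated]))
atc = np.mean(matched_outcomes(score[~treated], score[treated], y[treated]) - y[~treated])
share = treated.mean()
ate = share * att + (1.0 - share) * atc   # ATE as the group-size-weighted average
print(f"matched ATE estimate: {ate:.2f} (true effect: 10)")
```

Weighting ATT and ATC by the share of treated and control units gives an average effect over the whole population; with a single common cause, matching on the score is equivalent to matching on `W0` itself, since the fitted score is monotone in it.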

## Step 4 Of Causal AI: Refute the obtained estimate using multiple robustness checks.

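DoWhy provides several methods for refuting an obtained estimate, that is, for testing how robust it is to changes in the data or the assumptions. A refuter can cast doubt on an estimate, but passing a refutation test does not prove the estimate correct.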

We will use two of the most common methods:

1) Adding a random common cause

2) Using a random subset of the data

In method 1, a randomly generated variable is added to the data as an extra common cause (confounder) of the treatment and the outcome. Because this variable is independent of everything else, a robust estimate should stay roughly the same after it is added.

In method 2, the effect is re-estimated on a randomly selected subset of the rows. If the estimate is robust, the new effect should stay close to the original one even though less data is used.
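Both refutation ideas can be sketched without DoWhy. To keep the example fast, the sketch swaps in a simple regression-adjustment estimator (the OLS coefficient on `v` from regressing `y` on `[1, v, W0]`) instead of propensity score matching, and it uses a simplified simulation; the helper names, the 100 repetitions, and the 80% subset size are assumptions for illustration, not DoWhy's settings.

```python
import numpy as np

def simulate(n=10_000, beta=10.0, seed=0):
    """Hypothetical simplified analogue of linear_dataset (true effect = beta)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)                                  # common cause W0
    z = rng.binomial(1, 0.5, size=n)                        # instrument Z0
    v = rng.binomial(1, 1.0 / (1.0 + np.exp(-(1.5 * w + (z - 0.5)))))
    y = beta * v + 5.0 * w + rng.normal(size=n)
    return w, z, v, y

def ols_effect(y, v, *covariates):
    """Coefficient on v from an OLS regression of y on [1, v, covariates...]."""
    X = np.column_stack([np.ones_like(y), v, *covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

rng = np.random.default_rng(1)
w, z, v, y = simulate()
original = ols_effect(y, v, w)

# Method 1: add an independent random "common cause" and re-estimate.
random_cause = [ols_effect(y, v, w, rng.normal(size=y.size)) for _ in range(100)]

# Method 2: re-estimate on random 80% subsets of the rows.
subset = []
for _ in range(100):
    keep = rng.choice(y.size, size=int(0.8 * y.size), replace=False)
    subset.append(ols_effect(y[keep], v[keep], w[keep]))

print(f"original estimate:          {original:.3f}")
print(f"mean with random cause:     {np.mean(random_cause):.3f}")
print(f"mean on 80% data subsets:   {np.mean(subset):.3f}")
```

In both cases the re-estimated effects cluster around the original estimate, which is what a passing refutation test looks like.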

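The p-value reported by each refuter helps decide whether the test passes: it measures how consistent the original estimate is with the distribution of re-estimated effects. A p-value below the usual 0.05 threshold means the new effects differ significantly from the original estimate, so the refutation test fails and the estimate should be treated with suspicion.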
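As a rough illustration of that p-value, the sketch below fits a normal distribution to a set of re-estimated effects and computes a two-sided p-value for the original estimate. This is a simplified normal-approximation test under stated assumptions; DoWhy's own significance test may differ in detail, and the function name `refutation_p_value` and the sample numbers are made up for illustration.

```python
import math
import statistics

def refutation_p_value(original, refuted):
    """Two-sided p-value of `original` under a normal fit to the refuted estimates.

    Hypothetical helper: a simplified normal-approximation test, not DoWhy's code.
    """
    mu = statistics.fmean(refuted)
    sd = statistics.stdev(refuted)
    if sd == 0:
        return 1.0 if original == mu else 0.0
    z = abs(original - mu) / sd
    # P(|Z| >= z) for a standard normal Z equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))

refuted = [9.98, 10.01, 10.02, 9.99, 10.00]    # made-up re-estimated effects
print(refutation_p_value(10.005, refuted))     # close to the refuted estimates: large p-value
print(refutation_p_value(10.2, refuted))       # far from them: p-value well below 0.05
```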

In [None]:
refute_random_cause = model.refute_estimate(identified_estimand, estimate, method_name="random_common_cause")
print(refute_random_cause)

In [None]:
refute_data_subset = model.refute_estimate(identified_estimand, estimate, method_name="data_subset_refuter")
print(refute_data_subset)

In both refutation tests, the new effect is close to the original estimate and the p-value is greater than 0.05, so we conclude that the estimate passed both robustness checks. The exact numbers vary from run to run because the dataset and the refuters are random.