### Causal AI
<img src="https://s7d1.scene7.com/is/content/dmqualcommprod/AI%20Causality%20Cow_OnQ_Inline_1" alt="qualcomm" width="400" align="left"/>

Image source: https://www.qualcomm.com/news/onq/2022/09/is-causality-the-missing-piece-of-the-ai-puzzle-

#### Libraries and config

In [None]:
import numpy as np
import pandas as pd

from dowhy import CausalModel
import dowhy.datasets

In [None]:
# Suppress DataConversionWarning (from sklearn) and FutureWarning messages
import warnings
from sklearn.exceptions import DataConversionWarning
warnings.filterwarnings(action='ignore', category=DataConversionWarning)
warnings.filterwarnings(action='ignore', category=FutureWarning)

#### Logging

In [None]:
# Config dict to set the logging level
import logging
import logging.config
DEFAULT_LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'loggers': {
        '': {
            'level': 'WARN',
        },
    }
}

logging.config.dictConfig(DEFAULT_LOGGING)
logging.info("Getting started with DoWhy. Running notebook...")
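Note that with the root logger set to `WARN`, the `logging.info` call above is filtered out, so seeing no message is expected. A minimal sketch of that behaviour, using only the standard library (it sets the level directly rather than through `dictConfig`):

```python
import logging

root = logging.getLogger()
root.setLevel(logging.WARNING)  # 'WARN' in the config dict is an alias for this level

# INFO sits below WARNING, so INFO records are dropped while WARNING records pass.
info_enabled = root.isEnabledFor(logging.INFO)
warning_enabled = root.isEnabledFor(logging.WARNING)
print(info_enabled, warning_enabled)
```

Changing the level to `'INFO'` in the config dict would make the start-up message visible.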

#### Load data

In [None]:
# beta is the true causal effect used to generate the synthetic data
data = dowhy.datasets.linear_dataset(beta=10,
        num_common_causes=5,
        num_instruments=2,
        num_effect_modifiers=1,
        num_samples=5000,
        treatment_is_binary=True,
        stddev_treatment_noise=10,
        num_discrete_common_causes=1)
df = data["df"]
print(df.head())
print(data["dot_graph"])
print("\n")
print(data["gml_graph"])

In [None]:
df.head()  # rendered as a formatted table in the notebook

In [None]:
len(df)

#### Interface 1 (recommended): Input causal graph

In [None]:
# With graph
model=CausalModel(
        data = df,
        treatment=data["treatment_name"],
        outcome=data["outcome_name"],
        graph=data["gml_graph"]
        )

model.view_model()

In [None]:
from IPython.display import Image, display
display(Image(filename="causal_model.png"))

#### Identification

DoWhy philosophy: keep identification and estimation separate.

Identification can be achieved without access to the data, using only the graph.
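As a rough illustration of graph-only reasoning, the sketch below reads an adjustment set off a toy graph without touching any data. The node names and edges are assumptions made for this example, and the "shared direct parent" rule only works for this simple graph shape; it is not DoWhy's identification algorithm.

```python
# Toy causal graph as (cause, effect) edges. Node names are illustrative assumptions.
edges = [
    ("W0", "v0"), ("W0", "y"),  # W0 is a common cause of treatment and outcome
    ("W1", "v0"), ("W1", "y"),  # so is W1
    ("Z0", "v0"),               # Z0 affects the treatment only
    ("X0", "y"),                # X0 affects the outcome only
    ("v0", "y"),                # treatment -> outcome
]


def parents(node):
    """Return the set of direct causes of `node`."""
    return {cause for cause, effect in edges if effect == node}


treatment, outcome = "v0", "y"

# Variables pointing into both treatment and outcome open a backdoor path.
common_causes = sorted(parents(treatment) & parents(outcome))
# Variables pointing into the treatment only are instrument candidates.
treatment_only = sorted(parents(treatment) - parents(outcome))

print("adjust for:", common_causes)
print("instrument candidates:", treatment_only)
```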

In [None]:
# proceed_when_unidentifiable=True: continue without prompting, i.e. assume no unobserved confounding
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)

#### Estimation

In [None]:
causal_estimate = model.estimate_effect(identified_estimand,
        method_name="backdoor.propensity_score_stratification")
print(causal_estimate)
print("Causal Estimate is " + str(causal_estimate.value))
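To give a feel for what propensity score stratification does, here is a self-contained sketch on simulated data. It is not DoWhy's implementation: the data-generating process is invented for the example, and it stratifies on the true propensity score (known by construction) instead of fitting a model.

```python
import random

rng = random.Random(0)
TRUE_EFFECT = 10.0
NUM_STRATA = 5

rows = []
for _ in range(5000):
    w = rng.random()                 # confounder in [0, 1)
    propensity = 0.2 + 0.6 * w       # P(treated | w), known by construction
    treated = rng.random() < propensity
    y = TRUE_EFFECT * treated + 5.0 * w + rng.gauss(0.0, 1.0)
    rows.append((propensity, treated, y))


def mean(values):
    return sum(values) / len(values)


# Naive comparison ignores the confounder and is biased upwards here.
naive = mean([y for _, t, y in rows if t]) - mean([y for _, t, y in rows if not t])

# Group units into equal-width propensity strata.
strata = {}
for propensity, treated, y in rows:
    index = min(int((propensity - 0.2) / 0.6 * NUM_STRATA), NUM_STRATA - 1)
    strata.setdefault(index, []).append((treated, y))

# Compare treated and control within each stratum, then weight by stratum size.
stratified = 0.0
for members in strata.values():
    treated_y = [y for t, y in members if t]
    control_y = [y for t, y in members if not t]
    stratified += (mean(treated_y) - mean(control_y)) * len(members) / len(rows)

print(f"naive: {naive:.2f}, stratified: {stratified:.2f}, true: {TRUE_EFFECT}")
```

Within a stratum, treated and control units have similar propensity scores, so the comparison is much less affected by the confounder than the naive difference.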

#### Causal effect on the control group (ATC)

The only difference from the previous cell is that the target units are the control group (`target_units="atc"`).

In [None]:
causal_estimate_atc = model.estimate_effect(identified_estimand,
        method_name="backdoor.propensity_score_stratification",
        target_units = "atc")
print(causal_estimate_atc)
print("Causal Estimate is " + str(causal_estimate_atc.value))
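Why can the effect on the control group differ from the overall effect? One way to see it: if the effect varies across strata, the answer depends on whose stratum sizes are used as weights. The numbers below are made up for illustration and are not taken from this dataset.

```python
# Hypothetical strata: (effect within stratum, number treated, number control).
strata = [
    (8.0, 20, 80),
    (12.0, 80, 20),
]


def weighted_effect(weight_of):
    """Average the stratum effects using the weights returned by `weight_of`."""
    total = sum(weight_of(n_t, n_c) for _, n_t, n_c in strata)
    return sum(e * weight_of(n_t, n_c) for e, n_t, n_c in strata) / total


ate = weighted_effect(lambda n_t, n_c: n_t + n_c)  # all units
att = weighted_effect(lambda n_t, n_c: n_t)        # treated units only
atc = weighted_effect(lambda n_t, n_c: n_c)        # control units only
print(ate, att, atc)
```

When the effect is the same in every stratum, all three averages coincide.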

#### Interface 2: Specify common causes and instruments

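Note: the cell below yields the same graph for the variables it is given; the input is expressed as lists of variable names rather than an input graph. Only common causes and effect modifiers are passed here, not instruments.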

In [None]:
# Without graph
model= CausalModel(
        data=df,
        treatment=data["treatment_name"],
        outcome=data["outcome_name"],
        common_causes=data["common_causes_names"],
        effect_modifiers=data["effect_modifier_names"])

model.view_model()

In [None]:
from IPython.display import Image, display
display(Image(filename="causal_model.png"))

#### Identification & estimation

In [None]:
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)

In [None]:
estimate = model.estimate_effect(identified_estimand,
                                 method_name="backdoor.propensity_score_stratification")
print(estimate)
print("Causal Estimate is " + str(estimate.value))

This gives the same result as the first estimate: it applies to all units, not just the control group.

#### Refuting the estimate

Refutation methods provide tests that every correct estimator should pass. 

If an estimator fails a refutation test (p-value < 0.05), there is some problem with the estimator.

##### 1) Invariant transformations

These are changes to the data that should not change the estimate (about 12.72 here).

Any estimator whose result varies significantly between the original data and the modified data fails the test.

###### Adding a random common cause variable

In [None]:
# Note: this may take about 2 minutes to run.

res_random=model.refute_estimate(identified_estimand, estimate, 
                                 method_name="random_common_cause") # NB use random_seed = 1 if need to reproduce results
print(res_random)

###### Removing a random subset of the data

In [None]:
# Note: this may take about 2 minutes to run.

res_subset=model.refute_estimate(identified_estimand, estimate,
        method_name="data_subset_refuter", subset_fraction=0.9)  # NB use random_seed = 1 if need to reproduce results
print(res_subset)


In both cases the estimated effect is similar to the original estimate.
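The idea behind both checks can be sketched on simulated data with a simple cell-based estimator. This is an illustration under invented assumptions (one binary confounder, true effect 10), not how DoWhy's refuters are implemented.

```python
import random

rng = random.Random(1)


def simulate(n):
    """Simulate rows of (covariates, treated, outcome) with one binary confounder."""
    rows = []
    for _ in range(n):
        w = rng.randint(0, 1)
        treated = rng.random() < (0.7 if w else 0.3)
        y = 10.0 * treated + 4.0 * w + rng.gauss(0.0, 1.0)
        rows.append(((w,), treated, y))
    return rows


def estimate(rows):
    """Difference in means within each covariate cell, weighted by cell size."""
    cells = {}
    for covariates, treated, y in rows:
        cells.setdefault(covariates, []).append((treated, y))
    effect = 0.0
    for members in cells.values():
        treated_y = [y for t, y in members if t]
        control_y = [y for t, y in members if not t]
        diff = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)
        effect += diff * len(members) / len(rows)
    return effect


rows = simulate(4000)
original = estimate(rows)

# Invariant transformation 1: add an independent random covariate.
with_random_cause = [(cov + (rng.randint(0, 1),), t, y) for cov, t, y in rows]
after_random_cause = estimate(with_random_cause)

# Invariant transformation 2: keep a random 90% subset of the rows.
subset = rng.sample(rows, len(rows) * 9 // 10)
after_subset = estimate(subset)

print(f"{original:.2f} {after_random_cause:.2f} {after_subset:.2f}")
```

All three numbers should be close to each other; a large gap would point to an unstable estimator.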

##### 2) Nullifying transformations

After these data changes, the true causal effect is zero.
    
Any estimator whose result varies significantly from zero on the new data fails the test.

###### Replacing treatment with a random (placebo) variable

In [None]:
res_placebo=model.refute_estimate(identified_estimand, estimate,
        method_name="placebo_treatment_refuter", placebo_type="permute")
print(res_placebo)

The effect is close to zero, as expected for a placebo treatment.
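A permutation placebo can be sketched in a few lines: shuffling the treatment column breaks any link between treatment and outcome, so a sound estimator should return roughly zero. The simulated data below (randomised treatment, true effect 10) is an assumption made for the example, and the estimator is a plain difference in means rather than DoWhy's.

```python
import random

rng = random.Random(2)

# Randomised treatment with a true effect of 10 on the outcome.
treated = [rng.random() < 0.5 for _ in range(4000)]
outcome = [10.0 * t + rng.gauss(0.0, 1.0) for t in treated]


def difference_in_means(treatment, y):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated_y = [v for t, v in zip(treatment, y) if t]
    control_y = [v for t, v in zip(treatment, y) if not t]
    return sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)


original = difference_in_means(treated, outcome)

# Placebo: permute the treatment labels so they carry no information about the outcome.
placebo = treated[:]
rng.shuffle(placebo)
placebo_effect = difference_in_means(placebo, outcome)

print(f"original: {original:.2f}, placebo: {placebo_effect:.2f}")
```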

Reference:

https://www.pywhy.org/dowhy/v0.8/example_notebooks/dowhy_simple_example.html