In [1]:
import statsmodels.api as sm
from statsmodels.stats.mediation import Mediation
import pandas as pd
import numpy as np
from sklearn.decomposition import FactorAnalysis

In addition to the ML models and the propensity score matching analysis, this thesis also proposes using causal inference modeling techniques to mine the UK Biobank data and better infer the causal link between heart and brain diseases. To that end, we used causal mediation analysis: we assembled several graphs of potential relationships among the three datasets and measured the strength of the connections in these graphs to jointly estimate the causal connections between brain structure, heart structure, and vascular health.

# 0. Preparing dataset

In [2]:
# Reading and filtering datasets
data = pd.read_csv("casuality_data_final_factor_analyzer.csv")
heart_df = data.filter(regex='heart')
cardio_cmr_df = data.filter(regex='cardio_cmr')
X1 = pd.concat([heart_df, cardio_cmr_df], axis=1)
X2 = data.filter(regex='brain')
agg_score = data["agg_score"]
X1.shape, X2.shape, agg_score.shape

((2065, 639), (2065, 744), (2065,))
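
The column selection above relies on `DataFrame.filter(regex=...)`, which keeps every column whose name contains a match for the pattern. The sketch below illustrates this on a tiny table; the column names are hypothetical and do not come from the UK Biobank export.

```python
import pandas as pd

# Hypothetical miniature table; the column names are invented for illustration.
toy = pd.DataFrame({
    "heart_lv_volume": [1.0, 2.0],
    "cardio_cmr_shape_1": [0.3, 0.1],
    "brain_gm_volume": [5.0, 6.0],
    "agg_score": [1, 3],
})

# filter(regex=...) keeps each column whose name contains a match for the pattern.
heart_cols = toy.filter(regex="heart")
cmr_cols = toy.filter(regex="cardio_cmr")
heart_block = pd.concat([heart_cols, cmr_cols], axis=1)
brain_block = toy.filter(regex="brain")

print(list(heart_block.columns))
print(list(brain_block.columns))
```

Because the match is a substring search, a pattern such as `heart` would also select a column named `factor_heart`, so the feature blocks are best built before any derived columns are added to the table.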

In [3]:
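# Extract one latent factor each for the heart and brain feature blocks.
# FactorAnalysis is unsupervised: the second argument (y) of fit_transform is ignored.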
factor_heart = FactorAnalysis(n_components=1)
factor_heart = factor_heart.fit_transform(X1, agg_score)
factor_brain = FactorAnalysis(n_components=1)
factor_brain = factor_brain.fit_transform(X2, agg_score)
factor_heart = factor_heart[:, 0]
factor_brain = factor_brain[:, 0]
factor_heart.shape, factor_brain.shape, agg_score.shape

((2065,), (2065,), (2065,))

In [4]:
data['factor_heart'] = factor_heart
data['factor_brain'] = factor_brain

In [5]:
data_new = data.filter(['factor_heart', 'factor_brain', 'agg_score'], axis=1)
data_new.head()

   factor_heart  factor_brain  agg_score
0     -1.364912      1.062501          1
1      -1.18644     -1.240734          3
2      1.164212     -1.648003          1
3      0.793579     -0.003127          2
4      -0.66112      0.919291          2


# 1. Mediation Analysis

Many recent publications have shown that changes in brain structure correlate with changes in vascular health, that differences in heart CMR radiomics are associated with differences in brain imaging, and that changes in heart CMR radiomics correlate with changes in vascular health. However, because these connections have been studied independently rather than simultaneously, the reported associations may be partly redundant. Causal mediation analysis addresses this by helping to identify intermediate variables (mediators) that lie on the causal pathway between the treatment and the outcome [1].

To apply causal mediation analysis we used the “Mediation” class from the “statsmodels” library, a Python counterpart of the R package “mediation”. The R package implements a comprehensive suite of statistical tools for conducting such an analysis and is organized into two distinct approaches, model-based and design-based. For the purpose of this thesis we used the model-based approach, in which researchers can estimate causal mediation effects and conduct sensitivity analysis under the standard research design [2].

<center><img src="Figures/mediation analysis.png"></center>

Generic graphical representation of a mediation analysis. $a$ and $b$ form the indirect path of the effect of $X$ on the outcome ($Y$) through the mediator ($M$), while $c'$ is the direct effect of $X$ on the outcome after the indirect path has been removed. The total effect of $X$ is the sum of the indirect and direct effects.
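
For linear models fitted by ordinary least squares, this decomposition can be checked numerically: the total effect $c$ equals $c' + a \cdot b$ exactly in sample. The sketch below uses synthetic data and NumPy only; the helper `ols`, the variable names, and the coefficients 0.5, 0.4 and -0.3 are assumptions made for illustration, not values from this thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Synthetic data; the coefficients 0.5, 0.4 and -0.3 are arbitrary choices.
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = -0.3 * x + 0.4 * m + rng.normal(size=n)

def ols(columns, target):
    """Least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(target))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

a = ols([x], m)[1]                # path a: X -> M
_, c_prime, b = ols([x, m], y)    # direct effect c' and path b: M -> Y
c_total = ols([x], y)[1]          # total effect c of X on Y

print(f"indirect a*b = {a * b:.3f}, direct c' = {c_prime:.3f}, total c = {c_total:.3f}")
```

The identity holds for any dataset when both regressions are linear and include an intercept; it is not specific to the simulated coefficients.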

## 1.1 agg_score - Heart - Brain

First, we will study the mediating role that heart structure (i.e., heart radiomics) plays between cardiovascular risk (agg_score) and brain structure. In other words, we ask how much of the connection between cardiovascular risk and brain structure can be explained by changes in heart structure.

<center><img src="Figures/agg - heart - brain.png"></center>

In [6]:
# Regression model for the outcome. Predictor variables include the treatment and the mediator
outcome_model = sm.OLS.from_formula("factor_brain ~  agg_score + factor_heart", data = data_new)

In [7]:
# Regression model for the mediator variable. Predictor variables include the treatment and any other variables of interest.
mediator_model = sm.OLS.from_formula("factor_heart ~ agg_score", data = data_new)

In [8]:
# Define the model class
med = Mediation(outcome_model, mediator_model, "agg_score", mediator = "factor_heart")

In [9]:
# Estimate the mediation effects by simulation. The `method` argument is either 'parametric' (the default) or 'bootstrap'.
# n_rep: the number of simulation replications.
med_result = med.fit(n_rep = 500)

The average causal mediation effect (ACME) is the expected difference in the potential outcome when the mediator takes the value it would have under the treatment condition rather than under the control condition, while the treatment status itself is held constant.
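
In the potential-outcomes notation of [1], writing $M_i(t)$ for the mediator value unit $i$ would take under treatment status $t$ and $Y_i(t, m)$ for the corresponding potential outcome, the quantities reported in the summary tables can be stated as follows. The symbols $\delta$, $\zeta$ and $\tau$ follow [1] and are not used elsewhere in this notebook.

$$\delta(t) = \mathbb{E}\left[\,Y_i(t, M_i(1)) - Y_i(t, M_i(0))\,\right] \quad \text{(ACME)}$$

$$\zeta(t) = \mathbb{E}\left[\,Y_i(1, M_i(t)) - Y_i(0, M_i(t))\,\right] \quad \text{(ADE)}$$

$$\tau = \mathbb{E}\left[\,Y_i(1, M_i(1)) - Y_i(0, M_i(0))\,\right] = \delta(t) + \zeta(1 - t), \quad t \in \{0, 1\}$$

When the outcome model is linear and contains no treatment-mediator interaction, $\delta(0) = \delta(1)$ and $\zeta(0) = \zeta(1)$, which is consistent with the identical "control" and "treated" rows in the summaries in this section.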

In [10]:
print(np.round(med_result.summary(), decimals = 3))

                          Estimate  Lower CI bound  Upper CI bound  P-value
ACME (control)               0.008          -0.001           0.019    0.088
ACME (treated)               0.008          -0.001           0.019    0.088
ADE (control)               -0.134          -0.172          -0.094    0.000
ADE (treated)               -0.134          -0.172          -0.094    0.000
Total effect                -0.126          -0.161          -0.088    0.000
Prop. mediated (control)    -0.063          -0.157           0.007    0.088
Prop. mediated (treated)    -0.063          -0.157           0.007    0.088
ACME (average)               0.008          -0.001           0.019    0.088
ADE (average)               -0.134          -0.172          -0.094    0.000
Prop. mediated (average)    -0.063          -0.157           0.007    0.088
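
The confidence bounds in the table above are obtained by simulation over `n_rep` replications. As a rough illustration of the bootstrap alternative, the sketch below resamples rows with replacement and recomputes the product-of-coefficients indirect effect each time. It uses synthetic data and NumPy only, the helper `indirect_effect` is hypothetical, and it is not the statsmodels implementation.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
# Synthetic stand-ins for treatment, mediator and outcome (not the UK Biobank data).
t = rng.normal(size=n)
m = 0.6 * t + rng.normal(size=n)
y = 0.2 * t + 0.5 * m + rng.normal(size=n)

def indirect_effect(t, m, y):
    """Product-of-coefficients estimate a*b from two least-squares fits."""
    ones = np.ones(len(t))
    a = np.linalg.lstsq(np.column_stack([ones, t]), m, rcond=None)[0][1]
    b = np.linalg.lstsq(np.column_stack([ones, t, m]), y, rcond=None)[0][2]
    return a * b

n_rep = 500
draws = np.empty(n_rep)
for r in range(n_rep):
    idx = rng.integers(0, n, size=n)   # resample rows with replacement
    draws[r] = indirect_effect(t[idx], m[idx], y[idx])

estimate = indirect_effect(t, m, y)
lower, upper = np.percentile(draws, [2.5, 97.5])   # percentile interval
print(f"indirect effect = {estimate:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

An interval that includes zero, as for the ACME rows above, means the data do not rule out the absence of an indirect effect at the chosen confidence level.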


## 1.2 agg_score - Brain - Heart 

Second, we will study the mediating role that brain structure (i.e., brain MRI indices) plays between cardiovascular risk (agg_score) and heart structure. In other words, we ask how much of the connection between cardiovascular risk and heart structure can be explained by changes in brain structure.

<center><img src="Figures/agg - brain - heart.png"></center>

In [11]:
outcome_model = sm.OLS.from_formula("factor_heart ~  agg_score + factor_brain", data = data_new)

In [12]:
mediator_model = sm.OLS.from_formula("factor_brain ~ agg_score", data = data_new)

In [13]:
med = Mediation(outcome_model, mediator_model, "agg_score", mediator = "factor_brain")

In [14]:
med_result = med.fit(n_rep = 500)

In [15]:
print(np.round(med_result.summary(), decimals = 3))

                          Estimate  Lower CI bound  Upper CI bound  P-value
ACME (control)               0.005          -0.001           0.012    0.096
ACME (treated)               0.005          -0.001           0.012    0.096
ADE (control)               -0.223          -0.256          -0.187    0.000
ADE (treated)               -0.223          -0.256          -0.187    0.000
Total effect                -0.219          -0.253          -0.183    0.000
Prop. mediated (control)    -0.019          -0.055           0.003    0.096
Prop. mediated (treated)    -0.019          -0.055           0.003    0.096
ACME (average)               0.005          -0.001           0.012    0.096
ADE (average)               -0.223          -0.256          -0.187    0.000
Prop. mediated (average)    -0.019          -0.055           0.003    0.096


## 1.3 Heart - agg_score - Brain

Third, we will study the mediating role that cardiovascular risk (agg_score) plays between heart structure and brain structure. In other words, how much of the connection between heart structure and brain structure can be explained by changes in cardiovascular risk.

<center><img src="Figures/heart - agg - brain.png"></center>

In [16]:
outcome_model = sm.OLS.from_formula("factor_brain ~  agg_score + factor_heart", data = data_new)

In [17]:
mediator_model = sm.OLS.from_formula("agg_score ~ factor_heart", data = data_new)

In [18]:
med = Mediation(outcome_model, mediator_model, "factor_heart", mediator = "agg_score")

In [19]:
med_result = med.fit(n_rep = 500)

In [20]:
print(np.round(med_result.summary(), decimals = 3))

                          Estimate  Lower CI bound  Upper CI bound  P-value
ACME (control)               0.041           0.027           0.058    0.000
ACME (treated)               0.041           0.027           0.058    0.000
ADE (control)               -0.038          -0.084           0.004    0.092
ADE (treated)               -0.038          -0.084           0.004    0.092
Total effect                 0.003          -0.044           0.044    0.916
Prop. mediated (control)     0.969         -23.162          29.172    0.916
Prop. mediated (treated)     0.969         -23.162          29.172    0.916
ACME (average)               0.041           0.027           0.058    0.000
ADE (average)               -0.038          -0.084           0.004    0.092
Prop. mediated (average)     0.969         -23.162          29.172    0.916


## 1.4 Brain - agg_score - Heart

Lastly, we will study the mediating role that cardiovascular risk (agg_score) plays between brain structure and heart structure. In other words, how much of the connection between brain structure and heart structure can be explained by changes in cardiovascular risk.

<center><img src="Figures/brain - agg - heart.png"></center>

In [21]:
outcome_model = sm.OLS.from_formula("factor_heart ~  agg_score + factor_brain", data = data_new)

In [22]:
mediator_model = sm.OLS.from_formula("agg_score ~ factor_brain", data = data_new)

In [23]:
med = Mediation(outcome_model, mediator_model, "factor_brain", mediator = "agg_score")

In [24]:
med_result = med.fit(n_rep = 500)

In [25]:
print(np.round(med_result.summary(), decimals = 3))

                          Estimate  Lower CI bound  Upper CI bound  P-value
ACME (control)               0.040           0.021           0.061    0.000
ACME (treated)               0.040           0.021           0.061    0.000
ADE (control)               -0.039          -0.083           0.004    0.084
ADE (treated)               -0.039          -0.083           0.004    0.084
Total effect                 0.001          -0.049           0.047    0.968
Prop. mediated (control)     0.861         -18.446          28.710    0.968
Prop. mediated (treated)     0.861         -18.446          28.710    0.968
ACME (average)               0.040           0.021           0.061    0.000
ADE (average)               -0.039          -0.083           0.004    0.084
Prop. mediated (average)     0.861         -18.446          28.710    0.968


# 2. References

[1]: [Imai, Kosuke, Luke Keele, and Dustin Tingley (Oct. 2010)](https://doi.apa.org/doiLanding?doi=10.1037%2Fa0020761). A General Approach to Causal Mediation Analysis.

[2]: [Tingley, Dustin et al. (Oct. 2014)](https://www.jstatsoft.org/article/view/v059i05). Mediation: R Package for Causal Mediation Analysis.