# Automating Abstraction Error

After exploring the idea of abstraction [1] and transformation [2] in the first three notebooks, we now focus on the more practical problem of how, given two SCMs and an abstraction between them, we can compute the abstraction error between them.

In this notebook we will develop code to compute exactly the abstraction error as defined in [1] and in the previous notebook. To achieve our goal, we will write functions that depend on the metric we want to adopt and on the set of nodes over which we evaluate the abstraction error. We will present a series of implementations, progressively considering richer sets over which to compute the abstraction error.

This notebook was developed to offer a practical implementation of the ideas presented in the framework introduced in [1], and to lay stronger foundations for further work on the abstraction of causal models. The notebook is structured as follows:
- Setup of standard and custom libraries (Section 2)
- Review of the definition of abstraction and notation (Section 3)
- Automated computation of abstraction error given pairs of variables in M1 connected by a directed path in M1 (Section 4)
- Automated computation of abstraction error given all pairs of variables in M1 (Section 5)
- Automated computation of abstraction error given pairs of variables in M1 connected by a directed path in M0 or M1 (Section 6)
- Automated computation of abstraction error given sets of variables in M1 connected by a directed path in M0 or M1 (Section 7)
- Integration of the code for the automated computation in the *Abstraction* class (Section 8)

DISCLAIMER 1: the notebook refers to ideas from *causality* and *category theory* for which only a quick definition is offered. Useful references for causality are [3,4]; for category theory, see [5,6].

DISCLAIMER 2: mistakes are in all likelihood due to misunderstandings by the notebook author in reading [1]. Feedback very welcome! :)

# Setup

## Standard libraries
First of all we import standard libraries.

In [1]:
import numpy as np
import networkx as nx
import itertools
from scipy.spatial import distance

For reproducibility, and for discussing our results in this notebook, we set a random seed to $1985$.

In [2]:
np.random.seed(1985)

## SCM implementation

So far we have used a custom class to define and run our SCMs; we defined this class in the previous notebooks and we exported it into *src/models.py*. We will now switch to a standard library to implement Bayesian networks (BN): [pgmpy](https://pgmpy.org/index.html). *pgmpy* will allow us to define our SCMs as BNs and use existing functions to perform inferences on our network.

To use *pgmpy* we import the relevant libraries:

In [3]:
from pgmpy.models import BayesianNetwork as BN
from pgmpy.factors.discrete import TabularCPD as cpd
from pgmpy.inference import VariableElimination

## Abstraction definition

To be compatible with our new objects, we need to slightly revise the *Abstraction* class. Before, our class expected to work with custom *FinStochSCM* objects; now it will work with *BayesianNetwork* objects.

Importantly, we had to redefine the components of an abstraction:
- *R* is now a list of strings, each one denoting a variable in the base model $\mathcal{M}$/$\mathtt{M0}$ deemed relevant;
- *a* is now a dictionary indexed by the relevant variables in *R*; it specifies on which variable in the abstracted model $\mathcal{M'}$/$\mathtt{M1}$ each relevant variable is mapped;
- *alphas* is now a dictionary indexed by variables in the abstracted model $\mathcal{M'}$/$\mathtt{M1}$; it specifies a matrix defining how variable(s) in the base model $\mathcal{M}$/$\mathtt{M0}$ are mapped onto a variable in the abstracted model $\mathcal{M'}$/$\mathtt{M1}$.
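
To make the role of *a* and *alphas* concrete, here is a minimal sketch of how the dictionary *a* can be inverted to recover the base-model variables behind an abstracted variable, and of the matrix shape this implies for the corresponding entry of *alphas* (following the convention in "A note on the convention"). The helpers `preimage` and `expected_alpha_shape` and the cardinality dictionaries are hypothetical names introduced only for illustration; they are not the API of the *Abstraction* class.

```python
import numpy as np

# Hypothetical mapping: base-model variable -> abstracted-model variable
a = {'A': 'X', 'B': 'X', 'C': 'Y'}
# Hypothetical cardinalities of the variables in the two models
card_M0 = {'A': 3, 'B': 4, 'C': 6}
card_M1 = {'X': 3, 'Y': 4}

def preimage(a, target):
    """Return the base-model variables that `a` maps onto `target`."""
    return [v for v, image in a.items() if image == target]

def expected_alpha_shape(a, target, card_M0, card_M1):
    """Shape of alphas[target]: (|target|, product of the source cardinalities)."""
    sources = preimage(a, target)
    n_cols = int(np.prod([card_M0[s] for s in sources]))
    return (card_M1[target], n_cols)

print(preimage(a, 'X'))                                # ['A', 'B']
print(expected_alpha_shape(a, 'X', card_M0, card_M1))  # (3, 12)
```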

A tweaked implementation of the *Abstraction* class able to deal with these inputs has been written in *src/pgmpy_models.py*. We import it.

In [4]:
from src.pgmpy_models import Abstraction

# Abstraction error refresher

Before moving to implementing a module for the automatic computation of error, let us review the definition of the abstraction error $e(\alpha)$.

Let $\mathbf{A}$ and $\mathbf{B}$ be two sets of variables in the base model $\mathcal{M}$/$\mathtt{M0}$; let $X$ and $Y$ be the related variables in the abstracted model $\mathcal{M}'$/$\mathtt{M1}$. Let us now consider the following diagram:

$$
\begin{array}{ccc}
\mathcal{\mathcal{M}_{do}}\left[\mathbf{A}\right] & \overset{\mathcal{\mathcal{M}_{do}}\left[\phi_{\mathbf{B}}\right]}{\longrightarrow} & \mathcal{\mathcal{M}_{do}}\left[\mathbf{B}\right]\\
\sideset{}{\alpha_{X}}\downarrow &  & \sideset{}{\alpha_{Y}}\downarrow\\
\mathcal{\mathcal{M'}_{do}}\left[X\right] & \overset{\mathcal{\mathcal{M'}_{do}}\left[\phi_{Y}\right]}{\longrightarrow} & \mathcal{\mathcal{M'}_{do}}\left[Y\right]
\end{array}
$$

The abstraction error $E_\alpha(X,Y)$ with respect to the variables $X,Y$ is the JS distance under intervention between the upper path and the lower path:
$$
    D_{JS}(\alpha_Y \circ \mathcal{M}_{do}\left[\phi_{\mathbf{B}}\right], \mathcal{M'}_{do}\left[\phi_{Y}\right] \circ \alpha_X)
$$

The overall abstraction error $e(\alpha)$ is the highest abstraction error $E_\alpha(X,Y)$ over all the meaningful pairs of variables $(X,Y)$ in the abstracted model $\mathcal{M}'$/$\mathtt{M1}$. We call the set of pairs of variables $(X,Y)$ over which the error is evaluated the **evaluation set** $\mathcal{J}$.
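
As a small illustration of these two maxima, the sketch below computes the error for one pair as the largest JS distance between matching columns of two stochastic matrices, and then takes the maximum over an evaluation set. The function name `pair_error` and the toy matrices are assumptions made for this example only.

```python
import numpy as np
from scipy.spatial import distance

def pair_error(upperpath, lowerpath):
    """Largest JS distance between matching columns of two stochastic matrices."""
    return max(distance.jensenshannon(upperpath[:, c], lowerpath[:, c])
               for c in range(upperpath.shape[1]))

# Toy column-stochastic matrices standing in for the two paths of the diagram
upper = np.array([[0.9, 0.5],
                  [0.1, 0.5]])
lower = np.array([[0.9, 0.7],
                  [0.1, 0.3]])

# Errors indexed by the pairs in a (toy) evaluation set, then the overall error
errors = {('X', 'Y'): pair_error(upper, lower)}
overall_error = max(errors.values())
print(overall_error)
```

Each column corresponds to one intervention on the source, so the inner maximum ranges over interventions and the outer one over pairs.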

## A note on the convention

To facilitate the understanding of the code and clarify how input models and abstractions should be defined, it is useful to write a short note on the representation convention for the matrices.

All morphisms we are dealing with (abstraction maps $\alpha_\cdot$ and mechanisms $\mathcal{M}[\phi_\cdot]$) are defined as stochastic matrices.

**Abstractions.** An abstraction $\alpha_X$ maps a set of variables $\mathbf{A}$ in the base model $\mathcal{M}$/$\mathtt{M0}$ to a single variable $X$ in the abstracted model $\mathcal{M}'$/$\mathtt{M1}$. 
Let $\mathcal{M}[A_1] \times \mathcal{M}[A_2] \times ... \times \mathcal{M}[A_n]$ be the domain of $\mathbf{A}$, and $\mathcal{M'}[X]$ the domain of $X$. Then, the matrix encoding $\alpha_X$ will be a two-dimensional matrix with dimensions:
$$
\left[\left|\mathcal{M'}[X]\right|, \left|\mathcal{M}[A_1]\right|\cdot\left|\mathcal{M}[A_2]\right|\cdot...\cdot\left|\mathcal{M}[A_n]\right|\right]
$$

For instance, consider $\alpha_{X}:A_1\times A_2\rightarrow X$ and suppose:
- $\mathcal{M}[A_1]=\{0,1,2\}$, 
- $\mathcal{M}[A_2]=\{0,1\}$, 
- $\mathcal{M'}[X]=\{0,1,2,3\}$

then, $\alpha_{X}$ is a stochastic matrix with dimensions $[4,6]$ and the following interpretation:
$$
\begin{array}{ccccccc}
 & \boldsymbol{(A_{1}=0,A_{2}=0)} & \boldsymbol{(A_{1}=0,A_{2}=1)} & \boldsymbol{(A_{1}=1,A_{2}=0)} & \boldsymbol{(A_{1}=1,A_{2}=1)} & \boldsymbol{(A_{1}=2,A_{2}=0)} & \boldsymbol{(A_{1}=2,A_{2}=1)}\\
\boldsymbol{(X=0)} & (0,0)\mapsto0 & (0,1)\mapsto0 & (1,0)\mapsto0 & (1,1)\mapsto0 & (2,0)\mapsto0 & (2,1)\mapsto0\\
\boldsymbol{(X=1)} & (0,0)\mapsto1 & (0,1)\mapsto1 & (1,0)\mapsto1 & (1,1)\mapsto1 & (2,0)\mapsto1 & (2,1)\mapsto1\\
\boldsymbol{(X=2)} & (0,0)\mapsto2 & (0,1)\mapsto2 & (1,0)\mapsto2 & (1,1)\mapsto2 & (2,0)\mapsto2 & (2,1)\mapsto2\\
\boldsymbol{(X=3)} & (0,0)\mapsto3 & (0,1)\mapsto3 & (1,0)\mapsto3 & (1,1)\mapsto3 & (2,0)\mapsto3 & (2,1)\mapsto3
\end{array}
$$
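
The column ordering of this table can be reproduced with NumPy. The sketch below builds a $[4,6]$ matrix for a hypothetical deterministic abstraction $X = A_1 + A_2$ (chosen only because its values fit in $\{0,1,2,3\}$) and uses `np.ravel_multi_index` to locate the column of each configuration $(A_1, A_2)$.

```python
import numpy as np

card_A1, card_A2, card_X = 3, 2, 4

# Hypothetical deterministic abstraction: X = A1 + A2
alpha_X = np.zeros((card_X, card_A1 * card_A2))
for a1 in range(card_A1):
    for a2 in range(card_A2):
        # Column index of the configuration (A1=a1, A2=a2): a1 * card_A2 + a2
        col = np.ravel_multi_index((a1, a2), (card_A1, card_A2))
        alpha_X[a1 + a2, col] = 1.0

print(alpha_X.shape)        # (4, 6)
print(alpha_X.sum(axis=0))  # every column sums to 1
```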

**Mechanisms.** A mechanism $\mathcal{M}[\phi_B]$ maps a set of variables $\mathbf{A}$ to a set of variables $\mathbf{B}$ in the same model according to the conditional $P(\mathbf{B} \vert \mathbf{A})$. 
Let $\mathcal{M}[A_1] \times \mathcal{M}[A_2] \times ... \times \mathcal{M}[A_n]$ be the domain of $\mathbf{A}$, and $\mathcal{M}[B_1] \times \mathcal{M}[B_2] \times ... \times \mathcal{M}[B_m]$ the domain of $\mathbf{B}$. Then, the matrix encoding $\mathcal{M}[\phi_B]$ will be a two-dimensional matrix with dimensions:
$$
\left[\left|\mathcal{M}[B_1]\right|\cdot\left|\mathcal{M}[B_2]\right|\cdot...\cdot\left|\mathcal{M}[B_m]\right|, \left|\mathcal{M}[A_1]\right|\cdot\left|\mathcal{M}[A_2]\right|\cdot...\cdot\left|\mathcal{M}[A_n]\right|\right]
$$

For instance, consider $\mathcal{M}[\phi_B]:A_1\times A_2\rightarrow B_1\times B_2$ and suppose:
- $\mathcal{M}[A_1]=\{0,1,2\}$, 
- $\mathcal{M}[A_2]=\{0,1\}$, 
- $\mathcal{M}[B_1]=\{0,1\}$, 
- $\mathcal{M}[B_2]=\{0,1,2,3\}$, 

then, $\mathcal{M}[\phi_B]$ is a stochastic matrix with dimensions $[8,6]$ and the following interpretation:
$$
\begin{array}{ccccc}
 & \boldsymbol{(A_{1}=0,A_{2}=0)} & \boldsymbol{(A_{1}=0,A_{2}=1)} &  & \boldsymbol{(A_{1}=2,A_{2}=1)}\\
\boldsymbol{(B_{1}=0,B_{2}=0)} & P(B_{1}=0,B_{2}=0\vert A_{1}=0,A_{2}=0) & P(B_{1}=0,B_{2}=0\vert A_{1}=0,A_{2}=1) &  & P(B_{1}=0,B_{2}=0\vert A_{1}=2,A_{2}=1)\\
\boldsymbol{(B_{1}=0,B_{2}=1)} & P(B_{1}=0,B_{2}=1\vert A_{1}=0,A_{2}=0) & P(B_{1}=0,B_{2}=1\vert A_{1}=0,A_{2}=1) &  & P(B_{1}=0,B_{2}=1\vert A_{1}=2,A_{2}=1)\\
\boldsymbol{(B_{1}=0,B_{2}=2)} & P(B_{1}=0,B_{2}=2\vert A_{1}=0,A_{2}=0) & P(B_{1}=0,B_{2}=2\vert A_{1}=0,A_{2}=1) & ... & P(B_{1}=0,B_{2}=2\vert A_{1}=2,A_{2}=1)\\
\boldsymbol{(B_{1}=0,B_{2}=3)} & P(B_{1}=0,B_{2}=3\vert A_{1}=0,A_{2}=0) & P(B_{1}=0,B_{2}=3\vert A_{1}=0,A_{2}=1) &  & P(B_{1}=0,B_{2}=3\vert A_{1}=2,A_{2}=1)\\
\boldsymbol{(B_{1}=1,B_{2}=0)} & P(B_{1}=1,B_{2}=0\vert A_{1}=0,A_{2}=0) & P(B_{1}=1,B_{2}=0\vert A_{1}=0,A_{2}=1) &  & P(B_{1}=1,B_{2}=0\vert A_{1}=2,A_{2}=1)\\
\boldsymbol{(B_{1}=1,B_{2}=1)} & P(B_{1}=1,B_{2}=1\vert A_{1}=0,A_{2}=0) & P(B_{1}=1,B_{2}=1\vert A_{1}=0,A_{2}=1) &  & P(B_{1}=1,B_{2}=1\vert A_{1}=2,A_{2}=1)\\
\boldsymbol{(B_{1}=1,B_{2}=2)} & P(B_{1}=1,B_{2}=2\vert A_{1}=0,A_{2}=0) & P(B_{1}=1,B_{2}=2\vert A_{1}=0,A_{2}=1) &  & P(B_{1}=1,B_{2}=2\vert A_{1}=2,A_{2}=1)\\
\boldsymbol{(B_{1}=1,B_{2}=3)} & P(B_{1}=1,B_{2}=3\vert A_{1}=0,A_{2}=0) & P(B_{1}=1,B_{2}=3\vert A_{1}=0,A_{2}=1) &  & P(B_{1}=1,B_{2}=3\vert A_{1}=2,A_{2}=1)
\end{array}
$$
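
A minimal sketch of this layout, assuming the conditional is available as a tensor with one axis per variable ordered as $(B_1, B_2, A_1, A_2)$: a plain `reshape` then yields the $[8,6]$ matrix above. The random tensor is only a stand-in for a real mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conditional P(B1, B2 | A1, A2), axes ordered (B1, B2, A1, A2)
cards_B, cards_A = (2, 4), (3, 2)
tensor = rng.random(cards_B + cards_A)
tensor /= tensor.sum(axis=(0, 1), keepdims=True)  # normalise over (B1, B2)

# Flatten the B axes into rows and the A axes into columns
matrix = tensor.reshape(np.prod(cards_B), np.prod(cards_A))
print(matrix.shape)  # (8, 6)

# Locate the entry P(B1=1, B2=2 | A1=2, A2=1) in the matrix
row = np.ravel_multi_index((1, 2), cards_B)
col = np.ravel_multi_index((2, 1), cards_A)
print(row, col)  # 6 5
```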

**Composition**. This encoding (codomain on the rows, domain on the columns; product of multiple finite sets flattened in row-major order, so that the last variable varies fastest) is consistent with the encoding used in *pgmpy* and it allows us to perform matrix multiplications in the same order as categorical composition.

# Evaluating abstraction considering only individual variables connected by a directed path

We start implementing our algorithm for the automatic computation of the abstraction error using as evaluation set $\mathcal{J}$ the pairs $(X,Y)$ such that there exists a directed path from $X$ to $Y$ in the abstracted model $\mathcal{M}'$/$\mathtt{M1}$.  

## Example 1

To drive our implementation, we work using as a reference the prototypical example we have studied in the previous notebooks. We consider models for the lung cancer scenario at different levels. A precise description of each one of them is available in the notebook *Categorical Abstraction.ipynb*.

### Model definition

We first define our two models. Notice how our models are now defined using *pgmpy*; we also take advantage of the *check_model()* function offered by *pgmpy* to verify that our model and its stochastic matrices are correct. 

In [5]:
M0 = BN([('Smoking','Tar'),('Tar','Cancer')])

cpdS = cpd(variable='Smoking',
          variable_card=2,
          values=[[.8],[.2]],
          evidence=None,
          evidence_card=None)
cpdT = cpd(variable='Tar',
          variable_card=2,
          values=[[1,.2],[0.,.8]],
          evidence=['Smoking'],
          evidence_card=[2])
cpdC = cpd(variable='Cancer',
          variable_card=2,
          values=[[.9,.6],[.1,.4]],
          evidence=['Tar'],
          evidence_card=[2])

M0.add_cpds(cpdS,cpdT,cpdC)
M0.check_model()

True

In [6]:
M1 = BN([('Smoking','Cancer')])

cpdS = cpd(variable='Smoking',
          variable_card=2,
          values=[[.8],[.2]],
          evidence=None,
          evidence_card=None)
cpdC = cpd(variable='Cancer',
          variable_card=2,
          values=[[.9,.66],[.1,.34]],
          evidence=['Smoking'],
          evidence_card=[2])

M1.add_cpds(cpdS,cpdC)
M1.check_model()

True

Next we define an abstraction. The definition of $(R,a,\alpha)$ has correspondingly changed to work with the *pgmpy* objects.

In [7]:
R = ['Smoking','Cancer']

a = {'Smoking': 'Smoking',
    'Cancer': 'Cancer'}
alphas = {'Smoking': np.eye(2),
         'Cancer': np.eye(2)}

In [8]:
A = Abstraction(M0,M1,R,a,alphas)

### Enumeration of all possible pairs in M1 with a directed path connection in M1

We now populate our evaluation set $\mathcal{J}$ computing all possible pairs (source,target) along directed paths in the abstracted model $\mathcal{M'}/\mathtt{M1}$. To do so, we call the *networkx* function *has_path()* among all possible pairs of nodes.

In [9]:
J = []

sources = list(A.M1.nodes())
targets = list(A.M1.nodes())

for s in sources:
    for t in list(set(targets)-{s}):
        if nx.has_path(A.M1,s,t):
            J.append((s,t))

In [10]:
print(J)

[('Smoking', 'Cancer')]


As expected, we have a single path in the abstracted model.

### Running the computational loop

We now come to the loop that will compute the abstraction error $e(\alpha)$. The loop iterates over all the pairs in the evaluation set $\mathcal{J}$ and computes the abstraction error specific to each pair.

Keeping in mind this diagram:
$$
\begin{array}{ccc}
\mathcal{\mathcal{M}_{do}}\left[\mathbf{A}\right] & \overset{\mathcal{\mathcal{M}_{do}}\left[\phi_{\mathbf{B}}\right]}{\longrightarrow} & \mathcal{\mathcal{M}_{do}}\left[\mathbf{B}\right]\\
\sideset{}{\alpha_{X}}\downarrow &  & \sideset{}{\alpha_{Y}}\downarrow\\
\mathcal{\mathcal{M'}_{do}}\left[X\right] & \overset{\mathcal{\mathcal{M'}_{do}}\left[\phi_{Y}\right]}{\longrightarrow} & \mathcal{\mathcal{M'}_{do}}\left[Y\right]
\end{array}
$$

let us decompose and explain the loop:
1. Given a pair from the evaluation set $\mathcal{J}$, extract the source node $X$ and the target node $Y$ in the abstracted model $\mathcal{M'}/\mathtt{M1}$.
2. Using the mapping $a$, retrieve the corresponding source set $\mathbf{A}$ and target set $\mathbf{B}$ in the base model $\mathcal{M}/\mathtt{M0}$.
3. Compute the intervention $do(X)$ in the abstracted model $\mathcal{M'}/\mathtt{M1}$ and initialize the *pgmpy* inference engine on this intervened model.
4. Compute the intervention $do(\mathbf{A})$ in the base model $\mathcal{M}/\mathtt{M0}$ and initialize the *pgmpy* inference engine on this intervened model.
5. Compute the mechanism $\mathcal{M'}_{do}[\phi_Y] = P_{{M'}_{do}}(Y\vert X)$ as $\frac{P_{{M'}_{do}}(Y,X)}{P_{{M'}_{do}}(X)}$. The two distributions $P_{{M'}_{do}}(Y,X)$ and $P_{{M'}_{do}}(X)$ are evaluated using the *pgmpy* inference engine.
6. Extract the matrix form of $\mathcal{M'}_{do}[\phi_Y] = P_{{M'}_{do}}(Y\vert X)$.
7. Check that the matrix correctly represents $Y$ on the rows and $X$ on the columns.
8. Compute the mechanism $\mathcal{M}_{do}[\phi_\mathbf{B}] = P_{{M}_{do}}(\mathbf{B}\vert \mathbf{A})$ as $\frac{P_{{M}_{do}}(\mathbf{B},\mathbf{A})}{P_{{M}_{do}}(\mathbf{A})}$. The two distributions $P_{{M}_{do}}(\mathbf{B},\mathbf{A})$ and $P_{{M}_{do}}(\mathbf{A})$ are evaluated using the *pgmpy* inference engine.
9. Extract the matrix form of $\mathcal{M}_{do}[\phi_\mathbf{B}] = P_{{M}_{do}}(\mathbf{B}\vert \mathbf{A})$.
10. Reorder the matrix to make sure that the variables in $\mathbf{B}$ come before the variables in $\mathbf{A}$, and that they appear in the expected order.
11. Compact the matrix, reducing it to two dimensions: on the rows we have the variables of $\mathbf{B}$, on the columns the variables of $\mathbf{A}$.
12. Extract the matrices for the abstractions of interest: $\alpha_X$ and $\alpha_Y$.
13. Compute the two alternative paths on the diagram by composing $\mathcal{M'}_{do}[\phi_Y] \circ \alpha_X$ (lower path) and $\alpha_Y \circ \mathcal{M}_{do}[\phi_\mathbf{B}]$ (upper path) via a simple matrix product.
14. For every possible intervention on $\mathbf{A}$ compute the JS distance between the two paths. This is equivalent to considering every column in the matrix encoding of the two paths and computing their JS distance.
15. Select the highest distance with respect to all the interventions as the error $E_\alpha(X,Y)$.
16. Select the highest distance with respect to all pairs $(X,Y)$ as the error $e(\alpha)$.
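
The conditioning and reordering steps in this list can be illustrated in plain NumPy, independently of *pgmpy*. The sketch below assumes a hypothetical joint $P(X,Y)$ stored with the source on the first axis; it divides by the marginal of the source and then moves the target to the rows, which is what the ordering check and `np.moveaxis` achieve in the loop.

```python
import numpy as np

# Hypothetical joint P(X, Y) stored with axes ordered (X, Y); |X| = 2, |Y| = 3
joint_XY = np.array([[0.10, 0.20, 0.10],
                     [0.30, 0.06, 0.24]])

# Marginal P(X) and conditional P(Y | X) = P(X, Y) / P(X)
marg_X = joint_XY.sum(axis=1, keepdims=True)
cond = joint_XY / marg_X  # still ordered (X, Y)

# Put the target Y on the rows and the source X on the columns
cond_YX = np.moveaxis(cond, [0, 1], [1, 0])
print(cond_YX.shape)        # (3, 2)
print(cond_YX.sum(axis=0))  # each column sums to 1
```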

In [11]:
abstraction_errors = []

for pair in J:
    # Get nodes in the abstracted model
    M1_source = [pair[0]]
    M1_target = [pair[1]]
    print('\nM1: {0} -> {1}'.format(M1_source,M1_target))

    # Get nodes in the base model
    M0_source = A.invert_a(M1_source)
    M0_target = A.invert_a(M1_target)
    print('M0: {0} -> {1}'.format(M0_source,M0_target))
    
    # Perform interventions in the abstracted model and setup the inference engine
    M1do = A.M1.do(M1_source)
    inferM1 = VariableElimination(M1do)
    
    # Perform interventions in the base model and setup the inference engine
    M0do = A.M0.do(M0_source)
    inferM0 = VariableElimination(M0do)
    
    # Evaluate the mechanism in the abstracted model
    M1_joint_TS = inferM1.query(M1_target+M1_source,show_progress=False)
    M1_joint_S = inferM1.query(M1_source,show_progress=False)
    M1_cond_TS = M1_joint_TS / M1_joint_S
    
    # Extract the matrix
    M1_cond_TS_val = M1_cond_TS.values

    # Check ordering
    if (M1_cond_TS.variables[0] != M1_target[0]):
        M1_cond_TS_val = M1_cond_TS_val.T

    # Evaluate the mechanism in the base model
    M0_joint_TS = inferM0.query(M0_target+M0_source,show_progress=False)
    M0_joint_S = inferM0.query(M0_source,show_progress=False)
    M0_cond_TS = M0_joint_TS / M0_joint_S
    
    # Extract the matrix
    M0_cond_TS_val = M0_cond_TS.values

    # Reorder the matrix
    old_indexes = range(len(M0_target+M0_source))
    new_indexes = [(M0_target+M0_source).index(i) for i in M0_joint_TS.variables]
    M0_cond_TS_val = np.moveaxis(M0_cond_TS_val, old_indexes, new_indexes)

    # Compact the matrix
    M0_target_cards=[A.M0.get_cardinality(t) for t in M0_target]
    M0_target_card = np.prod(M0_target_cards)
    M0_source_cards=[A.M0.get_cardinality(s) for s in M0_source]
    M0_source_card = np.prod(M0_source_cards)
    M0_cond_TS_val = M0_cond_TS_val.reshape(M0_target_card,M0_source_card)
    
    # Extract the alphas
    alpha_S = A.alphas[M1_source[0]]
    alpha_T = A.alphas[M1_target[0]]
    
    # Evaluate the paths on the diagram
    lowerpath = np.dot(M1_cond_TS_val,alpha_S)
    upperpath = np.dot(alpha_T,M0_cond_TS_val)
    
    # Compute abstraction error for every possible intervention
    distances = []
    for c in range(lowerpath.shape[1]):
        distances.append( distance.jensenshannon(lowerpath[:,c],upperpath[:,c]) )
    print('All JS distances: {0}'.format(distances))
    
    # Select the greatest distance over all interventions
    print('\nAbstraction error: {0}'.format(np.max(distances)))
    abstraction_errors.append(np.max(distances))

# Select the greatest distance over all pairs considered
print('\n\nOVERALL ABSTRACTION ERROR: {0}'.format(np.max(abstraction_errors)))


M1: ['Smoking'] -> ['Cancer']
M0: ['Smoking'] -> ['Cancer']
All JS distances: [3.332000937312528e-09, 9.599401598218922e-09]

Abstraction error: 9.599401598218922e-09


OVERALL ABSTRACTION ERROR: 9.599401598218922e-09


As we knew, this example is a case of zero-error abstraction, and indeed our result is virtually zero once we account for numerical approximation.
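
The zero error can also be checked by hand. Since both abstraction matrices are identities, the two paths reduce to the mechanisms themselves, and chaining the two base-model mechanisms along $Smoking \rightarrow Tar \rightarrow Cancer$ is a matrix product. The sketch below copies the values from the CPDs defined above; the variable names are chosen for this example only.

```python
import numpy as np

# Mechanisms of the base model M0, copied from the CPDs defined above
tar_given_smoking = np.array([[1.0, 0.2],
                              [0.0, 0.8]])
cancer_given_tar = np.array([[0.9, 0.6],
                             [0.1, 0.4]])

# Mechanism of the abstracted model M1
cancer_given_smoking_M1 = np.array([[0.9, 0.66],
                                    [0.1, 0.34]])

# Chaining Smoking -> Tar -> Cancer: matrix product in composition order
cancer_given_smoking_M0 = cancer_given_tar @ tar_given_smoking
print(np.allclose(cancer_given_smoking_M0, cancer_given_smoking_M1))  # True
```

For instance, the top-right entry is $0.9 \cdot 0.2 + 0.6 \cdot 0.8 = 0.66$, which is exactly the value used in the abstracted model.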

# Evaluating abstraction considering all pairs of nodes

We now move on to using the same algorithm with a richer evaluation set $\mathcal{J}$ containing all the pairs $(X,Y)$ in the abstracted model $\mathcal{M}'$/$\mathtt{M1}$, regardless of the existence of a directed path.

## Example 2

We still rely on the prototypical case study on the lung cancer scenario. This time we take into consideration the last extreme abstraction that removes all edges from the abstracted model. If we were to consider an evaluation set $\mathcal{J}$ that contains only directed paths in the abstracted model $\mathcal{M}''$/$\mathtt{M2}$ there would be no path; however in the notebook *Categorical Abstraction.ipynb* we computed a distance even in the absence of a directed path. For this reason in this example we enlarge the evaluation set $\mathcal{J}$.

### Model definition

First, we define the models and the abstraction.

In [12]:
M0 = BN([('Smoking','Cancer')])

cpdS = cpd(variable='Smoking',
          variable_card=2,
          values=[[.8],[.2]],
          evidence=None,
          evidence_card=None)
cpdC = cpd(variable='Cancer',
          variable_card=2,
          values=[[.9,.66],[.1,.34]],
          evidence=['Smoking'],
          evidence_card=[2])

M0.add_cpds(cpdS,cpdC)
M0.check_model()

True

In [13]:
M1 = BN()
M1.add_node('Smoking')
M1.add_node('Cancer')

cpdS = cpd(variable='Smoking',
          variable_card=2,
          values=[[.8],[.2]],
          evidence=None,
          evidence_card=None)
cpdC = cpd(variable='Cancer',
          variable_card=2,
          values=[[.852],[.148]],
          evidence=None,
          evidence_card=None)

M1.add_cpds(cpdS,cpdC)
M1.check_model()

True

In [14]:
R = ['Smoking','Cancer']

a = {'Smoking': 'Smoking',
    'Cancer': 'Cancer'}
alphas = {'Smoking': np.eye(2),
         'Cancer': np.eye(2)}

In [15]:
A = Abstraction(M0,M1,R,a,alphas)

### Enumeration of all possible pairs in M1

Then we populate the evaluation set $\mathcal{J}$ considering all possible pairs $(X,Y)$ in the abstracted model $\mathcal{M'}/\mathtt{M1}$. To do so, we just call *itertools.permutations()* to retrieve all the permutations of two elements.

In [16]:
J = list(itertools.permutations(A.M1.nodes(),2))

In [17]:
print(J)

[('Smoking', 'Cancer'), ('Cancer', 'Smoking')]


Notice that we consider the pair $(Smoking, Cancer)$ even if in the abstracted model there is no link between them. Moreover we consider also the pair $(Cancer, Smoking)$ although it is debatable whether we should consider this inverted relation that requires estimating a mechanism $\mathcal{M}[\phi_{Smoking}] = P(Smoking \vert Cancer)$ that hardly has a causal interpretation.

### Running the computational loop

We run the same evaluation loop. See above for an explanation of every step.

In [18]:
abstraction_errors = []

for pair in J:
    # Get nodes in the abstracted model
    M1_source = [pair[0]]
    M1_target = [pair[1]]
    print('\nM1: {0} -> {1}'.format(M1_source,M1_target))

    # Get nodes in the base model
    M0_source = A.invert_a(M1_source)
    M0_target = A.invert_a(M1_target)
    print('M0: {0} -> {1}'.format(M0_source,M0_target))
    
    # Perform interventions in the abstracted model and setup the inference engine
    M1do = A.M1.do(M1_source)
    inferM1 = VariableElimination(M1do)
    
    # Perform interventions in the base model and setup the inference engine
    M0do = A.M0.do(M0_source)
    inferM0 = VariableElimination(M0do)
    
    # Evaluate the mechanism in the abstracted model
    M1_joint_TS = inferM1.query(M1_target+M1_source,show_progress=False)
    M1_joint_S = inferM1.query(M1_source,show_progress=False)
    M1_cond_TS = M1_joint_TS / M1_joint_S
    
    # Extract the matrix
    M1_cond_TS_val = M1_cond_TS.values

    # Check ordering
    if (M1_cond_TS.variables[0] != M1_target[0]):
        M1_cond_TS_val = M1_cond_TS_val.T

    # Evaluate the mechanism in the base model
    M0_joint_TS = inferM0.query(M0_target+M0_source,show_progress=False)
    M0_joint_S = inferM0.query(M0_source,show_progress=False)
    M0_cond_TS = M0_joint_TS / M0_joint_S
    
    # Extract the matrix
    M0_cond_TS_val = M0_cond_TS.values

    # Reorder the matrix
    old_indexes = range(len(M0_target+M0_source))
    new_indexes = [(M0_target+M0_source).index(i) for i in M0_joint_TS.variables]
    M0_cond_TS_val = np.moveaxis(M0_cond_TS_val, old_indexes, new_indexes)

    # Compact the matrix
    M0_target_cards=[A.M0.get_cardinality(t) for t in M0_target]
    M0_target_card = np.prod(M0_target_cards)
    M0_source_cards=[A.M0.get_cardinality(s) for s in M0_source]
    M0_source_card = np.prod(M0_source_cards)
    M0_cond_TS_val = M0_cond_TS_val.reshape(M0_target_card,M0_source_card)

    # Extract the alphas
    alpha_S = A.alphas[M1_source[0]]
    alpha_T = A.alphas[M1_target[0]]
    
    # Evaluate the paths on the diagram
    lowerpath = np.dot(M1_cond_TS_val,alpha_S)
    upperpath = np.dot(alpha_T,M0_cond_TS_val)
    
    # Compute abstraction error for every possible intervention
    distances = []
    for c in range(lowerpath.shape[1]):
        distances.append( distance.jensenshannon(lowerpath[:,c],upperpath[:,c]) )
    print('All JS distances: {0}'.format(distances))
    
    # Select the greatest distance over all interventions
    print('\nAbstraction error: {0}'.format(np.max(distances)))
    abstraction_errors.append(np.max(distances))

# Select the greatest distance over all pairs considered
print('\n\nOVERALL ABSTRACTION ERROR: {0}'.format(np.max(abstraction_errors)))


M1: ['Smoking'] -> ['Cancer']
M0: ['Smoking'] -> ['Cancer']
All JS distances: [0.051634404110767126, 0.15974085850231143]

Abstraction error: 0.15974085850231143

M1: ['Cancer'] -> ['Smoking']
M0: ['Cancer'] -> ['Smoking']
All JS distances: [0.0, 0.0]

Abstraction error: 0.0


OVERALL ABSTRACTION ERROR: 0.15974085850231143


Notice that, looking at the pair $(Smoking,Cancer)$ we obtain a result in line with what we obtained in the notebook *Categorical Abstraction.ipynb*. 
The result for the pair $(Cancer,Smoking)$ may at first look counterintuitive: why is the error equal to 0 in this case? The result actually makes sense because we are measuring $P(Smoking\vert do(Cancer))$ in the two models $\mathcal{M}/M0$ and $\mathcal{M'}/M1$. In $\mathcal{M'}/M1$, $Smoking$ and $Cancer$ are independent from the beginning; in $\mathcal{M}/M0$, $Smoking$ and $Cancer$ are made independent by the intervention. So we are just comparing $P(Smoking)$ in $\mathcal{M}/M0$ and $\mathcal{M'}/M1$, and this marginal happens to have been defined identically in the two models.

Notice, also, that if we were to print $\mathcal{M'}[\phi_{Cancer}] = P(Cancer \vert Smoking)$ (or $\mathcal{M'}[\phi_{Smoking}] = P(Smoking \vert Cancer)$) we could see from its matrix form that it encodes a conditional distribution in which $Smoking$ and $Cancer$ are independent in the abstracted model $\mathcal{M'}/M1$. The matrix has the expected shape $[2,2]$ but its columns are identical, showing that the value on the column does not matter. This makes sense ($Smoking$ and $Cancer$ have no connection in the graph, so they are independent), and it is consistent with the formalism and result in the notebook *Categorical Abstraction.ipynb*.
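
The numbers for the pair $(Smoking,Cancer)$ can be reproduced without *pgmpy*. The sketch below writes the two paths by hand from the CPDs above, using the fact that both abstraction matrices are identities; the lower path has two identical columns, as just discussed. Variable names are chosen for this example only.

```python
import numpy as np
from scipy.spatial import distance

# Upper path: P(Cancer | do(Smoking)) in M0 (the alphas are identities)
upperpath = np.array([[0.9, 0.66],
                      [0.1, 0.34]])

# Lower path: in M1 Cancer ignores Smoking, so both columns equal P(Cancer)
p_cancer = np.array([0.852, 0.148])
lowerpath = np.column_stack([p_cancer, p_cancer])

distances = [distance.jensenshannon(upperpath[:, c], lowerpath[:, c])
             for c in range(upperpath.shape[1])]
print(distances)  # approximately [0.0516, 0.1597]

# The M1 marginal of Cancer coincides with the marginal of Cancer in M0
print(upperpath @ np.array([0.8, 0.2]))  # approximately [0.852, 0.148]
```

The last line shows where the value $0.852$ comes from: $0.8 \cdot 0.9 + 0.2 \cdot 0.66 = 0.852$, the observational marginal of $Cancer$ in the base model.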

# Evaluating abstraction considering pairs of nodes potentially having a causal relation

Considering an evaluation set $\mathcal{J}$ containing all the pairs $(X,Y)$ in the abstracted model $\mathcal{M}'$/$\mathtt{M1}$ seems to generate a set including meaningless pairs. We therefore refine the evaluation set $\mathcal{J}$ so that it contains all the pairs $(X,Y)$ that are connected by a directed path in $\mathcal{M}'$/$\mathtt{M1}$ or whose counterimages $(\mathbf{A},\mathbf{B})$ are connected by a directed path in $\mathcal{M}$/$\mathtt{M0}$.

## Example 3

We now instantiate new test models containing more nodes and more challenging mappings.

### Model definition

First, we define the models and the abstraction.

In [21]:
def generate_values(c,d):
    val = np.random.rand(c,d)
    return val / np.sum(val,axis=0)
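
As a quick sanity check, the sketch below confirms that *generate_values()* returns a column-stochastic matrix, which is what *pgmpy* expects for the `values` of a CPD. The helper is redefined so that the snippet is self-contained; saving and restoring the global random state is an addition of this example, meant to leave the values generated in the following cells unchanged.

```python
import numpy as np

def generate_values(c, d):
    # Random positive matrix with every column normalised to sum to 1
    val = np.random.rand(c, d)
    return val / np.sum(val, axis=0)

state = np.random.get_state()  # remember the global random state
vals = generate_values(6, 24)
np.random.set_state(state)     # restore it, so later cells are unaffected

print(vals.shape)                          # (6, 24)
print(np.allclose(vals.sum(axis=0), 1.0))  # True
```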

In [22]:
M0 = BN([('A','C'), ('E','B'), ('E','C'), ('B','C'), ('C','D'), ('C','F')])

cpdA = cpd(variable='A',
          variable_card=3,
          values=generate_values(3,1),
          evidence=None,
          evidence_card=None)

cpdE = cpd(variable='E',
          variable_card=2,
          values=generate_values(2,1),
          evidence=None,
          evidence_card=None)

cpdB = cpd(variable='B',
          variable_card=4,
          values=generate_values(4,2),
          evidence=['E'],
          evidence_card=[2])

cpdC = cpd(variable='C',
          variable_card=6,
          values=generate_values(6,24),
          evidence=['A','B','E'],
          evidence_card=[3,4,2])

cpdD = cpd(variable='D',
          variable_card=3,
          values=generate_values(3,6),
          evidence=['C'],
          evidence_card=[6])

cpdF = cpd(variable='F',
          variable_card=2,
          values=generate_values(2,6),
          evidence=['C'],
          evidence_card=[6])

M0.add_cpds(cpdA,cpdB,cpdC,cpdD,cpdE,cpdF)
M0.check_model()

True

In [23]:
M1 = BN([('X','Y'), ('Y','Z'), ('Y','W'), ('X','Z')])

cpdX = cpd(variable='X',
          variable_card=3,
          values=generate_values(3,1),
          evidence=None,
          evidence_card=None)

cpdY = cpd(variable='Y',
          variable_card=4,
          values=generate_values(4,3),
          evidence=['X'],
          evidence_card=[3])

cpdZ = cpd(variable='Z',
          variable_card=2,
          values=generate_values(2,12),
          evidence=['Y','X'],
          evidence_card=[4,3])

cpdW = cpd(variable='W',
          variable_card=2,
          values=generate_values(2,4),
          evidence=['Y'],
          evidence_card=[4])

M1.add_cpds(cpdX,cpdY,cpdZ,cpdW)
M1.check_model()

True

In [24]:
R = ['A','B', 'C', 'D', 'F']

a = {'A': 'X',
     'B': 'X',
     'C': 'Y',
     'D': 'Z',
     'F': 'W'}
alphas = {'X': generate_values(3,12),
         'Y': generate_values(4,6),
         'Z': generate_values(2,3),
         'W': generate_values(2,2)}

In [25]:
A = Abstraction(M0,M1,R,a,alphas)

### Enumeration of all possible pairs in M1 with a directed path connection in M1 or M0

We now populate our evaluation set $\mathcal{J}$ computing all possible pairs (source,target) along directed paths in the abstracted model $\mathcal{M'}/\mathtt{M1}$ or in the base model $\mathcal{M}/\mathtt{M0}$. 

We start by implementing a helper function *check_path_between_sets()* to check the existence of a path between two sets of nodes $\mathbf{A}$ and $\mathbf{B}$. This function creates an artificial parent $A'$ of all the nodes in $\mathbf{A}$ and an artificial child $B'$ of all the nodes in $\mathbf{B}$, and then calls the *networkx* function *has_path()* between $A'$ and $B'$.

In [26]:
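def check_path_between_sets(G,sources,targets):
    # Work on a copy so that the original graph is left untouched
    augmentedG = G.copy()

    # Create an artificial parent of all sources and an artificial child of all targets
    augmented_s = 'augmented_s_'+str(np.random.randint(10**6))
    augmented_t = 'augmented_t_'+str(np.random.randint(10**6))
    augmentedG.add_node(augmented_s)
    augmentedG.add_node(augmented_t)

    for s in sources:
        augmentedG.add_edge(augmented_s,s)
    for t in targets:
        augmentedG.add_edge(t,augmented_t)

    # A path between the two artificial nodes exists iff some source reaches some target
    return nx.has_path(augmentedG,augmented_s,augmented_t)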

We can now populate $\mathcal{J}$. To verify whether there is a path in $\mathcal{M'}/\mathtt{M1}$ between nodes $X$ and $Y$, we call the *networkx* function *has_path()* as before. To verify whether there is a path in $\mathcal{M}/\mathtt{M0}$ between the sets of nodes $\mathbf{A}$ and $\mathbf{B}$, we rely on our helper function.

In [27]:
J = []

sources = list(A.M1.nodes())
targets = list(A.M1.nodes())

for s in sources:
    for t in list(set(targets)-{s}):
        if nx.has_path(A.M1,s,t):
            J.append((s,t))
        else:
            M0_sources = A.invert_a(s)
            M0_targets = A.invert_a(t)            
            if check_path_between_sets(A.M0,M0_sources,M0_targets):
                J.append((s,t))

In [28]:
print(J)

[('X', 'W'), ('X', 'Y'), ('X', 'Z'), ('Y', 'W'), ('Y', 'Z')]
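
The same evaluation set can be reproduced on the bare graph structure, without *pgmpy* or the *Abstraction* class. The sketch below is a variant under stated assumptions: it rebuilds the two graphs of this example as plain *networkx* `DiGraph` objects, inverts the mapping with a hypothetical `preimage` helper, and uses sentinel objects instead of randomly named nodes for the artificial parent and child.

```python
import itertools
import networkx as nx

# Graph structure of Example 3, rebuilt as plain networkx graphs
G0 = nx.DiGraph([('A', 'C'), ('E', 'B'), ('E', 'C'),
                 ('B', 'C'), ('C', 'D'), ('C', 'F')])
G1 = nx.DiGraph([('X', 'Y'), ('Y', 'Z'), ('Y', 'W'), ('X', 'Z')])
a = {'A': 'X', 'B': 'X', 'C': 'Y', 'D': 'Z', 'F': 'W'}

def preimage(a, x):
    """Base-model nodes mapped by `a` onto the abstracted node `x`."""
    return [v for v, image in a.items() if image == x]

def path_between_sets(G, sources, targets):
    """True if some node in `sources` reaches some node in `targets`."""
    H = G.copy()
    # Sentinel objects cannot clash with existing node names
    super_source, super_target = object(), object()
    H.add_edges_from((super_source, s) for s in sources)
    H.add_edges_from((t, super_target) for t in targets)
    return nx.has_path(H, super_source, super_target)

# Keep a pair if it is connected in M1 or its preimages are connected in M0
J = [(s, t) for s, t in itertools.permutations(G1.nodes(), 2)
     if nx.has_path(G1, s, t)
     or path_between_sets(G0, preimage(a, s), preimage(a, t))]
print(sorted(J))
# [('X', 'W'), ('X', 'Y'), ('X', 'Z'), ('Y', 'W'), ('Y', 'Z')]
```

In this example every pair comes from a directed path in the abstracted graph; the check on the base graph adds no further pair.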


### Running the computational loop

In [29]:
abstraction_errors = []

for pair in J:
    # Get nodes in the abstracted model
    M1_source = [pair[0]]
    M1_target = [pair[1]]
    print('\nM1: {0} -> {1}'.format(M1_source,M1_target))

    # Get nodes in the base model
    M0_source = A.invert_a(M1_source)
    M0_target = A.invert_a(M1_target)
    print('M0: {0} -> {1}'.format(M0_source,M0_target))
    
    # Perform interventions in the abstracted model and setup the inference engine
    M1do = A.M1.do(M1_source)
    inferM1 = VariableElimination(M1do)
    
    # Perform interventions in the base model and setup the inference engine
    M0do = A.M0.do(M0_source)
    inferM0 = VariableElimination(M0do)
    
    # Evaluate the mechanism in the abstracted model
    M1_joint_TS = inferM1.query(M1_target+M1_source,show_progress=False)
    M1_joint_S = inferM1.query(M1_source,show_progress=False)
    M1_cond_TS = M1_joint_TS / M1_joint_S
    
    # Extract the matrix
    M1_cond_TS_val = M1_cond_TS.values

    # Check ordering
    if (M1_cond_TS.variables[0] != M1_target[0]):
        M1_cond_TS_val = M1_cond_TS_val.T

    # Evaluate the mechanism in the base model
    M0_joint_TS = inferM0.query(M0_target+M0_source,show_progress=False)
    M0_joint_S = inferM0.query(M0_source,show_progress=False)
    M0_cond_TS = M0_joint_TS / M0_joint_S
    
    # Extract the matrix
    M0_cond_TS_val = M0_cond_TS.values

    # Reorder the matrix
    old_indexes = range(len(M0_target+M0_source))
    new_indexes = [(M0_target+M0_source).index(i) for i in M0_joint_TS.variables]
    M0_cond_TS_val = np.moveaxis(M0_cond_TS_val, old_indexes, new_indexes)

    # Compact the matrix
    M0_target_cards=[A.M0.get_cardinality(t) for t in M0_target]
    M0_target_card = np.prod(M0_target_cards)
    M0_source_cards=[A.M0.get_cardinality(s) for s in M0_source]
    M0_source_card = np.prod(M0_source_cards)
    M0_cond_TS_val = M0_cond_TS_val.reshape(M0_target_card,M0_source_card)

    # Extract the alphas
    alpha_S = A.alphas[M1_source[0]]
    alpha_T = A.alphas[M1_target[0]]
    
    # Evaluate the paths on the diagram
    lowerpath = np.dot(M1_cond_TS_val,alpha_S)
    upperpath = np.dot(alpha_T,M0_cond_TS_val)
    
    # Compute abstraction error for every possible intervention
    distances = []
    for c in range(lowerpath.shape[1]):
        distances.append( distance.jensenshannon(lowerpath[:,c],upperpath[:,c]) )
    print('All JS distances: {0}'.format(distances))
    
    # Select the greatest distance over all interventions
    print('\nAbstraction error: {0}'.format(np.max(distances)))
    abstraction_errors.append(np.max(distances))

# Select the greatest distance over all pairs considered
print('\n\nOVERALL ABSTRACTION ERROR: {0}'.format(np.max(abstraction_errors)))


M1: ['X'] -> ['W']
M0: ['A', 'B'] -> ['F']
All JS distances: [0.13815224611327184, 0.1273757964993583, 0.08645541414919178, 0.0645651747041587, 0.10819276288057657, 0.11061149338691058, 0.14576258686126095, 0.08799726080882544, 0.10012181605278463, 0.12471385502651094, 0.058890191373457385, 0.10853268107410888]

Abstraction error: 0.14576258686126095

M1: ['X'] -> ['Y']
M0: ['A', 'B'] -> ['C']
All JS distances: [0.21778220890672814, 0.2321001717790131, 0.25722144019571525, 0.28137584862863363, 0.24263223583668658, 0.21972692121334755, 0.26939635221337843, 0.2202617176329589, 0.2064007881979404, 0.25324368242404216, 0.23664608275648263, 0.23403874351638465]

Abstraction error: 0.28137584862863363

M1: ['X'] -> ['Z']
M0: ['A', 'B'] -> ['D']
All JS distances: [0.10117802163067492, 0.11374832780232973, 0.11946917732263437, 0.12922938441951445, 0.11490721406962395, 0.10766192909711927, 0.1142860727299889, 0.1084969285754312, 0.10358194553494722, 0.11202183352132038, 0.1016135164495766, 0.1

# Evaluating abstraction considering sets of nodes potentially having a causal relation

The actual definition of abstraction error requires considering not only all pairs of nodes $(X,Y)$ in the abstracted model $\mathcal{M}'$/$\mathtt{M1}$, but every pair of disjoint sets of variables $(\mathbf{X},\mathbf{Y})$ in $\mathcal{M}'$/$\mathtt{M1}$. Here, we will populate the evaluation set $\mathcal{J}$ with all the pairs of sets $(\mathbf{X},\mathbf{Y})$ that are connected by a directed path in $\mathcal{M}'$/$\mathtt{M1}$ or whose counterimages $(\mathbf{A},\mathbf{B})$ are connected by a directed path in $\mathcal{M}$/$\mathtt{M0}$.

## Example 4

We instantiate the same models as in Example 3.

### Model definition

We redefine the models and the abstraction.

In [33]:
M0 = BN([('A','C'), ('E','B'), ('E','C'), ('B','C'), ('C','D'), ('C','F')])

cpdA = cpd(variable='A',
          variable_card=3,
          values=generate_values(3,1),
          evidence=None,
          evidence_card=None)

cpdE = cpd(variable='E',
          variable_card=2,
          values=generate_values(2,1),
          evidence=None,
          evidence_card=None)

cpdB = cpd(variable='B',
          variable_card=4,
          values=generate_values(4,2),
          evidence=['E'],
          evidence_card=[2])

cpdC = cpd(variable='C',
          variable_card=6,
          values=generate_values(6,24),
          evidence=['A','B','E'],
          evidence_card=[3,4,2])

cpdD = cpd(variable='D',
          variable_card=3,
          values=generate_values(3,6),
          evidence=['C'],
          evidence_card=[6])

cpdF = cpd(variable='F',
          variable_card=2,
          values=generate_values(2,6),
          evidence=['C'],
          evidence_card=[6])

M0.add_cpds(cpdA,cpdB,cpdC,cpdD,cpdE,cpdF)
M0.check_model()

True

In [34]:
M1 = BN([('X','Y'), ('Y','Z'), ('Y','W'), ('X','Z')])

cpdX = cpd(variable='X',
          variable_card=3,
          values=generate_values(3,1),
          evidence=None,
          evidence_card=None)

cpdY = cpd(variable='Y',
          variable_card=4,
          values=generate_values(4,3),
          evidence=['X'],
          evidence_card=[3])

cpdZ = cpd(variable='Z',
          variable_card=2,
          values=generate_values(2,12),
          evidence=['Y','X'],
          evidence_card=[4,3])

cpdW = cpd(variable='W',
          variable_card=2,
          values=generate_values(2,4),
          evidence=['Y'],
          evidence_card=[4])

M1.add_cpds(cpdX,cpdY,cpdZ,cpdW)
M1.check_model()

True

In [35]:
R = ['A','B', 'C', 'D', 'F']

a = {'A': 'X',
     'B': 'X',
     'C': 'Y',
     'D': 'Z',
     'F': 'W'}
alphas = {'X': generate_values(3,12),
         'Y': generate_values(4,6),
         'Z': generate_values(2,3),
         'W': generate_values(2,2)}

In [36]:
A = Abstraction(M0,M1,R,a,alphas)

### Enumeration of all possible sets in M1 with directed path connections in M1 or M0

We now want to populate our evaluation set $\mathcal{J}$ considering not only pairs of nodes in M1, but all possible ways of generating two disjoint subsets out of $\mathcal{X}_\mathcal{M'}$.

To do so, we start by implementing a simple function *powerset()* to compute the powerset of $\mathcal{X}_\mathcal{M'}$, following the recipe provided in the *itertools* documentation.

In [37]:
def powerset(iterable):
    s = list(iterable)
    return itertools.chain.from_iterable(itertools.combinations(s, r) for r in range(len(s)+1))

We then populate the evaluation set $\mathcal{J}$ according to the following steps:
1. Consider a pair of sets from the powerset
2. Check whether the sets are disjoint:
    - If yes, move to step 3
    - If not, go back to step 1
3. Check if the two sets of nodes have a directed path in $\mathtt{M1}$:
    - If yes, add them to $\mathcal{J}$ and go back to step 1
    - If not, go to step 4
4. Find the corresponding sets in $\mathtt{M0}$ (notice that, since $a$ is a function, we already know these two sets will also be disjoint)
5. Check if the two sets of nodes have a directed path in $\mathtt{M0}$:
    - If yes, add them to $\mathcal{J}$ and go back to step 1
    - If not, go back to step 1

This procedure evaluates all possible pairings of elements in the powerset, and it selects only the right ones for the evaluation set $\mathcal{J}$.
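
To get a feeling for the size of the search space, the sketch below enumerates the candidate pairs that survive steps 1 and 2 for four nodes named as in $\mathtt{M1}$. It only uses the standard library; `disjoint_pairs` is a hypothetical helper introduced for illustration. With $n$ nodes there are $(2^n-1)^2$ ordered pairs of non-empty subsets, of which $3^n - 2\cdot 2^n + 1$ are disjoint, so the number of path checks grows exponentially with the size of the abstracted model.

```python
import itertools

def powerset(iterable):
    s = list(iterable)
    return itertools.chain.from_iterable(
        itertools.combinations(s, r) for r in range(len(s) + 1))

def disjoint_pairs(nodes):
    """Ordered pairs of non-empty, disjoint subsets of nodes."""
    subsets = [set(c) for c in powerset(nodes) if c]
    return [(a, b) for a in subsets for b in subsets if a.isdisjoint(b)]

nodes = ['X', 'Y', 'Z', 'W']
n = len(nodes)
pairs = disjoint_pairs(nodes)

# Disjoint pairs versus all ordered pairs of non-empty subsets
print(len(pairs), (2**n - 1)**2)   # 50 225
```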

In [38]:
J = []

sets = list(powerset(A.M1.nodes()))
sets.remove(())

for i in sets:
    for j in sets:
        M1_sources = list(i)
        M1_targets = list(j)
        if not(any(x in M1_sources for x in M1_targets)):            
            if check_path_between_sets(A.M1,M1_sources,M1_targets):
                print('- Checking {0} -> {1}: True'.format(M1_sources,M1_targets))
                J.append([M1_sources,M1_targets])
            else:
                print('- Checking {0} -> {1}: False'.format(M1_sources,M1_targets))
                M0_sources = A.invert_a(M1_sources)
                M0_targets = A.invert_a(M1_targets)
                if check_path_between_sets(A.M0,M0_sources,M0_targets):
                    print('---- Checking {0} -> {1}: True'.format(M0_sources,M0_targets))
                    J.append([M1_sources,M1_targets])
                else:
                    print('---- Checking {0} -> {1}: False'.format(M0_sources,M0_targets))
                        
print('\n {0} legitimate pairs of sets out of {1} possible pairs of sets'.format(len(J),len(sets)**2))

- Checking ['X'] -> ['Y']: True
- Checking ['X'] -> ['Z']: True
- Checking ['X'] -> ['W']: True
- Checking ['X'] -> ['Y', 'Z']: True
- Checking ['X'] -> ['Y', 'W']: True
- Checking ['X'] -> ['Z', 'W']: True
- Checking ['X'] -> ['Y', 'Z', 'W']: True
- Checking ['Y'] -> ['X']: False
---- Checking ['C'] -> ['A', 'B']: False
- Checking ['Y'] -> ['Z']: True
- Checking ['Y'] -> ['W']: True
- Checking ['Y'] -> ['X', 'Z']: True
- Checking ['Y'] -> ['X', 'W']: True
- Checking ['Y'] -> ['Z', 'W']: True
- Checking ['Y'] -> ['X', 'Z', 'W']: True
- Checking ['Z'] -> ['X']: False
---- Checking ['D'] -> ['A', 'B']: False
- Checking ['Z'] -> ['Y']: False
---- Checking ['D'] -> ['C']: False
- Checking ['Z'] -> ['W']: False
---- Checking ['D'] -> ['F']: False
- Checking ['Z'] -> ['X', 'Y']: False
---- Checking ['D'] -> ['A', 'B', 'C']: False
- Checking ['Z'] -> ['X', 'W']: False
---- Checking ['D'] -> ['A', 'B', 'F']: False
- Checking ['Z'] -> ['Y', 'W']: False
---- Checking ['D'] -> ['C', 'F']: False
-

In [39]:
print(J)

[[['X'], ['Y']], [['X'], ['Z']], [['X'], ['W']], [['X'], ['Y', 'Z']], [['X'], ['Y', 'W']], [['X'], ['Z', 'W']], [['X'], ['Y', 'Z', 'W']], [['Y'], ['Z']], [['Y'], ['W']], [['Y'], ['X', 'Z']], [['Y'], ['X', 'W']], [['Y'], ['Z', 'W']], [['Y'], ['X', 'Z', 'W']], [['X', 'Y'], ['Z']], [['X', 'Y'], ['W']], [['X', 'Y'], ['Z', 'W']], [['X', 'Z'], ['Y']], [['X', 'Z'], ['W']], [['X', 'Z'], ['Y', 'W']], [['X', 'W'], ['Y']], [['X', 'W'], ['Z']], [['X', 'W'], ['Y', 'Z']], [['Y', 'Z'], ['W']], [['Y', 'Z'], ['X', 'W']], [['Y', 'W'], ['Z']], [['Y', 'W'], ['X', 'Z']], [['X', 'Y', 'Z'], ['W']], [['X', 'Y', 'W'], ['Z']], [['X', 'Z', 'W'], ['Y']]]


### Running the computational loop

Running the computational loop is conceptually equivalent to what we have done so far, but we now have to deal with the fact that, in the high-level model too, we may have to consider sets of variables and not just individual nodes.

Instead of the diagram we have seen before:
$$
\begin{array}{ccc}
\mathcal{\mathcal{M}_{do}}\left[\mathbf{A}\right] & \overset{\mathcal{\mathcal{M}_{do}}\left[\phi_{\mathbf{B}}\right]}{\longrightarrow} & \mathcal{\mathcal{M}_{do}}\left[\mathbf{B}\right]\\
\sideset{}{\alpha_{X}}\downarrow &  & \sideset{}{\alpha_{Y}}\downarrow\\
\mathcal{\mathcal{M}_{do}}\left[X\right] & \overset{\mathcal{\mathcal{M}_{do}}\left[\phi_{Y}\right]}{\longrightarrow} & \mathcal{\mathcal{M}_{do}}\left[Y\right]
\end{array}
$$
we have to consider:
$$
\begin{array}{ccc}
\mathcal{\mathcal{M}_{do}}\left[\mathbf{A}\right] & \overset{\mathcal{\mathcal{M}_{do}}\left[\phi_{\mathbf{B}}\right]}{\longrightarrow} & \mathcal{\mathcal{M}_{do}}\left[\mathbf{B}\right]\\
\sideset{}{\alpha_{\mathbf{X}}}\downarrow &  & \sideset{}{\alpha_{\mathbf{Y}}}\downarrow\\
\mathcal{\mathcal{M}_{do}}\left[\mathbf{X}\right] & \overset{\mathcal{\mathcal{M}_{do}}\left[\phi_{\mathbf{Y}}\right]}{\longrightarrow} & \mathcal{\mathcal{M}_{do}}\left[\mathbf{Y}\right]
\end{array}
$$
where $X$ and $Y$ have now been typed in boldface, $\mathbf{X},\mathbf{Y}$, to denote that we are working with sets.

This requires us to assemble abstractions ($\alpha_{\mathbf{X}},\alpha_{\mathbf{Y}}$) and mechanisms ($\mathcal{M}[\phi_{\mathbf{Y}}]$) with the proper dimensionality. To do so we introduce two helper functions:
- *tensorize_list*: given a list of matrices $[X_1, X_2, X_3, ..., X_n]$ with dimensions $[(r_1,c_1),(r_2,c_2),(r_3,c_3),...,(r_n,c_n)]$, this function recursively computes the tensor product $X_1 \otimes X_2 \otimes X_3 \otimes ... \otimes X_n$ with dimension $(r_1\cdot r_2\cdot r_3\cdot ...\cdot r_n, c_1\cdot c_2\cdot c_3\cdot ...\cdot c_n)$.
- *tensorize_mechanisms*: this function wraps up the code we used previously to compute the mechanism in $\mathtt{M0}$. It receives the inference engine for the underlying graph, a source set of nodes, a target set of nodes, and a dictionary with the cardinalities of the nodes. It then uses the engine to evaluate the mechanism, and then it reorders and compacts the resulting matrix into the right shape.

In [40]:
def tensorize_list(tensor,l):
    if tensor is None:
        if len(l)>1:
            tensor = np.einsum('ij,kl->ikjl',l[0],l[1])
            tensor = tensor.reshape((tensor.shape[0]*tensor.shape[1],tensor.shape[2]*tensor.shape[3]))
            return tensorize_list(tensor,l[2:])
        else:
            return l[0]
    else:
        if len(l)>0:
            tensor = np.einsum('ij,kl->ikjl',tensor,l[0])
            tensor = tensor.reshape((tensor.shape[0]*tensor.shape[1],tensor.shape[2]*tensor.shape[3]))
            return tensorize_list(tensor,l[1:])
        else:
            return tensor
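
The einsum-and-reshape step used in *tensorize_list* computes the Kronecker product of two matrices, so, assuming only NumPy, the whole recursion can be cross-checked against `np.kron` folded over the list with `functools.reduce`. The helper name `kron_list` and the random test matrices are illustrative assumptions.

```python
import functools
import numpy as np

def kron_list(matrices):
    """Kronecker product of a non-empty list of 2D arrays, left to right."""
    return functools.reduce(np.kron, matrices)

rng = np.random.default_rng(0)
mats = [rng.random((3, 12)), rng.random((2, 3)), rng.random((2, 2))]

# Reference: the pairwise einsum + reshape used in tensorize_list
ref = mats[0]
for m in mats[1:]:
    t = np.einsum('ij,kl->ikjl', ref, m)
    ref = t.reshape(t.shape[0] * t.shape[1], t.shape[2] * t.shape[3])

out = kron_list(mats)
print(out.shape)               # (12, 72)
print(np.allclose(out, ref))   # True
```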

In [41]:
def tensorize_mechanisms(inference,sources,targets,cardinalities):
    
    # Evaluate the mechanism
    joint_TS = inference.query(targets+sources,show_progress=False)
    marginal_S = inference.query(sources,show_progress=False)
    cond_TS = joint_TS / marginal_S
    
    # Extract the matrix
    cond_TS_val = cond_TS.values

    # Reorder the matrix
    old_indexes = range(len(targets+sources))
    new_indexes = [(targets+sources).index(i) for i in joint_TS.variables]
    cond_TS_val = np.moveaxis(cond_TS_val, old_indexes, new_indexes)

    # Compact the matrix
    target_cards=[cardinalities[t] for t in targets]
    target_card = np.prod(target_cards)
    source_cards=[cardinalities[s] for s in sources]
    source_card = np.prod(source_cards)
    cond_TS_val = cond_TS_val.reshape(target_card,source_card)
    
    return cond_TS_val
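
The reorder-and-compact step is easy to get wrong, so here is a minimal NumPy sketch on a hand-built table, independent of *pgmpy*. It assumes a table whose axes come back in the order (S, T1, T2) while the desired order is (T1, T2, S); the variable names and cardinalities are made up for illustration.

```python
import numpy as np

cards = {'T1': 2, 'T2': 3, 'S': 4}
returned_order = ['S', 'T1', 'T2']   # axis order of the raw table (assumed)
wanted_order = ['T1', 'T2', 'S']     # targets first, then sources

# Raw table whose entries encode their own (s, t1, t2) coordinates
raw = np.zeros([cards[v] for v in returned_order])
for s in range(cards['S']):
    for t1 in range(cards['T1']):
        for t2 in range(cards['T2']):
            raw[s, t1, t2] = 100 * s + 10 * t1 + t2

# Move axis k of the raw table to the position its variable has in wanted_order
old_indexes = list(range(len(returned_order)))
new_indexes = [wanted_order.index(v) for v in returned_order]
ordered = np.moveaxis(raw, old_indexes, new_indexes)

# Rows index joint target values (T1 major, T2 minor), columns index S
matrix = ordered.reshape(cards['T1'] * cards['T2'], cards['S'])

print(ordered.shape)          # (2, 3, 4)
print(matrix[1 * 3 + 2, 3])   # 312.0, i.e. s=3, t1=1, t2=2
```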

We can then run the actual computational loop. Its overall organization is the same, although we rely on the functions defined above to make it more compact:

1. Given a pair from the evaluation set $\mathcal{J}$, extract the source nodes $\mathbf{X}$ and the target nodes $\mathbf{Y}$ in the abstracted model $\mathcal{M'}/\mathtt{M1}$.
2. Using the mapping $a$, retrieve the corresponding source set $\mathbf{A}$ and target set $\mathbf{B}$ in the base model $\mathcal{M}/\mathtt{M0}$.
3. Compute the intervention $do(\mathbf{X})$ in the abstracted model $\mathcal{M'}/\mathtt{M1}$ and initialize the *pgmpy* inference engine on this intervened model.
4. Compute the intervention $do(\mathbf{A})$ in the base model $\mathcal{M}/\mathtt{M0}$ and initialize the *pgmpy* inference engine on this intervened model.
5. Compute the mechanism $\mathcal{M'}_{do}[\phi_\mathbf{Y}]$ relying on *tensorize_mechanisms()*.
6. Compute the mechanism $\mathcal{M}_{do}[\phi_\mathbf{B}]$ relying on *tensorize_mechanisms()*.
7. Compute the matrix for $\alpha_\mathbf{X}$ relying on *tensorize_list()*.
8. Compute the matrix for $\alpha_\mathbf{Y}$ relying on *tensorize_list()*.
9. Compute the two alternative paths on the diagram by composing $\mathcal{M'}_{do}[\phi_\mathbf{Y}] \circ \alpha_\mathbf{X}$ (lower path) and $\alpha_\mathbf{Y} \circ \mathcal{M}_{do}[\phi_\mathbf{B}]$ (upper path) via a simple matrix product.
10. For every possible intervention on $\mathbf{A}$ compute the JS distance between the two paths.
11. Select the highest distance with respect to all the interventions as the error $E_\alpha(\mathbf{Y},\mathbf{X})$.
12. Select the highest distance with respect to all pairs $(\mathbf{X},\mathbf{Y})$ as the error $e(\alpha)$.
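
Steps 9 to 11 can be tried out on a toy diagram small enough to check by hand. The sketch below is self-contained and uses a plain NumPy implementation of the Jensen-Shannon distance (the square root of the divergence, with natural logarithm), which is what `scipy.spatial.distance.jensenshannon` computes with its default settings. All matrices are made-up column-stochastic examples, not taken from the models above, and the function names are hypothetical.

```python
import numpy as np

def js_distance(p, q):
    """Jensen-Shannon distance with natural logarithm."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0                  # 0 * log(0) is taken to be 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))
    return np.sqrt(max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0))

alpha_S = np.array([[1., 1., 0.],     # 3 low-level values -> 2 high-level values
                    [0., 0., 1.]])
alpha_T = np.eye(2)
M1_mech = np.array([[0.9, 0.2],
                    [0.1, 0.8]])
M0_mech = M1_mech @ alpha_S           # built so that the diagram commutes

def abstraction_error(M1_mech, M0_mech, alpha_S, alpha_T):
    lowerpath = M1_mech @ alpha_S
    upperpath = alpha_T @ M0_mech
    # One column per low-level intervention; keep the worst case
    return max(js_distance(lowerpath[:, c], upperpath[:, c])
               for c in range(lowerpath.shape[1]))

exact = abstraction_error(M1_mech, M0_mech, alpha_S, alpha_T)

M0_perturbed = M0_mech.copy()
M0_perturbed[:, 1] = [0.5, 0.5]       # break commutativity for one intervention
perturbed = abstraction_error(M1_mech, M0_perturbed, alpha_S, alpha_T)

print(exact, perturbed)
```

When the diagram commutes exactly the error vanishes; perturbing a single column of the low-level mechanism is enough to make the worst-case distance strictly positive.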

In [42]:
abstraction_errors = []

for pair in J:
    # Get nodes in the abstracted model
    M1_sources = pair[0]
    M1_targets = pair[1]
    print('\nM1: {0} -> {1}'.format(M1_sources,M1_targets))

    # Get nodes in the base model
    M0_sources = A.invert_a(M1_sources)
    M0_targets = A.invert_a(M1_targets)
    print('M0: {0} -> {1}'.format(M0_sources,M0_targets))
    
    # Perform interventions in the abstracted model and setup the inference engine
    M1do = A.M1.do(M1_sources)
    inferM1 = VariableElimination(M1do)
    
    # Perform interventions in the base model and setup the inference engine
    M0do = A.M0.do(M0_sources)
    inferM0 = VariableElimination(M0do)

    # Compute the high-level mechanisms
    M1_cond_TS_val = tensorize_mechanisms(inferM1,M1_sources,M1_targets,A.M1.get_cardinality())
    print('M1 mechanism shape: {}'.format(M1_cond_TS_val.shape))
        
    # Compute the low-level mechanisms
    M0_cond_TS_val = tensorize_mechanisms(inferM0,M0_sources,M0_targets,A.M0.get_cardinality())
    print('M0 mechanism shape: {}'.format(M0_cond_TS_val.shape))

    # Compute the alpha for sources
    alphas_S = [A.alphas[i] for i in M1_sources]
    alpha_S = tensorize_list(None,alphas_S)
    print('Alpha_s shape: {}'.format(alpha_S.shape))
    
    # Compute the alpha for targets
    alphas_T = [A.alphas[i] for i in M1_targets]
    alpha_T = tensorize_list(None,alphas_T)
    print('Alpha_t shape: {}'.format(alpha_T.shape))
        
    # Evaluate the paths on the diagram
    lowerpath = np.dot(M1_cond_TS_val,alpha_S)
    upperpath = np.dot(alpha_T,M0_cond_TS_val)
    
    # Compute abstraction error for every possible intervention
    distances = []
    for c in range(lowerpath.shape[1]):
        distances.append( distance.jensenshannon(lowerpath[:,c],upperpath[:,c]) )
    print('All JS distances: {0}'.format(distances))
    
    # Select the greatest distance over all interventions
    print('\nAbstraction error: {0}'.format(np.max(distances)))
    abstraction_errors.append(np.max(distances))

# Select the greatest distance over all pairs considered
print('\n\nOVERALL ABSTRACTION ERROR: {0}'.format(np.max(abstraction_errors)))


M1: ['X'] -> ['Y']
M0: ['A', 'B'] -> ['C']
M1 mechanism shape: (4, 3)
M0 mechanism shape: (6, 12)
Alpha_s shape: (3, 12)
Alpha_t shape: (4, 6)
All JS distances: [0.05017846621305217, 0.2226678030590416, 0.08245879643972502, 0.16556786637950252, 0.10628279987749047, 0.07184436262930498, 0.08806573719033846, 0.14878139029330179, 0.12833048669046532, 0.10399805419625928, 0.2237460281190651, 0.13682728601865066]

Abstraction error: 0.2237460281190651

M1: ['X'] -> ['Z']
M0: ['A', 'B'] -> ['D']
M1 mechanism shape: (2, 3)
M0 mechanism shape: (3, 12)
Alpha_s shape: (3, 12)
Alpha_t shape: (2, 3)
All JS distances: [0.17973554196489633, 0.18062587122732615, 0.17113531454186756, 0.18412866147913767, 0.17479492898271867, 0.1688380350378461, 0.1761389213098498, 0.1890012351491756, 0.17731787499519716, 0.1740191669293788, 0.17998418963339513, 0.161252862620544]

Abstraction error: 0.1890012351491756

M1: ['X'] -> ['W']
M0: ['A', 'B'] -> ['F']
M1 mechanism shape: (2, 3)
M0 mechanism shape: (2, 12)
A

M0 mechanism shape: (6, 36)
Alpha_s shape: (6, 36)
Alpha_t shape: (4, 6)
All JS distances: [0.05017846621305232, 0.05017846621305232, 0.050178466213052456, 0.22266780305904155, 0.2226678030590415, 0.2226678030590414, 0.08245879643972501, 0.08245879643972502, 0.08245879643972488, 0.16556786637950252, 0.1655678663795025, 0.16556786637950238, 0.10628279987749015, 0.10628279987749054, 0.10628279987749041, 0.07184436262930527, 0.07184436262930567, 0.07184436262930546, 0.08806573719033861, 0.08806573719033883, 0.0880657371903387, 0.14878139029330179, 0.1487813902933017, 0.1487813902933017, 0.1283304866904654, 0.12833048669046554, 0.12833048669046526, 0.10399805419625922, 0.10399805419625915, 0.10399805419625917, 0.22374602811906508, 0.2237460281190651, 0.223746028119065, 0.1368272860186509, 0.13682728601865093, 0.13682728601865052]

Abstraction error: 0.2237460281190651

M1: ['X', 'Z'] -> ['W']
M0: ['A', 'B', 'D'] -> ['F']
M1 mechanism shape: (2, 6)
M0 mechanism shape: (2, 36)
Alpha_s shape:

# Automating the evaluation of abstraction error

To finalize the process of automation we package and move our code inside the class *Abstraction* in *src/pgm_models.py*. We define a function *evaluate_abstraction_error()* which, besides a verbosity parameter, takes two key parameters:

- *metric*: a function defining how to compute the distance between interventional distributions. If no function is passed, the algorithm falls back on JSD.
- *J_algorithm*: a function used to compute the evaluation set $\mathcal{J}$. If no function is passed, the algorithm falls back on the last algorithm we defined in this notebook (enumerating all pairs of disjoint sets for which there is a directed path in the base or in the abstracted model).

These two parameters are key components for evaluating abstraction error, determining the quality and the computational cost of the process. In particular, the *J_algorithm* determines the cardinality of the set of pairs to evaluate and therefore has a direct impact on the complexity of the algorithm. Heuristics may be used to reduce this set and allow an efficient computation.
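
To illustrate how a pluggable metric fits into the max-over-pairs, max-over-interventions structure, here is a hedged, dependency-free sketch. It is not the code in *src/pgm_models.py*: the function `evaluate_error`, its arguments, and the toy data are hypothetical, and total variation is used only as an example of an alternative metric.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def evaluate_error(paths_by_pair, metric):
    """paths_by_pair maps a pair label to (lower_columns, upper_columns),
    each a list of distributions, one per intervention (hypothetical layout)."""
    errors = {}
    for pair, (lower, upper) in paths_by_pair.items():
        # Worst case over interventions for this pair
        errors[pair] = max(metric(l, u) for l, u in zip(lower, upper))
    # Worst case over all pairs in the evaluation set
    return max(errors.values()), errors

toy_paths = {
    ('X', 'Y'): ([[0.9, 0.1], [0.2, 0.8]], [[0.8, 0.2], [0.2, 0.8]]),
    ('Y', 'Z'): ([[0.5, 0.5]], [[0.25, 0.75]]),
}

overall, per_pair = evaluate_error(toy_paths, total_variation)
print(overall, per_pair)
```

Swapping the metric only changes the inner call, while a different way of building the evaluation set only changes which keys appear in the dictionary.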

## Example 5

We now look at a last example, in which we rely on our *Abstraction* object to run the whole procedure.

### Model definition

We will use the exact same model we implemented above.

### Running the computational loop

In [44]:
result = A.evaluate_abstraction_error(verbose=True)

- Checking ['X'] -> ['Y']: True
- Checking ['X'] -> ['Z']: True
- Checking ['X'] -> ['W']: True
- Checking ['X'] -> ['Y', 'Z']: True
- Checking ['X'] -> ['Y', 'W']: True
- Checking ['X'] -> ['Z', 'W']: True
- Checking ['X'] -> ['Y', 'Z', 'W']: True
- Checking ['Y'] -> ['X']: False
---- Checking ['C'] -> ['A', 'B']: False
- Checking ['Y'] -> ['Z']: True
- Checking ['Y'] -> ['W']: True
- Checking ['Y'] -> ['X', 'Z']: True
- Checking ['Y'] -> ['X', 'W']: True
- Checking ['Y'] -> ['Z', 'W']: True
- Checking ['Y'] -> ['X', 'Z', 'W']: True
- Checking ['Z'] -> ['X']: False
---- Checking ['D'] -> ['A', 'B']: False
- Checking ['Z'] -> ['Y']: False
---- Checking ['D'] -> ['C']: False
- Checking ['Z'] -> ['W']: False
---- Checking ['D'] -> ['F']: False
- Checking ['Z'] -> ['X', 'Y']: False
---- Checking ['D'] -> ['A', 'B', 'C']: False
- Checking ['Z'] -> ['X', 'W']: False
---- Checking ['D'] -> ['A', 'B', 'F']: False
- Checking ['Z'] -> ['Y', 'W']: False
---- Checking ['D'] -> ['C', 'F']: False
-

All JS distances: [0.1901869294351166, 0.047124920170917795, 0.04855018540329598, 0.13414314983303763, 0.07137371506892483, 0.10795279252387285, 0.19018692943511664, 0.047124920170917795, 0.048550185403295774, 0.13414314983303738, 0.07137371506892483, 0.1079527925238729, 0.19018692943511656, 0.047124920170917795, 0.04855018540329598, 0.13414314983303738, 0.07137371506892398, 0.10795279252387331, 0.1901869294351166, 0.04712492017091775, 0.04855018540329598, 0.13414314983303757, 0.07137371506892456, 0.10795279252387321, 0.19018692943511661, 0.047124920170917795, 0.04855018540329598, 0.13414314983303735, 0.07137371506892412, 0.10795279252387315, 0.19018692943511686, 0.047124920170917586, 0.04855018540329598, 0.13414314983303735, 0.07137371506892456, 0.10795279252387321, 0.1901869294351166, 0.047124920170917274, 0.0485501854032967, 0.13414314983303716, 0.07137371506892398, 0.10795279252387331, 0.1901869294351166, 0.04712492017091831, 0.04855018540329616, 0.13414314983303735, 0.071373715068

M1 mechanism shape: (2, 8)
M0 mechanism shape: (3, 12)
Alpha_s shape: (8, 12)
Alpha_t shape: (2, 3)
All JS distances: [0.11787180575991571, 0.11787180575991571, 0.12791449496918936, 0.12791449496918947, 0.17811988540828805, 0.17811988540828802, 0.10646528990897644, 0.10646528990897657, 0.17965020182681865, 0.17965020182681865, 0.19287036317289502, 0.19287036317289508]

Abstraction error: 0.19287036317289508

M1: ['Y', 'W'] -> ['X', 'Z']
M0: ['C', 'F'] -> ['A', 'B', 'D']
M1 mechanism shape: (6, 8)
M0 mechanism shape: (36, 12)
Alpha_s shape: (8, 12)
Alpha_t shape: (6, 36)
All JS distances: [0.24953416498432732, 0.2495341649843273, 0.25098596004884965, 0.2509859600488498, 0.2772890368650649, 0.2772890368650649, 0.24025198566930697, 0.2402519856693071, 0.27875221676790674, 0.27875221676790674, 0.2831247854340249, 0.2831247854340249]

Abstraction error: 0.2831247854340249

M1: ['X', 'Y', 'Z'] -> ['W']
M0: ['A', 'B', 'C', 'D'] -> ['F']
M1 mechanism shape: (2, 24)
M0 mechanism shape: (2, 216)

# Conclusion

In this notebook we have automated the procedure for computing the abstraction error between two models related by an abstraction. We have built on a series of examples, each time considering richer evaluation sets over which to compute the abstraction error. The final implementation considers all pairs of disjoint sets of variables in the high-level model that are connected by a directed path in either model, and evaluates the abstraction error over them. Meaningful and efficient computation of the abstraction error will depend on the selection of a representative and efficient evaluation set and of a meaningful and robust distance metric.

## Bibliography

[1] Rischel, Eigil Fjeldgren. "The Category Theory of Causal Models." (2020).

[2] Rubenstein, Paul K., et al. "Causal consistency of structural equation models." arXiv preprint arXiv:1707.00819 (2017).

[3] Pearl, Judea. Causality. Cambridge university press, 2009.

[4] Peters, Jonas, Dominik Janzing, and Bernhard Schölkopf. Elements of causal inference: foundations and learning algorithms. The MIT Press, 2017.

[5] Spivak, David I. Category theory for the sciences. MIT Press, 2014.

[6] Fong, Brendan, and David I. Spivak. "Seven sketches in compositionality: An invitation to applied category theory." arXiv preprint arXiv:1803.05316 (2018).