In [1]:
import json

# The DAG Module

The DAG module can be used to create and edit DAGs (Directed Acyclic Graphs) that reflect the causal relationships between attributes. The following examples walk through different features of the DAG module, with some tasks along the way to help you gain familiarity with it.

In [2]:
from causalvis import DAG

### DAG quick start

The easiest way of getting started is to create an empty DAG canvas. Custom attributes can be added using the `ADD NODE` button on the left. Along the top menu are the options: `Select/Move`, `Edit Links`, `Search`, `Download Image`, and `Download JSON`.

Task: 

1) **Add three attributes (nodes)** to the canvas.

- Click the `ADD NODE` button
- Enter a variable name (any name of your choice)
- Click `ADD`
- Repeat the steps above until the canvas has three nodes

2) **Position** each node.

- Click and drag to move nodes

3) **Connect** the nodes to express a causal relationship.

- Switch to `Edit Links` mode by clicking the connect icon next to the search box
- Click two nodes in succession to draw a directed edge between them
- Click an edge to remove it

Once you get familiar with the process, you can move on to the next task.

In [3]:
DAG()

DAG(component='DAG', props={'attributes': [], 'graph': None})

### DAG from attributes list

If the list of causal factors is known, the DAG can also be initialized by passing this list to the `attributes` prop.

Task: 

1) Edit the cell below to initialize the module with **five attributes.**

- Add an attribute name to the list `["A", "List", "of", "Variables"]`

In [4]:
dg = DAG(attributes=["A", "List", "of", "Variables", "Another"])
dg

DAG(component='DAG', props={'attributes': ['A', 'Another', 'List', 'of', 'Variables'], 'graph': None})

### DAG from file

If a JSON node-link file has already been created to capture the causal relationships between attributes, the contents of this file can be passed to the DAG module using the `graph` prop.
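As a rough illustration of what such a file might contain, the sketch below builds a tiny node-link dictionary and round-trips it through JSON. Only the `nodes` key and the per-node `x` and `y` fields appear in this notebook's output; the `links`, `source`, `target`, and `name` keys are assumptions made for the example and may not match the exact schema the module expects.

```python
import json

# Hypothetical two-node graph. "links", "source", "target", and "name" are
# assumed key names; only "nodes", "x", and "y" are visible in this notebook.
graph_sketch = {
    "nodes": [
        {"name": "absences", "x": 100.0, "y": 50.0},
        {"name": "G1", "x": 300.0, "y": 50.0},
    ],
    "links": [
        {"source": "absences", "target": "G1"},
    ],
}

# Serialize to text and parse it back, as json.load would do from a file.
text = json.dumps(graph_sketch, indent=2)
restored = json.loads(text)

node_names = [node["name"] for node in restored["nodes"]]
edge_pairs = [(link["source"], link["target"]) for link in restored["links"]]
print(node_names)  # ['absences', 'G1']
print(edge_pairs)  # [('absences', 'G1')]
```

Inspecting a file this way before passing it to the `graph` prop is a quick check that it parses and has the nodes you expect.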

In [5]:
with open('../../public/DAG.json', 'r') as d:
    graph = json.load(d)

In [6]:
from causalvis import DAG

dg2 = DAG(graph=graph)
dg2

DAG(component='DAG', props={'attributes': None, 'graph': {'nodes': [{'x': 728.2895763246523, 'y': 83.853498218…

Once a graph has been created (such as in the cell above), treatment and outcome variables can be set using the context menu for each attribute in the list on the left. To pull up the context menu, `Shift - Right Click` on the attribute name.

Setting the treatment and outcome variables will prompt an automatic highlighting of other attributes in the DAG to reflect their relationship to the treatment and outcome. The color legend can be seen on the bottom right.

Task:

1) Set `absences` as the **treatment** variable

- `Shift - Right Click` on `absences` in the menu on the left
- Select `Set as Treatment`
- The corresponding node in the DAG should change color

2) Set `G1` as the **outcome** variable

- `Shift - Right Click` on `G1` in the menu on the left
- Select `Set as Outcome`
- The corresponding node in the DAG should change color

3) **Save** the DAG as a PNG image

- Click the image icon in the top right corner

### Obtaining Controls

The lists of confounds, colliders, mediators, and prognostic variables can be downloaded using the `Download JSON` button (top right). They can also be read directly from the Python variable that holds the DAG (called `dg` here) using one of the following attributes:

- `dg.confounds`
- `dg.colliders`
- `dg.mediators`
- `dg.prognostics`
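To make these four roles concrete, here is a minimal sketch that classifies nodes from a plain edge list. It uses simplified ancestor and descendant rules chosen for illustration, which are not necessarily the rules Causalvis applies, and the edges are hypothetical rather than taken from `DAG.json`. The helper names `descendants`, `ancestors`, and `classify` are also invented for this example.

```python
from collections import defaultdict


def descendants(edges, start):
    """Return every node reachable from `start` by following directed edges."""
    children = defaultdict(set)
    for src, dst in edges:
        children[src].add(dst)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def ancestors(edges, start):
    """Return every node that can reach `start`, by searching reversed edges."""
    return descendants([(dst, src) for src, dst in edges], start)


def classify(edges, treatment, outcome):
    """Assign each remaining node at most one role (simplified definitions)."""
    nodes = {n for edge in edges for n in edge} - {treatment, outcome}
    anc_t, anc_o = ancestors(edges, treatment), ancestors(edges, outcome)
    des_t, des_o = descendants(edges, treatment), descendants(edges, outcome)
    roles = {"confounds": [], "mediators": [], "colliders": [], "prognostics": []}
    for node in sorted(nodes):
        if node in anc_t and node in anc_o:
            roles["confounds"].append(node)    # causes both treatment and outcome
        elif node in des_t and node in anc_o:
            roles["mediators"].append(node)    # lies on a path treatment -> outcome
        elif node in des_t and node in des_o:
            roles["colliders"].append(node)    # caused by both treatment and outcome
        elif node in anc_o:
            roles["prognostics"].append(node)  # causes the outcome only
    return roles


# Hypothetical edges, for illustration only.
edges = [
    ("studytime", "absences"), ("studytime", "G1"),
    ("absences", "failures"), ("failures", "G1"),
    ("absences", "G2"), ("G1", "G2"),
    ("health", "G1"),
]
roles = classify(edges, treatment="absences", outcome="G1")
print(roles)
```

With these edges, `studytime` comes out as a confound, `failures` as a mediator, `G2` as a collider, and `health` as a prognostic variable.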

Task:

1) Get the **confounds and prognostic variables** of the DAG above.

- Edit the following cell to access the appropriate variables

In [7]:
dg2.prognostics

[]

### DAG from DataFrame

The DAG can also be initialized by passing in a pandas `DataFrame` using the `data` prop. In the following example, we load a dataset of student school performance. After dropping columns that contain sensitive demographic information, we pass the `DataFrame` to the DAG module.

This example is from [Causalnex](https://causalnex.readthedocs.io/en/latest/03_tutorial/01_first_tutorial.html).

In [4]:
import pandas as pd

data = pd.read_csv('./data/student-mat.csv', delimiter=';')
data.head(5)

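|   | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | ... | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GP | F | 18 | U | GT3 | A | 4 | 4 | at_home | teacher | ... | 4 | 3 | 4 | 1 | 1 | 3 | 6 | 5 | 6 | 6 |
| 1 | GP | F | 17 | U | GT3 | T | 1 | 1 | at_home | other | ... | 5 | 3 | 3 | 1 | 1 | 3 | 4 | 5 | 5 | 6 |
| 2 | GP | F | 15 | U | LE3 | T | 1 | 1 | at_home | other | ... | 4 | 3 | 2 | 2 | 3 | 3 | 10 | 7 | 8 | 10 |
| 3 | GP | F | 15 | U | GT3 | T | 4 | 2 | health | services | ... | 3 | 2 | 2 | 1 | 1 | 5 | 2 | 15 | 14 | 15 |
| 4 | GP | F | 16 | U | GT3 | T | 3 | 3 | other | other | ... | 4 | 3 | 2 | 1 | 2 | 5 | 4 | 6 | 10 | 10 |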


In [5]:
drop_col = ['school','sex','age','Mjob', 'Fjob','reason','guardian']
data = data.drop(columns=drop_col)
data.head(5)

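|   | address | famsize | Pstatus | Medu | Fedu | traveltime | studytime | failures | schoolsup | famsup | ... | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | U | GT3 | A | 4 | 4 | 2 | 2 | 0 | yes | no | ... | 4 | 3 | 4 | 1 | 1 | 3 | 6 | 5 | 6 | 6 |
| 1 | U | GT3 | T | 1 | 1 | 1 | 2 | 0 | no | yes | ... | 5 | 3 | 3 | 1 | 1 | 3 | 4 | 5 | 5 | 6 |
| 2 | U | LE3 | T | 1 | 1 | 1 | 2 | 3 | yes | no | ... | 4 | 3 | 2 | 2 | 3 | 3 | 10 | 7 | 8 | 10 |
| 3 | U | GT3 | T | 4 | 2 | 1 | 3 | 0 | no | yes | ... | 3 | 2 | 2 | 1 | 1 | 5 | 2 | 15 | 14 | 15 |
| 4 | U | GT3 | T | 3 | 3 | 1 | 2 | 0 | no | yes | ... | 4 | 3 | 2 | 1 | 2 | 5 | 4 | 6 | 10 | 10 |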


In [6]:
import numpy as np

struct_data = data.copy()
non_numeric_columns = list(struct_data.select_dtypes(exclude=[np.number]).columns)

print(non_numeric_columns)

['address', 'famsize', 'Pstatus', 'schoolsup', 'famsup', 'paid', 'activities', 'nursery', 'higher', 'internet', 'romantic']
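The next cell converts these text columns to integers with scikit-learn's `LabelEncoder`. As a rough picture of what that encoding does, the pure-Python sketch below maps each distinct value to its position in the sorted list of distinct values. The helper name `label_encode` is invented for this illustration, and the sample values mirror the first five `schoolsup` entries shown above.

```python
def label_encode(values):
    """Map each distinct value to its index in the sorted list of distinct values."""
    classes = sorted(set(values))
    index = {value: code for code, value in enumerate(classes)}
    return [index[v] for v in values], classes


# First five `schoolsup` values from the table above.
codes, classes = label_encode(["yes", "no", "yes", "no", "no"])
print(classes)  # ['no', 'yes']
print(codes)    # [1, 0, 1, 0, 0]
```

Because the codes follow sort order, they carry no causal or ordinal meaning for categories such as `address` or `famsize`; they only make the columns numeric.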


In [7]:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

for col in non_numeric_columns:
    struct_data[col] = le.fit_transform(struct_data[col])

struct_data.head(5)

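|   | address | famsize | Pstatus | Medu | Fedu | traveltime | studytime | failures | schoolsup | famsup | ... | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 4 | 4 | 2 | 2 | 0 | 1 | 0 | ... | 4 | 3 | 4 | 1 | 1 | 3 | 6 | 5 | 6 | 6 |
| 1 | 1 | 0 | 1 | 1 | 1 | 1 | 2 | 0 | 0 | 1 | ... | 5 | 3 | 3 | 1 | 1 | 3 | 4 | 5 | 5 | 6 |
| 2 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | 1 | 0 | ... | 4 | 3 | 2 | 2 | 3 | 3 | 10 | 7 | 8 | 10 |
| 3 | 1 | 0 | 1 | 4 | 2 | 1 | 3 | 0 | 0 | 1 | ... | 3 | 2 | 2 | 1 | 1 | 5 | 2 | 15 | 14 | 15 |
| 4 | 1 | 0 | 1 | 3 | 3 | 1 | 2 | 0 | 0 | 1 | ... | 4 | 3 | 2 | 1 | 2 | 5 | 4 | 6 | 10 | 10 |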


In [8]:
# Import DAG in this session; without it the call fails with a NameError.
from causalvis import DAG

DAG(data=struct_data)

### DAG from NetworkX and Causalnex

Causal discovery packages such as [Causalnex](https://causalnex.readthedocs.io/en/latest/index.html) can also be used with the Causalvis DAG module. If you do not have Causalnex on your machine, you can skip the following example.

Note that Causalnex outputs a [NetworkX](https://networkx.org/documentation/stable/index.html) graph, which the DAG module accepts through the `nx_graph` prop. Node positions are computed automatically, since Causalvis assumes that no x and y coordinates are provided. The prop is not limited to Causalnex outputs: any NetworkX graph can be passed to it.

In [9]:
import warnings
from causalnex.structure import StructureModel

warnings.filterwarnings("ignore")  # silence warnings

sm = StructureModel()

In [10]:
from causalnex.structure.notears import from_pandas

sm = from_pandas(struct_data)
sm.remove_edges_below_threshold(0.8)
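The thresholding step above prunes weak edges from the learned structure. The sketch below shows the idea on a plain dictionary, assuming an edge is dropped when the absolute value of its weight is strictly less than the threshold. The helper name `drop_weak_edges` and the weights are hypothetical and are not output from the model above.

```python
def drop_weak_edges(weighted_edges, threshold):
    """Keep only edges whose absolute weight is at least `threshold`."""
    return {
        edge: weight
        for edge, weight in weighted_edges.items()
        if abs(weight) >= threshold
    }


# Hypothetical learned weights, for illustration only.
weights = {
    ("absences", "G1"): -1.2,
    ("studytime", "G1"): 0.3,
    ("failures", "G1"): 0.8,
    ("health", "absences"): -0.05,
}

kept = drop_weak_edges(weights, threshold=0.8)
print(sorted(kept))  # [('absences', 'G1'), ('failures', 'G1')]
```

A higher threshold gives a sparser graph that is easier to read in the DAG module, at the cost of discarding weaker candidate relationships.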

In [12]:
from causalvis import DAG

DAG(nx_graph=sm)

DAG(component='DAG', props={'attributes': None, 'graph': {'nodes': [{'x': 0.4115067384123578, 'y': -0.03933052…