Skip to content

Latest commit

 

History

History
297 lines (212 loc) · 6.8 KB

pk_direct.rst

File metadata and controls

297 lines (212 loc) · 6.8 KB

How to use prior knowledge in DirectLiNGAM

Import and settings

In this example, we need to import numpy, pandas, and graphviz in addition to lingam.

import numpy as np
import pandas as pd
import graphviz
import lingam
from lingam.utils import make_prior_knowledge, make_dot

print([np.__version__, pd.__version__, graphviz.__version__, lingam.__version__])

np.set_printoptions(precision=3, suppress=True)
np.random.seed(0)
['1.16.2', '0.24.2', '0.11.1', '1.5.2']

Utility function

We define a utility function to draw the directed acyclic graph.

def make_prior_knowledge_graph(prior_knowledge_matrix):
    d = graphviz.Digraph(engine='dot')

    labels = [f'x{i}' for i in range(prior_knowledge_matrix.shape[0])]
    for label in labels:
        d.node(label, label)

    dirs = np.where(prior_knowledge_matrix > 0)
    for to, from_ in zip(dirs[0], dirs[1]):
        d.edge(labels[from_], labels[to])

    dirs = np.where(prior_knowledge_matrix < 0)
    for to, from_ in zip(dirs[0], dirs[1]):
        if to != from_:
            d.edge(labels[from_], labels[to], style='dashed')
    return d

Test data

We create test data consisting of 6 variables.

x3 = np.random.uniform(size=10000)
x0 = 3.0*x3 + np.random.uniform(size=10000)
x2 = 6.0*x3 + np.random.uniform(size=10000)
x1 = 3.0*x0 + 2.0*x2 + np.random.uniform(size=10000)
x5 = 4.0*x0 + np.random.uniform(size=10000)
x4 = 8.0*x0 - 1.0*x2 + np.random.uniform(size=10000)
X = pd.DataFrame(np.array([x0, x1, x2, x3, x4, x5]).T ,columns=['x0', 'x1', 'x2', 'x3', 'x4', 'x5'])
X.head()
.dataframe { font-family: verdana, arial, sans-serif; font-size: 11px; color: #333333; border-width: 1px; border-color: #B3B3B3; border-collapse: collapse; } .dataframe thead th { border-width: 1px; padding: 8px; border-style: solid; border-color: #B3B3B3; background-color: #B3B3B3; } .dataframe tbody th { border-width: 1px; padding: 8px; border-style: solid; border-color: #B3B3B3; } .dataframe tr:nth-child(even) th{ background-color: #EAEAEA; } .dataframe tr:nth-child(even) td{ background-color: #EAEAEA; } .dataframe td { border-width: 1px; padding: 8px; border-style: solid; border-color: #B3B3B3; background-color: #ffffff; }
x0 x1 x2 x3 x4 x5
0 2.394708 15.312359 3.685054 0.548814 15.780259 9.948090
1 2.325771 16.145216 4.332293 0.715189 14.335879 9.514409
2 2.197313 15.848718 4.539881 0.602763 14.027410 9.266158
3 1.672250 13.200354 3.675534 0.544883 10.421554 6.771233
4 1.282752 11.337503 3.486211 0.423655 7.533376 5.368668

m = np.array([[0.0, 0.0, 0.0, 3.0, 0.0, 0.0],
              [3.0, 0.0, 2.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 6.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
              [8.0, 0.0,-1.0, 0.0, 0.0, 0.0],
              [4.0, 0.0, 0.0, 0.0, 0.0, 0.0]])

make_dot(m)

Make Prior Knowledge Matrix

We create prior knowledge so that x0, x1 and x4 are sink variables.

The elements of prior knowledge matrix are defined as follows:

  • 0: {x}_{i} does not have a directed path to {x}_{j}
  • 1: {x}_{i} has a directed path to {x}_{j}
  • -1 : No prior knowledge is available to know if either of the two cases above (0 or 1) is true.
prior_knowledge = make_prior_knowledge(
    n_variables=6,
    sink_variables=[0, 1, 4],
)
print(prior_knowledge)
[[-1  0 -1 -1  0 -1]
 [ 0 -1 -1 -1  0 -1]
 [ 0  0 -1 -1  0 -1]
 [ 0  0 -1 -1  0 -1]
 [ 0  0 -1 -1 -1 -1]
 [ 0  0 -1 -1  0 -1]]
# Draw a graph of prior knowledge
make_prior_knowledge_graph(prior_knowledge)

Causal Discovery

To run causal discovery using prior knowledge, we create a DirectLiNGAM object with the prior knowledge matrix.

model = lingam.DirectLiNGAM(prior_knowledge=prior_knowledge)
model.fit(X)
print(model.causal_order_)
print(model.adjacency_matrix_)
[3, 2, 5, 0, 1, 4]
[[ 0.     0.     0.     0.178  0.     0.235]
 [ 0.     0.     2.01   0.45   0.     0.707]
 [ 0.     0.     0.     6.001  0.     0.   ]
 [ 0.     0.     0.     0.     0.     0.   ]
 [ 0.     0.    -0.757  0.     0.     1.879]
 [ 0.     0.     0.    12.017  0.     0.   ]]

We can see that x0, x1, and x4 are output as sink variables, as specified in the prior knowledge.

make_dot(model.adjacency_matrix_)

Next, let’s specify the prior knowledge so that x0 is an exogenous variable.

prior_knowledge = make_prior_knowledge(
    n_variables=6,
    exogenous_variables=[0],
)

model = lingam.DirectLiNGAM(prior_knowledge=prior_knowledge)
model.fit(X)

make_dot(model.adjacency_matrix_)