# Root Cause Analysis

In this notebook, we will explain how to use PyRCA for root cause analysis.

In [4]:
import networkx as nx
import numpy as np
import pyrca
from pyrca.analyzers.ht import HT, HTConfig



In [None]:
G = nx.DiGraph()



# Draw the graph
nx.draw(G, with_labels=True)

In [None]:
model = HT(config=HTConfig(graph=estimated_matrix))
model.train(normal_data_df)

results = model.find_root_causes(abnormal_data_df, "X1", True).to_list()
print(results)

In [None]:
# transform node names from 0 to N-1 to X1 to XN
no_of_var = data['meta']['parent_weights'].shape[0]
original_names = [i for i in range(no_of_var)]
node_names = [("X%d" % (i + 1)) for i in range(no_of_var)]
mapping = dict(zip(original_names, node_names))
G = nx.relabel_nodes(G, mapping)

# color the root cause nodes red and all other nodes blue
color_list = np.array(['blue', 'red'])
node_color_idx = list((data['meta']['root_causes'] != 0).astype(int))
node_color = color_list[node_color_idx]
print(f"The generated graph is directed and acyclic: {nx.is_directed_acyclic_graph(G)}")
nx.draw(G, with_labels=True, node_color=node_color)

In the graph, the two red nodes, X18 and X2, are the root causes.

In [None]:
import pandas as pd
true_matrix = pd.DataFrame(
    (data['meta']['parent_weights'] != 0).astype(int),
    columns=node_names,
    index=node_names,
)
true_matrix
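
To make the relationship between a weight matrix, its binary adjacency matrix, and the drawn graph concrete, here is a minimal sketch on a hypothetical 4-variable example. The toy `weights` array and the convention that a non-zero entry at row `i`, column `j` means an edge from node `i` to node `j` are assumptions made for illustration; check the orientation used by your own data before relying on it.

```python
import networkx as nx
import numpy as np
import pandas as pd

# Hypothetical weight matrix (not the notebook's data).
# ASSUMPTION: weights[i, j] != 0 encodes a directed edge i -> j.
weights = np.array([
    [0.0, 0.8, 0.0, 0.0],
    [0.0, 0.0, 1.2, 0.0],
    [0.0, 0.0, 0.0, -0.5],
    [0.0, 0.0, 0.0, 0.0],
])
names = [f"X{i + 1}" for i in range(weights.shape[0])]

# Binarize the weights into a labeled 0/1 adjacency matrix.
adjacency = pd.DataFrame((weights != 0).astype(int), index=names, columns=names)

# Build a directed graph whose nodes already carry the X1..XN labels.
toy_graph = nx.from_pandas_adjacency(adjacency, create_using=nx.DiGraph)
edges = sorted(toy_graph.edges())
is_dag = nx.is_directed_acyclic_graph(toy_graph)
print(edges, is_dag)
```

Building the graph from a labeled DataFrame avoids a separate relabeling step, because the row and column labels become the node names.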

We can estimate this causal graph from the normal data alone, here using the PC algorithm.

In [None]:
from pyrca.graphs.causal.pc import PC
import pandas as pd

# load data
training_samples = data['data']['num_samples']
tot_data = data['data']['data']
normal_data = tot_data[:training_samples]
normal_data_df = pd.DataFrame(normal_data, columns=node_names)
abnormal_data = tot_data[training_samples:]
abnormal_data_df = pd.DataFrame(abnormal_data, columns=node_names)

# train causal graph construction model
model = PC(PC.config_class())
estimated_matrix = model.train(normal_data_df)

In [None]:
estimated_matrix
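
The PC algorithm decides which edges to keep by running conditional independence tests on the data. The sketch below shows one common choice for continuous data, a Fisher z-test on partial correlations. It is an illustration of the building block only: the helper `fisher_z_pvalue` and the toy chain data are hypothetical, and the test that PyRCA's `PC` class actually runs may differ.

```python
import math

import numpy as np


def fisher_z_pvalue(data, i, j, cond):
    """p-value for H0: column i is independent of column j given columns `cond`."""
    cols = [i, j] + list(cond)
    corr = np.corrcoef(data[:, cols], rowvar=False)
    prec = np.linalg.inv(corr)
    # Partial correlation of i and j given `cond`, read off the precision matrix.
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = min(max(r, -0.999999), 0.999999)
    fisher_z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(data.shape[0] - len(cond) - 3) * abs(fisher_z)
    # Two-sided p-value under a standard normal: 2 * (1 - Phi(stat)).
    return math.erfc(stat / math.sqrt(2))


# Toy chain X1 -> X2 -> X3 (hypothetical data, not the notebook's dataset).
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
x3 = 0.8 * x2 + rng.normal(size=n)
chain = np.column_stack([x1, x2, x3])

p_marginal = fisher_z_pvalue(chain, 0, 2, [])      # X1 vs X3, no conditioning
p_conditional = fisher_z_pvalue(chain, 0, 2, [1])  # X1 vs X3 given X2
print(p_marginal, p_conditional)
```

In a chain, X1 and X3 are strongly dependent marginally but independent once X2 is conditioned on, which is the kind of evidence PC uses to remove the direct X1 to X3 edge.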

We can also evaluate the performance of the estimated graph using precision, recall, F1, and structural Hamming distance (SHD).

In [None]:
from pyrca.utils.evaluation import precision, recall, f1, shd

# Use distinct variable names so the imported functions are not overwritten
# and the cell can be re-run.
adj_prec = precision(true_matrix, estimated_matrix)
print(f"Precision: {adj_prec:.3f}")
adj_rec = recall(true_matrix, estimated_matrix)
print(f"Recall: {adj_rec:.3f}")
f1_score = f1(true_matrix, estimated_matrix)
print(f"F1: {f1_score:.3f}")
shd_result = shd(true_matrix, estimated_matrix)
print(f"SHD: {shd_result.get_shd()}")
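
To show what these metrics measure, the following sketch computes them by hand on a hypothetical 3-node example. It compares directed edges and counts a reversed edge as a single SHD error; both are assumptions of this sketch, and the functions in `pyrca.utils.evaluation` may use different conventions (for example, comparing undirected adjacencies). The helper `edge_metrics` is not part of PyRCA.

```python
import numpy as np
import pandas as pd


def edge_metrics(true_adj, est_adj):
    """Directed-edge precision, recall, F1, and SHD for two 0/1 adjacency matrices."""
    t = np.asarray(true_adj).astype(bool)
    e = np.asarray(est_adj).astype(bool)
    tp = int(np.sum(t & e))    # edges present in both graphs
    fp = int(np.sum(~t & e))   # edges only in the estimate
    fn = int(np.sum(t & ~e))   # edges only in the ground truth
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # SHD: number of node pairs whose edge status differs
    # (a missing, extra, or reversed edge each counts once).
    diff = t != e
    shd = int(np.sum(np.triu(diff | diff.T, k=1)))
    return {"precision": precision, "recall": recall, "f1": f1, "shd": shd}


names = ["X1", "X2", "X3"]
# Ground truth: X1 -> X2 -> X3.
truth = pd.DataFrame([[0, 1, 0], [0, 0, 1], [0, 0, 0]], index=names, columns=names)
# Estimate: X1 -> X2 (correct), X3 -> X2 (reversed), X1 -> X3 (extra).
estimate = pd.DataFrame([[0, 1, 1], [0, 0, 0], [0, 1, 0]], index=names, columns=names)

metrics = edge_metrics(truth, estimate)
print(metrics)
```

Here one of three estimated edges is correct (precision 1/3), one of two true edges is recovered (recall 1/2), and two node pairs disagree (SHD 2).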

If you are not satisfied with the quality of the estimated graph, you can add domain knowledge, such as required links, forbidden links, and root nodes, to improve it.
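
The sketch below illustrates what these three kinds of constraint mean when applied to an adjacency matrix. It is a post-processing illustration only: the helper `apply_domain_knowledge` is hypothetical and is not PyRCA's mechanism for supplying domain knowledge, and it assumes the convention that `adj.loc[parent, child] == 1` encodes an edge from parent to child.

```python
import pandas as pd


def apply_domain_knowledge(adj, required=(), forbidden=(), root_nodes=()):
    """Return a copy of `adj` with simple edge constraints enforced.

    ASSUMPTION: adj.loc[parent, child] == 1 means parent -> child.
    """
    out = adj.copy()
    for parent, child in forbidden:
        out.loc[parent, child] = 0   # forbidden link: drop the edge
    for parent, child in required:
        out.loc[parent, child] = 1   # required link: force the edge
        out.loc[child, parent] = 0   # and remove the opposite direction
    for node in root_nodes:
        out[node] = 0                # root node: no incoming edges
    return out


names = ["X1", "X2", "X3"]
estimated = pd.DataFrame(0, index=names, columns=names)
estimated.loc["X2", "X1"] = 1   # estimated in the wrong direction
estimated.loc["X3", "X2"] = 1   # known to be impossible
estimated.loc["X1", "X3"] = 1

constrained = apply_domain_knowledge(
    estimated,
    required=[("X1", "X2")],
    forbidden=[("X3", "X2")],
    root_nodes=["X1"],
)
print(constrained)
```

After the constraints are applied, the toy graph keeps only X1 -> X2 and X1 -> X3.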

Given the estimated graph, we build a root cause localization model using a hypothesis-testing algorithm. In the default setting, the model outputs the top-3 root cause nodes.

We are able to identify the root causes X18 and X2 using a hypothesis testing algorithm, despite the estimated graph not being entirely accurate.
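
The idea behind graph-based hypothesis testing can be sketched as follows: learn how each node depends on its parents from normal data, then check which node deviates from that learned behavior during the incident. This is a simplified illustration under stated assumptions (linear relationships, a known graph, and a mean absolute z-score as the anomaly score); the helper `rank_root_causes` is hypothetical and is not PyRCA's `HT` implementation.

```python
import numpy as np
import pandas as pd


def rank_root_causes(normal_df, abnormal_df, adj, top_k=3):
    """Rank nodes by how far their abnormal-period residuals drift from normal.

    ASSUMPTION: adj.loc[parent, child] != 0 means parent -> child.
    """
    scores = {}
    for node in adj.columns:
        parents = list(adj.index[(adj[node] != 0).to_numpy()])

        def design(df):
            # Intercept column plus one column per parent.
            return np.column_stack([np.ones(len(df))] + [df[p].to_numpy() for p in parents])

        # Fit node ~ parents on normal data only.
        coef, *_ = np.linalg.lstsq(design(normal_df), normal_df[node].to_numpy(), rcond=None)
        normal_resid = normal_df[node].to_numpy() - design(normal_df) @ coef
        abnormal_resid = abnormal_df[node].to_numpy() - design(abnormal_df) @ coef
        sigma = normal_resid.std() + 1e-12
        # Mean absolute z-score of the abnormal residuals.
        scores[node] = float(np.mean(np.abs(abnormal_resid)) / sigma)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Toy chain X1 -> X2 -> X3 with a fault injected at X2 (hypothetical data).
rng = np.random.default_rng(1)


def simulate(n, shift):
    x1 = rng.normal(size=n)
    x2 = 0.9 * x1 + rng.normal(size=n) + shift
    x3 = 0.9 * x2 + rng.normal(size=n)
    return pd.DataFrame({"X1": x1, "X2": x2, "X3": x3})


names = ["X1", "X2", "X3"]
graph = pd.DataFrame([[0, 1, 0], [0, 0, 1], [0, 0, 0]], index=names, columns=names)
ranking = rank_root_causes(simulate(500, 0.0), simulate(100, 5.0), graph)
print(ranking)
```

Although X3 also shifts during the incident, its value is still explained by its parent X2, so only X2 stands out once each node is judged conditionally on its parents. This is why a reasonable but imperfect graph can still be enough to locate the root cause.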