# Learning essential graphs

| | | |
|-|-|-|
|[ ![Creative Commons License](images/cc4.png)](http://creativecommons.org/licenses/by-nc/4.0/) |[ ![aGrUM](images/logoAgrum.png)](https://agrum.org) |[ ![interactive online version](images/atbinder.svg)](https://agrum.gitlab.io/extra/agrum_at_binder.html)

In [1]:
import matplotlib.pyplot as plt

import os

import pyAgrum as gum
import pyAgrum.lib.notebook as gnb



### Compare learning algorithms
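MIIC and 3off2 both learn the essential graph (CPDAG) directly from data. An essential graph is a PDAG (Partially Directed Acyclic Graph) that represents a whole Markov equivalence class of DAGs: an arc is directed only when every DAG in the class agrees on its orientation.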
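To make the link between a DAG and its essential graph concrete, here is a minimal pure-Python sketch. It is not pyAgrum's implementation. It assumes the textbook construction: keep the skeleton, orient the v-structures, then apply Meek's rules R1 to R3 until nothing changes. The function name `essential_graph` and its helpers are hypothetical, and the arc list is the standard Asia structure written with the column names of the CSV.

```python
from itertools import combinations


def essential_graph(nodes, dag_arcs):
    """Return (directed arcs, undirected edges) of the CPDAG of a DAG."""
    arcs = set(dag_arcs)

    def adjacent(a, b):
        return (a, b) in arcs or (b, a) in arcs

    parents = {n: {a for (a, b) in arcs if b == n} for n in nodes}

    # 1. Arcs of every v-structure a -> c <- b (a and b not adjacent) stay directed.
    directed = set()
    for c in nodes:
        for a, b in combinations(sorted(parents[c]), 2):
            if not adjacent(a, b):
                directed |= {(a, c), (b, c)}

    # 2. Every other adjacency starts as an undirected edge.
    undirected = {frozenset(arc) for arc in arcs if arc not in directed}

    def must_orient(x, y):
        """True if one of Meek's rules R1-R3 forces x - y to become x -> y."""
        into_x = {a for (a, b) in directed if b == x}
        into_y = {a for (a, b) in directed if b == y}
        out_x = {b for (a, b) in directed if a == x}
        nbr_x = {next(iter(e - {x})) for e in undirected if x in e}
        r1 = any(not adjacent(z, y) for z in into_x)  # z -> x - y, z and y not adjacent
        r2 = bool(out_x & into_y)                     # x -> z -> y
        r3 = any(not adjacent(z, w)                   # x - z -> y <- w - x, z and w not adjacent
                 for z, w in combinations(sorted(nbr_x & into_y), 2))
        return r1 or r2 or r3

    # 3. Propagate orientations until a fixed point is reached.
    changed = True
    while changed:
        changed = False
        for edge in list(undirected):
            x, y = tuple(edge)
            for src, dst in ((x, y), (y, x)):
                if must_orient(src, dst):
                    undirected.discard(edge)
                    directed.add((src, dst))
                    changed = True
                    break
    return directed, undirected


# Standard Asia structure (assumed here, not read from the CSV).
asia_arcs = [
    ("visit_to_Asia", "tuberculosis"),
    ("tuberculosis", "tuberculos_or_cancer"),
    ("lung_cancer", "tuberculos_or_cancer"),
    ("smoking", "lung_cancer"),
    ("smoking", "bronchitis"),
    ("bronchitis", "dyspnoea"),
    ("tuberculos_or_cancer", "dyspnoea"),
    ("tuberculos_or_cancer", "positive_XraY"),
]
asia_nodes = sorted({n for arc in asia_arcs for n in arc})

directed, undirected = essential_graph(asia_nodes, asia_arcs)
print("directed  :", sorted(directed))
print("undirected:", sorted(tuple(sorted(e)) for e in undirected))
```

For this structure, the three adjacencies that do not take part in a v-structure and are not forced by propagation (`visit_to_Asia`/`tuberculosis`, `smoking`/`lung_cancer`, `smoking`/`bronchitis`) stay undirected, while the other five arcs keep their orientation.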

In [2]:
learner=gum.BNLearner("res/sample_asia.csv")
learner.use3off2()
learner.useNMLCorrection()
print(learner)

Filename       : res/sample_asia.csv
Size           : (50000,8)
Variables      : visit_to_Asia[2], lung_cancer[2], tuberculosis[2], bronchitis[2], positive_XraY[2], smoking[2], tuberculos_or_cancer[2], dyspnoea[2]
Induced types  : True
Missing values : False
Algorithm      : 3off2
Score          : BDeu
Correction     : NML  (Not used for score-based algorithms)
Prior          : -



In [3]:
ge3off2=learner.learnEssentialGraph()
ge3off2

In [4]:
learner=gum.BNLearner("out/sample_asia.csv")
learner.useMIIC()
learner.useNMLCorrection()
print(learner)
gemiic=learner.learnEssentialGraph()
gemiic

Filename       : out/sample_asia.csv
Size           : (50000,8)
Variables      : visit_to_Asia[2], tuberculos_or_cancer[2], positive_XraY[2], dyspnoea[2], tuberculosis[2], bronchitis[2], lung_cancer[2], smoking[2]
Induced types  : True
Missing values : False
Algorithm      : MIIC
Score          : BDeu
Correction     : NML  (Not used for score-based algorithms)
Prior          : -



For the other (score-based) methods, the essential graph can be obtained from the learned BN.

In [5]:
learner=gum.BNLearner("out/sample_asia.csv")
learner.useGreedyHillClimbing()
bnHC=learner.learnBN()
print(learner)
geHC=gum.EssentialGraph(bnHC)
gnb.sideBySide(bnHC,geHC)

Filename       : out/sample_asia.csv
Size           : (50000,8)
Variables      : visit_to_Asia[2], tuberculos_or_cancer[2], positive_XraY[2], dyspnoea[2], tuberculosis[2], bronchitis[2], lung_cancer[2], smoking[2]
Induced types  : True
Missing values : False
Algorithm      : Greedy Hill Climbing
Score          : BDeu
Correction     : MDL  (Not used for score-based algorithms)
Prior          : -



[Figure: the learned Bayesian network `bnHC` (left) and its essential graph `geHC` (right), side by side.]
Arcs of `bnHC`: lung_cancer->tuberculos_or_cancer, lung_cancer->bronchitis, lung_cancer->smoking, tuberculos_or_cancer->dyspnoea, tuberculos_or_cancer->positive_XraY, bronchitis->dyspnoea, bronchitis->smoking, tuberculosis->tuberculos_or_cancer, tuberculosis->visit_to_Asia


In [6]:
learner=gum.BNLearner("out/sample_asia.csv")
learner.useLocalSearchWithTabuList()
print(learner)
bnTL=learner.learnBN()
geTL=gum.EssentialGraph(bnTL)
gnb.sideBySide(bnTL,geTL)

Filename       : out/sample_asia.csv
Size           : (50000,8)
Variables      : visit_to_Asia[2], tuberculos_or_cancer[2], positive_XraY[2], dyspnoea[2], tuberculosis[2], bronchitis[2], lung_cancer[2], smoking[2]
Induced types  : True
Missing values : False
Algorithm      : Local Search with Tabu List
Tabu list size : 2
Score          : BDeu
Correction     : MDL  (Not used for score-based algorithms)
Prior          : -



[Figure: the learned Bayesian network `bnTL` (left) and its essential graph `geTL` (right), side by side.]
Arcs of `bnTL`: lung_cancer->positive_XraY, lung_cancer->tuberculos_or_cancer, lung_cancer->smoking, tuberculos_or_cancer->dyspnoea, bronchitis->dyspnoea, tuberculosis->positive_XraY, tuberculosis->tuberculos_or_cancer, tuberculosis->visit_to_Asia, smoking->bronchitis


We can now compare the results of the four algorithms.

In [7]:
(
  gnb.flow.clear()
  .add(ge3off2,"Essential graph from 3off2")
  .add(gemiic,"Essential graph from MIIC")
  .add(bnHC,"BayesNet from GHC")
  .add(geHC,"Essential graph from GHC")
  .add(bnTL,"BayesNet from TabuList")
  .add(geTL,"Essential graph from TabuList")
  .display()
)
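A visual comparison can be complemented by a structural one. The sketch below is a hypothetical helper, not part of pyAgrum. It takes two partially directed graphs, each given as a list of arcs and a list of undirected edges over variable names, and reports the adjacencies present in only one graph and those whose orientation differs. In this notebook the inputs could presumably be built from the `arcs()` and `edges()` of each essential graph after translating node ids to variable names. That translation is an assumption here, and it matters because the learners above do not all list the variables in the same order. The example graphs are toy data.

```python
def compare_pdags(arcs_a, edges_a, arcs_b, edges_b):
    """Summarise how two partially directed graphs differ.

    arcs_*  : iterable of (tail, head) pairs (directed)
    edges_* : iterable of (node, node) pairs (undirected)
    """
    def skeleton(arcs, edges):
        # Adjacencies with orientation dropped.
        return {frozenset(pair) for pair in list(arcs) + list(edges)}

    def orientation(arcs, pair):
        # The arc covering this adjacency, or None if it is undirected.
        return next((arc for arc in arcs if frozenset(arc) == pair), None)

    arcs_a, arcs_b = list(arcs_a), list(arcs_b)
    skel_a, skel_b = skeleton(arcs_a, edges_a), skeleton(arcs_b, edges_b)
    differs = {pair for pair in skel_a & skel_b
               if orientation(arcs_a, pair) != orientation(arcs_b, pair)}
    return {
        "only_in_a": skel_a - skel_b,
        "only_in_b": skel_b - skel_a,
        "orientation_differs": differs,
    }


# Toy graphs (hypothetical data, not the graphs learned above).
report = compare_pdags(
    arcs_a=[("X", "Z"), ("Y", "Z")], edges_a=[("Z", "W")],
    arcs_b=[("X", "Z")],             edges_b=[("Z", "Y"), ("W", "V")],
)
for key, pairs in report.items():
    print(key, sorted(tuple(sorted(p)) for p in pairs))
```

Comparing essential graphs rather than the learned DAGs avoids counting as differences the arc orientations that are arbitrary within a Markov equivalence class.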