<a href="https://colab.research.google.com/github/L00NE/loone_tmp_scripts/blob/main/notebooks/bnlearn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


<a href='https://erdogant.medium.com/membership' target='_blank'><img height='200' style='border:0px;height:36px;' src='https://erdogant.github.io/bnlearn/pages/html/_images/logo.png' border='0' alt='Follow me on Medium' /></a>

# Bnlearn for Python

Welcome to the notebook of **bnlearn**. bnlearn is a Python package for learning the graphical structure of Bayesian networks, parameter learning, inference, and sampling. Because probabilistic graphical models can be difficult to use, bnlearn for Python (this package) is built on top of the pgmpy package and contains the most-wanted pipelines. See the API documentation for more detailed information.

The core functionalities are:
<br>
<b>* Causal Discovery</b>
<br>
<b>* Structure Learning</b>
<br>
<b>* Parameter Learning</b>
<br>
<b>* Inferences using do-calculus</b>
<br>
<br>

---

## Read the Medium blog for more detailed information.

#### [1. A Step-by-Step Guide in detecting causal relationships using Bayesian Structure Learning in Python](https://towardsdatascience.com/a-step-by-step-guide-in-detecting-causal-relationships-using-bayesian-structure-learning-in-python-c20c6b31cee5)


#### [2. A step-by-step guide in designing knowledge-driven models using Bayesian theorem.](https://towardsdatascience.com/a-step-by-step-guide-in-designing-knowledge-driven-models-using-bayesian-theorem-7433f6fd64be)

#### [3. The Power of Bayesian Causal Inference: A Comparative Analysis of Libraries to Reveal Hidden Causality in Your Dataset.](https://towardsdatascience.com/the-power-of-bayesian-causal-inference-a-comparative-analysis-of-libraries-to-reveal-hidden-d91e8306e25e)

#### [4. Chat with Your Dataset using Bayesian Inferences.](https://towardsdatascience.com/chat-with-your-dataset-using-bayesian-inferences-bfd4dc7f8dcd)





<br>

---


## Github
* [Github](https://github.com/erdogant/bnlearn)
* [Documentation pages](https://erdogant.github.io/bnlearn/)

<br>

---

## Open in Colab

<a href="https://colab.research.google.com/github/erdogant/bnlearn/blob/master/notebooks/bnlearn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<br>

---


## Support
This library runs on coffee :) You can [support](https://erdogant.github.io/pca/pages/html/Documentation.html) it in various ways; have a look at the [sponsor page](https://erdogant.github.io/pca/pages/html/Documentation.html). Report bugs, issues and feature requests on the [github page](https://github.com/erdogant/bnlearn).

<a href='https://www.buymeacoffee.com/erdogant' target='_blank'><img height='36' style='border:0px;height:36px;' src='https://cdn.ko-fi.com/cdn/kofi1.png?v=3' border='0' alt='Buy Me a Coffee at ko-fi.com' /></a>
<a href='https://erdogant.medium.com/subscribe' target='_blank'><img height='50' style='border:0px;height:36px;' src='https://erdogant.github.io/images/medium_follow_me.jpg' border='0' alt='Follow me on Medium' /></a>

---



**Installation of libraries**

In [None]:
# Install bnlearn
!pip install -U bnlearn

# Install d3blocks
!pip install d3blocks

# Install from github source
#!pip install -U git+https://github.com/erdogant/bnlearn

In [None]:
# matplotlib version should be >= 3.3.4
import matplotlib
print(matplotlib.__version__)

import pandas as pd
import numpy as np

# pgmpy version should be >= 0.1.13
import pgmpy
print(pgmpy.__version__)

# Latest bnlearn version
import bnlearn as bn
print(bn.__version__)

3.7.1
0.1.22
0.7.16


**Structure learning example**

In [None]:
# Example dataframe sprinkler_data.csv can be loaded with:
df = bn.import_example()
# df = pd.read_csv('sprinkler_data.csv')
model = bn.structure_learning.fit(df)


In [None]:
G = bn.plot(model)

In [None]:
# Set custom colors (and a weight) for selected nodes and edges
node_properties = bn.get_node_properties(model)
node_properties['Sprinkler']['node_color']='#FF0000'

edge_properties = bn.get_edge_properties(model)

edge_properties[('Cloudy', 'Rain')]['color']='#FF0000'
edge_properties[('Cloudy', 'Rain')]['weight']=5

[bnlearn] >Set node properties.
[bnlearn] >Set edge properties.


In [None]:
G = bn.plot(model,
            node_properties=node_properties,
            edge_properties=edge_properties,
            interactive=True,
            params_interactive={'notebook':True})

[d3blocks] >INFO> Cleaning edge_properties and config parameters..
[d3blocks] >INFO> Set directed=True to see the markers!
[d3blocks] >INFO> Keep only edges with weight>0
[d3blocks] >INFO> Number of unique nodes: 4
[d3blocks] >INFO> Slider range is set to [0, 5]
[d3blocks] >INFO> Write to path: [/tmp/tmp_fytyffi/d3graph.html]
[d3blocks] >INFO> File already exists and will be overwritten: [/tmp/tmp_fytyffi/d3graph.html]
[d3blocks] >INFO> Number of unique nodes: 4
[d3blocks] >INFO> Keep only edges with weight>0
[d3blocks] >INFO> Slider range is set to [0, 5]
[d3blocks] >INFO> Write to path: [/tmp/tmpds_3ymwh/bnlearn_causal_network.html]
[d3blocks] >INFO> File already exists and will be overwritten: [/tmp/tmpds_3ymwh/bnlearn_causal_network.html]


**Various methodtypes and scoretypes**

In [None]:
model_hc_bic  = bn.structure_learning.fit(df, methodtype='hc', scoretype='bic')
model_hc_k2   = bn.structure_learning.fit(df, methodtype='hc', scoretype='k2')
model_hc_bdeu = bn.structure_learning.fit(df, methodtype='hc', scoretype='bdeu')
model_ex_bic  = bn.structure_learning.fit(df, methodtype='ex', scoretype='bic')
model_ex_k2   = bn.structure_learning.fit(df, methodtype='ex', scoretype='k2')
model_ex_bdeu = bn.structure_learning.fit(df, methodtype='ex', scoretype='bdeu')
model_cl      = bn.structure_learning.fit(df, methodtype='cl', root_node='Cloudy')
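The `scoretype` argument selects the score that the structure search maximizes. As a rough illustration of what a BIC-type score measures, the sketch below computes a decomposable BIC score (log-likelihood of the data under maximum-likelihood CPDs, minus a penalty of log(N)/2 per free parameter) for a DAG given as a parent dictionary. It is a plain-Python sketch, not bnlearn's or pgmpy's implementation; the helper `bic_score` and the toy data are hypothetical.

```python
import math
from collections import Counter


def bic_score(rows, parents):
    """Decomposable BIC score of a DAG for discrete data.

    rows    : list of dicts, one per sample, mapping variable -> state
    parents : dict mapping every variable -> list of its parent variables
    """
    n = len(rows)
    # Observed states per variable (its cardinality r_i)
    states = {v: sorted({r[v] for r in rows}) for v in parents}
    score = 0.0
    for child, pa in parents.items():
        # Joint counts N_ijk and parent-configuration counts N_ij
        joint = Counter((tuple(r[p] for p in pa), r[child]) for r in rows)
        parent_counts = Counter(tuple(r[p] for p in pa) for r in rows)
        # Log-likelihood under the maximum-likelihood CPD
        for (cfg, _state), n_ijk in joint.items():
            score += n_ijk * math.log(n_ijk / parent_counts[cfg])
        # Penalty: q_i * (r_i - 1) free parameters, each costing log(N) / 2
        q = math.prod(len(states[p]) for p in pa)
        score -= 0.5 * math.log(n) * q * (len(states[child]) - 1)
    return score


# Hypothetical data in which B mostly copies A
rows = ([{'A': 0, 'B': 0}] * 40 + [{'A': 1, 'B': 1}] * 40
        + [{'A': 0, 'B': 1}] * 10 + [{'A': 1, 'B': 0}] * 10)
empty = {'A': [], 'B': []}
a_to_b = {'A': [], 'B': ['A']}
print(bic_score(rows, empty), bic_score(rows, a_to_b))
```

Adding the edge A -> B raises the score because the gain in log-likelihood outweighs the penalty for the extra parameters. A hill-climbing search (`methodtype='hc'`) repeatedly applies local edge changes of this kind for as long as the score improves.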

**Parameter learning**

In [None]:
# Import dataframe
df = bn.import_example()
# Setting CPD=False returns only the DAG structure without CPDs, i.e. an "empty" DAG
model = bn.import_DAG('sprinkler', CPD=False)
# Now we learn the parameters of the DAG using the df
model_update = bn.parameter_learning.fit(model, df)
# Make plot
#G = bn.plot(model_update)
G = bn.plot(model_update,
            interactive=True,
            params_interactive={'notebook':True, 'cdn_resources': 'remote'})
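Parameter learning fills in each node's CPD from the data. Under maximum-likelihood estimation, every CPD entry is a conditional relative frequency, P(child = k | parents = j) = N_jk / N_j. The sketch below shows that counting step in plain Python on a few hypothetical rows that reuse the sprinkler variable names; it is an illustration of the idea, not the code path bnlearn (which builds on pgmpy) actually runs.

```python
from collections import Counter, defaultdict


def mle_cpd(rows, child, parents):
    """Maximum-likelihood CPD P(child | parents) as {parent_config: {state: prob}}."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in rows)
    cpd = defaultdict(dict)
    for (cfg, state), count in joint.items():
        # Relative frequency of this child state within its parent configuration
        cpd[cfg][state] = count / parent_counts[cfg]
    return dict(cpd)


# Hypothetical samples (not the sprinkler dataset itself)
rows = [
    {'Cloudy': 1, 'Rain': 1}, {'Cloudy': 1, 'Rain': 1},
    {'Cloudy': 1, 'Rain': 1}, {'Cloudy': 1, 'Rain': 0},
    {'Cloudy': 0, 'Rain': 0}, {'Cloudy': 0, 'Rain': 1},
]
cpd = mle_cpd(rows, 'Rain', ['Cloudy'])
print(cpd[(1,)][1])  # P(Rain=1 | Cloudy=1) = 3/4
```

A state that never occurs for a given parent configuration gets no entry (probability 0) with pure counting, which is one reason Bayesian estimators with pseudo-counts are often preferred on sparse data.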


**Inferences**

In [None]:
model = bn.import_DAG('sprinkler')
q_1 = bn.inference.fit(model, variables=['Rain'], evidence={'Cloudy':1,'Sprinkler':0, 'Wet_Grass':1})
q_2 = bn.inference.fit(model, variables=['Rain'], evidence={'Cloudy':1})

In [None]:
print(dir(q_2))
print(q_2.df)
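`bn.inference.fit` answers queries such as P(Rain | Cloudy=1) on a model that has CPDs. To make the arithmetic behind such a query concrete, the sketch below answers the same two queries by brute-force enumeration over the joint distribution of the sprinkler network. The CPD values are the classic textbook numbers and are an assumption here: they are not read from bnlearn's sprinkler file, so the probabilities need not match bnlearn's output.

```python
from itertools import product

# Assumed textbook CPDs for the sprinkler network (binary variables, 1 = True)


def p_cloudy(c):
    return 0.5  # P(Cloudy=1) = P(Cloudy=0) = 0.5


def p_sprinkler(s, c):
    p1 = 0.1 if c else 0.5  # P(Sprinkler=1 | Cloudy)
    return p1 if s else 1 - p1


def p_rain(r, c):
    p1 = 0.8 if c else 0.2  # P(Rain=1 | Cloudy)
    return p1 if r else 1 - p1


def p_wet(w, s, r):
    p1 = {(1, 1): 0.99, (1, 0): 0.9, (0, 1): 0.9, (0, 0): 0.0}[(s, r)]
    return p1 if w else 1 - p1


def joint(c, s, r, w):
    # Chain rule following the DAG: Cloudy -> {Sprinkler, Rain} -> Wet_Grass
    return p_cloudy(c) * p_sprinkler(s, c) * p_rain(r, c) * p_wet(w, s, r)


def query_rain(evidence):
    """P(Rain | evidence) by summing the joint over all consistent assignments."""
    totals = {0: 0.0, 1: 0.0}
    for c, s, r, w in product([0, 1], repeat=4):
        assignment = {'Cloudy': c, 'Sprinkler': s, 'Rain': r, 'Wet_Grass': w}
        if all(assignment[k] == v for k, v in evidence.items()):
            totals[r] += joint(c, s, r, w)
    z = totals[0] + totals[1]  # normalizing constant P(evidence)
    return {r: p / z for r, p in totals.items()}


q_1 = query_rain({'Cloudy': 1, 'Sprinkler': 0, 'Wet_Grass': 1})
q_2 = query_rain({'Cloudy': 1})
print(q_1, q_2)
```

Enumeration grows exponentially with the number of variables; variable elimination, which bnlearn reports using for its queries, sums variables out one at a time and is far cheaper on sparse networks.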

**Sampling**

In [None]:
model = bn.import_DAG('sprinkler')
df = bn.sampling(model, n=1000)
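`bn.sampling` draws synthetic rows from a model. A common way to sample from a Bayesian network is forward (ancestral) sampling: visit the nodes in topological order and draw each one from its CPD given its already-sampled parents. The sketch below does this with the standard library only, using the same assumed textbook sprinkler CPDs (not bnlearn's file); whether bnlearn uses exactly this sampler is not claimed here.

```python
import random

# Assumed textbook CPDs: probability that each variable is 1 given its parents
P_CLOUDY = 0.5
P_SPRINKLER = {1: 0.1, 0: 0.5}  # keyed by Cloudy
P_RAIN = {1: 0.8, 0: 0.2}       # keyed by Cloudy
P_WET = {(1, 1): 0.99, (1, 0): 0.9, (0, 1): 0.9, (0, 0): 0.0}  # keyed by (Sprinkler, Rain)


def forward_sample(n, seed=None):
    """Ancestral sampling in the topological order Cloudy, Sprinkler, Rain, Wet_Grass."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = int(rng.random() < P_CLOUDY)
        s = int(rng.random() < P_SPRINKLER[c])
        r = int(rng.random() < P_RAIN[c])
        w = int(rng.random() < P_WET[(s, r)])
        rows.append({'Cloudy': c, 'Sprinkler': s, 'Rain': r, 'Wet_Grass': w})
    return rows


samples = forward_sample(10000, seed=42)
rain_given_cloudy = [row['Rain'] for row in samples if row['Cloudy'] == 1]
print(sum(rain_given_cloudy) / len(rain_given_cloudy))  # should be close to 0.8
```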

**Comparing networks**

In [None]:
# Load asia DAG
model = bn.import_DAG('asia')
# plot ground truth
G = bn.plot(model, interactive=False)
# Sampling
df = bn.sampling(model, n=10000)
# Structure learning of sampled dataset
model_sl = bn.structure_learning.fit(df, methodtype='hc', scoretype='bic')
# Plot based on structure learning of sampled data
bn.plot(model_sl, pos=G['pos'])

# Compare networks and make plot
bn.compare_networks(model, model_sl, pos=G['pos'])
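`bn.compare_networks` contrasts the ground-truth DAG with the learned one. The core of such a comparison can be expressed as set operations on directed edges; the sketch below counts correctly recovered, reversed, missing and extra edges. The helper `compare_edges` and both edge lists are hypothetical (only the variable names echo the asia network), and the statistics bnlearn reports may be defined differently.

```python
def compare_edges(true_edges, learned_edges):
    """Compare two DAGs given as collections of directed (source, target) edges."""
    true_set, learned_set = set(true_edges), set(learned_edges)
    correct = true_set & learned_set
    # A true edge counts as "reversed" if the learned graph has it in the opposite direction
    reversed_ = {(u, v) for (u, v) in true_set - learned_set if (v, u) in learned_set}
    missing = true_set - learned_set - reversed_
    # Learned edges that are neither true nor the reversal of a true edge
    extra = {(u, v) for (u, v) in learned_set - true_set if (v, u) not in true_set}
    return {'correct': correct, 'reversed': reversed_, 'missing': missing, 'extra': extra}


# Hypothetical ground-truth and learned edge lists
truth = [('smoke', 'lung'), ('smoke', 'bronc'), ('lung', 'either'), ('either', 'xray')]
learned = [('smoke', 'lung'), ('bronc', 'smoke'), ('either', 'xray'), ('xray', 'dysp')]
result = compare_edges(truth, learned)
for key, edges in result.items():
    print(key, sorted(edges))
```

Reversed edges deserve separate treatment because, from observational data alone, structure learning can often identify an edge's presence but not its direction (DAGs in the same Markov equivalence class score identically).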

**Loading example DAG files**

Note that in some of these example files the CPDs do not add up to exactly 1. This raises an error that needs to be fixed in the input file. For example, **asia** loads correctly, but **pathfinder** throws the error: "*>Warning: CPD [Fault] does not add up to 1 but is: 1.00000003*". Make sure that every CPD sums to exactly 1.

In [None]:
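# Select one example DAG by uncommenting it, or use the path to your own .bif file
# bif_file = 'sprinkler'
# bif_file = 'alarm'
# bif_file = 'andes'
bif_file = 'asia'
# bif_file = 'pathfinder'
# bif_file = 'sachs'
# bif_file = 'miserables'
# bif_file = 'filepath/to/model.bif'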

# Loading example dataset
model = bn.import_DAG(bif_file, verbose=1)
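Before loading a hand-edited BIF file, it can help to check the CPDs yourself. The sketch below flags conditional distributions whose sum deviates from 1 by more than a tolerance and renormalizes them. It assumes each CPD is available as a list of columns (one probability vector per parent configuration), which is a hypothetical representation rather than bnlearn's or pgmpy's data structure; the helper `check_and_normalize` and the numbers are illustrative.

```python
import math


def check_and_normalize(columns, tol=1e-9):
    """Return (normalized columns, indices of columns whose sum was off by more than tol)."""
    bad = []
    normalized = []
    for i, col in enumerate(columns):
        total = math.fsum(col)  # accurate floating-point sum
        if abs(total - 1.0) > tol:
            bad.append(i)
        # Rescale so the column sums to 1 (up to rounding)
        normalized.append([p / total for p in col])
    return normalized, bad


# Hypothetical CPD with two parent configurations; the second column sums to 1.00000003
cpd_columns = [[0.2, 0.8], [0.50000001, 0.50000002]]
normalized, bad = check_and_normalize(cpd_columns)
print(bad)
```

Renormalizing only repairs rounding noise such as the 1.00000003 case; a column that is far from 1 usually points to a real mistake in the file and should be fixed by hand.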



---



### Example: learning structure from a dataset of source-target edges with weights

The weights can be, for example, counts of how often the edge between two nodes was detected.

In [None]:
raw = bn.import_example('stormofswords')
# Convert the raw edge list into a sparse data matrix
df = bn.vec2df(raw['source'], raw['target'], raw['weight'])
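`bn.vec2df` turns the source/target/weight edge list into a data matrix with one Boolean column per node. One plausible way to perform such a conversion, shown here as an assumption and not as a description of `vec2df`'s internals, is to emit `weight` rows per edge in which only the source and target columns are True. The helper `edges_to_rows` and the toy edges are hypothetical.

```python
def edges_to_rows(sources, targets, weights):
    """Expand a weighted edge list into Boolean rows with one column per node."""
    nodes = sorted(set(sources) | set(targets))
    rows = []
    for s, t, w in zip(sources, targets, weights):
        for _ in range(int(w)):
            # Each repetition records one co-occurrence of source and target
            rows.append({node: node in (s, t) for node in nodes})
    return nodes, rows


# Toy edge list: A-B observed twice, B-C observed once
nodes, rows = edges_to_rows(['A', 'B'], ['B', 'C'], [2, 1])
print(nodes)
for row in rows:
    print(row)
```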


In [None]:
print(df.head())

   Aegon  Aemon  Aerys  Alliser  Amory  Anguy   Arya  Balon  Barristan  \
0  False   True  False    False  False  False  False  False      False   
1  False   True  False    False  False  False  False  False      False   
2  False   True  False    False  False  False  False  False      False   
3  False   True  False    False  False  False  False  False      False   
4  False   True  False    False  False  False  False  False      False   

   Belwas  ...  Tommen  Tyrion  Tywin    Val  Varys  Viserys  Walder  Walton  \
0   False  ...   False   False  False  False  False    False   False   False   
1   False  ...   False   False  False  False  False    False   False   False   
2   False  ...   False   False  False  False  False    False   False   False   
3   False  ...   False   False  False  False  False    False   False   False   
4   False  ...   False   False  False  False  False    False   False   False   

    Worm  Ygritte  
0  False    False  
1  False    False  
2  False    Fa

In [None]:
# Make the actual Bayesian DAG
DAG = bn.make_DAG(list(zip(raw['source'], raw['target'])), verbose=0)
# Make plot
G=bn.plot(DAG,
          interactive=True,
          params_interactive={'notebook':True, 'cdn_resources': 'remote', 'filter_menu': True, 'select_menu': True})

# You will see that this is a huge network with hundreds of edges.

[d3blocks] >INFO> Cleaning edge_properties and config parameters..
[d3blocks] >INFO> Set directed=True to see the markers!
[d3blocks] >INFO> Keep only edges with weight>0
[d3blocks] >INFO> Number of unique nodes: 107
[d3blocks] >INFO> Slider range is set to [0, 1]


[bnlearn] >Set node properties.
[bnlearn] >Set edge properties.


[d3blocks] >INFO> Write to path: [/tmp/tmpxbz8b3si/d3graph.html]
[d3blocks] >INFO> File already exists and will be overwritten: [/tmp/tmpxbz8b3si/d3graph.html]
[d3blocks] >INFO> Number of unique nodes: 107
[d3blocks] >INFO> Keep only edges with weight>0
[d3blocks] >INFO> Slider range is set to [0, 1]
[d3blocks] >INFO> Write to path: [/tmp/tmphtv67_ed/bnlearn_causal_network.html]
[d3blocks] >INFO> File already exists and will be overwritten: [/tmp/tmphtv67_ed/bnlearn_causal_network.html]


In [None]:
# Parameter learning
model = bn.parameter_learning.fit(DAG, df, verbose=3)
# All the CPDs are now learned

[bnlearn] >Parameter learning> Computing parameters using [bayes]
[bnlearn] >CPD of Aemon:
+--------------+-----+---------------+
| Jon          | ... | Jon(True)     |
+--------------+-----+---------------+
| Robert       | ... | Robert(True)  |
+--------------+-----+---------------+
| Stannis      | ... | Stannis(True) |
+--------------+-----+---------------+
| Aemon(False) | ... | 0.5           |
+--------------+-----+---------------+
| Aemon(True)  | ... | 0.5           |
+--------------+-----+---------------+
[bnlearn] >CPD of Grenn:
+--------------+-----+----------------+---------------+
| Aemon        | ... | Aemon(True)    | Aemon(True)   |
+--------------+-----+----------------+---------------+
| Eddison      | ... | Eddison(True)  | Eddison(True) |
+--------------+-----+----------------+---------------+
| Jon          | ... | Jon(True)      | Jon(True)     |
+--------------+-----+----------------+---------------+
| Samwell      | ... | Samwell(False) | Samwell(True) |
+------
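The log above reports that parameters are computed with a Bayesian estimator, and the visible entries of the Aemon CPD equal 0.5. One likely explanation: a Bayesian estimator with a BDeu-style Dirichlet prior adds pseudo-counts to every cell, so a parent configuration that never occurs in the data falls back to a uniform distribution (0.5 for a Boolean node). The sketch below implements that posterior-mean formula, theta_jk = (N_jk + ESS/(q*r)) / (N_j + ESS/q); the helper name, the equivalent sample size of 5 and the toy counts are assumptions, not bnlearn's defaults.

```python
def bdeu_cpd_column(counts, n_parent_configs, equivalent_sample_size=5.0):
    """Posterior-mean CPD column under a BDeu Dirichlet prior.

    counts           : observed counts N_jk of each child state for one parent configuration j
    n_parent_configs : q, the number of possible parent configurations
    """
    r = len(counts)  # number of child states
    pseudo = equivalent_sample_size / (n_parent_configs * r)  # ESS / (q * r) per cell
    total = sum(counts) + r * pseudo  # N_j + ESS / q
    return [(n + pseudo) / total for n in counts]


# Boolean child with three Boolean parents, so q = 2**3 = 8 parent configurations
print(bdeu_cpd_column([0, 0], n_parent_configs=8))  # unseen configuration -> [0.5, 0.5]
print(bdeu_cpd_column([9, 1], n_parent_configs=8))  # data pulls the estimate away from 0.5
```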

In [None]:
# Optionally, generate synthetic data from the learned model:
# df1 = bn.sampling(model, n=1000)
# Query the model whose CPDs were learned in the previous cell
query = bn.inference.fit(model, variables=['Grenn'], evidence={'Aemon': 1, 'Samwell': 1})
print(query)
query.df

[bnlearn] >Variable Elimination..
[bnlearn] >Data is stored in [query.df]
+----+---------+----------+
|    | Grenn   |        p |
|  0 | False   | 0.578207 |
+----+---------+----------+
|  1 | True    | 0.421793 |
+----+---------+----------+
+--------------+--------------+
| Grenn        |   phi(Grenn) |
| Grenn(False) |       0.5782 |
+--------------+--------------+
| Grenn(True)  |       0.4218 |
+--------------+--------------+


   Grenn         p
0  False  0.578207
1   True  0.421793


In [None]:
# Structure learning on such a large network takes a long time, and with some methods it may be infeasible to compute.
# Let's learn the structure for a smaller subset of columns (the first 20).
DAG_learned = bn.structure_learning.fit(df.iloc[:, 0:20])

[bnlearn] >Computing best DAG using [hc]
[bnlearn] >Set scoring type at [bic]
[bnlearn] >Compute structure scores for model comparison (higher is better).


In [None]:
# Keep only significant edges
DAG_learned = bn.independence_test(DAG_learned, df, prune=True)

[bnlearn] >Compute edge strength with [chi_square]
[bnlearn] >Edge [Bran <-> Brynden] [P=0.0519475] is excluded because it was not significant (P<0.05) with [chi_square]
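`bn.independence_test` with `prune=True` removes edges whose chi-square test is not significant at P<0.05, as in the `Bran <-> Brynden` line of the log above. For two Boolean variables the test reduces to a 2x2 contingency table with one degree of freedom. The sketch below computes the Pearson statistic (without continuity correction) and its p-value using only the standard library; it is an illustrative re-derivation rather than bnlearn's implementation, and the counts are hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value) with 1 degree of freedom, no continuity correction.
    Assumes every row and column total is non-zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 dof: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical co-occurrence counts: rows = source False/True, cols = target False/True
strong = [[40, 10], [10, 40]]
weak = [[26, 24], [24, 26]]
for name, table in [('strong', strong), ('weak', weak)]:
    stat, p = chi_square_2x2(table)
    print(name, round(stat, 3), round(p, 4), 'keep' if p < 0.05 else 'prune')
```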


In [None]:
# Plot the edges for the small network
_ = bn.plot(DAG_learned,
            interactive=True,
            params_interactive={'notebook':True, 'cdn_resources': 'remote', 'filter_menu': True, 'select_menu': True})

[d3blocks] >INFO> Cleaning edge_properties and config parameters..
[d3blocks] >INFO> Set directed=True to see the markers!
[d3blocks] >INFO> Keep only edges with weight>0
[d3blocks] >INFO> Number of unique nodes: 12
[d3blocks] >INFO> Slider range is set to [0, 10]
[d3blocks] >INFO> Write to path: [/tmp/tmp9detd1uf/d3graph.html]
[d3blocks] >INFO> File already exists and will be overwritten: [/tmp/tmp9detd1uf/d3graph.html]
[d3blocks] >INFO> Number of unique nodes: 12
[d3blocks] >INFO> Keep only edges with weight>0
[d3blocks] >INFO> Slider range is set to [0, 10]


[bnlearn] >Set node properties.
[bnlearn]> Set edge weights based on the [chi_square] test statistic.
[bnlearn] >Set edge properties.


[d3blocks] >INFO> Write to path: [/tmp/tmpbm5rz9lw/bnlearn_causal_network.html]
[d3blocks] >INFO> File already exists and will be overwritten: [/tmp/tmpbm5rz9lw/bnlearn_causal_network.html]


**Fin notebook**