# Structure Learning

## Import modules

In [17]:
!pip install bamt==0.1.202

Defaulting to user installation because normal site-packages is not writeable
Collecting bamt==0.1.202
  Downloading BAMT-0.1.202-py3-none-any.whl (191 kB)
Installing collected packages: bamt
  Attempting uninstall: bamt
    Found existing installation: bamt 0.1.201
    Uninstalling bamt-0.1.201:
      Successfully uninstalled bamt-0.1.201
Successfully installed bamt-0.1.202


In [18]:
%%time
%matplotlib inline
import bamt.Networks as Nets
import bamt.Preprocessors as pp

import pandas as pd
import numpy as np
from sklearn import preprocessing
import matplotlib.pyplot as plt

CPU times: user 122 ms, sys: 16.7 ms, total: 138 ms
Wall time: 137 ms


## Preprocessing

In [19]:
hack = pd.read_csv(r'../data/hack_processed_with_rf.csv')

In [20]:
cols = ['Tectonic regime', 'Period', 'Lithology', 'Structural setting', 'Gross','Netpay','Porosity','Permeability', 'Depth']
hack = hack[cols]

In [21]:
encoder = preprocessing.LabelEncoder()
discretizer = preprocessing.KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='quantile')

p = pp.Preprocessor([('encoder', encoder), ('discretizer', discretizer)])
discretized_data, est = p.apply(hack)
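
To make the two preprocessing steps concrete, here is a minimal pure-Python sketch of what label encoding and quantile binning do to a single column. It is an illustration only, not the scikit-learn or BAMT implementation; the helper names `label_encode` and `quantile_bins` and the toy values are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def label_encode(values):
    """Map each distinct category to an integer code (sorted order)."""
    classes = sorted(set(values))
    codes = {c: i for i, c in enumerate(classes)}
    return [codes[v] for v in values], classes


def quantile_bins(values, n_bins=5):
    """Assign each value to one of n_bins bins with roughly equal counts."""
    # n_bins - 1 interior cut points estimated from the data
    cuts = quantiles(values, n=n_bins, method="inclusive")
    # The bin index is the number of cut points that are <= the value
    return [bisect_right(cuts, v) for v in values]


lithology = ["sandstone", "limestone", "sandstone", "dolomite"]  # toy data
codes, classes = label_encode(lithology)

porosity = [5.0, 8.0, 11.0, 14.0, 17.0, 20.0, 23.0, 26.0, 29.0, 32.0]  # toy data
bins = quantile_bins(porosity, n_bins=5)
```

With ten evenly spaced values and five bins, each bin receives two values, which is the "equal counts" behaviour the `quantile` strategy aims for.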

## Initializing a Bayesian Network

<p> There are 3 types of Bayesian networks: DiscreteBN, ContinuousBN and HybridBN. <br>
Note that passing discrete data to a ContinuousBN raises an error.<br><br>
For ContinuousBN the user can choose whether to use mixture nodes; for HybridBN the user can allow or restrict logit and mixture nodes.</p>

List of scoring functions BAMT can work with: <br>
1. Group 1
    - Mutual Information (MI)
    - Log-likelihood (LL)
    - BIC
    - AIC
2. K2Score <br><br>
<p> For group 1 the user passes a one-element tuple such as ('MI',). For K2Score the user must import K2Scorer and pass ("K2", K2Scorer). </p>
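
As a reminder of what a mutual-information score measures, the sketch below computes the textbook empirical mutual information of two discrete columns. It is only the standard definition written in plain Python; the score BAMT actually uses may be normalised or aggregated differently, and the function name `mutual_information` is hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of x
    py = Counter(ys)            # marginal counts of y
    pxy = Counter(zip(xs, ys))  # joint counts of (x, y)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with counts
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


a = [0, 0, 1, 1]
b = [0, 0, 1, 1]  # identical to a: MI equals the entropy of a, log(2)
c = [0, 1, 0, 1]  # independent of a in this sample: MI is 0

mi_ab = mutual_information(a, b)
mi_ac = mutual_information(a, c)
```

A pair of strongly dependent columns gets a high value and an unrelated pair gets a value near zero, which is why such a score can rank candidate edges.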

In [22]:
bn = Nets.HybridBN(has_logit=True, use_mixture=True) # init BN
info = p.info # mapping of nodes (Dict["types": Dict[node_name: type], "signs": Dict[node_name: sign]])

Structure learning consists of two parts: building nodes and building edges. <br><br>

First stage: <br>
The Bayesian network instance initializes its primary nodes with one of two types ('Discrete' or 'Gaussian') according to the descriptor's info.<br>
Second stage: <br>
The Bayesian network instance rewrites the nodes according to their parents and conditions (parameters).
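
The two stages can be sketched as a pair of small functions. The rule below is inferred only from the `get_info()` output shown later in this notebook for `HybridBN(has_logit=True, use_mixture=True)`; it is not BAMT's code. The function names are hypothetical, branches marked "assumption" are not confirmed by that output, and the "(LogisticRegression)" suffix of the logit node names is omitted.

```python
def initial_type(data_type):
    """Stage 1: pick a primary node type from the descriptor's data type."""
    return "Discrete" if data_type == "disc" else "Gaussian"


def final_type(data_type, parent_types):
    """Stage 2: refine the node type from the parents' data types."""
    has_disc = "disc" in parent_types
    has_cont = "cont" in parent_types
    if data_type == "disc":
        if not has_cont:
            # Observed for a node without parents;
            # discrete-only parents are an assumption.
            return "Discrete"
        return "ConditionalLogit" if has_disc else "Logit"
    # Continuous node
    if has_disc:
        # Observed for a single discrete parent;
        # mixed discrete and continuous parents are an assumption.
        return "ConditionalMixtureGaussian"
    # Observed for continuous parents; no parents is an assumption.
    return "MixtureGaussian"


# (data type, parents' data types) pairs taken from the get_info() output
tectonic_regime = final_type("disc", [])
period = final_type("disc", ["cont", "disc", "disc"])
structural_setting = final_type("disc", ["cont"])
porosity = final_type("cont", ["disc"])
gross = final_type("cont", ["cont"])
```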

In [23]:
bn.add_nodes(info)
# The preprocessed (discretized) data must be passed here
bn.add_edges(discretized_data, scoring_function=('MI',)) # use mutual information sf implemented in BAMT

In [24]:
bn.get_info()

| | name | node_type | data_type | parents | parents_types |
|---|---|---|---|---|---|
| 0 | Tectonic regime | Discrete | disc | [] | [] |
| 1 | Period | ConditionalLogit (LogisticRegression) | disc | [Depth, Structural setting, Lithology] | [cont, disc, disc] |
| 2 | Lithology | ConditionalLogit (LogisticRegression) | disc | [Netpay, Structural setting] | [cont, disc] |
| 3 | Structural setting | Logit (LogisticRegression) | disc | [Permeability] | [cont] |
| 4 | Gross | MixtureGaussian | cont | [Porosity] | [cont] |
| 5 | Netpay | MixtureGaussian | cont | [Permeability] | [cont] |
| 6 | Porosity | ConditionalMixtureGaussian | cont | [Tectonic regime] | [disc] |
| 7 | Permeability | MixtureGaussian | cont | [Gross] | [cont] |
| 8 | Depth | MixtureGaussian | cont | [Gross] | [cont] |
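
The `parents` column above is enough to recover the learned edges and to confirm that the structure is a directed acyclic graph. The sketch below copies that column by hand into a dictionary (it does not read anything from `bn`) and uses the standard-library `graphlib` module, which requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# node -> parents, copied from the get_info() output above
parents = {
    "Tectonic regime": [],
    "Period": ["Depth", "Structural setting", "Lithology"],
    "Lithology": ["Netpay", "Structural setting"],
    "Structural setting": ["Permeability"],
    "Gross": ["Porosity"],
    "Netpay": ["Permeability"],
    "Porosity": ["Tectonic regime"],
    "Permeability": ["Gross"],
    "Depth": ["Gross"],
}

# Directed edges as (parent, child) pairs
edges = [(p, child) for child, ps in parents.items() for p in ps]

# TopologicalSorter takes a node -> predecessors mapping and raises
# graphlib.CycleError if the graph contains a cycle
order = list(TopologicalSorter(parents).static_order())
```

In the resulting order every parent comes before its children, so `Tectonic regime` (the only node without parents) is first and `Period` (the only node without children) is last.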


In [25]:
# Only file names ending in .html are accepted; this call logs the error below
bn.plot('Test1')

2022-03-04 01:52:43,570 | ERROR    | Networks.py-plot-0342 | This version allows only html format.


In [26]:
# Now plot the graph to an HTML file
bn.plot('Simple.html')

## Parameters in structure learning

<p> There are 5 parameters the user can tune: init_nodes, init_edges, white_list, remove_init_edges and bl_add. </p>

### init_nodes

Defines the root nodes, i.e. nodes that have no parents.

In [27]:
print(bn.nodes)

[Tectonic regime, Period, Lithology, Structural setting, Gross, Netpay, Porosity, Permeability, Depth]


In [28]:
params = {'init_nodes': ['Tectonic regime', 'Period', 'Lithology', 'Structural setting', 'Gross']}
bn.add_edges(discretized_data, scoring_function=('MI',), params=params)
bn.plot('init_nodes.html')
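
One way to check the effect of `init_nodes` is to look for edges that point into a declared root: there should be none. The sketch below works on plain lists of `(parent, child)` tuples; the helper name `edges_into_roots` and both edge lists are hypothetical and written by hand, and reading the edges from the network object is left out.

```python
def edges_into_roots(edges, init_nodes):
    """Return the edges that point into a node declared as a root."""
    roots = set(init_nodes)
    return [(parent, child) for parent, child in edges if child in roots]


init_nodes = ['Tectonic regime', 'Period', 'Lithology', 'Structural setting', 'Gross']

# Hypothetical edge lists, for illustration only
ok_edges = [('Gross', 'Permeability'), ('Period', 'Depth')]
bad_edges = [('Porosity', 'Gross'), ('Gross', 'Permeability')]

violations_ok = edges_into_roots(ok_edges, init_nodes)
violations_bad = edges_into_roots(bad_edges, init_nodes)
```

Root nodes may still have children, so `('Gross', 'Permeability')` is fine, while `('Porosity', 'Gross')` would give the root `Gross` a parent.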

### init_edges

<p> Defines the edges from which the learning procedure starts. </p>

In [29]:
params = {'init_nodes': ['Tectonic regime', 'Period', 'Lithology', 'Structural setting', 'Gross'],
          'init_edges':[('Period', 'Permeability'), ('Structural setting', 'Netpay'), ('Gross', 'Permeability')],}
bn.add_edges(discretized_data, scoring_function=('MI',), params=params)
bn.plot('init_edges.html')
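
Node names in this dataset contain spaces and mixed case, so a misspelled name in `init_edges` is easy to miss. A minimal sanity check, assuming the node names printed by `print(bn.nodes)` above, could look like this; the helper `unknown_endpoints` and the misspelled example are hypothetical.

```python
def unknown_endpoints(edges, nodes):
    """Return, sorted, every edge endpoint that is not a known node name."""
    known = set(nodes)
    return sorted({name for edge in edges for name in edge if name not in known})


# Node names as printed by print(bn.nodes) earlier in the notebook
nodes = ['Tectonic regime', 'Period', 'Lithology', 'Structural setting',
         'Gross', 'Netpay', 'Porosity', 'Permeability', 'Depth']

init_edges = [('Period', 'Permeability'), ('Structural setting', 'Netpay'),
              ('Gross', 'Permeability')]
typo_edges = [('Period', 'Permeability'), ('Structural setting', 'Net pay')]

missing_ok = unknown_endpoints(init_edges, nodes)
missing_typo = unknown_endpoints(typo_edges, nodes)
```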

### white_list

Strictly sets the edges the algorithm may learn: the search is limited to the edges in this list.

In [30]:
params = {'init_nodes': ['Tectonic regime', 'Period', 'Lithology', 'Structural setting', 'Gross'],
         'white_list': [('Period', 'Permeability'), ('Structural setting', 'Netpay'), ('Gross', 'Permeability')]}
bn.add_edges(discretized_data, scoring_function=('MI',), params=params)
bn.plot('white_list.html')

### bl_add

Restricts edges: the edges listed here are added to the black list and are excluded from learning.

In [31]:
params = {'init_nodes': ['Tectonic regime', 'Period', 'Lithology', 'Structural setting', 'Gross'],
         'white_list': [('Period', 'Permeability'), ('Structural setting', 'Netpay'), ('Gross', 'Permeability')],
         'bl_add':[('Structural setting', 'Netpay')]}
bn.add_edges(discretized_data, scoring_function=('MI',), params=params)
bn.plot('bl_add.html')
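
In the cell above, the edge `('Structural setting', 'Netpay')` appears in both `white_list` and `bl_add`. How the library resolves such an overlap is not stated here, so it can be worth detecting it before learning. The helper `conflicting_edges` below is a hypothetical sketch that only inspects the `params` dictionary.

```python
def conflicting_edges(params):
    """Return the edges listed in both 'white_list' and 'bl_add'."""
    black = set(params.get('bl_add', []))
    return [edge for edge in params.get('white_list', []) if edge in black]


params = {'init_nodes': ['Tectonic regime', 'Period', 'Lithology', 'Structural setting', 'Gross'],
          'white_list': [('Period', 'Permeability'), ('Structural setting', 'Netpay'), ('Gross', 'Permeability')],
          'bl_add': [('Structural setting', 'Netpay')]}

overlap = conflicting_edges(params)
```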

### remove_init_edges

Allows the algorithm to remove the initial edges defined by the user.

In [32]:
params = {'init_nodes': ['Tectonic regime', 'Period', 'Lithology', 'Structural setting', 'Gross'],
          'init_edges':[('Period', 'Permeability'), ('Structural setting', 'Netpay'), ('Gross', 'Permeability')],
         'remove_init_edges':True}
bn.add_edges(discretized_data, scoring_function=('MI',), params=params)
bn.plot('remove_init.html')
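
To see whether `remove_init_edges` had any effect, the initial edges can be compared with the learned ones. The sketch below assumes the learned structure is available as a list of `(parent, child)` tuples; the `learned_edges` list is invented for illustration and is not the result of the cell above, and the helper name `dropped_init_edges` is hypothetical.

```python
def dropped_init_edges(init_edges, learned_edges):
    """Return the initial edges that are absent from the learned structure."""
    learned = set(learned_edges)
    return [edge for edge in init_edges if edge not in learned]


init_edges = [('Period', 'Permeability'), ('Structural setting', 'Netpay'),
              ('Gross', 'Permeability')]

# Hypothetical learned structure, for illustration only
learned_edges = [('Gross', 'Permeability'), ('Permeability', 'Netpay'),
                 ('Period', 'Permeability')]

dropped = dropped_init_edges(init_edges, learned_edges)
```

An empty result means every initial edge survived learning; a non-empty one lists the edges the algorithm chose to remove.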