Author: dkesada

## Integration of _dbnR_ with Python

### Initial setup

In this notebook I will explain how we can use the _dbnR_ package in Python through the _rpy2_ library. It is very common to code in several languages, and I've come across the problem of having to deploy DBN models in Python when all my code is written in R. I will show how to use _rpy2_ to fit a DBN model and predict with it. This is not a perfect solution, but it gets the job done.

First of all, we will need a working Python environment with _rpy2_ installed. In my case, I will be defining a new conda environment for this purpose:

```Shell
conda create -n rpy2 python=3.7.0
conda activate rpy2
pip install rpy2
conda install -c anaconda pandas
```

This will get us a working environment with a local R distribution that we will need to populate with our desired R packages, in our case _dbnR_ and its dependencies.
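
Before going further, it can help to confirm that the interpreter in the active environment actually sees the packages installed above. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical and not part of _rpy2_:

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["rpy2", "pandas"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("rpy2 and pandas are available.")
```

If anything is reported as missing, the usual cause is that the commands were run in a different conda environment than the one the notebook kernel uses.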

In [6]:
import rpy2
print(rpy2.__version__)

3.4.4


Now that our environment is prepared, we will proceed with the installation of _dbnR_ into the new R distribution of _rpy2_. First, we check whether the package is already available:

In [1]:
import rpy2.robjects as robj
import rpy2.robjects.packages as rpack
from rpy2.rinterface_lib.embedded import RRuntimeError
from rpy2.robjects.packages import PackageNotInstalledError

try:
    dbnR = rpack.importr('dbnR')
except PackageNotInstalledError as e:
    print(e)

The R package "dbnR" is not installed.


If you run into errors about R '.dll' files not being found after executing `import rpy2.robjects as robj`, you need to adjust your environment variables so that the R binaries are in your path. I had to add several environment variables on Windows, and I list them below in case they help with similar problems. Take care to use the correct R version, user name and path. Note that I'm using miniconda; if you are using a full conda installation, or not using conda at all, you have to adjust the path to _rpy2_ accordingly. Only add the 'R_USER' variable if you want to set a specific _rpy2_ library, because it will interfere with your other R library paths. If you are using RStudio or some other R interpreter, it will then default to the _rpy2_ library, and that is likely not what you want.

User variables:

* Path - C:\Program Files\R\R-4.0.2\bin\x64

* R_USER - C:\Users\\<Your_username\>\Miniconda3\envs\rpy2\Lib\site-packages\rpy2

System variables:

* R_HOME - C:\Program Files\R\R-4.0.2 

* R_USER - C:\Program Files\RStudio\bin;C:\Users\\<Your_username\>\Miniconda3\envs\rpy2\Lib\site-packages\rpy2
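
As an alternative to editing these variables in the Windows settings, they can be set for the current process only, before _rpy2_ is imported. The following is a minimal sketch, assuming the same R install path as in the list above. The helper `with_r_on_path` is hypothetical, and this approach may not be sufficient on every Windows and Python combination:

```python
import os


def with_r_on_path(r_home, env=None, bin_subdir=("bin", "x64")):
    """Return a copy of `env` with R_HOME set and R's binary folder first on PATH."""
    env = dict(os.environ if env is None else env)
    r_bin = os.path.join(r_home, *bin_subdir)
    env["R_HOME"] = r_home
    # Drop any earlier copy of the R folder so it appears only once, at the front
    others = [p for p in env.get("PATH", "").split(os.pathsep) if p and p != r_bin]
    env["PATH"] = os.pathsep.join([r_bin] + others)
    return env


# Example (Windows, path taken from the list above; adjust to your R version):
# os.environ.update(with_r_on_path(r"C:\Program Files\R\R-4.0.2"))
# import rpy2.robjects as robj  # must come after the variables are set
```

The sketch deliberately leaves 'R_USER' untouched, in line with the warning above about it changing the library paths of other R interpreters.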

### Installation of _dbnR_

The simplest way of doing this is to use the _utils_ base R package to install packages from CRAN. If we need the GitHub version, we first need the _devtools_ package, and that is trickier. If we locate the R library of the _rpy2_ environment, we could also install the package by hand, but that would not be optimal. The last option is to change the library path of _rpy2_ to the one we normally use in R; that way we can install packages directly from R and then simply use them in _rpy2_.
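
The last option, pointing _rpy2_ at an existing R library, can be sketched as follows. This assumes the library folder already exists and holds packages built for the same R version that _rpy2_ uses. `libpaths_expr` and `use_r_library` are hypothetical helper names, while `.libPaths` is the standard R function:

```python
def libpaths_expr(lib_path):
    """Build the R expression that puts `lib_path` first in the library search path."""
    # R accepts forward slashes on Windows, which avoids escaping backslashes
    safe = lib_path.replace("\\", "/").replace('"', '\\"')
    return '.libPaths(c("%s", .libPaths()))' % safe


def use_r_library(lib_path):
    """Evaluate the expression in the embedded R session and return the new paths."""
    import rpy2.robjects as robj  # imported lazily so libpaths_expr works without R

    robj.r(libpaths_expr(lib_path))
    return list(robj.r[".libPaths"]())


# Example (hypothetical folder; replace with your own R library):
# print(use_r_library(r"C:\Users\me\Documents\R\win-library\4.0"))
```

The change only lasts for the current R session, so it has to be repeated each time the kernel is restarted.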

In the next chunk, we install the package from CRAN and optionally print the path of the R libraries used by _rpy2_:

In [3]:
import rpy2.robjects as robj
import rpy2.robjects.packages as rpackages

show_path = False  # Set to True to print the R library paths used by rpy2
libpath = robj.r[".libPaths"]
if show_path:
    print("The path to the libraries used by rpy2 is:")
    print(libpath())

utils = rpackages.importr('utils')
utils.chooseCRANmirror(ind=1)  # Use the first mirror in the CRAN list

if rpackages.isinstalled('dbnR'):
    print("dbnR is already installed.")
else:
    utils.install_packages('dbnR')
    print("dbnR was successfully installed.")

dbnR is already installed.


### Learning a DBN model and forecasting with it

Now that we have _dbnR_ up and running, we will reproduce the example in the 'usage_example.Rmd' file. We will follow the same pipeline: learning the structure of the network from the 'motor' dataset, fitting the model to the data and forecasting with it. The functions of the _dbnR_ package are called through the package object in Python, as if they were methods of that object. Other elements of the R environment can be accessed through the 'robj.r' object, which represents the R session. Conversions between pandas DataFrames and R data.frames are handled automatically by _rpy2_ once `pandas2ri.activate()` has been called.

In [3]:
import pandas as pd
import rpy2.robjects as robj
import rpy2.robjects.packages as rpack
from rpy2.robjects import pandas2ri

# Activate the automatic conversion between pandas DataFrames and R data.frames
pandas2ri.activate()

# The object that represents the R session
r = robj.r

# Load the dbnR package
dbnR = rpack.importr('dbnR')

# Read the data. Here we take the 'motor' dataset from the R session; usually you
# would have your dataset stored as a '.csv' file and load it with pandas
motor = robj.r['motor']

# Learn the structure
size = 3
dt_train = motor.iloc[1:2800]
dt_val = motor.iloc[2801:3000]
blacklist = r['matrix'](robj.StrVector(["motor_speed_t_0", "motor_speed_t_0", "i_d_t_0", "i_q_t_0"]), ncol = 2)
net = dbnR.learn_dbn_struc(dt_train, size, method = "dmmhc", blacklist = blacklist,
                             restrict = "mmpc", maximize = "hc") # Arguments with dots generate errors

# Fit the parameters
f_dt_train = dbnR.fold_dt(dt_train, size)
f_dt_val = dbnR.fold_dt(dt_val, size)
fit = dbnR.fit_dbn_params(net, f_dt_train, method = "mle-g")

# Predict with the model
obj_var = robj.StrVector(["stator_winding_t_0"])
res = dbnR.predict_dt(fit, f_dt_val, obj_var, verbose = False) # Be careful with plots, they sometimes kill your kernel
print("Pointwise prediction:")
print(res)

print("Forecasting:")
res = dbnR.forecast_ts(f_dt_val, fit, obj_var = robj.StrVector(["pm_t_0", "stator_winding_t_0"]),
                       ini = 100, len = 70, plot_res = False)


Pointwise prediction:
     nrow  stator_winding_t_0
1       1            0.320855
2       2            0.324326
3       3            0.324271
4       4            0.323423
5       5            0.325163
..    ...                 ...
193   193            0.402305
194   194            0.402777
195   195            0.402748
196   196            0.402368
197   197            0.402191

[197 rows x 2 columns]
Forecasting:
Time difference of -0.161736 secs
[1] The average MAE per execution is:
[1] pm_t_0: 0.0016
[1] stator_winding_t_0: 0.0095
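
One detail of the cell above is worth spelling out: `iloc` is 0-based and excludes the end of the slice, whereas R ranges such as `1:2800` are 1-based and include both ends. So `motor.iloc[1:2800]` skips the first row and returns 2799 rows, and `motor.iloc[2801:3000]` returns 199 rows, which is consistent with the 197 rows shown in the output after folding with size 3. If the intent is to mirror an R range exactly, a small helper makes the conversion explicit. This is a sketch only; `r_range` is a hypothetical name:

```python
def r_range(start, end):
    """Convert an inclusive, 1-based R range start:end into a Python slice."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return slice(start - 1, end)


rows = list(range(1, 11))  # stand-in for row labels 1..10
print(rows[r_range(1, 3)])   # same rows as R's rows[1:3]
print(rows[r_range(4, 10)])  # same rows as R's rows[4:10]
```

With a pandas DataFrame the slice can be passed straight to `iloc`, for example `motor.iloc[r_range(1, 2800)]`.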


With this, we cover the basic use of the _dbnR_ package from Python. This still requires R running in the background, so it is not a Python port by any means. The results are also converted to pandas DataFrames automatically by _rpy2_. In this notebook, the R plots always crashed the Python kernel, so I would advise against them. The visualization of the network doesn't translate into Python either, as was to be expected. Even so, the model can still be learned, used and deployed without issues. The most troublesome part was getting the right R library linked with _rpy2_ by adjusting the environment variables (if you are working on Windows). The only other issues I saw were the plots crashing the kernel and arguments containing the dot character '.' generating errors.
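
For the dot issue, a common Python-side workaround is to pass such arguments through dictionary unpacking, since a keyword like `na.rm` cannot be typed literally in a call. The sketch below shows only the Python mechanics with a stand-in function. `fake_r_function` is hypothetical, and whether a given _dbnR_ function accepts a dotted name this way through _rpy2_ is an assumption to check against the versions in use:

```python
def fake_r_function(*args, **kwargs):
    """Stand-in for an R function wrapper: returns the keyword names it received."""
    return sorted(kwargs)


# A dotted name is not a valid Python identifier, so it cannot be written as a
# keyword, but it can be supplied as a string key through ** unpacking.
dotted_args = {"na.rm": True, "verbose": False}
print(fake_r_function(**dotted_args))
```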