Showcase: calling the Microsoft Cognitive Toolkit (CNTK) 2.0 deep learning library from within R using the reticulate package and an Azure DSVM.
Introduction

On 2017/06/01 Microsoft released version 2.0 of its Microsoft Cognitive Toolkit (CNTK) (check the CNTK release page). The toolkit is very interesting, and one can read more about it in its documentation, especially the chapter Reasons to Switch from TensorFlow to CNTK.

Unfortunately, CNTK does not have R bindings. Fortunately, it has Python bindings. I am a big fan of R and wanted to have CNTK available from this great tool. Below I briefly present an R version of SimpleMNIST.py. This example shows how to apply a multilayer perceptron (MLP) to the (in)famous MNIST dataset of handwritten digits.
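To make the MLP idea concrete, here is a minimal pure-Python sketch of the forward pass such a network computes: dense layers with ReLU activations on the hidden layers and a softmax on the output. The layer sizes and weights are illustrative assumptions, not the values used by SimpleMNIST.py or mnist.R; in the real model the input is a 784-value pixel vector (28x28 image) and the output has 10 classes.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer: one weight row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    # Subtract the max for numerical stability before exponentiating.
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(x, layers):
    """layers: list of (weights, bias); ReLU on hidden layers, softmax on the output."""
    h = x
    for weights, bias in layers[:-1]:
        h = relu(dense(h, weights, bias))
    weights, bias = layers[-1]
    return softmax(dense(h, weights, bias))
```

The predicted digit is the index of the largest output probability. CNTK builds the same structure from its own layer primitives and runs it on the CPU or GPU.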

Preparing environment

I worked on an Azure Data Science Virtual Machine (DSVM), an NV12 instance with a Tesla M60 GPU, running Ubuntu 16.04.2 LTS.

Creating anaconda environment

I like to keep a clean workspace, and it is good practice to create a new environment for each project. I issued the following command:

conda create -n cntk2.0 python=3.5

This gave me a new environment with Python 3.5.

Installing CNTK

I selected the appropriate CNTK wheel from this list. I used the version for Python 3.5 with GPU and 1-bit SGD support ([link]):

source activate cntk2.0
pip install https://cntk.ai/PythonWheel/GPU-1bit-SGD/cntk-2.0-cp35-cp35m-linux_x86_64.whl
source deactivate cntk2.0

After these commands, CNTK 2.0 was installed in the Anaconda environment called cntk2.0.
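A quick way to confirm the installation is to run a small check with the environment's own interpreter (after `source activate cntk2.0`). The sketch below only uses the standard library and avoids f-strings so it also runs on Python 3.5; the function name `cntk_version` is hypothetical. On the R side, reticulate offers `py_module_available("cntk")` for a similar check.

```python
import importlib
import importlib.util
import sys


def cntk_version():
    """Return cntk.__version__ if cntk is importable by this interpreter, else None."""
    if importlib.util.find_spec("cntk") is None:
        return None  # cntk is not installed in the active environment
    cntk = importlib.import_module("cntk")
    return getattr(cntk, "__version__", None)


if __name__ == "__main__":
    print("Python %d.%d" % sys.version_info[:2])
    print("cntk", cntk_version())
```

Inside the cntk2.0 environment this should report Python 3.5 and a 2.0 version string; `None` means the wheel went into a different interpreter.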

Installing R packages

I used:

  • R version 3.3.2, which ships with the Azure DSVM.
  • R Suite version 0.9-211.

To install the dependencies and build the cntkR package, run the following commands:

rsuite proj depsinst
rsuite proj build

Downloading datasets

From the CNTK GitHub examples repository I copied two Python scripts that download and prepare the datasets. To run them, issue:

cd python
python install_mnist.py
cd ..

Afterwards, the data folder should contain two files: Test-28x28_cntk_text.txt and Train-28x28_cntk_text.txt.
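Both files use CNTK's text format. Assuming each line has the shape `|labels <10 one-hot values> |features <784 pixel values>` (the layout the MNIST download script is expected to write; verify against your files), a small reader like the hypothetical one below can sanity-check the downloaded data before training.

```python
def parse_ctf_line(line):
    """Split one CNTK text-format line into {stream_name: [float, ...]}."""
    streams = {}
    # Each stream starts with '|'; anything before the first '|' (e.g. a sequence id) is ignored.
    for chunk in line.strip().split("|")[1:]:
        parts = chunk.split()
        if parts:
            streams[parts[0]] = [float(v) for v in parts[1:]]
    return streams


def read_mnist_ctf(path, label_dim=10, feature_dim=784):
    """Yield (digit, pixels) pairs from a file such as data/Train-28x28_cntk_text.txt."""
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            streams = parse_ctf_line(line)
            labels, pixels = streams["labels"], streams["features"]
            if len(labels) != label_dim or len(pixels) != feature_dim:
                raise ValueError("line %d: unexpected dimensions" % lineno)
            # The label is one-hot encoded, so the digit is the position of the 1.
            yield labels.index(1.0), pixels
```

For example, `sum(1 for _ in read_mnist_ctf("data/Train-28x28_cntk_text.txt"))` should count the 60 000 training images.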

Running CNTK from within R

Script mnist.R contains my version of SimpleMNIST.py. Before running it, modify line 2 of config_templ.txt so that it points to your Python installation. In my version this line looks like this:

python_path: ~/.conda/envs/cntk2.0/bin

This path should work for you if you follow this post on an Azure DSVM.
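How mnist.R reads config_templ.txt is not shown in this README, so the following is only a hypothetical illustration, assuming simple `key: value` lines. It turns the `python_path` entry into the full path of a Python binary, which is the kind of value reticulate's `use_python()` expects on the R side.

```python
import os


def read_config(text):
    """Parse 'key: value' lines into a dict; blank lines and '#' comments are skipped."""
    cfg = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)  # split on the first colon only
        cfg[key.strip()] = value.strip()
    return cfg


def python_binary(cfg):
    """Expand '~' and append the interpreter name to the configured bin directory."""
    return os.path.join(os.path.expanduser(cfg["python_path"]), "python")
```

With the line shown above, `python_binary(read_config(...))` yields something like `/home/<user>/.conda/envs/cntk2.0/bin/python`.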

Finally, run the R script with:

Rscript R/mnist.R

Below you can see the fitting process in progress: one epoch (60 000 samples) takes around 0.7-1.0 s.

cntk_R_console.PNG
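One epoch is one pass over the 60 000 MNIST training images. The sketch below converts the epoch timing above into the number of minibatch updates and a throughput figure, assuming a minibatch size of 64 (an assumption here; check the value used in mnist.R).

```python
import math

MNIST_TRAIN_SAMPLES = 60000  # size of the MNIST training set


def minibatches_per_epoch(num_samples, minibatch_size):
    """Parameter updates needed to see every sample once (the last minibatch may be partial)."""
    return math.ceil(num_samples / minibatch_size)


def samples_per_second(num_samples, epoch_seconds):
    return num_samples / epoch_seconds
```

With these numbers one epoch is 938 updates, and 0.7-1.0 s per epoch corresponds to roughly 60 000-86 000 samples per second.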

And here is the output of nvidia-smi taken during the fitting process:

cntk_R_nvidia_smi.PNG

It takes around 23 seconds in total to build and train the MLP network and score it on the test dataset. The model reported a 2.3% classification error, which is not bad, although accuracy was not the main point of this showcase.
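Classification error is the fraction of samples whose highest-scoring class differs from the true label; CNTK's `classification_error` metric in SimpleMNIST.py computes this quantity. On the 10 000-image MNIST test set, 2.3% corresponds to about 230 misclassified digits. A minimal stand-alone version of the metric, for illustration:

```python
def argmax(scores):
    """Index of the largest score."""
    return max(range(len(scores)), key=scores.__getitem__)


def classification_error(score_rows, true_labels):
    """Fraction of samples whose predicted class (argmax) differs from the true label."""
    if len(score_rows) != len(true_labels):
        raise ValueError("length mismatch")
    wrong = sum(1 for s, y in zip(score_rows, true_labels) if argmax(s) != y)
    return wrong / len(true_labels)
```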

Power of GPU

Script mnist.R accepts a device parameter that can take two values:

  • cpu - run computation on the CPU (default)
  • gpu - run computation on the GPU (id = 0)
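How mnist.R validates this argument is not shown here; the hypothetical helper below sketches one way to map the two allowed values to a device choice. The commented lines show how the result could be handed to CNTK's Python device API (`cntk.device.cpu()`, `cntk.device.gpu(id)`, `cntk.device.try_set_default_device`); they are not executed in this sketch.

```python
def parse_device(arg="cpu"):
    """Hypothetical helper: map 'cpu'/'gpu' to (kind, device_id)."""
    value = arg.strip().lower()
    if value == "cpu":
        return ("cpu", None)
    if value == "gpu":
        return ("gpu", 0)  # first GPU, as in the list above
    raise ValueError("device must be 'cpu' or 'gpu', got %r" % arg)

# With CNTK installed, the tuple could select the default device (not run here):
#   import cntk
#   kind, dev_id = parse_device("gpu")
#   device = cntk.device.gpu(dev_id) if kind == "gpu" else cntk.device.cpu()
#   cntk.device.try_set_default_device(device)
```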

On my machine the GPU was a Tesla M60. When you switch to cpu() you will notice around a 30x slowdown!
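At a 30x slowdown, a 0.7-1.0 s GPU epoch becomes roughly 21-30 s on the CPU. To reproduce such a comparison, one can time the same work on both devices; the sketch below is a generic wall-clock helper, where the training function you pass in (for example a hypothetical `train_one_epoch`) is up to you.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def slowdown(cpu_seconds, gpu_seconds):
    """How many times slower the CPU run was than the GPU run."""
    return cpu_seconds / gpu_seconds
```

Timing a single epoch can be noisy (the first one often includes setup costs), so averaging over a few epochs gives a fairer ratio.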

An even larger performance gain can be seen with the mnist_conv.R script, which implements a more compute-intensive convolutional network (see ConvNet_MNIST.py for the Python version). Below is the output of running this model on the GPU: one epoch (60 000 samples) takes around 3.7 s. Running this model on the CPU is hopelessly slow.

cntk_R_console_2.PNG
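A rough way to see why the convolutional network is heavier is to count multiply-accumulate operations (MACs) per sample. The layer sizes used below are illustrative assumptions, not the exact architecture of ConvNet_MNIST.py or mnist_conv.R.

```python
def dense_macs(n_in, n_out):
    """Multiply-accumulates for a fully connected layer (per sample, bias ignored)."""
    return n_in * n_out


def conv2d_macs(out_h, out_w, in_channels, out_channels, kernel):
    """Multiply-accumulates for a 2-D convolution with a square kernel (per sample, bias ignored)."""
    return out_h * out_w * out_channels * in_channels * kernel * kernel
```

For example, a hypothetical 784-to-400 dense layer costs 313 600 MACs, while a 5x5 convolution from 1 to 32 channels on a 28x28 output already costs 627 200, and a second 5x5 convolution from 32 to 48 channels on a 14x14 output costs 7 526 400. This extra arithmetic is exactly the kind of work a GPU parallelizes well.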

Conclusion

I have been using R for large-scale analytical solutions for 12 years. It is a great analytics-oriented glue language that gives me access to powerful tools like H2O.ai or Apache Spark. Unfortunately, sometimes an R API is missing. Fortunately, if there is a Python API, we can use the great reticulate package from RStudio. The package worked smoothly with CNTK and Python 3.5. I was really astonished, and I am happy I can now benefit from both the R and Python toolboxes.