# Intel BigDL with DSX on Cloud in Python

Users can install and use a recent version of BigDL themselves.
For Python notebooks, this requires two steps, which are explained in detail below:
1. Install a JAR file into the default classpath.
2. Install the Python package.

Instructions below are for notebooks running in DSX on Cloud, backed by Apache Spark as a Service.  
These instructions have also been [posted on Stack Overflow](https://stackoverflow.com/a/47282619/5629418).

## Preparation

You need to select a matching combination of Python, Spark, and BigDL.
The notebook kernel in DSX on Cloud determines Python and Spark versions.
This notebook uses Python 3.5 with Spark 2.1.
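
To confirm which versions the current kernel provides, a small check like the following can help. It is only a sketch: the helper name `python_major_minor` is made up for illustration, and the code assumes that the kernel predefines a SparkContext named `sc`. If no such variable exists, it reports that instead of failing.

```python
import sys


def python_major_minor():
    """Return the running interpreter's version as 'major.minor', e.g. '3.5'."""
    return "{}.{}".format(sys.version_info.major, sys.version_info.minor)


print("Python:", python_major_minor())

# Assumption: the notebook kernel predefines a SparkContext named `sc`.
spark_context = globals().get("sc")
if spark_context is not None:
    print("Spark:", spark_context.version)
else:
    print("Spark: no SparkContext named 'sc' found in this session")
```
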

Check the BigDL [download page](https://bigdl-project.github.io/master/#release-download/)
for a recent release that supports your chosen version of Spark. The fix level of Spark (the third component of its version number) does not matter.
At the time of writing, release 0.3.0 is the newest, and it does support Spark 2.1.
**Note:** Do not download the package; just pick the BigDL version you want to use.

Installation with Python 2.7 works exactly as it does with Python 3.5.
If you want to switch Python versions during development,
you have to install the Python package twice, once for each version.
Switching between Spark versions will not work, because all Spark versions share the same classpath,
and you can put only one BigDL JAR there.
  

## Cleanup

Make sure that BigDL isn't installed yet. If it is, delete it before installing a new version.

In [None]:
# JAR files... if the output is empty, none are in the way
!find ~/data/libs -name bigdl-\*
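
The same check can be sketched in plain Python, which is convenient if you want to act on the result in later cells. The helper name `find_bigdl_jars` is hypothetical; like the `find` command above, it searches the directory recursively for names starting with `bigdl-`.

```python
from pathlib import Path


def find_bigdl_jars(lib_dir):
    """Return all paths below lib_dir whose name starts with 'bigdl-', sorted."""
    root = Path(lib_dir).expanduser()
    if not root.is_dir():
        # A missing directory simply means nothing is in the way.
        return []
    return sorted(root.rglob("bigdl-*"))


# An empty result means no BigDL JAR files are in the way.
for path in find_bigdl_jars("~/data/libs"):
    print(path)
```
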

In [None]:
# Remove BigDL JAR files...
!find ~/data/libs -name bigdl-\* -exec rm -vf {} +

In [None]:
# Python package... if the output is empty, it's not installed.
!pip freeze | grep -i BigDL
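
As a complementary check, you can ask Python whether a `bigdl` package is importable at all. This is a sketch with a hypothetical helper name, `is_installed`. Note that it tests importability rather than what `pip` has recorded, so the two checks can disagree, for example in a kernel that has already imported the package.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package with this name can be located for import."""
    return importlib.util.find_spec(package_name) is not None


print("bigdl importable:", is_installed("bigdl"))
```
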

In [None]:
# Remove the user-installed BigDL Python package...
# ${_py_version_} is a placeholder for your Python version: replace it with 3.5 or 2.7
!rm -rf ~/.local/lib/python${_py_version_}/site-packages/{bigdl,BigDL}*

# Remove possible leftovers from a recent installation attempt...
# https://stackoverflow.com/q/47179822
!rm -rf $PIP_BUILD

If you had to clean up an old installation of BigDL, restart the kernel now.

## Installation

You need a JAR file with dependencies. This can be downloaded from a Maven repository. The URL depends on the versions of Spark and BigDL. Each version appears twice, once in the path and once in the filename. For example, the download link for Spark 2.1 and BigDL 0.3.0 is

    https://repo1.maven.org/maven2/com/intel/analytics/bigdl/bigdl-SPARK_2.1/0.3.0/bigdl-SPARK_2.1-0.3.0-jar-with-dependencies.jar

Put the JAR in `~/data/libs/`, which is on the default classpath, and it will be found.

In [None]:
# Modify the versions of Spark (sv) and BigDL (bv) as required; the URL adjusts automatically.
!(export sv=2.1 bv=0.3.0 ; cd ~/data/libs/ && wget  https://repo1.maven.org/maven2/com/intel/analytics/bigdl/bigdl-SPARK_${sv}/${bv}/bigdl-SPARK_${sv}-${bv}-jar-with-dependencies.jar)
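
The URL pattern described above can also be written as a small Python function, which makes it easy to see where each version appears. The names `MAVEN_BASE` and `bigdl_jar_url` are made up for this sketch; the pattern itself is taken from the example link for Spark 2.1 and BigDL 0.3.0.

```python
MAVEN_BASE = "https://repo1.maven.org/maven2/com/intel/analytics/bigdl"


def bigdl_jar_url(spark_version, bigdl_version):
    """Build the download URL of the BigDL JAR with dependencies."""
    # The Spark version is part of the artifact name...
    artifact = "bigdl-SPARK_{}".format(spark_version)
    # ...and both versions appear again in the filename.
    filename = "{}-{}-jar-with-dependencies.jar".format(artifact, bigdl_version)
    return "/".join([MAVEN_BASE, artifact, bigdl_version, filename])


print(bigdl_jar_url("2.1", "0.3.0"))
```
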

You need the Python package that matches the BigDL version of the JAR. It can be installed from PyPI with `pip`.
The `--user` flag is automatically provided by the environment.

In [None]:
!pip install bigdl==0.3.0 | cat

Restart the notebook kernel and BigDL is ready for use.

## Use BigDL

When importing bigdl, some output about paths is expected.
A warning about SPARK_HOME and pyspark is safe to ignore.

The example below is adapted from the "Forward and backward" tutorial at:  
https://github.com/intel-analytics/BigDL-Tutorials



In [1]:
# https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/neural_networks/forward_and_backward.ipynb
from bigdl.nn.layer import *
from bigdl.nn.criterion import *
import numpy as np

Prepending /gpfs/fs01/user/s292-2479f4c3f28697-e04c173fe4bd/.local/lib/python3.5/site-packages/bigdl/share/conf/spark-bigdl.conf to sys.path




In [2]:
linear = Linear(2, 1)
print(linear.parameters())

creating: createLinear
{'Linear2023adc': {'gradBias': array([0.], dtype=float32), 'weight': array([[-0.33976966,  0.10730403]], dtype=float32), 'gradWeight': array([[0., 0.]], dtype=float32), 'bias': array([-0.3070345], dtype=float32)}}


In [3]:
input = np.array([1, -2])
# forward pass: compute the layer's output for this input
output = linear.forward(input)
print(output)

[-0.8614122]
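
The forward result can be reproduced with plain NumPy, which shows what the linear layer computes: the weight matrix times the input, plus the bias. This sketch copies the weight and bias from the `parameters()` output above. They are randomly initialized, so your own run will show different numbers.

```python
import numpy as np

# Values copied from the parameters() output above; they differ on every run.
weight = np.array([[-0.33976966, 0.10730403]], dtype=np.float32)
bias = np.array([-0.3070345], dtype=np.float32)
x = np.array([1, -2], dtype=np.float32)

# A linear layer computes weight . x + bias
expected = weight.dot(x) + bias
print(expected)
```
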


In [4]:
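# mean absolute error
mae = AbsCriterion()
target = np.array([0])

# forward pass of the criterion: compute the loss
loss = mae.forward(output, target)
print("loss: " + str(loss))

# backward pass: gradient of the loss with respect to the output,
# then propagate it through the linear layer
grad_output = mae.backward(output, target)
linear.backward(input, grad_output)

print(linear.parameters())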

creating: createAbsCriterion
loss: 0.8614122
{'Linear2023adc': {'gradBias': array([-1.], dtype=float32), 'weight': array([[-0.33976966,  0.10730403]], dtype=float32), 'gradWeight': array([[-1.,  2.]], dtype=float32), 'bias': array([-0.3070345], dtype=float32)}}
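
The gradients shown above can be checked by hand. The NumPy sketch below assumes that the criterion averages the absolute error over all elements, which makes no difference here because the output has a single element. The gradient of the loss with respect to the output is then the sign of `output - target`, the weight gradient is that value multiplied by the input, and the bias gradient is that value itself.

```python
import numpy as np

x = np.array([1, -2], dtype=np.float32)
output = np.array([-0.8614122], dtype=np.float32)  # copied from the forward pass above
target = np.array([0], dtype=np.float32)

# Mean absolute error and its gradient with respect to the output
loss = np.mean(np.abs(output - target))
grad_output = np.sign(output - target) / output.size

# Gradients of a linear layer: outer product with the input for the weight,
# and the incoming gradient itself for the bias
grad_weight = np.outer(grad_output, x)
grad_bias = grad_output

print("loss:", loss)
print("gradWeight:", grad_weight)
print("gradBias:", grad_bias)
```
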
