### Getting started with SystemML

Exercise objectives:
- Load the SystemML library
- Run basic SystemML code

Load `SystemML.jar` from the sparktc.ibmcloud.com site.


In [1]:
%AddJar https://sparktc.ibmcloud.com/repo/latest/SystemML.jar

Starting download from https://sparktc.ibmcloud.com/repo/latest/SystemML.jar
Finished download of SystemML.jar


Import the MLContext from the SystemML API. MLContext is the entry point for interacting with SystemML. Lesson 2 will explain MLContext in more detail. For now, you will just import it and use its API.

In [2]:
import org.apache.sysml.api.MLContext

Import the SQLContext to use some of its capabilities. SQLContext comes from the Apache Spark library.

In [3]:
import org.apache.spark.sql.SQLContext

Now, create the SQLContext from the SparkContext, which is initialized by default for you in this notebook environment as the variable `sc`. Pass the `sc` variable into the SQLContext constructor to create the `sqlCtx` variable.

In [4]:
val sqlCtx = new SQLContext(sc)

Now do the same for SystemML: create the MLContext variable `ml` from the SparkContext.

In [5]:
val ml = new MLContext(sc)

This is where we start to get into the SystemML work. Let's create a variable called `dml`. In the machine learning space, DML is an acronym for Declarative Machine Learning. DML will be covered in greater detail in a future lesson, but the idea is that a DML script holds the actual code for an algorithm. You can then use SystemML to run DML scripts, which is what we're going to do here.

In [6]:
val dml = """
X = rand(rows=100, cols=10)
sumX = sum(X)
outMatrix = matrix(sumX, rows=1, cols=1)
write(outMatrix, " ", format="csv")
"""

So there's really no magic here. The first line creates a matrix of random values with 100 rows and 10 columns. The second line adds up all the values in X, across every row and column, into a single number. The third line converts that sum into a matrix of 1 row and 1 column. Finally, the fourth line writes that matrix out in CSV format.
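
To make the walkthrough above concrete, here is a minimal plain-Scala sketch of the same computation, without SystemML. It assumes that `rand` draws uniform values between 0 and 1 when no range is given. The names `randMatrix`, `sumAll`, and `localOut` are hypothetical helpers introduced only for this illustration; they are not part of the SystemML API.

```scala
import scala.util.Random

// Hypothetical stand-in for DML's rand(rows=..., cols=...).
// Assumption: values are uniform in [0, 1).
def randMatrix(rows: Int, cols: Int, rng: Random): Array[Array[Double]] =
  Array.fill(rows, cols)(rng.nextDouble())

// Counterpart of DML's sum(X): adds every cell of the matrix.
def sumAll(x: Array[Array[Double]]): Double = x.map(_.sum).sum

val rng = new Random(42L)            // fixed seed so reruns give the same value
val X = randMatrix(100, 10, rng)     // 100 rows, 10 columns
val sumX = sumAll(X)                 // a single scalar
val localOut = Array(Array(sumX))    // 1x1 matrix wrapping the scalar

println(f"sum of ${X.length * X.head.length} cells = $sumX%.4f")
```

The point of the sketch is only that `sum(X)` reduces the whole matrix to one number, which is why the script has to wrap it back into a 1x1 matrix before writing it out.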

Register the output variable.

In [7]:
ml.reset()
ml.registerOutput("outMatrix")

Execute the script

In [8]:
val out = ml.executeScript(dml)

Get `outMatrix` as a DataFrame

In [9]:
val outMatrix = out.getDF(sqlCtx, "outMatrix")

Print the matrix

In [10]:
outMatrix.show

+-------+-----------------+
|__INDEX|               C1|
+-------+-----------------+
|    1.0|515.3344684572643|
+-------+-----------------+



This tells us that the sum of all values in the randomly generated matrix is the value in the C1 column.
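
As a quick sanity check on the sum printed above, the following sketch compares it with what you would expect from 1,000 random values. It assumes, without confirmation from this exercise, that `rand` draws independent uniform values between 0 and 1; under that assumption each cell has mean 0.5 and variance 1/12. The variable names are illustrative only.

```scala
// Assumption: each of the 100 x 10 cells is an independent uniform draw on [0, 1).
val cells = 100 * 10
val expectedSum = cells * 0.5            // mean of one cell is 0.5
val stdDev = math.sqrt(cells / 12.0)     // variance of one cell is 1/12

// Value copied from the C1 column in the output above.
val observed = 515.3344684572643
val z = (observed - expectedSum) / stdDev

println(f"expected = $expectedSum%.1f, std dev = $stdDev%.2f, z = $z%.2f")
```

If the assumption holds, a sum near 500 is what you should see, and the printed value is within about two standard deviations of that. Your own run will print a different number, because the matrix is regenerated each time.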

This concludes the exercise. You loaded the SystemML library and ran some basic SystemML code. Later exercises build on these steps.

Then, click on "Import note" to import by URL, and provide this URL: https://raw.githubusercontent.com/apache/incubator-systemml/master/samples/zeppelin-notebooks/2BCHR4T1Q/note.json

<img src="https://ibm.box.com/shared/static/26zavdwas97lyb94pmivkjhdzl00rzgl.png" width="367" height="333" align="left" />

Once the notebook has been imported, open it by clicking on NYC-311 in the list of recent notebooks in Zeppelin. That notebook is based on a very large dataset (a 6.7 GB CSV file). I recommend changing the dataset path in the notebook to a subset of the data that I've hosted here: https://ibm.box.com/shared/static/j8xx1smlz0t49mzlev6ue6xjjcp4c5tc.csv