# How to create a real-time web service for a Spark model on Azure

Before running this tutorial, you must configure your Data Science Virtual Machine (DSVM) as described in the README of the [Deploying Spark ML Models on Azure (Preview)](https://github.com/Azure/Spark-Operationalization-On-Azure) GitHub repo. If you configured your DSVM previously, check the repo to make sure you are following the most recent instructions.

In this tutorial, you load a dataset, explore its features, train a model
on the dataset, and then publish a real-time scoring API for the model.

First, read in the Boston Housing Price dataset. This dataset is publicly available at https://archive.ics.uci.edu/ml/datasets/Housing. We have placed a copy in your ```azureml/datasets``` folder.

```python
# Read in the housing price dataset
df2 = spark.read.csv("../datasets/housing.csv", header=True, inferSchema=True)
df2.show()
df2.printSchema()
```

## Train your model

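Using Spark's ML library, you can train a gradient-boosted tree regressor on this data to produce a model that predicts the median value of houses in Boston (the ```MEDV``` column). Once you have trained the model, you can evaluate its quality with metrics such as root mean squared error (RMSE) and the coefficient of determination (R²). Because the evaluation cell below scores the same data the model was trained on, these numbers are optimistic; a held-out test set gives a more realistic estimate.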

```python
# Train a gradient-boosted tree regressor
from pyspark.ml.feature import RFormula
from pyspark.ml.regression import GBTRegressor
from pyspark.ml.pipeline import Pipeline

# Use MEDV as the label and all other columns as features
formula = RFormula(formula="MEDV~.")
gbt = GBTRegressor()
# Fitting the Pipeline returns a PipelineModel
pipeline = Pipeline(stages=[formula, gbt]).fit(df2)
```

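```python
# Evaluate the model on the training data
from pyspark.ml.evaluation import RegressionEvaluator
scores = pipeline.transform(df2)
rmse = RegressionEvaluator(metricName="rmse").evaluate(scores)
r2 = RegressionEvaluator(metricName="r2").evaluate(scores)
print("RMSE = {:.3f}".format(rmse))
print("R^2 = {:.3f}".format(r2))
```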

### Save your model and schema

Once you have a model that performs well, you can package it into a scoring service. To prepare for this, save your model and dataset schema locally first.

```python
# Save the fitted pipeline model to the local folder housing.model
pipeline.write().overwrite().save("housing.model")
print("Model saved")
```

```python
# Save the schema of the training DataFrame for use by the web service
from azuremlcli import azuremlutilities
azuremlutilities.saveSchema(df2, "webserviceschema.json")
```

## Author a real-time web service

In this section, you author a real-time web service that uses the model you saved above to score new inputs.

### Define ```init``` and ```run```

Start by defining your ```init``` and ```run``` functions in the cell below. 

The ```init``` function initializes the web service, loading in any data or models that it needs to score your inputs. In the example below, it loads in the trained model and the schema of your dataset.

The ```run``` function defines what is executed on each scoring call. In this simple example, the service parses the JSON input, loads it into a Spark DataFrame, runs the pipeline on it, and returns the prediction for the first row.

```python
#%%save_file -f testing.py
# Prepare the web service definition by authoring
# init() and run() functions. Once tested, remove the
# leading # from the magic on the first line to save
# the cell to a file.
def init():
    from pyspark.ml import PipelineModel
    from azuremlcli import azuremlutilities

    # Load the trained model
    global pipeline
    pipeline = PipelineModel.load("housing.model")

    # Load the schema of the input data
    global inputSchema
    inputSchema = azuremlutilities.loadSchema("webserviceschema.json")

def run(inputString):
    import json

    # Parse the JSON input into a list of rows
    rows = json.loads(inputString)
    # sc and spark are the SparkContext and SparkSession provided by the environment
    inputRDD = sc.parallelize(rows)
    inputDF = spark.createDataFrame(inputRDD, inputSchema, samplingRatio=None, verifySchema=False)
    score = pipeline.transform(inputDF)
    # Return the prediction for the first input row
    return score.collect()[0]['prediction']
```
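Splitting the service into ```init``` and ```run``` means expensive setup, such as loading the model, happens once rather than on every request. The Spark-free sketch below shows the same contract with a hypothetical ```StubModel``` standing in for the ```PipelineModel```. It illustrates the pattern only, not how the Azure ML runtime invokes your functions; the functions are named ```init_sketch``` and ```run_sketch``` so that running the sketch in this notebook does not overwrite your real ```init``` and ```run```.

```python
import json

class StubModel(object):
    """Hypothetical stand-in for the trained PipelineModel.

    It predicts a weighted sum of the input values; the real service
    loads housing.model with Spark instead.
    """
    def __init__(self, weights):
        self.weights = weights

    def predict(self, row):
        return sum(w * x for w, x in zip(self.weights, row))

stub_model = None  # populated once by init_sketch()

def init_sketch():
    # Runs once at startup: load expensive resources (models, schemas) here
    global stub_model
    stub_model = StubModel([0.5, 2.0])

def run_sketch(input_string):
    # Runs on every scoring call: parse the JSON rows and, like the
    # notebook's run(), return the prediction for the first row only
    rows = json.loads(input_string)
    return stub_model.predict(rows[0])

init_sketch()
print(run_sketch('[[2.0, 1.5]]'))  # 0.5 * 2.0 + 2.0 * 1.5 = 4.0
```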


### Test ```init``` and ```run```

Before publishing the web service, you can test the ```init``` and ```run``` functions in the notebook by running the following cell.

```python
# Load the model and schema, then score one sample row
init()
run('[[0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,4.98,24.0]]')
```
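Typing the input as a bare JSON array is error-prone, because ```run``` builds the DataFrame by position: each row must list its values in the same order as the columns in ```webserviceschema.json```. A small helper such as the hypothetical ```make_payload``` below builds the string from named values instead. The three column names in the example are placeholders; in the notebook, ```df2.columns``` gives the column order of the DataFrame whose schema you saved.

```python
import json

def make_payload(records, columns):
    """Serialize records (dicts keyed by column name) as a JSON array of rows.

    `columns` must list the column names in the same order as the saved
    schema, because run() maps values to columns by position.
    """
    rows = []
    for record in records:
        missing = [c for c in columns if c not in record]
        if missing:
            raise KeyError("missing columns: " + ", ".join(missing))
        rows.append([record[c] for c in columns])
    return json.dumps(rows)

# Hypothetical three-column schema, for illustration only
print(make_payload([{"CRIM": 0.00632, "ZN": 18.0, "MEDV": 24.0}],
                   ["CRIM", "ZN", "MEDV"]))
# [[0.00632, 18.0, 24.0]]
```

In the notebook, you could then call ```run(make_payload(records, df2.columns))```.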

### Create a script that defines the web service

Your goal is to create an endpoint that you can call to make predictions based on input data. To create a web service from the model you saved, start by authoring a script that does the scoring.

In the script, you identify the input parameters you want your web service to consume and the outputs it should produce.

Go back to the cell where you defined your ```init``` and ```run``` functions, uncomment the magic in the first line (```#%%save_file -f testing.py```), and run the cell again. This saves the contents of the cell to a local file with the name supplied to the ```-f``` argument.
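Because the CLI deploys ```testing.py``` as a standalone file, it can be worth confirming that the saved file parses and still defines both functions before you deploy. The sketch below does this with Python's standard ```ast``` module; the helper names are hypothetical, and passing the check does not guarantee the script runs, since names such as ```sc``` and ```spark``` are only resolved at run time.

```python
import ast

def missing_functions(source, required=("init", "run")):
    """Return the required top-level functions that `source` does not define."""
    tree = ast.parse(source)  # raises SyntaxError if the code is not valid Python
    defined = {node.name for node in tree.body if isinstance(node, ast.FunctionDef)}
    return [name for name in required if name not in defined]

def check_scoring_script(path):
    """Raise ValueError unless the file at `path` defines init() and run()."""
    with open(path) as f:
        missing = missing_functions(f.read())
    if missing:
        raise ValueError("{} is missing: {}".format(path, ", ".join(missing)))

print(missing_functions("def init():\n    pass\n"))  # ['run']
```

After saving the cell, ```check_scoring_script("testing.py")``` returns silently if both functions are present.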


### Use the CLI to deploy and manage your web services

SSH into the DSVM and run the following commands to deploy your service locally.

Set the environment variables that you generated when you set up your DSVM, either from the command line or by running a script.

Change to the ```azureml/realtime``` folder, which contains the realtime notebook:

```
cd ~/notebooks/azureml/realtime
```

Next, run the following commands to create the web service:

```
aml env local
aml service create realtime -f testing.py -m housing.model -s webserviceschema.json -n mytestapp
```

To create and run the web service on the Azure Container Service (ACS) cluster instead, switch to cluster mode and rerun the service creation command:

```
aml env cluster
aml service create realtime -f testing.py -m housing.model -s webserviceschema.json -n mytestapp
```
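Once deployed, the service exposes an endpoint that clients call over HTTP. As a hedged sketch, assuming the service accepts a POST whose raw body is the same JSON array of rows that ```run``` receives, a client request could be built with the standard library as below. The request format and the helper names are assumptions, not documented behavior of the ```aml``` CLI; confirm the actual URL and input format for your service before relying on this.

```python
import json

try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2

def build_scoring_request(url, rows):
    """Build (but do not send) a POST request carrying `rows` as a JSON array.

    ASSUMPTION: the service accepts the same JSON string that run() receives
    as the raw request body. Confirm the real URL and request format for
    your deployed service.
    """
    body = json.dumps(rows).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"})

def call_service(url, rows):
    """Send the scoring request and return the decoded response body."""
    response = urlopen(build_scoring_request(url, rows))
    return response.read().decode("utf-8")
```

Keeping request construction separate from sending makes it possible to inspect the payload without a running service.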

