# How to create a batch web service for a Spark model on Azure

Before running the tutorial, you must configure your DSVM as specified in the README on the [Machine Learning Operationalization](https://aka.ms/o16ncli) GitHub repo. If you configured your DSVM previously, check the GitHub repo to make sure that you are following the most recent instructions.


In this tutorial, you will use [Apache Spark](http://spark.apache.org/) to create a model that uses a logistic regression learner to predict food inspection results. To do this, you will call the Spark Python API ([PySpark](http://spark.apache.org/docs/0.9.0/python-programming-guide.html)) to load a dataset, train a model on that dataset, and publish a batch scoring API for the model.

## Load the data

The tutorial uses the *Food Inspections Data Set*, which contains the results of food inspections conducted in Chicago. To facilitate this tutorial, we have placed a copy of the data in the ```azureml/datasets``` folder. The original dataset is available from the [City of Chicago data portal](https://data.cityofchicago.org/Health-Human-Services/Food-Inspections/4ijn-s7e5).

```
### Import the relevant PySpark bindings
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import HashingTF, Tokenizer
from pyspark.sql.functions import UserDefinedFunction
from pyspark.sql.types import DoubleType, IntegerType, StringType, StructField, StructType
```

### Parse the food inspections dataset and create numerical labels for training

```
# Read the raw CSV as strings and drop malformed rows
inspections = spark.read.csv("../datasets/food_inspections1.csv", mode='DROPMALFORMED', inferSchema=False)

schema = StructType([StructField("id", IntegerType(), False),
                     StructField("name", StringType(), False),
                     StructField("results", StringType(), False),
                     StructField("violations", StringType(), True)])

# Keep the id, name, results (column 12), and violations (column 13) fields
df = spark.createDataFrame(
    inspections.rdd.map(lambda l: (int(l[0]), l[1], l[12], l[13] if l[13] else '')),
    schema)
df.createOrReplaceTempView('CountResults')

# Map each inspection result to a numerical label; -1.0 marks rows to discard
def labelForResults(s):
    if s == 'Fail':
        return 0.0
    elif s == 'Pass w/ Conditions' or s == 'Pass':
        return 1.0
    else:
        return -1.0

label = UserDefinedFunction(labelForResults, DoubleType())
labeledData = df.select(label(df.results).alias('label'), df.violations).where('label >= 0')
labeledData.write.format('parquet').mode('overwrite').save('foo')
```
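The labeling rule can be checked without a Spark session. The following is a minimal plain-Python sketch of the same mapping and of the `label >= 0` filter; the function name `label_for_results` and the sample values are illustrative and are not part of the tutorial code.

```python
def label_for_results(result):
    """Return 0.0 for a failed inspection, 1.0 for a pass, and -1.0 otherwise."""
    if result == "Fail":
        return 0.0
    if result in ("Pass", "Pass w/ Conditions"):
        return 1.0
    return -1.0


# Illustrative sample values; the last one stands in for any other outcome.
samples = ["Pass", "Fail", "Pass w/ Conditions", "Out of Business"]
labels = [label_for_results(s) for s in samples]

# Mirror the where('label >= 0') filter: rows labeled -1.0 are dropped.
kept = [(s, l) for s, l in zip(samples, labels) if l >= 0]
print(kept)
```

Only rows with a clear pass or fail outcome survive the filter, so the model is trained as a binary classifier.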

## Create and save the model
Next, you train a logistic regression model to predict inspection results. The following code tokenizes each "violations" string to get the individual words in that string. It then uses `HashingTF` to convert each set of tokens into a feature vector, which is passed to the logistic regression algorithm to construct a model.

```
tokenizer = Tokenizer(inputCol="violations", outputCol="words")
hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
lr = LogisticRegression(maxIter=10, regParam=0.01)
pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])

model = pipeline.fit(labeledData)
```
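To make the feature step more concrete, here is a plain-Python sketch of the hashing trick that a hashing term-frequency transformer is built on. It is an illustration under stated assumptions, not Spark's implementation: `zlib.crc32` is a stand-in hash chosen for determinism, and `num_features=16` is far smaller than what you would use in practice. The helper names `tokenize` and `hashing_tf` are hypothetical.

```python
import zlib
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def hashing_tf(tokens, num_features=16):
    """Map tokens to a sparse {index: count} vector using the hashing trick."""
    counts = Counter()
    for token in tokens:
        # crc32 is a stand-in hash used for illustration only;
        # it is not the hash function Spark uses.
        index = zlib.crc32(token.encode("utf-8")) % num_features
        counts[index] += 1
    return dict(counts)


tokens = tokenize("Food not protected food")
vector = hashing_tf(tokens)
print(tokens)
print(vector)
```

Because the index is computed directly from the token, no vocabulary has to be built or stored. The trade-off is that distinct tokens can collide in the same index, which becomes less likely as the number of features grows.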

Finally, you save the model to use when deploying the web service.

```
model.write().overwrite().save("food_inspection.model")
print("Model saved")
```

## Create a batch web service

In this section, you will create and deploy a batch web service that uses the model you trained to make predictions on the data you supply.

### Create a prediction script 

Your goal is to create a web service that you can call to make predictions based on the input data. To create a web service from the model you saved, start by authoring a script that does the scoring (see the sample script called `batch_score.py` in the same folder).
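As a rough guide to what such a script involves, the sketch below parses three paths, loads the saved pipeline, scores the input, and writes the predictions as Parquet. It is not the sample `batch_score.py` itself. It assumes that the script receives `--trained-model`, `--input-data`, and `--output-data` as ordinary command-line arguments, and that the input CSV has the same column layout as the training file. The function names `parse_args` and `score` are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the model, input, and output paths passed to the script."""
    parser = argparse.ArgumentParser(description="Batch-score food inspections")
    parser.add_argument("--trained-model", required=True,
                        help="path to the saved pipeline model")
    parser.add_argument("--input-data", required=True,
                        help="CSV file to score")
    parser.add_argument("--output-data", required=True,
                        help="destination for the predictions")
    return parser.parse_args(argv)


def score(args):
    """Load the saved pipeline, score the input CSV, and write Parquet output."""
    # Imported here so that argument parsing can be exercised without Spark.
    from pyspark.ml import PipelineModel
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("batch_score").getOrCreate()
    model = PipelineModel.load(args.trained_model)

    # Assumption: same layout as the training CSV, so column 0 is the id,
    # column 1 the name, and column 13 the violations text.
    raw = spark.read.csv(args.input_data, mode="DROPMALFORMED", inferSchema=False)
    data = raw.select(raw["_c0"].cast("int").alias("id"),
                      raw["_c1"].alias("name"),
                      raw["_c13"].alias("violations")).na.fill({"violations": ""})

    predictions = model.transform(data).select("id", "name", "prediction")
    predictions.write.mode("overwrite").parquet(args.output_data)


# When used as a script, the entry point would be: score(parse_args())
```

Keeping the Spark imports inside `score` means the argument handling can be checked on a machine without Spark installed.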

## Use the CLI to deploy and manage your batch web service

### Deploy to the local machine

To create the batch web service locally on the DSVM, set your CLI environment to run in local mode.
```
az ml env local
```

To create the web service, run the following command, replacing `<yourStorageAccount>` and `<containerName>` with your storage account and container names:

```
az ml service create batch -f batch_score.py --in=--trained-model:food_inspection.model --in=--input-data:https://<yourStorageAccount>.blob.core.windows.net/<containerName>/food_inspections2.csv --out=--output-data -v -n samplebatch
```

Once the web service is successfully created, use the following command to run the job. Note that the output file is specified as a wasbs path of the form `wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>`.

```
az ml service run batch --out=--output-data:wasbs://<containerName>@<accountName>.blob.core.windows.net/output.parquet -v -n samplebatch 
```
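The wasbs path format can be easy to mistype, because the container name comes before the account name. As a small illustration, the following helper assembles the path from its parts. The function name `wasbs_url` and the sample container and account names are hypothetical.

```python
def wasbs_url(container, account, path, secure=True):
    """Build a wasb[s]:// path for a blob in an Azure storage container."""
    scheme = "wasbs" if secure else "wasb"
    return "{}://{}@{}.blob.core.windows.net/{}".format(
        scheme, container, account, path.lstrip("/"))


# Hypothetical container and account names, for illustration only.
url = wasbs_url("mycontainer", "myaccount", "output.parquet")
print(url)
```

The resulting string is what you would pass after `--out=--output-data:` in the run command.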

View the list of jobs running against your web service to get the name of the job:

```
az ml service listjobs batch -n samplebatch
```

Use the job name to view the status with the following command:

```
az ml service viewjob batch -n samplebatch -j <paste job name here>
```