# Operationalization (preparation)

Having trained a model on a feature dataset, we are almost ready to create a real-time scoring Web service. The two previous notebooks, Feature Engineering and Model Training, generated three artifacts:
- the Swagger schema of the service (*service_schema.json*)
- the featurizer module (*featurization.py*)
- the fitted Spark pipeline (in other words, the model)

In this final notebook, we create the Azure ML Web service module (*score.py*) that glues these three pieces together.

We will **not** create an actual Web service or any Azure resources here. However, once this notebook completes, all the assets needed for operationalization will be in the $AZUREML_NATIVE_SHARE_DIRECTORY directory.

In [1]:
import os
import json
from pyspark.sql import SparkSession, SQLContext, Row

# Directory shared between notebooks; os.path.join avoids a missing path separator
AZUREML_NATIVE_SHARE_DIRECTORY = os.path.join(os.getenv('AZUREML_NATIVE_SHARE_DIRECTORY'), 'Solution1')

First, let's load the sample data embedded in *service_schema.json*. We will use this small dataset to verify that the Web service works. (Think of it as a unit test!)

In [2]:
spark = SparkSession.builder.getOrCreate()

# Read the Swagger schema produced by the previous notebooks
with open(os.path.join(AZUREML_NATIVE_SHARE_DIRECTORY, 'service_schema.json')) as f:
    schema = json.load(f)

# The schema embeds example input records; turn them into a Spark DataFrame
sample_df = spark.createDataFrame([Row(**x) for x in schema['input']['input_df']['swagger']['example']])

sample_df.printSchema()

root
 |-- ambient_pressure: double (nullable = true)
 |-- ambient_temperature: double (nullable = true)
 |-- pressure: double (nullable = true)
 |-- speed: double (nullable = true)
 |-- temperature: double (nullable = true)
 |-- vibration: array (nullable = true)
 |    |-- element: long (containsNull = true)
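The sample rows sit under `input` → `input_df` → `swagger` → `example` in the schema file. As a sketch, the snippet below walks that same path in plain Python on a small, hand-written schema fragment. The fragment and its values are hypothetical placeholders, not the contents of the real *service_schema.json*, and the helper `example_rows` is introduced here only for illustration.

```python
import json

# Hypothetical, trimmed-down schema: only the nesting used by the notebook is
# reproduced, and the example values are placeholders.
schema_text = """
{
  "input": {
    "input_df": {
      "swagger": {
        "example": [
          {"ambient_pressure": 101.2, "ambient_temperature": 21.0, "pressure": 1500.0,
           "speed": 1200.0, "temperature": 150.0, "vibration": [1, 2, 3]},
          {"ambient_pressure": 100.9, "ambient_temperature": 22.5, "pressure": 1480.0,
           "speed": 1180.0, "temperature": 148.0, "vibration": [4, 5, 6]}
        ]
      }
    }
  }
}
"""

def example_rows(schema):
    """Return the list of example records embedded in a service schema dict."""
    return schema['input']['input_df']['swagger']['example']

schema = json.loads(schema_text)
rows = example_rows(schema)

# Every record should expose the same fields, so that they map onto one DataFrame schema
columns = sorted(rows[0])
assert all(sorted(r) == columns for r in rows)
print(len(rows), columns)
```

Each dict in `rows` is what `Row(**x)` receives in the cell above.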



### Source code of the Azure ML Web service

In [3]:
%%writefile $AZUREML_NATIVE_SHARE_DIRECTORY/score.py

import os
from pyspark.ml import PipelineModel
from pyspark.ml.feature import VectorAssembler
from featurization import featurize

def init():
    # Load the fitted pipeline once, from the 'model' directory next to this file
    global pipeline
    dir_path = os.path.dirname(os.path.realpath(__file__))
    model_path = os.path.join(dir_path, 'model')
    pipeline = PipelineModel.load(model_path)

def run(input_df):
    # Apply the same featurizer that was used before training the model
    features_df = featurize(input_df)

    # Sort the feature columns so that the vector layout is deterministic
    sorted_feature_columns = sorted(features_df.columns)

    # Assemble the features into a single vector column
    va = VectorAssembler(inputCols=sorted_feature_columns, outputCol='features')
    vectorized_features = va.transform(features_df)

    # Score the rows and bring the results back to the driver
    predictions = pipeline.transform(vectorized_features).collect()

    # Extract the predicted label from each scored row
    preds = [x['predictedFailure'] for x in predictions]
    return preds

Overwriting /mnt/azureml-share/score.py
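`run` sorts the featurized column names before handing them to `VectorAssembler`, presumably so that each position in the feature vector refers to the same column it did during training (this assumes the training notebook also assembled features from sorted column names). The plain-Python sketch below uses a hypothetical `assemble` helper as a stand-in for `VectorAssembler` to show the effect: two records whose columns arrive in different orders yield the same vector only once the names are sorted.

```python
def assemble(record, input_cols):
    """Hypothetical stand-in for VectorAssembler: concatenate values in input_cols order."""
    return [float(record[c]) for c in input_cols]

# The same observation, with its columns arriving in two different orders
a = {'speed': 1200.0, 'pressure': 1500.0, 'temperature': 150.0}
b = {'temperature': 150.0, 'speed': 1200.0, 'pressure': 1500.0}

unsorted_a = assemble(a, list(a))  # follows a's column order
unsorted_b = assemble(b, list(b))  # follows b's column order
sorted_a = assemble(a, sorted(a))
sorted_b = assemble(b, sorted(b))

print(unsorted_a == unsorted_b)  # positions depend on arrival order
print(sorted_a == sorted_b)      # positions are fixed by column name
```

Sorting by name is a cheap way to pin the layout without storing the training column list separately.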


Let's execute this module so that the *init* and *run* functions are defined in the current scope.

In [4]:
%run $AZUREML_NATIVE_SHARE_DIRECTORY/score.py

Now we can emulate the real-time Web service by calling *init* directly, which loads the model, and then *run*, which uses the model to score our test input.

In [5]:
init()   

In [6]:
run(sample_df)

['None', 'None', 'None']
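Outside the notebook, the hosting service is expected to call `init` once at startup and `run` once per request. The sketch below imitates that call order with a hypothetical `StubModel` and a hypothetical `handle_request` helper that parses a JSON request body into records. None of these names come from Azure ML, the scoring rule is an arbitrary placeholder, and the real service passes a Spark DataFrame to `run` rather than a list of dicts.

```python
import json

model = None      # module-level state, like the `pipeline` global in score.py
init_calls = 0    # counts how many times the model was loaded

class StubModel:
    """Hypothetical stand-in for the fitted PipelineModel."""
    def predict(self, record):
        # Arbitrary placeholder rule, not the trained model's logic
        return 'Failure' if max(record['vibration'], default=0) > 100 else 'None'

def init():
    """Load the model once; the real init() calls PipelineModel.load(...)."""
    global model, init_calls
    model = StubModel()
    init_calls += 1

def run(records):
    """Score a batch of records with the already-loaded model."""
    return [model.predict(r) for r in records]

def handle_request(body):
    """Hypothetical request handler: JSON text in, JSON text out."""
    return json.dumps(run(json.loads(body)))

init()  # once, at startup
responses = [
    handle_request('[{"vibration": [1, 2, 3]}]'),
    handle_request('[{"vibration": [5, 250]}, {"vibration": [7]}]'),
]
print(responses)
```

Loading the model in `init` rather than in `run` keeps the expensive deserialization out of the per-request path.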

The command below lists the four files needed to create the ML Web service; `grep -v /` filters directories out of the listing.

In [7]:
%ls $AZUREML_NATIVE_SHARE_DIRECTORY | grep -v /

[01;32mfeaturization.py[0m*
[01;32mmodel.tar.gz[0m*
[01;32mscore.py[0m*
[01;32mservice_schema.json[0m*
