# Step 4: Model operationalization & Deployment

In this notebook, the model is saved as a .model file along with the relevant schema for deployment. The functions are first tested locally, and the model is then operationalized with the Azure Machine Learning Model Management environment for real-time use in production.


In [1]:
## setup our environment by importing required libraries
import json
import os
import shutil

from pyspark.ml import Pipeline
from pyspark.ml.classification import RandomForestClassifier
# for creating pipelines and model
from pyspark.ml.feature import StringIndexer, VectorAssembler, VectorIndexer
# setup the pyspark environment
from pyspark.sql import SparkSession

from azureml.api.schema.dataTypes import DataTypes
from azureml.api.schema.sampleDefinition import SampleDefinition
from azureml.api.realtime.services import generate_schema

from azure.storage.blob import BlockBlobService
from azure.storage.blob import PublicAccess

spark = SparkSession.builder.getOrCreate()

In [2]:
# Enter your Azure blob storage details here 
ACCOUNT_NAME = "pdmamlworkbench"   ## "<your blob storage account name>"

# You can find the account key under the _Access Keys_ link in the 
# [Azure Portal](portal.azure.com) page for your Azure storage account.
ACCOUNT_KEY = "<account key>"

#-------------------------------------------------------------------------------------------
# We will create this container to hold the results of executing this notebook.
# If this container name already exists, we will use that instead; however,
# this notebook will ERASE ALL CONTENTS.
CONTAINER_NAME = "featureengineering"
FE_DIRECTORY = 'featureengineering_files.parquet'

# Connect to your blob service     
my_service = BlockBlobService(account_name=ACCOUNT_NAME, account_key=ACCOUNT_KEY)

# Create a new container if necessary, otherwise you can use an existing container.
# This command creates the container if it does not already exist. Else it does nothing.
my_service.create_container(CONTAINER_NAME, 
                            fail_on_exist=False, 
                            public_access=PublicAccess.Container)
# Create a local directory to hold the downloaded feature engineering data.

MODEL_DIRECTORY = 'model_operationalize.parquet'
if not os.path.exists(MODEL_DIRECTORY):
    os.makedirs(MODEL_DIRECTORY)
    print('DONE creating a local directory!')

# define your blob service     
my_service = BlockBlobService(account_name=ACCOUNT_NAME, account_key=ACCOUNT_KEY)

# download the entire parquet result folder to local path for a new run 
for blob in my_service.list_blobs(CONTAINER_NAME):
    if CONTAINER_NAME in blob.name:
        local_file = os.path.join(MODEL_DIRECTORY, os.path.basename(blob.name))
        my_service.get_blob_to_path(CONTAINER_NAME, blob.name, local_file)

fedata = spark.read.parquet(MODEL_DIRECTORY)

fedata.limit(5).toPandas().head(5)


DONE creating a local directory!


| | machineID | dt_truncated | volt_rollingmean_3 | rotate_rollingmean_3 | pressure_rollingmean_3 | vibration_rollingmean_3 | volt_rollingmean_24 | rotate_rollingmean_24 | pressure_rollingmean_24 | vibration_rollingmean_24 | ... | error5sum_rollingmean_24 | comp1sum | comp2sum | comp3sum | comp4sum | model | age | model_encoded | failure1 | label_e |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 45 | 2016-01-01 06:00:00 | 174.881727 | 527.816907 | 117.225971 | 39.472147 | 185.926371 | 470.121966 | 113.564799 | 39.936377 | ... | 0.0 | 474.0 | 459.0 | 384.0 | 579.0 | model1 | 14 | (0.0, 0.0, 0.0) | 0.0 | 0.0 |
| 1 | 45 | 2016-01-01 03:00:00 | 188.406674 | 474.631787 | 124.261901 | 38.869583 | 186.103373 | 461.103049 | 112.467296 | 39.969765 | ... | 0.0 | 474.0 | 459.0 | 384.0 | 579.0 | model1 | 14 | (0.0, 0.0, 0.0) | 0.0 | 0.0 |
| 2 | 45 | 2016-01-01 00:00:00 | 171.329008 | 454.871774 | 123.029121 | 37.712346 | 184.493495 | 458.147521 | 110.240281 | 39.621269 | ... | 0.0 | 473.666667 | 458.666667 | 383.666667 | 578.666667 | model1 | 14 | (0.0, 0.0, 0.0) | 0.0 | 0.0 |
| 3 | 45 | 2015-12-31 21:00:00 | 187.832175 | 478.072922 | 128.126653 | 38.659306 | 184.465271 | 453.473419 | 107.234582 | 39.799088 | ... | 0.0 | 473.0 | 458.0 | 383.0 | 578.0 | model1 | 14 | (0.0, 0.0, 0.0) | 0.0 | 0.0 |
| 4 | 45 | 2015-12-31 18:00:00 | 188.931361 | 465.157611 | 115.04854 | 41.007536 | 182.153162 | 454.506948 | 103.7281 | 39.916098 | ... | 0.0 | 473.0 | 458.0 | 383.0 | 578.0 | model1 | 14 | (0.0, 0.0, 0.0) | 0.0 | 0.0 |
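
The download loop in the cell above keeps every blob whose name contains the container name and flattens it into MODEL_DIRECTORY under its base file name. That mapping can be sketched without touching Azure, which makes it easy to check which files a run would fetch. The helper name `plan_downloads` and the sample blob names are illustrative assumptions, not part of the original notebook.

```python
import os


def plan_downloads(blob_names, match, local_dir):
    """Return (blob_name, local_path) pairs for blobs whose name contains `match`.

    Mirrors the loop above: the blob's directory prefix is dropped and only
    the base file name is kept under `local_dir`.
    """
    plan = []
    for name in blob_names:
        if match in name:
            plan.append((name, os.path.join(local_dir, os.path.basename(name))))
    return plan


# Hypothetical blob names, shaped like a Spark parquet output folder.
example_blobs = [
    "featureengineering_files.parquet/_SUCCESS",
    "featureengineering_files.parquet/part-00000.snappy.parquet",
    "unrelated/readme.txt",
]
plan = plan_downloads(example_blobs, "featureengineering", "model_operationalize.parquet")
for blob_name, local_path in plan:
    print(blob_name, "->", local_path)
```

Because only the base name is kept, two blobs with the same file name in different folders would overwrite each other locally; that is acceptable here only because a single parquet folder is expected.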


# Define the features and labels for the model

In [3]:
# define list of input columns for downstream modeling 
input_features = [
'volt_rollingmean_3',
'rotate_rollingmean_3',
'pressure_rollingmean_3',
'vibration_rollingmean_3',
'volt_rollingmean_24',
'rotate_rollingmean_24',
'pressure_rollingmean_24',
'vibration_rollingmean_24',
'volt_rollingstd_3',
'rotate_rollingstd_3',
'pressure_rollingstd_3',
'vibration_rollingstd_3',
'volt_rollingstd_24',
'rotate_rollingstd_24',
'pressure_rollingstd_24',
'vibration_rollingstd_24',
'error1sum_rollingmean_24',
'error2sum_rollingmean_24',
'error3sum_rollingmean_24',
'error4sum_rollingmean_24',
'error5sum_rollingmean_24',
'comp1sum',
'comp2sum',
'comp3sum',
'comp4sum',
'age'  
]

label_var = ['label_e']
key_cols =['machineID','dt_truncated']
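
The feature names follow a regular pattern (sensor, statistic, window), so the same list can also be generated rather than typed out, which reduces the risk of a typo when the list is repeated in the scoring code. This is only a sketch; the variable name `generated_features` is an assumption introduced here so that it does not overwrite `input_features`.

```python
from itertools import product

sensors = ["volt", "rotate", "pressure", "vibration"]

# Rolling statistics: for each (statistic, window) pair, one column per sensor.
rolling = [
    "{}_rolling{}_{}".format(sensor, stat, window)
    for stat, window in product(["mean", "std"], [3, 24])
    for sensor in sensors
]

# 24-hour rolling means of the five error counters.
errors = ["error{}sum_rollingmean_24".format(i) for i in range(1, 6)]

# Per-component features for the four components.
comps = ["comp{}sum".format(i) for i in range(1, 5)]

generated_features = rolling + errors + comps + ["age"]
print(len(generated_features))
```

Comparing `generated_features == input_features` in the notebook is a quick way to confirm that both lists agree.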

Vectorize the dataframe

Once you have a model that performs well, you can package it into a scoring service. To prepare for this, first save your model and dataset schema locally. To do so, set sharedVolumes: true in the docker.compute file in the aml_config folder, then prepare the environment. 
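
The cells that follow read the shared folder location from the AZUREML_NATIVE_SHARE_DIRECTORY environment variable. A small helper can make a missing variable fail with a readable message instead of a bare KeyError. This is a minimal sketch: the function name `share_path` is hypothetical, and the link between an unset variable and the sharedVolumes setting is an assumption to verify in your own environment.

```python
import os


def share_path(filename, environ=None):
    """Build a path inside the Azure ML shared directory.

    Raises a descriptive error when AZUREML_NATIVE_SHARE_DIRECTORY is not
    set; one possible cause (an assumption) is that sharedVolumes is not
    enabled in the docker.compute file.
    """
    environ = os.environ if environ is None else environ
    share_dir = environ.get("AZUREML_NATIVE_SHARE_DIRECTORY")
    if not share_dir:
        raise RuntimeError(
            "AZUREML_NATIVE_SHARE_DIRECTORY is not set; "
            "check that sharedVolumes is true in the docker.compute file."
        )
    return os.path.join(share_dir, filename)


# Example with an explicit mapping instead of the real environment.
print(share_path("pdmrfull.model", {"AZUREML_NATIVE_SHARE_DIRECTORY": "/azureml-share/"}))
```

Using `os.path.join` also avoids depending on whether the variable's value ends with a path separator.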

This section shows how to author a real-time web service that scores inputs with the model saved earlier. First, make sure that the latest version of azure-ml-api-sdk is available.

# Define init and run functions
Start by defining the init() and run() functions as shown in the cell below. They are later written to the pdmscore.py file, which loads the model, performs the prediction, and returns the result.

The init() function initializes your web service, loading any data or models needed to score your inputs. In the example below, it loads the trained model. It runs once, when the Docker container hosting your service starts.

The run() function defines what is executed on each scoring call. In this example, it takes the input as a data frame, runs the pipeline on it, and returns the predictions.

In [4]:
def init():
    # read in the model file
    from pyspark.ml import PipelineModel
    global pipeline
    
    pipeline = PipelineModel.load(os.path.join(os.environ['AZUREML_NATIVE_SHARE_DIRECTORY'], 'pdmrfull.model'))
    
def run(input_df):
    import json
    response = ''
    try:
        #Get prediction results for the dataframe
        input_features = [
            'volt_rollingmean_3',
            'rotate_rollingmean_3',
            'pressure_rollingmean_3',
            'vibration_rollingmean_3',
            'volt_rollingmean_24',
            'rotate_rollingmean_24',
            'pressure_rollingmean_24',
            'vibration_rollingmean_24',
            'volt_rollingstd_3',
            'rotate_rollingstd_3',
            'pressure_rollingstd_3',
            'vibration_rollingstd_3',
            'volt_rollingstd_24',
            'rotate_rollingstd_24',
            'pressure_rollingstd_24',
            'vibration_rollingstd_24',
            'error1sum_rollingmean_24',
            'error2sum_rollingmean_24',
            'error3sum_rollingmean_24',
            'error4sum_rollingmean_24',
            'error5sum_rollingmean_24',
            'comp1sum',
            'comp2sum',
            'comp3sum',
            'comp4sum',
            'age',
        ]
        
        va = VectorAssembler(inputCols=(input_features), outputCol='features')
        data = va.transform(input_df).select('machineID','features')
        score = pipeline.transform(data)
        predictions = score.collect()

        #Get each scored result
        preds = [str(x['prediction']) for x in predictions]
        response = ",".join(preds)
    except Exception as e:
        print("Error: {0}".format(str(e)))
        return (str(e))
    
    # Return results
    print(json.dumps(response))
    return json.dumps(response)

# Create schema and schema file
Create a schema for the input to the web service and generate the schema file. This will be used to create a Swagger file for your web service which can be used to discover its input and sample data when calling it.

In [5]:
# define the input data frame
inputs = {"input_df": SampleDefinition(DataTypes.SPARK, 
                                       fedata.drop("dt_truncated","failure1","label_e", "model","model_encoded"))}

In [6]:
x = generate_schema(run_func=run, inputs=inputs, filepath='service_schema.json')
print(x)
print(inputs)


{'input': {'input_df': {'internal': {'fields': [{'metadata': {}, 'type': 'long', 'name': 'machineID', 'nullable': True}, {'metadata': {}, 'type': 'double', 'name': 'volt_rollingmean_3', 'nullable': True}, {'metadata': {}, 'type': 'double', 'name': 'rotate_rollingmean_3', 'nullable': True}, {'metadata': {}, 'type': 'double', 'name': 'pressure_rollingmean_3', 'nullable': True}, {'metadata': {}, 'type': 'double', 'name': 'vibration_rollingmean_3', 'nullable': True}, {'metadata': {}, 'type': 'double', 'name': 'volt_rollingmean_24', 'nullable': True}, {'metadata': {}, 'type': 'double', 'name': 'rotate_rollingmean_24', 'nullable': True}, {'metadata': {}, 'type': 'double', 'name': 'pressure_rollingmean_24', 'nullable': True}, {'metadata': {}, 'type': 'double', 'name': 'vibration_rollingmean_24', 'nullable': True}, {'metadata': {}, 'type': 'double', 'name': 'volt_rollingstd_3', 'nullable': True}, {'metadata': {}, 'type': 'double', 'name': 'rotate_rollingstd_3', 'nullable': True}, {'metadata': 
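
The printed schema is hard to read as a single line. The sketch below pulls the column names and types out of a schema dictionary with the same nesting as the visible part of the output above (`input` → `input_df` → `internal` → `fields`). The helper name `schema_fields` and the two-column excerpt are assumptions for illustration; the real schema contains every input column.

```python
def schema_fields(schema, input_name="input_df"):
    """Return (column name, type) pairs from a generated service schema dict."""
    fields = schema["input"][input_name]["internal"]["fields"]
    return [(field["name"], field["type"]) for field in fields]


# A two-column excerpt with the same nesting as the printed schema above.
example_schema = {
    "input": {
        "input_df": {
            "internal": {
                "fields": [
                    {"metadata": {}, "type": "long", "name": "machineID", "nullable": True},
                    {"metadata": {}, "type": "double", "name": "volt_rollingmean_3", "nullable": True},
                ]
            }
        }
    }
}
for name, dtype in schema_fields(example_schema):
    print(name, dtype)
```

In the notebook, calling `schema_fields(x)` on the generated schema would list the columns the service expects, assuming the rest of the structure matches the visible part.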

# Test init and run
We can then test the init() and run() functions right here in the notebook, before we decide to actually publish a web service.

In [7]:
# this is how the input data should be
input_data = [[114, 163.375732902,333.149484586,100.183951698,44.0958812638,164.114723991,277.191815232,97.6289110707,50.8853505161,21.0049565219,67.5287259378,12.9361526861,4.61359760918,15.5377738062,67.6519885441,10.528274633,6.94129487555,0.0,0.0,0.0,0.0,0.0,489.0,549.0,549.0,564.0,18.0]]
input_data

[[114,
  163.375732902,
  333.149484586,
  100.183951698,
  44.0958812638,
  164.114723991,
  277.191815232,
  97.6289110707,
  50.8853505161,
  21.0049565219,
  67.5287259378,
  12.9361526861,
  4.61359760918,
  15.5377738062,
  67.6519885441,
  10.528274633,
  6.94129487555,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  489.0,
  549.0,
  549.0,
  564.0,
  18.0]]

In [8]:
df = (spark.createDataFrame(input_data, ["machineID", "volt_rollingmean_3", "rotate_rollingmean_3", "pressure_rollingmean_3", "vibration_rollingmean_3", "volt_rollingmean_24", 
            "rotate_rollingmean_24", "pressure_rollingmean_24", "vibration_rollingmean_24", "volt_rollingstd_3", "rotate_rollingstd_3",
            "pressure_rollingstd_3", "vibration_rollingstd_3", "volt_rollingstd_24", "rotate_rollingstd_24", "pressure_rollingstd_24",
            "vibration_rollingstd_24", "error1sum_rollingmean_24", "error2sum_rollingmean_24", "error3sum_rollingmean_24",
            "error4sum_rollingmean_24", "error5sum_rollingmean_24", "comp1sum", "comp2sum", "comp3sum", "comp4sum",
            "age"]))

In [9]:
# test init() in local notebook
init()
# test run() in local notebook
run(df)

"0.0"


'"0.0"'
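
The value returned by run() is a JSON-encoded string holding the predictions joined by commas, which is why the result above appears with two layers of quotes. A caller can decode it as sketched below. The function name `parse_predictions` and the two-record response are illustrative assumptions.

```python
import json


def parse_predictions(response):
    """Decode the JSON string returned by run() into a list of floats."""
    decoded = json.loads(response)  # '"0.0"' becomes the plain string '0.0'
    return [float(p) for p in decoded.split(",") if p]


print(parse_predictions('"0.0"'))      # single record, as in the test above
print(parse_predictions('"0.0,1.0"'))  # hypothetical two-record response
```

Note that when run() hits an exception it returns the error text as a plain string rather than JSON, so `json.loads` would raise on that path and the caller should handle it.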

In [10]:
# save the schema file for deployment
out = json.dumps(x)
with open(os.path.join(os.environ['AZUREML_NATIVE_SHARE_DIRECTORY'], 'service_schema.json'), 'w') as f:
    f.write(out)

Now navigate to the folder:
```
C:\Users\<username>\.azureml\share\<team account>\<Project Name>
```

Copy the file service_schema.json to your project folder for deployment.

Next, use the %%writefile cell magic to save the scoring code as a .py file.

In [11]:
%%writefile /azureml-share/pdmscore.py
# After testing the init() and run() functions above,
# run this cell to write them to pdmscore.py.

# The imports have been moved out of the functions to module level.

from pyspark.sql.types import *
from pyspark.sql.dataframe import *
from pyspark.sql.functions import *

from pyspark.ml.classification import *
from pyspark.ml import PipelineModel
import json

from pyspark.ml.feature import StringIndexer, OneHotEncoder, VectorAssembler, VectorIndexer


def init():
    # read in the model file
    global pipeline
    pipeline = PipelineModel.load('pdmrfull.model')
    
def run(input_df):
    response = ''
    try:
        #Get prediction results for the dataframe
        input_features = [
            'volt_rollingmean_3',
            'rotate_rollingmean_3',
            'pressure_rollingmean_3',
            'vibration_rollingmean_3',
            'volt_rollingmean_24',
            'rotate_rollingmean_24',
            'pressure_rollingmean_24',
            'vibration_rollingmean_24',
            'volt_rollingstd_3',
            'rotate_rollingstd_3',
            'pressure_rollingstd_3',
            'vibration_rollingstd_3',
            'volt_rollingstd_24',
            'rotate_rollingstd_24',
            'pressure_rollingstd_24',
            'vibration_rollingstd_24',
            'error1sum_rollingmean_24',
            'error2sum_rollingmean_24',
            'error3sum_rollingmean_24',
            'error4sum_rollingmean_24',
            'error5sum_rollingmean_24',
            'comp1sum',
            'comp2sum',
            'comp3sum',
            'comp4sum',
            'age',
        ]

        va = VectorAssembler(inputCols=(input_features), outputCol='features')
        data = va.transform(input_df).select('machineID','features')
        score = pipeline.transform(data)
        predictions = score.collect()

        #Get each scored result
        preds = [str(x['prediction']) for x in predictions]
        response = ",".join(preds)
    except Exception as e:
        print("Error: {0}".format(str(e)))
        return (str(e))
    
    # Return results
    print(json.dumps(response))
    return json.dumps(response)

if __name__ == "__main__":
    # run() expects a Spark DataFrame, so build one from the sample JSON record.
    from pyspark.sql import Row, SparkSession
    spark = SparkSession.builder.getOrCreate()
    sample = json.loads("{\"input_df\":[{\"machineID\":114,\"volt_rollingmean_3\":163.375732902,\"rotate_rollingmean_3\":333.149484586,\"pressure_rollingmean_3\":100.183951698,\"vibration_rollingmean_3\":44.0958812638,\"volt_rollingmean_24\":164.114723991,\"rotate_rollingmean_24\":277.191815232,\"pressure_rollingmean_24\":97.6289110707,\"vibration_rollingmean_24\":50.8853505161,\"volt_rollingstd_3\":21.0049565219,\"rotate_rollingstd_3\":67.5287259378,\"pressure_rollingstd_3\":12.9361526861,\"vibration_rollingstd_3\":4.61359760918,\"volt_rollingstd_24\":15.5377738062,\"rotate_rollingstd_24\":67.6519885441,\"pressure_rollingstd_24\":10.528274633,\"vibration_rollingstd_24\":6.94129487555,\"error1sum_rollingmean_24\":0.0,\"error2sum_rollingmean_24\":0.0,\"error3sum_rollingmean_24\":0.0,\"error4sum_rollingmean_24\":0.0,\"error5sum_rollingmean_24\":0.0,\"comp1sum\":489.0,\"comp2sum\":549.0,\"comp3sum\":549.0,\"comp4sum\":564.0,\"age\":18.0}]}")
    init()
    run(spark.createDataFrame([Row(**record) for record in sample["input_df"]]))

Overwriting /azureml-share/pdmscore.py


In [13]:
!ls $AZUREML_NATIVE_SHARE_DIRECTORY

pdmrfull.model	pdmscore.py  service_schema.json
/bin/sh: 1: /usr/bin/zip: not found


In [14]:

shutil.make_archive('./outputs/o16n', 'zip', os.environ['AZUREML_NATIVE_SHARE_DIRECTORY'])

'/azureml-run/outputs/o16n.zip'
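
`shutil.make_archive` appends the format's extension to the base name it is given, so a base name that already ends in `.zip` produces a doubled extension. The sketch below demonstrates this in a temporary directory; the folder and file names are placeholders chosen for illustration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # A stand-in for the shared directory, holding one placeholder file.
    source = os.path.join(workdir, "share")
    os.makedirs(source)
    with open(os.path.join(source, "pdmscore.py"), "w") as f:
        f.write("# placeholder\n")

    # make_archive adds ".zip" itself, so the base name should not include it.
    good = shutil.make_archive(os.path.join(workdir, "o16n"), "zip", source)
    doubled = shutil.make_archive(os.path.join(workdir, "o16n.zip"), "zip", source)
    archive_names = (os.path.basename(good), os.path.basename(doubled))

print(archive_names)
```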

Now navigate to the folder:
```
C:\Users\<username>\.azureml\share\<team account>\<Project Name>
```

Copy the file pdmscore.py to your project folder for deployment.

# Use the CLI to deploy and manage your web service 

## Pre-requisites 

Use the following commands to set up an environment and account to run the web service. For more information, see the Getting Started Guide and the CLI Command Reference. You can add the -h flag to the end of any command for help.

• Create the environment (you need to do this once per environment e.g. dev or prod)
```
az ml env setup -c -n <yourclustername> --location <e.g. eastus2>
```

• Create a Model Management account (one time setup)
```
az ml account modelmanagement create --location <e.g. eastus2> -n <your-new-acctname> -g <yourresourcegroupname> --sku-instances 1 --sku-name S1
```

• Set the Model Management account
```
az ml account modelmanagement set -n <youracctname> -g <yourresourcegroupname>
```

• Set the environment. The cluster name is the one used when creating the environment above. The resource group name is part of the output of that setup process and appears in the command window once setup completes.
```
az ml env set -n <yourclustername> -g <yourresourcegroupname>
```

## Deploy your web service 

Switch to a bash shell, and run the following commands to deploy your service and run it.

Enter the path where the notebook and other files are saved. Your actual path may be different from this example.
```
cd ~/notebooks/azureml/realtime/
```

Create the real-time web service. This assumes that you saved your model locally.
```
az ml service create realtime -f pdmscore.py -r spark-py -m pdmrfull.model -s service_schema.json -n pdmservice --cpu 0.1
```

The following command returns a sample run command with sample data. You can get the service ID from the output of the create command above.
```
az ml service show realtime -i <yourserviceid>
```

Call the web service to get a prediction:

```
az ml service run realtime -i <yourserviceid> -d "{\"input_df\": [{\"machineID\":114, \"volt_rollingmean_3\":163.375732902, \"rotate_rollingmean_3\":333.149484586, \"pressure_rollingmean_3\":100.183951698, \"vibration_rollingmean_3\":44.0958812638, \"volt_rollingmean_24\":164.114723991, \"rotate_rollingmean_24\":277.191815232, \"pressure_rollingmean_24\":97.6289110707, \"vibration_rollingmean_24\":50.8853505161, \"volt_rollingstd_3\":21.0049565219, \"rotate_rollingstd_3\":67.5287259378, \"pressure_rollingstd_3\":12.9361526861, \"vibration_rollingstd_3\":4.61359760918, \"volt_rollingstd_24\":15.5377738062, \"rotate_rollingstd_24\":67.6519885441, \"pressure_rollingstd_24\":10.528274633, \"vibration_rollingstd_24\":6.94129487555, \"error1sum_rollingmean_24\":0.0, \"error2sum_rollingmean_24\":0.0, \"error3sum_rollingmean_24\":0.0, \"error4sum_rollingmean_24\":0.0, \"error5sum_rollingmean_24\":0.0, \"comp1sum\":489.0, \"comp2sum\":549.0, \"comp3sum\":549.0, \"comp4sum\":564.0, \"age\":18.0}]}"
```

Predicted output label is as follows:
```
"0.0"
```
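
The escaped JSON passed to `-d` is tedious to write by hand. Since it is just a dictionary with an `input_df` key holding a list of records, it can be generated from column names and row values. This is a sketch under that assumption; `build_payload` is a hypothetical helper and only the first three columns are shown to keep it short.

```python
import json

# First three columns and values of the sample record used in this notebook.
columns = ["machineID", "volt_rollingmean_3", "rotate_rollingmean_3"]
row = [114, 163.375732902, 333.149484586]


def build_payload(columns, rows, input_name="input_df"):
    """Serialize rows as {"input_df": [{column: value, ...}, ...]}."""
    records = [dict(zip(columns, r)) for r in rows]
    return json.dumps({input_name: records})


payload = build_payload(columns, [row])
print(payload)
```

In the notebook, the full column list and `input_data` from the local test could be passed instead; the resulting string still needs quoting appropriate to your shell before it is used on the command line.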