![LearnAI Header](https://coursematerial.blob.core.windows.net/assets/LearnAI_header.png)

# Model deployment

In this lab, you will learn how to deploy your machine learning solution as a webservice for real-time scoring.

> Please ensure you have run all previous notebooks in sequence before running this.

In [4]:
from azureml.core import Workspace
import azureml.core
import os

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

config_path = '/dbfs/tmp/'

ws = Workspace.from_config(path=os.path.join(config_path, 'aml_config', 'config.json'))
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Resource group: ' + ws.resource_group, sep = '\n')

In [5]:
## NOTE: service deployment always gets the model from the current working dir.
model_name = "PdM_logistic_regression.mml"
model_name_dbfs = os.path.join("/dbfs", model_name)

print("copy model from dbfs to local")
model_local = "file:" + os.getcwd() + "/" + model_name
dbutils.fs.cp(model_name, model_local, recurse=True)  # recurse, because a saved Spark model is a directory
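The cell above relies on Databricks path conventions: `dbutils.fs` treats a path without a scheme (such as `model_name`) as a DBFS path, needs the `file:` prefix for the driver's local disk, and ordinary Python file APIs reach the same DBFS files through the `/dbfs/...` mount, which is why `model_name_dbfs` is built with `os.path.join("/dbfs", ...)`. The two helpers below are hypothetical; they are a minimal sketch of that mapping, not part of any Databricks API.

```python
import posixpath


def dbfs_uri_to_fuse(uri: str) -> str:
    """Map a 'dbfs:/...' URI (as used by dbutils.fs and Spark) to the '/dbfs/...'
    mount path that ordinary Python file APIs use on the driver."""
    prefix = "dbfs:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a dbfs URI: {uri!r}")
    # strip extra leading slashes so posixpath.join does not discard "/dbfs"
    return posixpath.join("/dbfs", uri[len(prefix):].lstrip("/"))


def local_to_file_uri(path: str) -> str:
    """Prefix an absolute driver-local path with 'file:' so dbutils.fs treats it as local."""
    if not posixpath.isabs(path):
        raise ValueError(f"expected an absolute path: {path!r}")
    return "file:" + path


print(dbfs_uri_to_fuse("dbfs:/PdM_logistic_regression.mml"))  # /dbfs/PdM_logistic_regression.mml
# example driver path; the notebook uses os.getcwd() instead
print(local_to_file_uri("/databricks/driver/PdM_logistic_regression.mml"))
```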

In [6]:
# register the model
from azureml.core.model import Model
mymodel = Model.register(model_path = model_name,  # this points to the local copy of the model
                         model_name = model_name,  # name the model is registered under; here the same as the file name
                         description = "ADB trained model by an amazing data scientist",
                         workspace = ws)

print(mymodel.name, mymodel.description, mymodel.version)

## Converting your data to and from JSON

The most common way to interact with a webservice is using a [REST](https://en.wikipedia.org/wiki/Representational_state_transfer) API, sending and receiving [JSON](https://en.wikipedia.org/wiki/JSON) data.  

We therefore need to convert our dataframe to JSON to send it to the webservice, and the webservice has to then convert it back into a dataframe so that we can use our pyspark model to score the data.

Often this is straightforward, because Spark can infer the schema of the JSON data correctly. However, this is not always the case. Our use case is an example where we need to help Spark by explicitly providing the schema when converting the JSON data back into a dataframe.

Let's start with an example to illustrate that.

**Note**: Explicitly providing the schema of your data is generally good practice, because it can speed up reading the data and avoids surprises. This is true not only when working with Spark, but also, e.g., in *R* or *scikit-learn*.
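A plain-Python analogy (not Spark code) of why the note above matters: JSON has no native type for timestamps or vectors, so a reader that has to infer types gets strings and lists back, while a reader given an explicit schema can rebuild the original types. The record and the `schema` mapping below are illustrative only.

```python
import json
from datetime import datetime

# Illustrative record with a timestamp and a vector-like tuple.
record = {"machineID": 16, "datetime": datetime(2015, 10, 15, 1, 0), "features": (0.21, 0.33)}

# JSON only knows strings, numbers, booleans, null, arrays and objects,
# so the timestamp is turned into a string and the tuple becomes an array.
payload = json.dumps(record, default=lambda value: value.isoformat())
decoded = json.loads(payload)

# Reading back without a schema: the types are whatever JSON produced.
print(type(decoded["datetime"]).__name__, type(decoded["features"]).__name__)  # str list

# An explicit schema states how to rebuild each field instead of guessing.
schema = {"machineID": int, "datetime": datetime.fromisoformat, "features": tuple}
restored = {name: convert(decoded[name]) for name, convert in schema.items()}
print(restored == record)  # True
```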

In [8]:
df = spark.read.parquet("dbfs:/FileStore/tables/preprocessed").cache()
display(df)

from pyspark.ml.feature import StandardScaler, VectorAssembler
from pyspark.ml import Pipeline

keys = ['machineID', 'datetime']
X_keep = ['diff_maint_1', 'diff_error_1', 'volt_sd_3', 'diff_fail_3', 'pressure_ma_3', 'pressure_sd_3', 'diff_fail_1', 'diff_fail_0', 'age', 'vibration_ma_3', 'rotate_ma_3', 'diff_error_2', 'diff_fail_2', 'diff_error_3', 'diff_maint_2', 'volt_ma_3', 'diff_maint_0', 'vibration_sd_3', 'diff_maint_3', 'rotate_sd_3', 'diff_error_0', 'diff_error_4']
Y_keep = ['y_0', 'y_1', 'y_2', 'y_3']

vassembler = VectorAssembler(inputCols = X_keep, outputCol = "features")
stndscaler = StandardScaler(inputCol = "features", outputCol = "norm_features")

pipeline = Pipeline(stages = [vassembler, stndscaler])
df_norm = pipeline.fit(df).transform(df).select(keys + ["norm_features"] + Y_keep)
display(df_norm)

from datetime import datetime  # pandas.datetime is deprecated and removed in newer pandas
from pyspark.sql.functions import col

# split by time: train on data before 2015-10-01, test on a few rows after 2015-10-15
df_train = df_norm.filter(col('datetime') < datetime(2015, 10, 1))
df_test = df_norm.filter(col('datetime') > datetime(2015, 10, 15)).limit(5)

machineID,datetime,norm_features,y_0,y_1,y_2,y_3
16,2015-06-10T23:00:00.000+0000,"List(1, 22, List(), List(0.20952993119406246, 0.3348105818258993, 2.9340970661577592, 1.5719932532053056, 15.161634257041639, 1.8693599215101442, 0.12001661058038639, 0.7754073254874088, 0.5147900362625834, 11.540620742484094, 15.802437535752096, 0.2566995351267684, 0.22541394137012827, 0.5941540693074987, 0.555498311305963, 19.261202705974416, 0.9069586781562903, 1.6933771813998613, 1.1116810754241957, 4.729865538607657, 0.6097039463397917, 1.223219863982335))",0,0,0,0
16,2015-06-11T00:00:00.000+0000,"List(1, 22, List(), List(0.21042920128502413, 0.3358634452907606, 0.3330702411101798, 1.5723905221598684, 14.575100183654946, 2.7730348349603267, 0.12053170332965843, 0.7758708086466959, 0.5147900362625834, 12.075301081601566, 15.013515867331735, 0.2576983660027481, 0.22579406606046573, 0.5951170418671543, 0.5564350706842193, 20.212076705271276, 0.9079103661711447, 2.9111448721925743, 1.1125277479873519, 4.235977976747721, 0.6109507846758649, 1.223790928344605))",0,0,0,0
16,2015-06-11T01:00:00.000+0000,"List(1, 22, List(), List(0.21132847137598576, 0.33691630875562195, 0.8757407723324422, 1.5727877911144312, 14.872604767108934, 2.5567574221353495, 0.12104679607893047, 0.776334291805983, 0.5147900362625834, 13.0095769032335, 14.782783610014254, 0.2586971968787277, 0.22617419075080322, 0.5960800144268099, 0.5573718300624755, 20.429546308533265, 0.9088620541859992, 2.053288703236145, 1.1133744205505083, 4.041983165661366, 0.6121976230119381, 1.224361992706875))",0,0,0,0
16,2015-06-11T02:00:00.000+0000,"List(1, 22, List(), List(0.2122277414669474, 0.3379691722204832, 1.974762526412058, 1.573185060068994, 14.429226792434449, 2.1905131121833645, 0.12156188882820251, 0.7767977749652701, 0.5147900362625834, 12.912661545379082, 15.87728132452455, 0.25969602775470735, 0.2265543154411407, 0.5970429869864655, 0.5583085894407318, 19.946165312632008, 0.9098137422008536, 2.201257835502328, 1.1142210931136645, 1.85574322800626, 0.6134444613480113, 1.224933057069145))",0,0,0,0
16,2015-06-11T03:00:00.000+0000,"List(1, 22, List(), List(0.21312701155790903, 0.33902203568534456, 1.980900321665731, 1.573582329023557, 13.86677006113971, 1.522527241845776, 0.12207698157747456, 0.7772612581245574, 0.5147900362625834, 12.758697269388927, 15.453154429694273, 0.260694858630687, 0.2269344401314782, 0.5980059595461211, 0.559245348818988, 19.637986155549584, 0.9107654302157081, 2.5344969570888316, 1.1150677656768206, 0.7675596954136652, 0.6146912996840845, 1.225504121431415))",0,0,0,0
16,2015-06-11T04:00:00.000+0000,"List(1, 22, List(), List(0.2140262816488707, 0.3400748991502059, 2.47101385326268, 1.57397959797812, 14.008448276294258, 1.1625940182485999, 0.12259207432674661, 0.7777247412838445, 0.5147900362625834, 12.732224691549101, 15.555511673702012, 0.26169368950666666, 0.22731456482181567, 0.5989689321057767, 0.5601821081972443, 20.033344232843074, 0.9117171182305626, 2.4761788691612816, 1.115914438239977, 0.47788266171745436, 0.6159381380201577, 1.226075185793685))",0,0,0,0
16,2015-06-11T05:00:00.000+0000,"List(1, 22, List(), List(0.21492555173983233, 0.3411277626150672, 2.1566803166002755, 1.5743768669326828, 13.691388669660423, 0.6675279113763612, 0.12310716707601865, 0.7781882244431316, 0.5147900362625834, 12.367712341487819, 15.7693547156615, 0.2626925203826463, 0.22769468951215316, 0.5999319046654323, 0.5611188675755006, 19.553655183215596, 0.912668806245417, 2.2160185568558455, 1.1167611108031332, 0.809634439570637, 0.6171849763562309, 1.226646250155955))",0,0,0,0
16,2015-06-11T06:00:00.000+0000,"List(1, 22, List(), List(0.21582482183079396, 0.3421806260799285, 1.876472474043178, 1.5747741358872456, 13.734429573335383, 0.8136589906231091, 0.1236222598252907, 0.7786517076024188, 0.5147900362625834, 12.723694334300843, 16.037952817565746, 0.2636913512586259, 0.22807481420249065, 0.6008948772250878, 0.5620556269537569, 20.38861504616414, 0.9136204942602714, 2.2180488105195746, 1.1176077833662896, 1.042340960735642, 0.618431814692304, 1.227217314518225))",0,0,0,0
16,2015-06-11T07:00:00.000+0000,"List(1, 22, List(), List(0.2167240919217556, 0.3432334895447898, 1.516170548956176, 1.5751714048418084, 14.152950051148023, 1.491828923766658, 0.12413735257456274, 0.779115190761706, 0.5147900362625834, 13.153959305988137, 15.889418845372498, 0.26469018213460555, 0.22845493889282814, 0.6018578497847434, 0.5629923863320131, 20.942852699162145, 0.9145721822751258, 1.3639460510233137, 1.1184544559294458, 1.1917438982265376, 0.6196786530283772, 1.227788378880495))",0,0,0,0
16,2015-06-11T08:00:00.000+0000,"List(1, 22, List(), List(0.21762336201271726, 0.34428635300965116, 1.3709688429175013, 1.5755686737963714, 14.770683628433092, 1.722318683536753, 0.12465244532383478, 0.7795786739209931, 0.5147900362625834, 12.561222981328411, 15.949146498532203, 0.26568901301058523, 0.2288350635831656, 0.602820822344399, 0.5639291457102693, 20.701530140188122, 0.9155238702899803, 0.9536014661530919, 1.1193011284926022, 1.084443385290901, 0.6209254913644504, 1.228359443242765))",0,0,0,0


In [9]:
# test_data_path = "TestData"

# test_data_path_dbfs = os.path.join("/dbfs", test_data_path)

# df_test = spark.read.parquet(test_data_path).limit(5)

display(df_test.limit(5))

machineID,datetime,norm_features,y_0,y_1,y_2,y_3
16,2015-10-15T01:00:00.000+0000,"List(1, 22, List(), List(2.2832467609516076, 2.51529081755369, 1.519700411830627, 2.774129109712572, 14.65828837599718, 1.4499966578532955, 1.6786872698775932, 1.3436376787734596, 0.5147900362625834, 12.714782348518447, 14.398613631067803, 0.25270421162284984, 0.007222369116412204, 0.04140782006519035, 0.017798428186868966, 21.80336393439427, 0.7032974429774381, 1.602436692749368, 1.844899515117534, 3.1141913325801536, 0.6296533597169628, 2.951260624211348))",0,0,0,0
16,2015-10-15T02:00:00.000+0000,"List(1, 22, List(), List(2.2841460310425696, 2.5163436810185513, 1.2612647514705175, 2.7745263786671353, 13.913312489502026, 2.6689505781880656, 1.6792023626268653, 1.3441011619327468, 0.5147900362625834, 11.972818949291788, 15.280736027161465, 0.25370304249882947, 0.007602493806749688, 0.04237079262484594, 0.018735187565125228, 21.929706767556553, 0.7042491309922926, 2.917787940331837, 1.8457461876806904, 1.5608621270179244, 0.630900198053036, 2.951831688573618))",0,0,0,0
16,2015-10-15T03:00:00.000+0000,"List(1, 22, List(), List(2.285045301133531, 2.5173965444834128, 1.555753196551649, 2.774923647621698, 14.28985610522001, 3.426542448341677, 1.6797174553761374, 1.3445646450920339, 0.5147900362625834, 11.369869738948454, 15.052497472245376, 0.25470187337480915, 0.007982618497087172, 0.04333376518450153, 0.01967194694338149, 21.61932052145031, 0.705200819007147, 1.8959090643538812, 1.8465928602438466, 1.8293690246135266, 0.6321470363891092, 2.952402752935888))",0,0,0,0
16,2015-10-15T04:00:00.000+0000,"List(1, 22, List(), List(2.285944571224493, 2.5184494079482738, 3.117309020907023, 2.775320916576261, 13.661653347346657, 3.322156181515168, 1.6802325481254095, 1.345028128251321, 0.5147900362625834, 11.545272644718288, 14.777740364249428, 0.2557007042507888, 0.008362743187424656, 0.04429673774415712, 0.020608706321637752, 20.702627995836455, 0.7061525070220015, 2.295240674600147, 1.847439532807003, 2.3446082047151386, 0.6333938747251824, 2.952973817298158))",0,0,0,0
16,2015-10-15T05:00:00.000+0000,"List(1, 22, List(), List(2.2868438413154544, 2.519502271413135, 2.0559155633955006, 2.7757181855308235, 13.347413778390939, 3.5722335442979083, 1.6807476408746813, 1.3454916114106081, 0.5147900362625834, 12.02089148498528, 14.606011399943256, 0.2566995351267684, 0.008742867877762142, 0.045259710303812706, 0.02154546569989401, 19.84231744673155, 0.7071041950368558, 2.8322702762557097, 1.8482862053701592, 1.9409373106097214, 0.6346407130612556, 2.953544881660428))",0,0,0,0


In [10]:
import json

test_json = json.dumps(df_test.toJSON().collect())

print(test_json)

In [11]:
input_list = json.loads(test_json)
input_rdd = sc.parallelize(input_list)
input_df = spark.read.json(input_rdd)
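The round trip above encodes every row twice: `toJSON()` turns each row into a JSON string, and `json.dumps` then wraps the collected list of strings in a JSON array. The stdlib-only sketch below mimics that with two hand-written rows (illustrative values, not the real data), so the shape of `test_json` is easier to see.

```python
import json

# Stand-in for df_test.toJSON().collect(): one JSON *string* per row.
rows_as_json = [
    json.dumps({"machineID": 16, "y_0": 0}),
    json.dumps({"machineID": 16, "y_0": 0}),
]

# json.dumps over a list of strings yields a JSON array of strings,
# so each row is encoded a second time (note the escaped quotes).
test_json = json.dumps(rows_as_json)
print(test_json)

# Decoding once gives back the per-row JSON strings; this list is what
# sc.parallelize(...) hands to spark.read.json, which parses each string.
input_list = json.loads(test_json)
parsed_rows = [json.loads(s) for s in input_list]
print(parsed_rows[0]["machineID"])  # 16
```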

Now, let's see whether the data look as expected after the round trip through JSON.

In [13]:
print("This is the schema of the original data frame:")
df_test.printSchema()

print("This is the schema of our data frame after converting it to/from JSON:")
input_df.printSchema()

try:
  assert(df_test.schema == input_df.schema)
except AssertionError:
  print("Sadly, the schemas of the two data frames are not the same.")
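The `assert` only tells us that the schemas differ, not where. A small helper such as the hypothetical `schema_diff` below, written over plain `{column: type name}` dictionaries, pinpoints the offending columns regardless of column order. With PySpark you could build such dictionaries with `dict(df.dtypes)`, since `DataFrame.dtypes` returns `(column name, type string)` pairs; the type names in the example are illustrative, not the actual output of `printSchema()`.

```python
def schema_diff(expected: dict, actual: dict) -> dict:
    """Compare two {column_name: type_name} mappings, ignoring column order.

    Returns the columns missing from `actual`, the unexpected extra columns,
    and the columns whose types differ as (expected, actual) pairs.
    """
    missing = sorted(set(expected) - set(actual))
    extra = sorted(set(actual) - set(expected))
    changed = {
        name: (expected[name], actual[name])
        for name in sorted(set(expected) & set(actual))
        if expected[name] != actual[name]
    }
    return {"missing": missing, "extra": extra, "changed": changed}


# Hypothetical type names in the spirit of df_test.schema vs input_df.schema.
before = {"machineID": "int", "datetime": "timestamp", "norm_features": "vector"}
after = {"datetime": "string", "machineID": "int", "norm_features": "struct<type,values>"}
print(schema_diff(before, after))
```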

## Hands-on Lab

Help Spark by explicitly providing the schema when reading the JSON data.

This requires several parts:
1. Identify the schema of the original data
1. Create a schema definition that Spark can use when reading the JSON data
1. Tell Spark to use that schema definition when reading the JSON data

In [15]:
# Let's identify the schema
df_test.schema

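OK. It looks like:
- `norm_features` is encoded as a `VectorUDT`, a user-defined type that JSON cannot represent directly
- `machineID`, `datetime` and the labels `y_0` to `y_3` use built-in Spark SQL types

The model only needs `norm_features` to score the data, so that is the column our schema has to describe.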

The schema definition further depends on the classes `StructType` and `StructField`.

Use the PySpark API to find where these classes are defined, and add the import statements at the top of the next cell. Hint: you need two lines of code.

The search function of the PySpark API [documentation](https://spark.apache.org/docs/latest/api/python/index.html) will find most of these classes. `VectorUDT` is a little harder to find and will require some detective work on your side.
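Why does a vector column need a dedicated type? In PySpark, `VectorUDT` stores a vector as a struct with `type`, `size`, `indices` and `values` fields, where `type` is 1 for a dense vector and 0 for a sparse one. Once the data is written to JSON, that struct is all a reader sees. The stdlib sketch below, built around a hypothetical `struct_to_dense` helper, shows how such a struct maps back to a plain list of floats; it only mirrors the layout and is not PySpark's own deserializer.

```python
def struct_to_dense(vec: dict) -> list:
    """Rebuild a dense list of floats from a VectorUDT-style struct.

    type == 1: dense vector, all entries are in `values`.
    type == 0: sparse vector of length `size`; `values[i]` sits at position `indices[i]`.
    """
    if vec["type"] == 1:
        return [float(v) for v in vec["values"]]
    if vec["type"] == 0:
        dense = [0.0] * vec["size"]
        for i, v in zip(vec["indices"], vec["values"]):
            dense[i] = float(v)
        return dense
    raise ValueError(f"unknown vector type code: {vec['type']!r}")


# A dense and a sparse encoding of the same 4-element vector.
dense_struct = {"type": 1, "values": [0.0, 2.5, 0.0, 1.0]}
sparse_struct = {"type": 0, "size": 4, "indices": [1, 3], "values": [2.5, 1.0]}
print(struct_to_dense(dense_struct) == struct_to_dense(sparse_struct))  # True
```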

In [17]:
#from pyspark.<...> import <...>
#from pyspark.<...> import <...>

# myschema = StructType([
#                       
#                       
#                       ])

In [18]:
# Solution
from pyspark.sql.types import StructField, StructType
from pyspark.ml.linalg import VectorUDT

# Only norm_features is declared; JSON fields not listed in the schema are ignored when reading.
myschema = StructType([
                      StructField("norm_features", VectorUDT())
                      ])

Now that you have defined the schema, tell Spark to use it when reading the JSON data.

Instead of simply writing `spark.read.json(input_rdd)`, tell Spark to use your schema while reading the data.

This [documentation](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=spark%20read%20schema#pyspark.sql.DataFrameReader.schema) gives a hint on how to do this.
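Conceptually, reading with a schema replaces type guessing with a fixed contract: only the declared columns are read, each one is converted to its declared type, and a declared column that is absent from a record comes back as null. The plain-Python sketch below imitates that contract with a hypothetical `read_json_with_schema` function, using converter callables as stand-ins for Spark data types; it illustrates the idea and is not how Spark implements it.

```python
import json
from datetime import datetime


def read_json_with_schema(json_lines, schema):
    """Parse one JSON document per line, keeping only the columns in `schema`.

    `schema` maps column name -> converter; missing or null columns become None,
    and columns that are not declared are dropped.
    """
    rows = []
    for line in json_lines:
        raw = json.loads(line)
        rows.append({
            name: (convert(raw[name]) if raw.get(name) is not None else None)
            for name, convert in schema.items()
        })
    return rows


# Illustrative schema and data: "label" is declared but absent, "y_0" is present but undeclared.
schema = {"machineID": int, "datetime": datetime.fromisoformat, "label": int}
lines = ['{"machineID": 16, "datetime": "2015-10-15T01:00:00", "y_0": 0}']
print(read_json_with_schema(lines, schema))
```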

In [20]:
input_list = json.loads(test_json)
input_rdd = sc.parallelize(input_list)
# todo: modify the next line 
input_df = spark.read.json(input_rdd)

In [21]:
input_list = json.loads(test_json)
input_rdd = sc.parallelize(input_list)
# Solution: pass the explicit schema to the reader
input_df = spark.read.schema(myschema).json(input_rdd)

Now, let's see whether you were successful.

In [23]:
print("This is the schema of the original data frame:")
df_test.printSchema()

print("This is the schema of our data frame after converting it to/from JSON:")
input_df.printSchema()

## End of lab

## Create a score file

The next step in creating a webservice is to write a score script that defines what the webservice does.

A typical score script has two methods defined:
- `init` is executed once, when the webservice is started
- `run` is executed every time a user sends data to the webservice to be scored
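The division of labour between the two functions can be sketched without Azure ML or Spark: `init` pays the start-up cost once and stores the result in a module-level global, and `run` only parses the request, scores it and returns JSON. In this sketch the "model" is a hypothetical threshold function standing in for `PipelineModel.load(...)`, and the input is simplified to a JSON list of numbers.

```python
import json

model = None  # filled in by init(), used by run()


def _threshold_model(x: float, threshold: float = 0.5) -> float:
    """Hypothetical stand-in for a trained model: 1.0 above the threshold, else 0.0."""
    return 1.0 if x > threshold else 0.0


def init():
    """Runs once at start-up; a real score script would load the model from disk here."""
    global model
    model = _threshold_model


def run(input_json: str) -> str:
    """Runs per request: parse the JSON input, score it, return a JSON string."""
    try:
        values = json.loads(input_json)
        preds = [str(model(v)) for v in values]
        return json.dumps({"result": ",".join(preds)})
    except Exception as e:  # report the problem to the caller instead of raising
        return json.dumps({"error": str(e)})


init()
print(run("[0.2, 0.9]"))  # {"result": "0.0,1.0"}
```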

Look at the score script below. Can you see where we made the changes related to explicitly providing the schema when reading JSON data?

There are several places:
1. Importing the modules for defining the schema
1. Defining a global variable for holding the schema
1. Defining the schema
1. Using the schema when reading the data

In [25]:
score_sparkml = """

import json

def init():
    # One-time initialization of PySpark and predictive model
    import pyspark
    from azureml.core.model import Model
    from pyspark.ml import PipelineModel
    from pyspark.sql.types import StructField, StructType
    from pyspark.ml.linalg import VectorUDT

    global trainedModel
    global spark
    global schema
    
    spark = pyspark.sql.SparkSession.builder.appName("ADB and AML notebook by an amazing data scientist").getOrCreate()
    model_name = "{model_name}" #interpolated
    try:
        model_path = Model.get_model_path(model_name)
        trainedModel = PipelineModel.load(model_path)
    except Exception as e:
        # keep the exception so that run() can return it to the caller
        trainedModel = e
    
    schema = StructType([StructField("norm_features", VectorUDT())])
    
def run(input_json):
    if isinstance(trainedModel, Exception):
        return json.dumps({{"trainedModel":str(trainedModel)}})
      
    try:
        sc = spark.sparkContext
        input_list = json.loads(input_json)
        input_rdd = sc.parallelize(input_list)
        input_df = spark.read.schema(schema).json(input_rdd)
        
        # Compute prediction
        prediction = trainedModel.transform(input_df)
        #result = prediction.first().prediction
        predictions = prediction.collect()

        #Get each scored result
        preds = [str(x['prediction']) for x in predictions]
        result = ",".join(preds)
        # you can return any data type as long as it is JSON-serializable
        return json.dumps({{"result":result}})        
    except Exception as e:
        result = str(e)
        return json.dumps({{"error":result}})
    
""".format(model_name=model_name)

exec(score_sparkml)

with open("score_sparkml.py", "w") as file:
    file.write(score_sparkml)
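The score script is generated with `str.format`, so the single-brace placeholder `{model_name}` is filled in, while every literal brace in the generated code has to be doubled (`{{` and `}}`). That is why the dictionaries in `run` are written as `{{"result":result}}`. A minimal sketch with a shortened, illustrative template:

```python
template = '''
model_name = "{model_name}"
def run(result):
    return {{"result": result}}
'''

# {model_name} is substituted; {{ and }} become literal braces.
source = template.format(model_name="PdM_logistic_regression.mml")
print(source)

namespace = {}
exec(source, namespace)  # compile the generated code, like exec(score_sparkml) above
print(namespace["model_name"], namespace["run"]("0.0"))
```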

Creating a webservice requires a Docker container in which to run our score script.

All of this can be done with the Azure ML Python SDK.

First we create a conda environment file, which makes sure that all the Python dependencies are installed in the Docker image. Then we build the container image.

In [27]:
from azureml.core.conda_dependencies import CondaDependencies 

myacienv = CondaDependencies.create(conda_packages=['scikit-learn','numpy','pandas']) #showing how to add libs as an example - not needed for this model.

with open("mydeployenv.yml","w") as f:
    f.write(myacienv.serialize_to_string())

In [28]:
with open("mydeployenv.yml","r") as f:
  print(f.read())

In [29]:
# this can take about 5 minutes to finish

service_name = "myaci"
image_name = 'myimage'
runtime = "spark-py" 
driver_file = "score_sparkml.py"
my_conda_file = "mydeployenv.yml"

# image creation
from azureml.core.image import ContainerImage
myimage_config = ContainerImage.image_configuration(execution_script = driver_file, 
                                    runtime = runtime, 
                                    conda_file = my_conda_file)

# Create container Image
myimage = ContainerImage.create(
  workspace=ws, 
  name=image_name,
  models = [mymodel],
  image_config = myimage_config)

myimage.wait_for_creation(show_output=True)

In [30]:
help(ContainerImage)

Now we create the actual webservice, using the Docker image that is stored in the Azure Container Registry. 

Before you continue, try to find your container image in the Azure portal.

In [32]:
# deploy to ACI
from azureml.core.webservice import AciWebservice, Webservice

myaci_config = AciWebservice.deploy_configuration(
    cpu_cores = 2, 
    memory_gb = 2, 
    tags = {'name':'Databricks Azure ML ACI'}, 
    description = 'This is for ADB and AML example. Azure Databricks & Azure ML SDK demo with ACI.',
    location='westus2')

In [33]:
help(azureml.core.webservice)

In [34]:
# Webservice creation
myservice = Webservice.deploy_from_image(
  workspace=ws, 
  name=service_name,
  image=myimage,
  deployment_config = myaci_config)

myservice.wait_for_deployment(show_output=True)

Let's see what we created above. Here is a summary.

In [36]:
print(myservice.serialize())

You can also print individual properties of your webservice, for example the URL used by the webservice.

In [38]:
#for using the Web HTTP API 
print(myservice.scoring_uri)
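Outside the SDK, any HTTP client can call `scoring_uri`. The stdlib sketch below assumes the service was deployed without key authentication (if keys are enabled, an `Authorization: Bearer <key>` header is also needed) and that, as with `myservice.run`, the request body is the same `test_json` string. `build_scoring_request` is a hypothetical helper, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri: str, payload: str) -> urllib.request.Request:
    """Build a POST request carrying a JSON payload for the scoring endpoint."""
    return urllib.request.Request(
        scoring_uri,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URI and payload; in the notebook you would pass myservice.scoring_uri and test_json.
req = build_scoring_request("http://example.invalid/score", json.dumps(["{\"machineID\": 16}"]))
print(req.get_method(), req.full_url)

# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```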

## Test Webservice

In [40]:
# We can use the test_json data we created above. 
myservice.run(input_data = test_json)
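The score script above answers with a JSON object: `{"result": ...}` with all predictions joined into one comma-separated string, or an `error` / `trainedModel` key when something went wrong. Depending on the SDK version, the value returned by `myservice.run` may be the JSON text or an already-decoded object, so the hypothetical parser below accepts both; it is a sketch under those assumptions.

```python
import json


def parse_scoring_response(response):
    """Turn the score script's output into a list of float predictions.

    Accepts the JSON text produced by run() or an already-decoded dict, and raises
    RuntimeError if the service reported a problem instead of a result.
    """
    body = json.loads(response) if isinstance(response, str) else response
    for key in ("error", "trainedModel"):
        if key in body:
            raise RuntimeError(f"scoring failed ({key}): {body[key]}")
    result = body["result"]
    return [float(p) for p in result.split(",")] if result else []


print(parse_scoring_response('{"result": "0.0,0.0,1.0"}'))  # [0.0, 0.0, 1.0]
```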

In [41]:
# Comment out the line below to keep the web service running
myservice.delete()

In [42]:
assert isinstance(myservice, azureml.core.webservice.aci.AciWebservice)

Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.