# Install the Azure ML SDK on your Azure Databricks Cluster

The `Azure Machine Learning Python SDK` is required to use the experimentation, model management, and model deployment capabilities of Azure Machine Learning services.

If your cluster is not already provisioned with the Azure Machine Learning Python SDK, you can add it by attaching the library described below.

To use this notebook in your own Databricks environment, create the following library with the [Create Library](https://docs.azuredatabricks.net/user-guide/libraries.html) interface in Azure Databricks and attach it to your cluster:

**azureml-sdk**
* Source: Upload Python Egg or PyPi
* PyPi Name: `azureml-sdk[databricks]`
* Select Install Library

Verify that the Azure ML SDK is installed on your cluster by running the following cell:

In [4]:
import azureml.core
azureml.core.VERSION

If you see a version number output in the above cell, your cluster is ready to go.
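
If the SDK is missing, the cell above fails with an `ImportError` rather than printing a version. A minimal sketch of a more forgiving check follows; the helper name `installed_version` is hypothetical and not part of the SDK.

```python
import importlib


def installed_version(module_name="azureml.core"):
    """Return the module's version string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # azureml.core exposes VERSION; many other packages use __version__ instead.
    return getattr(module, "VERSION", None) or getattr(module, "__version__", None)


version = installed_version()
if version is None:
    print("Azure ML SDK not found: attach the azureml-sdk[databricks] library to the cluster.")
else:
    print("Azure ML SDK version:", version)
```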

# Initialize Azure ML Workspace

In this notebook, you will use the Azure Machine Learning SDK to create a new Azure Machine Learning Workspace in your Azure Subscription.

Please specify the Azure subscription ID, resource group name, workspace name, and the region in which you want to create the Azure Machine Learning Workspace.

You can find your Azure subscription ID in the Azure Portal by selecting Subscriptions from the menu on the left.

For the `resource_group`, use the name of the resource group that contains your Azure Databricks Workspace. 

NOTE: If you provide a resource group name that does not exist, the resource group will be automatically created. This may or may not succeed in your environment, depending on the permissions you have on your Azure Subscription.

In [7]:
#Provide the Subscription ID of your existing Azure subscription
subscription_id = "e223f1b3-d19b-4cfa-98e9-bc9be62717bc"#"<your-azure-subscription-id>"

#Provide a name for the new Resource Group that will contain Azure ML related services 
resource_group = "LinoBigDataGroup"#"<resource-group-name>"

# Provide the name and region for the Azure Machine Learning Workspace that will be created
workspace_name = "lino-ml-workspace"#"<azure-ml-workspace-name>"
workspace_region = "eastus2"#'eastus2' # eastus, westcentralus, southeastasia, australiaeast, westeurope

In [8]:
#Provide the Subscription ID of your existing Azure subscription
subscription_id = "3cad1c3c-17f2-4845-85c1-be8dea7565e6"#"<your-azure-subscription-id>"

#Provide a name for the new Resource Group that will contain Azure ML related services 
resource_group = "ml_bricksasia"#"<resource-group-name>"

# Provide the name and region for the Azure Machine Learning Workspace that will be created
workspace_name = "ml-workbench-mcw"#"<azure-ml-workspace-name>"
workspace_region = "australiaeast"#'eastus2' # eastus, westcentralus, southeastasia, australiaeast, westeurope
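
Before creating the Workspace, it can help to confirm that no placeholder was left in place and that the subscription ID has the GUID format. The sketch below is one way to do this with the standard library only; the helper names and the example values are hypothetical.

```python
import uuid


def is_valid_subscription_id(value):
    """Return True if value parses as a GUID in canonical 8-4-4-4-12 form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


def find_unset_settings(settings):
    """Return the names of settings that are empty or still a <placeholder>."""
    return [name for name, value in settings.items()
            if not value or (value.startswith("<") and value.endswith(">"))]


# Hypothetical example values; substitute the variables defined above.
example_settings = {
    "subscription_id": "00000000-0000-0000-0000-000000000000",
    "resource_group": "<resource-group-name>",
    "workspace_name": "my-workspace",
    "workspace_region": "eastus2",
}
print(find_unset_settings(example_settings))
print(is_valid_subscription_id(example_settings["subscription_id"]))
```

This only checks the shape of the values. It does not confirm that the subscription or resource group exists in Azure.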

# Create an Azure ML Workspace

Run the following cell and follow the instructions printed in the output. 

You will see instructions that read:

`Performing interactive authentication. Please follow the instructions on the terminal.`

`To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code SOMECODE to authenticate.`

When you see this, open a new browser window and navigate to the provided URL. At the code prompt, enter the code provided (be sure to delete any trailing spaces).

Log in with the same credentials you use to access your Azure subscription.

Once you have authenticated, the output will continue.

When you see `Provisioning complete.` your Workspace has been created and you can move on to the next cell.

In [11]:
import azureml.core

# Import the Workspace class and print the azureml SDK version
from azureml.core import Workspace

print("Azure ML SDK version:", azureml.core.VERSION)

ws = Workspace.create(
    name = workspace_name,
    subscription_id = subscription_id,
    resource_group = resource_group, 
    location = workspace_region)

print("Provisioning complete.")
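
Re-running the cell above after the Workspace already exists can fail. One possible way around this, sketched below, is to pass `exist_ok=True` to `Workspace.create`. This assumes the installed SDK version supports that parameter. The wrapper name `get_or_create_workspace` is hypothetical, and the Workspace class is passed in as an argument so the sketch can be exercised without the SDK.

```python
def get_or_create_workspace(workspace_cls, name, subscription_id, resource_group, location):
    """Create the workspace, or return the existing one when it is already provisioned.

    workspace_cls is expected to be azureml.core.Workspace; it is a parameter
    only so that this helper can be tried out without the SDK installed.
    """
    return workspace_cls.create(
        name=name,
        subscription_id=subscription_id,
        resource_group=resource_group,
        location=location,
        exist_ok=True,  # assumption: supported by the installed SDK version
    )
```

In this notebook the call would look like `ws = get_or_create_workspace(Workspace, workspace_name, subscription_id, resource_group, workspace_region)`.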

# Persist the Workspace configuration

Run the following cells to retrieve the configuration of the deployed Workspace and persist it to local disk and then to the Databricks Filesystem.

In [14]:
import os
import shutil

ws = Workspace(
    workspace_name = workspace_name,
    subscription_id = subscription_id,
    resource_group = resource_group)

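# Persist the subscription ID, resource group name, and workspace name in aml_config/config.json.
aml_config = 'aml_config'
# Remove any stale copy first. shutil.rmtree only handles directories, so a plain file is removed separately.
if os.path.isdir(aml_config):
    shutil.rmtree(aml_config)
elif os.path.isfile(aml_config):
    os.remove(aml_config)
ws.write_config()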

Take a look at the contents of the generated configuration file by running the following cell:

In [16]:
%sh
cat /databricks/driver/aml_config/config.json
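
The configuration file is plain JSON, so other code can read it without the SDK. The sketch below assumes the file holds the keys `subscription_id`, `resource_group`, and `workspace_name`, matching the three values the cell above persists. The helper name and the sample file are hypothetical.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path):
    """Load a workspace config file and fail early if an expected key is missing."""
    with open(path) as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("config is missing: " + ", ".join(missing))
    return config


# Demonstrate with a throwaway file holding placeholder values.
sample = {
    "subscription_id": "<your-azure-subscription-id>",
    "resource_group": "<resource-group-name>",
    "workspace_name": "<azure-ml-workspace-name>",
}
sample_path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(sample_path, "w") as f:
    json.dump(sample, f, indent=4)

print(load_workspace_config(sample_path)["workspace_name"])
```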

Copy the config file to DBFS so that the other notebooks can load it:

In [18]:
#persist the config file to dbfs so that it can be used for the other notebooks.
aml_config_local = 'file:' + os.getcwd() + '/' + aml_config
aml_config_dbfs = '/dbfs/' + 'aml_config'

if os.path.isdir(aml_config_dbfs):
    shutil.rmtree(aml_config_dbfs)
    #dbutils.fs.rm(aml_config, recurse=True)
elif os.path.isfile(aml_config_dbfs):
    os.remove(aml_config_dbfs)

dbutils.fs.cp(aml_config_local, aml_config, recurse=True)
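
The cell above mixes two ways of addressing the same DBFS location. Python file APIs such as `os.path` and `shutil` go through the `/dbfs/` mount on the driver, while `dbutils.fs` takes DBFS paths (optionally with the `dbfs:` prefix) and uses `file:` for local driver paths. The helpers below are a small sketch for converting between the two forms. Their names are hypothetical, and they do string manipulation only.

```python
def to_fuse_path(dbfs_path):
    """Convert 'dbfs:/x' or '/x' into the '/dbfs/x' form used by local file APIs."""
    path = dbfs_path[len("dbfs:"):] if dbfs_path.startswith("dbfs:") else dbfs_path
    return "/dbfs/" + path.lstrip("/")


def to_dbfs_uri(fuse_path):
    """Convert '/dbfs/x' back into the 'dbfs:/x' form used by dbutils.fs calls."""
    if not fuse_path.startswith("/dbfs/"):
        raise ValueError("not a /dbfs/ path: " + fuse_path)
    return "dbfs:/" + fuse_path[len("/dbfs/"):]


print(to_fuse_path("dbfs:/aml_config"))
print(to_dbfs_uri("/dbfs/aml_config"))
```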

# Deploy model to Azure Container Instance (ACI)

In this notebook, you will deploy the best performing model you selected previously as a web service hosted in an Azure Container Instance (ACI).

In [21]:
import os
#import urllib
#import pandas as pd

from pyspark.ml import PipelineModel

## Copy the model from DBFS

You previously saved the model in DBFS, but to deploy it using Azure Machine Learning services, you will need to copy the model to local storage on the driver node.

Run the following cells to copy the model from DBFS to local and verify that you can load the model.

In [24]:
##NOTE: service deployment always gets the model from the current working dir. 
model_name = "flightDelayModel"
model_path_dbfs = "/flightDelayModel/"#os.path.join("/dbfs/models", model_name)
model_path_local = "file:" + os.getcwd() + "/" + model_name + "/"

print("copy model from dbfs {} to local {}".format(model_path_dbfs, model_path_local))
dbutils.fs.cp(model_path_dbfs, model_path_local, recurse=True)

In [25]:
%fs ls 

path,name,size
dbfs:/AdultCensus.mml/,AdultCensus.mml/,0
dbfs:/AdultCensusIncome.csv,AdultCensusIncome.csv,4007034
dbfs:/AdultCensusIncomeTest/,AdultCensusIncomeTest/,0
dbfs:/AdultCensusIncomeTrain/,AdultCensusIncomeTrain/,0
dbfs:/AdultCensus_runHistory.mml/,AdultCensus_runHistory.mml/,0
dbfs:/FileStore/,FileStore/,0
dbfs:/aml_config/,aml_config/,0
dbfs:/databricks/,databricks/,0
dbfs:/databricks-datasets/,databricks-datasets/,0
dbfs:/databricks-results/,databricks-results/,0


# Register the model with Azure Machine Learning

Begin by loading your Azure Machine Learning Workspace configuration from disk.

In [28]:
import azureml.core
from azureml.core.workspace import Workspace

#get the config file from dbfs
aml_config = '/aml_config'
dbutils.fs.cp(aml_config, 'file:'+os.getcwd()+aml_config, recurse=True)

ws = Workspace.from_config()

In the following, you register the model file with Azure Machine Learning (which saves a copy of the model in the cloud).

In [30]:
#Register the model
from azureml.core.model import Model
mymodel = Model.register(model_path = model_name, # this points to a local file or folder in the current working dir
                       model_name = model_name, # this is the name the model is registered with                 
                       description = "MCW Flight Delay Prediction Model",
                       workspace = ws)

print(mymodel.name, mymodel.description, mymodel.version)

## PoC Challenge
Can you show Margie's Travel the model you just registered in the `Azure Machine Learning service workspace` in the Azure Portal?

In [32]:
#Go into Azure Portal, locate Machine Learning service workspace -> look under Model
#flightDelayModel 1  Flight Delay Prediction Model  03/02/2019 2:40:02 AM GMT

# Create the scoring web service

When deploying models for scoring with Azure Machine Learning services, you need to define the code for a simple web service that will load your model and use it for scoring. By convention, this service has two methods: `init`, which loads the model, and `run`, which scores data using the loaded model.

This scoring service code will later be deployed inside of a specially prepared Docker container.
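
To make the `init`/`run` contract concrete before looking at the Spark version, here is a deliberately tiny stand-in that needs neither Spark nor the SDK. The toy model and its wind-speed threshold are invented for illustration and have nothing to do with the trained flight delay model.

```python
import json

model = None


def _toy_model(record):
    """Hypothetical stand-in for a trained model: flags high wind speeds."""
    return 1.0 if record.get("WindSpeed", 0) > 20 else 0.0


def init():
    """Load the model once, when the service starts."""
    global model
    model = _toy_model


def run(input_json):
    """Score a JSON array of records and return a JSON array of predictions."""
    try:
        records = json.loads(input_json)
        return json.dumps([{"prediction": str(model(r))} for r in records])
    except Exception as e:
        # Report the failure to the caller instead of raising inside the service.
        return json.dumps({"error": str(e)})


init()
print(run(json.dumps([{"WindSpeed": 25}, {"WindSpeed": 3}])))
```

The shape is the same as in the real script: `init` does the expensive one-time setup, and `run` takes a JSON string and returns a JSON string.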

In [35]:
#%%writefile score_sparkml.py
score_sparkml = """

import json

def init():
    try:
        # One-time initialization of PySpark and predictive model
        import pyspark
        from pyspark.ml import PipelineModel
        from azureml.core.model import Model
        
        global trainedModel
        global spark
        
        spark = pyspark.sql.SparkSession.builder.appName("Scoring").getOrCreate()
      
        model_name = "flightDelayModel" 
        
        model_path = Model.get_model_path(model_name)

        trainedModel = PipelineModel.load(model_path)

    except Exception as e:
        print("Exception in init: " + str(e))
        trainedModel = e

def run(input_df):
    response = ''    

    if isinstance(trainedModel, Exception):
        return json.dumps({"Exception": str(trainedModel)})

    try:
        print("received: " + input_df)
        
        sc = spark.sparkContext
      
        # Set inferSchema=true to prevent the float values from being seen as strings
        # which can later cause the VectorAssembler to throw an error: 'Data type StringType is not supported.'
        df = spark.read.option("inferSchema", "true").json(sc.parallelize([input_df]))
      
        #Get prediction results for the dataframe
        score = trainedModel.transform(df)
        predictions = score.collect()
        
        #Get each scored result (prediction and confidence)
        preds = [{"prediction":str(result['prediction']), "confidence":str(result['probability'])} for result in predictions]
        
        response = json.dumps(preds)
        
        print("response: " + str(response))
        
    except Exception as e:
        print("Exception in run: " + str(e))
        return (str(e))

    # Return results
    return response
    
"""

exec(score_sparkml)

with open("score_sparkml.py", "w") as file:
    file.write(score_sparkml)

Test the scoring script locally and confirm that it works as desired.

In [37]:
import json

# Create two records for testing the prediction
test_input1 = {"OriginAirportCode":"SAT","Month":5,"DayofMonth":5,"CRSDepHour":13,"DayOfWeek":7,"Carrier":"MQ","DestAirportCode":"ORD","WindSpeed":9,"SeaLevelPressure":30.03,"HourlyPrecip":0}

test_input2 = {"OriginAirportCode":"ATL","Month":2,"DayofMonth":5,"CRSDepHour":8,"DayOfWeek":4,"Carrier":"MQ","DestAirportCode":"MCO","WindSpeed":3,"SeaLevelPressure":31.03,"HourlyPrecip":0}

# Test init() in the local notebook
init()

# package the inputs into a JSON string and test run() in local notebook
test_inputs = [test_input1, test_input2] 
json_str_test_inputs = json.dumps(test_inputs)
run(json_str_test_inputs)
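
Field names in the test records are case-sensitive and not entirely uniform (`DayofMonth` next to `DayOfWeek`), so a misspelled key is an easy mistake to make. The sketch below checks each record's field names before serializing it. The field list is taken from the two test records above, and the helper name `build_payload` is hypothetical.

```python
import json

# Field names taken from the two test records above.
EXPECTED_FIELDS = {
    "OriginAirportCode", "Month", "DayofMonth", "CRSDepHour", "DayOfWeek",
    "Carrier", "DestAirportCode", "WindSpeed", "SeaLevelPressure", "HourlyPrecip",
}


def build_payload(records):
    """Serialize records to a JSON array after checking each one's field names."""
    for i, record in enumerate(records):
        missing = EXPECTED_FIELDS - set(record)
        extra = set(record) - EXPECTED_FIELDS
        if missing or extra:
            raise ValueError("record {}: missing={}, unexpected={}".format(
                i, sorted(missing), sorted(extra)))
    return json.dumps(records)
```

With this in place, `build_payload([test_input1, test_input2])` would produce the same string as `json.dumps(test_inputs)`, while a record keyed with `DayOfMonth` would raise a `ValueError` naming the offending field.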

# Create a Conda dependencies environment file

Your scoring service can have its dependencies installed by using a Conda environment file. Items listed in this file will be conda or pip installed within the Docker container that is created, and will thus be available to your scoring web service logic.

In [40]:
from azureml.core.conda_dependencies import CondaDependencies 

myacienv = CondaDependencies.create(conda_packages=['scikit-learn','numpy','pandas'])

with open("mydeployenv.yml","w") as f:
    f.write(myacienv.serialize_to_string())

# Deployment

In the following cells you will use the Azure Machine Learning SDK to package the model and scoring script in a container, and deploy that container to an Azure Container Instance.

Run the following cells.

Create a configuration of the ACI web service instance that provides the number of CPU cores, size of memory, a collection of tags and a description.

In [44]:
#http://www.linotadros.com/blogpostdetails/lino-tadros-blogs/2018/11/19/getting-the-ml-api-key-in-azure-databricks
#need to add auth_enabled=True when deploying to ACI

In [45]:
from azureml.core.webservice import AciWebservice, Webservice

aci_config = AciWebservice.deploy_configuration(
    cpu_cores = 1, 
    memory_gb = 1, 
    tags = {'name':'MCW Flight Delay Prediction'}, 
    description = 'Predicts if a flight will be delayed by 15 minutes or more.',
    auth_enabled=True)

Next, build up a container image configuration that names the scoring service script, the runtime (python or Spark), and provides the conda file.

In [47]:
service_name = "sparkmlservicedb01"
runtime = "spark-py" #"python" #
driver_file = "score_sparkml.py"
conda_file = "mydeployenv.yml"

from azureml.core.image import ContainerImage

image_config = ContainerImage.image_configuration(execution_script = driver_file,
                                                  runtime = runtime,
                                                  conda_file = conda_file)

Now you are ready to begin your deployment to the Azure Container Instance. 

Run the following cell. This may take **5 to 15 minutes** to complete.

You will see output similar to the following when your web service is ready:
`SucceededACI service creation operation finished, operation "Succeeded"`

In [49]:
webservice = Webservice.deploy_from_model(
  workspace=ws, 
  name=service_name, 
  deployment_config=aci_config,
  models = [mymodel], 
  image_config=image_config, 
  )

webservice.wait_for_deployment(show_output=True)

In [50]:
primary, secondary = webservice.get_keys()
print(primary)
print(secondary)

# Test the deployed service

Now you are ready to test scoring using the deployed web service.

Run the following cell to invoke the web service with the two test records created earlier.

In [53]:
webservice.run(input_data = json_str_test_inputs)
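
Because the service was deployed with `auth_enabled=True`, clients outside the SDK need to send one of the service keys with each request. The sketch below builds such a request with the standard library but does not send it. It assumes the key is passed as a bearer token in the `Authorization` header and that the endpoint accepts the same JSON body used above. The URI and key shown are placeholders, and the helper name is hypothetical.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, api_key, payload):
    """Build (but do not send) an authenticated POST request for the scoring endpoint."""
    return urllib.request.Request(
        scoring_uri,
        data=payload.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


# Placeholder values; in the notebook they would come from
# webservice.scoring_uri and webservice.get_keys().
request = build_scoring_request(
    "http://example.invalid/score", "<primary-key>", json.dumps([{"Month": 5}]))
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request).read()
```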

In [55]:
%fs ls /mnt/mlstorage

path,name,size
dbfs:/mnt/mlstorage/FlightsAndWeather/,FlightsAndWeather/,0


# Clean up

When you are finished experimenting with your deployed web service, you can also use the Azure Machine Learning Python SDK to delete the deployed service.

Run the following cell to clean up.

In [58]:
webservice.delete()

# You are done!

Congratulations, you have completed this team challenge!