# Predicting Car Battery Failure

Your goal in this notebook is to **predict how much time a car battery has left until it is expected to fail**. You are provided with training data that includes telemetry from different vehicles, along with the expected remaining battery life. From this data you will train a model that, given just the vehicle telemetry, predicts the expected battery life.

You will use compute resources provided by Azure Machine Learning (AML) to **remotely** train a **set** of models using **Automated Machine Learning**, evaluate the performance of each model, and pick the best-performing model to deploy as a web service hosted by **Azure Kubernetes Service**.

Because you will be using the Azure Machine Learning SDK, you will be able to provision all your required Azure resources directly from this notebook, without having to use the Azure Portal to create any resources.

## Setup
To begin, you will need to provide the following information about your Azure Subscription. 

In the following cell, be sure to set the values for `subscription_id`, `resource_group`, `workspace_name` and `workspace_region` as directed by the comments (*these values can be acquired from the Azure Portal*). Execute the following cell by selecting the `>|Run` button in the command bar above.

In [2]:
#Provide the Subscription ID of your existing Azure subscription
subscription_id = "30fc406c-c745-44f0-be2d-63b1c860cde0"

#Provide values for the new Resource Group and Workspace that will be created
resource_group = "aml-workspace-tech-immersion"
workspace_name = "aml-workspace"

#Optionally, set the Azure Region in which to deploy your Azure Machine Learning Workspace
workspace_region = "westcentralus" # other options include eastus, southeastasia, australiaeast, westeurope

In [3]:
# constants, you can leave these values as they are or experiment with changing them after you have completed the notebook once
experiment_name = 'automl-regression'
project_folder = './automl-regression'

# this is the URL to the CSV file containing the training data
data_url = "https://databricksdemostore.blob.core.windows.net/data/connected-car/training-formatted.csv"

# this is the URL to the CSV file containing a small set of test data
test_data_url = "https://databricksdemostore.blob.core.windows.net/data/connected-car/fleet-formatted.csv"

cluster_name = "cpucluster"
aks_cluster_name = 'my-aks-cluster'
aks_service_name = 'contoso-service'
resource_id = '/subscriptions/2a779d6f-0806-4359-a6e8-f1fd57bb5dd7/resourceGroups/devintersection-2018-aml-demo/providers/Microsoft.BatchAI/workspaces/devintersection-workspace/clusters/cpucluster1c848275bca'

### Import required packages

The Azure Machine Learning SDK provides a comprehensive set of capabilities that you can use directly within a notebook, including:
- Creating a **Workspace** that acts as the root object to organize all artifacts and resources used by Azure Machine Learning.
- Creating **Experiments** in your Workspace that capture versions of the trained model along with any desired model performance telemetry. Each time you train a model and evaluate its results, you can capture that run (model and telemetry) within an Experiment.
- Creating **Compute** resources that can be used to scale out model training, so that while your notebook may be running in a lightweight container in Azure Notebooks, your model training can actually occur on a powerful cluster that can provide large amounts of memory, CPU or GPU. 
- Using **Automated Machine Learning (AutoML)** to automatically train multiple versions of a model using a mix of different ways to prepare the data and different algorithms and hyperparameters (algorithm settings) in search of the model that performs best according to a performance metric that you specify. 
- Packaging a Docker **Image** that contains everything your trained model needs for scoring (prediction) in order to run as a web service.
- Deploying your Image to either Azure Kubernetes Service or Azure Container Instances, effectively hosting the **Web Service**.

In Azure Notebooks, all of the libraries needed for Azure Machine Learning are pre-installed. To use them, you just need to import them. Run the following cell to do so:

In [4]:
import logging
import os
import random
import re

from matplotlib import pyplot as plt
from matplotlib.pyplot import imshow
import numpy as np
import pandas as pd
from sklearn import datasets

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.core.compute import AksCompute, ComputeTarget
from azureml.core.webservice import Webservice, AksWebservice
from azureml.core.image import Image
from azureml.core.model import Model
from azureml.train.automl import AutoMLConfig
from azureml.train.automl.run import AutoMLRun

## Create and connect to an Azure Machine Learning Workspace

Run the following cell to create a new Azure Machine Learning **Workspace** and save the configuration to disk (next to the Jupyter notebook). 

**Important Note**: You will be prompted to log in by the text that is output below the cell. Be sure to navigate to the URL displayed and enter the code that is provided. Once you have entered the code, return to this notebook and wait for the output to read `Library configuration succeeded`.

In [5]:
# By using the exist_ok param, if the workspace already exists you get a reference to the existing workspace
# allowing you to re-run this cell multiple times as desired (which is fairly common in notebooks).
ws = Workspace.create(
    name = workspace_name,
    subscription_id = subscription_id,
    resource_group = resource_group, 
    location = workspace_region,
    exist_ok = True)

ws.write_config()
print('Library configuration succeeded')


Falling back to use azure cli credentials. This fall back to use azure cli credentials will be removed in the next release. 
Make sure your code doesn't require 'az login' to have happened before using azureml-sdk, except the case when you are specifying AzureCliAuthentication in azureml-sdk.


Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code FQ7PP66YA to authenticate.
Interactive authentication successfully completed.
Wrote the config file config.json to: /home/nbuser/library/aml_config/config.json
Library configuration succeeded


## Create a Workspace Experiment

Notice in the first line of the cell below, we can re-load the config we saved previously and then display a summary of the environment.

In [9]:
ws = Workspace.from_config()

# Display a summary of the current environment 
output = {}
output['SDK version'] = azureml.core.VERSION
output['Subscription ID'] = ws.subscription_id
output['Workspace'] = ws.name
output['Resource Group'] = ws.resource_group
output['Location'] = ws.location
output['Project Directory'] = project_folder
pd.set_option('display.max_colwidth', -1)
pd.DataFrame(data=output, index=['']).T

Found the config file in: /home/nbuser/library/aml_config/config.json


| | |
| --- | --- |
| Location | westcentralus |
| Project Directory | ./automl-regression |
| Resource Group | aml-workspace-tech-immersion |
| SDK version | 1.0.15 |
| Subscription ID | 30fc406c-c745-44f0-be2d-63b1c860cde0 |
| Workspace | aml-workspace |


Next, create a new Experiment. 

In [10]:
experiment = Experiment(ws, experiment_name)

## Get and explore the Vehicle Telemetry Data

Run the following cell to download and examine the vehicle telemetry data. The model you will build will try to predict how many days until the battery has a freeze event. Which features (columns) do you think will be useful?

In [19]:
data = pd.read_csv(data_url)
data

| | Survival_In_Days | Province | Region | Trip_Length_Mean | Trip_Length_Sigma | Trips_Per_Day_Mean | Trips_Per_Day_Sigma | Battery_Rated_Cycles | Manufacture_Month | Manufacture_Year | ... | Sensor_Reading_52 | Sensor_Reading_53 | Sensor_Reading_54 | Sensor_Reading_55 | Sensor_Reading_56 | Sensor_Reading_57 | Sensor_Reading_58 | Sensor_Reading_59 | Sensor_Reading_60 | Sensor_Reading_61 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1283 | Bretagne | West | 18.103250 | 6.034416 | 4.733162 | 1.183291 | 275 | M8 | Y2010 | ... | 16.418910 | 17.441310 | 24.718290 | 11.812310 | 19.437210 | 15.079740 | 16.982440 | 18.893610 | 13.590000 | 14.510940 |
| 1 | 1427 | Occitanie | South | 14.637070 | 4.879023 | 4.325950 | 1.081487 | 250 | M8 | Y2014 | ... | 14.703280 | 16.154500 | 27.789550 | 22.292230 | 29.158610 | 21.739530 | 23.830780 | 19.480210 | 10.264120 | 18.009700 |
| 2 | 1436 | Auvergne_Rhone_Alpes | South | 14.505640 | 4.835215 | 4.418737 | 1.104684 | 250 | M9 | Y2018 | ... | 22.389700 | 21.834420 | 28.743260 | 26.313940 | 15.589060 | 15.317560 | 19.613730 | 28.397800 | 19.807990 | 15.425770 |
| 3 | 894 | Martinique | West | 20.850520 | 6.950172 | 4.284968 | 1.071242 | 200 | M10 | Y2003 | ... | 2.794836 | 13.993500 | 15.524580 | 6.298875 | 11.355190 | 14.396860 | 2.890394 | 6.362495 | 10.916070 | 10.004320 |
| 4 | 1539 | Reunion | South | 11.579590 | 3.859862 | 4.561532 | 1.140383 | 200 | M10 | Y2007 | ... | 26.631860 | 26.116980 | 18.011900 | 25.257760 | 25.320780 | 26.894640 | 18.863220 | 25.744930 | 24.027720 | 23.657220 |
| 5 | 1872 | Marseille | South | 14.070980 | 4.690325 | 4.697100 | 1.174275 | 300 | M11 | Y2011 | ... | 11.889280 | 7.358676 | 10.700270 | 8.218617 | 13.397930 | 2.973648 | 11.031080 | 3.532511 | 12.841720 | 8.153067 |
| 6 | 151 | Ile_de_France | MidWest | 13.388510 | 4.462836 | 4.539887 | 1.134972 | 300 | M12 | Y2015 | ... | -13.361990 | 12.716510 | -25.999620 | -0.855164 | -19.726040 | 5.154581 | -9.921854 | -0.260530 | -20.260940 | 6.349902 |
| 7 | 1975 | Normandie | MidWest | 16.718670 | 5.572891 | 4.641222 | 1.160305 | 275 | M12 | Y2000 | ... | 21.916700 | 14.228060 | 11.378330 | -0.157791 | 13.303480 | 7.164655 | 10.716000 | 4.709601 | 19.316740 | -0.762613 |
| 8 | 1957 | Paris | MidWest | 12.280450 | 4.093483 | 4.417785 | 1.104446 | 275 | M1 | Y2005 | ... | -10.007310 | 5.053398 | -12.770300 | 4.404034 | -8.639040 | 3.357793 | -12.641930 | -0.040223 | -11.433320 | 4.471225 |
| 9 | 1150 | Corse | South | 19.615720 | 6.538573 | 4.318250 | 1.079563 | 250 | M2 | Y2009 | ... | 17.566970 | 0.334354 | 11.381730 | -9.591447 | 9.592852 | -2.911261 | 8.209939 | -2.191684 | 12.873960 | -3.348420 |


## Remotely train multiple models using Auto ML and Azure ML Compute

In the following cells, you will *not* train the model against the data you just downloaded using the resources provided by Azure Notebooks. Instead, you will deploy an Azure ML Compute cluster that will download the data and use Auto ML to train multiple models, evaluate the performance and allow you to retrieve the best model that was trained. In other words, all of the training will be performed remotely with respect to this notebook. 


As you will see, this is almost entirely done through configuration, with very little code required.

### Create the data loading script for remote compute

The Azure Machine Learning Compute cluster needs to know how to get the data to train against. You can package this logic in a script that will be executed by the compute when it starts executing the training.

Run the following cells to locally create the **get_data.py** script that will be deployed to remote compute. You will also use this script when you want to train the model locally.

Observe that the `get_data` method returns the features (`X`) and the labels (`y`) in a dictionary. This structure is expected later when you configure Auto ML.

In [20]:
# create project folder
if not os.path.exists(project_folder):
    os.makedirs(project_folder)

In [21]:
%%writefile $project_folder/get_data.py

import pandas as pd
import numpy as np

def get_data():
    
    data = pd.read_csv("https://databricksdemostore.blob.core.windows.net/data/connected-car/training-formatted.csv")
    
    X = data.iloc[:,1:73]
    Y = data.iloc[:,0].values.flatten()

    return { "X" : X, "y" : Y }

Overwriting ./automl-regression/get_data.py
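
To see what this structure looks like, the following minimal sketch applies the same kind of slicing to a tiny in-memory CSV. The three rows and the reduced set of columns are made up for illustration; only the layout (label in the first column, features after it) mirrors `get_data.py`.

```python
import io

import pandas as pd

# Hypothetical three-row sample; only the layout (label first) mirrors the real file.
sample_csv = io.StringIO(
    "Survival_In_Days,Region,Trip_Length_Mean\n"
    "1283,West,18.1\n"
    "1427,South,14.6\n"
    "894,West,20.9\n"
)
sample = pd.read_csv(sample_csv)

# Everything after the first column is a feature; the first column is the label.
X = sample.iloc[:, 1:]
y = sample.iloc[:, 0].values.flatten()

result = {"X": X, "y": y}
print(list(result["X"].columns))
print(result["y"].tolist())
```

Note that the dictionary keys are `"X"` (upper case) and `"y"` (lower case), matching what the script above returns.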


### Create AML Compute Cluster

Now you are ready to create the compute cluster. Run the following cell to create a new compute cluster (or retrieve the existing cluster if it already exists). The code below will create a *CPU based* cluster where each node in the cluster is of the size `STANDARD_D12_V2`, and the cluster will have at most *4* such nodes. 

In [22]:
### Create AML CPU based Compute Cluster
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D12_V2',
                                                           max_nodes=4)

    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_target.wait_for_completion(show_output=True)

# Use the 'status' property to get a detailed status for the current AmlCompute. 
print(compute_target.status.serialize())

Found existing compute target.
{'allocationState': 'Steady', 'allocationStateTransitionTime': '2019-03-07T18:20:32.447000+00:00', 'creationTime': '2019-03-07T16:13:20.574361+00:00', 'currentNodeCount': 0, 'errors': None, 'modifiedTime': '2019-03-07T16:14:52.676046+00:00', 'nodeStateCounts': {'idleNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0, 'preparingNodeCount': 0, 'runningNodeCount': 0, 'unusableNodeCount': 0}, 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'targetNodeCount': 0, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_D12_V2'}


### Instantiate an Automated ML Config

Run the following cell to configure the Auto ML run. In short, you are configuring the training of a regression model that will attempt to predict the value of the first column (`Survival_In_Days`) based on all the other columns (the features) in the data set. The run is configured to try at most 3 iterations, where no iteration can run longer than 2 minutes.

Additionally, the data will be automatically pre-processed in different ways as a part of the automated model training (as indicated by the `preprocess` attribute having a value of `True`). This is a very powerful feature of Auto ML, as it tries many best-practice approaches for you and saves you a lot of time and effort in the process.

The goal of Auto ML in this case is to find the best model, as measured by the normalized root mean squared error metric (as indicated by the `primary_metric` attribute). The error is basically a measure of how far the model's predictions are from what was provided as the "answer" in the training data. In short, AutoML will try to get the error as low as possible when trying its combination of approaches.
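
As a rough illustration of this metric, the sketch below computes a root mean squared error and normalizes it by the range of the true values. Dividing by the range (maximum minus minimum) is an assumption about how the normalization is defined, and the function name and sample numbers are hypothetical rather than taken from the data set.

```python
import math


def normalized_rmse(y_true, y_pred):
    """RMSE divided by the range of the true values (assumed normalization)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    value_range = max(y_true) - min(y_true)
    if value_range == 0:
        raise ValueError("true values must not all be identical")
    return rmse / value_range


# Hypothetical remaining-life values in days, not taken from the data set.
actual = [1000, 1500, 2000]
predicted = [1100, 1400, 2100]
score = normalized_rmse(actual, predicted)
print(round(score, 4))
```

Because the error is scaled by the spread of the labels, a value near 0 means the predictions are close to the answers relative to that spread.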

The local path to the script you created to retrieve the data is supplied to the AutoMLConfig, ensuring the file is made available to the remote cluster. The actual execution of this training will occur on the compute cluster you created previously. 

In general, the AutoMLConfig is very flexible, allowing you to specify all of the following:
- Task type (classification, regression, forecasting)
- Number of algorithm iterations and maximum time per iteration
- Accuracy metric to optimize
- Algorithms to blacklist (skip)/whitelist (include)
- Number of cross-validations
- Compute targets
- Training data

Run the following cell to create the configuration.

In [25]:
automl_config = AutoMLConfig(task = 'regression',
                             iterations = 3,
                             iteration_timeout_minutes = 2, 
                             max_cores_per_iteration = 10,
                             preprocess= True,
                             primary_metric='normalized_root_mean_squared_error',
                             n_cross_validations = 5,
                             debug_log = 'automl.log',
                             verbosity = logging.DEBUG,
                             data_script = project_folder + "/get_data.py",
                             path = project_folder)

## Run locally

You can run AutoML locally, in which case it will use the resources provided by your Azure Notebooks environment.

Run the following cell to run the experiment locally. Note this will take **a few minutes** because the local environment is not very fast or powerful (which we will remedy shortly).

In [26]:
local_run = experiment.submit(automl_config, show_output=True)
local_run

Running on local machine
Parent Run ID: AutoML_2b399222-b0d2-4662-90fa-eb0f15384180
*******************************************************************************************************************
ITERATION: The iteration being evaluated.
PIPELINE: A summary description of the pipeline being evaluated.
TRAINFRAC: Fraction of the training data to train on.
DURATION: Time taken for the current iteration.
METRIC: The result of computing score on the fitted pipeline.
BEST: The best observed score thus far.
*******************************************************************************************************************

 ITERATION   PIPELINE                                       TRAINFRAC  DURATION      METRIC      BEST
         0   StandardScalerWrapper ElasticNet               1.0000     0:00:59       0.1005    0.1005
         1   TruncatedSVDWrapper ElasticNet                 1.0000     0:00:32       0.1202    0.1005
         2   Ensemble                                       1.0000

| Experiment | Id | Type | Status | Details Page | Docs Page |
| --- | --- | --- | --- | --- | --- |
| automl-regression | AutoML_2b399222-b0d2-4662-90fa-eb0f15384180 | automl | Completed | Link to Azure Portal | Link to Documentation |
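
Since a lower error is better for this metric, the BEST column in the iteration log is a running minimum of the METRIC column. A minimal sketch, using the two METRIC values reported in the local run output above:

```python
from itertools import accumulate

# METRIC values copied from the first two iterations of the local run output.
metrics = [0.1005, 0.1202]

# For an error metric, "best so far" is the running minimum.
best_so_far = list(accumulate(metrics, min))
print(best_so_far)
```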


### Run our Experiment on AML Compute

Let's increase the performance by performing the training on the AML Compute cluster. This will remotely train multiple models, evaluate them, and allow you to review the performance characteristics of each one, as well as to pick the *best model* that was trained and download it.

We will alter the configuration slightly to perform more iterations. Run the following cell to execute the experiment on the remote compute cluster.

In [27]:
automl_config = AutoMLConfig(task = 'regression',
                             iterations = 4,
                             iteration_timeout_minutes = 10, 
                             max_cores_per_iteration = 10,
                             preprocess= True,
                             primary_metric='normalized_root_mean_squared_error',
                             n_cross_validations = 5,
                             debug_log = 'automl.log',
                             verbosity = logging.DEBUG,
                             data_script = project_folder + "/get_data.py",
                             compute_target = compute_target,
                             path = project_folder)
remote_run = experiment.submit(automl_config, show_output=False)
remote_run

| Experiment | Id | Type | Status | Details Page | Docs Page |
| --- | --- | --- | --- | --- | --- |
| automl-regression | AutoML_c48adbe5-e51d-4564-a99e-ebb6fc1c9929 | automl | Preparing | Link to Azure Portal | Link to Documentation |


When the above cell returns, the run has been submitted but will likely still have a status of `Preparing`. To wait for the run to complete before continuing (and to view the training status updates as they happen), run the following cell:

In [28]:
remote_run.wait_for_completion(show_output=True)


*******************************************************************************************************************
ITERATION: The iteration being evaluated.
PIPELINE: A summary description of the pipeline being evaluated.
TRAINFRAC: Fraction of the training data to train on.
DURATION: Time taken for the current iteration.
METRIC: The result of computing score on the fitted pipeline.
BEST: The best observed score thus far.
*******************************************************************************************************************

 ITERATION   PIPELINE                                       TRAINFRAC  DURATION      METRIC      BEST
         0   StandardScalerWrapper ElasticNet               1          0:01:00       0.1005    0.1005
         1   TruncatedSVDWrapper ElasticNet                 1          0:00:47       0.1199    0.1005
         2   StandardScalerWrapper ElasticNet               1          0:01:03       0.1270    0.1005
         3   StandardScalerWrapper ElasticNet  

{'runId': 'AutoML_c48adbe5-e51d-4564-a99e-ebb6fc1c9929',
 'target': 'cpucluster',
 'status': 'Completed',
 'startTimeUtc': '2019-03-08T22:47:15.401759Z',
 'endTimeUtc': '2019-03-08T22:56:34.246012Z',
 'properties': {'num_iterations': '10',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'normalized_root_mean_squared_error',
  'train_split': '0',
  'MaxTimeSeconds': '600',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'cpucluster',
  'DataPrepJsonString': None,
  'EnableSubsampling': 'False',
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'dependencies_versions': '{"azureml-widgets": "1.0.15", "azureml-train": "1.0.15", "azureml-train-restclients-hyperdrive": "1.0.15", "azureml-train-core": "1.0.15", "azureml-train-automl": "1.0.15", "azureml-telemetry": "1.0.15", "azureml-sdk": "1.0.15.1", "azureml-pipeline": "1.0.15", "azureml-pipeline-steps": "1.0.15", "azureml-pipeline-core": "1.0.15", "azureml-explain-m

### List the Experiments from your Workspace

Using the Azure Machine Learning SDK, you can retrieve any of the experiments in your Workspace and drill into the details of any runs the experiment contains. Run the following cell to explore the number of runs by experiment name.

In [32]:
ws = Workspace.from_config()
experiment_list = Experiment.list(workspace=ws)

summary_df = pd.DataFrame(index = ['No of Runs'])
pattern = re.compile('^AutoML_[^_]*$')
for experiment in experiment_list:
    all_runs = list(experiment.get_runs())
    automl_runs = []
    for run in all_runs:
        if(pattern.match(run.id)):
            automl_runs.append(run)    
    summary_df[experiment.name] = [len(automl_runs)]
    
pd.set_option('display.max_colwidth', -1)
summary_df.T

Found the config file in: /home/nbuser/library/aml_config/config.json


| | No of Runs |
| --- | --- |
| automl-regression | 6 |
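
The pattern `^AutoML_[^_]*$` used above keeps only parent runs. Child (per-iteration) run IDs carry an extra `_<iteration>` suffix, so they fail the match. The sketch below checks this against one parent ID and one child ID that appear in this notebook's output:

```python
import re

pattern = re.compile(r'^AutoML_[^_]*$')

# Run IDs copied from this notebook's output: one parent run and one child run.
parent_id = "AutoML_c48adbe5-e51d-4564-a99e-ebb6fc1c9929"
child_id = "AutoML_c48adbe5-e51d-4564-a99e-ebb6fc1c9929_9"

for run_id in (parent_id, child_id):
    kind = "parent" if pattern.match(run_id) else "child"
    print(run_id, "->", kind)
```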


### List the Automated ML Runs for the Experiment

Similarly, you can view all of the Automated ML runs in the experiment:

In [33]:
proj = ws.experiments[experiment_name]
summary_df = pd.DataFrame(index = ['Type', 'Status', 'Primary Metric', 'Iterations', 'Compute', 'Name'])
pattern = re.compile('^AutoML_[^_]*$')
all_runs = list(proj.get_runs(properties={'azureml.runsource': 'automl'}))
for run in all_runs:
    if(pattern.match(run.id)):
        properties = run.get_properties()
        tags = run.get_tags()
        amlsettings = eval(properties['RawAMLSettingsString'])
        if 'iterations' in tags:
            iterations = tags['iterations']
        else:
            iterations = properties['num_iterations']
        summary_df[run.id] = [amlsettings['task_type'], run.get_details()['status'], properties['primary_metric'], iterations, properties['target'], amlsettings['name']]
    
from IPython.display import HTML
projname_html = HTML("<h3>{}</h3>".format(proj.name))

from IPython.display import display
display(projname_html)
display(summary_df.T)

| | Type | Status | Primary Metric | Iterations | Compute | Name |
| --- | --- | --- | --- | --- | --- | --- |
| AutoML_c48adbe5-e51d-4564-a99e-ebb6fc1c9929 | regression | Completed | normalized_root_mean_squared_error | 10 | cpucluster | automl-regression |
| AutoML_2b399222-b0d2-4662-90fa-eb0f15384180 | regression | Completed | normalized_root_mean_squared_error | 3 | local | automl-regression |
| AutoML_4fde9917-479a-4531-ab0e-b27dea36ea90 | regression | Canceled | normalized_root_mean_squared_error | 10 | local | automl-regression |
| AutoML_c68492de-f8c0-4cd0-aafc-0d4acf106943 | regression | Completed | normalized_root_mean_squared_error | 10 | cpucluster | automl-regression |
| AutoML_d7ceb6dc-2572-4be3-b284-1712fffd47c2 | regression | Completed | normalized_root_mean_squared_error | 10 | local | automl-regression |
| AutoML_f85a80f5-dd65-4a53-881c-f2fad4a2ae22 | regression | Completed | normalized_root_mean_squared_error | 10 | local | automl-regression |


### Display Automated ML Run Details
For a particular run, you can display the details of how the run performed against the performance metric. The Azure Machine Learning SDK includes a built-in widget that graphically summarizes the run.

Execute the following cell to see it.

In [34]:
run_id = remote_run.id

from azureml.widgets import RunDetails

experiment = Experiment(ws, experiment_name)
ml_run = AutoMLRun(experiment=experiment, run_id=run_id)

RunDetails(ml_run).show() 

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': True, 'log_level': 'INFO', 'sd…

### Get the best run and the trained model

At this point you have multiple runs, each with a different trained model. How can you get the model that performed the best? Run the following cells to learn how.

In [35]:
best_run, fitted_model = remote_run.get_output()
print(best_run)
print(fitted_model)

Run(Experiment: automl-regression,
Id: AutoML_c48adbe5-e51d-4564-a99e-ebb6fc1c9929_9,
Type: azureml.scriptrun,
Status: Completed)
Pipeline(memory=None,
     steps=[('datatransformer', DataTransformer(logger=None, task=None)), ('prefittedsoftvotingregressor', PreFittedSoftVotingRegressor(estimators=[('LightGBM', Pipeline(memory=None,
     steps=[('standardscalerwrapper', <automl.client.core.common.model_wrappers.StandardScalerWrapper object at 0x7f0b773dd5f8>), ('lightgbmregressor', <automl.client.core.common.model_wrappers.LightGBMRegressor object at 0x7f0b773dd978>)]))],
               flatten_transform=None, weights=[1.0]))])


You can query for the best run when evaluated using a specific metric. 

In [36]:
# show run and model by a specific metric
lookup_metric = "root_mean_squared_error"
best_run, fitted_model = remote_run.get_output(metric = lookup_metric)
print(best_run)
print(fitted_model)

Run(Experiment: automl-regression,
Id: AutoML_c48adbe5-e51d-4564-a99e-ebb6fc1c9929_9,
Type: azureml.scriptrun,
Status: Completed)
Pipeline(memory=None,
     steps=[('datatransformer', DataTransformer(logger=None, task=None)), ('prefittedsoftvotingregressor', PreFittedSoftVotingRegressor(estimators=[('LightGBM', Pipeline(memory=None,
     steps=[('standardscalerwrapper', <automl.client.core.common.model_wrappers.StandardScalerWrapper object at 0x7f0b77e7a8d0>), ('lightgbmregressor', <automl.client.core.common.model_wrappers.LightGBMRegressor object at 0x7f0b77f6ffd0>)]))],
               flatten_transform=None, weights=[1.0]))])


You can retrieve a specific iteration from a run.

In [37]:
# show run and model from iteration 3
iteration = 3
third_run, third_model = remote_run.get_output(iteration=iteration)
print(third_run)
print(third_model)

Run(Experiment: automl-regression,
Id: AutoML_c48adbe5-e51d-4564-a99e-ebb6fc1c9929_3,
Type: azureml.scriptrun,
Status: Completed)
Pipeline(memory=None,
     steps=[('datatransformer', DataTransformer(logger=None, task=None)), ('standardscalerwrapper', <automl.client.core.common.model_wrappers.StandardScalerWrapper object at 0x7f0b7725c518>), ('elasticnet', ElasticNet(alpha=0.2113157894736842, copy_X=True, fit_intercept=True,
      l1_ratio=0.7394736842105263, max_iter=1000, normalize=False,
      positive=False, precompute=False, random_state=None,
      selection='cyclic', tol=0.0001, warm_start=False))])


At this point you now have a model you could use for predicting the time until battery failure. You would typically use this model in one of two ways:
- Use the model file within other notebooks to batch score predictions.
- Deploy the model file as a web service that applications can call. 

In the following, you will explore the latter option to deploy the best model as a web service.

## Download the best model 
With a run object in hand, it is trivial to download the model. 

In [38]:
# fetch the best model
best_run.download_file("outputs/model.pkl",
                       output_file_path = "./model.pkl")

## Deploy the Model as a Web Service

Azure Machine Learning provides a Model Registry that acts like a version-controlled repository for each of your trained models. To version a model, you register it using the SDK. Run the following cell to register the best model with Azure Machine Learning.

In [39]:
# register the model for deployment
model = Model.register(model_path = "model.pkl",
                       model_name = "model.pkl",
                       tags = {'area': "auto", 'type': "regression"},
                       description = "Contoso Auto model to predict battery failure",
                       workspace = ws)

print(model.name, model.description, model.version)

Registering model model.pkl
model.pkl Contoso Auto model to predict battery failure 3


Once you have a model added to the registry in this way, you can deploy web services that pull their model directly from this repository when they first start up.

### Create Scoring File

Azure Machine Learning SDK gives you control over the logic of the web service, so that you can define how it retrieves the model and how the model is used for scoring. This is an important bit of flexibility. For example, you often have to prepare any input data before sending it to your model for scoring. You can define this data preparation logic (as well as the model loading approach) in the scoring file. 

Run the following cell to create a scoring file that will be included in the Docker Image that contains your deployed web service.

In [40]:
%%writefile score.py
import pickle
import json
import numpy
import pandas as pd
import azureml.train.automl
from sklearn.externals import joblib
from azureml.core.model import Model

def init():
    global model
    model_path = Model.get_model_path('model.pkl') # the name under which the model was registered
    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)

def run(rawdata):
    try:
        data = pd.read_json(rawdata,orient="split")
        result = model.predict(data)
    except Exception as e:
        result = str(e)
        return json.dumps({"error": result})
    return json.dumps({"result":result.tolist()})

Overwriting score.py
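
Because `run` parses its input with `orient="split"`, a client has to send JSON in that layout: a `columns` list, an `index` list, and a `data` list of rows. The sketch below builds such a payload with the standard library only. The two feature columns and the reply values are placeholders for illustration; a real request would need every feature column the model was trained on.

```python
import json

# Placeholder subset of feature names and values, for illustration only.
columns = ["Trip_Length_Mean", "Trips_Per_Day_Mean"]
rows = [
    [18.10325, 4.733162],
    [14.63707, 4.32595],
]

# "split" orientation: column labels, row labels, and row values kept separate.
payload = json.dumps({
    "columns": columns,
    "index": list(range(len(rows))),
    "data": rows,
})
print(payload)

# The scoring script replies with a JSON string holding either "result" or "error".
example_reply = json.dumps({"result": [1000.0, 1200.0]})  # hypothetical values
reply = json.loads(example_reply)
predictions = reply.get("result")
print(predictions)
```

Checking for the `error` key before reading `result` is a reasonable habit on the client side, since `run` returns exceptions as text rather than raising them.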


### Create Environment Dependency File

When you deploy a model as a web service to either Azure Container Instances or Azure Kubernetes Service, you are deploying a Docker container. The first steps towards deploying involve defining the contents of that container. In the following cell, you create a Conda dependencies YAML file that describes which Python packages need to be installed in the container. In this case, you specify scikit-learn, numpy, pandas and the Azure ML SDK.

Execute the following cell. The output will show the conda dependencies file content.

In [41]:
from azureml.core.conda_dependencies import CondaDependencies 

myenv = CondaDependencies.create(conda_packages=['numpy','pandas','scikit-learn'],pip_packages=['azureml-sdk[notebooks,automl]'])
print(myenv.serialize_to_string())

with open("myenv.yml","w") as f:
    f.write(myenv.serialize_to_string())

# Conda environment specification. The dependencies defined in this file will
# be automatically provisioned for runs with userManagedDependencies=False.

# Details about the Conda environment file format:
# https://conda.io/docs/user-guide/tasks/manage-environments.html#create-env-file-manually

name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
- python=3.6.2

- pip:
  - azureml-sdk[notebooks,automl]==1.0.15
- numpy
- pandas
- scikit-learn



### Create Container Image

To create a Container Image, you need three things: the scoring script file, the runtime configuration (defining whether Python or PySpark should be used) and the Conda Dependencies file. Calling `ContainerImage.image_configuration` will capture all of the container image configuration in a single object. This Image will be stored in an instance of Azure Container Registry that is associated with your Azure Machine Learning Workspace. 

Execute the following cell.

In [77]:
from azureml.core.image import ContainerImage

image_config = ContainerImage.image_configuration(execution_script = "score.py",
                                                  runtime = "python",
                                                  conda_file = "myenv.yml",
                                                  description = "Image with regression model",
                                                  tags = {'area': "auto", 'type': "regression"}
                                                 )

image = ContainerImage.create(name = "contosoimage",
                              # this is the model object
                              models = [model],
                              image_config = image_config,
                              workspace = ws)

image.wait_for_creation(show_output = True)

Creating image
Running............................................................
SucceededImage creation operation finished for image contosoimage:2, operation "Succeeded"


### Create AKS Compute Cluster

With the Container Image created, you are almost ready to deploy to AKS. The first step is to create your AKS cluster.

Execute the following cell to provision a small AKS cluster. This step will take about **15-20 minutes**.

In [44]:
%%time

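# AksCompute and ComputeTarget come from the Azure ML SDK compute module
from azureml.core.compute import AksCompute, ComputeTarget

# Use the default configuration (can also provide parameters to customize)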
prov_config = AksCompute.provisioning_configuration(location='westus2')

# Create the cluster
aks_target = ComputeTarget.create(workspace = ws, 
                          name = aks_cluster_name, 
                          provisioning_configuration = prov_config)

aks_target.wait_for_completion(True)
print("state:" + aks_target.provisioning_state)
print(aks_target.provisioning_errors)

SucceededProvisioning operation finished, operation "Succeeded"
state:Succeeded
None
CPU times: user 995 ms, sys: 863 ms, total: 1.86 s
Wall time: 3.18 s


### Deploy AKS Hosted Web Service

Now you are ready to deploy your web service to the AKS cluster. To deploy the container that operationalizes your model as a web service, you can use `Webservice.deploy_from_image`, which pulls your registered Docker image from the Container Registry and runs the resulting container in AKS.

Notice that in the creation of `aks_config`, `collect_model_data` and `enable_app_insights` are both enabled. Model data can be collected for the purposes of monitoring the model in production: the inputs to the model and the predictions that result are logged to Azure Storage Blobs. The model data can be analyzed using Power BI, Azure Databricks, or any other tool that can process CSV files. This lets you keep tabs on the model in production and spot signals that might mean you need to retrain the model. Application Insights will collect diagnostic telemetry such as request rates, response times, failure rates and errors. This telemetry can be monitored and explored using the Application Insights instance accessed within the Azure Portal.

Execute the following cell to deploy your web service to AKS. This step will take **2-3 minutes** to complete.

In [85]:
%%time
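# AksWebservice and Webservice come from the Azure ML SDK webservice module
from azureml.core.webservice import AksWebservice, Webservice

aks_service_name = 'aks-automl-service'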

aks_config = AksWebservice.deploy_configuration(collect_model_data=True, enable_app_insights=True)
aks_service = Webservice.deploy_from_image(workspace = ws, 
                                           name = aks_service_name,
                                           image = image,
                                           deployment_config = aks_config,
                                           deployment_target = aks_target
                                           )
aks_service.wait_for_deployment(show_output = True)
print(aks_service.state)

Creating service
Running....................
SucceededAKS service creation operation finished, operation "Succeeded"
Healthy
CPU times: user 1.61 s, sys: 607 ms, total: 2.21 s
Wall time: 2min 15s
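
Because the collected model data is logged as CSV, one simple retraining signal is whether the average logged prediction has moved away from what was seen at training time. The following is a minimal sketch under stated assumptions: the `prediction` column name, the sample values and the baseline statistics are all made up for illustration, and the actual layout of the collected files may differ.

```python
import csv
import io
import statistics


def prediction_shift(csv_text, column, baseline_mean, baseline_stdev):
    """Return the shift of the logged mean prediction, in baseline standard deviations."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in reader]
    if not values:
        raise ValueError("no logged predictions found")
    return (statistics.mean(values) - baseline_mean) / baseline_stdev


# Made-up log extract; the real column layout of the collected CSV files may differ.
sample_log = "prediction\n1500.0\n1600.0\n1700.0\n"
shift = prediction_shift(sample_log, "prediction",
                         baseline_mean=1500.0, baseline_stdev=100.0)
print(f"mean prediction moved by {shift:.2f} baseline standard deviations")
```

A large shift does not prove that the model is wrong, but it is a cheap prompt to take a closer look at the logged inputs.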


### Test the deployed web service

With the web service deployed, you can now test it by calling the service with some car telemetry and inspecting the scored results. There are three ways to approach this:
1. You could use the `aks_service` object that you acquired in the previous cell to call the service directly.
2. You could use the `Webservice` class to get a reference to a deployed web service by name.
3. You could use any client capable of making a REST call.

In this notebook, you will take the second approach. Run the following cell to retrieve the web service by name and then invoke it using some sample car telemetry.

The output of this cell will be a JSON string whose `result` field is an array of numbers, where each number represents the expected battery lifetime in days for the corresponding row of vehicle data.

In [47]:
%%time
# connect to the deployed webservice
aks_service_name = 'aks-automl-service'
aks_service = Webservice(ws,aks_service_name)

# load some test vehicle data that the model has not seen
test_data = pd.read_csv(test_data_url)

# prepare the data and select five vehicles
test_data = test_data.drop(columns=["Car_ID", "Battery_Age"])
test_data.rename(columns={'Twelve_hourly_temperature_forecast_for_next_31_days_reversed': 'Twelve_hourly_temperature_history_for_last_31_days_before_death_last_recording_first'}, inplace=True)
test_data_json = test_data.iloc[:5].to_json(orient="split")
prediction = aks_service.run(input_data = test_data_json)
print(prediction)

{"result": [1577.0654482150553, 1299.0697260622546, 1523.4073833156306, 1472.6954902231262, 1625.8477164874794]}
CPU times: user 1.26 s, sys: 1.39 s, total: 2.65 s
Wall time: 3.16 s
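
The third approach listed above, calling the service from any REST client, could look roughly like the following sketch, which uses only the Python standard library. The three helper functions are hypothetical names introduced for illustration. The sketch assumes the service uses key-based authentication, with the key sent as a `Bearer` token. It also assumes the response body may be JSON-encoded twice, because `score.py` already returns a JSON string.

```python
import json
import urllib.request


def build_split_payload(columns, rows):
    """Serialize rows in the layout pandas emits for to_json(orient="split")."""
    return json.dumps({
        "columns": list(columns),
        "index": list(range(len(rows))),
        "data": [list(row) for row in rows],
    })


def build_scoring_request(scoring_uri, api_key, payload):
    """Create a POST request carrying the JSON payload and the service key."""
    return urllib.request.Request(
        scoring_uri,
        data=payload.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


def parse_scoring_response(body):
    """Return the list of predictions, or raise if the service reported an error."""
    decoded = json.loads(body)
    # score.py returns a JSON string, so the HTTP body may be encoded twice.
    if isinstance(decoded, str):
        decoded = json.loads(decoded)
    if "error" in decoded:
        raise RuntimeError(decoded["error"])
    return decoded["result"]


# Usage inside the notebook (not executed here because it needs the live service):
# request = build_scoring_request(aks_service.scoring_uri,
#                                 aks_service.get_keys()[0],
#                                 test_data_json)
# with urllib.request.urlopen(request) as response:
#     predictions = parse_scoring_response(response.read().decode("utf-8"))
```

`parse_scoring_response` mirrors the two shapes that `score.py` can return, `{"result": [...]}` and `{"error": ...}`, so a failed scoring call surfaces as an exception rather than as a silently ignored payload.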
