Copyright (c) Microsoft Corporation. All rights reserved.  
Licensed under the MIT License.


# Azure Machine Learning Pipeline with AutoMLStep
This notebook demonstrates the use of AutoMLStep in Azure Machine Learning Pipeline.

## Introduction
In this example, we show how to use an AzureML Dataset to load data for AutoML through an AML Pipeline.

The general steps for operationalizing machine learning are:
1. Create an `Experiment` in an existing `Workspace`.
2. Create or attach an existing AmlCompute cluster to the workspace.
3. Define data loading in a `TabularDataset`.
4. Configure AutoML using `AutoMLConfig`.
5. Configure an `AutoMLStep`.
6. Train the model using AmlCompute.
7. Explore the results.
8. Test the best fitted model.
9. Deploy the model to a REST endpoint.
10. Add logging to the endpoint.
11. Check the Swagger documentation for the endpoint.
12. Benchmark the endpoint.
13. Publish the pipeline and run it from the REST endpoint.

## Azure Machine Learning and Pipeline SDK-specific imports

In [1]:
# %load_ext lab_black

import logging
import os
import csv

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets
import pkg_resources

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.core.dataset import Dataset

from azureml.pipeline.steps import AutoMLStep

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.19.0


## Initialize Workspace
Initialize a workspace object from the persisted configuration. Make sure the config file is present at `./config.json`.
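`Workspace.from_config()` reads a small JSON file containing the keys `subscription_id`, `resource_group`, and `workspace_name`. The hypothetical helpers below are a rough, standard-library-only sketch of how such a file can be located and validated. They assume the lookup checks the starting directory and its parents for either `config.json` or `.azureml/config.json`. This is an illustration, not the SDK's own code.

```python
import json
from pathlib import Path
from typing import Optional

REQUIRED_KEYS = {"subscription_id", "resource_group", "workspace_name"}


def find_workspace_config(start: Path) -> Optional[Path]:
    """Return the first config.json or .azureml/config.json found in `start`
    or any of its parent directories (hypothetical helper)."""
    for directory in [start, *start.parents]:
        for candidate in (directory / "config.json", directory / ".azureml" / "config.json"):
            if candidate.is_file():
                return candidate
    return None


def load_workspace_config(start: Path) -> dict:
    """Load the nearest workspace config and check that the required keys exist."""
    path = find_workspace_config(start)
    if path is None:
        raise FileNotFoundError(f"No config.json found in {start} or its parents")
    config = json.loads(path.read_text())
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise KeyError(f"{path} is missing {sorted(missing)}")
    return config
```

If the notebook fails with a "config file not found" style error, calling `load_workspace_config(Path.cwd())` is a quick way to see which file, if any, is being picked up.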

In [2]:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep="\n")

quick-starts-ws-132681
aml-quickstarts-132681
southcentralus
1b944a9b-fdae-4f97-aeb1-b7eea0beac53


## Create an Azure ML experiment
Let's create an experiment named "bank-marketing-prediction-automl". The script runs will be recorded under this experiment in Azure.

Best practice is to give each step its own folder for its script and dependent files, and to specify that folder as the step's `source_directory`. This keeps the snapshot created for the step small, because only that folder is snapshotted. Any change to a file in the `source_directory` triggers a re-upload of the snapshot. Isolating each step's files therefore also lets the step be reused when nothing in its own `source_directory` has changed.

*Udacity Note:* There is no need to create a new Azure ML experiment; reuse the experiment that was already created. `Experiment(ws, experiment_name)` returns the existing experiment when one with that name already exists.
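To make the reuse argument concrete, here is a hypothetical sketch that fingerprints a step's folder by hashing each file's relative path and contents. It is not AzureML's snapshot implementation. It only shows why keeping each step in its own folder matters: editing one step leaves another step's fingerprint, and therefore its eligibility for reuse, unchanged.

```python
import hashlib
from pathlib import Path


def directory_fingerprint(source_directory: Path) -> str:
    """Return a SHA-256 digest over every file's relative path and bytes.

    Files are visited in sorted order so the digest is deterministic, and each
    file's length is hashed before its contents to keep boundaries unambiguous.
    """
    digest = hashlib.sha256()
    for path in sorted(p for p in source_directory.rglob("*") if p.is_file()):
        data = path.read_bytes()
        digest.update(path.relative_to(source_directory).as_posix().encode("utf-8") + b"\0")
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

With `step_a/` and `step_b/` as separate folders, rewriting a file in `step_b` does not change `directory_fingerprint(step_a)`. If both steps shared one folder, every edit would invalidate both.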


In [3]:
# Choose a name for the run history container in the workspace.
# NOTE: update these to match your existing experiment name
experiment_name = "bank-marketing-prediction-automl"

experiment = Experiment(ws, experiment_name)
experiment

| Name | Workspace | Report Page | Docs Page |
|---|---|---|---|
| bank-marketing-prediction-automl | quick-starts-ws-132681 | Link to Azure Machine Learning studio | Link to Documentation |


### Create or Attach an AmlCompute cluster
You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you get the default `AmlCompute` as your training compute resource.

**Udacity Note:** There is no need to create a new compute target; reuse the existing cluster.

In [4]:
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException

# NOTE: update the cluster name to match the existing cluster
# Choose a name for your CPU cluster
amlcompute_cluster_name = "automl-compute"

# Reuse the cluster if it already exists; otherwise provision a new one
try:
    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)
    print("Found existing cluster, use it.")
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_D2_V2",  # for GPU, use "STANDARD_NC6"
        min_nodes=1,
        max_nodes=4,
    )
    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)

compute_target.wait_for_completion(
    show_output=True, min_node_count=1, timeout_in_minutes=5
)
# For a more detailed view of current AmlCompute status, use get_status().

Found existing cluster, use it.
Succeeded..........................................................
AmlCompute wait for completion finished

Wait timeout has been reached
Current provisioning state of AmlCompute is "Succeeded" and current node count is "0"


## Data

**Udacity note:** Make sure `key` matches the name of the uploaded dataset, and that the description matches. If the name is hard to find or unknown, loop over `ws.datasets.keys()` and `print()` each one.
If the dataset *isn't* found because it was deleted, the code below recreates it from the URL of the CSV file.
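Following the note above, a small hypothetical helper can print every registered name and tolerate a mismatch that differs only in case. In the notebook you would pass `ws.datasets.keys()` as `available`. Here a plain list (with the made-up extra name `iris`) stands in for it so the sketch runs on its own.

```python
from typing import Iterable, Optional


def resolve_dataset_key(wanted: str, available: Iterable[str]) -> Optional[str]:
    """Print every registered dataset name, then return the one matching `wanted`.

    An exact match wins; otherwise a unique case-insensitive match is returned.
    Returns None when nothing (or more than one name) matches.
    """
    names = list(available)
    for name in names:
        print(name)
    if wanted in names:
        return wanted
    folded = [n for n in names if n.casefold() == wanted.casefold()]
    return folded[0] if len(folded) == 1 else None


registered = ["Bank-marketing", "iris"]  # stand-in for ws.datasets.keys()
print("Resolved:", resolve_dataset_key("bank-marketing", registered))
```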

In [5]:
# Try to load the dataset from the Workspace. Otherwise, create it from the file
# NOTE: update the key to match the dataset name
found = False
key = "Bank-marketing"
description_text = "Bank Marketing DataSet for Udacity Course 2"

if key in ws.datasets.keys():
    print("Found existing dataset, using")
    found = True
    dataset = ws.datasets[key]

if not found:
    # Create AML Dataset and register it into Workspace
    print(f"Did not find existing dataset with key {key}, creating")
    example_data = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv"
    dataset = Dataset.Tabular.from_delimited_files(example_data)
    # Register Dataset in Workspace
    dataset = dataset.register(workspace=ws, name=key, description=description_text)


df = dataset.to_pandas_dataframe()
df.describe()

Found existing dataset, using


| | age | duration | campaign | pdays | previous | emp.var.rate | cons.price.idx | cons.conf.idx | euribor3m | nr.employed |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 32950.0 | 32950.0 | 32950.0 | 32950.0 | 32950.0 | 32950.0 | 32950.0 | 32950.0 | 32950.0 | 32950.0 |
| mean | 40.040212 | 257.335205 | 2.56173 | 962.17478 | 0.17478 | 0.076228 | 93.574243 | -40.51868 | 3.615654 | 5166.859608 |
| std | 10.432313 | 257.3317 | 2.763646 | 187.646785 | 0.496503 | 1.572242 | 0.578636 | 4.623004 | 1.735748 | 72.208448 |
| min | 17.0 | 0.0 | 1.0 | 0.0 | 0.0 | -3.4 | 92.201 | -50.8 | 0.634 | 4963.6 |
| 25% | 32.0 | 102.0 | 1.0 | 999.0 | 0.0 | -1.8 | 93.075 | -42.7 | 1.344 | 5099.1 |
| 50% | 38.0 | 179.0 | 2.0 | 999.0 | 0.0 | 1.1 | 93.749 | -41.8 | 4.857 | 5191.0 |
| 75% | 47.0 | 318.0 | 3.0 | 999.0 | 0.0 | 1.4 | 93.994 | -36.4 | 4.961 | 5228.1 |
| max | 98.0 | 4918.0 | 56.0 | 999.0 | 7.0 | 1.4 | 94.767 | -26.9 | 5.045 | 5228.1 |


### Review the Dataset Result

You can peek at any range of a TabularDataset using `skip(i)` and `take(j).to_pandas_dataframe()`. Doing so evaluates only `j` records for all the steps in the TabularDataset, which makes it fast even against large datasets.

A `TabularDataset` is defined by a list of transformation steps, which may be empty.
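The lazy-evaluation claim can be illustrated with plain Python generators. This is an analogy, not how `TabularDataset` is implemented: each transformation step runs per row on demand. Taking 5 rows therefore runs each of the two made-up steps exactly 5 times, even though the source below is unbounded.

```python
from itertools import count, islice
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, int]
Step = Callable[[Row], Row]


def apply_steps(rows: Iterable[Row], steps: List[Step], calls: List[int]) -> Iterator[Row]:
    """Lazily run every transformation step on each row as it is requested.

    `calls[0]` counts step invocations so the laziness is observable.
    """
    for row in rows:
        for step in steps:
            calls[0] += 1
            row = step(row)
        yield row


def take(rows: Iterator[Row], j: int) -> List[Row]:
    """Materialize only the first j rows, analogous to dataset.take(j)."""
    return list(islice(rows, j))


calls = [0]
source = ({"age": n} for n in count(17))  # unbounded stand-in for a large file
steps: List[Step] = [
    lambda r: {**r, "age_months": r["age"] * 12},
    lambda r: {**r, "adult": int(r["age"] >= 18)},
]
preview = take(apply_steps(source, steps, calls), 5)
print(len(preview), "rows,", calls[0], "step calls")
```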

In [6]:
dataset.take(5).to_pandas_dataframe()

| | age | job | marital | education | default | housing | loan | contact | month | day_of_week | ... | campaign | pdays | previous | poutcome | emp.var.rate | cons.price.idx | cons.conf.idx | euribor3m | nr.employed | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 57 | technician | married | high.school | no | no | yes | cellular | may | mon | ... | 1 | 999 | 1 | failure | -1.8 | 92.893 | -46.2 | 1.299 | 5099.1 | no |
| 1 | 55 | unknown | married | unknown | unknown | yes | no | telephone | may | thu | ... | 2 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.86 | 5191.0 | no |
| 2 | 33 | blue-collar | married | basic.9y | no | no | no | cellular | may | fri | ... | 1 | 999 | 1 | failure | -1.8 | 92.893 | -46.2 | 1.313 | 5099.1 | no |
| 3 | 36 | admin. | married | high.school | no | no | no | telephone | jun | fri | ... | 4 | 999 | 0 | nonexistent | 1.4 | 94.465 | -41.8 | 4.967 | 5228.1 | no |
| 4 | 27 | housemaid | married | high.school | no | yes | no | cellular | jul | fri | ... | 2 | 999 | 0 | nonexistent | 1.4 | 93.918 | -42.7 | 4.963 | 5228.1 | no |


## Train
This cell creates a general AutoML settings object.

**Udacity notes:** These inputs must match the ones used when training in the portal. For example, `label_column_name` has to be `y`.

In [7]:
automl_settings = {
    "experiment_timeout_minutes": 20,
    "max_concurrent_iterations": 4,
    "primary_metric": "AUC_weighted",
}
automl_config = AutoMLConfig(
    compute_target=compute_target,
    task="classification",
    training_data=dataset,
    label_column_name="y",
#     path=project_folder,
    enable_early_stopping=True,
    featurization="auto",
    debug_log="automl_errors.log",
    model_explainability=True,
    **automl_settings
)
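One way to follow the note about matching the portal run is to diff the two sets of settings before submitting. In this hypothetical sketch, `sdk_settings` mirrors the arguments passed to `AutoMLConfig` above. `portal_settings` would be typed in by hand from the portal run; its values here are illustrative only.

```python
from typing import Any, Dict, Tuple


def settings_diff(portal: Dict[str, Any], sdk: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (portal_value, sdk_value)} for every setting that differs.

    A setting present on only one side shows up with None on the other side.
    """
    keys = sorted(portal.keys() | sdk.keys())
    return {k: (portal.get(k), sdk.get(k)) for k in keys if portal.get(k) != sdk.get(k)}


sdk_settings = {
    "task": "classification",
    "label_column_name": "y",
    "primary_metric": "AUC_weighted",
    "experiment_timeout_minutes": 20,
}
# Hypothetical values copied by hand from the portal run
portal_settings = {
    "task": "classification",
    "label_column_name": "y",
    "primary_metric": "accuracy",
    "experiment_timeout_minutes": 20,
}
print(settings_diff(portal_settings, sdk_settings))
```

An empty result means the two configurations agree on every listed setting.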

#### Create Pipeline and AutoMLStep

You can define the outputs of the AutoMLStep with `PipelineData` objects that wrap a `TrainingOutput`: one for the metrics and one for the best model.

In [8]:
from azureml.pipeline.core import PipelineData, TrainingOutput

ds = ws.get_default_datastore()
metrics_output_name = "metrics_output"
best_model_output_name = "best_model_output"

metrics_data = PipelineData(
    name="metrics_data",
    datastore=ds,
    pipeline_output_name=metrics_output_name,
    training_output=TrainingOutput(type="Metrics"),
)
model_data = PipelineData(
    name="model_data",
    datastore=ds,
    pipeline_output_name=best_model_output_name,
    training_output=TrainingOutput(type="Model"),
)

Create an AutoMLStep.

In [9]:
automl_step = AutoMLStep(
    name="automl_module",
    automl_config=automl_config,
    outputs=[metrics_data, model_data],
    allow_reuse=True,
)

In [10]:
from azureml.pipeline.core import Pipeline

pipeline = Pipeline(
    description="pipeline_with_automlstep", workspace=ws, steps=[automl_step]
)

In [11]:
pipeline_run = experiment.submit(pipeline)

Created step automl_module [3f0e8227][28f4422c-8ea2-4716-8dde-f3b90cf0a6a3], (This step will run and generate new outputs)
Submitted PipelineRun d390b98e-7e66-4c59-8420-20e8bddd3a9a
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/bank-marketing-prediction-automl/runs/d390b98e-7e66-4c59-8420-20e8bddd3a9a?wsid=/subscriptions/1b944a9b-fdae-4f97-aeb1-b7eea0beac53/resourcegroups/aml-quickstarts-132681/workspaces/quick-starts-ws-132681


In [12]:
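# To reattach to an earlier run instead, assign it here (filtered_list_runs is not defined in this notebook):
# pipeline_run = filtered_list_runs[0]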

In [13]:
from azureml.widgets import RunDetails

RunDetails(pipeline_run).show()


In [14]:
pipeline_run.wait_for_completion()

PipelineRunId: d390b98e-7e66-4c59-8420-20e8bddd3a9a
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/bank-marketing-prediction-automl/runs/d390b98e-7e66-4c59-8420-20e8bddd3a9a?wsid=/subscriptions/1b944a9b-fdae-4f97-aeb1-b7eea0beac53/resourcegroups/aml-quickstarts-132681/workspaces/quick-starts-ws-132681
PipelineRun Status: NotStarted
PipelineRun Status: Running


This usually indicates a package conflict with one of the dependencies of azureml-core or azureml-pipeline-core.
Please check for package conflicts in your python environment






PipelineRun Execution Summary
PipelineRun Status: Finished
{'runId': 'd390b98e-7e66-4c59-8420-20e8bddd3a9a', 'status': 'Completed', 'startTimeUtc': '2020-12-31T21:06:36.611942Z', 'endTimeUtc': '2020-12-31T21:51:27.038724Z', 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': 'SDK', 'runType': 'SDK', 'azureml.parameters': '{}'}, 'inputDatasets': [], 'logFiles': {'logs/azureml/executionlogs.txt': 'https://mlstrg132681.blob.core.windows.net/azureml/ExperimentRun/dcid.d390b98e-7e66-4c59-8420-20e8bddd3a9a/logs/azureml/executionlogs.txt?sv=2019-02-02&sr=b&sig=nsInCXsQWglqyvtyvcdjI%2BnZZRRsfM%2F4opeLefHkFVw%3D&st=2020-12-31T20%3A56%3A58Z&se=2021-01-01T05%3A06%3A58Z&sp=r', 'logs/azureml/stderrlogs.txt': 'https://mlstrg132681.blob.core.windows.net/azureml/ExperimentRun/dcid.d390b98e-7e66-4c59-8420-20e8bddd3a9a/logs/azureml/stderrlogs.txt?sv=2019-02-02&sr=b&sig=Kb%2BVUPLvSGZvZb2Af00Th7lLeOPL3dwIszGJv869WS4%3D&st=2020-12-31T20%3A56%3A58Z&se=2021-01-01T05%3A06%3A58Z&sp=r', 

'Finished'

## Examine Results

### Retrieve the metrics of all child runs
Outputs of the run above can be used as inputs to other steps in the pipeline. In this tutorial, we examine the outputs by retrieving the output data and running some tests.

In [15]:
metrics_output = pipeline_run.get_pipeline_output(metrics_output_name)
num_file_downloaded = metrics_output.download('.', show_progress=True)

Downloading azureml/4e731276-35c2-4e61-86fd-7d7ecb073f7b/metrics_data
Downloaded azureml/4e731276-35c2-4e61-86fd-7d7ecb073f7b/metrics_data, 1 files out of an estimated total of 1


In [16]:
import json
with open(metrics_output._path_on_datastore) as f:
    metrics_output_result = f.read()
    
deserialized_metrics_output = json.loads(metrics_output_result)
df = pd.DataFrame(deserialized_metrics_output)
df

| metric | 4e731276-35c2-4e61-86fd-7d7ecb073f7b_19 | 4e731276-35c2-4e61-86fd-7d7ecb073f7b_18 | 4e731276-35c2-4e61-86fd-7d7ecb073f7b_25 | 4e731276-35c2-4e61-86fd-7d7ecb073f7b_39 | 4e731276-35c2-4e61-86fd-7d7ecb073f7b_48 | 4e731276-35c2-4e61-86fd-7d7ecb073f7b_57 | 4e731276-35c2-4e61-86fd-7d7ecb073f7b_27 | 4e731276-35c2-4e61-86fd-7d7ecb073f7b_46 | 4e731276-35c2-4e61-86fd-7d7ecb073f7b_13 | 4e731276-35c2-4e61-86fd-7d7ecb073f7b_20 | ... | 4e731276-35c2-4e61-86fd-7d7ecb073f7b_3 | 4e731276-35c2-4e61-86fd-7d7ecb073f7b_9 | 4e731276-35c2-4e61-86fd-7d7ecb073f7b_24 | 4e731276-35c2-4e61-86fd-7d7ecb073f7b_47 | 4e731276-35c2-4e61-86fd-7d7ecb073f7b_49 | 4e731276-35c2-4e61-86fd-7d7ecb073f7b_22 | 4e731276-35c2-4e61-86fd-7d7ecb073f7b_10 | 4e731276-35c2-4e61-86fd-7d7ecb073f7b_28 | 4e731276-35c2-4e61-86fd-7d7ecb073f7b_0 | 4e731276-35c2-4e61-86fd-7d7ecb073f7b_52 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| average_precision_score_macro | [0.787108791806625] | [0.7297089639785652] | [0.8004617629290862] | [0.8022470859728432] | [0.7929135412011218] | [0.8241388424230657] | [0.806928657498049] | [0.8203814244080463] | [0.7985126174047921] | [0.7998321444303222] | ... | [0.7520966936363142] | [0.7225755399233886] | [0.8078135890376441] | [0.8129298237558922] | [0.7879276914464364] | [0.8085204474402641] | [0.7230533553111855] | [0.7921758616931668] | [0.8151093723721079] | [0.8116395163424088] |
| recall_score_weighted | [0.9004552352048558] | [0.7265553869499242] | [0.8880121396054628] | [0.9101669195751139] | [0.9040971168437025] | [0.9156297420333839] | [0.9101669195751139] | [0.9150227617602428] | [0.9062215477996965] | [0.9101669195751139] | ... | [0.7981790591805766] | [0.7244309559939301] | [0.9128983308042489] | [0.9116843702579667] | [0.9053110773899848] | [0.9128983308042489] | [0.7089529590288316] | [0.8995447647951441] | [0.9116843702579667] | [0.9122913505311078] |
| matthews_correlation | [0.3256750549961802] | [0.36811511448928846] | [0.0] | [0.5312309488431649] | [0.3748582669141539] | [0.5343874870140247] | [0.5248270958682698] | [0.5223051913676189] | [0.3976739324324451] | [0.4867731611986173] | ... | [0.4055126776029401] | [0.31167561298057944] | [0.5265756963424171] | [0.5240377538867557] | [0.40291493725011523] | [0.5357619178216784] | [0.29952562949499734] | [0.31289262190926104] | [0.5323740218566827] | [0.5304371020063181] |
| AUC_micro | [0.9744804861368561] | [0.8330698326659467] | [0.9744062484888817] | [0.9773262012383687] | [0.9736378980429722] | [0.980495761960574] | [0.9784101077413011] | [0.9800715205132161] | [0.9758990146932517] | [0.9766515228619257] | ... | [0.8740458827349112] | [0.8359787326638742] | [0.9788788365136859] | [0.9791669449043361] | [0.9752051782141056] | [0.9793693944704005] | [0.8490246637545737] | [0.9737535835092945] | [0.979695082216353] | [0.9791838924567272] |
| AUC_macro | [0.9304904908242521] | [0.8695250691399601] | [0.9343744616530238] | [0.9386150149949892] | [0.9230531984062151] | [0.9473471187206746] | [0.9422938351051316] | [0.9456540464242646] | [0.9308878256246677] | [0.9342679499932389] | ... | [0.8925315876535389] | [0.8523188051429387] | [0.9428782599514307] | [0.943455275290962] | [0.9297921448114003] | [0.9448491887516277] | [0.8591174906964381] | [0.926112861607085] | [0.9450464668693166] | [0.9437803674003931] |
| precision_score_macro | [0.8202786854702324] | [0.6216241137161077] | [0.4440060698027314] | [0.7764328692696766] | [0.8167916410896852] | [0.7997581162207845] | [0.7775343175343176] | [0.8017943637164244] | [0.822098675416211] | [0.7882750842617063] | ... | [0.6456979281148342] | [0.6045222774422356] | [0.7886171396772399] | [0.7838506906551768] | [0.8000565029792479] | [0.7860811293290488] | [0.5992428599201375] | [0.8205384914463453] | [0.7819118765348991] | [0.7848448881130385] |
| precision_score_weighted | [0.8859664258327548] | [0.894573519269382] | [0.788565560086672] | [0.9069013800026041] | [0.8898175166222407] | [0.9084860968456713] | [0.9057541902822632] | [0.9066567374208951] | [0.8929725418691179] | [0.9000274768383943] | ... | [0.8926227236405488] | [0.878033866252836] | [0.9065343959710289] | [0.9058629036989744] | [0.890219299216519] | [0.9080335867085474] | [0.8767193420377741] | [0.8849783769177578] | [0.9072720074188747] | [0.9070348410200512] |
| precision_score_micro | [0.9004552352048558] | [0.7265553869499242] | [0.8880121396054628] | [0.9101669195751139] | [0.9040971168437025] | [0.9156297420333839] | [0.9101669195751139] | [0.9150227617602428] | [0.9062215477996965] | [0.9101669195751139] | ... | [0.7981790591805766] | [0.7244309559939301] | [0.9128983308042489] | [0.9116843702579667] | [0.9053110773899848] | [0.9128983308042489] | [0.7089529590288316] | [0.8995447647951441] | [0.9116843702579667] | [0.9122913505311078] |
| AUC_weighted | [0.9304904908242522] | [0.86952506913996] | [0.934374461653024] | [0.9386150149949893] | [0.923053198406215] | [0.9473471187206747] | [0.9422938351051316] | [0.9456540464242648] | [0.9308878256246675] | [0.9342679499932388] | ... | [0.8925315876535389] | [0.8523188051429387] | [0.9428782599514307] | [0.9434552752909621] | [0.9297921448114004] | [0.9448491887516278] | [0.8591174906964382] | [0.9261128616070851] | [0.9450464668693167] | [0.9437803674003931] |
| norm_macro_recall | [0.16558117392520466] | [0.5570800615730014] | [0.0] | [0.5104427736006683] | [0.2217841351345844] | [0.4763340353840997] | [0.4962331919969918] | [0.45196787237865554] | [0.24549085203770704] | [0.4109757023749321] | ... | [0.5643200758733493] | [0.46469370025210854] | [0.4803629546890138] | [0.4837324278916064] | [0.2705164611454727] | [0.5016773270945287] | [0.45200028897076394] | [0.1527145654231663] | [0.5026785366965085] | [0.49388900929337387] |
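As the table shows, `deserialized_metrics_output` maps each child run ID to a dictionary of single-element metric lists. The best child run by the primary metric can therefore be picked out with plain Python. This is a sketch that assumes the metric is higher-is-better, which holds for `AUC_weighted` but not for loss metrics. The sample copies three `AUC_weighted` values from the table above.

```python
from typing import Dict, List, Tuple

# Shape of the downloaded metrics file: {child_run_id: {metric_name: [value]}}
Metrics = Dict[str, Dict[str, List[float]]]


def best_child_run(metrics: Metrics, primary_metric: str = "AUC_weighted") -> Tuple[str, float]:
    """Return (run_id, score) of the child run with the highest primary metric.

    Runs that did not report the metric are skipped.
    """
    scored = [
        (run_id, values[primary_metric][0])
        for run_id, values in metrics.items()
        if values.get(primary_metric)
    ]
    if not scored:
        raise ValueError(f"No child run reported {primary_metric}")
    return max(scored, key=lambda item: item[1])


sample: Metrics = {
    "4e731276-35c2-4e61-86fd-7d7ecb073f7b_19": {"AUC_weighted": [0.9304904908242522]},
    "4e731276-35c2-4e61-86fd-7d7ecb073f7b_57": {"AUC_weighted": [0.9473471187206747]},
    "4e731276-35c2-4e61-86fd-7d7ecb073f7b_0": {"AUC_weighted": [0.9450464668693167]},
}
print(best_child_run(sample))
```

In the notebook, `best_child_run(deserialized_metrics_output)` would rank every child run, not just the columns visible in the truncated table.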


### Retrieve the Best Model

In [17]:
# Retrieve best model from Pipeline Run
best_model_output = pipeline_run.get_pipeline_output(best_model_output_name)
num_file_downloaded = best_model_output.download('.', show_progress=True)

Downloading azureml/4e731276-35c2-4e61-86fd-7d7ecb073f7b/model_data
Downloaded azureml/4e731276-35c2-4e61-86fd-7d7ecb073f7b/model_data, 1 files out of an estimated total of 1


In [18]:
import pickle

with open(best_model_output._path_on_datastore, "rb" ) as f:
    best_model = pickle.load(f)
best_model

PipelineWithYTransformations(Pipeline={'memory': None,
                                       'steps': [('datatransformer',
                                                  DataTransformer(enable_dnn=None,
                                                                  enable_feature_sweeping=None,
                                                                  feature_sweeping_config=None,
                                                                  feature_sweeping_timeout=None,
                                                                  featurization_config=None,
                                                                  force_text_dnn=None,
                                                                  is_cross_validation=None,
                                                                  is_onnx_compatible=None,
                                                                  logger=None,
                                                              

In [19]:
best_model.steps

[('datatransformer',
  DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                  feature_sweeping_config=None, feature_sweeping_timeout=None,
                  featurization_config=None, force_text_dnn=None,
                  is_cross_validation=None, is_onnx_compatible=None, logger=None,
                  observer=None, task=None, working_dir=None)),
 ('prefittedsoftvotingclassifier',
  PreFittedSoftVotingClassifier(classification_labels=None,
                                estimators=[('36',
                                             Pipeline(memory=None,
                                                      steps=[('standardscalerwrapper',
                                                              <azureml.automl.runtime.shared.model_wrappers.StandardScalerWrapper object at 0x7fc1d8e20860>),
                                                             ('xgboostclassifier',
                                                              XGBoostClassifier(ba

### Get the best run and fitted model, explanations

Get the underlying `AutoMLRun` object.

In [20]:
from azureml.train.automl.run import AutoMLRun

# Workaround to get the AutoML run: it is the last step in the pipeline,
# and get_steps() returns the steps from latest to first

for step in pipeline_run.get_steps():
    automl_step_run_id = step.id
    print(step.name)
    print(automl_step_run_id)
    break

automl_run = AutoMLRun(experiment = experiment, run_id=automl_step_run_id)
RunDetails(automl_run).show()

automl_module
4e731276-35c2-4e61-86fd-7d7ecb073f7b



In [21]:
best_run, fitted_model = automl_run.get_output()

In [23]:
from azureml.interpret import ExplanationClient

client = ExplanationClient.from_run(best_run)
engineered_explanations = client.download_model_explanation(raw=False)
print(engineered_explanations.get_feature_importance_dict())

{'duration_MeanImputer': 0.8075675095714876, 'nr.employed_MeanImputer': 0.4796247695092727, 'emp.var.rate_MeanImputer': 0.27326187125382717, 'cons.conf.idx_MeanImputer': 0.1894796817021288, 'euribor3m_MeanImputer': 0.18070663533322176, 'cons.price.idx_MeanImputer': 0.06092214117711359, 'age_MeanImputer': 0.03734697146198916, 'poutcome_CharGramCountVectorizer_success': 0.025293530298991708, 'day_of_week_CharGramCountVectorizer_wed': 0.024291001094010427, 'poutcome_CharGramCountVectorizer_failure': 0.023782547629363084, 'default_CharGramCountVectorizer_no': 0.02315698654931591, 'pdays_CharGramCountVectorizer_999': 0.022659449265281878, 'contact_ModeCatImputer_LabelEncoder': 0.018596438766382974, 'job_CharGramCountVectorizer_blue-collar': 0.01722090786956815, 'month_CharGramCountVectorizer_oct': 0.015410807064013172, 'campaign_CharGramCountVectorizer_2': 0.012589764436053943, 'education_CharGramCountVectorizer_university.degree': 0.010843924538443009, 'month_CharGramCountVectorizer_may': 

### Test the Model
#### Load Test Data
The test data must go through the same preparation steps as the training data; otherwise, prediction may fail at the preprocessing step.

In [24]:
dataset_test = Dataset.Tabular.from_delimited_files(
    path="https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_test.csv"
)
df_test = dataset_test.to_pandas_dataframe()
df_test = df_test[pd.notnull(df_test["y"])]

y_test = df_test["y"]
X_test = df_test.drop(["y"], axis=1)
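A quick guard for the requirement above is to compare the feature columns of the training and test data before calling `predict`. This is a hypothetical sketch. In the notebook you could pass `dataset.to_pandas_dataframe().columns` and `X_test.columns`. Note that `df` was reassigned to the metrics table earlier, so it no longer holds the training data.

```python
from typing import Iterable, List, Tuple


def compare_feature_columns(
    train_columns: Iterable[str], test_columns: Iterable[str], label: str = "y"
) -> Tuple[List[str], List[str]]:
    """Return (missing_from_test, unexpected_in_test), ignoring the label column."""
    train = [c for c in train_columns if c != label]
    test = [c for c in test_columns if c != label]
    train_set, test_set = set(train), set(test)
    missing = [c for c in train if c not in test_set]
    unexpected = [c for c in test if c not in train_set]
    return missing, unexpected


missing, unexpected = compare_feature_columns(["age", "job", "duration", "y"], ["age", "Job", "duration"])
print("missing:", missing, "unexpected:", unexpected)
```

Two empty lists mean the column names line up. Column dtypes and category values can still differ and are not checked here.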

#### Testing Our Best Fitted Model

We use a confusion matrix to see how well the model performs.

In [25]:
from sklearn.metrics import confusion_matrix
ypred = best_model.predict(X_test)
cm = confusion_matrix(y_test, ypred)

In [26]:
# Visualize the confusion matrix
pd.DataFrame(cm).style.background_gradient(cmap='Blues', low=0, high=0.9)

| | 0 | 1 |
|---|---|---|
| 0 | 3541 | 95 |
| 1 | 249 | 235 |
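The confusion matrix can be summarized by hand. `confusion_matrix` orders labels by sorted value, so with labels `no` and `yes`, row/column 0 is `no` and row/column 1 is `yes`. Rows are true labels and columns are predictions. The sketch below hardcodes the counts from the table above.

```python
from typing import Dict, List


def binary_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Derive summary metrics from a 2x2 confusion matrix laid out as
    [[TN, FP], [FN, TP]], i.e. rows = true label, columns = prediction,
    with the positive class ("yes") second."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tn + tp) / total,
        "precision_yes": precision,
        "recall_yes": recall,
        "f1_yes": 2 * precision * recall / (precision + recall),
    }


metrics = binary_metrics([[3541, 95], [249, 235]])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The accuracy is about 0.917, but the recall for `yes` is only about 0.486. The model misses roughly half of the positive cases, which the accuracy figure alone hides.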


## Publish and run from REST endpoint

Run the following code to publish the pipeline to your workspace. In your workspace in the portal, you can see metadata for the pipeline including run history and durations. You can also run the pipeline manually from the portal.

Additionally, publishing the pipeline enables a REST endpoint to rerun the pipeline from any HTTP library on any platform.


In [27]:
published_pipeline = pipeline_run.publish_pipeline(
    name="Bankmarketing Train", description="Training bankmarketing pipeline", version="1.0")

published_pipeline


| Name | Id | Status | Endpoint |
|---|---|---|---|
| Bankmarketing Train | 363d38b3-7be2-4b3b-bd17-947226b95f77 | Active | REST Endpoint |


Authenticate again to retrieve the `auth_header` needed to call the endpoint.

In [28]:
from azureml.core.authentication import InteractiveLoginAuthentication

interactive_auth = InteractiveLoginAuthentication()
auth_header = interactive_auth.get_authentication_header()



Get the REST URL from the `endpoint` property of the published pipeline object. You can also find the REST URL in your workspace in the portal. Build an HTTP POST request to the endpoint, specifying your authentication header, and add a JSON payload object with the experiment name.

Make the request to trigger the run. Read the `Id` key of the response dictionary to get the run ID.
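Because the endpoint is plain HTTP, any client can trigger the run. Here is a standard-library `urllib` sketch of the same request the notebook sends with `requests`. The endpoint URL and bearer token are placeholders; substitute `published_pipeline.endpoint` and the `auth_header` dictionary obtained above.

```python
import json
import urllib.request
from typing import Dict


def build_pipeline_request(
    endpoint: str, auth_header: Dict[str, str], experiment_name: str
) -> urllib.request.Request:
    """Build (without sending) a POST that triggers a published pipeline run."""
    body = json.dumps({"ExperimentName": experiment_name}).encode("utf-8")
    headers = {"Content-Type": "application/json", **auth_header}
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


def run_id_from_response(raw_body: bytes) -> str:
    """Extract the run ID from the endpoint's JSON response."""
    return json.loads(raw_body)["Id"]


# Placeholder endpoint and token; substitute published_pipeline.endpoint and auth_header.
request = build_pipeline_request(
    "https://example.invalid/pipelines/placeholder",
    {"Authorization": "Bearer <token>"},
    "pipeline-rest-endpoint",
)
# To actually send it:
# with urllib.request.urlopen(request) as response:
#     run_id = run_id_from_response(response.read())
```

`urlopen` raises `urllib.error.HTTPError` on a non-2xx status, which plays the same role as `response.raise_for_status()` in the next cell.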


In [29]:
import requests

rest_endpoint = published_pipeline.endpoint
response = requests.post(rest_endpoint, 
                         headers=auth_header, 
                         json={"ExperimentName": "pipeline-rest-endpoint"}
                        )

In [30]:
try:
    response.raise_for_status()
except Exception:    
    raise Exception("Received bad response from the endpoint: {}\n"
                    "Response Code: {}\n"
                    "Headers: {}\n"
                    "Content: {}".format(rest_endpoint, response.status_code, response.headers, response.content))

run_id = response.json().get('Id')
print('Submitted pipeline run: ', run_id)

Submitted pipeline run:  82535e70-b332-4159-8797-2e7c4ef481fd


Use the run id to monitor the status of the new run. This will take another 10-15 min to run and will look similar to the previous pipeline run, so if you don't need to see another pipeline run, you can skip watching the full output.
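If you prefer a bounded wait over watching the widget, a small polling loop works. This is a sketch under stated assumptions. `get_status` is any zero-argument callable returning a status string; in the notebook it could be `published_pipeline_run.get_status`. The terminal statuses listed are assumed from the strings printed earlier ("Finished", "Completed") plus "Failed" and "Canceled". `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Iterable

TERMINAL_STATUSES = ("Finished", "Completed", "Failed", "Canceled")


def wait_for_terminal_status(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    terminal: Iterable[str] = TERMINAL_STATUSES,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `get_status` until it returns a terminal status or the timeout expires."""
    terminal_set = set(terminal)
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in terminal_set:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run still {status!r} after {timeout_seconds} seconds")
        sleep(poll_seconds)
```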

In [31]:
from azureml.pipeline.core.run import PipelineRun
from azureml.widgets import RunDetails

published_pipeline_run = PipelineRun(ws.experiments["pipeline-rest-endpoint"], run_id)
RunDetails(published_pipeline_run).show()


### Register/Deploy the best model

In [32]:
description = 'Best Model from AutoML'
model = automl_run.register_model(description=description,
                                  tags={'area': 'bank-marketing'})

Deployment is slightly more involved for AutoML models, so this model was deployed through the web UI.

In [35]:
# from azureml.core import Model

# service_name = 'bank-marketing-predict-service'

# service = model.deploy(ws, service_name, [model], overwrite=True)
# service.wait_for_deployment(show_output=True)

Add logging by enabling Application Insights on the deployed service.

In [34]:
%%writefile logs.py

from azureml.core import Workspace
from azureml.core.webservice import Webservice

# Requires the config to be downloaded first to the current working directory
ws = Workspace.from_config()

# Set with the deployment name
name = "voting-ensemble"

# load existing web service
service = Webservice(name=name, workspace=ws)
# enable application insights
service.update(enable_app_insights=True)
# get logs
logs = service.get_logs()

for line in logs.split("\n"):
    print(line)


Writing logs.py


In [35]:
!python logs.py

Failure while loading azureml_run_type_providers. Failed to load entrypoint hyperdrive = azureml.train.hyperdrive:HyperDriveRun._from_run_dto with exception (pyOpenSSL 20.0.0 (/anaconda/envs/azureml_py36/lib/python3.6/site-packages), Requirement.parse('pyopenssl<20.0.0'), {'azureml-core'}).
Failure while loading azureml_run_type_providers. Failed to load entrypoint automl = azureml.train.automl.run:AutoMLRun._from_run_dto with exception (pyOpenSSL 20.0.0 (/anaconda/envs/azureml_py36/lib/python3.6/site-packages), Requirement.parse('pyopenssl<20.0.0'), {'azureml-core'}).
Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.PipelineRun = azureml.pipeline.core.run:PipelineRun._from_dto with exception (pyOpenSSL 20.0.0 (/anaconda/envs/azureml_py36/lib/python3.6/site-packages), Requirement.parse('pyopenssl<20.0.0'), {'azureml-core'}).
Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.ReusedStepRun = azureml.pipeline.core.run:St

Test the deployed endpoint.

In [53]:
%%writefile endpoint.py

import requests
import json
from azureml.core.webservice import Webservice
from azureml.core.workspace import Workspace

# URL for the web service, should be similar to:
# 'http://8530a665-66f3-49c8-a953-b82a2d312917.eastus.azurecontainer.io/score'

ws = Workspace.from_config()
# Assumes the service to test is the first one listed in the workspace
deployed_webservice = Webservice.list(ws)[0]
scoring_uri = deployed_webservice.scoring_uri

# If the service is authenticated, set the key or token
key = deployed_webservice.get_keys()[0]

# Two sets of data to score, so we get two results back
data = {
    "data": [
        {
            "age": 17,
            "campaign": 1,
            "cons.conf.idx": -46.2,
            "cons.price.idx": 92.893,
            "contact": "cellular",
            "day_of_week": "mon",
            "default": "no",
            "duration": 971,
            "education": "university.degree",
            "emp.var.rate": -1.8,
            "euribor3m": 1.299,
            "housing": "yes",
            "job": "blue-collar",
            "loan": "yes",
            "marital": "married",
            "month": "may",
            "nr.employed": 5099.1,
            "pdays": 999,
            "poutcome": "failure",
            "previous": 1,
        },
        {
            "age": 87,
            "campaign": 1,
            "cons.conf.idx": -46.2,
            "cons.price.idx": 92.893,
            "contact": "cellular",
            "day_of_week": "mon",
            "default": "no",
            "duration": 471,
            "education": "university.degree",
            "emp.var.rate": -1.8,
            "euribor3m": 1.299,
            "housing": "yes",
            "job": "blue-collar",
            "loan": "yes",
            "marital": "married",
            "month": "may",
            "nr.employed": 5099.1,
            "pdays": 999,
            "poutcome": "failure",
            "previous": 1,
        },
    ]
}
# Convert to JSON string
input_data = json.dumps(data)
with open("data.json", "w") as _f:
    _f.write(input_data)

# Set the content type
headers = {"Content-Type": "application/json"}
# If authentication is enabled, set the authorization header
headers["Authorization"] = f"Bearer {key}"

# Make the request and display the response
resp = requests.post(scoring_uri, input_data, headers=headers)
print(resp.json())

Overwriting endpoint.py


In [54]:
!python endpoint.py

Failure while loading azureml_run_type_providers. Failed to load entrypoint hyperdrive = azureml.train.hyperdrive:HyperDriveRun._from_run_dto with exception (pyOpenSSL 20.0.0 (/anaconda/envs/azureml_py36/lib/python3.6/site-packages), Requirement.parse('pyopenssl<20.0.0'), {'azureml-core'}).
Failure while loading azureml_run_type_providers. Failed to load entrypoint automl = azureml.train.automl.run:AutoMLRun._from_run_dto with exception (pyOpenSSL 20.0.0 (/anaconda/envs/azureml_py36/lib/python3.6/site-packages), Requirement.parse('pyopenssl<20.0.0'), {'azureml-core'}).
Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.PipelineRun = azureml.pipeline.core.run:PipelineRun._from_dto with exception (pyOpenSSL 20.0.0 (/anaconda/envs/azureml_py36/lib/python3.6/site-packages), Requirement.parse('pyopenssl<20.0.0'), {'azureml-core'}).
Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.ReusedStepRun = azureml.pipeline.core.run:St
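The benchmark log further down shows the response body wrapped in quotes. Its `Content-Length: 32` matches the escaped form `"{\"result\": [\"no\", \"no\"]}"`, which suggests the service returns the prediction as a JSON-encoded string. In `endpoint.py`, `resp.json()` would then yield a string rather than a dictionary. The hypothetical helper below unwraps one extra layer when present; `decode_scoring_response(resp.text)` would return the dictionary.

```python
import json
from typing import Any


def decode_scoring_response(body: str) -> Any:
    """Decode a scoring response, unwrapping one extra layer of JSON encoding
    when the service returned its result as a JSON string."""
    value = json.loads(body)
    if isinstance(value, str):
        value = json.loads(value)
    return value


# A body shaped like the one in the benchmark log: a JSON string containing JSON
double_encoded = json.dumps(json.dumps({"result": ["no", "no"]}))
print(decode_scoring_response(double_encoded))
```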

Benchmark the endpoint with ApacheBench (`ab`).

In [7]:
%%writefile benchmark.sh

 ab -n 10 -v 4 -p data.json -T 'application/json' -H 'Authorization: Bearer 9Z09MfzYgZwIqDDiTZvIPkQkveabeVyD' http://b5463ec6-87c1-4861-90a5-4801000b1c7a.southcentralus.azurecontainer.io/score

Overwriting benchmark.sh
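`ab -n 10` sends 10 requests one at a time, since it uses a concurrency of 1 unless `-c` is given. A rough Python counterpart is sketched below for machines without `ab`. It assumes `send_request` performs one scoring call and returns the HTTP status. For example, reusing the variables from `endpoint.py`: `lambda: requests.post(scoring_uri, input_data, headers=headers).status_code`. It counts non-200 responses, which is not the same as ab's "Failed requests" figure.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(send_request: Callable[[], int], n: int = 10) -> Dict[str, float]:
    """Issue n sequential requests via `send_request` (which returns an HTTP
    status code) and summarize throughput and latency."""
    if n < 1:
        raise ValueError("n must be at least 1")
    latencies = []
    non_200 = 0
    start = time.perf_counter()
    for _ in range(n):
        t0 = time.perf_counter()
        status = send_request()
        latencies.append(time.perf_counter() - t0)
        if status != 200:
            non_200 += 1
    elapsed = time.perf_counter() - start
    return {
        "requests": n,
        "non_200_responses": non_200,
        "requests_per_second": n / elapsed if elapsed > 0 else float("inf"),
        "mean_latency_ms": 1000 * statistics.mean(latencies),
        "max_latency_ms": 1000 * max(latencies),
    }
```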


In [11]:
!sh benchmark.sh

This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking b5463ec6-87c1-4861-90a5-4801000b1c7a.southcentralus.azurecontainer.io (be patient)...INFO: POST header == 
---
POST /score HTTP/1.0
Content-length: 812
Content-type: application/json
Authorization: Bearer 9Z09MfzYgZwIqDDiTZvIPkQkveabeVyD
Host: b5463ec6-87c1-4861-90a5-4801000b1c7a.southcentralus.azurecontainer.io
User-Agent: ApacheBench/2.3
Accept: */*


---
LOG: header received:
HTTP/1.0 200 OK
Content-Length: 32
Content-Type: application/json
Date: Thu, 31 Dec 2020 22:37:27 GMT
Server: nginx/1.10.3 (Ubuntu)
X-Ms-Request-Id: 01e6fa3d-0605-4881-a4d7-c1488fc23c53
X-Ms-Run-Function-Failed: False

"{"result": ["no", "no"]}"
LOG: Response code = 200
LOG: header received:
HTTP/1.0 200 OK
Content-Length: 32
Content-Type: application/json
Date: Thu, 31 Dec 2020 22:37:28 GMT
Server: nginx