Copyright (c) Microsoft Corporation. All rights reserved.  
Licensed under the MIT License.

![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/NotebookVM/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-automated-machine-learning-step.png)

# Azure Machine Learning Pipeline with AutoMLStep
This notebook demonstrates the use of AutoMLStep in Azure Machine Learning Pipeline.

## Introduction
In this example we showcase how you can use AzureML Dataset to load data for AutoML via AML Pipeline. 

If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you have executed the [configuration](https://aka.ms/pl-config) before running this notebook.

In this notebook you will learn how to:
1. Create an `Experiment` in an existing `Workspace`.
2. Create or Attach existing AmlCompute to a workspace.
3. Define data loading in a `TabularDataset`.
4. Configure AutoML using `AutoMLConfig`.
5. Use AutoMLStep
6. Train the model using AmlCompute
7. Explore the results.
8. Test the best fitted model.

## Azure Machine Learning and Pipeline SDK-specific imports

In [1]:
import logging
import os
import csv

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets
import pkg_resources

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.core.dataset import Dataset

from azureml.pipeline.steps import AutoMLStep

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.20.0


## Initialize Workspace
Initialize a workspace object from persisted configuration. Make sure the config file is present at .\config.json

In [3]:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

quick-starts-ws-138723
aml-quickstarts-138723
southcentralus
2c48c51c-bd47-40d4-abbe-fb8eabd19c8c


## Create an Azure ML experiment
Let's create an experiment named "automlstep-classification" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure.

The best practice is to use separate folders for scripts and its dependent files for each step and specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step.

*Udacity Note:* There is no need to create an Azure ML experiment, this needs to re-use the experiment that was already created


In [4]:
# Choose a name for the run history container in the workspace.
# NOTE: update these to match your existing experiment name
experiment_name = 'website_classification'
project_folder = './pipeline-project'

experiment = Experiment(ws, experiment_name)
experiment

Name,Workspace,Report Page,Docs Page
website_classification,quick-starts-ws-138723,Link to Azure Machine Learning studio,Link to Documentation


### Create or Attach an AmlCompute cluster
You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you get the default `AmlCompute` as your training compute resource.

**Udacity Note** There is no need to create a new compute target, it can re-use the previous cluster

In [5]:
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException

# NOTE: update the cluster name to match the existing cluster
# Choose a name for your CPU cluster
amlcompute_cluster_name = "autoML-compute"

# Verify that cluster does not exist already
try:
    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',# for GPU, use "STANDARD_NC6"
                                                           #vm_priority = 'lowpriority', # optional
                                                           max_nodes=4)
    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True, min_node_count = 1, timeout_in_minutes = 10)
# For a more detailed view of current AmlCompute status, use get_status().

Creating
Succeeded.................................................................................................................
AmlCompute wait for completion finished

Wait timeout has been reached
Current provisioning state of AmlCompute is "Succeeded" and current node count is "0"


## Dataset

TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [6]:
# Try to load the dataset from the Workspace. Otherwise, create it from the file
# NOTE: update the key to match the dataset name
found = False
key = "Website Classification Dataset"
description_text = "Website Classification DataSet for Capstone project"

if key in ws.datasets.keys(): 
        found = True
        dataset = ws.datasets[key] 

if not found:
        # Create AML Dataset and register it into Workspace
        example_data = 'https://raw.githubusercontent.com/Panth-Shah/nd00333-capstone/master/Dataset/malicious_website_dataset.csv'
        dataset = Dataset.Tabular.from_delimited_files(example_data)        
        #Register Dataset in Workspace
        dataset = dataset.register(workspace=ws,
                                   name=key,
                                   description=description_text)

In [7]:
df = dataset.to_pandas_dataframe()
df.describe()

Unnamed: 0,URL_LENGTH,NUMBER_SPECIAL_CHARACTERS,TCP_CONVERSATION_EXCHANGE,DIST_REMOTE_TCP_PORT,REMOTE_IPS,APP_BYTES,SOURCE_APP_PACKETS,REMOTE_APP_PACKETS,SOURCE_APP_BYTES,REMOTE_APP_BYTES,APP_PACKETS,DNS_QUERY_TIMES,Type
count,1781.0,1781.0,1781.0,1781.0,1781.0,1781.0,1781.0,1781.0,1781.0,1781.0,1781.0,1780.0,1781.0
mean,56.961258,11.111735,16.261089,5.472768,3.06064,2982.339,18.540146,18.74621,15892.55,3155.599,18.540146,2.263483,0.12128
std,27.555586,4.549896,40.500975,21.807327,3.386975,56050.57,41.627173,46.397969,69861.93,56053.78,41.627173,2.930853,0.326544
min,16.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,39.0,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,49.0,10.0,7.0,0.0,2.0,672.0,8.0,9.0,579.0,735.0,8.0,0.0,0.0
75%,68.0,13.0,22.0,5.0,5.0,2328.0,26.0,25.0,9806.0,2701.0,26.0,4.0,0.0
max,249.0,43.0,1194.0,708.0,17.0,2362906.0,1198.0,1284.0,2060012.0,2362906.0,1198.0,20.0,1.0


### Review the Dataset Result

You can peek the result of a TabularDataset at any range using `skip(i)` and `take(j).to_pandas_dataframe()`. Doing so evaluates only `j` records for all the steps in the TabularDataset, which makes it fast even against large datasets.

`TabularDataset` objects are composed of a list of transformation steps (optional).

In [8]:
dataset.take(5).to_pandas_dataframe()

Unnamed: 0,URL,URL_LENGTH,NUMBER_SPECIAL_CHARACTERS,CHARSET,SERVER,CONTENT_LENGTH,WHOIS_COUNTRY,WHOIS_STATEPRO,WHOIS_REGDATE,WHOIS_UPDATED_DATE,...,DIST_REMOTE_TCP_PORT,REMOTE_IPS,APP_BYTES,SOURCE_APP_PACKETS,REMOTE_APP_PACKETS,SOURCE_APP_BYTES,REMOTE_APP_BYTES,APP_PACKETS,DNS_QUERY_TIMES,Type
0,M0_109,16,7,iso-8859-1,nginx,263,,,10/10/2015 18:21,,...,0,2,700,9,10,1153,832,9,2,1
1,B0_2314,16,6,UTF-8,Apache/2.4.10,15087,,,,,...,7,4,1230,17,19,1265,1230,17,0,0
2,B0_911,16,6,us-ascii,Microsoft-HTTPAPI/2.0,324,,,,,...,0,0,0,0,0,0,0,0,0,0
3,B0_113,17,6,ISO-8859-1,nginx,162,US,AK,7/10/1997 4:00,12/09/2013 0:45,...,22,3,3812,39,37,18784,4380,39,8,0
4,B0_403,17,6,UTF-8,,124140,US,TX,12/05/1996 0:00,11/04/2017 0:00,...,2,5,4278,61,62,129889,4586,61,4,0


In [35]:
from train import data_cleaning

clean_data = data_cleaning(df)
clean_data

Unnamed: 0,URL_LENGTH,NUMBER_SPECIAL_CHARACTERS,TCP_CONVERSATION_EXCHANGE,DIST_REMOTE_TCP_PORT,REMOTE_IPS,APP_BYTES,SOURCE_APP_PACKETS,REMOTE_APP_PACKETS,SOURCE_APP_BYTES,REMOTE_APP_BYTES,...,WS_west midlands,WS_wi,WS_widestep@mail.ru,WS_wisconsin,WS_worcs,WS_wv,WS_zh,WS_zhejiang,WS_zug,Type
0,-1.486913,-0.903952,-0.228728,-0.251031,-0.313241,-0.040731,-0.229245,-0.188557,-0.211040,-0.041465,...,0,0,0,0,0,0,0,0,0,1
1,-1.486913,-1.123799,0.018249,0.070053,0.277423,-0.031272,-0.037009,0.005471,-0.209437,-0.034362,...,0,0,0,0,0,0,0,0,0,0
2,-1.486913,-1.123799,-0.401611,-0.251031,-0.903904,-0.053223,-0.445511,-0.404144,-0.227549,-0.056312,...,0,0,0,0,0,0,0,0,0,0
3,-1.450613,-1.123799,0.364017,0.758088,-0.017909,0.014806,0.491640,0.393528,0.041400,0.021849,...,0,0,0,0,0,0,0,0,0,0
4,-1.450613,-1.123799,1.006157,-0.159292,0.572754,0.023122,1.020290,0.932496,1.632198,0.025526,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1776,4.974572,1.074670,-0.401611,-0.251031,-0.903904,-0.053223,-0.445511,-0.339468,-0.224886,-0.056312,...,0,0,0,0,0,0,0,0,0,1
1777,5.119773,1.294517,-0.401611,-0.251031,-0.903904,-0.053223,-0.445511,-0.361027,-0.225774,-0.056312,...,0,0,0,0,0,0,0,0,0,1
1778,5.228675,5.031916,1.648297,-0.159292,0.868086,0.065114,1.645057,1.514582,1.665014,0.067622,...,0,0,0,0,0,0,0,0,0,0
1779,6.426591,5.031916,-0.401611,-0.251031,-0.903904,-0.053223,-0.445511,-0.404144,-0.227549,-0.056312,...,0,0,0,0,0,0,0,0,0,0


x_df = clean_data.drop('Type', axis=1).values
y_df = clean_data['Type'].values
x_df

y_df = x_df.pop("Type")
y_df

In [36]:
# prepare dataset for model training
x_df = clean_data.drop('Type', axis=1)
y_df = clean_data.pop("Type")

from sklearn.model_selection import train_test_split

(x_train, x_test, y_train, y_test) = train_test_split(x_df, y_df, test_size= 0.3, random_state = 0)
x_train

Unnamed: 0,URL_LENGTH,NUMBER_SPECIAL_CHARACTERS,TCP_CONVERSATION_EXCHANGE,DIST_REMOTE_TCP_PORT,REMOTE_IPS,APP_BYTES,SOURCE_APP_PACKETS,REMOTE_APP_PACKETS,SOURCE_APP_BYTES,REMOTE_APP_BYTES,...,WS_wc1n,WS_west midlands,WS_wi,WS_widestep@mail.ru,WS_wisconsin,WS_worcs,WS_wv,WS_zh,WS_zhejiang,WS_zug
579,-0.579401,-0.464259,-0.401611,-0.251031,-0.903904,-0.053223,-0.445511,-0.404144,-0.227549,-0.056312,...,0,0,0,0,0,0,0,0,0,0
651,-0.506800,-0.464259,0.092342,-0.067554,1.163417,-0.026258,0.083139,0.048589,-0.020941,-0.026635,...,0,0,0,0,0,0,0,0,0,0
1629,1.562327,1.954058,-0.401611,-0.251031,-0.903904,-0.053223,-0.445511,-0.339468,-0.224886,-0.056312,...,0,0,0,0,0,0,0,0,0,0
960,-0.180096,-0.244412,-0.401611,-0.251031,-0.903904,-0.053223,-0.445511,-0.404144,-0.227549,-0.056312,...,0,0,0,0,0,0,0,0,0,0
527,-0.615702,-0.684106,3.451229,0.528743,2.640076,0.364071,3.447271,4.036952,3.999301,0.369774,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
835,-0.325298,-0.244412,-0.401611,-0.251031,-0.903904,-0.053223,-0.445511,-0.404144,-0.227549,-0.056312,...,0,0,0,0,0,0,0,0,0,0
1216,0.146608,-0.464259,-0.401611,-0.251031,-0.903904,-0.053223,-0.445511,-0.404144,-0.227549,-0.056312,...,0,0,0,0,0,0,0,0,0,0
1653,1.598627,1.954058,0.709785,1.583731,0.277423,0.031028,0.731936,1.191201,0.952180,0.033074,...,0,0,0,0,0,0,0,0,0,0
559,-0.579401,-0.464259,-0.401611,-0.251031,-0.903904,-0.053223,-0.445511,-0.404144,-0.227549,-0.056312,...,0,0,0,0,0,0,0,0,0,0


In [37]:
# merge the output x and y dataframes into a single table for AutoML experiment
train_data_df = pd.concat([x_train, y_train], axis=1)
test_data_df = pd.concat([x_test, y_train], axis=1)

# view train dataset
train_data_df.head()

Unnamed: 0,URL_LENGTH,NUMBER_SPECIAL_CHARACTERS,TCP_CONVERSATION_EXCHANGE,DIST_REMOTE_TCP_PORT,REMOTE_IPS,APP_BYTES,SOURCE_APP_PACKETS,REMOTE_APP_PACKETS,SOURCE_APP_BYTES,REMOTE_APP_BYTES,...,WS_west midlands,WS_wi,WS_widestep@mail.ru,WS_wisconsin,WS_worcs,WS_wv,WS_zh,WS_zhejiang,WS_zug,Type
579,-0.579401,-0.464259,-0.401611,-0.251031,-0.903904,-0.053223,-0.445511,-0.404144,-0.227549,-0.056312,...,0,0,0,0,0,0,0,0,0,0
651,-0.5068,-0.464259,0.092342,-0.067554,1.163417,-0.026258,0.083139,0.048589,-0.020941,-0.026635,...,0,0,0,0,0,0,0,0,0,0
1629,1.562327,1.954058,-0.401611,-0.251031,-0.903904,-0.053223,-0.445511,-0.339468,-0.224886,-0.056312,...,0,0,0,0,0,0,0,0,0,1
960,-0.180096,-0.244412,-0.401611,-0.251031,-0.903904,-0.053223,-0.445511,-0.404144,-0.227549,-0.056312,...,0,0,0,0,0,0,0,0,0,0
527,-0.615702,-0.684106,3.451229,0.528743,2.640076,0.364071,3.447271,4.036952,3.999301,0.369774,...,0,0,0,0,0,0,0,0,0,0


In [38]:
# save training data in tabular format to allow for remote run
if not os.path.isdir('data'):
    os.mkdir('data')
    
# Save the train data to a csv to be uploaded to the datastore
pd.DataFrame(train_data_df).to_csv("data/train_data.csv", index=False)

In [39]:
from azureml.core import Workspace, Dataset
from azureml.data.datapath import DataPath

# Upload the training data as a tabular dataset for access during training on remote compute
# upload to data store
datastore = ws.get_default_datastore()
datastore.upload(src_dir='./data', target_path='data', overwrite=True, show_progress=True)

datastore_path =[
    DataPath(datastore, 'data/train_data.csv')
]

# Upload the training data as a tabular dataset for access during training on remote compute
train_data = Dataset.Tabular.from_delimited_files(path=datastore_path)

Uploading an estimated of 1 files
Uploading ./data/train_data.csv
Uploaded ./data/train_data.csv, 1 files out of an estimated total of 1
Uploaded 1 files


In [40]:
train_data

{
  "source": [
    "('workspaceblobstore', 'data/train_data.csv')"
  ],
  "definition": [
    "GetDatastoreFiles",
    "ParseDelimited",
    "DropColumns",
    "SetColumnTypes"
  ]
}

## Train
This creates a general AutoML settings object.
**Udacity notes:** These inputs must match what was used when training in the portal. `label_column_name` has to be `y` for example.

In [41]:
from azureml.train.automl import AutoMLConfig

automl_settings = {
    "max_concurrent_iterations": 9,
    "primary_metric" : 'accuracy'
}

project_folder = './capstone-project'

automl_config = AutoMLConfig(compute_target=compute_target,
                             task = "classification",
                             training_data=train_data,
                             label_column_name="Type",   
                             path = project_folder,
                             enable_early_stopping= True,
                             featurization= 'auto',
                             debug_log = "automl_errors.log",
                             n_cross_validations=5,
                             **automl_settings
                            )

In [42]:
# Submit your automl run
from azureml.widgets import RunDetails

automl_exp = Experiment(workspace=ws, name="capestone_AutoML")  
automl_run = automl_exp.submit(automl_config, show_output = True)
RunDetails(automl_run).show()
automl_run.wait_for_completion(show_output=True)

Running on remote.
No run_configuration provided, running on autoML-compute with default configuration
Running on remote compute: autoML-compute
Parent Run ID: AutoML_0a073912-a9c0-4a6f-b758-118e7461c200

Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetBalancing. Performing class balancing sweeping
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       ALERTED
DESCRIPTION:  To decrease model bias, please cancel the current run and fix balancing problem.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData
DETAILS:      Imbalanced data can lead to a falsely perceived positive effect of a model's accuracy because the input data has bias tow

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…



****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       ALERTED
DESCRIPTION:  To decrease model bias, please cancel the current run and fix balancing problem.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData
DETAILS:      Imbalanced data can lead to a falsely perceived positive effect of a model's accuracy because the input data has bias towards one class.
+---------------------------------+---------------------------------+--------------------------------------+
|Size of the smallest class       |Name/Label of the smallest class |Number of samples in the training data|
|161                              |1                                |1246                                  |
+---------------------------------+---------------------------------+--------------------------------------+

********************************************

{'runId': 'AutoML_0a073912-a9c0-4a6f-b758-118e7461c200',
 'target': 'autoML-compute',
 'status': 'Completed',
 'startTimeUtc': '2021-02-12T18:41:13.234829Z',
 'endTimeUtc': '2021-02-12T19:08:56.676529Z',
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'autoML-compute',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"55d2877f-f95f-48e3-9770-c3de78a01bc7\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetDatastoreFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"datastores\\\\\\": [{\\\\\\"datastoreName\\\\\\": \\\\\\"workspaceblobstore\\\\\\", \\\\\\"path\\\\\\": \\\\\\"data/train_data.csv\\\\\\", \\\\\\"resourceGroup\\\\\\": \\\\\\"aml-quickstarts-138723\\\\\\", \\\\\\"subscription\\\\\\": \\\\\\"2c48c51c-bd47-40d4-abbe-fb8eabd19c8c\\\\\\", \\\\\

In [43]:
from azureml.widgets import RunDetails
RunDetails(automl_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

In [44]:
# Retrieve and save your best automl model.

best_autoML_run, best_autoML_fitted_model = automl_run.get_output()
print(best_autoML_run)
print(best_autoML_fitted_model)

Package:azureml-automl-runtime, training version:1.21.0, current version:1.20.0
Package:azureml-core, training version:1.21.0.post1, current version:1.20.0
Package:azureml-dataprep, training version:2.8.2, current version:2.7.3
Package:azureml-dataprep-native, training version:28.0.0, current version:27.0.0
Package:azureml-dataprep-rslex, training version:1.6.0, current version:1.5.0
Package:azureml-dataset-runtime, training version:1.21.0, current version:1.20.0
Package:azureml-defaults, training version:1.21.0, current version:1.20.0
Package:azureml-interpret, training version:1.21.0, current version:1.20.0
Package:azureml-pipeline-core, training version:1.21.0, current version:1.20.0
Package:azureml-telemetry, training version:1.21.0, current version:1.20.0
Package:azureml-train-automl-client, training version:1.21.0, current version:1.20.0
Package:azureml-train-automl-runtime, training version:1.21.0, current version:1.20.0


Run(Experiment: capestone_AutoML,
Id: AutoML_0a073912-a9c0-4a6f-b758-118e7461c200_47,
Type: azureml.scriptrun,
Status: Completed)
Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                                 feature_sweeping_config=None,
                                 feature_sweeping_timeout=None,
                                 featurization_config=None, force_text_dnn=None,
                                 is_cross_validation=None,
                                 is_onnx_compatible=None, logger=None,
                                 observer=None, task=None, working_dir=None)),
                ('stackensembleclassifier',
                 StackE...
                                         meta_learner=LogisticRegressionCV(Cs=10,
                                                                           class_weight=None,
                                                                    

In [45]:
print(best_autoML_run)

Run(Experiment: capestone_AutoML,
Id: AutoML_0a073912-a9c0-4a6f-b758-118e7461c200_47,
Type: azureml.scriptrun,
Status: Completed)


In [46]:
get_best_autoML_metrics = best_autoML_run.get_metrics()
for run_metric in get_best_autoML_metrics:
    metric = get_best_autoML_metrics[run_metric]
    print(run_metric,metric)

AUC_weighted 0.9828216307918007
average_precision_score_weighted 0.9895791743311267
AUC_micro 0.9938351454202351
f1_score_weighted 0.9659351065326373
average_precision_score_micro 0.9939922267981036
recall_score_micro 0.9671004016064257
precision_score_micro 0.9671004016064257
AUC_macro 0.9828216307918005
accuracy 0.9671004016064257
average_precision_score_macro 0.9673101372009653
precision_score_weighted 0.9669463346140642
matthews_correlation 0.8480225614332744
f1_score_micro 0.9671004016064257
norm_macro_recall 0.7855987677571512
f1_score_macro 0.9215356754624979
precision_score_macro 0.958109509843573
recall_score_weighted 0.9671004016064257
balanced_accuracy 0.8927993838785756
log_loss 0.10011818946701082
recall_score_macro 0.8927993838785756
weighted_accuracy 0.9882431268579092
accuracy_table aml://artifactId/ExperimentRun/dcid.AutoML_0a073912-a9c0-4a6f-b758-118e7461c200_47/accuracy_table
confusion_matrix aml://artifactId/ExperimentRun/dcid.AutoML_0a073912-a9c0-4a6f-b758-118e7461

In [47]:
best_autoML_run.get_file_names()

['accuracy_table',
 'automl_driver.py',
 'azureml-logs/55_azureml-execution-tvmps_be4149b8697fca82c0b1c79d46246c164bbd85832b4223a749c7f0ee21216305_d.txt',
 'azureml-logs/65_job_prep-tvmps_be4149b8697fca82c0b1c79d46246c164bbd85832b4223a749c7f0ee21216305_d.txt',
 'azureml-logs/70_driver_log.txt',
 'azureml-logs/75_job_post-tvmps_be4149b8697fca82c0b1c79d46246c164bbd85832b4223a749c7f0ee21216305_d.txt',
 'azureml-logs/process_info.json',
 'azureml-logs/process_status.json',
 'confusion_matrix',
 'logs/azureml/104_azureml.log',
 'logs/azureml/azureml_automl.log',
 'logs/azureml/job_prep_azureml.log',
 'logs/azureml/job_release_azureml.log',
 'outputs/conda_env_v_1_0_0.yml',
 'outputs/env_dependencies.json',
 'outputs/internal_cross_validated_models.pkl',
 'outputs/model.pkl',
 'outputs/pipeline_graph.json',
 'outputs/scoring_file_v_1_0_0.py']

In [48]:
# Get your best run and save the model from that run.

model = best_autoML_run.register_model(model_name = 'best_autoML_model', model_path =  'outputs/model.pkl')

In [49]:
model

Model(workspace=Workspace.create(name='quick-starts-ws-138723', subscription_id='2c48c51c-bd47-40d4-abbe-fb8eabd19c8c', resource_group='aml-quickstarts-138723'), name=best_autoML_model, id=best_autoML_model:1, version=1, tags={}, properties={})

In [50]:
best_autoML_run.download_file("outputs/model.pkl","outputs/best_model_autoML.pkl")

In [51]:
best_autoML_run

Experiment,Id,Type,Status,Details Page,Docs Page
capestone_AutoML,AutoML_0a073912-a9c0-4a6f-b758-118e7461c200_47,azureml.scriptrun,Completed,Link to Azure Machine Learning studio,Link to Documentation


In [52]:
from pprint import pprint

def print_model(model, prefix=""):
    for step in model.steps:
        print(prefix + step[0])
        if hasattr(step[1], 'estimators') and hasattr(step[1], 'weights'):
            pprint({'estimators': list(e[0] for e in step[1].estimators), 'weights': step[1].weights})
            print()
            for estimator in step[1].estimators:
                print_model(estimator[1], estimator[0]+ ' - ')
        elif hasattr(step[1], '_base_learners') and hasattr(step[1], '_meta_learner'):
            print("\nMeta Learner")
            pprint(step[1]._meta_learner)
            print()
            for estimator in step[1]._base_learners:
                print_model(estimator[1], estimator[0]+ ' - ')
        else:
            pprint(step[1].get_params())
            print()
            
print_model(best_autoML_fitted_model)

datatransformer
{'enable_dnn': None,
 'enable_feature_sweeping': None,
 'feature_sweeping_config': None,
 'feature_sweeping_timeout': None,
 'featurization_config': None,
 'force_text_dnn': None,
 'is_cross_validation': None,
 'is_onnx_compatible': None,
 'logger': None,
 'observer': None,
 'task': None,
 'working_dir': None}

stackensembleclassifier

Meta Learner
LogisticRegressionCV(Cs=10, class_weight=None, cv=None, dual=False,
                     fit_intercept=True, intercept_scaling=1.0, l1_ratios=None,
                     max_iter=100, multi_class='auto', n_jobs=None,
                     penalty='l2', random_state=None, refit=True,
                     scoring=<azureml.automl.runtime.stack_ensemble_base.Scorer object at 0x7f89cc61d6a0>,
                     solver='lbfgs', tol=0.0001, verbose=0)

1 - maxabsscaler
{'copy': True}

1 - xgboostclassifier
{'base_score': 0.5,
 'booster': 'gbtree',
 'colsample_bylevel': 1,
 'colsample_bynode': 1,
 'colsample_bytree': 1,
 'gamma': 0,


## Model Deployment

Create an inference config and deploy the model as a web service.

In [53]:
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice
from azureml.core.webservice import Webservice
from azureml.core.model import Model
from azureml.core.environment import Environment

In [54]:
# Download scoring file 
best_autoML_run.download_file('outputs/scoring_file_v_1_0_0.py', 'outputs/score.py')
script_file_name = 'outputs/score.py'

In [55]:
# Download environment file
best_autoML_run.download_file('outputs/conda_env_v_1_0_0.yml', 'outputs/envFile.yml')

In [56]:
inference_config = InferenceConfig(entry_script=script_file_name)

aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, 
                                               memory_gb = 1, 
                                               tags = {'area': "bmData", 'type': "capstone_autoML_Classifier"}, 
                                               description = 'sample service for Capstone Project AutoML Classifier for Websites')

In [57]:
# deploy
aci_service_name = 'capstone-automl-sample'
print(aci_service_name)
aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)
aci_service.wait_for_deployment(True)
print(aci_service.state)
print(aci_service.scoring_uri)
print(aci_service.swagger_uri)

capstone-automl-sample
Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running.....................................
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy
http://5e99f72b-cbec-4b27-bfc7-15232cf97a11.southcentralus.azurecontainer.io/score
http://5e99f72b-cbec-4b27-bfc7-15232cf97a11.southcentralus.azurecontainer.io/swagger.json


In [98]:
import json
# import test data from test dataframe
# select first two rows of the test data set
test_data_df_copy = test_data_df
test_data = test_data_df[4:7]
test_data_df

Unnamed: 0,URL_LENGTH,NUMBER_SPECIAL_CHARACTERS,TCP_CONVERSATION_EXCHANGE,DIST_REMOTE_TCP_PORT,REMOTE_IPS,APP_BYTES,SOURCE_APP_PACKETS,REMOTE_APP_PACKETS,SOURCE_APP_BYTES,REMOTE_APP_BYTES,...,WS_west midlands,WS_wi,WS_widestep@mail.ru,WS_wisconsin,WS_worcs,WS_wv,WS_zh,WS_zhejiang,WS_zug,Type
0,,,,,,,,,,,...,,,,,,,,,,1.00
1,-1.49,-1.12,0.02,0.07,0.28,-0.03,-0.04,0.01,-0.21,-0.03,...,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,
2,-1.49,-1.12,-0.40,-0.25,-0.90,-0.05,-0.45,-0.40,-0.23,-0.06,...,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,
3,,,,,,,,,,,...,,,,,,,,,,0.00
4,-1.45,-1.12,1.01,-0.16,0.57,0.02,1.02,0.93,1.63,0.03,...,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1776,4.97,1.07,-0.40,-0.25,-0.90,-0.05,-0.45,-0.34,-0.22,-0.06,...,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,
1777,5.12,1.29,-0.40,-0.25,-0.90,-0.05,-0.45,-0.36,-0.23,-0.06,...,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,
1778,,,,,,,,,,,...,,,,,,,,,,0.00
1779,,,,,,,,,,,...,,,,,,,,,,0.00


In [96]:
# remove label column
label_data = test_data.pop('Type')

# convert test input data to dictionary form
input_data = json.dumps({'data': test_data.to_dict(orient='records')})

# print test input data
print(input_data)

{"data": [{"URL_LENGTH": -1.4506126577981058, "NUMBER_SPECIAL_CHARACTERS": -1.1237994686265507, "TCP_CONVERSATION_EXCHANGE": 1.0061573213500983, "DIST_REMOTE_TCP_PORT": -0.15929249227770212, "REMOTE_IPS": 0.572754281271214, "APP_BYTES": 0.023122417015062534, "SOURCE_APP_PACKETS": 1.0202897732863336, "REMOTE_APP_PACKETS": 0.9324963862046076, "SOURCE_APP_BYTES": 1.6321975559769688, "REMOTE_APP_BYTES": 0.025525543421527186, "APP_PACKETS": 1.0202897732863336, "CH_iso-8859": 0.0, "CH_iso-8859-1": 0.0, "CH_none": 0.0, "CH_us-ascii": 0.0, "CH_utf-8": 1.0, "CH_windows-1251": 0.0, "CH_windows-1252": 0.0, "SV_.v01 apache": 0.0, "SV_294": 0.0, "SV_aeria games & entertainment": 0.0, "SV_akamaighost": 0.0, "SV_amazons3": 0.0, "SV_apache": 0.0, "SV_apache-coyote/1.1": 0.0, "SV_apache/1.3.27 (unix)  (red-hat/linux) mod_perl/1.26 php/4.3.3 frontpage/5.0.2 mod_ssl/2.8.12 openssl/0.9.6b": 0.0, "SV_apache/1.3.27 (unix) php/4.4.1": 0.0, "SV_apache/1.3.31 (unix) php/4.3.9 mod_perl/1.29 rus/pl30.20": 0.0, "

In [100]:
output = aci_service.run(input_data)
print(output)

{"result": [0, 0, 0]}


In [33]:
print(aci_service.get_logs())

2021-02-10T02:45:22,287857393+00:00 - gunicorn/run 
2021-02-10T02:45:22,288928400+00:00 - iot-server/run 
2021-02-10T02:45:22,289685605+00:00 - nginx/run 
2021-02-10T02:45:22,288244796+00:00 - rsyslog/run 
/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
rsyslogd

#### Create Pipeline and AutoMLStep

You can define outputs for the AutoMLStep using TrainingOutput.

In [8]:
from azureml.pipeline.core import PipelineData, TrainingOutput

ds = ws.get_default_datastore()
metrics_output_name = 'metrics_output'
best_model_output_name = 'best_model_output'

metrics_data = PipelineData(name='metrics_data',
                           datastore=ds,
                           pipeline_output_name=metrics_output_name,
                           training_output=TrainingOutput(type='Metrics'))
model_data = PipelineData(name='model_data',
                           datastore=ds,
                           pipeline_output_name=best_model_output_name,
                           training_output=TrainingOutput(type='Model'))

Create an AutoMLStep.

In [9]:
automl_step = AutoMLStep(
    name='automl_module',
    automl_config=automl_config,
    outputs=[metrics_data, model_data],
    allow_reuse=True)

In [10]:
from azureml.pipeline.core import Pipeline
pipeline = Pipeline(
    description="pipeline_with_automlstep",
    workspace=ws,    
    steps=[automl_step])

In [11]:
pipeline_run = experiment.submit(pipeline)

Created step automl_module [8d934bc8][9564fa22-904b-4d51-a869-d16b92ac3ee2], (This step will run and generate new outputs)
Submitted PipelineRun d0620ce6-fa31-4033-ac9e-a0ab708cc54c
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/ml-experiment-1/runs/d0620ce6-fa31-4033-ac9e-a0ab708cc54c?wsid=/subscriptions/6b4af8be-9931-443e-90f6-c4c34a1f9737/resourcegroups/aml-quickstarts-135297/workspaces/quick-starts-ws-135297


In [12]:
from azureml.widgets import RunDetails
RunDetails(pipeline_run).show()

_PipelineWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', …

In [13]:
pipeline_run.wait_for_completion()

PipelineRunId: d0620ce6-fa31-4033-ac9e-a0ab708cc54c
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/ml-experiment-1/runs/d0620ce6-fa31-4033-ac9e-a0ab708cc54c?wsid=/subscriptions/6b4af8be-9931-443e-90f6-c4c34a1f9737/resourcegroups/aml-quickstarts-135297/workspaces/quick-starts-ws-135297
PipelineRun Status: Running


StepRunId: 8e092fb7-5f7d-4c10-878e-f3b71d69e8ee
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/ml-experiment-1/runs/8e092fb7-5f7d-4c10-878e-f3b71d69e8ee?wsid=/subscriptions/6b4af8be-9931-443e-90f6-c4c34a1f9737/resourcegroups/aml-quickstarts-135297/workspaces/quick-starts-ws-135297
StepRun( automl_module ) Status: Running

StepRun(automl_module) Execution Summary
StepRun( automl_module ) Status: Finished



PipelineRun Execution Summary
PipelineRun Status: Finished
{'runId': 'd0620ce6-fa31-4033-ac9e-a0ab708cc54c', 'status': 'Completed', 'startTimeUtc': '2021-01-19T06:18:41.287852Z', 'endTimeUtc': '2021-01-19T06:58:29.533118Z', 

'Finished'

## Examine Results

### Retrieve the metrics of all child runs
Outputs of above run can be used as inputs of other steps in pipeline. In this tutorial, we will examine the outputs by retrieve output data and running some tests.

In [14]:
metrics_output = pipeline_run.get_pipeline_output(metrics_output_name)
num_file_downloaded = metrics_output.download('.', show_progress=True)

Downloading azureml/8e092fb7-5f7d-4c10-878e-f3b71d69e8ee/metrics_data
Downloaded azureml/8e092fb7-5f7d-4c10-878e-f3b71d69e8ee/metrics_data, 1 files out of an estimated total of 1


In [15]:
import json
with open(metrics_output._path_on_datastore) as f:
    metrics_output_result = f.read()
    
deserialized_metrics_output = json.loads(metrics_output_result)
df = pd.DataFrame(deserialized_metrics_output)
df

Unnamed: 0,8e092fb7-5f7d-4c10-878e-f3b71d69e8ee_43,8e092fb7-5f7d-4c10-878e-f3b71d69e8ee_40,8e092fb7-5f7d-4c10-878e-f3b71d69e8ee_30,8e092fb7-5f7d-4c10-878e-f3b71d69e8ee_45,8e092fb7-5f7d-4c10-878e-f3b71d69e8ee_60,8e092fb7-5f7d-4c10-878e-f3b71d69e8ee_0,8e092fb7-5f7d-4c10-878e-f3b71d69e8ee_33,8e092fb7-5f7d-4c10-878e-f3b71d69e8ee_2,8e092fb7-5f7d-4c10-878e-f3b71d69e8ee_41,8e092fb7-5f7d-4c10-878e-f3b71d69e8ee_59,...,8e092fb7-5f7d-4c10-878e-f3b71d69e8ee_36,8e092fb7-5f7d-4c10-878e-f3b71d69e8ee_37,8e092fb7-5f7d-4c10-878e-f3b71d69e8ee_39,8e092fb7-5f7d-4c10-878e-f3b71d69e8ee_56,8e092fb7-5f7d-4c10-878e-f3b71d69e8ee_4,8e092fb7-5f7d-4c10-878e-f3b71d69e8ee_12,8e092fb7-5f7d-4c10-878e-f3b71d69e8ee_42,8e092fb7-5f7d-4c10-878e-f3b71d69e8ee_18,8e092fb7-5f7d-4c10-878e-f3b71d69e8ee_31,8e092fb7-5f7d-4c10-878e-f3b71d69e8ee_54
recall_score_weighted,[0.9132018209408195],[0.9083459787556905],[0.8880121396054628],[0.910773899848255],[0.8880121396054628],[0.9116843702579667],[0.9119878603945372],[0.891350531107739],[0.8880121396054628],[0.9159332321699545],...,[0.9125948406676783],[0.9110773899848255],[0.9159332321699545],[0.9110773899848255],[0.8880121396054628],[0.74597875569044],[0.9040971168437025],[0.7089529590288316],[0.9128983308042489],[0.910773899848255]
precision_score_micro,[0.9132018209408195],[0.9083459787556905],[0.8880121396054628],[0.910773899848255],[0.8880121396054628],[0.9116843702579667],[0.9119878603945372],[0.891350531107739],[0.8880121396054628],[0.9159332321699545],...,[0.9125948406676783],[0.9110773899848255],[0.9159332321699545],[0.9110773899848255],[0.8880121396054628],[0.74597875569044],[0.9040971168437025],[0.7089529590288316],[0.9128983308042489],[0.910773899848255]
f1_score_micro,[0.9132018209408195],[0.9083459787556905],[0.8880121396054628],[0.9107738998482551],[0.8880121396054628],[0.9116843702579667],[0.9119878603945372],[0.891350531107739],[0.8880121396054628],[0.9159332321699545],...,[0.9125948406676783],[0.9110773899848255],[0.9159332321699545],[0.9110773899848255],[0.8880121396054628],[0.74597875569044],[0.9040971168437025],[0.7089529590288317],[0.9128983308042489],[0.9107738998482551]
average_precision_score_weighted,[0.9541411484948241],[0.9499599601915379],[0.948588659974036],[0.9501487309224028],[0.9556872458271025],[0.9531771295804466],[0.953625132213001],[0.9302839274463027],[0.9434384837698533],[0.9554343247872955],...,[0.9512709076945858],[0.952268137874364],[0.9523358020200623],[0.9503543255163533],[0.9201069392678627],[0.9238035248097693],[0.9481378382396821],[0.9201307201002716],[0.9526538456636413],[0.9527388952648794]
average_precision_score_micro,[0.9805583578526404],[0.9778699660657124],[0.9755997293287618],[0.978824384448086],[0.9781805163955288],[0.9806603102489483],[0.9804926577142863],[0.9651350920340517],[0.9732705157966541],[0.9813847692561776],...,[0.9797387760522972],[0.9800805516651045],[0.9805253710646454],[0.9780965027036886],[0.9645579445500554],[0.8268089210322341],[0.9767246022580298],[0.835636085228748],[0.9805134721211332],[0.9798758954137006]
recall_score_micro,[0.9132018209408195],[0.9083459787556905],[0.8880121396054628],[0.910773899848255],[0.8880121396054628],[0.9116843702579667],[0.9119878603945372],[0.891350531107739],[0.8880121396054628],[0.9159332321699545],...,[0.9125948406676783],[0.9110773899848255],[0.9159332321699545],[0.9110773899848255],[0.8880121396054628],[0.74597875569044],[0.9040971168437025],[0.7089529590288316],[0.9128983308042489],[0.910773899848255]
log_loss,[0.18012736943904312],[0.3339806816963599],[0.29313359336803707],[0.21321407629825834],[0.34087207391407254],[0.17775706110025447],[0.17977421208783076],[0.2529079217007151],[0.24275331984516882],[0.20389804103104314],...,[0.1879296926979376],[0.18183445704003637],[0.18011527626106247],[0.3210386383780024],[0.2744992358523854],[0.5938207513235457],[0.2184722094165244],[0.5805682106981083],[0.18194487595378997],[0.20906686358242352]
weighted_accuracy,[0.954939715235299],[0.9754280386730326],[0.9843450583187134],[0.951660433747695],[0.9843450583187134],[0.9514937218005303],[0.9544760260746684],[0.9833417973180532],[0.9843450583187134],[0.9562035067685627],...,[0.9504450461659876],[0.9540547622302472],[0.9553215430811842],[0.9540547622302472],[0.9843450583187134],[0.7463509916543999],[0.9780679510998219],[0.7032506645195702],[0.954309314127504],[0.9537183490182449]
balanced_accuracy,[0.7450888862955616],[0.638151179871334],[0.5],[0.7460900958975414],[0.5],[0.7513392683482543],[0.7408529638953258],[0.5208258080530224],[0.5],[0.7537316128458619],...,[0.7601408361998863],[0.7379720550452258],[0.7572840082467811],[0.7379720550452258],[0.5],[0.7444794543639217],[0.6061555403660667],[0.7319208034869139],[0.7461021363460387],[0.7378011732953966]
f1_score_weighted,[0.9096764096913943],[0.890304648521594],[0.8353395018439429],[0.9079096209492071],[0.8353395018439429],[0.9091539479147899],[0.9082853054497467],[0.8459253474510218],[0.8353395018439429],[0.9127003984466622],...,[0.9107366491128376],[0.9072716281032431],[0.9130582824505085],[0.9072716281032431],[0.8353395018439429],[0.789523643878405],[0.8803480078182236],[0.7613886578751758],[0.9095487882822817],[0.9070202751800881]


### Retrieve the Best Model

In [16]:
# Retrieve best model from Pipeline Run
best_model_output = pipeline_run.get_pipeline_output(best_model_output_name)
num_file_downloaded = best_model_output.download('.', show_progress=True)

Downloading azureml/8e092fb7-5f7d-4c10-878e-f3b71d69e8ee/model_data
Downloaded azureml/8e092fb7-5f7d-4c10-878e-f3b71d69e8ee/model_data, 1 files out of an estimated total of 1


In [17]:
import pickle

with open(best_model_output._path_on_datastore, "rb" ) as f:
    best_model = pickle.load(f)
best_model

PipelineWithYTransformations(Pipeline={'memory': None,
                                       'steps': [('datatransformer',
                                                  DataTransformer(enable_dnn=None,
                                                                  enable_feature_sweeping=None,
                                                                  feature_sweeping_config=None,
                                                                  feature_sweeping_timeout=None,
                                                                  featurization_config=None,
                                                                  force_text_dnn=None,
                                                                  is_cross_validation=None,
                                                                  is_onnx_compatible=None,
                                                                  logger=None,
                                                              

In [18]:
best_model.steps

[('datatransformer',
  DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                  feature_sweeping_config=None, feature_sweeping_timeout=None,
                  featurization_config=None, force_text_dnn=None,
                  is_cross_validation=None, is_onnx_compatible=None, logger=None,
                  observer=None, task=None, working_dir=None)),
 ('prefittedsoftvotingclassifier',
  PreFittedSoftVotingClassifier(classification_labels=None,
                                estimators=[('39',
                                             Pipeline(memory=None,
                                                      steps=[('standardscalerwrapper',
                                                              <azureml.automl.runtime.shared.model_wrappers.StandardScalerWrapper object at 0x7efcbaeea978>),
                                                             ('xgboostclassifier',
                                                              XGBoostClassifier(ba

### Test the Model
#### Load Test Data
For the test data, it should have the same preparation step as the train data. Otherwise it might get failed at the preprocessing step.

In [19]:
dataset_test = Dataset.Tabular.from_delimited_files(path='https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv')
df_test = dataset_test.to_pandas_dataframe()
df_test = df_test[pd.notnull(df_test['y'])]

y_test = df_test['y']
X_test = df_test.drop(['y'], axis=1)

#### Testing Our Best Fitted Model

We will use confusion matrix to see how our model works.

In [20]:
from sklearn.metrics import confusion_matrix
ypred = best_model.predict(X_test)
cm = confusion_matrix(y_test, ypred)

In [21]:
# Visualize the confusion matrix
pd.DataFrame(cm).style.background_gradient(cmap='Blues', low=0, high=0.9)

Unnamed: 0,0,1
0,28882,376
1,934,2758


## Publish and run from REST endpoint

Run the following code to publish the pipeline to your workspace. In your workspace in the portal, you can see metadata for the pipeline including run history and durations. You can also run the pipeline manually from the portal.

Additionally, publishing the pipeline enables a REST endpoint to rerun the pipeline from any HTTP library on any platform.


In [22]:
published_pipeline = pipeline_run.publish_pipeline(
    name="Bankmarketing Train", description="Training bankmarketing pipeline", version="1.0")

published_pipeline


Name,Id,Status,Endpoint
Bankmarketing Train,af201a79-dd49-4173-a2ee-192d8b009915,Active,REST Endpoint


Authenticate once again, to retrieve the `auth_header` so that the endpoint can be used

In [23]:
from azureml.core.authentication import InteractiveLoginAuthentication

interactive_auth = InteractiveLoginAuthentication()
auth_header = interactive_auth.get_authentication_header()



Get the REST url from the endpoint property of the published pipeline object. You can also find the REST url in your workspace in the portal. Build an HTTP POST request to the endpoint, specifying your authentication header. Additionally, add a JSON payload object with the experiment name and the batch size parameter. As a reminder, the process_count_per_node is passed through to ParallelRunStep because you defined it is defined as a PipelineParameter object in the step configuration.

Make the request to trigger the run. Access the Id key from the response dict to get the value of the run id.


In [24]:
import requests

rest_endpoint = published_pipeline.endpoint
response = requests.post(rest_endpoint, 
                         headers=auth_header, 
                         json={"ExperimentName": "pipeline-rest-endpoint"}
                        )

In [25]:
try:
    response.raise_for_status()
except Exception:    
    raise Exception("Received bad response from the endpoint: {}\n"
                    "Response Code: {}\n"
                    "Headers: {}\n"
                    "Content: {}".format(rest_endpoint, response.status_code, response.headers, response.content))

run_id = response.json().get('Id')
print('Submitted pipeline run: ', run_id)

Submitted pipeline run:  31d430de-b326-494d-ab54-62d3c29045ce


Use the run id to monitor the status of the new run. This will take another 10-15 min to run and will look similar to the previous pipeline run, so if you don't need to see another pipeline run, you can skip watching the full output.

In [26]:
from azureml.pipeline.core.run import PipelineRun
from azureml.widgets import RunDetails

published_pipeline_run = PipelineRun(ws.experiments["pipeline-rest-endpoint"], run_id)
RunDetails(published_pipeline_run).show()

_PipelineWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', …