Author: Kevin ALBERT  

Created: April 2020  

# Automated Machine Learning
_**Classification project with data on Data Lake Gen2, using remote compute, AutoML and model registration**_

## Contents
1. [AutoML](#AutoML)
1. [Setup](#Setup)
1. [Train](#Train)
1. [Results](#Results)
1. [Register](#Register)
1. [Deploy](#Deploy)
1. [Test](#Test)
1. [CustomML](#CustomML)
1. [Finetuning](#Finetuning)
1. [ONNX](#ONNX)

## Introduction

The cleaned datasets were created with Data Factory on Data Lake Gen2 (in Delta Lake format).  
This notebook uses the Delta Lake data and remote compute to train a classification model with AutoML.  
The example data is used to predict whether a patient is diabetic or non-diabetic based on 8 features.  

This notebook shows how to:
1. Setup packages
1. Setup workspace
1. Create an experiment
1. Load data
1. Setup compute
1. Configure autoML
1. Train pipelines
1. Explore the best pipeline
1. Inspect model properties
1. Register the model
1. Deploy model as webservice
1. Webservice inference test
1. Custom ML: inline method
1. Custom ML: script method
1. Hyperparameter tuning
1. ONNX

## Setup

* required
  * disable Shields in the Brave web browser so the widgets work
  * download **config.json** from the machine learning workspace portal
  * install extra azureml packages in **py37_default** when using **'local'** compute  
  * split the data into train and test datasets on the data lake; a validation dataset is not needed because cross-validation is used
* optional
  * register datastore(s) manually
  * register dataset(s) manually
  * register compute cluster(s) manually
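The train/test split mentioned above can be done in many ways. A minimal sketch, assuming the records are already loaded as a list: shuffle a copy with a fixed seed and hold out 30% as test data, mirroring the `test_size=0.30, random_state=0` choice used later with scikit-learn. `split_records` is a hypothetical helper, not an Azure ML API, and writing each split back to the data lake is not shown.

```python
import random

def split_records(records, test_size=0.30, seed=0):
    """Return (train, test) lists after a seeded shuffle; hypothetical helper, not an Azure ML API."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    shuffled = list(records)               # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)  # fixed seed -> the same split on every run
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]

# hypothetical rows standing in for the diabetes records
rows = [{"PatientID": i, "Diabetic": i % 2} for i in range(10)]
train, test = split_records(rows)
print(len(train), len(test))  # 7 3
```

A fixed seed matters here because the train and test files are written once and reused; re-running the split should not silently move records between them.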

In [None]:
! /anaconda/envs/py37_default/bin/python -m pip install -U azureml-sdk[explain,automl] azureml-widgets

### Import open-source packages

In [124]:
import logging
import os
import pandas as pd
import numpy as np
import json
import requests
import joblib

### Import azure machine learning SDK packages

In [103]:
from azureml.core import Workspace, Dataset, Datastore, Run
from azureml.core.experiment import Experiment
from azureml.data.datapath import DataPath
from azureml.core.compute import ComputeTarget, AmlCompute, AksCompute
from azureml.core.model import Model, InferenceConfig
from azureml.train.automl import AutoMLConfig
from azureml.train.automl.run import AutoMLRun
from azureml.widgets import RunDetails
from azureml.core.webservice import Webservice, AciWebservice, AksWebservice
from azureml.exceptions import WebserviceException
from azureml.core.environment import Environment
from azureml.train.estimator import Estimator
from azureml.core.conda_dependencies import CondaDependencies
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.sampling import RandomParameterSampling, GridParameterSampling
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.parameter_expressions import choice
import azureml.core
print("azureml.core version:", azureml.core.__version__)

azureml.core version: 1.2.0


### Workspace

In [3]:
# load the workspace
ws = Workspace.from_config()

If you run your code in unattended mode, i.e., where you can't give a user input, then we recommend to use ServicePrincipalAuthentication or MsiAuthentication.
Please refer to aka.ms/aml-notebook-auth for different authentication mechanisms in azureml-sdk.


### Experiment

In [4]:
# choose an experiment name
experiment = Experiment(ws, 'automl-classification')

### Data

Data Factory has prepared the data in stages, from /bronze to /silver to /gold, and finally /platinum for model training  
**note:** this demonstration uses these files in the /platinum folder of the `datalake` container on Data Lake Gen2
  * /datalake/platinum/diabetes.csv
  * /datalake/platinum/diabetes.parquet
  * copy them from ../data/platinum/*

Register the Data Lake Gen2 account as a datastore of type **blob container**  
**optional:** register it manually in the ML workspace

In [5]:
ds = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name="datalakestoragegen2",
    container_name="datalake",
    account_name="datalake21032020",
    # never hardcode storage account keys in a notebook; read the key from an environment variable
    account_key=os.environ["DATALAKE_ACCOUNT_KEY"],  # assumed variable name, set it before running
    create_if_not_exists=False)
# list available datastores
ws.datastores

{'datalakestoragegen2': {
   "name": "datalakestoragegen2",
   "container_name": "datalake",
   "account_name": "datalake21032020",
   "protocol": "https",
   "endpoint": "core.windows.net"
 },
 'workspaceblobstore': {
   "name": "workspaceblobstore",
   "container_name": "azureml-blobstore-de4d27fa-16a7-42c6-acc9-62fb5ff16179",
   "account_name": "mlworkspace5331841681",
   "protocol": "https",
   "endpoint": "core.windows.net"
 },
 'workspacefilestore': {
   "name": "workspacefilestore",
   "container_name": "azureml-filestore-de4d27fa-16a7-42c6-acc9-62fb5ff16179",
   "account_name": "mlworkspace5331841681",
   "protocol": "https",
   "endpoint": "core.windows.net"
 }}

Register file(s) as a tabular dataset  
**Note:** do not import Delta Lake parquet files directly (a Delta table folder can also contain parquet files from older table versions)  
**Workaround:** import single gold/*.csv or gold/*.parquet files written with pandas  

In [6]:
# load datastore
ds = Datastore.get(ws, 'datalakestoragegen2')
# show datastore settings
ds

{
  "name": "datalakestoragegen2",
  "container_name": "datalake",
  "account_name": "datalake21032020",
  "protocol": "https",
  "endpoint": "core.windows.net"
}

**Option 1 Tabular:** loading *.parquet

In [7]:
# setup parquet file(s) into a tabular dataset
ds_path = [DataPath(ds, 'platinum/diabetes.parquet')] # {path/*.parquet}
dataset = Dataset.Tabular.from_parquet_files(path=ds_path)
# show dataset settings
dataset

{
  "source": [
    "('datalakestoragegen2', 'platinum/diabetes.parquet')"
  ],
  "definition": [
    "GetDatastoreFiles",
    "ReadParquetFile",
    "DropColumns"
  ]
}

**Option 2 Tabular:** loading *.csv

In [None]:
# setup csv file(s) into a tabular dataset
ds_path = [DataPath(ds, 'platinum/diabetes.csv')]
dataset = Dataset.Tabular.from_delimited_files(path=ds_path)
# show dataset settings
dataset

**Option 3 Registered:** loading a registered dataset (manually register in ML workspace)

In [8]:
# list available datasets
ws.datasets

{}

In [None]:
# load a registered dataset
dataset = Dataset.get_by_name(ws, 'diabetes_parquet_from_datastore_datalakegen2')
# show dataset settings
dataset

### Compute

Check the available VM size **names** for creating an auto-scaling cluster

In [60]:
# example: list all sizes with 1 vCPU, at least 2 GB of memory and no GPU
vm_df = pd.DataFrame(AmlCompute.supported_vmsizes(ws))
vm_df[(vm_df.vCPUs == 1) & (vm_df.memoryGB >= 2) & (vm_df.gpus == 0)]

| |gpus|maxResourceVolumeMB|memoryGB|name|vCPUs|
|-|-|-|-|-|-|
|0|0|51200|3.5|Standard_D1_v2|1|
|8|0|7168|3.5|Standard_DS1_v2|1|
|18|0|51200|3.5|Standard_D1|1|
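The pandas filter above keeps sizes with exactly 1 vCPU, at least 2 GB of memory and no GPU. The same selection logic in plain Python is sketched below, assuming each size is a dict with the keys shown in the table (`name`, `vCPUs`, `memoryGB`, `gpus`). The three D-series rows are copied from the output; the `Standard_NC6` entry is an added example to show a non-matching row.

```python
# Rows copied from the table above; the GPU entry is an added example to show filtering.
vm_sizes = [
    {"name": "Standard_D1_v2", "vCPUs": 1, "memoryGB": 3.5, "gpus": 0},
    {"name": "Standard_DS1_v2", "vCPUs": 1, "memoryGB": 3.5, "gpus": 0},
    {"name": "Standard_D1", "vCPUs": 1, "memoryGB": 3.5, "gpus": 0},
    {"name": "Standard_NC6", "vCPUs": 6, "memoryGB": 56, "gpus": 1},
]

def filter_vm_sizes(sizes, vcpus=1, min_memory_gb=2, gpus=0):
    """Mirror the pandas filter: exact vCPU count, minimum memory, exact GPU count."""
    return [s["name"] for s in sizes
            if s["vCPUs"] == vcpus and s["memoryGB"] >= min_memory_gb and s["gpus"] == gpus]

print(filter_vm_sizes(vm_sizes))
```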


option 1: Create training cluster  

In [10]:
# Specify a name for the compute (unique within the workspace)
compute_name = 'aml-cluster'
# Define compute configuration
compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_D1_v2',
                                                       min_nodes=0, # nodes are released when idle, so no compute cost while unused
                                                       max_nodes=10, # limited by your subscription quota
                                                       vm_priority='dedicated', # {lowpriority, dedicated}
                                                       admin_username='ubuntu',
                                                       admin_user_password=os.environ['AML_ADMIN_PASSWORD'], # assumed variable name; avoid hardcoded passwords
                                                       idle_seconds_before_scaledown=120, # {default: 120}
                                                      )
# Create the compute
training_cluster = ComputeTarget.create(ws, compute_name, compute_config)
training_cluster.wait_for_completion(show_output=True)

Creating
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


option 2: Load an existing training cluster

In [11]:
# list all available training cluster(s):
for cluster in ws.compute_targets:
    print(cluster)

aml-cluster


In [12]:
# load the training cluster
compute_name = 'aml-cluster'
training_cluster = ComputeTarget(ws, name=compute_name)

## Train

### Configure autoML
Define settings to run the experiment.

|Property|Description|Options|
|-|-|-|
|**task**|type of machine learning problem|<i>classification</i><br><i>regression</i><br><i>forecasting</i>|
|**compute_target**|local: iterations run serially on the local DSVM<br>remote: iterations run in parallel on an AML compute cluster|<i>local</i><br><i>training_cluster</i>|
|**primary_metric**|the metric you want to optimize<br>[metrics](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml)|**classification:**<br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i><br><br>**regression:**<br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|
|**training_data**|input dataset, containing both X_train and y_train|<i>DataFrame</i><br><i>Dataset</i><br><i>DatasetDefinition</i><br><i>TabularDataset</i>|
|**validation_data**|separate validation dataset; not needed when cross-validation is used|N/A|
|**label_column_name**|the name of the 'target' or 'label' column||
|**enable_early_stopping**|stop the run early if the metric score is not improving|<i>True</i><br><i>False</i>|
|**n_cross_validations**|number of cross-validation splits|5|
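|**experiment_timeout_hours**|maximum time in hours before the experiment terminates (minimum 0.25, i.e. 15 min)|<i>0.25</i>|
|**max_concurrent_iterations**|maximum number of iterations run in parallel; on a remote cluster at most one iteration runs per node, so keep it at or below the number of nodes|2|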



**_You can find more information_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train)
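Two settings in the table are easy to get wrong before submitting. The sketch below checks them locally. It assumes two limits: AutoML rejects experiment timeouts shorter than 15 minutes, and a remote AmlCompute cluster runs at most one iteration per node. Check both against the AutoML documentation for your SDK version. `check_automl_settings` is a hypothetical helper, not part of the SDK.

```python
def check_automl_settings(settings, max_nodes):
    """Return a list of warnings; an empty list means these two checks found no problem."""
    problems = []
    # assumption: experiment timeouts below 15 minutes (0.25 h) are rejected
    if settings.get("experiment_timeout_hours", 0.25) < 0.25:
        problems.append("experiment_timeout_hours must be at least 0.25 (15 minutes)")
    # assumption: a remote AmlCompute cluster runs at most one iteration per node
    if settings.get("max_concurrent_iterations", 1) > max_nodes:
        problems.append("max_concurrent_iterations exceeds the cluster's max_nodes")
    return problems

settings = {"experiment_timeout_hours": 0.25, "max_concurrent_iterations": 1, "n_cross_validations": 5}
print(check_automl_settings(settings, max_nodes=10))  # []
```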

In [13]:
automl_settings = {
    "enable_early_stopping":True,
    "experiment_timeout_hours":0.25,
    "iterations":10, # number of runs
    "iteration_timeout_minutes":5,
    "max_concurrent_iterations":1,
    "max_cores_per_iteration":-1,
    #"experiment_exit_score":0.9920,
    "model_explainability":True,
    "n_cross_validations":5,
    "primary_metric":'AUC_weighted',
    "featurization":'auto',
    "verbosity":logging.INFO, # {INFO, DEBUG, CRITICAL, ERROR, WARNING} -- debug_log=<*.log>
}

automl_config = AutoMLConfig(task='classification',
                             debug_log='automl_errors.log',
                             compute_target='local', # {training_cluster or 'local'}
                             #blacklist_models=['KNN','LinearSVM'],
                             enable_onnx_compatible_models=True,
                             training_data=dataset,
                             label_column_name="Diabetic",
                             **automl_settings
                            )
# outputs "model.pkl" and "automl_errors.log"
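With `n_cross_validations=5`, each child run is scored on five train/validation splits instead of a separate validation set. A minimal sketch of unshuffled 5-fold index splitting follows, sized like scikit-learn's `KFold(shuffle=False)` (the first `n % k` folds get one extra row). AutoML's own split strategy may differ, for example by shuffling, so treat this as an illustration of the idea only.

```python
def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, valid_idx) pairs for k contiguous, non-shuffled folds."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    base, extra = divmod(n_rows, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # the first `extra` folds get one more row
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, valid
        start += size

for train_idx, valid_idx in k_fold_indices(12, k=5):
    print(len(train_idx), valid_idx)
```

Every row is used for validation exactly once, which is why no separate validation dataset is needed.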

### Train pipelines

In [14]:
automl_run = experiment.submit(automl_config, show_output=True)

Running on local machine
Parent Run ID: AutoML_baca9fc8-a785-4e52-aa2d-dec9d390b819

Current status: DatasetEvaluation. Gathering dataset statistics.
Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturization. Beginning to fit featurizers and featurize the dataset.
Current status: DatasetFeaturizationCompleted. Completed fit featurizers and featurizing the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Classes are balanced in the training data.

TYPE:         High cardinality feature detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and no high cardinality features were detected.

******************************************************************************************
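The class balancing guardrail compares how often each label occurs. The exact rule AutoML applies is not shown in this notebook. The sketch below computes the minority/majority ratio of the `Diabetic` labels, with a hypothetical 0.2 cutoff chosen only for illustration.

```python
from collections import Counter

def class_balance_report(labels, min_ratio=0.2):
    """Return (counts, minority/majority ratio, is_balanced); min_ratio=0.2 is a hypothetical cutoff."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    ratio = min(counts.values()) / max(counts.values())
    return dict(counts), ratio, ratio >= min_ratio

counts, ratio, balanced = class_balance_report([0, 0, 0, 1, 1, 0])
print(counts, ratio, balanced)
```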

### Optional: retrieve a run

In [15]:
runId = 'AutoML_baca9fc8-a785-4e52-aa2d-dec9d390b819'
automl_run = AutoMLRun(experiment, run_id=runId)

## Results

### Explore the best pipeline

In [16]:
RunDetails(automl_run).show()
automl_run.wait_for_completion() # get more parameter info


{'runId': 'AutoML_baca9fc8-a785-4e52-aa2d-dec9d390b819',
 'target': 'local',
 'status': 'Completed',
 'startTimeUtc': '2020-04-13T09:33:33.634736Z',
 'endTimeUtc': '2020-04-13T09:37:01.647311Z',
 'properties': {'num_iterations': '10',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'AUC_weighted',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'local',
  'DataPrepJsonString': None,
  'EnableSubsampling': 'False',
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'classification',
  'dependencies_versions': '{"azureml-widgets": "1.2.0", "azureml-train": "1.2.0", "azureml-train-restclients-hyperdrive": "1.2.0", "azureml-train-core": "1.2.0", "azureml-train-automl": "1.2.0", "azureml-train-automl-runtime": "1.2.0", "azureml-train-automl-client": "1.2.0", "azureml-tensorboard": "1.0.85", "azureml-telemetry": "1.2.0", "azureml-sdk": "1.2.0", "azureml-pipeline": "1.2.0", "az

**option 1:** select any pipeline iteration 

In [17]:
best_run, fitted_model = automl_run.get_output(iteration=2)

**option 2:** select best pipeline iteration automatically

In [18]:
best_run, fitted_model = automl_run.get_output()
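Without arguments, `get_output()` returns the child run that scored best on the primary metric (`AUC_weighted` here). The selection step is conceptually like the sketch below. This is not the SDK's implementation; the metric values are hypothetical, and some metrics (such as error metrics in regression) are better when lower.

```python
def best_iteration(metrics_by_iteration, primary_metric="AUC_weighted", higher_is_better=True):
    """Return the iteration id with the best primary_metric value (ties: first seen wins)."""
    scored = {it: m[primary_metric] for it, m in metrics_by_iteration.items() if primary_metric in m}
    if not scored:
        raise ValueError("no iteration reported " + primary_metric)
    pick = max if higher_is_better else min
    return pick(scored, key=scored.get)

# hypothetical per-iteration metrics; iteration 2 failed and reported nothing
metrics = {0: {"AUC_weighted": 0.990}, 1: {"AUC_weighted": 0.950}, 2: {}}
print(best_iteration(metrics))  # 0
```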

### Inspect model properties

In [19]:
# pipeline steps
for step in fitted_model.named_steps:
    print(step)

datatransformer
MaxAbsScaler
LightGBMClassifier


In [20]:
# model properties
fitted_model.named_steps

{'datatransformer': DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
         feature_sweeping_config=None, feature_sweeping_timeout=None,
         featurization_config=None, force_text_dnn=None,
         is_cross_validation=None, is_onnx_compatible=None, logger=None,
         observer=None, task=None, working_dir=None),
 'MaxAbsScaler': MaxAbsScaler(copy=True),
 'LightGBMClassifier': LightGBMClassifier(boosting_type='gbdt', class_weight=None,
           colsample_bytree=1.0, importance_type='split', learning_rate=0.1,
           max_depth=-1, min_child_samples=20, min_child_weight=0.001,
           min_split_gain=0.0, n_estimators=100, n_jobs=-1, num_leaves=31,
           objective=None, random_state=None, reg_alpha=0.0, reg_lambda=0.0,
           silent=True, subsample=1.0, subsample_for_bin=200000,
           subsample_freq=0, verbose=-10)}

In [21]:
# show all metrics
best_run.get_metrics()

{'average_precision_score_micro': 0.9916787674263302,
 'balanced_accuracy': 0.9440819974618796,
 'accuracy_table': 'aml://artifactId/ExperimentRun/dcid.AutoML_baca9fc8-a785-4e52-aa2d-dec9d390b819_0/accuracy_table',
 'precision_score_macro': 0.9468968210340906,
 'average_precision_score_macro': 0.9885533270268901,
 'precision_score_micro': 0.9516,
 'f1_score_macro': 0.9454409908953988,
 'f1_score_weighted': 0.951525872795924,
 'AUC_macro': 0.9904426831775627,
 'weighted_accuracy': 0.9576144671870581,
 'confusion_matrix': 'aml://artifactId/ExperimentRun/dcid.AutoML_baca9fc8-a785-4e52-aa2d-dec9d390b819_0/confusion_matrix',
 'log_loss': 0.11810503522790523,
 'recall_score_micro': 0.9516,
 'f1_score_micro': 0.9516,
 'recall_score_weighted': 0.9516,
 'norm_macro_recall': 0.8881639949237587,
 'recall_score_macro': 0.9440819974618796,
 'average_precision_score_weighted': 0.9907832843118888,
 'AUC_weighted': 0.9904426831775627,
 'accuracy': 0.9516,
 'matthews_correlation': 0.8909693671988277,
 

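Some of these metrics are derived from others. The AutoML metric documentation defines `norm_macro_recall` as (recall_score_macro - R) / (1 - R), where R = 1/C is the expected macro recall of random predictions for C classes. Likewise, `balanced_accuracy` equals macro-averaged recall, which is why both show 0.9440819974618796 above. The sketch below checks the first relation against the values reported for this run.

```python
def norm_macro_recall(recall_macro, n_classes):
    """Rescale macro recall so random guessing scores 0 and perfect recall scores 1."""
    chance = 1.0 / n_classes  # expected macro recall of random predictions
    return (recall_macro - chance) / (1.0 - chance)

# values copied from best_run.get_metrics() above (binary task, so 2 classes)
recall_macro = 0.9440819974618796
reported = 0.8881639949237587
print(abs(norm_macro_recall(recall_macro, n_classes=2) - reported) < 1e-9)  # True
```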
## Register

### Prepare

AutoML generated a scoring script, an environment file and the model files

In [22]:
# name of the best model; the generated score.py looks up the registered model by this name
model_name = best_run.properties['model_name']

# make a local copy of the best scoring script, environment file and model files
os.makedirs('inference', exist_ok=True)  # make sure the target folder exists
script_file_name = 'inference/score.py'
conda_env_file_name = 'inference/env.yml'
model_pickle_file_name = 'inference/model.pkl'
model_onnx_file_name = 'inference/model.onnx'  # available because enable_onnx_compatible_models=True
best_run.download_file('outputs/scoring_file_v_1_0_0.py', script_file_name)
best_run.download_file('outputs/conda_env_v_1_0_0.yml', conda_env_file_name)
best_run.download_file('outputs/model.pkl', model_pickle_file_name)
best_run.download_file('outputs/model.onnx', model_onnx_file_name)

In [23]:
! cat inference/env.yml

# Conda environment specification. The dependencies defined in this file will
# be automatically provisioned for runs with userManagedDependencies=False.

# Details about the Conda environment file format:
# https://conda.io/docs/user-guide/tasks/manage-environments.html#create-env-file-manually

name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
- python=3.6.2

- pip:
  - azureml-train-automl-runtime==1.2.0
  - inference-schema
  - azureml-explain-model==1.2.0
  - azureml-defaults==1.2.0
- numpy>=1.16.0,<=1.16.2
- pandas>=0.21.0,<=0.23.4
- scikit-learn>=0.19.0,<=0.20.3
- py-xgboost<=0.90
- fbprophet==0.5
- psutil>=5.2.2,<6.0.0
channels:
- anaconda
- conda-forge


In [24]:
! cat inference/score.py

# ---------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------
import json
import pickle
import numpy as np
import pandas as pd
import azureml.train.automl
from sklearn.externals import joblib
from azureml.core.model import Model

from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType


input_sample = pd.DataFrame({'PatientID': pd.Series(['1354778'], dtype='int64'), 'Pregnancies': pd.Series(['0'], dtype='int64'), 'PlasmaGlucose': pd.Series(['171'], dtype='int64'), 'DiastolicBloodPressure': pd.Series(['80'], dtype='int64'), 'TricepsThickness': pd.Series(['34'], dtype='int64'), 'SerumInsulin': pd.Series(['23'], dtype='int64'), 'BMI': pd.Series(['43.50972593'], dtype='f

### Register the model

**Option 1:** from workspace /outputs folder with .register_model()

In [25]:
model = best_run.register_model(model_name=model_name, # registered model name used in scoring script init()
                                model_framework=Model.Framework.SCIKITLEARN, # {TensorFlow, ScikitLearn, Onnx, Custom}
                                model_framework_version='0.22.2',
                                model_path='outputs/model.pkl', # fixed path in workspace {'model.pkl', 'model.onnx'}
                                tags={'Training context': 'autoML Training'},
                                properties={'AUC': best_run.get_metrics()['AUC_weighted'],
                                            'Accuracy': best_run.get_metrics()['accuracy']},
                                description="Classification model to predict diabetes")

**Option 2:** from local /path/model folder with Model.register()

In [26]:
model = Model.register(workspace=ws,
                       model_name=model_name, # registered model name used in scoring script init()
                       model_framework=Model.Framework.SCIKITLEARN, # {TensorFlow, ScikitLearn, Onnx, Custom}
                       model_framework_version='0.22.2',
                       model_path='inference/model.pkl', # local file {'model.pkl', 'model.onnx'}
                       tags={'Training context': 'autoML Training'},
                       properties={'AUC': best_run.get_metrics()['AUC_weighted'],
                                   'Accuracy': best_run.get_metrics()['accuracy']},
                       description="Classification model to predict diabetes")

Registering model AutoMLbaca9fc8a0


**Optional:** Load the model

In [27]:
# list all registered models with their tags and properties
for registered in Model.list(ws):  # distinct loop name so the `model` variable is not overwritten
    print(registered.name, 'version:', registered.version)
    for tag_name, tag in registered.tags.items():
        print('\t', tag_name, ':', tag)
    for prop_name, prop in registered.properties.items():
        print('\t', prop_name, ':', prop)
    print('\n')

AutoMLbaca9fc8a0 version: 2
	 Training context : autoML Training
	 AUC : 0.9904426831775627
	 Accuracy : 0.9516


AutoMLbaca9fc8a0 version: 1
	 Training context : autoML Training
	 AUC : 0.9904426831775627
	 Accuracy : 0.9516




In [28]:
# load the registered model for deployment (latest version)
model = ws.models[model_name] # or replace with any registered modelname from Model.list(ws)
model

Model(workspace=Workspace.create(name='ml_workspace', subscription_id='43c1f93a-903d-4b23-a4bf-92bd7a150627', resource_group='myResourceGroup'), name=AutoMLbaca9fc8a0, id=AutoMLbaca9fc8a0:2, version=2, tags={'Training context': 'autoML Training'}, properties={'AUC': '0.9904426831775627', 'Accuracy': '0.9516'})

## Deploy

### Deploy model as webservice (ACI)

A Linux Azure Container Instance with 1 vCPU and 1 GB of RAM costs about €28 per month (pricing at the time of writing, April 2020)

In [29]:
# Configure the scoring environment
service_name = "automl-projname-service" # only lowercase letters, numbers, or dashes

# Remove any existing service under the same name
try:
    Webservice(ws, service_name).delete()
except WebserviceException:
    print('"' + service_name + '" does not exist, creating the webservice...')

myenv = Environment.from_conda_specification(name="myenv", file_path=conda_env_file_name)
inference_config = InferenceConfig(entry_script=script_file_name, environment=myenv)

deployment_config = AciWebservice.deploy_configuration(cpu_cores=1,
                                                       memory_gb=1)

# build the container image from the environment, start the ACI webservice and deploy the inference script
service = Model.deploy(ws, service_name, [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)

"automl-projname-service" does not exist, creating the webservice...
Running.........................................................................................................................................................................................................
Succeeded
ACI service creation operation finished, operation "Succeeded"


**Optional:** load a running webservice

In [30]:
# list available webservices
for i in ws.webservices:
    print(i)

automl-projname-service


In [31]:
service_name = "automl-projname-service" # only lowercase letters, numbers, or dashes
service = Webservice(ws, service_name)

In [32]:
# get webservice logs
print(service.get_logs())

2020-04-13T10:00:50,785891561+00:00 - gunicorn/run 
2020-04-13T10:00:50,786579158+00:00 - rsyslog/run 
2020-04-13T10:00:50,786049161+00:00 - iot-server/run 
2020-04-13T10:00:50,818970215+00:00 - nginx/run 
/usr/sbin/nginx: /azureml-envs/azureml_29526d93bcbca0513e9c1ca0d57832a0/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_29526d93bcbca0513e9c1ca0d57832a0/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_29526d93bcbca0513e9c1ca0d57832a0/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_29526d93bcbca0513e9c1ca0d57832a0/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_29526d93bcbca0513e9c1ca0d57832a0/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
rsyslogd

## Test

### Webservice inference test

Send an HTTP request with test data to the model endpoint to get a prediction.  
In this example we predict whether a person is diabetic (1) or not diabetic (0).  
Each record in the test data must be a list of 9 feature values, in the same column order as the training data.  
We demonstrate both the **service** object and the **requests** library for sending a prediction request.  
The Postman application or the REST Client extension for VS Code work as well.  

|Web API|Example value|Options|
|-|-|-|
|**HTTP method**|POST|<i>POST</i><br><i>GET</i>|
|**URI**|http://3bb0618b-ef7b-4b17-af32-a52f9c64f4d5.northeurope.azurecontainer.io/score||
|**Header**|{Content-Type: application/json}||
|**Body**|{"data": [[5, 2, 180, 74, 24, 21, 24, 1.5, 22], <br>[6, 0, 148, 58, 11, 179, 39, 0.16, 45]]}|<i>one or </i><br><i>more records</i>|
|**Response**|{"result": [1, 0]}|<i>json object</i>|
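The request described in the table can also be built with the standard library alone. The sketch below constructs (but does not send) the POST request with the JSON body and `Content-Type` header shown above, and rejects records that do not have 9 values. The URI is a hypothetical placeholder; use `service.scoring_uri` from your own deployment.

```python
import json
import urllib.request

def build_score_request(scoring_uri, records, n_features=9):
    """Build (without sending) a POST request whose JSON body is {"data": records}."""
    for row in records:
        if len(row) != n_features:
            raise ValueError(f"expected {n_features} values per record, got {len(row)}")
    body = json.dumps({"data": records}).encode("utf-8")
    return urllib.request.Request(scoring_uri, data=body,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")

# hypothetical URI; replace with service.scoring_uri
req = build_score_request("http://localhost:8890/score",
                          [[5, 2, 180, 74, 24, 21, 24, 1.5, 22],
                           [6, 0, 148, 58, 11, 179, 39, 0.16, 45]])
print(req.get_method(), req.full_url)
# sending it would be: urllib.request.urlopen(req, timeout=30).read()
```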

In [33]:
# get webservice URI
endpoint = service.scoring_uri

# raw test data
rawdata = [[5, 2, 180, 74, 24, 21, 24, 1.5, 22],
           [6, 0, 148, 58, 11, 179, 39, 0.16, 45]]

print("URI: " + endpoint)
print("Body: " + json.dumps({"data": rawdata})) # convert array to a serialized JSON formatted string object

URI: http://0b9d52d8-283f-45e6-8bf6-d27217a1dcb4.northeurope.azurecontainer.io/score
Body: {"data": [[5, 2, 180, 74, 24, 21, 24, 1.5, 22], [6, 0, 148, 58, 11, 179, 39, 0.16, 45]]}


**Test 1:** service.run()

In [34]:
service.run(json.dumps({"data": rawdata}))

'{"result": [1, 0]}'

**Test 2:** requests.post()

In [35]:
response = requests.post(endpoint, json={"data": rawdata})
response.json()

'{"result": [1, 0]}'
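Note that `response.json()` above returns a string, not a dict. The generated score.py appears to return `json.dumps(...)`, which the service then serializes again, so the body is JSON-encoded twice. A minimal helper that unwraps either form, assuming the `{"result": [...]}` shape shown in the table:

```python
import json

def decode_result(payload):
    """Return the "result" list from a dict, a JSON string, or a double-encoded JSON string."""
    if isinstance(payload, (bytes, bytearray)):
        payload = payload.decode("utf-8")
    while isinstance(payload, str):  # peel off one JSON encoding layer at a time
        payload = json.loads(payload)
    return payload["result"]

print(decode_result('{"result": [1, 0]}'))  # [1, 0]
```

Usage: `decode_result(response.json())` or `decode_result(service.run(...))` both yield a plain Python list.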

When you have finished testing, delete the deployment with `service.delete()` so the container instance stops incurring costs.

In [36]:
service.delete()

## CustomML

Inspired by the AutoML results, this section develops an alternative custom ML model.  
Use the inline method to develop and test interactively, train locally or on remote compute, then deploy and test the model.  

1. Option 1: inline method
1. Option 2: script method
   * create a training script
   * create a training environment
   * create and register a dataset (File)
   * train the model
1. Create an inference script
1. Create an inference environment
1. Register the model
1. Deploy the model
1. Test inference

In [37]:
ws = Workspace.from_config()

### Option 1: Inline method

|log metric function|Description|Example|
|-|-|-|
|**log**|<i>Record a single named value</i>|run.log("accuracy", 0.95)|
|**log_list**|<i>Record a named list of values</i>|run.log_list("accuracies", [0.6, 0.7, 0.87])|
|**log_row**|<i>Record a row with multiple columns</i>|run.log_row("Y over X", x=1, y=0.4)|
|**log_table**|<i>Record a dictionary as a table</i>|run.log_table("Y over X", {"x":[1, 2, 3], "y":[0.6, 0.7, 0.89]})|
|**log_image**|<i>Record an image file or a plot</i>|run.log_image("ROC", plot=plt)|
|**upload_file**|<i>Upload a local file to the run record under the given name (prefix it with "outputs/" to store it in outputs)</i>|run.upload_file("best_model.pkl", "./model.pkl")|

https://aka.ms/AA70zf6
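To unit-test training code without a workspace, the logging calls can be pointed at a simple in-memory object. The class below is a hypothetical stand-in that loosely mirrors a few method names from the table above. It is not part of the Azure ML SDK and does not reproduce how the service stores or charts metrics.

```python
class MetricRecorder:
    """Hypothetical in-memory stand-in for a few Run logging calls, for offline tests."""

    def __init__(self):
        self.metrics = {}

    def log(self, name, value):
        # logging the same name repeatedly accumulates a series of values
        self.metrics.setdefault(name, []).append(value)

    def log_list(self, name, values):
        self.metrics.setdefault(name, []).extend(values)

    def log_row(self, name, **columns):
        self.metrics.setdefault(name, []).append(dict(columns))

    def log_table(self, name, table):
        # store a copy of each column's values under the table name
        self.metrics[name] = {col: list(vals) for col, vals in table.items()}

run = MetricRecorder()
run.log("accuracy", 0.95)
run.log_list("accuracies", [0.6, 0.7, 0.87])
run.log_row("Y over X", x=1, y=0.4)
run.log_table("curve", {"x": [1, 2, 3], "y": [0.6, 0.7, 0.89]})
print(run.metrics["accuracies"])
```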

In [38]:
from azureml.core import Experiment
from azureml.core import Model
from azureml.core import Datastore
from azureml.core import Dataset
from azureml.data.datapath import DataPath
import pandas as pd
import numpy as np
import joblib
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve

# Create an Azure ML experiment in your workspace
experiment = Experiment(workspace=ws, name="diabetes-training")
run = experiment.start_logging()
print("Starting experiment:", experiment.name)

# load the diabetes dataset (File method)
print("Loading data lake gen2 data in a pandas dataframe...")
ds = Datastore.get(ws, 'datalakestoragegen2')
ds_path = [DataPath(ds, 'platinum/diabetes.parquet')] # {path/*.parquet or path/**}
dataset = Dataset.File.from_files(path=ds_path)
mount_context = dataset.mount(mount_point='/tmp/platinum') # read-only mount from delta lake
mount_context.start()
diabetes = pd.read_parquet('/tmp/platinum/diabetes.parquet') # {'/tmp/path/'} can load latest delta lake parquet files
mount_context.stop()

# load the diabetes dataset (Tabular method)
# print("Loading data lake gen2 data in a pandas dataframe...")
# ds = Datastore.get(ws, 'datalakestoragegen2')
# ds_path = [DataPath(ds, 'platinum/diabetes.parquet')] # {path/*.parquet or path/**}
# dataset = Dataset.Tabular.from_parquet_files(path=ds_path) # {delimited, json, parquet, sql}
# diabetes = dataset.to_pandas_dataframe() # create a pandas dataframe

# Separate features and labels as numpy array
X, y = diabetes[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, diabetes['Diabetic'].values

# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Train a decision tree model
print('Training a decision tree model')
model = DecisionTreeClassifier().fit(X_train, y_train)

# calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
print('Accuracy:', acc)
run.log('Accuracy', float(acc))  # built-in float instead of the deprecated np.float alias

# calculate AUC
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test, y_scores[:, 1])
print('AUC: ' + str(auc))
run.log('AUC', float(auc))

# Save the trained model
model_file = 'diabetes_model.pkl'
joblib.dump(value=model, filename=model_file) # backup model local
run.upload_file(name='outputs/' + model_file,
                path_or_stream='./' + model_file) # save model to workspace

# Complete the run
run.complete()

Starting experiment: diabetes-training
Loading data lake gen2 data in a pandas dataframe...
Training a decision tree model
Accuracy: 0.8873333333333333
AUC: 0.8741181153291208
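The two numbers above can be reproduced without scikit-learn. Accuracy is the fraction of matching predictions, and ROC AUC equals the probability that a randomly chosen positive is scored higher than a randomly chosen negative, with ties counting one half. The pairwise sketch below is O(n²) and meant only to show what `roc_auc_score` computes, not to replace it.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """Pairwise ROC AUC for binary labels (1 = positive); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))             # 0.75
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))     # 0.75
```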


### Option 2: Script method

Create training script

In [14]:
# Create a local folder for the experiment files
folder_name = 'diabetes_service'
experiment_folder = './' + folder_name
os.makedirs(folder_name, exist_ok=True)
print(folder_name, 'folder created')

diabetes_service folder created


In [40]:
%%writefile $experiment_folder/diabetes_training.py
# Import libraries
import argparse
from azureml.core import Workspace, Dataset, Experiment, Run
import pandas as pd
import numpy as np
import joblib
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
import glob
import os
print("libraries imported...")

# Set regularization hyperparameter (passed as an argument to the script)
parser = argparse.ArgumentParser()
parser.add_argument('--regularization', type=float, dest='reg_rate', default=0.01, help='regularization rate')
args = parser.parse_args()
reg = args.reg_rate
print("argparse parameters loaded...")

# Get the experiment run context
run = Run.get_context()
print("run context loaded...")

# load the diabetes dataset (File method)
# Get the training data from the estimator input identified as 'diabetes'
mount = run.input_datasets['diabetes'] # read-only mount from delta lake as '/mnt/data'
print("delta lake mounted...")
diabetes = pd.read_parquet('/mnt/data/diabetes.parquet') # load any file(s) from this delta lake mounted folder
print("dataset loaded...")

# save data into workspace (create the folder first so the write cannot fail)
os.makedirs('outputs', exist_ok=True)
diabetes.to_csv("outputs/dataset.csv", index=False) # {logs/  outputs/}
print("test: write dataset to workspace 'outputs/dataset.csv'")

# Separate features and labels
X, y = diabetes[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, diabetes['Diabetic'].values

# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Train a logistic regression model
print('Training a logistic regression model with regularization rate of', reg)
run.log('Regularization Rate', float(reg))  # built-in float instead of the deprecated np.float alias
model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)

# calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
print('Accuracy:', acc)
run.log('Accuracy', float(acc))

# calculate AUC
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test, y_scores[:, 1])
print('AUC: ' + str(auc))
run.log('AUC', float(auc))

os.makedirs('outputs', exist_ok=True)
# note: files saved in the outputs folder are uploaded automatically to the experiment run record
joblib.dump(value=model, filename='outputs/diabetes_model.pkl')

run.complete()

Writing ./diabetes_service/diabetes_training.py
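`diabetes_training.py` logs two metrics: accuracy, the share of test rows where the predicted label matches the true label, and AUC, computed by `roc_auc_score` from the positive-class probabilities. The plain-Python sketch below reproduces both for intuition, using the rank interpretation of AUC for binary labels (the probability that a random positive row scores above a random negative one, ties counted as one half). The toy labels and scores are made up, and the pairwise loop is only meant for small inputs; it is not a replacement for scikit-learn.

```python
def accuracy(y_true, y_pred):
    """Fraction of rows whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Rank-based AUC: share of (positive, negative) pairs ordered correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# made-up labels and positive-class probabilities, not taken from the diabetes run
y_true = [0, 0, 1, 1]
scores = [0.1, 0.6, 0.35, 0.8]
y_pred = [1 if s > 0.5 else 0 for s in scores]  # a 0.5 probability cut-off gives [0, 1, 0, 1]
print(accuracy(y_true, y_pred))  # 0.5
print(auc(y_true, scores))       # 0.75
```

AUC only depends on how the scores are ordered, so it can stay high even when the 0.5 cut-off used for accuracy is a poor choice, as in this toy example.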


### Create training environment

In [41]:
myenv = Environment("training_environment")
myenv.docker.enabled = True
myenv.python.user_managed_dependencies = False
conda_packages = ['scikit-learn', 'joblib', 'python==3.6.2']
pip_packages = ['azureml-defaults', 'azureml-dataprep[pandas,fuse]', 'pyarrow', 'fastparquet']
myenv.python.conda_dependencies = CondaDependencies.create(conda_packages=conda_packages, pip_packages=pip_packages)
myenv.register(ws)

{
    "name": "training_environment",
    "version": "1",
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "python": {
        "userManagedDependencies": false,
        "interpreterPath": "python",
        "condaDependenciesFile": null,
        "baseCondaEnvironment": null,
        "condaDependencies": {
            "channels": [
                "anaconda",
                "conda-forge"
            ],
            "dependencies": [
                "python=3.6.2",
                {
                    "pip": [
                        "azureml-defaults",
                        "azureml-dataprep[pandas,fuse]",
                        "pyarrow",
                        "fastparquet"
                    ]
                },
                "scikit-learn",
                "joblib"
            ],
            "name": "azureml_b9a1534962684a800c586e9fce04292e"
        }
    },
    "docker": {
        "enabled": true,
        "baseImage": "mcr.microsoft.com/azu

In [42]:
# list environments
env_names = Environment.list(workspace=ws)
for env_name in env_names:
    print('Name:',env_name)

Name: training_environment
Name: AzureML-PyTorch-1.3-GPU
Name: AzureML-TensorFlow-2.0-CPU
Name: AzureML-Tutorial
Name: AzureML-PyTorch-1.3-CPU
Name: AzureML-TensorFlow-2.0-GPU
Name: AzureML-Chainer-5.1.0-GPU
Name: AzureML-Minimal
Name: AzureML-PyTorch-1.2-CPU
Name: AzureML-TensorFlow-1.12-CPU
Name: AzureML-TensorFlow-1.13-CPU
Name: AzureML-PyTorch-1.1-CPU
Name: AzureML-TensorFlow-1.10-CPU
Name: AzureML-PyTorch-1.0-GPU
Name: AzureML-TensorFlow-1.12-GPU
Name: AzureML-TensorFlow-1.13-GPU
Name: AzureML-Chainer-5.1.0-CPU
Name: AzureML-PyTorch-1.0-CPU
Name: AzureML-Scikit-learn-0.20.3
Name: AzureML-PyTorch-1.2-GPU
Name: AzureML-PyTorch-1.1-GPU
Name: AzureML-TensorFlow-1.10-GPU
Name: AzureML-PySpark-MmlSpark-0.15
Name: AzureML-AutoML
Name: AzureML-PyTorch-1.4-GPU
Name: AzureML-PyTorch-1.4-CPU
Name: AzureML-VowpalWabbit-8.8.0
Name: AzureML-Hyperdrive-ForecastDNN
Name: AzureML-AutoML-GPU
Name: AzureML-AutoML-DNN-GPU
Name: AzureML-AutoML-DNN
Name: AzureML-Designer-R
Name: AzureML-Designer-Recomm

### Create and register a dataset (File)

In [43]:
# load the diabetes dataset (File method)
ds = Datastore.get(ws, 'datalakestoragegen2')
ds_path = [DataPath(ds, 'platinum/**')] # {path/*.parquet or path/**}
file_ds = Dataset.File.from_files(path=ds_path)
   
# Register the file dataset (create_new_version=True adds a new version if the name already exists)
try:
    file_ds = file_ds.register(workspace=ws,
                               name='diabetes file dataset',
                               description='diabetes files',
                               tags = {'format':'parquet'},
                               create_new_version=True)
    print('Dataset registered')
except Exception as ex:
    print(ex)

Dataset registered


In [44]:
# show a list of registered dataset(s)
print("Datasets:")
for dataset_name in list(ws.datasets.keys()):
    dataset = Dataset.get_by_name(ws, dataset_name)
    print("\t", dataset.name, '\t version', dataset.version)

Datasets:
	 diabetes file dataset 	 version 1


In [45]:
# list of the file path(s)
for file_path in file_ds.to_path():
    print(file_path)

/diabetes.csv
/diabetes.parquet
/folder/diabetes2.csv
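The paths listed by `to_path()` are relative to the dataset root, and the training script finds `/diabetes.parquet` at `/mnt/data/diabetes.parquet` once the dataset is mounted. If every parquet file in the dataset should be used rather than one hard-coded file, a small sketch can map the listed relative paths to mounted paths and keep only the parquet ones. The function name `parquet_paths` is hypothetical, and it assumes the mount point is `/mnt/data` as configured in the estimator.

```python
import posixpath


def parquet_paths(relative_paths, mount_point="/mnt/data"):
    """Map dataset-relative paths (as printed by to_path()) to mounted .parquet file paths."""
    return [posixpath.join(mount_point, p.lstrip("/"))
            for p in relative_paths
            if p.endswith(".parquet")]


# the three paths printed above
listed = ["/diabetes.csv", "/diabetes.parquet", "/folder/diabetes2.csv"]
print(parquet_paths(listed))  # ['/mnt/data/diabetes.parquet']
```

Each resulting path could then be read with `pd.read_parquet` and combined with `pd.concat`, assuming the files share one schema.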


### Train model

In [46]:
# Set the script parameters
script_params = {
    '--regularization': 0.1
}

# get the registered dataset by name
file_ds = Dataset.get_by_name(ws, "diabetes file dataset")

# Get the docker environment
training_env = Environment.get(ws, 'training_environment')

# get the training compute cluster
training_cluster = ComputeTarget(ws, 'aml-cluster')

estimator = Estimator(source_directory=experiment_folder, # All the files in this directory are uploaded into the cluster nodes for execution
                      compute_target='local', # {'local', training_cluster}
                      entry_script='diabetes_training.py',
                      script_params=script_params,
                      environment_definition=training_env,
                      inputs=[file_ds.as_named_input('diabetes').as_mount(path_on_compute='/mnt/data')],
                     )

# Create an experiment
experiment_name = 'diabetes-training'
experiment = Experiment(workspace=ws, name=experiment_name)
# Run the experiment
run = experiment.submit(config=estimator)

# Show the run details while running
RunDetails(run).show()
run.wait_for_completion() # block until the run finishes and return its details


{'runId': 'diabetes-training_1586772329_209a6e9f',
 'target': 'local',
 'status': 'Finalizing',
 'startTimeUtc': '2020-04-13T10:05:37.127525Z',
 'properties': {'_azureml.ComputeTargetType': 'local',
  'ContentSnapshotId': '248d42ac-0c5c-4ca1-94d2-40aa06848d22',
  'azureml.git.repository_uri': 'https://github.com/albert-kevin/azuremachinelearning.git',
  'mlflow.source.git.repoURL': 'https://github.com/albert-kevin/azuremachinelearning.git',
  'azureml.git.branch': 'master',
  'mlflow.source.git.branch': 'master',
  'azureml.git.commit': 'f834abad85fa91c96a624045592b837ae8834960',
  'mlflow.source.git.commit': 'f834abad85fa91c96a624045592b837ae8834960',
  'azureml.git.dirty': 'True'},
 'inputDatasets': [{'dataset': {'id': 'd44da7dc-672f-401c-bde9-6c7abfb03e2c'}, 'consumptionDetails': {'type': 'RunInput', 'inputName': 'diabetes', 'mechanism': 'Mount', 'pathOnCompute': '/mnt/data'}}],
 'runDefinition': {'script': 'diabetes_training.py',
  'useAbsolutePath': False,
  'arguments': ['--regul
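The Estimator hands `script_params` to `diabetes_training.py` as command-line arguments (the run definition above lists them under `arguments`), where `argparse` reads them back. A standard-library sketch of that round trip: `params_to_argv` is a hypothetical helper that approximates the flattening the SDK performs internally, and the parser repeats the one defined in the training script.

```python
import argparse


def params_to_argv(script_params):
    """Flatten a script_params dict into a flat ['--flag', 'value', ...] list."""
    argv = []
    for flag, value in script_params.items():
        argv.extend([flag, str(value)])
    return argv


# same argument definition as in diabetes_training.py
parser = argparse.ArgumentParser()
parser.add_argument('--regularization', type=float, dest='reg_rate', default=0.01,
                    help='regularization rate')

argv = params_to_argv({'--regularization': 0.1})
args = parser.parse_args(argv)
print(argv)           # ['--regularization', '0.1']
print(args.reg_rate)  # 0.1
```

Because `dest='reg_rate'` is set, the value is read back as `args.reg_rate`, not `args.regularization`.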

### Create inference script

In [47]:
# Create a local folder for the experiment files
folder_name = 'diabetes_service'
experiment_folder = './' + folder_name
os.makedirs(folder_name, exist_ok=True)
print(folder_name, 'folder created')

diabetes_service folder created


In [48]:
%%writefile $folder_name/diabetes_score.py
import json
import joblib
import numpy as np
from azureml.core.model import Model

# Called when the service is loaded
def init():
    global model
    # Get the path to the deployed model file and load a registered model
    model_path = Model.get_model_path(model_name='diabetes_model')
    model = joblib.load(model_path)

# Called when a request is received
def run(raw_data):
    # Get the input data as a numpy array
    data = np.array(json.loads(raw_data)['data'])
    # Get a prediction from the model
    predictions = model.predict(data)
    # Get the corresponding classname for each prediction (0 or 1)
    classnames = ['not-diabetic', 'diabetic']
    predicted_classes = []
    for prediction in predictions:
        predicted_classes.append(classnames[prediction])
    # Return the predictions as JSON
    return json.dumps(predicted_classes)

Writing diabetes_service/diabetes_score.py
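Because `init()` loads a registered model, the scoring logic is awkward to try before deployment. One way to exercise the JSON handling offline is to swap in a stub model. `StubModel` below is hypothetical and simply returns the labels it was given, so only the request parsing and the label mapping of `run()` are checked, not the trained model.

```python
import json

CLASSNAMES = ['not-diabetic', 'diabetic']


class StubModel:
    """Hypothetical stand-in for the joblib-loaded model: returns fixed labels."""

    def __init__(self, labels):
        self.labels = labels

    def predict(self, rows):
        # one label per input row, taken from the fixed list
        return self.labels[:len(rows)]


model = StubModel([1, 0])


def run(raw_data):
    # same flow as diabetes_score.py: JSON in, class names out as a JSON string
    data = json.loads(raw_data)['data']
    predictions = model.predict(data)
    return json.dumps([CLASSNAMES[int(p)] for p in predictions])


body = json.dumps({"data": [[2, 180, 74, 24, 21, 24, 1.5, 22],
                            [0, 148, 58, 11, 179, 39, 0.16, 45]]})
print(run(body))  # ["diabetic", "not-diabetic"]
```

`int(p)` also accepts NumPy integer labels, so the same mapping works for the output of a real scikit-learn `predict()`.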


### Create inference environment

In [49]:
# Add the dependencies for our model (azureml-defaults is already included)
myenv = CondaDependencies()
myenv.add_conda_package("scikit-learn")

# Save the environment config as a .yml file
env_file = folder_name + "/diabetes_env.yml"
with open(env_file, "w") as f:
    f.write(myenv.serialize_to_string())
print("Saved inference environment file in", env_file)

# Print the .yml file
with open(env_file,"r") as f:
    print(f.read())

Saved inference environment file in diabetes_service/diabetes_env.yml
# Conda environment specification. The dependencies defined in this file will
# be automatically provisioned for runs with userManagedDependencies=False.

# Details about the Conda environment file format:
# https://conda.io/docs/user-guide/tasks/manage-environments.html#create-env-file-manually

name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
- python=3.6.2

- pip:
    # Required packages for AzureML execution, history, and data preparation.
  - azureml-defaults

- scikit-learn
channels:
- anaconda
- conda-forge



### Register the model

In [50]:
# define model name
model_name = 'diabetes_model'

# fetch the metrics logged by the training run once
metrics = run.get_metrics()

# register the model file saved by the training run
run.register_model(model_name=model_name, # registered model name used in scoring script init()
                   model_path='outputs/diabetes_model.pkl', # path of the file in the run's outputs, e.g. {'outputs/model.pkl', 'outputs/model.onnx'}
                   tags={'Training context': 'Custom Training'},
                   properties={'AUC': metrics['AUC'],
                               'Accuracy': metrics['Accuracy']},
                   description="Classification model to predict diabetes",
                   model_framework=Model.Framework.SCIKITLEARN, # {TensorFlow, ScikitLearn, Onnx, Custom}
                   model_framework_version='0.22.2')

print('Model trained and registered')

Model trained and registered


### Deploy the model

In [51]:
service_name = "diabetes-service"

# Remove any existing service under the same name
try:
    Webservice(ws, service_name).delete()
except WebserviceException:
    print('"' + service_name + '" does not exist, creating the webservice...')

# Configure the scoring environment
inference_config = InferenceConfig(runtime="python",
                                   source_directory=folder_name,
                                   entry_script="diabetes_score.py",
                                   conda_file="diabetes_env.yml")

deployment_config = AciWebservice.deploy_configuration(cpu_cores=1,
                                                       memory_gb=1)

# load the registered model
model = ws.models['diabetes_model']

service = Model.deploy(ws, service_name, [model], inference_config, deployment_config)

service.wait_for_deployment(show_output=True)
print(service.state)

"diabetes-service" does not exist, creating the webservice...
Running........................................................................................................
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy


### Inference test

In [52]:
# get webservice URI
endpoint = service.scoring_uri

# raw test data
rawdata = [[2, 180, 74, 24, 21, 24, 1.5, 22],
           [0, 148, 58, 11, 179, 39, 0.16, 45]]

print("URI: " + endpoint)
print("Body: " + json.dumps({"data": rawdata})) # convert array to a serialized JSON formatted string object

service.run(json.dumps({"data": rawdata}))

URI: http://2c134555-40dd-4061-a2cb-3a0a8ee7bf34.northeurope.azurecontainer.io/score
Body: {"data": [[2, 180, 74, 24, 21, 24, 1.5, 22], [0, 148, 58, 11, 179, 39, 0.16, 45]]}


'["not-diabetic", "not-diabetic"]'
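`service.run()` goes through the SDK; other clients call the scoring URI directly with an HTTP POST whose JSON body has the same `{"data": [...]}` shape printed above. A standard-library sketch that builds such a request: `build_scoring_request` is a hypothetical name, the URI is a placeholder for `service.scoring_uri`, and it assumes key authentication is off (the ACI default unless `auth_enabled=True` was passed to `deploy_configuration`).

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, rows):
    """Build (but do not send) a JSON POST request for a scoring endpoint."""
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# placeholder URI; use service.scoring_uri for a deployed service
req = build_scoring_request("http://localhost:8890/score",
                            [[2, 180, 74, 24, 21, 24, 1.5, 22]])
print(req.get_method(), req.full_url)

# Sending requires a live service, e.g.:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

The `requests` package imported at the top of the notebook would work equally well; the standard library is used here only to keep the sketch dependency-free.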

When you are finished testing your service, clean up the deployment with `service.delete()`.

In [53]:
service.delete()

## Finetuning

Hyperparameter tuning of the model using HyperDrive.  
A HyperDrive run launches a child run for each sampled hyperparameter combination, so their metrics can be compared side by side.  

[doc: how to tune hyperparameters](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters)  
[git: examples](https://github.com/microsoft/MLHyperparameterTuning)  
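Conceptually, picking a winner among HyperDrive child runs comes down to comparing the primary metric each child logged (`Accuracy` in this notebook) in the direction given by `PrimaryMetricGoal`. A plain-Python sketch of that comparison, with hypothetical run IDs and made-up metric values; it only illustrates the idea and is not the SDK's implementation.

```python
# Made-up child-run records for illustration; these are not results from this notebook
child_runs = [
    {"run_id": "child_1", "Accuracy": 0.71},
    {"run_id": "child_2", "Accuracy": 0.74},
    {"run_id": "child_3"},  # e.g. a run that failed before logging the metric
    {"run_id": "child_4", "Accuracy": 0.69},
]


def best_run(runs, metric, goal="maximize"):
    """Return the run with the best value of `metric`, ignoring runs that never logged it."""
    logged = [r for r in runs if metric in r]
    if not logged:
        raise ValueError(f"no run logged {metric!r}")
    pick = max if goal == "maximize" else min
    return pick(logged, key=lambda r: r[metric])


print(best_run(child_runs, "Accuracy")["run_id"])                   # child_2
print(best_run(child_runs, "Accuracy", goal="minimize")["run_id"])  # child_4
```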

In [61]:
# Initialize workspace
ws = Workspace.from_config()

In [62]:
# Get the existing AmlCompute cluster (ComputeTarget raises an error if 'aml-cluster' does not exist)
training_cluster = ComputeTarget(ws, 'aml-cluster')

In [63]:
# Create a project directory
project_folder = './diabetes_hyperdrive'
os.makedirs(project_folder, exist_ok=True)

In [64]:
# Experiment folder
experiment_folder = './' + project_folder

### Prepare training script

In [65]:
%%writefile $experiment_folder/diabetes_training.py

import argparse
from azureml.core import Workspace, Dataset, Experiment, Run
import pandas as pd
import numpy as np
import joblib
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
import os  # needed for os.makedirs below
print("libraries imported...")

# Get the experiment run context
run = Run.get_context()
print("run context loaded...")

# Set regularization hyperparameter (passed as an argument to the script)
parser = argparse.ArgumentParser()
parser.add_argument('--regularization', type=float, dest='reg_rate', default=0.01, help='regularization rate')
parser.add_argument('--C', type=float, default=1.0, help='Inverse of regularization strength')
parser.add_argument('--solver', type=str, default='lbfgs', help='Algorithm to use in the optimization problem')
args = parser.parse_args()
reg = args.reg_rate  # logged only; not used by the model below
run.log('Inverse of regularization strength', float(args.C))
run.log('Algorithm to use in the optimization problem', args.solver)
print("argparse parameters loaded...")

# load the diabetes dataset (File method)
# Get the training data from the estimator input identified as 'diabetes'
mount = run.input_datasets['diabetes'] # read-only mount of the data lake folder at '/mnt/data'
print("data lake mounted...")
diabetes = pd.read_parquet('/mnt/data/diabetes.parquet') # any file in the mounted folder can be read this way
print("dataset loaded...")

# save a copy of the data into the run's outputs folder (create the folder first)
os.makedirs('outputs', exist_ok=True)
diabetes.to_csv("outputs/dataset.csv", index=False) # {logs/  outputs/}
print("test: write dataset to run outputs 'outputs/dataset.csv'")

# Separate features and labels
X, y = diabetes[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, diabetes['Diabetic'].values

# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Train a logistic regression model with the sampled C and solver
print('Training a logistic regression model with C =', args.C, 'and solver =', args.solver)
run.log('Regularization Rate', float(reg))
model = LogisticRegression(C=args.C, solver=args.solver).fit(X_train, y_train)

# calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
print('Accuracy:', acc)
run.log('Accuracy', float(acc))

# calculate AUC
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test, y_scores[:,1])
print('AUC: ' + str(auc))
run.log('AUC', float(auc))

os.makedirs('outputs', exist_ok=True)
# note: files saved in the outputs folder are uploaded automatically to the experiment run record
joblib.dump(value=model, filename='outputs/diabetes_model.pkl')

run.complete()

Writing ././diabetes_hyperdrive/diabetes_training.py
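The first training script builds the model with `C=1/reg`, whereas this script passes `--C` straight to `LogisticRegression` and only logs `--regularization`. A quick arithmetic check on the value lists used in the HyperDrive search space below shows that the two lists are (rounded) reciprocals of each other, so in this script `--regularization` changes a logged value but not the fitted model.

```python
# Value lists copied from the HyperDrive search space in this section
reg_choices = [1, 0.333, 0.1, 0.033]
c_choices = [1, 3, 10, 30]

# The first training script used C = 1 / reg; apply that mapping to each regularization rate
implied_c = [round(1 / reg) for reg in reg_choices]
print(implied_c)               # [1, 3, 10, 30]
print(implied_c == c_choices)  # True
```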


In [12]:
# Create the experiment
experiment = Experiment(ws, 'diabetes-hyperdrive-training')

In [43]:
# Create an estimator for the training script (generic Estimator class)

# get the training compute cluster
training_cluster = ComputeTarget(ws, 'aml-cluster')

# Set the script parameters
script_params = {
    '--regularization': 0.1,
    '--C': 10,
    '--solver': 'lbfgs',
}

# Get the docker environment
training_env = Environment.get(ws, 'training_environment')

# get the registered dataset by name
file_ds = Dataset.get_by_name(ws, "diabetes file dataset")

estimator = Estimator(source_directory=experiment_folder, # All the files in this directory are uploaded into the cluster nodes for execution
                      compute_target=training_cluster, # HyperDrive needs a remote compute target; 'local' is not supported
                      entry_script='diabetes_training.py',
                      script_params=script_params,
                      environment_definition=training_env,
                      inputs=[file_ds.as_named_input('diabetes').as_mount(path_on_compute='/mnt/data')],
                     )

In [None]:
# define the hyperparameter space

param_sampling = RandomParameterSampling( {
    '--regularization': choice(1, 0.333, 0.1, 0.033),
    '--C': choice(1, 3, 10, 30),
    '--solver': choice('lbfgs', 'liblinear', 'newton-cg', 'sag'),
    } )

hyperdrive_run_config = HyperDriveConfig(estimator=estimator,
                                         hyperparameter_sampling=param_sampling,
                                         primary_metric_name='Accuracy',
                                         primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                         max_total_runs=20,
                                         max_concurrent_runs=5,
                                        )
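`RandomParameterSampling` over `choice()` lists can be pictured as drawing each hyperparameter value independently and uniformly at random for every child run. The sketch below uses the distinct values from the cell above, Python's `random` module, and a fixed seed; it is a simplification for intuition, not HyperDrive's sampler. The full grid has 4 × 4 × 4 = 64 combinations, of which `max_total_runs=20` child runs can try at most 20. A value listed twice in a `choice()` would be drawn twice as often by this style of sampler.

```python
import math
import random

# Distinct values from the choice(...) lists in the HyperDrive cell above
search_space = {
    '--regularization': [1, 0.333, 0.1, 0.033],
    '--C': [1, 3, 10, 30],
    '--solver': ['lbfgs', 'liblinear', 'newton-cg', 'sag'],
}


def sample_configs(space, n, seed=0):
    """Draw n configurations, picking each parameter value independently and uniformly.

    Draws are independent, so a configuration can repeat in this sketch.
    """
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()}
            for _ in range(n)]


grid_size = math.prod(len(values) for values in search_space.values())
configs = sample_configs(search_space, n=20)  # mirrors max_total_runs=20
print(grid_size)  # 64
print(configs[0])
```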

In [None]:
# start the HyperDrive experiment run (takes about 25 minutes)
hyperdrive_run = experiment.submit(config=hyperdrive_run_config)

In [37]:
# Show the run details while running
RunDetails(hyperdrive_run).show()


In [52]:
# Find best run
best_run = hyperdrive_run.get_best_run_by_primary_metric()
print(best_run.get_details()['runDefinition']['arguments'])

['--C', '3', '--regularization', '0.033', '--solver', 'liblinear']
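The best run's arguments come back as a flat list of flag/value strings. To reuse them, for example to refit the model locally, a small sketch can turn that list into a dictionary with numeric values converted to `float`. The helper name `args_to_dict` is hypothetical.

```python
def args_to_dict(arguments):
    """Turn ['--flag', 'value', ...] into {'flag': value}, converting numeric strings to float."""
    if len(arguments) % 2:
        raise ValueError("expected flag/value pairs")
    parsed = {}
    for flag, value in zip(arguments[::2], arguments[1::2]):
        key = flag.lstrip('-')
        try:
            parsed[key] = float(value)
        except ValueError:
            parsed[key] = value  # non-numeric values such as solver names stay strings
    return parsed


# the argument list printed above
best_args = args_to_dict(['--C', '3', '--regularization', '0.033', '--solver', 'liblinear'])
print(best_args)  # {'C': 3.0, 'regularization': 0.033, 'solver': 'liblinear'}
```

The result can feed scikit-learn directly, e.g. `LogisticRegression(C=best_args['C'], solver=best_args['solver'])`.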
