# AutoML on remote AML Compute (Porto Seguro's Safe Driving Prediction)

This notebook is refactored from the original AutoML local training notebook to run AutoML on a remote AML compute cluster.
It also uses AML Datasets for training instead of pandas DataFrames.

## Utility methods

Helper methods to track the job, download logs, and so on.

In [136]:
# utility methods

# Currently, there's no SDK v2 equivalent of v1's 'show_output' or 'wait_for_completion' functionality, 
# that prints the AutoML iteration info

def show_output(client, job) -> None:
    # This doesn't appear to stream anything at the moment
    # Use the `job` argument rather than the notebook-level `created_job` variable
    client.jobs.stream(job.name)


def wait_for_completion(client, job, poll_duration: int = 30) -> None:    
    """Poll for job status every `poll_duration` seconds, until it is terminated"""
    import time
    from azure.ml._operations.run_history_constants import RunHistoryConstants

    cur_status = client.jobs.get(job.name).status
    print("Current job status: ", cur_status)
    while cur_status not in RunHistoryConstants.TERMINAL_STATUSES:
        time.sleep(poll_duration)
        cur_status = client.jobs.get(job.name).status
        print("Current job status: ", cur_status)


def download_outputs(client, job) -> None:
    # This does not download any logs (no models as well, since this is at the parent run level)
    client.jobs.download(job.name, download_path="./outputs")

    # For the child run level, currently this throws an exception saying it's not supported for the job type
    try:
        first_child_run = "{}_0".format(job.name)
        client.jobs.download(first_child_run, download_path="./outputs/")
    except Exception as e:
        import traceback

        print(str(e))
        traceback.print_exc()


def print_studio_url(job, open_in_new_tab: bool = False) -> None:
    # TODO: Any easier way to get the URL?
    
    print("Studio URL: ", job.interaction_endpoints['Studio'].endpoint)
    if open_in_new_tab:
        import webbrowser
        webbrowser.open(job.interaction_endpoints['Studio'].endpoint)
        

def download_outputs_via_mlflow_client(mlflow_client, run_id, path) -> str:
    """Download `path` (file or dir) from the run artifacts and return the local download path"""
    import os

    local_dir = "/tmp/artifact_downloads/{}".format(run_id)
    local_path = os.path.join(local_dir, path)
    if os.path.exists(local_path):
        print("Path {} already exists. Skipping download.".format(local_path))
    else:
        # Download outputs, creating the target directory (and parents) if needed
        os.makedirs(local_dir, exist_ok=True)

        local_path = mlflow_client.download_artifacts(run_id, path, local_dir)
        print("Artifacts downloaded to: {}".format(local_path))
        if os.path.isdir(local_path):
            print("Artifacts: {}".format(os.listdir(local_path)))
    return local_path
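
The `wait_for_completion` helper above polls without an upper bound, so a stuck job would block the notebook indefinitely. A minimal sketch of the same loop with a timeout follows. The `get_status` callable and the terminal status names are assumptions made for illustration; the SDK's own list lives in `RunHistoryConstants.TERMINAL_STATUSES` and may differ.

```python
import time
from typing import Callable, Iterable

# Assumed terminal status names (hypothetical); the SDK's own list is
# RunHistoryConstants.TERMINAL_STATUSES and may contain other values.
ASSUMED_TERMINAL_STATUSES = frozenset({"Completed", "Failed", "Canceled"})


def wait_with_timeout(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    terminal_statuses: Iterable[str] = ASSUMED_TERMINAL_STATUSES,
) -> str:
    """Poll `get_status` until a terminal status is seen or the timeout expires."""
    terminal = set(terminal_statuses)
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        print("Current job status: ", status)
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "Job did not finish within {} seconds".format(timeout_seconds)
            )
        time.sleep(poll_seconds)


# Stand-in status source so the sketch runs without any Azure ML calls
fake_statuses = iter(["NotStarted", "Running", "Completed"])
final_status = wait_with_timeout(lambda: next(fake_statuses), poll_seconds=0)
```

With the notebook's objects, the callable could be `lambda: client.jobs.get(job.name).status`, which is the same call `wait_for_completion` already makes.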

## Import Needed Packages

Import the general packages needed for this notebook. These are the SDK v2 packages used to create compute, upload datasets, submit jobs, and so on.

In [2]:
# Global imports
from azure.ml import MLClient
from azure.core.exceptions import ResourceExistsError

from azure.ml.entities.workspace.workspace import Workspace
from azure.ml.entities.compute.compute import Compute
from azure.ml.entities.assets import Data

## Check Azure ML SDK version

In [3]:
import azure.ml
print("You are currently using version", azure.ml, "of the Azure ML SDK")

You are currently using version <module 'azure.ml' from '/home/schrodinger/automl/sdk-cli-v2/src/azure-ml/azure/ml/__init__.py'> of the Azure ML SDK


## Initialize MLClient

The MLClient is the interface to AzureML services: it is used to submit jobs, create compute, upload data, and so on.
The resource group must already exist at this point.

In [4]:
subscription_id = '381b38e9-9840-4719-a5a0-61d9585e1e91'
resource_group_name = 'gasi_rg_neu'

client = MLClient(subscription_id, resource_group_name) # default_workspace_name=workspace)

## Initialize Workspace

Also set this workspace as the default for submitting ML jobs.

In [5]:
workspace_name = 'gasi_ws_neu'
workspace = Workspace(name=workspace_name)

try:
    client.workspaces.create(workspace)
except ResourceExistsError as re:
    print(re)
    
client.default_workspace_name = workspace_name

Workspace with name gasi_ws_neu already exists.


## Initialize MLFlow

In [6]:
# Set the tracking URI to AzureML and (optionally) change the active experiment

import mlflow

##### NOTE: This is SDK v1 API #####
# TODO: How do we get this from MLClient? Tracking URI can't be obtained from v2 Workspace object
from azureml.core import Workspace as WorkspaceV1
ws = WorkspaceV1(workspace_name=workspace_name, resource_group=resource_group_name, subscription_id=subscription_id)
####################################

mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())

# Set the active experiment, creating one if it doesn't exist
# mlflow.set_experiment(experiment_name)

# Get Experiment Details
# experiment = mlflow.get_experiment_by_name(experiment_name)
# print("Experiment_id: {}".format(experiment.experiment_id))
# print("Artifact Location: {}".format(experiment.artifact_location))
# print("Tags: {}".format(experiment.tags))
# print("Lifecycle_stage: {}".format(experiment.lifecycle_stage))

print("\nRegistry URI:         {}".format(mlflow.get_registry_uri()))
print("\nCurrent tracking uri: {}".format(mlflow.get_tracking_uri()))

Failure while loading azureml_run_type_providers. Failed to load entrypoint hyperdrive = azureml.train.hyperdrive:HyperDriveRun._from_run_dto with exception (azureml-train-restclients-hyperdrive 0.1.0.0 (/home/schrodinger/automl/AzureMlCli/src/azureml-train-restclients-hyperdrive), Requirement.parse('azureml-train-restclients-hyperdrive~=1.27.0')).
Failure while loading azureml_run_type_providers. Failed to load entrypoint automl = azureml.train.automl.run:AutoMLRun._from_run_dto with exception (azureml-dataset-runtime 0.1.0.0 (/home/schrodinger/automl/AzureMlCli/src/azureml-dataset-runtime), Requirement.parse('azureml-dataset-runtime~=1.27.0')).
If you run your code in unattended mode, i.e., where you can't give a user input, then we recommend to use ServicePrincipalAuthentication or MsiAuthentication.
Please refer to aka.ms/aml-notebook-auth for different authentication mechanisms in azureml-sdk.



Registry URI:         azureml://northeurope.experiments.azureml.net/mlflow/v1.0/subscriptions/381b38e9-9840-4719-a5a0-61d9585e1e91/resourceGroups/gasi_rg_neu/providers/Microsoft.MachineLearningServices/workspaces/gasi_ws_neu?

Current tracking uri: azureml://northeurope.experiments.azureml.net/mlflow/v1.0/subscriptions/381b38e9-9840-4719-a5a0-61d9585e1e91/resourceGroups/gasi_rg_neu/providers/Microsoft.MachineLearningServices/workspaces/gasi_ws_neu?
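
The TODO in the cell above asks how to get the tracking URI without the v1 `Workspace` object. As a rough illustration only, the URI printed above appears to be composed of the region, subscription, resource group, and workspace name. The sketch below rebuilds that string from those parts. The helper name `build_tracking_uri` is hypothetical, and the format is inferred solely from the printed output, not from a documented contract, so it may not hold for other regions or service versions.

```python
def build_tracking_uri(region, subscription_id, resource_group, workspace_name):
    """Rebuild the tracking URI in the shape printed by this notebook (assumed format)."""
    return (
        "azureml://{region}.experiments.azureml.net/mlflow/v1.0"
        "/subscriptions/{sub}/resourceGroups/{rg}"
        "/providers/Microsoft.MachineLearningServices/workspaces/{ws}?"
    ).format(region=region, sub=subscription_id, rg=resource_group, ws=workspace_name)


# Values taken from the cells and output above
uri = build_tracking_uri(
    region="northeurope",
    subscription_id="381b38e9-9840-4719-a5a0-61d9585e1e91",
    resource_group="gasi_rg_neu",
    workspace_name="gasi_ws_neu",
)
print(uri)
```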


### (Optional) Submit dataset file into DataStore (Azure Blob under the covers)

If there's a CSV file locally, upload it to the datastore.

Note that this is currently going to upload the dataset as a File Dataset, which is incompatible with AutoML. As such, ensure that a TabularDataset is registered with this name outside of this notebook.

In [7]:
# Upload dataset

dataset_name = "porto_seguro_safe_driver_prediction_trimmed"
dataset_version = 1

training_data = Data(name=dataset_name, version=dataset_version, local_path="./porto_data")

try:
    data = client.data.create_or_update(training_data)
    print("Uploaded to path  : ", data.path)
    print("Datastore location: ", data.datastore)
except Exception as e:
    print("Could not create dataset. ", str(e))

training_data

Could not create dataset.  (UserError) A data version with this name and version already exists. If you are trying to create a new data version, use a different name or version. If you are trying to update an existing data version, the existing asset's Path cannot be changed. Only tags and description can be updated.
Additional Information:Type: ComponentName
Info: "managementfrontend"Type: Correlation
Info: {
    "operation": "5cf159221529b446b761ce5f9408ca51",
    "request": "e60f0ca38ac11f41"
}Type: Environment
Info: "northeurope"Type: Location
Info: "northeurope"Type: Time
Info: "2021-05-18T00:25:40.7241096+00:00"Type: DebugInfo
Info: {
    "type": "Microsoft.MachineLearning.Common.Core.Exceptions.BaseException",
    "message": "A data version with this name and version already exists. If you are trying to create a new data version, use a different name or version. If you are trying to update an existing data version, the existing asset's Path cannot be changed. Only tags and descr

Data({'is_anonymous': False, 'name': 'porto_seguro_safe_driver_prediction_trimmed', 'id': None, 'description': None, 'tags': {}, 'properties': {}, 'base_path': './', 'creation_context': None, 'version': 1, 'datastore': '/subscriptions/381b38e9-9840-4719-a5a0-61d9585e1e91/resourceGroups/gasi_rg_neu/providers/Microsoft.MachineLearningServices/workspaces/gasi_ws_neu/datastores/workspaceblobstore', 'path': 'az-ml-artifacts/176b8ab41ee91bbc7c2efbfe86483578/porto_data', 'local_path': PosixPath('/home/schrodinger/automl/Easy-AutoML-MLOps/notebooks/3-automl-remote-compute-run/porto_data')})

In [8]:
assert "{}:{}".format(dataset_name, dataset_version) == "{}:{}".format(training_data.name, training_data.version)

## Load data into Azure ML Dataset and Register into Workspace

Tabular Datasets are currently not supported with SDK v2, so the v1 registration code below is kept commented out for reference.

In [9]:
# Try to load the dataset from the Workspace

dataset = client.data.get(name=dataset_name, version=dataset_version)


# found = False
# aml_dataset_name = "porto_seguro_safe_driver_prediction_train"

# if aml_dataset_name in ws.datasets.keys(): 
#        found = True
#        aml_dataset = ws.datasets[aml_dataset_name] 
#        print("Dataset loaded from the Workspace")
       
# if not found:
#         # Create AML Dataset and register it into Workspace
#         print("Dataset does not exist in the current Workspace. It will be imported and registered.")
        
#         # Option A: Create AML Dataset from file in AML DataStore
#         # datastore = ws.get_default_datastore()
#         # aml_dataset = Dataset.Tabular.from_delimited_files(path=datastore.path('Datasets/porto_seguro_safe_driver_prediction/porto_seguro_safe_driver_prediction_train.csv'))
#         # data_origin_type = 'AMLDataStore'
        
#         # Option B: Create AML Dataset from file in HTTP URL
#         data_url = 'https://azmlworkshopdata.blob.core.windows.net/safedriverdata/porto_seguro_safe_driver_prediction_train.csv'
#         aml_dataset = Dataset.Tabular.from_delimited_files(data_url)  
#         data_origin_type = 'HttpUrl'
        
#         print(aml_dataset)
                
#         #Register Dataset in Workspace
#         registration_method = 'SDK'  # or 'UI'
#         aml_dataset = aml_dataset.register(workspace=ws,
#                                            name=aml_dataset_name,
#                                            description='Porto Seguro Safe Driver Prediction Train dataset file',
#                                            tags={'Registration-Method': registration_method, 'Data-Origin-Type': data_origin_type},
#                                            create_new_version=True)
        
#         print("Dataset created from file and registered in the Workspace")

dataset

Data({'is_anonymous': False, 'name': 'porto_seguro_safe_driver_prediction_trimmed', 'id': '/subscriptions/381b38e9-9840-4719-a5a0-61d9585e1e91/resourceGroups/gasi_rg_neu/providers/Microsoft.MachineLearningServices/workspaces/gasi_ws_neu/data/porto_seguro_safe_driver_prediction_trimmed/versions/1', 'description': None, 'tags': {}, 'properties': {}, 'base_path': './', 'creation_context': <azure.ml._restclient._2021_03_01_preview.machinelearningservices.models._models_py3.SystemData object at 0x7f1400f22b10>, 'version': 1, 'datastore': '/subscriptions/381b38e9-9840-4719-a5a0-61d9585e1e91/resourceGroups/gasi_rg_neu/providers/Microsoft.MachineLearningServices/workspaces/gasi_ws_neu/datastores/workspaceblobstore', 'path': 'UI/05-13-2021_081406_UTC/porto_seguro_safe_driver_prediction_trimmed.csv', 'local_path': None})

In [10]:
# # Use a pandas DataFrame just to take a sneak peek at some data and the schema
# data_df = aml_dataset.to_pandas_dataframe()
# print(data_df.shape)
# print(data_df.describe())
# data_df.head(5)

## Split Data into Train and Test AML Tabular Datasets

For remote AML training you need to use AML Datasets; you cannot submit pandas DataFrames to remote runs of AutoMLConfig.

Note that AutoMLConfig below does not use the Test dataset. You provide a single dataset, which is split internally into train and validation datasets or used with cross-validation, depending on its size. The boundary is 20k rows: cross-validation is used for datasets with fewer than 20k rows. The user can also choose the approach explicitly.

The Test dataset will be used at the end of the notebook to manually calculate the quality metrics with a dataset not seen by AutoML training.
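
The size-based rule described above can be sketched as a small function. This is only an illustration of the 20k-row boundary stated in the text; the function name and return labels are hypothetical, and the actual AutoML logic may take other settings into account.

```python
ROW_THRESHOLD = 20_000  # boundary stated in the text above


def choose_validation_strategy(n_rows: int) -> str:
    """Return the default validation approach for a dataset with `n_rows` rows."""
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    # Fewer than 20k rows: cross-validation; otherwise a train/validation split
    if n_rows < ROW_THRESHOLD:
        return "cross_validation"
    return "train_validation_split"


print(choose_validation_strategy(5_000))
print(choose_validation_strategy(50_000))
```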

In [11]:
# # Split in train/test datasets (Test=10%, Train=90%)

# train_dataset, test_dataset = aml_dataset.random_split(0.9, seed=0)

# # Use Pandas DF only to check the data
# train_df = train_dataset.to_pandas_dataframe()
# test_df = test_dataset.to_pandas_dataframe()

In [12]:
# print(train_df.shape)
# print(test_df.shape)

# train_df.describe()

In [13]:
# train_df.head(5)

## Connect to Remote AML Compute (Existing AML cluster)

Note that this step currently fails with a deserialization error in the SDK. Ensure that a compute cluster exists by creating it outside of this notebook.

In [14]:
# Set or create compute

cpu_cluster_name = "cpucluster"
compute = Compute("amlcompute",
                  name=cpu_cluster_name, size="STANDARD_D13_V2",
                  min_instances=0, max_instances=3,
                  idle_time_before_scale_down=120)

# Load directly from YAML file
# compute = Compute.load("./compute.yaml")

try:
    # TODO: This currently results in an exception in Azure ML, please create compute manually.
    client.compute.create(compute)
except ResourceExistsError as re:
    print(re)
except Exception as e:
    import traceback
    
    print("Could not create compute.", str(e))
    traceback.print_exc()

Could not create compute. Cannot deserialize duration object., ISO8601Error: Unable to parse duration string ''


Traceback (most recent call last):
  File "/home/schrodinger/anaconda3/envs/devmar/lib/python3.7/site-packages/msrest/serialization.py", line 1872, in deserialize_duration
    duration = isodate.parse_duration(attr)
  File "/home/schrodinger/anaconda3/envs/devmar/lib/python3.7/site-packages/isodate/isoduration.py", line 104, in parse_duration
    raise ISO8601Error("Unable to parse duration string %r" % datestring)
isodate.isoerror.ISO8601Error: Unable to parse duration string ''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<ipython-input-14-5f5576546e55>", line 14, in <module>
    client.compute.create(compute)
  File "/home/schrodinger/automl/sdk-cli-v2/src/azure-ml/azure/ml/_operations/compute_operations.py", line 98, in create
    polling=not no_wait,
  File "/home/schrodinger/automl/sdk-cli-v2/src/azure-ml/azure/ml/_restclient/_2021_03_01_preview/machinelearningservices/operations/_machine_learning_compute_operati
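
The error above is raised while parsing an empty string as an ISO 8601 duration. The truncated traceback does not show which field was empty, so the following is not a fix for the SDK. It is only a sketch of what a well-formed duration string looks like, using the `idle_time_before_scale_down=120` value from the cell above and assuming that value is in seconds. The helper name is hypothetical.

```python
def seconds_to_iso8601_duration(total_seconds: int) -> str:
    """Format a non-negative number of seconds as an ISO 8601 duration (hours, minutes, seconds)."""
    if total_seconds < 0:
        raise ValueError("total_seconds must be non-negative")
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    parts = ["PT"]
    if hours:
        parts.append("{}H".format(hours))
    if minutes:
        parts.append("{}M".format(minutes))
    # Always emit a seconds component when nothing else was emitted
    if seconds or len(parts) == 1:
        parts.append("{}S".format(seconds))
    return "".join(parts)


print(seconds_to_iso8601_duration(120))
```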

In [15]:
# For additional details of current AmlCompute status:
# aml_remote_compute.get_status()

## Train with Azure AutoML automatically searching for the 'best model' (Best algorithms and best hyper-parameters)

### List and select primary metric to drive the AutoML classification problem

In [16]:
# from azureml.train import automl

# # List of possible primary metrics is here:
# # https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-auto-train#primary-metric
    
# # Get a list of valid metrics for your given task
# automl.utilities.get_primary_metrics('classification')

## Define AutoML Experiment settings

In [17]:
from azure.ml._restclient._2020_09_01_preview.machinelearningservices.models import GeneralSettings, LimitSettings, DataSettings, TrainingDataSettings, ValidationDataSettings, TrainingSettings
from azure.ml._restclient._2020_09_01_preview.machinelearningservices.models._azure_machine_learning_workspaces_enums import TaskType, OptimizationMetric
from azure.ml._schema.compute_binding import InternalComputeConfiguration
from azure.ml.entities import AutoMLJob
from azure.ml.entities.job.automl.featurization import FeaturizationSettings

# TODO: Add the following
# blocked_models = ['LogisticRegression', 'ExtremeRandomTrees', 'RandomForest'], 
# allowed_models = ['LightGBM'],
# enable_voting_ensemble = True,
# enable_stack_ensemble = False,
# enable_early_stopping= True,
# experiment_timeout_hours=3,                           
# debug_log='automated_ml_errors.log',
# verbosity= logging.DEBUG,
# enable_onnx_compatible_models=True,

compute = InternalComputeConfiguration(target=cpu_cluster_name)

general_settings = GeneralSettings(task_type=TaskType.CLASSIFICATION,
                                   primary_metric= OptimizationMetric.AUC_WEIGHTED,
                                   enable_model_explainability=True)

# TODO: Seems like a bug here, max_trials=3 + max_concurrent_trials=4 seems to only trigger one child run
limit_settings = LimitSettings(job_timeout=60,
                               max_trials=4,
                               max_concurrent_trials=4,
                               enable_early_termination=False)

training_data_settings = TrainingDataSettings(dataset_arm_id="{}:{}".format(training_data.name, training_data.version),
                                              target_column_name="target")
validation_data_settings = ValidationDataSettings(validation_size=0.1)
data_settings = DataSettings(training_data=training_data_settings, validation_data=validation_data_settings)

featurization_settings = FeaturizationSettings(featurization_config="auto")

training_settings = TrainingSettings(enable_dnn_training=False)

extra_automl_settings = {"save_mlflow": True}

automl_job = AutoMLJob(
#     name=job_name,
    compute=compute,
    general_settings=general_settings,
    limit_settings=limit_settings,
    data_settings=data_settings,
    training_settings=training_settings,
    featurization_settings=featurization_settings,
    properties=extra_automl_settings,
)

automl_job

AutoMLJob({'name': '05bb01e9-a479-48ba-a0e3-3bcd868dbcd7', 'id': None, 'description': None, 'tags': {}, 'properties': {'save_mlflow': True}, 'base_path': './', 'type': 'automl_job', 'creation_context': None, 'experiment_name': '3-automl-remote-compute-run', 'status': None, 'interaction_endpoints': None, 'log_files': None, 'output': None, 'general_settings': <azure.ml._restclient._2020_09_01_preview.machinelearningservices.models._models_py3.GeneralSettings object at 0x7f1400f39310>, 'data_settings': <azure.ml._restclient._2020_09_01_preview.machinelearningservices.models._models_py3.DataSettings object at 0x7f140b575e90>, 'limit_settings': <azure.ml._restclient._2020_09_01_preview.machinelearningservices.models._models_py3.LimitSettings object at 0x7f140b592dd0>, 'forecasting_settings': None, 'training_settings': <azure.ml._restclient._2020_09_01_preview.machinelearningservices.models._models_py3.TrainingSettings object at 0x7f140b575fd0>, 'featurization_settings': <azure.ml.entities.j

## Run Experiment (on remote AML Compute) with multiple child runs under the covers

In [18]:
# Submit job
# TODO: There appears to be a bug here (repro: try executing this cell twice)
created_job = client.jobs.create_or_update(automl_job)
created_job

# Dump _all_ info we have about the job 
# created_job._dump_yaml()

AutoMLJob({'name': '05bb01e9-a479-48ba-a0e3-3bcd868dbcd7', 'id': '/subscriptions/381b38e9-9840-4719-a5a0-61d9585e1e91/resourceGroups/gasi_rg_neu/providers/Microsoft.MachineLearningServices/workspaces/gasi_ws_neu/jobs/05bb01e9-a479-48ba-a0e3-3bcd868dbcd7', 'description': None, 'tags': {}, 'properties': {'save_mlflow': 'True'}, 'base_path': './', 'type': 'automl_job', 'creation_context': <azure.ml._restclient._2020_09_01_preview.machinelearningservices.models._models_py3.SystemData object at 0x7f140ce3d3d0>, 'experiment_name': '3-automl-remote-compute-run', 'status': 'NotStarted', 'interaction_endpoints': {'Tracking': <azure.ml._restclient._2020_09_01_preview.machinelearningservices.models._models_py3.JobEndpoint object at 0x7f140ce3d290>, 'Studio': <azure.ml._restclient._2020_09_01_preview.machinelearningservices.models._models_py3.JobEndpoint object at 0x7f140ce3d2d0>}, 'log_files': None, 'output': None, 'general_settings': <azure.ml._restclient._2020_09_01_preview.machinelearningservi

## Explore results with Widget

> Note: This doesn't have any equivalent SDK v2 API right now.

In [19]:
# # Explore the results of automatic training with a Jupyter widget: https://docs.microsoft.com/en-us/python/api/azureml-widgets/azureml.widgets?view=azure-ml-py
# from azureml.widgets import RunDetails
# RunDetails(parent_run).show()

In [20]:
# Wait for the remote parent run to complete

# Print the Studio URL (pass open_in_new_tab=True to also open it in a browser)
print_studio_url(created_job)

# Wait until the job is finished
wait_for_completion(client, created_job)

# Download logs + outputs locally
download_outputs(client, created_job)


Studio URL:  https://ml.azure.com/runs/05bb01e9-a479-48ba-a0e3-3bcd868dbcd7?wsid=/subscriptions/381b38e9-9840-4719-a5a0-61d9585e1e91/resourcegroups/gasi_rg_neu/workspaces/gasi_ws_neu&tid=72f988bf-86f1-41af-91ab-2d7cd011db47
Current job status:  NotStarted
Current job status:  Running
Current job status:  Running
Current job status:  Running
Current job status:  Running
Current job status:  Running
Current job status:  Running
Current job status:  Running
Current job status:  Running
Current job status:  Running
Current job status:  Running
Current job status:  Running
Current job status:  Running
Current job status:  Running
Current job status:  Running
Current job status:  Running
Current job status:  Running
Current job status:  Running
Current job status:  Running
Current job status:  Running
Current job status:  Running
Current job status:  Running
Current job status:  Running
Current job status:  Running
Current job status:  Completed


Downloading the job logs ExperimentRun/dcid.05bb01e9-a479-48ba-a0e3-3bcd868dbcd7/ at ./outputs/05bb01e9-a479-48ba-a0e3-3bcd868dbcd7


Operation returned an invalid status 'A job was found, but it is not supported in this API version and cannot be accessed.'


Traceback (most recent call last):
  File "<ipython-input-1-4fa2dbf6b888>", line 31, in download_outputs
    client.jobs.download(first_child_run, download_path="./outputs/")
  File "/home/schrodinger/automl/sdk-cli-v2/src/azure-ml/azure/ml/_operations/job_operations.py", line 255, in download
    job_details = self.get(name)
  File "/home/schrodinger/automl/sdk-cli-v2/src/azure-ml/azure/ml/_operations/job_operations.py", line 110, in get
    job_object = self._get_job(name)
  File "/home/schrodinger/automl/sdk-cli-v2/src/azure-ml/azure/ml/_operations/job_operations.py", line 294, in _get_job
    **self._kwargs,
  File "/home/schrodinger/automl/sdk-cli-v2/src/azure-ml/azure/ml/_restclient/_2020_09_01_preview/machinelearningservices/operations/_jobs_operations.py", line 196, in get
    raise HttpResponseError(response=response, model=error, error_format=ARMErrorFormat)
azure.core.exceptions.HttpResponseError: Operation returned an invalid status 'A job was found, but it is not supported

### Measure Parent Run Time needed for the whole AutoML process 

> Note: TODO - start time and end time are available on `run.info` (`start_time`, `end_time`).

In [21]:
# import time
# from datetime import datetime

# run_details = parent_run.get_details()

# # Like: 2020-01-12T23:11:56.292703Z
# end_time_utc_str = run_details['endTimeUtc'].split(".")[0]
# start_time_utc_str = run_details['startTimeUtc'].split(".")[0]
# timestamp_end = time.mktime(datetime.strptime(end_time_utc_str, "%Y-%m-%dT%H:%M:%S").timetuple())
# timestamp_start = time.mktime(datetime.strptime(start_time_utc_str, "%Y-%m-%dT%H:%M:%S").timetuple())

# parent_run_time = timestamp_end - timestamp_start
# print('Run Timing: --- %s minutes needed for running the whole Remote AutoML Experiment ---' % (parent_run_time/60))
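
As the note above says, the MLflow run info carries `start_time` and `end_time`. MLflow reports both as milliseconds since the Unix epoch, so the elapsed time can be computed directly. A minimal sketch follows; the helper name is hypothetical and the sample timestamps are illustrative.

```python
def run_duration_minutes(start_time_ms: int, end_time_ms: int) -> float:
    """Return the elapsed time in minutes between two epoch-millisecond timestamps."""
    if end_time_ms < start_time_ms:
        raise ValueError("end_time_ms must not be earlier than start_time_ms")
    return (end_time_ms - start_time_ms) / 1000.0 / 60.0


# Illustrative epoch-millisecond timestamps
minutes = run_duration_minutes(1621298301196, 1621298404352)
print("Run Timing: --- %.2f minutes ---" % minutes)
```

With an `MlflowClient` instance, the inputs would come from `mlflow_client.get_run(run_id).info.start_time` and `.end_time`; `end_time` is only populated once the run has finished.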

### Creating ModelProxy for submitting prediction runs to the training environment.
We will create a ModelProxy for the best child run, which will allow us to submit a run that does the prediction in the training environment. Unlike the local client, which can have different versions of some libraries, the training environment will have all the compatible libraries for the model already.


> **Note:**<br/>
  This is currently not possible on SDK V2, because the ModelProxy object expects an SDK v1 Run object.
<br/>**Gaps**: <br/> 1. Make ModelProxy independent of v1 concepts (e.g. Run, Environment), accepting the individual properties that are needed for a ModelProxy job. Alternatively, create a ModelProxyJob or a similar construct. <br/> 2. Child runs are currently not submitted via MFE, so there is no way today to load the underlying child runs as a v2 'Job' and access related information (such as its environment or outputs).

In [22]:
# ######################### Prepare Data ############################
# y_test = test_dataset.keep_columns('target')
# test_data_no_label = test_dataset.drop_columns('target')

# test_data_no_label_df = test_data_no_label.to_pandas_dataframe()
# print(test_data_no_label_df.shape)
# ####################################################################


# ################## Model Proxy Run ########################
# from azureml.train.automl.model_proxy import ModelProxy
# best_run = parent_run.get_best_child()
# # best_run = parent_run.get_best_child(metric = "accuracy")

# best_model_proxy = ModelProxy(best_run, aml_remote_compute)
# y_pred_test = best_model_proxy.predict(test_data_no_label)

# y_pred_test
# ############################################################

### Show hyperparameters
Show the model pipeline used for the best run with its hyperparameters.

> **Note:** <br/>This isn't possible with SDK V2, because much of the Run DTO information is lost in the v2 equivalent (Job).  <br/>**Gaps:** <br/> 1. Child runs are currently not submitted via MFE, so there is no way today to load the underlying child runs as a v2 'Job' and access related information (such as its environment or outputs). <br/> 2. Decide whether these methods need to be on the Run object, or should be ported to the model object and accessed from there (e.g. `best_automl_job.load_model().print_pipeline()`)

## Retrieve the 'Best' Model

Use `MlflowClient` to get the best child run.

In [24]:
from mlflow.tracking import MlflowClient

# TODO: Use this run, as it has MLFlow model stored on the run - AutoML_3454b06e-2e3e-4e3e-a8e0-f52f50f9f358

mlflow_client = MlflowClient()
mlflow_parent_run = mlflow_client.get_run(created_job.name)

best_child_run_id = mlflow_parent_run.data.tags["automl_best_child_run_id"]
print("Found best child run id: ", best_child_run_id)

mlflow_best_child_run = mlflow_client.get_run(best_child_run_id)
mlflow_best_child_run

Found best child run id:  05bb01e9-a479-48ba-a0e3-3bcd868dbcd7_1


<Run: data=<RunData: metrics={'AUC_macro': 0.617244424629982,
 'AUC_micro': 0.9716681799999999,
 'AUC_weighted': 0.6172446874741211,
 'accuracy': 0.9634,
 'average_precision_score_macro': 0.5141399710497053,
 'average_precision_score_micro': 0.9667662768482096,
 'average_precision_score_weighted': 0.9419044919340782,
 'balanced_accuracy': 0.5,
 'f1_score_macro': 0.49067943363553024,
 'f1_score_micro': 0.9634,
 'f1_score_weighted': 0.9454411327289397,
 'log_loss': 0.15478797349812543,
 'matthews_correlation': 0.0,
 'norm_macro_recall': 0.0,
 'precision_score_macro': 0.4817,
 'precision_score_micro': 0.9634,
 'precision_score_weighted': 0.92813956,
 'recall_score_macro': 0.5,
 'recall_score_micro': 0.9634,
 'recall_score_weighted': 0.9634,
 'weighted_accuracy': 0.99855880571045}, params={}, tags={'_aml_system_ComputeTargetStatus': '{"AllocationState":"steady","PreparingNodeCount":0,"RunningNodeCount":1,"CurrentNodeCount":1}',
 '_aml_system_automl_is_child_run_end_telemetry_event_logged':

### Show hyperparameters
Show the model pipeline used for the best run with its hyperparameters.

In [26]:
!pip install xgboost==0.90

Collecting xgboost==0.90
  Downloading xgboost-0.90-py2.py3-none-manylinux1_x86_64.whl (142.8 MB)
Installing collected packages: xgboost
Successfully installed xgboost-0.90


In [27]:
# Local predictions using python function flavor

import mlflow.pyfunc

try:
    fitted_model = mlflow.pyfunc.load_model("runs:/{}/outputs".format(mlflow_best_child_run.info.run_id))
except Exception as e:
    # TODO: This is probably due to a bug, where MLFlow models are not being generated despite 'save_mlflow'
    print(str(e))
    print("Failed to load MLFlow model, downloading artifacts manually and loading the model.")
    
    import os, pickle

    local_dir = "/tmp/artifact_downloads/{}".format(mlflow_best_child_run.info.run_id)
    # makedirs also creates the parent directory if it is missing
    os.makedirs(local_dir, exist_ok=True)

    local_path = mlflow_client.download_artifacts(mlflow_best_child_run.info.run_id, "outputs", local_dir)
    print("Artifacts downloaded in: {}".format(local_path))
    print("Artifacts: {}".format(os.listdir(local_path)))
    
    pickled_model_path = "{}/model.pkl".format(local_path)
    with open(pickled_model_path, "rb") as model_file:
        fitted_model = pickle.load(model_file)

print(fitted_model)

[Errno 2] No such file or directory: '/tmp/tmp1zc5q0_i/outputs/MLmodel'
Failed to load MLFlow model, downloading artifacts manually and loading the model.
Artifacts downloaded in: /tmp/artifact_downloads/05bb01e9-a479-48ba-a0e3-3bcd868dbcd7_1/outputs
Artifacts: ['pipeline_graph.json', 'env_dependencies.json', 'scoring_file_v_1_0_0.py', 'model.pkl', 'conda_env_v_1_0_0.yml']
Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=False, enable_feature_sweeping=False, feature_sweeping_config={}, feature_sweeping_timeout=86400, featurization_config=None, force_text_dnn=False, is_cross_validation=False, is_onnx_compatible=False, observer=None, task='classification', working_dir='/home/schrodinger/automl/Easy-AutoML-MLOps/notebooks/3-automl-remote-compute-run')),
                ('MaxAbsScaler', MaxAbsScaler(copy=True)),
                ('XGBoostClassifier',
                 XGBoostClassifier(n_jobs=-1, problem_info=None, random_state=0))],
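
The fallback in the cell above relies on an exception to discover that no `MLmodel` file was written. An alternative, sketched here, is to inspect the downloaded artifact directory first and pick the loader based on which files are present. The function name and return labels are hypothetical; the file names `MLmodel` and `model.pkl` are the ones that appear in the output above.

```python
import os


def choose_model_loader(artifact_dir: str) -> str:
    """Return "mlflow" if an MLmodel file exists, "pickle" if only model.pkl exists."""
    if os.path.isfile(os.path.join(artifact_dir, "MLmodel")):
        return "mlflow"
    if os.path.isfile(os.path.join(artifact_dir, "model.pkl")):
        return "pickle"
    raise FileNotFoundError(
        "Neither MLmodel nor model.pkl found in {}".format(artifact_dir)
    )
```

For the artifact listing shown above (which contains `model.pkl` but no `MLmodel`), this would select the pickle path.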
     

In [38]:
# Print more information about the winning model
print("Preprocessor: \n{}\n".format(fitted_model.steps[1]))
print("Estimator: \n", fitted_model.steps[2])

Preprocessor: 
('MaxAbsScaler', MaxAbsScaler(copy=True))

Estimator: 
 ('XGBoostClassifier', XGBoostClassifier(
    random_state=0,
    n_jobs=-1,
    problem_info=None
))


#### Retrieve METRICS for All Child Runs
You can also use the MLflow client to fetch each child run and inspect the individual metrics logged for it.

In [45]:
import pandas as pd

all_iter_metrics = dict()

# max_trials = 4, so gather metrics for the 4 child iterations
for i in range(4):
    # Construct child run id
    child_run_id = mlflow_parent_run.info.run_id + "_{}".format(i)
    
    # parse metrics for this iteration
    metrics = mlflow_client.get_run(child_run_id).data.metrics
    
    # index by iteration
    all_iter_metrics[i] = metrics

rundata = pd.DataFrame(all_iter_metrics).sort_index(axis=1)
rundata

Unnamed: 0,0,1,2,3
f1_score_micro,0.96,0.96,0.96,0.96
norm_macro_recall,0.0,0.0,0.0,0.0
AUC_weighted,0.57,0.62,0.62,0.62
precision_score_weighted,0.93,0.93,0.93,0.93
accuracy,0.96,0.96,0.96,0.96
AUC_micro,0.97,0.97,0.97,0.97
AUC_macro,0.57,0.62,0.62,0.62
recall_score_weighted,0.96,0.96,0.96,0.96
matthews_correlation,0.0,0.0,0.0,0.0
recall_score_micro,0.96,0.96,0.96,0.96
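
To pick the strongest iteration from this table programmatically, one option is to take the row for the metric of interest and find the column with the highest value. The sketch below rebuilds a small part of the table by hand (values copied from the output above) and assumes `AUC_weighted` is the metric to rank by. `rundata_sample` is an illustrative name; in the notebook, `rundata` would be used directly.

```python
import pandas as pd

# Metric values copied from the table above: rows are metrics, columns are iterations.
rundata_sample = pd.DataFrame(
    {
        0: {"AUC_weighted": 0.57, "accuracy": 0.96},
        1: {"AUC_weighted": 0.62, "accuracy": 0.96},
        2: {"AUC_weighted": 0.62, "accuracy": 0.96},
        3: {"AUC_weighted": 0.62, "accuracy": 0.96},
    }
)

# Assumption: AUC_weighted is the metric to rank by.
metric_name = "AUC_weighted"

# idxmax returns the first column label holding the maximum,
# so ties go to the earliest iteration.
best_iteration = rundata_sample.loc[metric_name].idxmax()
best_value = rundata_sample.loc[metric_name].max()
print("Best iteration by {}: {} ({})".format(metric_name, best_iteration, best_value))
```

Because three iterations tie on this metric, a secondary metric or the run duration could serve as a tie-breaker.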


## Retrieve the Best Model's explanation
Retrieve the explanation from the best run, which includes explanations for both engineered features and raw features. The run that generates explanations for the best model must have completed first.

In [51]:
import time

model_explain_run_id = mlflow_best_child_run.data.tags["model_explain_run_id"]

# Wait for the best model explanation run to reach a terminal state
# (stopping on FAILED or KILLED as well avoids polling forever)
status = ""
while True:
    mlflow_model_explain_run = mlflow_client.get_run(model_explain_run_id)
    status = mlflow_model_explain_run.info.status
    print("Current Status: ", status)
    if status in ("FINISHED", "FAILED", "KILLED"):
        break
    time.sleep(10)

mlflow_model_explain_run

Current Status:  FINISHED


<Run: data=<RunData: metrics={}, params={}, tags={'_aml_system_ComputeTargetStatus': '{"AllocationState":"steady","PreparingNodeCount":0,"RunningNodeCount":1,"CurrentNodeCount":1}',
 'mlflow.parentRunId': '05bb01e9-a479-48ba-a0e3-3bcd868dbcd7_1',
 'mlflow.source.name': 'model_explain.py',
 'mlflow.source.type': 'JOB'}>, info=<RunInfo: artifact_uri='azureml://experiments/3-automl-remote-compute-run/runs/05bb01e9-a479-48ba-a0e3-3bcd868dbcd7_ModelExplain/artifacts', end_time=1621298404352, experiment_id='71434ea8-978c-473a-a449-88f3bfd5bbc4', lifecycle_stage='active', run_id='05bb01e9-a479-48ba-a0e3-3bcd868dbcd7_ModelExplain', run_uuid='05bb01e9-a479-48ba-a0e3-3bcd868dbcd7_ModelExplain', start_time=1621298301196, status='FINISHED', user_id='d0b038bb-162b-4d49-b8fe-5786e199f6fb'>>
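
The polling pattern above can be wrapped in a small reusable helper that also gives up after a timeout. This is only a sketch: `wait_for_terminal_status`, its parameters, and the fake status source are hypothetical names introduced for illustration. The terminal statuses are the MLflow run statuses `FINISHED`, `FAILED` and `KILLED`.

```python
import time

# MLflow run statuses after which the run no longer changes state.
TERMINAL_STATUSES = {"FINISHED", "FAILED", "KILLED"}


def wait_for_terminal_status(get_status, poll_seconds=10, timeout_seconds=1800):
    """Poll `get_status()` until it returns a terminal status or the timeout expires.

    `get_status` is a hypothetical zero-argument callable, for example:
        lambda: mlflow_client.get_run(model_explain_run_id).info.status
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        print("Current Status: ", status)
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("Run did not reach a terminal status in time")
        time.sleep(poll_seconds)


# Stand-in status source for illustration: two non-terminal states, then FINISHED.
fake_statuses = iter(["SCHEDULED", "RUNNING", "FINISHED"])
final_status = wait_for_terminal_status(lambda: next(fake_statuses), poll_seconds=0)
```

The caller can then decide what to do when the returned status is not `FINISHED`, rather than assuming success.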

### Download and Print engineered feature importance from artifact store
You can use ExplanationClient to download the engineered feature explanations from the artifact store of the best_run.
> **Note**: \
  This does not work with SDK v2 yet. It would likely require changes in ExplanationClient so that it accepts non-v1 structures in its initialization, similar to ModelProxy.

In [None]:
# from azureml.interpret import ExplanationClient

# client = ExplanationClient.from_run(best_run)
# engineered_explanations = client.download_model_explanation(raw=False)
# exp_data = engineered_explanations.get_feature_importance_dict()
# exp_data

### Download raw feature importance from artifact store
You can use ExplanationClient to download the raw feature explanations from the artifact store of the best_run.

In [None]:
# client = ExplanationClient.from_run(best_run)
# raw_explanations = client.download_model_explanation(raw=True)
# exp_data = raw_explanations.get_feature_importance_dict()
# exp_data

## Register Model in Workspace model registry

In [None]:
# Ensure that the model exists locally
output_path = download_outputs_via_mlflow_client(mlflow_client, mlflow_best_child_run.info.run_id, "outputs")

# Create (i.e. register) the model in the workspace
azure_model = Model(name="porto-seg-automl-remote-compute", version=1, local_path=os.path.join(output_path, "model.pkl"))
azure_model = client.models.create_or_update(azure_model)


registered_model = parent_run.register_model(model_name='porto-seg-automl-remote-compute', 
                                           description='Porto Seguro Model from plain AutoML in remote AML compute')

print(parent_run.model_id)
registered_model

## Deploy Model

In [233]:
import traceback

from azure.ml.entities import Endpoint, ManagedOnlineEndpoint, Environment, \
    CodeConfiguration, ManagedOnlineDeployment, ManualScaleSettings, Code

inference_script_file_name = os.path.join(output_path, "scoring_file_v_1_0_0.py")
conda_environment_yaml = os.path.join(output_path, "conda.yaml")

print("Inference File: ", inference_script_file_name)
print("Conda Environment File: ", conda_environment_yaml)

assert os.path.exists(inference_script_file_name)
assert os.path.exists(conda_environment_yaml)


# Prepare the deployment configuration
environment = Environment(
    name="environment-{}".format(mlflow_best_child_run.info.run_id[:8]),
    version=1,
    path=".",
    conda_file=conda_environment_yaml,
    docker_image="mcr.microsoft.com/azureml/intelmpi2018.3-ubuntu16.04:20210301.v1",
)

code = Code(
    name="environment-{}".format(mlflow_best_child_run.info.run_id[:8]),
    version=1,
    local_path=inference_script_file_name,
)
code_configuration = CodeConfiguration(
    code=code,
    scoring_script=inference_script_file_name
)

scale_settings = ManualScaleSettings(
    scale_type="Manual",
    min_instances=1,
    max_instances=2,
    instance_count=1
)
deployment = ManagedOnlineDeployment(
    name="deployment-{}".format(mlflow_best_child_run.info.run_id[:8]),
    model=registered_model,
    environment=environment,
    code_configuration=code_configuration,
    instance_type="Standard_F2s_v2",
    scale_settings=scale_settings,
)
online_endpoint = ManagedOnlineEndpoint(
    name="endpoint-{}".format(mlflow_best_child_run.info.run_id[:8]),
    deployments=[deployment],
    description="Demo model deployment",
    tags={"deployed_using": "sdkv2"}
)
##### Loading from YAML
# endpoint = Endpoint.load("/home/schrodinger/automl/Easy-AutoML-MLOps/notebooks/3-automl-remote-compute-run/endpoint.yml")

try:
    client.endpoints.create(online_endpoint)
except Exception as e:
    print("Deployment failed: ", str(e))
    traceback.print_exc()
    

Inference File:  /tmp/artifact_downloads/05bb01e9-a479-48ba-a0e3-3bcd868dbcd7_1/outputs/scoring_file_v_1_0_0.py
Conda Environment File:  /tmp/artifact_downloads/05bb01e9-a479-48ba-a0e3-3bcd868dbcd7_1/outputs/conda.yaml


Uploading scoring_file_v_1_0_0.py: 100%|██████████| 1/1 [00:00<00:00,  7.19it/s]

The deployment request gasi_ws_neu-endpoint-05bb01e9-5759945 was accepted,  status can be found in the link below: 
https://ms.portal.azure.com/#blade/HubsExtension/DeploymentDetailsBlade/overview/id/%2Fsubscriptions%2F381b38e9-9840-4719-a5a0-61d9585e1e91%2FresourceGroups%2Fgasi_rg_neu%2Fproviders%2FMicrosoft.Resources%2Fdeployments%2Fgasi_ws_neu-endpoint-05bb01e9-5759945

Registering environment version (environment-05bb01e9:1)  Done (4s)
Creating endpoint endpoint-05bb01e9 ..  Done (30s)
Polling hit the exception (DeploymentFailed) At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.


Deployment failed:  (DeploymentFailed) At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.


Traceback (most recent call last):
  File "/home/schrodinger/anaconda3/envs/devmar/lib/python3.7/site-packages/azure/core/polling/base_polling.py", line 482, in run
    self._poll()
  File "/home/schrodinger/anaconda3/envs/devmar/lib/python3.7/site-packages/azure/core/polling/base_polling.py", line 521, in _poll
    raise OperationFailed("Operation failed or canceled")
azure.core.polling.base_polling.OperationFailed: Operation failed or canceled

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<ipython-input-233-6864668628da>", line 57, in <module>
    client.endpoints.create(online_endpoint)
  File "/home/schrodinger/automl/sdk-cli-v2/src/azure-ml/azure/ml/_operations/endpoint_operations.py", line 211, in create
    return self._create_online_endpoint(internal_endpoint=endpoint, no_wait=no_wait)
  File "/home/schrodinger/automl/sdk-cli-v2/src/azure-ml/azure/ml/_operations/endpoint_operations.py", line 608, in _create_onli

## See files associated with the 'Best run'

In [106]:
artifacts = mlflow_client.list_artifacts(mlflow_best_child_run.info.run_id)

directories = []
files = []
for artifact in artifacts:
    if artifact.is_dir:
        directories.append(artifact)
    else:
        # File
        files.append(artifact)

print("Directories: ", {d.path for d in directories})
print("Files: ", {f.path for f in files})

Directories:  {'azureml-logs', 'logs', 'explanation', 'outputs'}
Files:  {'confusion_matrix', 'automl_driver.py', 'accuracy_table'}
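
The listing above covers only the top level of the run's artifact store. To see the files inside directories such as `outputs`, the same `list_artifacts` call can be repeated with a `path` argument. The sketch below does this recursively. `walk_artifacts` is a hypothetical helper, and `FakeClient` is an in-memory stand-in (with made-up contents) so that the example runs without a workspace; in the notebook, `mlflow_client` would be passed instead.

```python
from collections import namedtuple


def walk_artifacts(client, run_id, path=None):
    """Yield the path of every file under `path`, descending into directories."""
    for artifact in client.list_artifacts(run_id, path):
        if artifact.is_dir:
            yield from walk_artifacts(client, run_id, artifact.path)
        else:
            yield artifact.path


# Hypothetical stand-in that mimics the `path` and `is_dir` attributes of MLflow's file entries.
FakeFileInfo = namedtuple("FakeFileInfo", ["path", "is_dir"])


class FakeClient:
    # Illustrative directory tree, keyed by the path being listed.
    tree = {
        None: [FakeFileInfo("outputs", True), FakeFileInfo("automl_driver.py", False)],
        "outputs": [FakeFileInfo("outputs/model.pkl", False)],
    }

    def list_artifacts(self, run_id, path=None):
        return self.tree.get(path, [])


all_files = sorted(walk_artifacts(FakeClient(), "example-run-id"))
print(all_files)
```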


## Make Predictions and calculate metrics

### Prep Test Data
The test data is generated in-line here (a local test file could also be loaded as a pandas DataFrame). Tabular Datasets are not available in SDK v2, so a pandas DataFrame cannot be captured from a dataset for local predictions, and SDK v2 has no test run support either.

In [114]:
import numpy as np
import json

# The same feature row twice; only the last value (the target) differs: 0 and 1
raw_data = json.dumps({
     'data': [
         [20,2,1,3,1,0,0,1,0,0,0,0,0,0,0,8,1,0,0,0.6,0.1,0.61745445,6,1,-1,0,1,11,1,1,0,1,99,2,0.31622777,0.6396829,0.36878178,3.16227766,0.2,0.6,0.5,2,2,8,1,8,3,10,3,0,0,10,0,1,0,0,1,0, 0],
         [20,2,1,3,1,0,0,1,0,0,0,0,0,0,0,8,1,0,0,0.6,0.1,0.61745445,6,1,-1,0,1,11,1,1,0,1,99,2,0.31622777,0.6396829,0.36878178,3.16227766,0.2,0.6,0.5,2,2,8,1,8,3,10,3,0,0,10,0,1,0,0,1,0, 1]
     ],
     'method': 'predict'  # If you have a classification model, you can get probabilities by changing this to 'predict_proba'.
 })

numpy_data = np.array(json.loads(raw_data)['data'])

df_data = pd.DataFrame(data=numpy_data, columns=['id', 'ps_ind_01', 'ps_ind_02_cat', 'ps_ind_03', 'ps_ind_04_cat',
                                               'ps_ind_05_cat', 'ps_ind_06_bin', 'ps_ind_07_bin', 'ps_ind_08_bin',
                                               'ps_ind_09_bin', 'ps_ind_10_bin', 'ps_ind_11_bin', 'ps_ind_12_bin',
                                               'ps_ind_13_bin', 'ps_ind_14', 'ps_ind_15', 'ps_ind_16_bin',
                                               'ps_ind_17_bin', 'ps_ind_18_bin', 'ps_reg_01', 'ps_reg_02', 'ps_reg_03',
                                               'ps_car_01_cat', 'ps_car_02_cat', 'ps_car_03_cat', 'ps_car_04_cat',
                                               'ps_car_05_cat', 'ps_car_06_cat', 'ps_car_07_cat', 'ps_car_08_cat',
                                               'ps_car_09_cat', 'ps_car_10_cat', 'ps_car_11_cat', 'ps_car_11',
                                               'ps_car_12', 'ps_car_13', 'ps_car_14', 'ps_car_15', 'ps_calc_01',
                                               'ps_calc_02', 'ps_calc_03', 'ps_calc_04', 'ps_calc_05', 'ps_calc_06',
                                               'ps_calc_07', 'ps_calc_08', 'ps_calc_09', 'ps_calc_10', 'ps_calc_11',
                                               'ps_calc_12', 'ps_calc_13', 'ps_calc_14', 'ps_calc_15_bin',
                                               'ps_calc_16_bin', 'ps_calc_17_bin', 'ps_calc_18_bin', 'ps_calc_19_bin',
                                               'ps_calc_20_bin', 'target'])
y_test = df_data.pop('target')
print("y_test shape: ", y_test.shape)
df_data

y_test shape:  (2,)


Unnamed: 0,id,ps_ind_01,ps_ind_02_cat,ps_ind_03,ps_ind_04_cat,ps_ind_05_cat,ps_ind_06_bin,ps_ind_07_bin,ps_ind_08_bin,ps_ind_09_bin,...,ps_calc_11,ps_calc_12,ps_calc_13,ps_calc_14,ps_calc_15_bin,ps_calc_16_bin,ps_calc_17_bin,ps_calc_18_bin,ps_calc_19_bin,ps_calc_20_bin
0,20.0,2.0,1.0,3.0,1.0,0.0,0.0,1.0,0.0,0.0,...,3.0,0.0,0.0,10.0,0.0,1.0,0.0,0.0,1.0,0.0
1,20.0,2.0,1.0,3.0,1.0,0.0,0.0,1.0,0.0,0.0,...,3.0,0.0,0.0,10.0,0.0,1.0,0.0,0.0,1.0,0.0


### Make predictions in bulk

In [115]:
# Try the best model making predictions with the test dataset
y_predictions = fitted_model.predict(df_data)

print(y_predictions)

[0 0]


### Get all the predictions' probabilities needed to calculate ROC AUC

In [116]:
class_probabilities = fitted_model.predict_proba(df_data)

print('Some class probabilities...: ')
print(class_probabilities)

Some class probabilities...: 
[[0.9700505  0.02994949]
 [0.9700505  0.02994949]]
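
As a quick sanity check, the labels returned by `predict` can be related to these probabilities. The sketch below assumes that the columns are ordered as class 0 and class 1, and that the predicted label is simply the class with the highest probability. Both are assumptions about the fitted pipeline, not something the output confirms. The array is copied from the output above.

```python
import numpy as np

# Probabilities copied from the output above; columns assumed to be [class 0, class 1].
class_probabilities_sample = np.array([[0.9700505, 0.02994949],
                                       [0.9700505, 0.02994949]])

# Assumption: the predicted label is the class with the highest probability.
labels_from_argmax = np.argmax(class_probabilities_sample, axis=1)

# Positive-class scores, i.e. the column that is passed to roc_auc_score.
positive_scores = class_probabilities_sample[:, 1]

print(labels_from_argmax)
print(positive_scores)
```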


## Evaluate Model

Evaluating performance is an essential task in machine learning. Because this is a classification problem, the data scientist elected to use the AUC-ROC curve. The ROC (Receiver Operating Characteristic) curve, and the AUC (Area Under the Curve) derived from it, are used to check or visualize the performance of a classification model. AUC is one of the most important evaluation metrics for checking any classification model’s performance.

<img src="https://www.researchgate.net/profile/Oxana_Trifonova/publication/276079439/figure/fig2/AS:614187332034565@1523445079168/An-example-of-ROC-curves-with-good-AUC-09-and-satisfactory-AUC-065-parameters.png"
     alt="Example ROC curves with good (AUC 0.9) and satisfactory (AUC 0.65) parameters"
     style="float: left; margin-right: 12px; width: 320px; height: 239px;" />

### Calculate the ROC AUC with probabilities vs. the Test Dataset

In [119]:
# Two rows are not enough test data for a meaningful AUC; the code itself is unchanged

from sklearn.metrics import roc_auc_score

#roc_auc_score(y, clf.predict_proba(X)[:, 1])
#roc_auc_score(y, clf.decision_function(X))

print('ROC AUC *method 1*:')
print(roc_auc_score(y_test, class_probabilities[:,1]))

print('ROC AUC Weighted:')
print(roc_auc_score(y_test, class_probabilities[:,1], average='weighted'))
# AUC with plain LightGBM was: 0.6374553321494826 

ROC AUC *method 1*:
0.5
ROC AUC Weighted:
0.5
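
The value of 0.5 follows directly from the test data: both rows have identical features, so the model gives them the same score, and the single positive/negative pair is a tie. One way to see this is to compute AUC as the share of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The function below is an illustrative pure-Python sketch of that definition, not the implementation used by scikit-learn.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    total = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(positives) * len(negatives))


# The two test rows above: one negative, one positive, identical scores.
tied_auc = pairwise_auc([0, 1], [0.02994949, 0.02994949])

# A made-up example where three of the four pairs are ranked correctly.
separated_auc = pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

print(tied_auc, separated_auc)
```

A larger held-out test set with varied rows would be needed for the AUC to say anything about the model.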


### Calculate the Accuracy with predictions vs. the Test Dataset

In [95]:
print(df_data.shape)
print(y_predictions.shape)

(2, 58)
(2,)


In [120]:
from sklearn.metrics import accuracy_score

print('Accuracy:')
print(accuracy_score(y_test, y_predictions))


Accuracy:
0.5


### Load model in memory

#### (Option A: Load from model .pkl file)

This is demonstrated above.

#### (Option B: Load from model registry in Workspace)

##### Using SDK v2

In [148]:
from azure.ml.entities.assets import Model

# Retrieve the registered model by name.
registered_model = client.models.get(azure_model.name, azure_model.version)
registered_model


Directory /tmp/artifact_downloads/05bb01e9-a479-48ba-a0e3-3bcd868dbcd7_1/outputs already exists. Skipping download.


Model({'is_anonymous': False, 'name': 'porto-seg-automl-remote-compute', 'id': '/subscriptions/381b38e9-9840-4719-a5a0-61d9585e1e91/resourceGroups/gasi_rg_neu/providers/Microsoft.MachineLearningServices/workspaces/gasi_ws_neu/models/porto-seg-automl-remote-compute/versions/1', 'description': None, 'tags': {}, 'properties': {}, 'base_path': './', 'creation_context': <azure.ml._restclient._2021_03_01_preview.machinelearningservices.models._models_py3.SystemData object at 0x7f13c5bcd250>, 'version': 1, 'datastore': '/subscriptions/381b38e9-9840-4719-a5a0-61d9585e1e91/resourceGroups/gasi_rg_neu/providers/Microsoft.MachineLearningServices/workspaces/gasi_ws_neu/datastores/workspaceblobstore', 'path': 'az-ml-artifacts/2f589510f11c75fc407a9f68ca9408cd/model.pkl', 'local_path': None, 'utc_time_created': None, 'flavors': {}})

##### Using MLFlow

In [165]:
# Retrieve the registered model by name using MLFlow client
registered_model = next(filter(lambda model: model.name == model_name, mlflow_client.list_registered_models()))
# print("name={}; run_id={}; version={}".format(registered_model.name, registered_model.run_id, registered_model.version))
registered_model


<RegisteredModel: creation_timestamp=1621384876543, description='', last_updated_timestamp=1621384876543, latest_versions=[], name='porto-seg-mlflow_05bb01e9-a479-48ba-a0e3-3bcd868dbcd7_1', tags={}>

In [147]:
# Load model from model registry in Workspace
from azureml.core.model import Model

model_path = Model.get_model_path('porto-seg-automl-remote-compute', _workspace=ws)
print(model_path)
# fitted_model = joblib.load(model_path)
# print(fitted_model)

azureml-models/porto-seg-automl-remote-compute/1/model.pkl


## Try model inference with hardcoded input data for the model to predict

This is demonstrated above.

## Retrieve the Best ONNX Model
Below we select the best pipeline from our iterations. The get_output method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on get_output allow you to retrieve the best run and fitted model for any logged metric or for a particular iteration.

Set the parameter return_onnx_model=True to retrieve the best ONNX model, instead of the Python model.

In [None]:
# TODO: This should be similar to above, although not sure why ONNX models are not being generated at the moment.
#       (Same class of bug as MLflow models not being saved?)

best_run, onnx_mdl = parent_run.get_output(return_onnx_model=True)

### Save the best ONNX model to local path
#### Predict with the ONNX model, using onnxruntime package

In [238]:
def get_onnx_res(onnx_resource_json_path):
    with open(onnx_resource_json_path) as f:
        onnx_res = json.load(f)
    return onnx_res


output_path = download_outputs_via_mlflow_client(mlflow_client, mlflow_best_child_run.info.run_id, "outputs")

onnx_model_path = os.path.join(local_dir, "outputs", "model.onnx")

# Loading an ONNX model with MLFlow can be done via the following; however, we currently don't save the flavor
# information in the MLModel file.
# mlflow.onnx.load_model(onnx_model_path)

# Loading via OnnxConverter
from azureml.automl.runtime.onnx_convert import OnnxConverter
from azureml.automl.runtime.onnx_convert import OnnxInferenceHelper

onnx_resource_json_path = os.path.join(local_dir, "outputs", "model_onnx.json")
fitted_onnx_model = OnnxConverter.load_onnx_model(onnx_model_path)

mdl_bytes = fitted_onnx_model.SerializeToString()
onnx_res = get_onnx_res(onnx_resource_json_path)

onnxrt_helper = OnnxInferenceHelper(mdl_bytes, onnx_res)
pred_onnx, pred_prob_onnx = onnxrt_helper.predict(df_data)

print(pred_onnx)
print(pred_prob_onnx)

Directory /tmp/artifact_downloads/05bb01e9-a479-48ba-a0e3-3bcd868dbcd7_1/outputs already exists. Skipping download.
[0 0]
[[0.9730727  0.02692736]
 [0.9730727  0.02692736]]
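
The ONNX probabilities printed above differ slightly from those returned earlier by the pickled pipeline's `predict_proba`. A simple way to check that two model formats agree is to compare the predicted labels and the largest absolute difference in probability. The sketch below uses the values copied from the two outputs in this notebook; the tolerance is a hypothetical value chosen only for illustration.

```python
import numpy as np

# Values copied from the outputs in this notebook.
proba_pickle = np.array([[0.9700505, 0.02994949],
                         [0.9700505, 0.02994949]])
proba_onnx = np.array([[0.9730727, 0.02692736],
                       [0.9730727, 0.02692736]])

# Do both formats predict the same class for every row?
labels_match = np.array_equal(proba_pickle.argmax(axis=1), proba_onnx.argmax(axis=1))

# Largest disagreement in any single probability.
max_abs_diff = float(np.abs(proba_pickle - proba_onnx).max())

# Hypothetical tolerance; pick one that suits the application.
tolerance = 1e-2
within_tolerance = max_abs_diff <= tolerance

print("Labels match:", labels_match)
print("Max absolute difference:", max_abs_diff)
print("Within tolerance:", within_tolerance)
```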
