Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Automated Machine Learning
**Single Model Demand Forecasting**

## Contents
1. [Introduction](#Introduction)
1. [Setup](#Setup)
1. [Compute](#Compute)
1. [Data](#Data)
1. [Import Components From Registry](#ImportComponents)
1. [Create a Pipeline](#Pipeline)
1. [Kick Off Pipeline Runs](#PipelineRuns)
1. [Download Output](#DownloadOutput)
1. [Compare Evaluation Results](#CompareResults)
1. [Deployment](#Deployment)

## 1. Introduction

The objective of this notebook is to illustrate how to use the component-based AutoML single model solution. It walks you through all stages of the model evaluation and production process, starting with data ingestion and concluding with batch endpoint deployment for production. In this tutorial we illustrate how to leverage AutoML to train a distributed TCN model ([link](placeholder)). However, the same notebook can be used to train a non-distributed TCN as well as conventional ML models.

We use a subset of the UCI electricity data ([link](https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014#)) with the objective of predicting electricity demand per consumer 24 hours ahead. The data was preprocessed using the [data prep notebook](https://github.com/Azure/azureml-examples/blob/main/v1/python-sdk/tutorials/automl-with-azureml/forecasting-data-preparation/auto-ml-forecasting-data-preparation.ipynb). Refer to it for an illustration of how to download the data from the source, aggregate it to an hourly frequency, convert it from wide to long format and upload it to the Datastore. Here, we work with data that has already been pre-processed and saved locally in the parquet format.

There are a number of steps you need to take before you can put a model into production. A user needs to prepare the data, partition it into appropriate sets, select the best model, evaluate it against a baseline, and monitor the model in real life to collect enough observations on how it would perform had it been put in production. Some of these steps are time consuming, some require certain expertise in writing code. The steps shown in this notebook follow a typical thought process one follows before the model is put in production.

Make sure you have executed the [configuration](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb) before running this notebook.

## 2. Setup

In [None]:
# Import required libraries
import os
import datetime
import json
import yaml
import azure.ai.ml

import pandas as pd

from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

from azure.ai.ml import MLClient, Input, Output
from azure.ai.ml import load_component
from azure.ai.ml import automl
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.dsl import pipeline
from azure.ai.ml.entities import (
    BatchEndpoint,
    BatchDeployment,
    AmlCompute,
    PipelineComponentBatchDeployment,
)
from azure.ai.ml.entities._job.automl.tabular.forecasting_settings import (
    ForecastingSettings,
)

print(f"SDK version: {azure.ai.ml.__version__}")

### 2.1. Configure workspace details and get a handle to the workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

To connect to a workspace, we need identifier parameters: a subscription, a resource group and a workspace name. We will use these details in the `MLClient` from `azure.ai.ml` to get a handle to the required Azure Machine Learning workspace. We use the [default Azure authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) for this tutorial. Check the [configuration notebook](../../configuration.ipynb) for more details on how to configure credentials and connect to a workspace.

In [None]:
try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

In [None]:
try:
    ml_client = MLClient.from_config(credential)
except Exception as ex:
    print(ex)
    # Enter details of your AML workspace
    subscription_id = "<SUBSCRIPTION_ID>"
    resource_group = "<RESOURCE_GROUP>"
    workspace = "<AML_WORKSPACE_NAME>"
    ml_client = MLClient(credential, subscription_id, resource_group, workspace)
    print(ml_client)
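
`MLClient.from_config` reads the workspace identifiers from a `config.json` file, typically located in the working directory or one of its parents. As a minimal sketch, such a file could be generated as shown below. The helper name `write_workspace_config` is a hypothetical illustration and not part of the SDK; the three keys follow the layout of the `config.json` that can be downloaded from the Azure portal.

```python
import json
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a config.json holding the three workspace identifiers.

    Hypothetical helper for illustration; it only writes a local file.
    """
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(config, indent=4))
    return path
```

Calling `write_workspace_config(".", "<SUBSCRIPTION_ID>", "<RESOURCE_GROUP>", "<AML_WORKSPACE_NAME>")` once should let the `from_config` branch above succeed on later runs, assuming the credential has access to that workspace.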

### 2.2. Show Azure ML Workspace information

In [None]:
ws = ml_client.workspaces.get(name=ml_client.workspace_name)

output = {}
output["Workspace"] = ml_client.workspace_name
output["Subscription ID"] = ml_client.subscription_id
output["Resource Group"] = ws.resource_group
output["Location"] = ws.location
pd.DataFrame(data=output, index=[""]).T

## 3. Compute 

#### Create or Attach existing AmlCompute

You will need to create a compute target for your AutoML run. In this tutorial, you will create AmlCompute as your training compute resource.

> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.


Here, we use a cluster of up to 10 `Standard_NC8as_T4_v3` nodes for illustration purposes. You will need to adjust the compute type and the number of nodes based on your needs, which can be driven by the speed needed for model selection, data size, etc.

#### Creation of AmlCompute takes approximately 5 minutes. 
If the AmlCompute with that name is already in your workspace, this code will skip the creation process.
As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.

In [None]:
from azure.core.exceptions import ResourceNotFoundError

amlcompute_cluster_name = "demand-fcst-single-model-cluster"

try:
    # Retrieve an already attached Azure Machine Learning Compute.
    compute_target = ml_client.compute.get(amlcompute_cluster_name)
except ResourceNotFoundError:
    # The cluster does not exist yet, so define and create it.
    compute_target = AmlCompute(
        name=amlcompute_cluster_name,
        size="Standard_NC8as_T4_v3",
        type="amlcompute",
        min_instances=0,
        max_instances=10,
        idle_time_before_scale_down=600,
    )
    # Pass the AmlCompute entity (not just its name) to create the cluster.
    poller = ml_client.begin_create_or_update(compute_target)
    poller.wait()

## 4. Data

For illustration purposes we use the UCI electricity data ([link](https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014#)). The original dataset contains electricity consumption data for 370 consumers measured at 15 minute intervals. In the data set for this demonstration, we have aggregated the data to an hourly frequency and converted it to kilowatt hours (kWh) for 10 customers. Each customer is assigned to one of two groups, as denoted by the entries in the `group_id` column. The following cells read and print the first few rows of the training data and print the number of unique time series in the dataset.

In [None]:
time_column_name = "datetime"
target_column_name = "usage"
time_series_id_column_names = ["group_id", "customer_id"]

In [None]:
dataset_type = "train"
df = pd.read_parquet(f"./data/{dataset_type}/uci_electro_small_{dataset_type}.parquet")
df.head(3)

In [None]:
nseries = df.groupby(time_series_id_column_names).ngroups
print(f"Data contains {nseries} individual time-series\n---")

In [None]:
df[time_series_id_column_names].drop_duplicates()
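
The hourly aggregation mentioned at the start of this section can be illustrated with a small standalone sketch. It assumes that each raw reading is the average power in kW over a 15 minute interval, so that a reading contributes a quarter of its value in kWh. The function name `to_hourly_kwh` and the sample readings are made up for illustration; the data prep notebook may differ in its details.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def to_hourly_kwh(readings):
    """Aggregate (timestamp, kW) readings taken every 15 minutes into hourly kWh.

    Assumption: a reading of p kW over 15 minutes equals p * 0.25 kWh.
    """
    hourly = defaultdict(float)
    for ts, kw in readings:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        hourly[hour] += kw * 0.25
    return dict(sorted(hourly.items()))


# Eight made-up readings of 4 kW each, covering two full hours.
start = datetime(2014, 1, 1, 0, 0)
readings = [(start + timedelta(minutes=15 * i), 4.0) for i in range(8)]
hourly = to_hourly_kwh(readings)
for hour, kwh in hourly.items():
    print(hour, kwh)
```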

In [None]:
# Training MLTable defined locally, with local data to be uploaded
train_dataset = Input(type=AssetTypes.MLTABLE, path="./data/train")
valid_dataset = Input(type=AssetTypes.MLTABLE, path="./data/valid")

The test data set, however, is consumed by the inference step of the pipeline run, so we define it as a URI folder input that will be uploaded for use by the pipeline.

In [None]:
test_dataset = Input(type=AssetTypes.URI_FOLDER, path="./data/test")

## 5. Import Components From Registry

An Azure Machine Learning component is a self-contained piece of code that does one step in a machine learning pipeline. A component is analogous to a function - it has a name, inputs, outputs, and a body. Components are the building blocks of the Azure Machine Learning pipelines. It's a good engineering practice to build a machine learning pipeline where each step has well-defined inputs and outputs. In Azure Machine Learning, a component represents one reusable step in a pipeline. Components are designed to help improve the productivity of pipeline building. Specifically, components offer:
- Well-defined interface: Components require a well-defined interface (input and output). The interface allows the user to build steps and connect steps easily. The interface also hides the complex logic of a step and removes the burden of understanding how the step is implemented.

- Share and reuse: As the building blocks of a pipeline, components can be easily shared and reused across pipelines, workspaces, and subscriptions. Components built by one team can be discovered and used by another team.

- Version control: Components are versioned. The component producers can keep improving components and publish new versions. Consumers can use specific component versions in their pipelines. This gives them compatibility and reproducibility.

For more detailed information on this subject, refer to this [link](https://learn.microsoft.com/en-us/azure/machine-learning/concept-component?view=azureml-api-2).

To import components, we need to get the registry. The following commands obtain the public registries from which we will import components for our experiment.

In [None]:
# get registry for the inference component
ml_client_inference_registry = MLClient(
    credential=credential, registry_name="azureml-preview"
)
print(ml_client_inference_registry)
print("---")

In [None]:
# get registry for the compute metrics component
ml_client_metrics_registry = MLClient(credential=credential, registry_name="azureml")
print(ml_client_metrics_registry, "\n---")

Next, we pull specific components and use them to build a pipeline of steps. For the illustration of the product evaluation workflow we will use the following components:
- Inference component: generates forecast and can be applied to both test and inference sets.
- Compute metrics component: calculates metrics per time series if the inference component was used on a test set.

In [None]:
inference_component = ml_client_inference_registry.components.get(
    name="automl_forecasting_inference", label="latest"
)
print(f"Inference component version: {inference_component.version}\n---")

In [None]:
compute_metrics_component = ml_client_metrics_registry.components.get(
    name="compute_metrics", label="latest"
)
print(f"Compute metrics component version: {compute_metrics_component.version}\n---")

## 6. Create a Pipeline

Now that we have imported the components, we will build an evaluation pipeline. This pipeline will allow us to train the best model, generate a rolling forecast on the test set, and calculate metrics on the test set output.

### 6.1. Set Pipeline Parameters

AzureML components can only receive specific object types such as strings, JSON/YML files, URI Folders and URI Files. Other object types are not accepted. Because of this, we will create the pipeline by utilizing the `pipeline_parameters` dictionary. Most of the parameters in this dictionary define the settings for the model training step of the pipeline, and the remaining ones are used in the inference and compute metrics components. To give a better understanding of what these settings represent, we will build this dictionary in sequential steps.

#### 6.1.1. Training Step Parameters

First, we create a set of parameters, `automl_settings`, which will be passed to the `forecasting()` factory function to kick off the model training stage. Think of these as the bare minimum settings necessary to define a forecasting job. They contain the following properties:


|Property|Description|
|-|-|
| **task**               | forecasting |
| **target_column_name** | The name of the column to target for predictions. It must always be specified. This parameter is applicable to `training_data`, `validation_data` and `test_data`. |
| **primary_metric**     | This is the metric that you want to optimize. Forecasting supports the following primary metrics<ul><li>`normalized_root_mean_squared_error`</li><li>`normalized_mean_absolute_error`</li><li>`spearman_correlation`</li><li>`r2_score`</li></ul> We recommend using either the normalized root mean squared error (default metric) or normalized mean absolute error as a primary metric because they measure forecast accuracy. See the [link](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-automl-forecasting-faq#how-do-i-choose-the-primary-metric) for a more detailed discussion on this topic. |
| **n_cross_validations** | Number of cross-validation folds to use for model/pipeline selection. This can be set to "auto", in which case AutoML determines the number of cross-validations automatically if a validation set is not provided. Alternatively, users can specify an integer value. The default value is "auto". |

Please note that the `forecasting()` function also requires training and/or validation data. We will provide this information as a separate parameter to the pipeline method.

In [None]:
automl_settings = dict(
    task="forecasting",
    target_column_name=target_column_name,
    primary_metric="normalized_root_mean_squared_error",
    n_cross_validations="auto",
)

Next, we define the forecasting-specific parameters for the experiment, which will be stored in the `forecast_settings` dictionary. Technically, only 2 parameters are necessary for forecasting tasks (`forecast_horizon` and `time_column_name`). For greater control over the experiment we also list optional parameters that users can set; they are marked with an asterisk $(*)$ in the table below. Feel free to uncomment and set the desired values. See the following [link placeholder]() for a detailed description of each of these parameters.

|Property|Description|
|-|-|
| **time_column_name**               | The name of the time column in the data. |
| **forecast_horizon** | The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly). |
| **time_series_id_column_names***    | The column names used to uniquely identify the time series in data that has multiple rows with the same timestamp. If the time series identifiers are not defined, AutoML will detect them for you. |

In [None]:
forecast_settings = dict(
    forecast_horizon=24,
    time_column_name=time_column_name,
    time_series_id_column_names=time_series_id_column_names,
    # cv_step_size=24,
    # target_lags = None,
    # target_rolling_window_size = None,
    # frequency = None,
    # feature_lags = None,
    # seasonality = None,
    # use_stl = None,
    # short_series_handling_config = None,
)

Next, we set parameters to configure limits such as timeouts and store them in the `training_limits` dictionary.

|Property|Description|
|-|-|
| **timeout_minutes**          | Maximum amount of time in minutes that the whole AutoML job can take before the job terminates. This timeout includes setup, featurization and training runs but does not include the ensembling and model explainability runs at the end of the process since those actions need to happen once all the trials (child jobs) are done. If not specified, the default job's total timeout is 6 days (8,640 minutes). To specify a timeout less than or equal to 1 hour (60 minutes), make sure your dataset's size is not greater than 10,000,000 (rows times columns) or an error results. |
| **trial_timeout_minutes**    | Maximum time in minutes that each trial (child job) can run for before it terminates. If not specified, a value of 1 month or 43200 minutes is used. |
| **max_trials**               | Represents the maximum number of trials an Automated ML job can run, each trying a training algorithm with a different combination of hyperparameters. Its default value is 1000. If `enable_early_termination` is enabled, the number of trials actually run can be smaller. |
| **max_concurrent_trials**    | The maximum number of trials (child jobs) that would be executed in parallel. It's highly recommended to set the number of concurrent runs to the number of nodes in the cluster (aml compute defined in `compute`). The default value is 1. |
| **max_nodes**                | Maximum number of nodes to use in training. This value should be set only for distributed TCN training. We encourage this value to be a multiple of `max_concurrent_trials`. The multiple indicates the number of nodes that will be used by each concurrent trial. The minimum acceptable value to kick off distributed training is 2. |
| **enable_early_termination** | Represents whether to terminate the experiment early if the loss score doesn't improve after 'x' number of iterations. In an Automated ML job, no early stopping is applied to the first 20 iterations; the early stopping window starts only after the first 20 iterations. The default value is `True`. |

In [None]:
training_limits = dict(
    timeout_minutes=60,
    trial_timeout_minutes=30,
    max_concurrent_trials=5,
    max_trials=20,
    max_nodes=10,
    enable_early_termination=True,
)
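
The guidance in the table about `max_nodes` and `max_concurrent_trials` can be checked before a job is submitted. The following is a minimal sketch of such a check; `check_distributed_limits` is a hypothetical helper written for this notebook and not an SDK function, and it only reports deviations from the recommendations above rather than enforcing them.

```python
def check_distributed_limits(max_nodes, max_concurrent_trials):
    """Return (nodes_per_trial, problems) for a distributed training setup.

    Illustrative check based on the limits table; not part of the SDK.
    """
    if max_concurrent_trials < 1:
        raise ValueError("max_concurrent_trials must be at least 1")
    problems = []
    if max_nodes < 2:
        problems.append("distributed training needs max_nodes >= 2")
    if max_nodes % max_concurrent_trials != 0:
        problems.append("max_nodes is not a multiple of max_concurrent_trials")
    # Whole number of nodes available to each concurrent trial.
    nodes_per_trial = max_nodes // max_concurrent_trials
    return nodes_per_trial, problems


nodes_per_trial, problems = check_distributed_limits(max_nodes=10, max_concurrent_trials=5)
print(nodes_per_trial, problems)
```

With the values used in `training_limits`, each of the 5 concurrent trials would have 2 nodes available and no deviation is reported.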

Next, we set parameters to configure training, such as enabling DNN and blocking/allowing specific models, and store them in the `training_settings` dictionary.

|Property|Description|
|-|-|
| **enable_dnn_training**          | A flag to turn on or off the inclusion of DNN based models to try out during model selection. The default value is `False`. |
| **training_mode**    | The training mode to use. The possible values are `distributed` and `non_distributed` (default value). When this parameter is set to `distributed` and `enable_dnn_training=True`, a distributed TCN run will be kicked off. |
| **allowed_training_algorithms**               | A list of Time Series Forecasting algorithms to try out as base model for model training in an experiment. If it is omitted or set to `None`, then all supported algorithms are used during experiment, except algorithms specified in `blocked_training_algorithms`. The default value is `None`. |
| **blocked_training_algorithms**               | A list of Time Series Forecasting algorithms to not run as base model while model training in an experiment. If it is omitted or set to `None`, then all supported algorithms are used during model training.  The default value is `None`.|
| **enable_model_explainability**    | Represents a flag to turn on model explainability like feature importance, of best model evaluated by Automated ML system. The default value is `True`. |

In [None]:
training_settings = dict(
    enable_dnn_training=True,
    training_mode="distributed",
    allowed_training_algorithms=["TCNForecaster"],
    blocked_training_algorithms=None,
    enable_model_explainability=True,
)

#### 6.1.2. Other Pipeline Parameters

Next, we declare the parameters that will be used by the inference and compute metrics components. We will store them in the `component_settings` dictionary.

|Property|Description|
|-|-|
| **compute_name**                | The name of the AML compute infrastructure to execute the job on. |
| **forecast_mode**               | Type of forecast to perform on the test set. Can be `recursive` or `rolling`. A rolling forecast can be used for evaluation purposes. The default value is `recursive`. |
| **forecast_step**               | The forecast step used for rolling forecast. See the following [link](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-forecast?view=azureml-api-2#evaluating-model-accuracy-with-a-rolling-forecast) for more details. |
| **is_validation_data_provided** | Set this value to `True` if validation data is provided.  |

In [None]:
component_settings = dict(
    compute_name=amlcompute_cluster_name,
    forecast_mode="rolling",
    forecast_step=24,
    is_validation_data_provided=False,
)
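
To picture what `forecast_mode="rolling"` with `forecast_step=24` does on the test set, the following simplified sketch lists the index windows covered by each successive forecast. The function `rolling_forecast_windows` and its arguments are illustrative assumptions and not part of the inference component's API; the component itself may handle edge cases differently.

```python
def rolling_forecast_windows(n_test, horizon, step):
    """Return (start, end) index pairs, end exclusive, for each rolling forecast.

    The forecast origin advances by `step` observations each time and every
    forecast covers up to `horizon` observations of the test set.
    """
    if horizon < 1 or step < 1:
        raise ValueError("horizon and step must be positive")
    windows = []
    origin = 0
    while origin < n_test:
        end = min(origin + horizon, n_test)
        windows.append((origin, end))
        origin += step
    return windows


# Three days of hourly test data, forecast 24 hours ahead, move 24 hours each time.
windows = rolling_forecast_windows(n_test=72, horizon=24, step=24)
print(windows)  # [(0, 24), (24, 48), (48, 72)]
```

When the step equals the horizon, as in this notebook, the windows do not overlap. A step smaller than the horizon would produce overlapping forecasts for the same timestamps.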

Now that all parameters have been explained and set, we save them in the `pipeline_parameters` dictionary, which will be used to build the evaluation pipeline.

In [None]:
pipeline_parameters = {
    **automl_settings,
    **forecast_settings,
    **training_settings,
    **training_limits,
    **component_settings,
}
print(json.dumps(pipeline_parameters, indent=4), "\n---")
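
With `**` unpacking, a key that appears in more than one dictionary silently takes the value from the last one. If that is a concern, a stricter merge can be sketched as follows. The helper `merge_settings` and the sample dictionaries are hypothetical and only illustrate the idea.

```python
def merge_settings(*groups):
    """Merge dictionaries, refusing to overwrite a key with a different value."""
    merged = {}
    for group in groups:
        for key, value in group.items():
            if key in merged and merged[key] != value:
                raise ValueError(
                    f"Conflicting values for {key!r}: {merged[key]!r} vs {value!r}"
                )
            merged[key] = value
    return merged


# Made-up settings groups for illustration.
limits = {"max_trials": 20, "max_nodes": 10}
component = {"forecast_step": 24}
merged = merge_settings(limits, component)
print(merged)
```

Passing two groups that disagree on a key, for example `{"max_nodes": 10}` and `{"max_nodes": 1}`, raises a `ValueError` instead of keeping the later value.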

### 6.2. Build a Pipeline

Next, we build a pipeline from the imported components. Since this notebook is designed to illustrate the evaluation flow, we string these components together in the following fashion. First, we train the best model. Then, we generate a rolling forecast with a step size of 24 (hours) on the test set. This is done to mimic the evaluation process in which a customer tracks the model's performance in real time and generates forecasts every 24 hours. Finally, we compute metrics based on the rolling forecast output from the previous step. You do not have to modify anything in the next cell.

In [None]:
@pipeline(
    description="AutoML Forecasting Single Model Evaluation Pipeline",
)
def evaluation_pipeline(training_data, inference_data, validation_data=None):
    # 0. Extract parameters from the dictionary
    target_column_name = pipeline_parameters.get("target_column_name")
    primary_metric = pipeline_parameters.get(
        "primary_metric", "normalized_root_mean_squared_error"
    )
    n_cross_validations = pipeline_parameters.get("n_cross_validations", "auto")

    # -- 0.1 set_forecast_settings
    time_column_name = pipeline_parameters.get("time_column_name")
    time_series_id_column_names = pipeline_parameters.get(
        "time_series_id_column_names", None
    )
    country_or_region_for_holidays = pipeline_parameters.get(
        "country_or_region_for_holidays", None
    )
    cv_step_size = pipeline_parameters.get("cv_step_size", None)
    forecast_horizon = pipeline_parameters.get("forecast_horizon", None)
    target_lags = pipeline_parameters.get("target_lags", None)
    target_rolling_window_size = pipeline_parameters.get(
        "target_rolling_window_size", None
    )
    frequency = pipeline_parameters.get("frequency", None)
    feature_lags = pipeline_parameters.get("feature_lags", None)
    seasonality = pipeline_parameters.get("seasonality", None)
    use_stl = pipeline_parameters.get("use_stl", None)
    short_series_handling_config = pipeline_parameters.get(
        "short_series_handling_config", None
    )

    # -- 0.2 set_training
    enable_dnn_training = pipeline_parameters.get("enable_dnn_training", False)
    training_mode = pipeline_parameters.get("training_mode", None)
    enable_model_explainability = pipeline_parameters.get(
        "enable_model_explainability", True
    )
    enable_stack_ensemble = pipeline_parameters.get("enable_stack_ensemble", False)
    enable_vote_ensemble = pipeline_parameters.get("enable_vote_ensemble", True)
    allowed_training_algorithms = pipeline_parameters.get(
        "allowed_training_algorithms", None
    )
    blocked_training_algorithms = pipeline_parameters.get(
        "blocked_training_algorithms", None
    )

    # -- 0.3 set_limits
    max_concurrent_trials = pipeline_parameters.get("max_concurrent_trials", None)
    max_cores_per_trial = pipeline_parameters.get("max_cores_per_trial", None)
    max_nodes = pipeline_parameters.get("max_nodes", None)
    max_trials = pipeline_parameters.get("max_trials", None)
    timeout_minutes = pipeline_parameters.get("timeout_minutes", None)
    trial_timeout_minutes = pipeline_parameters.get("trial_timeout_minutes", None)

    # -- 0.4 component-specific settings
    compute_name = pipeline_parameters.get("compute_name")
    max_nodes = pipeline_parameters.get("max_nodes", 1)
    forecast_mode = pipeline_parameters.get("forecast_mode", "recursive")
    forecast_step = pipeline_parameters.get("forecast_step", 1)
    forecast_quantiles = pipeline_parameters.get("forecast_quantiles", None)
    is_validation_data_provided = pipeline_parameters.get(
        "is_validation_data_provided", False
    )

    print(f"---\nTraining mode: {training_mode}\n---")
    # 1. Model Training Step
    # -- 1.1 Define the automl forecasting task with automl function
    if is_validation_data_provided:
        print("---\n Validation data pipeline constructor path \n---")
        training_node = automl.forecasting(
            compute=compute_name,
            training_data=training_data,
            validation_data=validation_data,
            target_column_name=target_column_name,
            primary_metric=primary_metric,
            enable_model_explainability=enable_model_explainability,
            outputs={"best_model": Output(type=AssetTypes.CUSTOM_MODEL)},
        )
    else:
        training_node = automl.forecasting(
            compute=compute_name,
            training_data=training_data,
            target_column_name=target_column_name,
            primary_metric=primary_metric,
            n_cross_validations=n_cross_validations,
            enable_model_explainability=enable_model_explainability,
            outputs={"best_model": Output(type=AssetTypes.CUSTOM_MODEL)},
        )

    # --  1.2 Define forecasting settings
    training_node.set_forecast_settings(
        forecast_horizon=forecast_horizon,
        time_column_name=time_column_name,
        time_series_id_column_names=time_series_id_column_names,
        country_or_region_for_holidays=country_or_region_for_holidays,
        cv_step_size=cv_step_size,
        target_lags=target_lags,
        target_rolling_window_size=target_rolling_window_size,
        frequency=frequency,
        feature_lags=feature_lags,
        seasonality=seasonality,
        use_stl=use_stl,
        short_series_handling_config=short_series_handling_config,
    )

    # -- 1.3 Set training parameters
    training_node.set_training(
        enable_dnn_training=enable_dnn_training,
        training_mode=training_mode,
        enable_model_explainability=enable_model_explainability,
        allowed_training_algorithms=allowed_training_algorithms,
    )

    # -- 1.4 Set training limits. All limits are optional.
    training_node.set_limits(
        timeout_minutes=timeout_minutes,
        trial_timeout_minutes=trial_timeout_minutes,
        max_trials=max_trials,
        max_concurrent_trials=max_concurrent_trials,
        max_cores_per_trial=max_cores_per_trial,
        max_nodes=max_nodes,
    )

    # 2. Inferencing step
    inference_node = inference_component(
        test_data=inference_data,
        model_path=training_node.outputs.best_model,
        target_column_name=target_column_name,
        forecast_mode=forecast_mode,
        forecast_step=forecast_step,
        forecast_quantiles=forecast_quantiles,
    )

    # 3. Metrics calculation step
    compute_metrics_node = compute_metrics_component(
        task="tabular-forecasting",
        prediction=inference_node.outputs.inference_output_file,
        ground_truth=inference_node.outputs.inference_output_file,
        evaluation_config=inference_node.outputs.evaluation_config_output_file,
    )
    compute_metrics_node.compute = compute_name

    # 4. Specify pipeline outputs
    return {
        "output_files": compute_metrics_node.outputs.evaluation_result,
        "output_model": training_node.outputs.best_model,
        "forecast_output": inference_node.outputs.inference_output_file,
    }

## 7. Kick Off Pipeline Runs

Now that the pipeline is defined, we will use it to kick off several runs. First, we will kick off an experiment which will train the best AutoML model, run inference with it and evaluate its performance for each partition. Next, we will kick off the same pipeline using only the naive model for the same partitions. This will allow us to establish a baseline and compare performance results.

### 7.1. Kick Off Best Model Pipeline

In [None]:
# Reuse the inputs defined in the Data section: MLTable inputs for the
# training and validation data and a URI folder input for the test data.
pipeline_job = evaluation_pipeline(
    training_data=train_dataset,
    inference_data=test_dataset,
    validation_data=valid_dataset,
)
print(pipeline_job)

In [None]:
# set pipeline level compute
pipeline_job.settings.default_compute = amlcompute_cluster_name

In [None]:
experiment_name = "single-model-experiment-" + datetime.datetime.now().strftime(
    "%Y%m%d"
)

pipeline_submitted_job = ml_client.jobs.create_or_update(
    pipeline_job,
    experiment_name=experiment_name,
)
ml_client.jobs.stream(pipeline_submitted_job.name)

In [None]:
# To rehydrate run
# RUN_ID = "<Paste the PipelineRunId from the output of the previous cell.>"
# pipeline_submitted_job = ml_client.jobs.get(RUN_ID)
# pipeline_submitted_job

### 7.2. Kick Off the Baseline Experiment

To establish a baseline, we will use the same pipeline as before with one minor change: we restrict the allowed model list to the Naive model and change the number of rolling origin cross validations (ROCV) to 2. Reducing the number of ROCV folds speeds up the run; cross-validation is needed for model selection only, and in this run we have only one model. We also turn off distributed training because it is not supported for non-DNN models.

In [None]:
pipeline_parameters.update(
    {
        "allowed_training_algorithms": ["Naive"],
        "n_cross_validations": 2,
        "enable_dnn_training": False,
        "training_mode": None,
    }
)
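
As a reminder of what a naive baseline does, the sketch below carries the last observed value forward across the forecast horizon. This is the textbook definition, given here as an illustration; the function `naive_forecast` is hypothetical and the AutoML `Naive` forecaster may differ in its details.

```python
def naive_forecast(history, horizon):
    """Repeat the last observed value `horizon` times."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


# Made-up hourly usage values for one time series.
history = [310.0, 295.5, 302.25]
forecast = naive_forecast(history, horizon=4)
print(forecast)  # [302.25, 302.25, 302.25, 302.25]
```

Any model worth deploying should beat this kind of forecast on the test set, which is why it serves as the baseline here.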

In [None]:
# Reuse the inputs defined in the Data section.
pipeline_job_base = evaluation_pipeline(
    training_data=train_dataset,
    inference_data=test_dataset,
)
print(pipeline_job_base)

In [None]:
# set pipeline level compute
pipeline_job_base.settings.default_compute = amlcompute_cluster_name

In [None]:
base_experiment_name = (
    "single-model-experiment-base-" + datetime.datetime.now().strftime("%Y%m%d")
)

pipeline_submitted_job_base = ml_client.jobs.create_or_update(
    pipeline_job_base,
    experiment_name=base_experiment_name,
)
ml_client.jobs.stream(pipeline_submitted_job_base.name)

In [None]:
# To rehydrate baseline run
# RUN_ID = "<Paste the PipelineRunId from the output of the previous cell.>"
# pipeline_submitted_job_base = ml_client.jobs.get(RUN_ID)
# pipeline_submitted_job_base

## 8. Download Pipeline Output
Next, we will download the output files generated by the compute metrics component for each executed pipeline and save them in the corresponding subfolder of the `output` folder. First, we create the corresponding output directories. Then, we execute the `ml_client.jobs.download` command, which downloads the experiments' outputs.

In [None]:
# create output directories
automl_output_dir = os.path.join(os.getcwd(), "output/automl")
base_output_dir = os.path.join(os.getcwd(), "output/base")

os.makedirs(automl_output_dir, exist_ok=True)
os.makedirs(base_output_dir, exist_ok=True)

In [None]:
ml_client.jobs.download(
    name=pipeline_submitted_job.name,
    download_path=automl_output_dir,
    output_name="output_files",
)
ml_client.jobs.download(
    name=pipeline_submitted_job_base.name,
    download_path=base_output_dir,
    output_name="output_files",
)

In [None]:
ml_client.jobs.download(
    name=pipeline_submitted_job.name,
    download_path=automl_output_dir,
    output_name="forecast_output",
)

ml_client.jobs.download(
    name=pipeline_submitted_job_base.name,
    download_path=base_output_dir,
    output_name="forecast_output",
)

Next, we download the best AutoML model. It will be used for the batch deployment in section 10.

In [None]:
ml_client.jobs.download(
    name=pipeline_submitted_job.name,
    download_path=automl_output_dir,
    output_name="output_model",
)
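
Before moving on, it can help to confirm that the downloads produced the files the later cells read. The sketch below uses a hypothetical helper, `missing_artifacts`, and demonstrates it on a temporary folder. The relative paths are the ones used later in this notebook, and the folder layout is assumed to match them.

```python
import os
import tempfile


def missing_artifacts(root, relative_paths):
    """Return the relative paths that do not exist under `root` (hypothetical helper)."""
    return [p for p in relative_paths if not os.path.exists(os.path.join(root, p))]


expected = [
    os.path.join("named-outputs", "output_files", "evaluationResult", "metrics.json"),
    os.path.join("named-outputs", "forecast_output", "inference_output_file"),
]

# Demonstration on a temporary folder that contains only the metrics file.
with tempfile.TemporaryDirectory() as demo_root:
    metrics_dir = os.path.join(
        demo_root, "named-outputs", "output_files", "evaluationResult"
    )
    os.makedirs(metrics_dir)
    with open(os.path.join(metrics_dir, "metrics.json"), "w") as f:
        f.write("{}")
    missing = missing_artifacts(demo_root, expected)

print(missing)
```

In the notebook, the same check could be run with `automl_output_dir` or `base_output_dir` as `root`. An empty list means every expected file is present.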

## 9. Compare Evaluation Results

### 9.1. Examine Metrics

In this section, we compare metrics for the two pipeline runs to quantify the accuracy improvement of AutoML over the baseline model. First, we compare metrics that are calculated for the entire dataset. Since there are 10 unique time series in the test dataset, the individual metrics are aggregated into a single number. The non-normalized metrics can be misleading because the scale differs from one time series to another. The following [article (placeholder)](https://review.learn.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml?view=azureml-api-2&branch=pr-en-us-238443#forecasting-metrics-normalization-and-aggregation) explains this topic in greater detail.
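
To see why scale matters, the sketch below compares RMSE and a normalized RMSE on two toy series that differ only by a factor of 1000. It assumes normalization by the range of the actual values (maximum minus minimum); the helper names `rmse` and `nrmse` are illustrative.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def nrmse(actual, predicted):
    """RMSE divided by the range of the actuals (assumed normalization)."""
    actual = np.asarray(actual, dtype=float)
    return rmse(actual, predicted) / float(actual.max() - actual.min())


# Two series with the same relative error but very different scales.
small_actual = np.array([1.0, 2.0, 3.0, 4.0])
small_pred = small_actual + 0.3
large_actual = small_actual * 1000
large_pred = small_pred * 1000

print("RMSE :", rmse(small_actual, small_pred), rmse(large_actual, large_pred))
print("NRMSE:", nrmse(small_actual, small_pred), nrmse(large_actual, large_pred))
```

The raw RMSE values differ by three orders of magnitude, so an average of them would be dominated by the large series, while the normalized values are equal and can be aggregated meaningfully.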

The code in the next cell loads dataset metrics for each of the experiments.

In [None]:
metrics_artifacts_path = os.path.join(
    "named-outputs", "output_files", "evaluationResult"
)

with open(os.path.join(automl_output_dir, metrics_artifacts_path, "metrics.json")) as f:
    metrics_automl_series = json.load(f)

with open(os.path.join(base_output_dir, metrics_artifacts_path, "metrics.json")) as f:
    metrics_base_series = json.load(f)

Next, we combine the two sets of metrics to examine them side by side. The `metrics_all` data frame contains two columns, which correspond to the scores from the best AutoML model and the baseline model, respectively.

In [None]:
metrics_automl = (
    pd.Series(metrics_automl_series).to_frame(name="score").reset_index(drop=False)
)
metrics_base = (
    pd.Series(metrics_base_series).to_frame(name="score").reset_index(drop=False)
)
metrics_all = pd.DataFrame(
    [metrics_automl_series, metrics_base_series], index=["score_automl", "score_base"]
).T
metrics_all
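
One way to quantify the improvement is to add a relative difference column. The sketch below uses made-up scores in a frame shaped like `metrics_all`; the column name `improvement_pct` and the numbers are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical scores used only to illustrate the calculation.
demo_metrics = pd.DataFrame(
    {"score_automl": [0.08, 12.0], "score_base": [0.10, 16.0]},
    index=["normalized_root_mean_squared_error", "mean_absolute_error"],
)

# Percentage reduction of the error relative to the baseline.
demo_metrics["improvement_pct"] = (
    100
    * (demo_metrics["score_base"] - demo_metrics["score_automl"])
    / demo_metrics["score_base"]
)
print(demo_metrics.round(2))
```

This formula reads naturally for error metrics, where lower is better. For metrics where higher is better, such as `r2_score`, the sign of the result has the opposite interpretation.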

#### 9.1.1 Detailed Metrics

Aggregate metrics may not convey enough information to make a decision about accuracy, so it can be helpful to examine metrics at a more granular level. Next, we load the detailed accuracy metrics and extract them per time series. To do this, we create a helper function `extract_specific_metric`, which reads the JSON file and returns the specified metric for each time series. The file contains the metrics listed below; for illustration purposes, we focus on the normalized root mean squared error (NRMSE). <ul>
    <li> `explained_variance` </li>
    <li> `mean_absolute_error` </li>
    <li> `mean_absolute_percentage_error`</li>
    <li> `median_absolute_error`</li>
    <li> `normalized_median_absolute_error`</li>
    <li> `normalized_root_mean_squared_error`</li>
    <li> `normalized_root_mean_squared_log_error`</li>
    <li> `r2_score`</li>
    <li> `root_mean_squared_log_error`</li>
    <li> `root_mean_squared_error`</li>
</ul>

In [None]:
def extract_specific_metric(path, metric_name):
    with open(path) as f:
        artifact = json.load(f)
    all_metrics = pd.DataFrame(artifact["data"])
    index_scores = ["customer_id"] + [metric_name]
    return all_metrics[index_scores]
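
To illustrate how the helper behaves, the sketch below runs the same logic on a small made-up artifact. The layout of the file (a `"data"` key holding column-oriented values) and the customer IDs are assumptions; the real artifact may be organized differently.

```python
import json
import os
import tempfile

import pandas as pd


def extract_specific_metric(path, metric_name):
    # Same logic as the helper above, repeated so that the sketch runs on its own.
    with open(path) as f:
        artifact = json.load(f)
    all_metrics = pd.DataFrame(artifact["data"])
    return all_metrics[["customer_id", metric_name]]


# Made-up artifact with an assumed layout.
demo_artifact = {
    "data": {
        "customer_id": ["MT_001", "MT_002"],
        "normalized_root_mean_squared_error": [0.05, 0.11],
        "r2_score": [0.91, 0.78],
    }
}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_table")
    with open(demo_path, "w") as f:
        json.dump(demo_artifact, f)
    demo_scores = extract_specific_metric(
        demo_path, "normalized_root_mean_squared_error"
    )

print(demo_scores)
```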

In [None]:
metrics_table_relative_path = os.path.join(
    metrics_artifacts_path, "artifacts", "forecast_time_series_id_distribution_table"
)
automl_metric = extract_specific_metric(
    os.path.join(automl_output_dir, metrics_table_relative_path),
    "normalized_root_mean_squared_error",
)

base_metric = extract_specific_metric(
    os.path.join(base_output_dir, metrics_table_relative_path),
    "normalized_root_mean_squared_error",
)

In [None]:
# Join on the series identifier so that scores are matched by customer, not by row position
metrics_df = automl_metric.merge(
    base_metric,
    on="customer_id",
    how="inner",
    suffixes=["_automl", "_base"],
)
metrics_df
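
A per-series table like this can be summarized by counting how many series the AutoML model wins on. The sketch below uses made-up scores and hypothetical customer IDs; the column name `automl_wins` is an assumption. It also shows that joining on `customer_id` matches rows correctly even when the two tables are ordered differently.

```python
import pandas as pd

metric = "normalized_root_mean_squared_error"

# Made-up per-series scores; note the different row order in the two tables.
demo_automl = pd.DataFrame(
    {"customer_id": ["MT_001", "MT_002", "MT_003"], metric: [0.05, 0.11, 0.09]}
)
demo_base = pd.DataFrame(
    {"customer_id": ["MT_003", "MT_001", "MT_002"], metric: [0.12, 0.07, 0.10]}
)

demo_per_series = demo_automl.merge(
    demo_base, on="customer_id", how="inner", suffixes=("_automl", "_base")
)
# Lower NRMSE is better, so AutoML wins where its score is smaller.
demo_per_series["automl_wins"] = (
    demo_per_series[f"{metric}_automl"] < demo_per_series[f"{metric}_base"]
)
n_wins = int(demo_per_series["automl_wins"].sum())
print(demo_per_series)
print(f"AutoML is more accurate on {n_wins} of {len(demo_per_series)} series")
```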

### 9.2 Generate Time Series Plots

Here, we generate forecast-versus-actuals plots for the test set for both the best model and the baseline. Because we use rolling evaluation with a step size of 24 hours, this mimics putting both models in production and monitoring their behavior for the duration of the test set. This step helps you make informed decisions about model performance and can save the costs associated with productionizing the model and monitoring its performance in real life.
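
To make the rolling scheme concrete, the sketch below lists the forecast windows for a 72-hour test period with a 24-hour horizon and a 24-hour step. The function `rolling_windows` and the start date are illustrative assumptions, not part of the pipeline components.

```python
import pandas as pd


def rolling_windows(test_start, n_periods, horizon=24, step=24):
    """List (first_timestamp, last_timestamp) of each hourly forecast window (illustrative)."""
    start = pd.Timestamp(test_start)
    windows = []
    # Move the start of the window forward by `step` hours until the horizon no longer fits.
    for offset in range(0, n_periods - horizon + 1, step):
        first = start + pd.Timedelta(hours=offset)
        last = first + pd.Timedelta(hours=horizon - 1)
        windows.append((first, last))
    return windows


windows = rolling_windows("2014-09-01 00:00", n_periods=72)
for first, last in windows:
    print(first, "->", last)
```

Each window corresponds to one day-ahead forecast, so a 72-hour test period yields three consecutive, non-overlapping forecasts.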

In the next block of code, we load the test set output for each of the runs and merge the data. Then, we generate and save time series plots.

In [None]:
forecast_table_relative_path = os.path.join(
    "named-outputs", "forecast_output", "inference_output_file"
)

forecast_column_name = "automl_predictions"
base_forecast_column_name = "base_predictions"
actual_column_name = "automl_actuals"
forecast_origin_column_name = "automl_forecast_origin"

automl_fcst = pd.read_json(
    os.path.join(automl_output_dir, forecast_table_relative_path),
    orient="records",
    lines=True,
)
automl_fcst[forecast_origin_column_name] = pd.to_datetime(
    automl_fcst[forecast_origin_column_name], unit="ms"
)

base_fcst = pd.read_json(
    os.path.join(base_output_dir, forecast_table_relative_path),
    orient="records",
    lines=True,
)
# Convert the baseline's own forecast origin column (not the AutoML one)
base_fcst[forecast_origin_column_name] = pd.to_datetime(
    base_fcst[forecast_origin_column_name], unit="ms"
)

print(automl_fcst.head(3), "\n---")
print(base_fcst.head(3), "\n---")

In [None]:
forecast_table_relative_path = os.path.join(
    "named-outputs", "forecast_output", "inference_output_file"
)

forecast_column_name = "automl_prediction"
base_forecast_column_name = "base_prediction"
actual_column_name = "automl_actual"
forecast_origin_column_name = "automl_forecast_origin"

automl_fcst = pd.read_json(
    os.path.join(automl_output_dir, forecast_table_relative_path),
    orient="records",
    lines=True,
)
automl_fcst[forecast_origin_column_name] = pd.to_datetime(
    automl_fcst[forecast_origin_column_name], unit="ms"
)

base_fcst = pd.read_json(
    os.path.join(base_output_dir, forecast_table_relative_path),
    orient="records",
    lines=True,
)
# Convert the baseline's own forecast origin column (not the AutoML one)
base_fcst[forecast_origin_column_name] = pd.to_datetime(
    base_fcst[forecast_origin_column_name], unit="ms"
)

merge_columns = ["customer_id"] + [actual_column_name]
merge_columns.extend([time_column_name, forecast_origin_column_name])

backtest = automl_fcst.merge(
    base_fcst.rename(columns={forecast_column_name: base_forecast_column_name}),
    on=merge_columns,
    how="inner",
)

print(f"AutoML forecast table size: {automl_fcst.shape}\n---")
print(f"Base forecast table size: {base_fcst.shape}\n---")
print(f"Merged forecast table size: {backtest.shape}\n---")
backtest.head()

In [None]:
from scripts.helper_scripts import draw_one_plot
from matplotlib import pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

plot_filename = "forecast_vs_actual.pdf"

pdf = PdfPages(os.path.join(os.getcwd(), "./output", plot_filename))
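for _, one_forecast in backtest.groupby("customer_id"):
    # Work on a copy so the assignments below do not modify a slice of `backtest`
    one_forecast = one_forecast.copy()
    one_forecast[time_column_name] = pd.to_datetime(one_forecast[time_column_name])
    one_forecast.sort_values(time_column_name, inplace=True)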
    draw_one_plot(
        one_forecast,
        time_column_name,
        target_column_name,
        ["customer_id"],
        [actual_column_name, forecast_column_name, base_forecast_column_name],
        pdf,
        plot_predictions=True,
    )
pdf.close()

In [None]:
from IPython.display import IFrame

IFrame("./output/forecast_vs_actual.pdf", width=800, height=300)

## 10. Deployment

In this section, we illustrate how to perform batch inference using the inference component. Batch endpoints are used to run inference on large volumes of data asynchronously. They receive pointers to data, run jobs that process the data in parallel on compute clusters, and store the outputs in a datastore for further analysis. For more information on batch endpoints, see this [link](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-batch-scoring-pipeline?view=azureml-api-2&tabs=python).

### 10.1. Build the pipeline

First, we create a pipeline consisting of one step that invokes the inference component. This pipeline takes the inference data set as a parameter, generates a forecast, and returns the predictions. The pipeline has a single named output, `forecast`, which is a table with predictions stored in JSONL format. In the current setup, users can also generate a distribution forecast. To do this, uncomment the `forecast_quantiles` line in the pipeline definition and specify the desired quantiles as a string. In the code example below, the values 0.1 and 0.9 are entered as `"0.1,0.9"`.

In [None]:
# Define pipeline
@pipeline(
    description="AutoML Inferencing Pipeline",
)
def demand_inference_single_model(
    test_data: Input(type=AssetTypes.MLTABLE),
    model_path: Input(type=AssetTypes.MLFLOW_MODEL),
    target_column_name: Input(type="string"),
    forecast_mode: Input(type="string"),
):
    inference_node = inference_component(
        test_data=test_data,
        model_path=model_path,
        target_column_name=target_column_name,
        forecast_mode=forecast_mode,
        # forecast_quantiles="0.1,0.9"
    )
    return {
        "forecast": inference_node.outputs.inference_output_file,
    }
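
If the quantile levels are kept as numbers elsewhere in your code, a small helper can build the comma-separated string. The function `format_quantiles` below is hypothetical, and the check that each level lies strictly between 0 and 1 is an assumption about what counts as a sensible input, not a documented requirement of the inference component.

```python
def format_quantiles(quantiles):
    """Join quantile levels into a comma-separated string after validating them."""
    for q in quantiles:
        if not 0 < q < 1:
            raise ValueError(f"Quantile {q} must be strictly between 0 and 1.")
    # Sort so the string is stable regardless of the input order.
    return ",".join(str(q) for q in sorted(quantiles))


quantile_arg = format_quantiles([0.9, 0.1])
print(quantile_arg)
```

The resulting string has the same form as the commented `forecast_quantiles` value in the pipeline definition.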

### 10.2. Create Batch Endpoint

A batch endpoint's name needs to be unique in each region since the name is used to construct the invocation URI. To ensure uniqueness, append any trailing characters to the name specified in the following code.

In [None]:
# # TODO: Delete once components are in the public registry
# from azure.ai.ml.constants._common import AZUREML_PRIVATE_FEATURES_ENV_VAR

# os.environ[AZUREML_PRIVATE_FEATURES_ENV_VAR] = "False"

In [None]:
import random
import string

# Creating a unique endpoint name by including a random suffix
allowed_chars = string.ascii_lowercase + string.digits
endpoint_suffix = "".join(random.choice(allowed_chars) for x in range(5))
endpoint_name = "sdk-single-model-" + endpoint_suffix

print(f"Endpoint name: {endpoint_name}\n---")

In [None]:
endpoint = BatchEndpoint(
    name=endpoint_name,
    description="An endpoint for component deployments",
    properties={"ComponentDeployment.Enabled": True},
)

The following command creates the endpoint in the workspace using the `MLClient` created earlier. The `begin_create_or_update` call starts the endpoint creation and returns a poller; calling `.result()` on it waits until the creation completes.

In [None]:
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()

In [None]:
# # query the endpoint URI
# endpoint = ml_client.batch_endpoints.get(name=endpoint_name)
# print(endpoint)

### 10.2. Create the Deployment

A deployment is a set of resources required for hosting the model that does the actual inferencing. Our pipeline is defined in a function. To transform it to a component, you'll use the `build()` method. Pipeline components are reusable compute graphs that can be included in batch deployments or used to compose more complex pipelines.

In [None]:
pipeline_component = demand_inference_single_model._pipeline_builder.build()

Now we can define the deployment.

In [None]:
deployment = PipelineComponentBatchDeployment(
    name="sdk-single-model-deployment",
    description="A single model deployment.",
    endpoint_name=endpoint.name,
    component=pipeline_component,
    settings={"default_compute": amlcompute_cluster_name},
)

The following command creates the deployment in the workspace using the `MLClient` created earlier. The `begin_create_or_update` call starts the deployment creation and returns a poller; calling `.result()` on it waits until the creation completes.

In [None]:
ml_client.batch_deployments.begin_create_or_update(deployment).result()

### 10.3. Invoke the Endpoint

The next cell contains the command that invokes the endpoint for a batch inference job. The `invoke` method takes an `inputs` parameter, which contains the inputs necessary to execute the inference component on the endpoint. To convince yourself that this is the case, compare the input parameters of the `inference_component_from_registry` in section 6.3 with the `inputs` we are providing in the next cell. They are identical.

Notice that `forecast_mode` is set to `"recursive"`. In the evaluation pipeline, this component was used to generate a rolling forecast to evaluate model performance on the test set. For more details on rolling evaluation, see our [forecasting model evaluation article](placeholder). Here, we are using it to generate a forecast.

In [None]:
batch_job = ml_client.batch_endpoints.invoke(
    endpoint_name=endpoint.name,
    deployment_name=deployment.name,
    inputs={
        "test_data": Input(path=os.path.join(os.getcwd(), "data", "inference")),
        "model_path": Input(
            path=os.path.join(automl_output_dir, "named-outputs", "output_model")
        ),
        "target_column_name": Input(type="string", default=target_column_name),
        "forecast_mode": Input(type="string", default="recursive"),
    },
)

Next, we will stream the job output to monitor the execution.

In [None]:
job_name = batch_job.name
batch_job = ml_client.jobs.get(name=job_name)
print(f"Batch job status: {batch_job.status}\n---")
ml_client.jobs.stream(name=job_name)

### 10.4. Download Forecast Output

Finally, we download the forecast output and print the first few rows.

In [None]:
fcst_output_dir = os.path.join(os.getcwd(), "forecast")

for child in ml_client.jobs.list(parent_job_name=job_name):
    print(f"{child.name}\n---\nDownloading data ...\n---")
    ml_client.jobs.download(
        child.name,
        download_path=fcst_output_dir,
        output_name="inference_output_file",
    )

In [None]:
fcst_df = pd.read_json(
    os.path.join(
        fcst_output_dir,
        "named-outputs",
        "inference_output_file",
        "inference_output_file",
    ),
    orient="records",
    lines=True,
)
fcst_df[time_column_name] = pd.to_datetime(fcst_df[time_column_name], unit="ms")
fcst_df.head()
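
The reading pattern above can be tried without running a batch job. The sketch below parses two made-up JSON Lines records with epoch-millisecond timestamps; the column names `ts` and `prediction` and the values are assumptions, and `convert_dates=False` is passed so that the explicit `pd.to_datetime` call is the only place where the conversion happens.

```python
import io

import pandas as pd

# Two made-up records in JSON Lines format with epoch-millisecond timestamps.
demo_jsonl = (
    '{"customer_id": "MT_001", "ts": 1409529600000, "prediction": 41.2}\n'
    '{"customer_id": "MT_001", "ts": 1409533200000, "prediction": 39.8}\n'
)

demo_fcst = pd.read_json(
    io.StringIO(demo_jsonl), orient="records", lines=True, convert_dates=False
)
# Interpret the integers as milliseconds since the Unix epoch.
demo_fcst["ts"] = pd.to_datetime(demo_fcst["ts"], unit="ms")
print(demo_fcst)
```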

### 10.5. [Optional] Delete the Endpoint

In [None]:
# The endpoint created above is a batch endpoint, so delete it through `batch_endpoints`
ml_client.batch_endpoints.begin_delete(name=endpoint.name).wait()