Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Automated Machine Learning

## Demand Forecasting Using Many Models (preview)

> [!IMPORTANT]
> Items marked (preview) in this article are currently in public preview.
> The preview version is provided without a service level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
> For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).

## Contents
1. [Introduction](#Introduction)
1. [Setup](#Setup)
1. [Compute](#Compute)
1. [Data](#Data)
1. [Import Components From Registry](#ImportComponents)
1. [Create a Pipeline](#CreatePipeline)
1. [Kick Off Pipeline Runs](#PipelineRuns)
1. [Download Output](#DownloadOutput)
1. [Compare Evaluation Results](#CompareResults)
1. [Deployment](#Deployment)

## 1. Introduction  <a id="Introduction">

The objective of this notebook is to illustrate how to use the component-based AutoML many models solution for demand forecasting tasks. It walks you through all stages of the model evaluation and production process, starting with data ingestion and concluding with deployment to a batch endpoint for production.

We use a subset of the UCI electricity data ([link](https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014#)) with the objective of predicting electricity demand per consumer 24 hours ahead. The data was preprocessed using the [data prep notebook](https://github.com/Azure/azureml-examples/blob/main/v1/python-sdk/tutorials/automl-with-azureml/forecasting-data-preparation/auto-ml-forecasting-data-preparation.ipynb). Refer to it for an illustration of how to download the data from the source, aggregate it to an hourly frequency, convert it from wide to long format, and upload it to the Datastore. Here, we work with data that has already been preprocessed and saved to a public datastore in the parquet format.
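The wide-to-long conversion mentioned above can be illustrated with a small standard-library sketch. It assumes a wide table with one `datetime` column plus one column per consumer (the column names `MT_001` and `MT_002` and the sample values are hypothetical) and produces the long layout this notebook uses: one row per `customer_id` and timestamp, with the reading in `usage`. The data prep notebook may implement this step differently.

```python
def wide_to_long(wide_rows, time_col="datetime", id_col="customer_id", value_col="usage"):
    """Turn wide rows (one column per consumer) into long rows (one row per consumer and time)."""
    long_rows = []
    for row in wide_rows:
        for column, value in row.items():
            if column == time_col:
                continue  # the time column is copied into every long row instead
            long_rows.append({time_col: row[time_col], id_col: column, value_col: value})
    return long_rows


# Hypothetical wide data: two consumers, two hourly timestamps.
wide_rows = [
    {"datetime": "2014-01-01 00:00", "MT_001": 2.5, "MT_002": 10.0},
    {"datetime": "2014-01-01 01:00", "MT_001": 3.0, "MT_002": 12.5},
]
long_rows = wide_to_long(wide_rows)
for r in long_rows:
    print(r)
```

With pandas, the same reshaping is usually written as `pd.melt(df, id_vars=["datetime"], var_name="customer_id", value_name="usage")`.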

There are a number of steps you need to take before you can put a model into production. You need to prepare the data, partition it into appropriate sets, select the best model, evaluate it against a baseline, and monitor the model in real life to collect enough observations on how it would perform had it been put in production. Some of these steps are time-consuming, and some require expertise in writing code. The steps shown in this notebook follow the typical thought process that precedes putting a model into production.

Make sure you have executed the [configuration](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb) before running this notebook.

## 2. Setup  <a id="Setup">

In [None]:
# Import required libraries
import os
import datetime
import json
import yaml
import azure.ai.ml

import pandas as pd
from IPython.display import display

from time import sleep

from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

from azure.ai.ml import MLClient, Input, Output
from azure.ai.ml import load_component
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.dsl import pipeline
from azure.ai.ml.entities import (
    BatchEndpoint,
    BatchDeployment,
    AmlCompute,
    PipelineComponentBatchDeployment,
)

print(f"SDK version: {azure.ai.ml.__version__}")

### 2.1. Configure workspace details and get a handle to the workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

To connect to a workspace, we need identifier parameters: a subscription, a resource group, and a workspace name. We use these details in the `MLClient` from `azure.ai.ml` to get a handle to the required Azure Machine Learning workspace. We use [default Azure authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) for this tutorial. Check the [configuration notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb) for more details on how to configure credentials and connect to a workspace.

In [None]:
try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

In [None]:
try:
    ml_client = MLClient.from_config(credential)
except Exception as ex:
    print(ex)
    # Enter details of your AML workspace
    subscription_id = "<SUBSCRIPTION_ID>"
    resource_group = "<RESOURCE_GROUP>"
    workspace = "<AML_WORKSPACE_NAME>"
    ml_client = MLClient(credential, subscription_id, resource_group, workspace)
    print(ml_client)

### 2.2. Show Azure ML Workspace information

In [None]:
ws = ml_client.workspaces.get(name=ml_client.workspace_name)

output = {}
output["Workspace"] = ml_client.workspace_name
output["Subscription ID"] = ml_client.subscription_id
output["Resource Group"] = ws.resource_group
output["Location"] = ws.location
pd.DataFrame(data=output, index=[""]).T

## 3. Compute  <a id="Compute">

#### Create or Attach an Existing AmlCompute Cluster

You will need to create a compute target for your AutoML run. In this tutorial, you will create AmlCompute as your training compute resource.

> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.


Here, we use a 5-node cluster of `STANDARD_DS15_V2` VMs for illustration purposes. You will need to adjust the compute type and the number of nodes to your needs, which can be driven by the speed needed for model selection, data size, etc.

#### Creation of AmlCompute takes approximately 5 minutes. 
If the AmlCompute with that name is already in your workspace, this code will skip the creation process.
As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.

In [None]:
from azure.core.exceptions import ResourceNotFoundError

amlcompute_cluster_name = "demand-fcst-mm-cluster"

try:
    # Retrieve an already attached Azure Machine Learning Compute.
    compute_target = ml_client.compute.get(amlcompute_cluster_name)
except ResourceNotFoundError as e:
    compute_target = AmlCompute(
        name=amlcompute_cluster_name,
        size="STANDARD_DS15_V2",
        type="amlcompute",
        min_instances=0,
        max_instances=5,
        idle_time_before_scale_down=600,
    )
    poller = ml_client.begin_create_or_update(compute_target)
    poller.wait()

## 4. Data  <a id="Data">

For illustration purposes we use the UCI electricity data ([link](https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014#)). The original dataset contains electricity consumption data for 370 consumers measured at 15-minute intervals. For this demonstration, the data has been aggregated to an hourly frequency, converted to kilowatt-hours (kWh), and limited to 10 customers.
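The aggregation step can be sketched as follows. Per the UCI dataset description, the raw values are in kW for each 15-minute interval, so each reading contributes kW / 4 kWh, and summing the four readings in an hour gives hourly kWh. This sketch assumes each timestamp marks the start of its 15-minute interval and uses made-up readings; the helper `quarter_hour_kw_to_hourly_kwh` is hypothetical and is not the data prep notebook's code.

```python
from collections import defaultdict
from datetime import datetime


def quarter_hour_kw_to_hourly_kwh(readings):
    """Aggregate (timestamp, kW) readings taken every 15 minutes into hourly kWh."""
    hourly = defaultdict(float)
    for ts, kw in readings:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # bucket by start of hour
        hourly[hour] += kw / 4.0  # kW held for a quarter hour = kW / 4 kWh
    return dict(sorted(hourly.items()))


readings = [
    (datetime(2014, 1, 1, 0, 0), 4.0),
    (datetime(2014, 1, 1, 0, 15), 8.0),
    (datetime(2014, 1, 1, 0, 30), 4.0),
    (datetime(2014, 1, 1, 0, 45), 8.0),
    (datetime(2014, 1, 1, 1, 0), 2.0),  # only one reading so far for the 01:00 hour
]
hourly_kwh = quarter_hour_kw_to_hourly_kwh(readings)
print(hourly_kwh)
```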

The data for this notebook is located in the `automl-sample-notebook-data` container in the datastore and is publicly available. In the next few cells, we will download the train, test and inference datasets from the public datastore and store them locally in the _parquet_ format.

In [None]:
train_data_path = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/uci-demand-pipeline-data-mm/train/uci_electro_small_mm_train.parquet"
test_data_path = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/uci-demand-pipeline-data-mm/test/uci_electro_small_mm_test.parquet"
inference_data_path = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/uci-demand-pipeline-data-mm/inference/uci_electro_small_mm_inference.parquet"

In [None]:
time_column_name = "datetime"
target_column_name = "usage"
time_series_id_column_names = ["customer_id"]

In [None]:
def create_folder_and_save_as_parquet(file_uri, output_folder):
    os.makedirs(output_folder, exist_ok=True)
    data_frame = pd.read_parquet(file_uri)
    file_name = os.path.split(file_uri)[-1]
    data_path = os.path.join(output_folder, file_name)
    data_frame.to_parquet(data_path, index=False)
    return None


create_folder_and_save_as_parquet(train_data_path, "./data/train")
create_folder_and_save_as_parquet(test_data_path, "./data/test")
create_folder_and_save_as_parquet(inference_data_path, "./data/inference")

The following cells read and print the first few rows of the training data as well as the number of unique time series in the dataset.

In [None]:
df = pd.read_parquet("./data/train/")
df.head(3)

In [None]:
nseries = df.groupby(time_series_id_column_names).ngroups
print(f"Data contains {nseries} individual time series\n---")

## 5. Import Components From Registry  <a id="ImportComponents">

An Azure Machine Learning component is a self-contained piece of code that does one step in a machine learning pipeline. A component is analogous to a function - it has a name, inputs, outputs, and a body. Components are the building blocks of the Azure Machine Learning pipelines. It's a good engineering practice to build a machine learning pipeline where each step has well-defined inputs and outputs. In Azure Machine Learning, a component represents one reusable step in a pipeline. Components are designed to help improve the productivity of pipeline building. Specifically, components offer:
- Well-defined interface: Components require a well-defined interface (input and output). The interface allows the user to build steps and connect steps easily. The interface also hides the complex logic of a step and removes the burden of understanding how the step is implemented.

- Share and reuse: As the building blocks of a pipeline, components can be easily shared and reused across pipelines, workspaces, and subscriptions. Components built by one team can be discovered and used by another team.

- Version control: Components are versioned. The component producers can keep improving components and publish new versions. Consumers can use specific component versions in their pipelines. This gives them compatibility and reproducibility.

For more detailed information on this subject, refer to this [article](https://learn.microsoft.com/en-us/azure/machine-learning/concept-component?view=azureml-api-2).

To import components, we need to get the registry. The following command obtains the public registry from which we will import components for our experiment.

In [None]:
# get registry for all components
ml_client_registry = MLClient(credential=credential, registry_name="azureml")
print(ml_client_registry)
print("---")

Next, we pull specific components and use them to build a pipeline of steps. To illustrate the evaluation workflow, we use the following components:
- Data partitioning component: partitions the data for many models runs, both for training and for inference.
- Many models training component: trains the best model for each partition specified by the user.
- Many models inference component: generates forecasts for each partition. This can be done on the test and inference sets.
- Compute metrics component: calculates metrics per time series when the inference component was run on a test set.

In [None]:
partition_component_from_registry = ml_client_registry.components.get(
    name="automl_tabular_data_partitioning", label="latest"
)
print(
    f"Data partitioning component version: {partition_component_from_registry.version}\n---"
)

In [None]:
train_component_from_registry = ml_client_registry.components.get(
    name="automl_many_models_training",
    label="latest",
)
print(
    f"Many models training component version: {train_component_from_registry.version}\n---"
)

In [None]:
inference_component_from_registry = ml_client_registry.components.get(
    name="automl_many_models_inference", label="latest"
)
print(
    f"Many models inference component version: {inference_component_from_registry.version}\n---"
)

In [None]:
compute_metrics_component = ml_client_registry.components.get(
    name="compute_metrics",
    version="0.0.10",  # pinned version; replace with label="latest" to use the newest release
)
print(
    f"Compute metrics component version: {compute_metrics_component.version}\n---"
)

## 6. Create a Pipeline  <a id="CreatePipeline">

Now that we have imported the components, we will build an evaluation pipeline. This pipeline allows us to partition the data, train the best model for each partition, generate rolling forecasts on the test set, and, finally, calculate metrics on the test set output.

### 6.1. Create a YAML Settings File

AzureML components can only receive specific input types, such as strings, JSON/YAML files, URI folders, and URI files. Other object types are not accepted. Because of this, the AutoML settings need to be passed to the training component as a YAML file.

The following are the bare-minimum parameters needed to successfully train many models. For finer control of the experiment, you may add other parameters to the config file. See the [forecast settings API doc](https://learn.microsoft.com/en-us/python/api/azure-ai-ml/azure.ai.ml.automl.forecastingjob#azure-ai-ml-automl-forecastingjob-set-forecast-settings) for a complete list of available parameters.

|Property|Description|
|-|-|
| **task**                           | forecasting |
| **primary_metric**                 | This is the metric that you want to optimize. Forecasting supports the following primary metrics:<ul><li>`normalized_root_mean_squared_error`</li><li>`normalized_mean_absolute_error`</li><li>`spearman_correlation`</li><li>`r2_score`</li></ul> We recommend using either the normalized root mean squared error or the normalized mean absolute error as the primary metric because they measure forecast accuracy. See this [article](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-automl-forecasting-faq#how-do-i-choose-the-primary-metric) for a more detailed discussion of this topic. |
| **forecast_horizon**       | The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly). |
| **label_column_name**      | The name of the target column we are trying to predict. |
| **time_column_name**       | The name of your time column. |
| **time_series_id_column_names** | The column names used to uniquely identify timeseries in data that has multiple rows with the same timestamp. |
| **enable_early_stopping**  | Flag to enable early termination if the primary metric is no longer improving. |
| **partition_column_names** | The names of columns used to group your models. For timeseries, the groups must not split up individual time-series. That is, each group must contain one or more whole time-series. |
| **allow_multi_partitions** | A flag that allows users to train one model per partition when each partition contains more than one unique time series. The default value is `False`. |
| **track_child_runs**       | Flag to disable tracking of child runs. Only the best run is tracked if the flag is set to `False` (this includes the model and metrics of the run). |
| **max_trials** | The maximum number of trials an Automated ML job can run, each trying a training algorithm with a different combination of hyperparameters. The default value is 1000. If `enable_early_stopping` is enabled, the number of trials actually run can be smaller. |
| **timeout_minutes** | Maximum amount of time in minutes that the whole AutoML job can take before it terminates. This timeout includes setup, featurization, and training runs, but does not include the ensembling and model explainability runs at the end of the process, since those actions happen only after all the trials (child jobs) are done. If not specified, the default total job timeout is 6 days (8,640 minutes). To specify a timeout less than or equal to 1 hour (60 minutes), make sure your dataset's size is not greater than 10,000,000 (rows times columns), or an error results. |
| **trial_timeout_minutes**  | Maximum time in minutes that each trial (child job) can run for before it terminates. If not specified, a value of 1 month or 43200 minutes is used. |
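The partitioning rules in the table (each partition must contain whole time series, and `allow_multi_partitions` governs partitions holding more than one series) can be checked locally before submitting a job. The helper `find_partition_problems` below is hypothetical and reflects one reading of those rules; the many models components may enforce them differently.

```python
from collections import defaultdict


def find_partition_problems(rows, ts_id_cols, partition_cols, allow_multi_partitions=False):
    """Return human-readable problems with a partitioning (an empty list means none).

    rows: iterable of dicts, one per observation.
    """
    partitions_per_series = defaultdict(set)
    series_per_partition = defaultdict(set)
    for row in rows:
        series_key = tuple(row[c] for c in ts_id_cols)
        partition_key = tuple(row[c] for c in partition_cols)
        partitions_per_series[series_key].add(partition_key)
        series_per_partition[partition_key].add(series_key)

    problems = []
    # Rule 1: a single time series must never be split across partitions.
    for series_key, partitions in partitions_per_series.items():
        if len(partitions) > 1:
            problems.append(f"time series {series_key} is split across {len(partitions)} partitions")
    # Rule 2 (assumed reading): without allow_multi_partitions, one series per partition.
    if not allow_multi_partitions:
        for partition_key, series in series_per_partition.items():
            if len(series) > 1:
                problems.append(f"partition {partition_key} contains {len(series)} time series")
    return problems


rows = [
    {"customer_id": "MT_001", "region": "north", "datetime": "2014-01-01 00:00"},
    {"customer_id": "MT_001", "region": "north", "datetime": "2014-01-01 01:00"},
    {"customer_id": "MT_002", "region": "north", "datetime": "2014-01-01 00:00"},
]
print(find_partition_problems(rows, ["customer_id"], ["customer_id"]))  # []
print(find_partition_problems(rows, ["customer_id"], ["region"]))
```

Partitioning by `customer_id`, as this notebook does, gives exactly one whole series per partition, so no problems are reported.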

In [None]:
max_horizon = 24
partition_column_names = ["customer_id"]
allow_multi_partitions = False

In [None]:
# Required parameters
automl_settings = dict(
    task="forecasting",
    primary_metric="normalized_root_mean_squared_error",
    debug_log="debug.txt",
    label_column_name=target_column_name,
    time_column_name=time_column_name,
    forecast_horizon=max_horizon,
    time_series_id_column_names=time_series_id_column_names,
    partition_column_names=partition_column_names,
    max_trials=25,
    timeout_minutes=60,
    trial_timeout_minutes=5,
    n_cross_validations=2,
    forecast_step=max_horizon,
    track_child_runs=False,
    enable_early_stopping=True,
    allow_multi_partitions=allow_multi_partitions,
)
pd.DataFrame(data=automl_settings, index=[""]).T

Next, we save these settings as the `automl_settings.yml` file.

In [None]:
with open("automl_settings.yml", "w") as file:
    yaml.dump(automl_settings, file)

### 6.2. Provide additional pipeline parameters

The next set of parameters is necessary to build the pipeline of components. These parameters are specific to the many models training and/or inference components. Since both of these components rely on the parallel run step (PRS) to train or run inference on multiple models at once, you need to determine the appropriate number of processes and nodes for your use case. Set `max_concurrency_per_node` based on the number of cores of the compute VM. `max_nodes` determines the number of nodes to use; increasing the node count can speed up the training process.
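The sizing guidance above, together with the 2:1 cores-to-processes recommendation in the table below, reduces to simple arithmetic. Both helpers here are hypothetical: the first applies the 2:1 ratio, and the second estimates how many rounds of concurrent training are needed if every partition takes roughly the same time, which the actual parallel scheduler does not guarantee. The example assumes a `STANDARD_DS15_V2` node has 20 vCPUs.

```python
import math


def recommended_max_concurrency(cores_per_node, cores_per_process=2):
    """Processes per node under a cores-to-processes ratio (2:1 by default)."""
    return max(1, cores_per_node // cores_per_process)


def training_rounds(n_partitions, max_nodes, max_concurrency_per_node):
    """Rounds of concurrent training needed if all partitions take about the same time."""
    slots = max_nodes * max_concurrency_per_node  # partitions that can train at once
    return math.ceil(n_partitions / slots)


concurrency = recommended_max_concurrency(20)  # assumed vCPU count of STANDARD_DS15_V2
print(concurrency)  # 10
print(training_rounds(n_partitions=10, max_nodes=1, max_concurrency_per_node=concurrency))  # 1
```

With 10 customers partitioned by `customer_id`, this is consistent with the `max_nodes=1` and `max_concurrency_per_node=10` values set in the pipeline parameters cell.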

|Property|Description|
|:-|:-|
| **max_nodes**                     | The number of compute nodes in the cluster to be used for the training and inference steps. We recommend starting with 3 and increasing the node count if training takes too long. |
| **max_concurrency_per_node**   | Number of processes per node. We recommend a 2:1 ratio of cores to processes per node. For example, if a node has 16 cores, configure 8 **or fewer** processes per node for optimal performance. |
|**retrain_failed_model**| If training a model for any partition fails, should AutoML kick off a new child run for that partition? Possible values are `True` or `False`.|
|**forecast_mode**| Type of forecast to perform on the test set. Can be `recursive` or `rolling`. A rolling forecast can be used for evaluation purposes. |
|**forecast_step**| The forecast step used for rolling forecast. See this [link](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-forecast?view=azureml-api-2#evaluating-model-accuracy-with-a-rolling-forecast) for more details on this parameter. |
| **parallel_step_timeout_in_seconds**         | Maximum amount of time in seconds that the parallel run step is allowed to run. This is optional but gives you greater control over exit criteria. This must be greater than the AutoML job timeout (`experiment_timeout_hours` in the v1 SDK; `timeout_minutes`, converted to seconds, in the settings above) by at least 300 seconds. |
|**partition_column_names**| The names of columns used to group your models. For timeseries, the groups must not split up individual time-series. That is, each group must contain one or more whole time-series. This parameter is identical to the one in the AutoML settings (`automl_settings`).|
|**compute_name**| Name of the compute to execute the pipeline on. |
|**enable_event_logger**| Set this value to `True` to enable event logger. |
| **input_type**               | Type of file format for the input data. Supported options are `csv` and `parquet`. |
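To make `forecast_mode="rolling"` and `forecast_step` concrete, here is an index-based sketch of how a rolling forecast walks through a test set: each forecast origin advances by `forecast_step` periods and covers up to `forecast_horizon` periods, with actuals before the origin assumed to be known. The helper `rolling_windows` is hypothetical and is not the inference component's implementation; in particular, how the real component handles a final partial window is an assumption here.

```python
def rolling_windows(n_test_periods, horizon, step):
    """Return (start, end) index pairs, end exclusive, one per forecast origin."""
    if horizon < 1 or step < 1:
        raise ValueError("horizon and step must be positive")
    windows = []
    origin = 0
    while origin < n_test_periods:
        # Truncate the last window if it would run past the end of the test set.
        windows.append((origin, min(origin + horizon, n_test_periods)))
        origin += step
    return windows


# Three days of hourly test data, 24-hour horizon, new forecast every 24 hours.
print(rolling_windows(72, horizon=24, step=24))  # [(0, 24), (24, 48), (48, 72)]
# A 12-hour step makes consecutive windows overlap.
print(rolling_windows(72, horizon=24, step=12)[:3])  # [(0, 24), (12, 36), (24, 48)]
```

With `forecast_step` equal to the horizon, as in this notebook, the windows do not overlap and every test hour is forecast exactly once.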

In [None]:
# Pipeline parameters
pipeline_parameters = dict(
    max_nodes=1,
    max_concurrency_per_node=10,
    retrain_failed_model=True,
    forecast_mode="rolling",
    forecast_step=max_horizon,
    parallel_step_timeout_in_seconds=3700,
    partition_column_names=partition_column_names,
    compute_name=amlcompute_cluster_name,
    enable_event_logger=True,
    input_type="parquet",
)
print("Pipeline parameters\n---")
display(pd.DataFrame(data=pipeline_parameters, index=[""]).T)

The data partitioning component partitions large datasets and should be used when the data is too large to be partitioned on a single node. If your dataset is small relative to the RAM of a single node in your cluster, you most likely do not need this component, since both the training and inference components partition the data as part of their internal workflow. The partitioning inside these components is done on a single node of the cluster the pipeline runs on.

If the data is too big to be handled internally (roughly, more than 2 GB of data on a compute node with 28 GB of RAM or less), you may want to use the partitioning component, which uses a Spark cluster for the job. Since the data we are working with is not big, we do not need the partitioning component. However, this notebook is written to handle both scenarios. If you choose to run a Spark job, you need to pass a separate set of parameters to the pipeline builder, which must include the following:

|Property|Description|
|:-|:-|
| **instance_type**            | A key that defines the compute instance type to be used for the serverless Spark compute. The following instance types are currently supported:<ul><li>`Standard_E4S_V3`</li><li>`Standard_E8S_V3`</li><li>`Standard_E16S_V3`</li><li>`Standard_E32S_V3`</li><li>`Standard_E64S_V3`</li></ul>|
| **runtime_version**          | A key that defines the Spark runtime version. The following Spark runtime versions are currently supported:<ul><li>`3.1.0`</li><li>`3.2.0`</li></ul> |
| **driver_cores**       | The number of cores allocated for the Spark driver. |
| **driver_memory**      | The memory allocated for the Spark driver, with a size unit suffix `k`, `m`, `g` or `t` (for example, `512m`, `2g`). |
| **executor_cores**     | The number of cores allocated for each Spark executor. |
| **executor_memory**    | The memory allocated for each Spark executor, with a size unit suffix `k`, `m`, `g` or `t` (for example, `512m`, `2g`). |
| **executor_instances** | The number of Spark executor instances. |

All of these parameters are described in [this document](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-submit-spark-jobs?view=azureml-api-2&tabs=sdk). In this notebook we use serverless Spark compute, so an attached Synapse Spark pool is not required.
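The memory settings in the table are plain strings, so a typo only surfaces when the Spark job starts. A small local check can catch it earlier. The hypothetical `spark_memory_to_bytes` below accepts only the four single-letter suffixes listed in the table and treats them as binary multiples (1g = 1024 MiB), as Spark does; Spark itself also accepts a few other forms (such as `2gb`), which this stricter sketch rejects.

```python
_SUFFIX_BYTES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def spark_memory_to_bytes(value):
    """Convert a Spark memory string such as '512m' or '2g' to bytes."""
    text = str(value).strip().lower()
    number, suffix = text[:-1], text[-1:]
    if not number.isdigit() or suffix not in _SUFFIX_BYTES:
        raise ValueError(f"expected an integer followed by k, m, g or t, got {value!r}")
    return int(number) * _SUFFIX_BYTES[suffix]


# Total executor memory for settings like those used in this notebook:
# executor_instances * executor_memory.
executor_total = 2 * spark_memory_to_bytes("2g")
print(executor_total / 1024 ** 3)  # 4.0
```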

To control whether the pipeline uses the partitioning component, we use the `USE_PARTITIONING_COMPONENT` parameter. When it is set to `True`, the partitioning component is added to the pipeline in section 6.3. Since the dataset we are working with is small, there is no need for this component, so we set the parameter value to `False` in the next cell.

In [None]:
USE_PARTITIONING_COMPONENT = False

In [None]:
from IPython.display import display

# Spark parameters (optional)
if USE_PARTITIONING_COMPONENT:
    spark_parameters = dict(
        instance_type="Standard_E4S_V3",
        runtime_version="3.2.0",
        driver_cores=1,
        driver_memory="2g",
        executor_cores=2,
        executor_memory="2g",
        executor_instances=2,
    )
    print("Spark parameters\n---")
    display(pd.DataFrame(data=spark_parameters, index=[""]).T)

### 6.3. Build a Pipeline

Next, we build a pipeline from the imported components. Since this notebook is designed to illustrate the evaluation flow, we chain these components as follows. First, we train the best model for each partition. Then, we generate a rolling forecast on the test set with a step size of 24 (hours). This mimics the evaluation process in which a customer tracks the model's performance in real time and generates forecasts every 24 hours. Finally, we compute metrics based on the rolling forecast output from the previous step. You do not have to modify anything in the next cell.

In [None]:
@pipeline(description="AutoML Forecasting Many Models Evaluation Pipeline")
def evaluation_pipeline(raw_data, inference_data, automl_config):
    # 0. Extract pipeline parameters from the dictionary
    partition_column_names = " ".join(pipeline_parameters.get("partition_column_names"))
    compute_name = pipeline_parameters.get("compute_name")
    max_concurrency_per_node = pipeline_parameters.get("max_concurrency_per_node")
    parallel_step_timeout_in_seconds = pipeline_parameters.get(
        "parallel_step_timeout_in_seconds", 3700
    )
    max_nodes = pipeline_parameters.get("max_nodes", 1)
    enable_event_logger = pipeline_parameters.get("enable_event_logger", True)
    retrain_failed_model = pipeline_parameters.get("retrain_failed_model", True)
    forecast_mode = pipeline_parameters.get("forecast_mode", "recursive")
    forecast_step = pipeline_parameters.get("forecast_step", 1)
    input_type = pipeline_parameters.get("input_type", "csv")

    if USE_PARTITIONING_COMPONENT:
        # 1. Data partitioning step
        partition_step = partition_component_from_registry(
            raw_data=raw_data,
            partition_column_names=partition_column_names,
            input_type=pipeline_parameters.get("input_type", "csv"),
        )

        partition_step.resources = {
            "instance_type": spark_parameters.get("instance_type", "Standard_E4S_V3"),
            "runtime_version": str(spark_parameters.get("runtime_version", "3.2.0")),
        }
        partition_step.conf = {
            "spark.driver.cores": spark_parameters.get("driver_cores", 1),
            "spark.driver.memory": str(spark_parameters.get("driver_memory", "2g")),
            "spark.executor.cores": spark_parameters.get("executor_cores", 2),
            "spark.executor.memory": str(spark_parameters.get("executor_memory", "2g")),
            "spark.executor.instances": spark_parameters.get("executor_instances", 2),
        }
        partition_step.outputs.partitioned_data.mode = "direct"

        # 2. Model training step
        training_node = train_component_from_registry(
            raw_data=partition_step.outputs.partitioned_data,
            automl_config=automl_config,
            max_concurrency_per_node=max_concurrency_per_node,
            parallel_step_timeout_in_seconds=parallel_step_timeout_in_seconds,
            max_nodes=max_nodes,
            retrain_failed_model=retrain_failed_model,
            compute_name=compute_name,
        )
    else:
        # 2. Model training step
        training_node = train_component_from_registry(
            raw_data=raw_data,
            automl_config=automl_config,
            max_concurrency_per_node=max_concurrency_per_node,
            parallel_step_timeout_in_seconds=parallel_step_timeout_in_seconds,
            max_nodes=max_nodes,
            retrain_failed_model=retrain_failed_model,
            compute_name=compute_name,
        )

    # 3. Inferencing step
    inference_node = inference_component_from_registry(
        raw_data=inference_data,
        max_nodes=max_nodes,
        max_concurrency_per_node=max_concurrency_per_node,
        parallel_step_timeout_in_seconds=parallel_step_timeout_in_seconds,
        optional_train_metadata=training_node.outputs.run_output,
        forecast_mode=forecast_mode,
        forecast_step=forecast_step,
        compute_name=compute_name,
    )

    # 4. Metrics calculation step
    compute_metrics_node = compute_metrics_component(
        task="tabular-forecasting",
        prediction=inference_node.outputs.evaluation_data,
        ground_truth=inference_node.outputs.evaluation_data,
        evaluation_config=inference_node.outputs.evaluation_configs,
    )
    compute_metrics_node.compute = compute_name

    # 5. Specify pipeline outputs
    return {
        "output_files": compute_metrics_node.outputs.evaluation_result,
        "forecast_output": inference_node.outputs.raw_predictions,
    }

## 7. Kick Off Pipeline Runs  <a id="PipelineRuns">

Now that the pipeline is defined, we use it to kick off two runs. First, we kick off an experiment that trains the best AutoML model for each partition, generates forecasts with it, and evaluates its performance. Next, we kick off the same pipeline restricted to the naive model for the same partitions. This establishes a baseline against which we can compare performance.
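For intuition about the baseline: a naive (persistence) forecast repeats the last observed value of a series across the whole horizon. The sketch below assumes that is how AutoML's `Naive` model behaves for each time series; the helper name `naive_forecast` and the sample values are hypothetical.

```python
def naive_forecast(history, horizon):
    """Forecast `horizon` steps ahead by repeating the last observed value."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


hourly_usage = [1.2, 1.5, 1.1, 1.8]  # hypothetical recent kWh readings for one customer
print(naive_forecast(hourly_usage, horizon=3))  # [1.8, 1.8, 1.8]
```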

### 7.1. Kick Off Best Many Model Pipeline

In [None]:
pipeline_job = evaluation_pipeline(
    raw_data=Input(type=AssetTypes.URI_FOLDER, path="./data/train"),
    inference_data=Input(type=AssetTypes.URI_FOLDER, path="./data/test"),
    automl_config=Input(type=AssetTypes.URI_FILE, path="./automl_settings.yml"),
)
if not USE_PARTITIONING_COMPONENT:
    pipeline_job.settings.default_compute = amlcompute_cluster_name
print(pipeline_job)

In [None]:
evaluation_experiment_name = "mm-experiment-" + datetime.datetime.now().strftime(
    "%Y%m%d"
)

pipeline_submitted_job = ml_client.jobs.create_or_update(
    pipeline_job,
    experiment_name=evaluation_experiment_name,
    skip_validation=True,
)
ml_client.jobs.stream(pipeline_submitted_job.name)

In [None]:
# To rehydrate run
# RUN_ID = "<Paste the PipelineRunId from the output of the previous cell.>"
# pipeline_submitted_job = ml_client.jobs.get(RUN_ID)
# pipeline_submitted_job

### 7.2. Kick Off the Baseline Experiment

To establish a baseline, we use the same pipeline as before with one minor change: we restrict the allowed training algorithms to the Naive model and set the number of rolling origin cross-validations (ROCV) to 2. Cross-validation is needed mainly for model selection, so a small number of folds keeps the runtime short when there is only one candidate model.
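Rolling origin cross-validation can be pictured as carving validation windows of length `forecast_horizon` off the end of each training series, moving the origin back for each additional fold. The index-based sketch below (the helper `rolling_origin_folds` is hypothetical) shows why fewer folds means fewer model fits; AutoML's exact fold placement, including the step between origins, may differ.

```python
def rolling_origin_folds(n_periods, horizon, n_cross_validations, step=None):
    """Return (train_end, valid_end) index pairs, oldest fold first.

    Each fold trains on periods [0, train_end) and validates on [train_end, valid_end).
    `step` is the distance between consecutive origins (defaults to the horizon).
    """
    step = horizon if step is None else step
    folds = []
    for k in range(n_cross_validations):
        valid_end = n_periods - k * step  # fold k ends k steps before the series end
        train_end = valid_end - horizon
        if train_end <= 0:
            raise ValueError("series is too short for the requested folds")
        folds.append((train_end, valid_end))
    return folds[::-1]


# 100 hourly observations, 24-hour horizon, 2 folds as in the settings above.
print(rolling_origin_folds(100, horizon=24, n_cross_validations=2))  # [(52, 76), (76, 100)]
```

Each candidate model is fit once per fold, so the cost of cross-validation grows with the number of folds times the number of candidate models.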

In [None]:
baseline_settings = automl_settings.copy()
baseline_settings.update(
    {"allowed_training_algorithms": ["Naive"], "n_cross_validations": 2}
)

Similarly to what we did in section 6.1, we save the baseline experiment settings as a YAML file.

In [None]:
with open("automl_settings_base.yml", "w") as file:
    yaml.dump(baseline_settings, file)

In [None]:
pipeline_job_base = evaluation_pipeline(
    raw_data=Input(type=AssetTypes.URI_FOLDER, path="./data/train"),
    inference_data=Input(type=AssetTypes.URI_FOLDER, path="./data/test"),
    automl_config=Input(type=AssetTypes.URI_FILE, path="./automl_settings_base.yml"),
)
if not USE_PARTITIONING_COMPONENT:
    pipeline_job_base.settings.default_compute = amlcompute_cluster_name
print(pipeline_job_base)

In [None]:
base_experiment_name = "mm-experiment-base-" + datetime.datetime.now().strftime(
    "%Y%m%d"
)

pipeline_submitted_job_base = ml_client.jobs.create_or_update(
    pipeline_job_base,
    experiment_name=base_experiment_name,
    skip_validation=True,
)
ml_client.jobs.stream(pipeline_submitted_job_base.name)

In [None]:
# To rehydrate baseline run
# RUN_ID = "<Paste the PipelineRunId from the output of the previous cell.>"
# pipeline_submitted_job_base = ml_client.jobs.get(RUN_ID)
# pipeline_submitted_job_base

## 8. Download Pipeline Output  <a id="DownloadOutput">
Next, we download the output files generated by the compute metrics component, as well as the forecast output, for each executed pipeline and save them in the corresponding subfolder of the `output` folder. First, we create the output directories. Then, we call `ml_client.jobs.download`, which downloads the experiments' outputs.

In [None]:
# create output directories
mm_output_dir = os.path.join(os.getcwd(), "output/many-models")
base_output_dir = os.path.join(os.getcwd(), "output/base")

os.makedirs(mm_output_dir, exist_ok=True)
os.makedirs(base_output_dir, exist_ok=True)

In [None]:
ml_client.jobs.download(
    pipeline_submitted_job.name, download_path=mm_output_dir, output_name="output_files"
)
ml_client.jobs.download(
    pipeline_submitted_job_base.name,
    download_path=base_output_dir,
    output_name="output_files",
)

In [None]:
ml_client.jobs.download(
    pipeline_submitted_job.name,
    download_path=mm_output_dir,
    output_name="forecast_output",
)

ml_client.jobs.download(
    pipeline_submitted_job_base.name,
    download_path=base_output_dir,
    output_name="forecast_output",
)
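Once the downloads finish, a quick check that the expected files are in place can save confusing errors in the cells that follow. The sketch below lists the relative paths this notebook reads later (taken from the code in sections 9.1 and 9.2); the helper name `missing_outputs` is hypothetical, and it only uses the standard library.

```python
import tempfile
from pathlib import Path

# Relative paths that later cells in this notebook read from each output directory
EXPECTED_OUTPUTS = (
    Path("named-outputs", "output_files", "evaluationResult", "metrics.json"),
    Path(
        "named-outputs",
        "output_files",
        "evaluationResult",
        "artifacts",
        "forecast_time_series_id_distribution_table",
    ),
    Path("named-outputs", "forecast_output"),
)


def missing_outputs(output_dir):
    """Return the expected relative paths that do not exist under `output_dir`."""
    root = Path(output_dir)
    return [str(rel) for rel in EXPECTED_OUTPUTS if not (root / rel).exists()]


# Demo on a temporary directory that only contains the forecast output folder
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "named-outputs" / "forecast_output").mkdir(parents=True)
    demo_missing = missing_outputs(tmp)
print(demo_missing)

# In the notebook, you could run:
# for d in (mm_output_dir, base_output_dir):
#     print(d, missing_outputs(d) or "all expected outputs present")
```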

## 9. Compare Evaluation Results  <a id="CompareResults">

### 9.1. Examine Metrics

In this section, we compare the metrics of the two pipeline runs to quantify the accuracy improvement of AutoML over the baseline model. First, we compare metrics calculated over the entire dataset. Since the test dataset contains 10 unique time series, the per-series metrics are aggregated into a single number. Non-normalized metrics can be misleading here because the individual time series have different scales. The following [article (placeholder)](https://review.learn.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml?view=azureml-api-2&branch=pr-en-us-238443#forecasting-metrics-normalization-and-aggregation) explains this topic in greater detail.
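To see why non-normalized metrics can mislead when series differ in scale, the sketch below computes RMSE and NRMSE for two made-up series whose errors are proportional to their scale. It assumes NRMSE is RMSE divided by the range of the actual values; check the article above for the exact normalization and aggregation used by the components.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def nrmse(actual, forecast):
    """RMSE divided by the range of the actuals (assumed normalization)."""
    return rmse(actual, forecast) / (max(actual) - min(actual))


# Two made-up series whose values and errors differ only by a factor of 100
small_actual, small_fcst = [10, 20, 30], [12, 18, 33]
large_actual, large_fcst = [1000, 2000, 3000], [1200, 1800, 3300]

for name, actual, fcst in [
    ("small", small_actual, small_fcst),
    ("large", large_actual, large_fcst),
]:
    print(f"{name}: RMSE={rmse(actual, fcst):.3f}, NRMSE={nrmse(actual, fcst):.4f}")
# small: RMSE=2.380, NRMSE=0.1190
# large: RMSE=238.048, NRMSE=0.1190
```

Averaging the raw RMSE values would be dominated by the large series, while the normalized values show the two forecasts are equally accurate relative to their scale.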

The code in the next cell loads dataset metrics for each of the experiments.

In [None]:
metrics_artifacts_path = os.path.join(
    "named-outputs", "output_files", "evaluationResult"
)

with open(os.path.join(mm_output_dir, metrics_artifacts_path, "metrics.json")) as f:
    metrics_automl_series = json.load(f)

with open(os.path.join(base_output_dir, metrics_artifacts_path, "metrics.json")) as f:
    metrics_base_series = json.load(f)

Next, we combine the two sets of metrics to examine them side by side. The `metrics_all` data frame contains two columns, which correspond to the scores from the many models and the baseline experiments, respectively.

In [None]:
# Rows are metric names; columns hold the many models and baseline scores
metrics_all = pd.DataFrame(
    [metrics_automl_series, metrics_base_series], index=["score_automl", "score_base"]
).T
metrics_all
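Since most of these metrics are errors (lower is better), a relative-improvement column can make the comparison easier to read. The sketch below uses made-up scores in the same two-column layout as `metrics_all`; the `improvement_pct` column and the `higher_is_better` set are illustrative additions, not outputs of the components.

```python
import pandas as pd

# Made-up scores laid out like metrics_all (rows = metrics, columns = experiments)
demo_metrics = pd.DataFrame(
    {
        "score_automl": {"normalized_root_mean_squared_error": 0.08, "r2_score": 0.90},
        "score_base": {"normalized_root_mean_squared_error": 0.10, "r2_score": 0.80},
    }
)

# Metrics where a larger value is better; all others are treated as errors
higher_is_better = {"r2_score", "explained_variance"}
sign = pd.Series(
    [1.0 if m in higher_is_better else -1.0 for m in demo_metrics.index],
    index=demo_metrics.index,
)

# Positive improvement_pct means AutoML beats the baseline on that metric
demo_metrics["improvement_pct"] = (
    100.0
    * sign
    * (demo_metrics["score_automl"] - demo_metrics["score_base"])
    / demo_metrics["score_base"].abs()
)
print(demo_metrics.round(2))
```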

#### 9.1.1 Detailed Metrics

Next, we load and examine the detailed accuracy metrics, since the aggregate metrics may not convey enough information to make a decision about model accuracy. Examining metrics at a more granular level, here per time series, can be more informative. To do this, we create a helper function `extract_specific_metric`, which reads the JSON file and returns the specified metric for each time series. The file contains the following metrics: <ul>
    <li> `explained_variance` </li>
    <li> `mean_absolute_error` </li>
    <li> `mean_absolute_percentage_error`</li>
    <li> `median_absolute_error`</li>
    <li> `normalized_median_absolute_error`</li>
    <li> `normalized_root_mean_squared_error`</li>
    <li> `normalized_root_mean_squared_log_error`</li>
    <li> `r2_score`</li>
    <li> `root_mean_squared_log_error`</li>
    <li> `root_mean_squared_error`</li>
</ul>

For illustration purposes, we focus on the normalized root mean squared error (NRMSE).

In [None]:
def extract_specific_metric(path, metric_name):
    with open(path) as f:
        artifact = json.load(f)
    all_metrics = pd.DataFrame(artifact["data"])
    index_scores = time_series_id_column_names + [metric_name]
    return all_metrics[index_scores]

In [None]:
metrics_table_relative_path = os.path.join(
    metrics_artifacts_path, "artifacts", "forecast_time_series_id_distribution_table"
)
automl_metric = extract_specific_metric(
    os.path.join(mm_output_dir, metrics_table_relative_path),
    "normalized_root_mean_squared_error",
)

base_metric = extract_specific_metric(
    os.path.join(base_output_dir, metrics_table_relative_path),
    "normalized_root_mean_squared_error",
)

In [None]:
# Align the per-series scores on the time series ID columns rather than on row order
metrics_df = automl_metric.merge(
    base_metric,
    on=time_series_id_column_names,
    how="inner",
    suffixes=("_automl", "_base"),
)
metrics_df
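With per-series scores side by side, a simple summary is how many series AutoML wins. The sketch below uses a made-up frame in the layout produced by the merge above (metric columns suffixed `_automl` and `_base`); the `series_id` column is a hypothetical stand-in for the notebook's time series ID columns.

```python
import pandas as pd

metric = "normalized_root_mean_squared_error"

# Made-up per-series scores in the layout produced by the merge above
demo_df = pd.DataFrame(
    {
        "series_id": ["MT_001", "MT_002", "MT_003"],
        f"{metric}_automl": [0.05, 0.12, 0.07],
        f"{metric}_base": [0.09, 0.10, 0.11],
    }
)

# Lower NRMSE is better, so AutoML "wins" a series when its error is smaller
demo_df["automl_better"] = demo_df[f"{metric}_automl"] < demo_df[f"{metric}_base"]
n_wins = int(demo_df["automl_better"].sum())
print(f"AutoML beats the baseline on {n_wins} of {len(demo_df)} series")
# AutoML beats the baseline on 2 of 3 series
```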

### 9.2. Generate Time Series Plots

Here, we generate forecast-versus-actuals plots on the test set for both the many models run and the baseline. Since we use rolling evaluation with a step size of 24 hours, this mimics putting both models in production and monitoring them for the duration of the test set. This step helps you make informed decisions about model performance and avoids much of the cost of productionizing a model just to monitor its performance in real life.
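The rolling evaluation described above can be pictured as a sequence of forecast origins spaced 24 hours apart, each followed by a 24-hour forecast window. The standard-library sketch below only illustrates that schedule; the function name and the example dates are hypothetical, and it does not reproduce how the inference component generates its forecasts.

```python
from datetime import datetime, timedelta


def rolling_windows(test_start, test_end, horizon, step):
    """Yield (origin, forecast timestamps) pairs covering [test_start, test_end).

    Each origin is the last timestamp whose actual value is assumed known;
    forecasts cover the next `horizon` hours, and origins advance by `step` hours.
    """
    origin = test_start - timedelta(hours=1)
    while origin + timedelta(hours=1) < test_end:
        times = [origin + timedelta(hours=h) for h in range(1, horizon + 1)]
        # Truncate the last window at the end of the test period
        yield origin, [t for t in times if t < test_end]
        origin += timedelta(hours=step)


windows = list(
    rolling_windows(datetime(2014, 9, 1), datetime(2014, 9, 4), horizon=24, step=24)
)
for origin, times in windows:
    print(origin, "->", times[0], "...", times[-1], f"({len(times)} hours)")
```

With the step equal to the horizon, the windows tile the test period without overlap, so every test hour is forecast exactly once.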

In the next block of code, we load the test set output for each run and merge the data. Then, we generate and save time series plots.

In [None]:
forecast_table_relative_path = os.path.join("named-outputs", "forecast_output")

forecast_column_name = "automl_prediction"
base_forecast_column_name = "base_predictions"
actual_column_name = "automl_actual"
forecast_origin_column_name = "forecast_origin"

automl_fcst = pd.read_parquet(os.path.join(mm_output_dir, forecast_table_relative_path))
base_fcst = pd.read_parquet(os.path.join(base_output_dir, forecast_table_relative_path))

merge_columns = time_series_id_column_names + [actual_column_name]
merge_columns.extend([time_column_name, forecast_origin_column_name])

backtest = automl_fcst.merge(
    base_fcst.rename(columns={forecast_column_name: base_forecast_column_name}),
    on=merge_columns,
    how="inner",
)

print(f"AutoML forecast table size: {automl_fcst.shape}\n---")
print(f"Base forecast table size: {base_fcst.shape}\n---")
print(f"Merged forecast table size: {backtest.shape}\n---")
backtest.head()
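Because the merge above is an inner join, rows that exist in only one of the forecast tables are silently dropped; comparing the printed shapes is one way to catch this. A more explicit check uses the `indicator=True` option of `pandas.DataFrame.merge`, sketched below on tiny made-up frames (the `keys` list stands in for `merge_columns`).

```python
import pandas as pd

keys = ["series_id", "time"]  # stand-in for merge_columns
left = pd.DataFrame(
    {"series_id": ["a", "a", "b"], "time": [1, 2, 1], "automl_prediction": [1.0, 2.0, 3.0]}
)
right = pd.DataFrame(
    {"series_id": ["a", "b", "b"], "time": [1, 1, 2], "base_predictions": [0.5, 2.5, 3.5]}
)

# An outer merge with indicator=True labels each row as both / left_only / right_only
check = left.merge(right, on=keys, how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(check["_merge"].value_counts().to_dict())
print(unmatched[keys])
```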

In [None]:
from scripts.helper_scripts import draw_one_plot
from matplotlib import pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

plot_filename = "forecast_vs_actual.pdf"

pdf = PdfPages(os.path.join(os.getcwd(), "./output", plot_filename))
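for _, one_forecast in backtest.groupby(time_series_id_column_names):
    # Work on an explicit copy to avoid pandas' SettingWithCopyWarning when modifying the group
    one_forecast = one_forecast.copy()
    one_forecast[time_column_name] = pd.to_datetime(one_forecast[time_column_name])
    one_forecast.sort_values(time_column_name, inplace=True)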
    draw_one_plot(
        one_forecast,
        time_column_name,
        target_column_name,
        time_series_id_column_names,
        [actual_column_name, forecast_column_name, base_forecast_column_name],
        pdf,
        plot_predictions=True,
    )
pdf.close()

In [None]:
from IPython.display import IFrame

IFrame(os.path.join("./output", plot_filename), width=800, height=300)

## 10. Deployment  <a id="Deployment">

In this section, we illustrate how to deploy models and run inference using a batch endpoint. Batch endpoints are used to run inference asynchronously on large volumes of data: they receive pointers to the data, run jobs that process the data in parallel on compute clusters, and store the outputs in a datastore for further analysis. For more information on batch endpoints, see this [link](https://learn.microsoft.com/en-us/azure/machine-learning/concept-endpoints-batch?view=azureml-api-2).

### 10.1. Create Batch Endpoint

In [None]:
import random
import string

# Creating a unique endpoint name by including a random suffix
allowed_chars = string.ascii_lowercase + string.digits
endpoint_suffix = "".join(random.choice(allowed_chars) for _ in range(5))
endpoint_name = "sdk-many-models-" + endpoint_suffix

print(f"Endpoint name: {endpoint_name}\n---")
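The random suffix makes collisions with existing endpoint names unlikely. As a sketch, the hypothetical helper below builds the same kind of name with the `secrets` module (so seeding the global `random` generator elsewhere cannot reproduce the suffix) and validates it against a naming pattern. The pattern (start with a letter, then letters, digits, or hyphens, 3 to 32 characters in total) is an assumption; confirm the current rules in the Azure Machine Learning documentation.

```python
import re
import secrets
import string

# Assumed naming rule: starts with a letter; letters, digits, hyphens; 3-32 chars
ENDPOINT_NAME_PATTERN = re.compile(r"[a-zA-Z][a-zA-Z0-9-]{2,31}")


def make_endpoint_name(prefix="sdk-many-models-", suffix_len=5):
    """Hypothetical helper: prefix plus a random lowercase/digit suffix, validated."""
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(secrets.choice(alphabet) for _ in range(suffix_len))
    name = prefix + suffix
    if not ENDPOINT_NAME_PATTERN.fullmatch(name):
        raise ValueError(f"Endpoint name {name!r} does not match the assumed naming rule")
    return name


print(make_endpoint_name())
```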

In [None]:
endpoint = BatchEndpoint(
    name=endpoint_name,
    description="A many models endpoint for component deployments",
    properties={"ComponentDeployment.Enabled": True},
)

The following command creates the endpoint in the workspace using the `MLClient` created earlier. The `begin_create_or_update` method starts the endpoint creation and returns a poller; calling `.result()` on it blocks until the creation finishes.

In [None]:
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()

### 10.2. Create the Deployment

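A deployment is a set of resources required to host the model that performs the actual inference. Here, the deployment runs the many models inference pipeline component from the registry on the compute cluster.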

In [None]:
deployment = PipelineComponentBatchDeployment(
    name="sdk-many-models-deployment",
    description="A many models deployment.",
    endpoint_name=endpoint_name,
    component=inference_component_from_registry.id,
    settings={"default_compute": amlcompute_cluster_name},
)

The following command creates the deployment in the workspace using the `MLClient` created earlier. The `begin_create_or_update` method starts the deployment creation and returns a poller; calling `.result()` on it blocks until the creation finishes.

In [None]:
ml_client.batch_deployments.begin_create_or_update(deployment).result()

### 10.3. Invoke the Endpoint

The next cell contains the command that invokes the endpoint for a batch inference job. The `invoke` method takes an `inputs` parameter, which contains the inputs required to execute the inference component on the endpoint. To convince yourself of this, compare the input parameters of `inference_component_from_registry` in section 6.3 with the `inputs` we provide in the next cell: they are identical.

Notice that `forecast_mode` is set to `"recursive"`. In the evaluation pipeline, this component was used to generate rolling forecasts to evaluate model performance on the test set. For more details on rolling evaluation, see our [forecasting model evaluation article](placeholder). Here, we use it to generate a forecast.
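To make the distinction concrete, the toy sketch below contrasts the two modes with a hypothetical one-step model `next = 0.5 * previous + 1`: a recursive forecast feeds its own predictions back in as inputs, while a rolling forecast re-anchors on the actual value at each origin. It illustrates the concepts only and is not how the inference component is implemented.

```python
def one_step(prev):
    """Hypothetical one-step-ahead model."""
    return 0.5 * prev + 1


def recursive_forecast(last_actual, horizon):
    """Predict `horizon` steps ahead, feeding each prediction back as the next input."""
    preds, prev = [], last_actual
    for _ in range(horizon):
        prev = one_step(prev)
        preds.append(prev)
    return preds


def rolling_forecast(actuals):
    """One-step-ahead forecasts, re-anchored on the actual value at each origin."""
    return [one_step(a) for a in actuals[:-1]]


actuals = [10.0, 8.0, 7.0, 5.0]
print(recursive_forecast(actuals[0], horizon=3))  # [6.0, 4.0, 3.0]
print(rolling_forecast(actuals))  # [6.0, 5.0, 4.5]
```

The recursive mode needs no future actuals, which is why it suits production forecasting, whereas the rolling mode requires actuals at each origin and is therefore used for evaluation on a test set.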

In [None]:
job = ml_client.batch_endpoints.invoke(
    endpoint_name=endpoint.name,
    deployment_name=deployment.name,
    inputs={
        "raw_data": Input(type=AssetTypes.URI_FOLDER, path="./data/inference"),
        "training_experiment_name": Input(
            type="string", default=evaluation_experiment_name
        ),
        "max_nodes": Input(type="integer", default=1),
        "max_concurrency_per_node": Input(type="integer", default=5),
        "compute_name": Input(type="string", default=amlcompute_cluster_name),
        "forecast_mode": Input(type="string", default="recursive"),
        "parallel_step_timeout_in_seconds": Input(type="integer", default=3700),
    },
)

Next, we will stream the job output to monitor the execution.

In [None]:
job_name = job.name
batch_job = ml_client.jobs.get(name=job_name)
print(f"Batch job status: {batch_job.status}\n---")
ml_client.jobs.stream(name=job_name)

### 10.4. Download Forecast Output

Finally, we wait for the batch job and its child jobs to be posted, download the forecast output, and display the first few rows.

In [None]:
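from time import sleep

# Wait until the batch job reaches a terminal state and, if it completed,
# until its child jobs (at least three steps) are listed in the workspace
final_state = ("Completed", "Failed", "Canceled")
while batch_job.status not in final_state or (
    batch_job.status == "Completed"
    and len(list(ml_client.jobs.list(parent_job_name=batch_job.name))) < 3
):
    print("The runs were not posted... Re-trying in 10 seconds.\n---")
    sleep(10)
    batch_job = ml_client.jobs.get(name=job_name)
print(f"Batch job status: {batch_job.status}\n---")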
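The polling loop above waits without an upper bound on the total time. If you prefer to fail fast, a generic wait-with-timeout helper such as the hypothetical `wait_until` below (standard library only) could wrap the same status checks; the timeout and interval values are assumptions to tune for your workload. The `sleep` and `clock` arguments are injectable so the helper can be exercised with a simulated clock.

```python
import time


def wait_until(predicate, timeout_s, interval_s=10.0, sleep=time.sleep, clock=time.monotonic):
    """Call `predicate()` every `interval_s` seconds until it returns True.

    Raises TimeoutError if `timeout_s` seconds elapse first.
    """
    deadline = clock() + timeout_s
    while not predicate():
        if clock() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout_s} seconds")
        sleep(interval_s)
    return True


class FakeClock:
    """Simulated clock for demonstrating wait_until without real delays."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


# Demo 1: the condition becomes true on the third check
clock = FakeClock()
attempts = iter([False, False, True])
wait_until(lambda: next(attempts), timeout_s=60, interval_s=10, sleep=clock.sleep, clock=clock.time)
print(f"Condition met after {clock.now:.0f} simulated seconds")

# Demo 2: the condition never becomes true, so a TimeoutError is raised
clock2 = FakeClock()
try:
    wait_until(lambda: False, timeout_s=30, interval_s=10, sleep=clock2.sleep, clock=clock2.time)
    timed_out = False
except TimeoutError:
    timed_out = True

# Usage sketch against this notebook's objects (not executed here):
# def job_finished():
#     status = ml_client.jobs.get(name=job_name).status
#     return status in ("Completed", "Failed", "Canceled")
# wait_until(job_finished, timeout_s=2 * 60 * 60)
```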

In [None]:
fcst_output_dir = os.path.join(os.getcwd(), "forecast")

for child in ml_client.jobs.list(parent_job_name=job.name):
    print(f"{child.name}\n---")
    if (
        child.properties.get("azureml.moduleName")
        == "automl_many_models_inference_collect_step"
    ):
        print("Downloading data ...\n---")
        # Retry a few times in case the output is not available yet
        for attempt in range(3):
            print(f"Attempt: {attempt}")
            try:
                ml_client.jobs.download(
                    child.name, download_path=fcst_output_dir, output_name="metadata"
                )
                break
            except Exception:
                sleep(10)

In [None]:
fcst_df = pd.read_parquet(
    os.path.join(fcst_output_dir, "named-outputs", "metadata", "raw_forecast")
)
fcst_df.head()

### 10.5. [Optional] Delete the Endpoint

In [None]:
# The endpoint was created as a batch endpoint, so delete it through batch_endpoints
ml_client.batch_endpoints.begin_delete(name=endpoint.name).wait()