Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Automated Machine Learning
**Demand Forecasting Using Many Models**

## Contents
1. [Introduction](#Introduction)
1. [Setup](#Setup)
1. [Compute](#Compute)
1. [Data](#Data)
1. [Import Components From Registry](#ImportComponents)
1. [Create a Pipeline](#Pipeline)
1. [Kick Off Pipeline Runs](#PipelineRuns)
1. [Download Output](#DownloadOutput)
1. [Compare Evaluation Results](#CompareResults)
1. [Deployment](#Deployment)

## 1. Introduction

The objective of this notebook is to illustrate how to use the component-based AutoML many models solution accelerator for demand forecasting tasks. It walks you through all stages of the model evaluation and production process, starting with data ingestion and concluding with batch endpoint deployment for production.

We use a subset of the UCI electricity data ([link](https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014#)) with the objective of predicting electricity demand per consumer 24 hours ahead. The data was preprocessed using the [data prep notebook](https://github.com/Azure/azureml-examples/blob/main/v1/python-sdk/tutorials/automl-with-azureml/forecasting-data-preparation/auto-ml-forecasting-data-preparation.ipynb). Please refer to it for an illustration of how to download the data from the source, aggregate it to an hourly frequency, convert it from wide to long format, and upload it to the Datastore. Here, we will work with the already uploaded data.

Generating accurate forecasts 24 hours ahead sounds like a relatively straightforward task. However, there are quite a few steps a user needs to take before the model is put in production. A user needs to prepare the data, partition it into appropriate sets, select the best model, evaluate it against a baseline, and monitor the model to collect enough observations on how it would have performed had it been put in production. Some of these steps are time consuming; others require expertise in writing code. The steps shown in this notebook follow the typical thought process one goes through before a model is put in production.

Make sure you have executed the [configuration](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb) before running this notebook.

## 2. Setup

In [None]:
# Import required libraries
import os
import datetime
import json
import time

import pandas as pd

from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

from azure.ai.ml import MLClient, Input, Output
from azure.ai.ml import load_component
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.dsl import pipeline
from azure.ai.ml.entities import (
    Environment,
    BuildContext,
    Model,
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    CodeConfiguration,
    BatchEndpoint,
    BatchDeployment,
    AmlCompute,
)
from azure.ai.ml.entities._deployment.job_definition import JobDefinition

# Print the SDK version - you may want to share this in any issue you report if parts of this notebook don't work
!pip show azure-ai-ml
os.environ["AZURE_ML_CLI_PRIVATE_FEATURES_ENABLED"] = "true"

### 2.1. Configure workspace details and get a handle to the workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

To connect to a workspace, we need identifier parameters: a subscription, a resource group, and a workspace name. We will use these details in the `MLClient` from `azure.ai.ml` to get a handle to the required Azure Machine Learning workspace. We use the [default Azure authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) for this tutorial. Check the [configuration notebook](../../configuration.ipynb) for more details on how to configure credentials and connect to a workspace.

In [None]:
try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

In [None]:
try:
    ml_client = MLClient.from_config(credential)
except Exception as ex:
    print(ex)
    # Enter details of your AML workspace
    subscription_id = "<SUBSCRIPTION_ID>"
    resource_group = "<RESOURCE_GROUP>"
    workspace = "<AML_WORKSPACE_NAME>"
    ml_client = MLClient(credential, subscription_id, resource_group, workspace)
    print(ml_client)

### 2.2. Show Azure ML Workspace information

In [None]:
ws = ml_client.workspaces.get(name=ml_client.workspace_name)

# Collect the workspace details from the client and the retrieved workspace object
output = {}
output["Workspace"] = ml_client.workspace_name
output["Subscription ID"] = ml_client.subscription_id
output["Resource Group"] = ws.resource_group
output["Location"] = ws.location
pd.DataFrame(data=output, index=[""]).T

## 3. Compute 

#### Create or Attach existing AmlCompute

You will need to create a compute target for your AutoML run. In this tutorial, you will create AmlCompute as your training compute resource.

> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.


Here, we use a 5-node cluster of the `STANDARD_DS15_V2` series for illustration purposes. You will need to adjust the compute type and the number of nodes based on your needs, which can be driven by the speed needed for model selection, data size, etc.

#### Creation of AmlCompute takes approximately 5 minutes. 
If the AmlCompute with that name is already in your workspace, this code will skip the creation process.
As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.

In [None]:
from azure.core.exceptions import ResourceNotFoundError
from azure.ai.ml.entities import AmlCompute

amlcompute_cluster_name = "demand-fcst-mm-cluster"

try:
    # Retrieve an already attached Azure Machine Learning Compute.
    compute_target = ml_client.compute.get(amlcompute_cluster_name)
except ResourceNotFoundError as e:
    compute_target = AmlCompute(
        name=amlcompute_cluster_name,
        size="STANDARD_DS15_V2",
        type="amlcompute",
        min_instances=0,
        max_instances=5,
        idle_time_before_scale_down=600,
    )
    poller = ml_client.compute.begin_create_or_update(compute_target)
    poller.wait()

## 4. Data

For illustration purposes we use the UCI electricity data ([link](https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014#)). The original dataset contains electricity consumption data for 370 consumers measured at 15-minute intervals. We aggregated the data to an hourly frequency and converted it to kilowatt hours (kWh) for 10 customers. The following cells read and print the first few rows of the training data, as well as the number of unique time series in the dataset.
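The hourly aggregation mentioned above can be sketched as follows. This is a minimal illustration and not the code from the data prep notebook: it assumes the raw readings are average power in kW over each 15-minute interval, so each reading contributes a quarter of its value in kWh. The sample readings and the `to_hourly_kwh` helper are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical 15-minute readings for one customer: (timestamp, power in kW).
readings_kw = [
    (datetime(2014, 1, 1, 0, 0), 4.0),
    (datetime(2014, 1, 1, 0, 15), 8.0),
    (datetime(2014, 1, 1, 0, 30), 4.0),
    (datetime(2014, 1, 1, 0, 45), 8.0),
    (datetime(2014, 1, 1, 1, 0), 2.0),
    (datetime(2014, 1, 1, 1, 15), 2.0),
    (datetime(2014, 1, 1, 1, 30), 2.0),
    (datetime(2014, 1, 1, 1, 45), 2.0),
]


def to_hourly_kwh(readings):
    """Sum 15-minute kW readings per hour and convert them to kWh.

    Each reading covers a quarter of an hour, so energy = power * 0.25 h.
    """
    hourly = defaultdict(float)
    for ts, kw in readings:
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        hourly[hour_start] += kw * 0.25
    return dict(sorted(hourly.items()))


hourly_kwh = to_hourly_kwh(readings_kw)
for hour, kwh in hourly_kwh.items():
    print(hour, kwh)
```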

In [None]:
time_column_name = "datetime"
target_column_name = "usage"
time_series_id_column_names = ["customer_id"]

In [None]:
dataset_type = "train"
df = pd.read_parquet(
    f"./data/{dataset_type}/uci_electro_small_mm_{dataset_type}.parquet"
)
df.head(3)

In [None]:
nseries = df.groupby(time_series_id_column_names).ngroups
print(f"Data contains {nseries} individual time-series\n---")

## 5. Import Components From Registry

An Azure Machine Learning component is a self-contained piece of code that does one step in a machine learning pipeline. A component is analogous to a function - it has a name, inputs, outputs, and a body. Components are the building blocks of Azure Machine Learning pipelines. It is good engineering practice to split a complete machine learning task into a multi-step pipeline so that everyone can work on a specific step independently. In Azure Machine Learning, a component represents one reusable step in a pipeline. Components are designed to help improve the productivity of pipeline building. Specifically, components offer:
- Well-defined interface: Components require a well-defined interface (input and output). The interface allows the user to build steps and connect steps easily. The interface also hides the complex logic of a step and removes the burden of understanding how the step is implemented.

- Share and reuse: As the building blocks of a pipeline, components can be easily shared and reused across pipelines, workspaces, and subscriptions. Components built by one team can be discovered and used by another team.

- Version control: Components are versioned. The component producers can keep improving components and publish new versions. Consumers can use specific component versions in their pipelines. This gives them compatibility and reproducibility.

For more detailed information on this subject, refer to this [link](https://learn.microsoft.com/en-us/azure/machine-learning/concept-component?view=azureml-api-2).

To import components, we need to get the registry. The following command obtains the public registry from which we will import components for our experiment.

In [None]:
# get registry
# TODO: Change registry name once the components are in the public registry
ml_client_registry = MLClient(
    credential=credential, registry_name="ManyModels_HTS_BugBash"
)
print(ml_client_registry)
print("---")

In [None]:
# TODO: delete this cell once the components are in the public registry
preview_registry = "ForecastingDemand2"
preview_registry_ml_client = MLClient(
    credential,
    ml_client.subscription_id,
    "nirovins-southcentralus-rg",
    registry_name=preview_registry,
)
print(preview_registry_ml_client)
print("---")

Next, we pull specific components and use them to build a pipeline of steps. For the illustration of the product evaluation workflow we will use the following components:
- Data partitioning component: allows users to partition the data for many models runs, both training and inference.
- Many models training component: trains the best model per partition specified by users.
- Many models inference component: generates a forecast for each partition. This can be done on the test and inference sets.
- Compute metrics component: calculates metrics per time series if inference component was used on a test set.
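To make the idea of partitioning concrete, here is a minimal sketch of grouping rows by partition columns. It only illustrates the concept; it is not the implementation of the registry component, and the sample rows and the `partition_rows` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical long-format rows: one row per customer and timestamp.
rows = [
    {"customer_id": "MT_001", "datetime": "2014-01-01 00:00", "usage": 6.0},
    {"customer_id": "MT_002", "datetime": "2014-01-01 00:00", "usage": 3.5},
    {"customer_id": "MT_001", "datetime": "2014-01-01 01:00", "usage": 2.0},
    {"customer_id": "MT_002", "datetime": "2014-01-01 01:00", "usage": 4.0},
]


def partition_rows(rows, partition_column_names):
    """Group rows so that each partition holds whole time series only."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in partition_column_names)
        partitions[key].append(row)
    return dict(partitions)


partitions = partition_rows(rows, ["customer_id"])
for key, part in partitions.items():
    print(key, len(part), "rows")
```

With one partition per `customer_id`, the training step can fit one model per customer, and the inference step can look up the matching model for each partition.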

In [None]:
partition_component_from_registry = ml_client_registry.components.get(
    name="automl_solution_accelerators_partition", label="latest"
)
print(
    f"Data partitioning component version: {partition_component_from_registry.version}\n---"
)

In [None]:
train_component_from_registry = ml_client_registry.components.get(
    name="automl_many_model_training",
    label="latest",
)
print(
    f"Many models training component version: {train_component_from_registry.version}\n---"
)

In [None]:
inference_component_from_registry = ml_client_registry.components.get(
    name="automl_many_model_inferencing", label="latest"
)
print(
    f"Many models inference component version: {inference_component_from_registry.version}\n---"
)

In [None]:
compute_metrics_component = preview_registry_ml_client.components.get(
    name="compute_metrics", label="latest"
)
print(
    f"Compute metrics component version: {compute_metrics_component.version}\n---"
)

## 6. Create a Pipeline

Now that we have imported the components, we will build an evaluation pipeline. This pipeline will allow us to partition the data, train the best model for each partition, generate rolling forecasts on the test set, and, finally, calculate metrics on the test set output.

### 6.1. Create a JSON

AzureML components can only receive specific object types such as strings, JSON files, URI Folders and URI Files. Other object types are not accepted. As a result, we need to create a JSON file. It will be passed into the training component, which, in turn, will convert this information to the `AutoMLConfig` object.

The following are the bare-minimum parameters needed to successfully train many models. For finer control of the experiment, a user may add other parameters to the config file. See the [AutoMLConfig documentation](https://learn.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig?view=azure-ml-py) for a complete list of available parameters.

|Property|Description|
|-|-|
| **task**                           | forecasting |
| **primary_metric**                 | This is the metric that you want to optimize. Forecasting supports the following primary metrics:<ul><li>`normalized_root_mean_squared_error`</li><li>`normalized_mean_absolute_error`</li><li>`spearman_correlation`</li><li>`r2_score`</li></ul> We recommend using either the normalized root mean squared error or the normalized mean absolute error as the primary metric because they measure forecast accuracy. See the [link](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-automl-forecasting-faq#how-do-i-choose-the-primary-metric) for a more detailed discussion on this topic. |
| **label_column_name**      | The name of the target column we are trying to predict. |
| **time_column_name**       | The name of your time column. |
| **time_series_id_column_names** | The column names used to uniquely identify timeseries in data that has multiple rows with the same timestamp. |
| **enable_early_stopping**  | Flag to enable early termination if the primary metric is no longer improving. |
| **partition_column_names** | The names of columns used to group your models. For timeseries, the groups must not split up individual time-series. That is, each group must contain one or more whole time-series. |
| **allow_multi_partitions** | A flag that allows users to train one model per partition when each partition contains more than one unique time series. The default value is `False`. |
| **track_child_runs**       | Flag to disable tracking of child runs. Only best run is tracked if the flag is set to False (this includes the model and metrics of the run). |

In [None]:
max_horizon = 24
target_column_name = "usage"
time_column_name = "date"
time_series_id_column_names = ["customer_id"]
partition_column_names = ["customer_id"]
allow_multi_partitions = False
retrain_failed_models = True

In [None]:
# Required parameters
automl_settings = dict(
    task="forecasting",
    primary_metric="normalized_root_mean_squared_error",
    debug_log="debug.txt",
    label_column_name=target_column_name,
    time_column_name=time_column_name,
    forecast_horizon=max_horizon,
    time_series_id_column_names=time_series_id_column_names,
    partition_column_names=partition_column_names,
    track_child_runs=False,
    enable_early_stopping=True,
    allow_multi_partitions=allow_multi_partitions,
)
pd.DataFrame(data=automl_settings, index=[""]).T
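For intuition about the primary metric chosen above, the following is a minimal sketch of a normalized root mean squared error. It assumes the RMSE is normalized by the range of the actual values; AutoML's exact computation (for example, how it normalizes across multiple time series) may differ, and the `normalized_rmse` helper and sample values are hypothetical.

```python
import math


def normalized_rmse(actual, predicted):
    """RMSE divided by the range (max - min) of the actual values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    value_range = max(actual) - min(actual)
    if value_range == 0:
        raise ValueError("actual values are constant; the range is zero")
    return math.sqrt(mse) / value_range


# Hypothetical actuals and forecasts for one time series.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]
score = normalized_rmse(actual, predicted)
print(f"NRMSE: {score:.4f}")
```

Because the error is scaled by the range of the target, scores are easier to compare across customers whose usage levels differ.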

Next, we save these settings as a JSON object. This object will be the input for the many models training component.

In [None]:
import json

json_object = json.dumps(automl_settings)

# Write the settings to automl_settings_mm.json, the file the pipeline in section 7.1 reads
with open("./automl_settings_mm.json", "w") as outfile:
    outfile.write(json_object)
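Before submitting a pipeline, it can be useful to confirm that the settings file loads back correctly and contains the keys the training step is expected to need. The sketch below is a hedged illustration: the `REQUIRED_KEYS` set is our own assumption drawn from the settings used in this notebook, not an official list, and the `missing_settings` helper is hypothetical. It writes to a temporary file so that it does not touch the notebook's own files.

```python
import json
import os
import tempfile

# Assumed set of keys, based on the settings used in this notebook.
REQUIRED_KEYS = {
    "task",
    "primary_metric",
    "label_column_name",
    "time_column_name",
    "forecast_horizon",
    "partition_column_names",
}


def missing_settings(path, required=REQUIRED_KEYS):
    """Return the sorted list of required keys absent from a JSON settings file."""
    with open(path) as f:
        settings = json.load(f)
    return sorted(required - set(settings))


# Hypothetical settings that deliberately omit `forecast_horizon`.
sample_settings = {
    "task": "forecasting",
    "primary_metric": "normalized_root_mean_squared_error",
    "label_column_name": "usage",
    "time_column_name": "date",
    "partition_column_names": ["customer_id"],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_settings.json")
    with open(sample_path, "w") as f:
        json.dump(sample_settings, f)
    missing = missing_settings(sample_path)

print("Missing keys:", missing)
```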

### 6.2. Provide additional pipeline parameters

The next set of parameters is necessary to build the pipeline of components. These parameters are specific to the many models training and/or inference components. Since both the training and inference components rely on the Parallel Run Step (PRS) to train/inference multiple models at once, you will need to determine the appropriate number of workers and nodes for your use case. The `max_concurrency_per_instance` is based on the number of cores of the compute VM. The `instance_count` determines the number of compute nodes to use; increasing it will speed up the training process.

|Property|Description|
|:-|:-|
| **instance_count**                     | The number of compute nodes in a cluster to be used for training and inferencing steps. We recommend starting with 3 and increasing the `instance_count` if training takes too long. |
| **max_concurrency_per_instance**   | Process count per node. We recommend a 2:1 ratio of cores to processes per node. For example, if a node has 16 cores, configure 8 **or fewer** processes per node for optimal performance. |
|**retrain_failed_model**| Whether AutoML should kick off a new child run for a partition if training a model for that partition fails. Possible values are `True` or `False`.|
|**forecast_mode**| Type of forecast to perform on the test set. Can be `recursive` or `rolling`. A rolling forecast can be used for evaluation purposes. |
| **prs_step_timeout**         | Maximum amount of time in seconds that the `ParallelRunStep` class is allowed to run. This is optional but provides customers with greater control over exit criteria. This must be greater than `experiment_timeout_hours` by at least 300 seconds. |
|**partition_column_names**| The names of columns used to group your models. For timeseries, the groups must not split up individual time-series. That is, each group must contain one or more whole time-series. This parameter is identical to the one in the `automl_config` object.|
|**compute_name**| Name of the compute to execute the pipeline on. |
|**enable_event_logger**| Set this value to `True` to enable event logger. |
| **input_type**               | Type of file format for the input data. Supported options are `csv` and `parquet`. |

In [None]:
# Pipeline parameters
pipeline_parameters = dict(
    instance_count=2,
    max_concurrency_per_instance=5,
    retrain_failed_model=True,
    forecast_mode="rolling",
    prs_step_timeout=3700,
    partition_column_names=partition_column_names,
    compute_name=amlcompute_cluster_name,
    enable_event_logger=True,
    input_type="csv",
)
print("Pipeline parameters\n---")
pd.DataFrame(data=pipeline_parameters, index=[""]).T

The data partitioning component allows us to partition the data in several ways. For "small" datasets, the data can be loaded into the memory of a single node of the compute you plan to run the experiments on. If the data is too large to be partitioned on a single node, we need to use a Spark cluster for this step. Even though the data we are working with is small, this notebook handles both scenarios. If you choose to run a Spark job, you need to pass a separate set of parameters to the pipeline builder, which must include the following:

|Property|Description|
|:-|:-|
| **instance_type**            | A key that defines the compute instance type to be used for the serverless Spark compute. The following instance types are currently supported:<ul><li>`Standard_E4S_V3`</li><li>`Standard_E8S_V3`</li><li>`Standard_E16S_V3`</li><li>`Standard_E32S_V3`</li><li>`Standard_E64S_V3`</li></ul>|
| **runtime_version**          | A key that defines the Spark runtime version. The following Spark runtime versions are currently supported:<ul><li>`3.1.0`</li><li>`3.2.0`</li></ul> |
| **driver_cores**       | The number of cores allocated for the Spark driver. |
| **driver_memory**      | The allocated memory for the Spark driver, with a size unit suffix `k`, `m`, `g` or `t` (for example, `512m`, `2g`). |
| **executor_cores**     | The number of cores allocated for the Spark executor. |
| **executor_memory**    | The allocated memory for the Spark executor, with a size unit suffix `k`, `m`, `g` or `t` (for example, `512m`, `2g`). |
| **executor_instances** | The number of Spark executor instances. |

All of these parameters are described in [this document](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-submit-spark-jobs?view=azureml-api-2&tabs=sdk).

In [None]:
USE_SPARK = False

In [None]:
# Spark parameters (optional)
if USE_SPARK:
    spark_parameters = dict(
        instance_type="Standard_E4S_V3",
        runtime_version="3.2.0",
        driver_cores=1,
        driver_memory="2g",
        executor_cores=2,
        executor_memory="2g",
        executor_instances=2,
    )
    print("Spark parameters\n---")
    # Print explicitly: a bare expression inside an `if` block is not displayed
    print(pd.DataFrame(data=spark_parameters, index=[""]).T)

### 6.3. Build a Pipeline


Next, we build the pipeline from the components. Since this notebook is designed to illustrate the evaluation flow, we will string these components together in the following fashion. First, we partition the training data. Next, we train the best model for each partition. Then, we generate a rolling forecast with a step size of 24 (hours) on the test set. Finally, we compute metrics based on the rolling forecast output from the previous step.

In [None]:
@pipeline
def mm_train_inference_components(
    raw_data,
    inference_data,
    automl_config,
    pipeline_parameters,
    spark_parameters={},
    use_spark=USE_SPARK,
):
    # 0. Extract pipeline parameters from the dictionary
    partition_column_names = pipeline_parameters.get("partition_column_names")
    compute_name = pipeline_parameters.get("compute_name")
    max_concurrency_per_instance = pipeline_parameters.get(
        "max_concurrency_per_instance"
    )
    prs_step_timeout = pipeline_parameters.get("prs_step_timeout", 3700)
    instance_count = pipeline_parameters.get("instance_count", 1)
    enable_event_logger = pipeline_parameters.get("enable_event_logger", True)
    retrain_failed_model = pipeline_parameters.get("retrain_failed_model", True)
    forecast_mode = pipeline_parameters.get("forecast_mode", "recursive")
    input_type = pipeline_parameters.get("input_type", "csv")

    # 1. Data partitioning step
    partition_step = partition_component_from_registry(
        raw_data=raw_data,
        partition_column_names=partition_column_names,
        input_type=pipeline_parameters.get("input_type", "csv"),
    )

    if use_spark:
        partition_step.resources = {
            "instance_type": spark_parameters.get("instance_type", "Standard_E4S_V3"),
            "runtime_version": str(spark_parameters.get("runtime_version", "3.2.0")),
        }
        partition_step.conf = {
            "spark.driver.cores": spark_parameters.get("driver_cores", 1),
            "spark.driver.memory": str(spark_parameters.get("driver_memory", "2g")),
            "spark.executor.cores": spark_parameters.get("executor_cores", 2),
            "spark.executor.memory": str(spark_parameters.get("executor_memory", "2g")),
            "spark.executor.instances": spark_parameters.get("executor_instances", 2),
        }
        partition_step.outputs.partitioned_data.mode = "direct"

    # 2. Model training step
    mm_train = train_component_from_registry(
        raw_data=partition_step.outputs.partitioned_data,
        automl_config=automl_config,
        max_concurrency_per_instance=max_concurrency_per_instance,
        prs_step_timeout=prs_step_timeout,
        instance_count=instance_count,
        enable_event_logger=enable_event_logger,
        retrain_failed_model=retrain_failed_model,
        compute_name=compute_name,
    )

    # 3. Inferencing step
    mm_inference = inference_component_from_registry(
        raw_data=inference_data,
        enable_event_logger=enable_event_logger,
        instance_count=instance_count,
        max_concurrency_per_instance=max_concurrency_per_instance,
        prs_step_timeout=prs_step_timeout,
        optional_train_metadata=mm_train.outputs.run_output,
        forecast_mode=forecast_mode,
        compute_name=compute_name,
    )

    # 4. Metrics calculation step
    compute_metrics_node = compute_metrics_component(
        task="tabular-forecasting",
        prediction=mm_inference.outputs.evaluation_data,
        ground_truth=mm_inference.outputs.evaluation_data,
        evaluation_config=mm_inference.outputs.evaluation_configs,
    )
    compute_metrics_node.compute = compute_name

    # 5. Specify pipeline outputs
    return {"output_files": compute_metrics_node.outputs.evaluation_result}
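To illustrate what a rolling forecast with a step size of 24 means in the inferencing step, the sketch below enumerates the forecast windows over a test set. It assumes the forecast origin advances by a fixed step and that each window covers up to one forecast horizon; it is not the inference component's actual implementation, and `rolling_windows` is a hypothetical helper.

```python
def rolling_windows(n_test, horizon, step):
    """Return (start, end) index pairs of the forecast windows over a test set.

    The forecast origin moves forward by `step` observations at a time, and
    each window covers at most `horizon` observations after the origin.
    """
    if horizon < 1 or step < 1:
        raise ValueError("horizon and step must be positive")
    windows = []
    origin = 0
    while origin < n_test:
        windows.append((origin, min(origin + horizon, n_test)))
        origin += step
    return windows


# Three days of hourly test data, a 24-hour horizon, and a 24-hour step.
windows = rolling_windows(n_test=72, horizon=24, step=24)
print(windows)
```

With the step equal to the horizon, every test observation is forecast exactly once, which keeps the evaluation aligned with the 24-hours-ahead objective.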

## 7. Kick Off Pipeline Runs

Now that the pipeline is defined, we will use it to kick off several runs. First, we will kick off an experiment that evaluates the performance of the best AutoML model for each partition. Next, we will kick off a pipeline that uses only the naive model for the same partitions. This will allow us to establish a baseline and compare performance results.

### 7.1. Kick Off Best Many Model Pipeline

In [None]:
pipeline_job = mm_train_inference_components(
    raw_data=Input(type="uri_folder", path="./data/train_small"),
    inference_data=Input(type="uri_folder", path="./data/test_small"),
    automl_config=Input(type="uri_file", path="./automl_settings_mm.json"),
    pipeline_parameters=pipeline_parameters,
)
if not USE_SPARK:
    pipeline_job.settings.default_compute = amlcompute_cluster_name
print(pipeline_job)

In [None]:
import datetime
import uuid

experiment_name = "mm-experiment-" + datetime.datetime.now().strftime("%Y%m%d")

pipeline_submitted_job = ml_client.jobs.create_or_update(
    pipeline_job,
    experiment_name=experiment_name,
    skip_validation=True,
)
ml_client.jobs.stream(pipeline_submitted_job.name)

In [None]:
# # To rehydrate run
# RUN_ID = <Paste the PipelineRunId from the output of the previous cell.>
# pipeline_submitted_job = ml_client.jobs.get(RUN_ID)
# pipeline_submitted_job

### 7.2. Kick Off the Baseline Experiment

To establish a baseline, we will use the same pipeline as before with one minor change: we restrict the allowed model list to the Naive model and change the number of rolling origin cross validations (ROCV) to 2. Reducing the number of ROCV folds speeds up the runtime and has no effect on the accuracy of the model.

In [None]:
baseline_settings = automl_settings.copy()
baseline_settings.update({"allowed_models": ["Naive"], "n_cross_validations": 2})
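As a reminder of what the baseline does, a naive forecast simply carries the last observed value forward over the forecast horizon. The sketch below shows that idea under this assumption; it is not AutoML's implementation, and the `naive_forecast` helper and sample values are hypothetical.

```python
def naive_forecast(history, horizon):
    """Repeat the last observed value for each step of the forecast horizon."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


# Hypothetical hourly usage history for one customer.
history = [5.0, 6.5, 7.0, 6.0]
forecast = naive_forecast(history, horizon=3)
print(forecast)
```

A model that cannot beat this forecast on the test set adds little value, which is why it makes a useful baseline.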

Similarly to what we did in section 6.1, we save the baseline experiment settings as a JSON object.

In [None]:
json_object = json.dumps(baseline_settings)

# Write the baseline settings to automl_settings_base.json
with open("./automl_settings_base.json", "w") as outfile:
    outfile.write(json_object)

In [None]:
pipeline_job_base = mm_train_inference_components(
    raw_data=Input(type="uri_folder", path="./data/train"),
    inference_data=Input(type="uri_folder", path="./data/test"),
    automl_config=Input(type="uri_file", path="./automl_settings_base.json"),
    pipeline_parameters=pipeline_parameters,
)
pipeline_job_base.settings.default_compute = amlcompute_cluster_name
print(pipeline_job_base)

In [None]:
pipeline_submitted_job_base = ml_client.jobs.create_or_update(
    pipeline_job_base,
    experiment_name=experiment_name,
    skip_validation=True,
)
ml_client.jobs.stream(pipeline_submitted_job_base.name)

In [None]:
# # To rehydrate baseline run
# RUN_ID = <Paste the PipelineRunId from the output of the previous cell.>
# pipeline_submitted_job_base = ml_client.jobs.get(RUN_ID)
# pipeline_submitted_job_base

## 8. Download Pipeline Output
Next, we will download the output files generated by the compute metrics component for each experiment and save them in the corresponding subfolder of the `output` folder. First, we create the output directories. Then, we execute the `ml_client.jobs.download` command, which downloads the experiment outputs into the corresponding folders.

In [None]:
# create output directories
mm_output_dir = os.path.join(os.getcwd(), "output/many-models")
base_output_dir = os.path.join(os.getcwd(), "output/base")

os.makedirs(mm_output_dir, exist_ok=True)
os.makedirs(base_output_dir, exist_ok=True)

In [None]:
ml_client.jobs.download(
    pipeline_submitted_job.name, download_path=mm_output_dir, output_name="output_files"
)
ml_client.jobs.download(
    pipeline_submitted_job_base.name,
    download_path=base_output_dir,
    output_name="output_files",
)

## 9. Compare Evaluation Results

### 9.1. Examine Metrics

In this section, we compare metrics for the 2 pipeline runs to quantify accuracy improvement of AutoML over the baseline model.

In [None]:
artifacts_path = os.path.join("named-outputs", "output_files", "evaluationResult")

with open(os.path.join(mm_output_dir, artifacts_path, "metrics.json")) as f:
    metrics_hts_series = json.load(f)

with open(os.path.join(base_output_dir, artifacts_path, "metrics.json")) as f:
    metrics_base_series = json.load(f)

In [None]:
metrics_hts = (
    pd.Series(metrics_hts_series).to_frame(name="score").reset_index(drop=False)
)
metrics_base = (
    pd.Series(metrics_base_series).to_frame(name="score").reset_index(drop=False)
)

metrics_all = metrics_hts.merge(
    metrics_base, on="index", how="inner", suffixes=["_mm", "_base"]
)
metrics_all
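One way to quantify the improvement is the relative reduction of each error metric with respect to the baseline. The sketch below assumes lower scores are better, which holds for error metrics such as normalized RMSE but not for metrics such as `r2_score`. The metric values are made up for illustration and `relative_improvement` is a hypothetical helper.

```python
def relative_improvement(score_mm, score_base):
    """Percentage reduction of an error metric relative to the baseline."""
    if score_base == 0:
        raise ValueError("baseline score is zero; relative improvement is undefined")
    return (score_base - score_mm) / score_base * 100.0


# Made-up scores, for illustration only; these are not results from this notebook.
example_mm = {
    "normalized_root_mean_squared_error": 0.08,
    "normalized_mean_absolute_error": 0.05,
}
example_base = {
    "normalized_root_mean_squared_error": 0.10,
    "normalized_mean_absolute_error": 0.08,
}

improvement = {
    name: relative_improvement(example_mm[name], example_base[name])
    for name in sorted(example_mm.keys() & example_base.keys())
}
for name, pct in improvement.items():
    print(f"{name}: {pct:.1f}% lower than baseline")
```

A positive value means the many models run has a lower error than the baseline for that metric; a negative value means the baseline did better.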

### 9.2. Plot Metrics

- Need Nikolay's changes to be reflected in the component and the environment it uses.

### 9.3. Time Series Plots

- Need Nikolay's changes to be reflected in the component and the environment it uses.

## 10. Deployment

In this section, we will illustrate how to deploy and inference models using a batch endpoint. Batch endpoints are used to do batch inferencing on large volumes of data in an asynchronous way. They receive pointers to data and run jobs asynchronously to process the data in parallel on compute clusters, storing the outputs in a data store for further analysis. For more information on batch endpoints see this [link](https://learn.microsoft.com/en-us/azure/machine-learning/concept-endpoints-batch?view=azureml-api-2).

### 10.1. Create Batch Endpoint

In [None]:
# TODO: Delete once components are in the public registry
import os
from azure.ai.ml.constants._common import AZUREML_PRIVATE_FEATURES_ENV_VAR

os.environ[AZUREML_PRIVATE_FEATURES_ENV_VAR] = "true"

In [None]:
import random
import string

# Creating a unique endpoint name by including a random suffix
allowed_chars = string.ascii_lowercase + string.digits
endpoint_suffix = "".join(random.choice(allowed_chars) for x in range(5))
endpoint_name = "sdk-many-models-" + endpoint_suffix

print(f"Endpoint name: {endpoint_name}\n---")

In [None]:
endpoint = BatchEndpoint(
    name=endpoint_name,
    description="A many models endpoint for component deployments",
    properties={"ComponentDeployment.Enabled": True},
)

The following command creates the endpoint in the workspace using the `MLClient` created earlier. `begin_create_or_update` starts the endpoint creation and returns a poller; calling `.result()` on the poller waits until the creation completes.

In [None]:
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()

### 10.2. Create the Deployment

A deployment is a set of resources required for hosting the model that does the actual inferencing. You do not have to change anything in the following cell.

In [None]:
deployment = BatchDeployment(
    name="sdk-many-models-deployment",
    description="A many models deployment.",
    endpoint_name=endpoint_name,
    compute=amlcompute_cluster_name,
    job_definition=JobDefinition(
        type="pipeline",
        component=inference_component_from_registry.id,
        settings={"compute": amlcompute_cluster_name},
    ),
)

The following command creates the deployment in the workspace using the `MLClient` created earlier. `begin_create_or_update` starts the deployment creation and returns a poller; calling `.result()` on the poller waits until the creation completes.

In [None]:
ml_client.batch_deployments.begin_create_or_update(deployment).result()

Next, we update the default deployment name in the endpoint.

In [None]:
endpoint = ml_client.batch_endpoints.get(endpoint_name)
endpoint.defaults.deployment_name = deployment.name
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()

### 10.3. Invoke the Endpoint

The next cell contains the command that invokes the endpoint for a batch inference job. The `invoke` method takes the `inputs` parameter, which holds the inputs necessary to execute the inference component on the endpoint. To convince yourself this is the case, compare the input parameters for the `inference_component_from_registry` in section 6.3 with the `inputs` we are providing in the next cell. They are identical.

Notice that the `forecast_mode` is set to `"recursive"`. In the evaluation pipeline this component was used to generate a rolling forecast to evaluate model performance on the test set. Here, we are using it to generate a forecast.

In [None]:
job = ml_client.batch_endpoints.invoke(
    endpoint_name=endpoint.name,
    deployment_name=deployment.name,
    inputs={
        "raw_data": Input(type=AssetTypes.URI_FOLDER, path="./data/inference_small"),
        "train_experiment_name": Input(type="string", default=experiment_name),
        "instance_count": Input(type="integer", default=1),
        "max_concurrency_per_instance": Input(type="integer", default=2),
        "compute_name": Input(type="string", default=amlcompute_cluster_name),
        "forecast_mode": Input(type="string", default="recursive"),
        "prs_step_timeout": Input(type="integer", default=3700),
    },
)

Next, we will stream the job output to monitor the execution.

In [None]:
job_name = job.name
batch_job = ml_client.jobs.get(name=job_name)
print(f"Batch job status: {batch_job.status}\n---")
ml_client.jobs.stream(name=job_name)

### 10.4. Download Forecast Output

Next, we download the forecast output and print the first few rows.

In [None]:
ml_client.jobs.download(job_name, download_path=".")

In [None]:
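# Set this to the forecast CSV inside the folder downloaded in the previous cell
output_file = "<PATH_TO_DOWNLOADED_FORECAST_CSV>"
fcst_df = pd.read_csv(output_file, parse_dates=[time_column_name])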
fcst_df.head()