## Setup

For this tutorial, you can re-use the environment defined in the [previous tutorial](../1_mnist-pytorch-lit/mnist-pytorch-lit.ipynb).

## Prerequisites
1. An Azure subscription. If you don't have one, you can [create a free account](https://aka.ms/AMLFree).
2. An active Azure Resource Group. Create one [here](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/manage-resource-groups-portal) or re-use an existing resource group.
3. A working Azure ML Workspace. Create one [here](https://learn.microsoft.com/en-us/azure/machine-learning/concept-workspace#create-a-workspace) or re-use an existing workspace.
4. **A subscription quota of at least 20 cores**. Follow the instructions [here](https://learn.microsoft.com/en-us/azure/quotas/per-vm-quota-requests) to increase your quota.
5. A CPU compute cluster.
6. A GPU compute cluster.

## Introduction

In the previous tutorial, we trained an MNIST classifier using the Azure ML Python SDK. One bottleneck was the wait time between jobs: we had to wait for the data upload to finish before we could train the classifier. Azure can automate this hand-off using pipelines. A pipeline chains the jobs as steps, and each step can run on its own compute cluster. For instance, we can re-use our CPU cluster to upload the data, train the model in a distributed manner on a multi-GPU cluster, and finally test it on the less expensive CPU cluster.
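To make the idea of dependent steps concrete, here is a minimal sketch in plain Python that does not call Azure at all. The step names and the cluster names are placeholders chosen for illustration. It only shows how a dependency graph determines the order in which steps can run, which is roughly what a pipeline scheduler has to work out.

```python
from graphlib import TopologicalSorter

# Hypothetical step names; each step maps to the set of steps it depends on.
dependencies = {
    "get_data": set(),
    "train_model": {"get_data"},
    "eval_model": {"get_data", "train_model"},
}

# Placeholder compute assignment mirroring the CPU/GPU split described above.
compute_for_step = {
    "get_data": "cpu-cluster",
    "train_model": "gpu-cluster",
    "eval_model": "cpu-cluster",
}

# static_order() yields a step only after all of its dependencies.
execution_order = list(TopologicalSorter(dependencies).static_order())

for step in execution_order:
    print(f"{step} -> {compute_for_step[step]}")
```

Because evaluation depends on both the data and the trained model, it can only run last, regardless of the order in which the steps are declared.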

## 1. Connect to your Azure Workspace

Switching between workspaces or resource groups during an active session can cause authentication issues. The following cell attempts to obtain a token using the default credential and falls back to browser-based login if that fails.

In [None]:
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml import MLClient

try:
    cred = DefaultAzureCredential()
    # Check that the credential can actually obtain a token
    cred.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to interactive browser authentication
    cred = InteractiveBrowserCredential()

In the previous example, we used a dotenv file to store and load sensitive information. Alternatively, we can create a JSON configuration file at a predefined path that the SDK loads automatically. This saves us from installing unnecessary libraries. The following cell copies the example configuration file. You can then replace the default values with your own.

In [None]:
!cp .azureml/config.example.json .azureml/config.json
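
As a rough guide to what the configuration file holds, the sketch below writes one to a temporary directory and checks that no field is empty. It assumes the three keys found in a workspace `config.json` (`subscription_id`, `resource_group`, `workspace_name`). The values are placeholders, and `missing_keys` is a hypothetical helper, not part of the SDK.

```python
import json
import tempfile
from pathlib import Path

# Placeholder values only; substitute your own identifiers.
config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}

# Assumed set of keys expected in the workspace configuration file.
REQUIRED_KEYS = {"subscription_id", "resource_group", "workspace_name"}


def missing_keys(path: Path) -> set:
    """Return the required keys that are absent or empty in the JSON file."""
    loaded = json.loads(path.read_text(encoding="utf-8"))
    return {key for key in REQUIRED_KEYS if not loaded.get(key)}


# Write to a temporary directory so the sketch never touches a real config.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / ".azureml" / "config.json"
    config_path.parent.mkdir(parents=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    missing = missing_keys(config_path)

print(missing)
```

Running a check like this against your real `.azureml/config.json` before connecting can catch a forgotten field early.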

Connect to your workspace and load the compute clusters. We assume the CPU and GPU clusters were already created.

In [None]:
ml_client = MLClient.from_config(
    credential=cred,
)
cpu_cluster_name = "cpu-cluster"
gpu_cluster_name = "gpu-cluster"

# raises an error if either compute cluster does not exist
print(f"CPU Cluster: {ml_client.compute.get(cpu_cluster_name)}")
print(f"GPU Cluster: {ml_client.compute.get(gpu_cluster_name)}")

## 2. Download the data

Run the [download script](get-data/download.ps1) in PowerShell to download a copy of CIFAR-10.

## 3. Define command components in YAML

The Azure ML SDK provides the utility function `load_component`, which parses a YAML job configuration into a function that can be called as a pipeline step.

In [None]:
from azure.ai.ml import load_component

parent_dir = "."
get_data_func = load_component(source=parent_dir + "/get-data.job.yaml")
train_model_func = load_component(source=parent_dir + "/train-model.job.yaml")
eval_model_func = load_component(source=parent_dir + "/eval-model.job.yaml")
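
Since the components are loaded from files on disk, it can help to confirm the files exist before loading them. The following is a minimal sketch using only the standard library. `find_missing_components` is a hypothetical helper, and the file names are the ones used in the cell above.

```python
from pathlib import Path

# File names taken from the cell above.
COMPONENT_FILES = ["get-data.job.yaml", "train-model.job.yaml", "eval-model.job.yaml"]


def find_missing_components(parent_dir: str) -> list:
    """Return the component YAML files that are not present in parent_dir."""
    base = Path(parent_dir)
    return [name for name in COMPONENT_FILES if not (base / name).is_file()]


# Assumes the notebook is run from the folder that holds the YAML files.
missing_files = find_missing_components(".")
if missing_files:
    print(f"Missing component files: {missing_files}")
else:
    print("All component files found.")
```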

## 4. Build and launch the pipeline

In [None]:
from azure.ai.ml.dsl import pipeline
from azure.ai.ml import Input

@pipeline()
def train_cifar_pytorch():
    # define the data job
    data_job = get_data_func(
        cifar_tar=Input(
            path="wasbs://datasets@azuremlexamples.blob.core.windows.net/cifar-10-python.tar.gz",
            type="uri_file",
        )
    )
    data_job.outputs.cifar.mode = "upload"

    # define the training job
    training_job = train_model_func(
        max_epochs=5,
        path_to_data=data_job.outputs.cifar,
        num_workers=4,
        batch_size=16,
    )
    # run training on the GPU cluster instead of the pipeline default
    training_job.compute = gpu_cluster_name
    training_job.outputs.path_to_model.mode = "upload"

    # define the evaluation job
    eval_job = eval_model_func(
        cifar=data_job.outputs.cifar,
        path_to_model=training_job.outputs.path_to_model
    )

pipeline_job = train_cifar_pytorch()
# steps without an explicit compute target run on the CPU cluster
pipeline_job.settings.default_compute = cpu_cluster_name
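
The pipeline above sets a compute target only on the training step and relies on the default for the others. The sketch below models that fallback rule in plain Python. It is an illustration of the behavior, not the SDK's implementation, and `resolve_compute` is a hypothetical helper.

```python
from typing import Dict, Optional


def resolve_compute(
    step_compute: Dict[str, Optional[str]], default_compute: str
) -> Dict[str, str]:
    """Map each step to its own compute target, or to the pipeline default."""
    return {
        step: compute or default_compute
        for step, compute in step_compute.items()
    }


# Mirrors the pipeline above: only the training step sets its own compute.
steps = {"data_job": None, "training_job": "gpu-cluster", "eval_job": None}
resolved = resolve_compute(steps, default_compute="cpu-cluster")

for step, compute in resolved.items():
    print(f"{step}: {compute}")
```

With this split, the GPU cluster is used only for the training step, while the data and evaluation steps use the CPU cluster.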

Submit the pipeline job to the workspace.

In [None]:
experiment_name = "pipeline_cifar10_pytorch"

# submit the pipeline job
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job,
    experiment_name=experiment_name
)
pipeline_job

In [None]:
# Wait until the job completes
ml_client.jobs.stream(pipeline_job.name)
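
If you would rather poll the job status than stream the logs, a generic polling loop is one option. The sketch below is standalone and uses a stand-in status source so it runs without a workspace. `wait_for_job` is a hypothetical helper, and the set of terminal states is an assumption that you should check against the status values your SDK version reports.

```python
import time
from typing import Callable

# Assumed terminal states; verify against the statuses your jobs report.
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_for_job(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
) -> str:
    """Poll get_status until a terminal state is seen or max_polls is reached."""
    status = get_status()
    for _ in range(max_polls - 1):
        if status in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)
        status = get_status()
    return status


# Stand-in for a real status lookup so the sketch runs without a workspace.
fake_statuses = iter(["Queued", "Running", "Running", "Completed"])
final_status = wait_for_job(lambda: next(fake_statuses), poll_seconds=0.0)
print(final_status)
```

Against a real workspace, `get_status` could be something like `lambda: ml_client.jobs.get(pipeline_job.name).status`. The `max_polls` cap keeps the loop from waiting forever if the job never reaches a terminal state.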

![Expected output](media/pipeline_cifar10_pytorch_expected.png)

> **Important**:
> Your default subscription might be limited to 6 vCPUs, which is below the 20 cores listed in the prerequisites. To increase your quota, follow the official instructions [here](https://learn.microsoft.com/en-us/azure/quotas/per-vm-quota-requests).