# Image classification distributed training

**Requirements** - In order to benefit from this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace. [Check this notebook for creating a workspace](/sdk/resources/workspace/workspace.ipynb) 
- A Compute Cluster. [Check this notebook to create a compute cluster](/sdk/resources/compute/compute.ipynb)
- A Python environment
- Installed Azure Machine Learning Python SDK v2 - [install instructions](/sdk/README.md#getting-started)

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Connect to your AML workspace from the Python SDK
- Create a `Pipeline` with components

**Motivations** - This sample demonstrates distributed training in Azure Machine Learning. The pipeline uses components to preprocess images on CPU nodes, and then runs a custom MPI component for training on distributed GPU nodes.

# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

## 1.1. Import the required libraries

In [None]:
# import required libraries
from azure.ml import MLClient, dsl
from azure.ml.entities import load_component

## 1.2. Configure credential

We use `DefaultAzureCredential` to get access to the workspace. When an access token is needed, it requests one from multiple identities (`EnvironmentCredential, ManagedIdentityCredential, SharedTokenCacheCredential, VisualStudioCodeCredential, AzureCliCredential, AzurePowerShellCredential`) in turn, and stops as soon as one of them provides a token.
See [here](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) for more information.

`DefaultAzureCredential` should handle most Azure SDK authentication scenarios.
If it does not work for you, see [here](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python) for all available credentials.
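The "try each identity in turn" behaviour described above follows a chain-of-responsibility pattern. The sketch below illustrates only that pattern in plain Python. `StaticCredential`, `FailingCredential`, `ChainedCredential` and `CredentialUnavailable` are hypothetical stand-ins, not classes from `azure.identity`. For simplicity the tokens are plain strings, whereas the real library returns an access-token object.

```python
from typing import Optional


class CredentialUnavailable(Exception):
    """Raised by a (hypothetical) credential that cannot produce a token."""


class StaticCredential:
    """Hypothetical credential that always returns a fixed token."""

    def __init__(self, name: str, token: str) -> None:
        self.name = name
        self.token = token

    def get_token(self, scope: str) -> str:
        return self.token


class FailingCredential:
    """Hypothetical credential that is not configured in this environment."""

    def __init__(self, name: str) -> None:
        self.name = name

    def get_token(self, scope: str) -> str:
        raise CredentialUnavailable(f"{self.name} is not available")


class ChainedCredential:
    """Tries each credential in order and returns the first token obtained."""

    def __init__(self, credentials: list) -> None:
        self.credentials = credentials
        self.used: Optional[str] = None  # name of the credential that succeeded

    def get_token(self, scope: str) -> str:
        errors = []
        for credential in self.credentials:
            try:
                token = credential.get_token(scope)
            except CredentialUnavailable as err:
                errors.append(str(err))  # record the failure, then try the next one
                continue
            self.used = credential.name
            return token
        # Every credential failed: report all of the reasons together.
        raise CredentialUnavailable("; ".join(errors))


chain = ChainedCredential([
    FailingCredential("EnvironmentCredential"),
    FailingCredential("ManagedIdentityCredential"),
    StaticCredential("AzureCliCredential", "fake-token"),
])
token = chain.get_token("https://management.azure.com/.default")
print(chain.used, token)  # AzureCliCredential fake-token
```

In this model, passing an `exclude_*_credential=True` flag to the real class corresponds to removing that entry from the list before the chain runs.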

In [None]:
from azure.identity import DefaultAzureCredential

try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token('https://management.azure.com/.default')
except Exception as ex:
    # If retrieving a token fails, exclude the failing credential and try again.
    # For example, to exclude the VS Code credential:
    # credential = DefaultAzureCredential(exclude_visual_studio_code_credential=True)
    raise Exception("Failed to retrieve a token from the included credentials due to the following exception. Try adding `exclude_xxx_credential=True` to `DefaultAzureCredential` and try again.") from ex

## 1.3. Configure workspace details and get a handle to the workspace

To connect to a workspace, we need identifier parameters - a subscription, resource group and workspace name. We will use these details in the `MLClient` from `azure.ml` to get a handle to the required Azure Machine Learning workspace.
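The fallback branch in the next cell writes these three identifiers to a `config.json` file and reloads it. The sketch below shows the shape of that file and a simple placeholder check, using only the standard library. `write_workspace_config` and `load_workspace_config` are hypothetical helpers, and the identifier values are made up. The check mirrors the notebook's `startswith('<')` test.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def write_workspace_config(config: dict, path: str) -> None:
    """Write the workspace identifiers to a JSON file, creating parent folders."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "w") as fo:
        json.dump(config, fo, indent=2)


def load_workspace_config(path: str) -> dict:
    """Read the identifiers back and reject missing keys or '<...>' placeholders."""
    with open(path) as fi:
        config = json.load(fi)
    for key in REQUIRED_KEYS:
        value = config.get(key, "")
        if not value or value.startswith("<"):
            raise ValueError(f"Please set '{key}' in {path}")
    return config


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, ".azureml", "config.json")
    write_workspace_config(
        {
            "subscription_id": "00000000-0000-0000-0000-000000000000",  # made-up value
            "resource_group": "my-rg",  # made-up value
            "workspace_name": "my-workspace",  # made-up value
        },
        path,
    )
    loaded = load_workspace_config(path)
    print(loaded["workspace_name"])  # my-workspace
```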

In [None]:
try:
    ml_client = MLClient.from_config(credential=credential)
except Exception as ex:
    # NOTE: Update the following workspace information if it has not been configured before
    client_config = {
        "subscription_id": "<SUBSCRIPTION_ID>",
        "resource_group": "<RESOURCE_GROUP>",
        "workspace_name": "<WORKSPACE_NAME>"
    }

    if client_config["subscription_id"].startswith('<'):
        print("Please update <SUBSCRIPTION_ID>, <RESOURCE_GROUP> and <WORKSPACE_NAME> in this notebook cell")
        raise ex
    else:  # write and reload from config file
        import json, os
        config_path = "../../.azureml/config.json"
        os.makedirs(os.path.dirname(config_path), exist_ok=True)
        with open(config_path, "w") as fo:
            fo.write(json.dumps(client_config))
        ml_client = MLClient.from_config(credential=credential, path=config_path)
print(ml_client)

## 1.4. Retrieve or create an Azure Machine Learning compute target
Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. The next cell creates two compute clusters in the current workspace, one for CPU and one for GPU, if they don't already exist. The pipeline then runs on these compute targets.

If no compute target with a given name is found, a new one is created. This process is broken down into the following steps:

1. Create the configuration
2. Create the Azure Machine Learning compute

**This process takes a few minutes and produces only sparse output. Please wait until the call returns before moving to the next cell.**
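The next cell follows a common get-or-create pattern: it looks a compute target up by name and creates it only when the lookup fails. The following in-memory sketch shows that pattern. `FakeComputeRegistry` and `ComputeSpec` are hypothetical stand-ins for `ml_client.compute` and `AmlCompute`, and the sketch does not call Azure.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ComputeSpec:
    name: str
    size: str
    min_instances: int = 0  # 0 lets an idle cluster scale down to no nodes
    max_instances: int = 4


class FakeComputeRegistry:
    """Hypothetical in-memory stand-in for a workspace's compute targets."""

    def __init__(self) -> None:
        self._targets: Dict[str, ComputeSpec] = {}
        self.create_calls = 0

    def get(self, name: str) -> ComputeSpec:
        if name not in self._targets:
            raise KeyError(f"compute '{name}' not found")
        return self._targets[name]

    def create(self, spec: ComputeSpec) -> ComputeSpec:
        self.create_calls += 1
        self._targets[spec.name] = spec
        return spec


def get_or_create(registry: FakeComputeRegistry, spec: ComputeSpec) -> ComputeSpec:
    """Return the existing target named spec.name, creating it only if it is missing."""
    try:
        return registry.get(spec.name)
    except KeyError:  # only the "not found" case triggers creation
        print(f"Creating a new compute target '{spec.name}'...")
        return registry.create(spec)


registry = FakeComputeRegistry()
cpu = ComputeSpec(name="cpu-cluster", size="STANDARD_D2_V2")
get_or_create(registry, cpu)
get_or_create(registry, cpu)  # the second call finds the existing target
print(registry.create_calls)  # 1
```

One design choice differs from the notebook. The sketch catches only the "not found" error, whereas the notebook catches every `Exception`. A narrow catch avoids attempting creation after an unrelated failure, such as an authentication error.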

In [None]:
from azure.ml.entities import AmlCompute

# Names of the AML compute clusters
gpu_compute_target = 'gpu-cluster'
cpu_compute_target = 'cpu-cluster'

try:
    ml_client.compute.get(cpu_compute_target)
except Exception:
    print('Creating a new cpu compute target...')
    compute = AmlCompute(name=cpu_compute_target, size="STANDARD_D2_V2", min_instances=0, max_instances=4)
    ml_client.compute.begin_create_or_update(compute)

try:
    ml_client.compute.get(gpu_compute_target)
except Exception:
    print('Creating a new gpu compute target...')
    compute = AmlCompute(name=gpu_compute_target, size="STANDARD_NC6", min_instances=0, max_instances=4)
    ml_client.compute.begin_create_or_update(compute)

## 1.5. Prepare the dataset
This dataset is a subset of the [Open Images Dataset](https://storage.googleapis.com/openimages/web/index.html).
- Training dataset: 24 images (3 categories × 8 images per category)
- Validation dataset: 6 images (3 categories × 2 images per category)

This is an extremely small dataset, intended only for demonstration in this notebook. Use a larger dataset to train models for production.
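Image-classification data is often organized as one subfolder per category. The sketch below builds a throwaway folder tree of empty placeholder files, then counts the images in each category to confirm the 3 × 8 = 24 and 3 × 2 = 6 totals listed above. Both the one-folder-per-class layout and the category names `class_a`, `class_b` and `class_c` are assumptions made for illustration. Check the contents of `./data` for the actual structure.

```python
import os
import tempfile
from collections import Counter

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root: str) -> Counter:
    """Count image files in each immediate subfolder of root (one folder per class)."""
    counts = Counter()
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir):
            continue
        for file_name in os.listdir(class_dir):
            if os.path.splitext(file_name)[1].lower() in IMAGE_EXTENSIONS:
                counts[class_name] += 1
    return counts


def make_fake_split(root: str, classes, per_class: int) -> None:
    """Create empty placeholder .jpg files as stand-ins for real images."""
    for class_name in classes:
        class_dir = os.path.join(root, class_name)
        os.makedirs(class_dir, exist_ok=True)
        for i in range(per_class):
            open(os.path.join(class_dir, f"{i}.jpg"), "w").close()


classes = ["class_a", "class_b", "class_c"]  # hypothetical category names
with tempfile.TemporaryDirectory() as tmp:
    make_fake_split(os.path.join(tmp, "train"), classes, per_class=8)
    make_fake_split(os.path.join(tmp, "val"), classes, per_class=2)
    train_counts = count_images_per_class(os.path.join(tmp, "train"))
    val_counts = count_images_per_class(os.path.join(tmp, "val"))

print(sum(train_counts.values()), sum(val_counts.values()))  # 24 6
```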

NOTE: Zip files are used here to avoid the performance issues of mounting a file dataset with many subfolders.
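The standard library can pack an image folder, including its per-class subfolders, into a single archive. This minimal sketch uses hypothetical local paths. The `convert_to_image_directory` component's YAML defines how that component consumes the archive, and that part is not shown here.

```python
import os
import shutil
import tempfile
import zipfile


def zip_image_folder(src_dir: str, archive_base: str) -> str:
    """Pack src_dir (including its class subfolders) into archive_base + '.zip'."""
    return shutil.make_archive(archive_base, "zip", root_dir=src_dir)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "train")
    for class_name in ("class_a", "class_b"):  # hypothetical class folders
        os.makedirs(os.path.join(src, class_name))
        for i in range(3):
            open(os.path.join(src, class_name, f"{i}.jpg"), "w").close()

    archive = zip_image_folder(src, os.path.join(tmp, "train_images"))
    with zipfile.ZipFile(archive) as zf:
        # Keep only file entries; make_archive also records directory entries.
        files = sorted(n for n in zf.namelist() if not n.endswith("/"))

print(len(files), files[0])  # 6 class_a/0.jpg
```

Uploading one archive of this kind means a single file must be transferred and mounted, instead of many small files spread across nested folders.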

In [None]:
from azure.ml.entities import JobInput
from azure.ml._constants import AssetTypes


train_image_dataset = JobInput(name='TrainData', type=AssetTypes.URI_FOLDER, path="./data/train")
val_image_dataset = JobInput(name='ValidData', type=AssetTypes.URI_FOLDER, path="./data/val")

# 2. Define command components via YAML
The cell below loads each command component from its YAML definition.
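A command component's YAML usually declares typed inputs and outputs, plus a command line containing placeholders that Azure Machine Learning fills in at run time. The sketch below imitates that substitution in plain Python. It assumes the `${{inputs.<name>}}` / `${{outputs.<name>}}` placeholder syntax of current v2 command components, but the YAML files in this sample may use a different spec format. `render_command` and `convert.py` are hypothetical names.

```python
import re
from typing import Dict

# Matches ${{inputs.name}} or ${{outputs.name}}
PLACEHOLDER = re.compile(r"\$\{\{(inputs|outputs)\.(\w+)\}\}")


def render_command(template: str, inputs: Dict[str, str], outputs: Dict[str, str]) -> str:
    """Replace input/output placeholders with concrete values; fail on unknown names."""
    sources = {"inputs": inputs, "outputs": outputs}

    def substitute(match):
        kind, name = match.group(1), match.group(2)
        if name not in sources[kind]:
            raise KeyError(f"no value bound for {kind}.{name}")
        return str(sources[kind][name])

    return PLACEHOLDER.sub(substitute, template)


template = (
    "python convert.py --input_path ${{inputs.input_path}} "
    "--output_path ${{outputs.output_path}}"
)
print(render_command(template, {"input_path": "/mnt/train"}, {"output_path": "/mnt/out"}))
# python convert.py --input_path /mnt/train --output_path /mnt/out
```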


In [None]:

apply_transform = load_component(yaml_file="./apply_image_transformation/apply_image_transformation.yaml")
convert = load_component(yaml_file="./convert_to_image_directory/convert_to_image_directory.yaml")
init_transform = load_component(yaml_file="./init_image_transformation/init_image_transformation.yaml")

# This training component is an MPI component.
imagecnn_train = load_component(yaml_file='./imagecnn_train/entry.spec.yaml')
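Because the training component runs under MPI, every process trains on its own mini-batches, so the number of samples per optimizer step grows with the number of processes. The arithmetic below is a sketch that relies on the following assumptions:

- `batch_size` is a per-process (per-GPU) value.
- The process count equals `instance_count` × processes per node. Using one process per node assumes one GPU per node; `STANDARD_NC6` is a single-GPU VM size.
- `optimizer_batch_size=-1` means "no gradient accumulation". This matches NVIDIA's DeepLearningExamples ResNet-50 PyTorch script, whose argument names this component's arguments resemble, but this notebook does not confirm it.

```python
def world_size(instance_count: int, processes_per_node: int) -> int:
    """Total number of MPI processes (ranks) across all nodes."""
    return instance_count * processes_per_node


def accumulation_steps(batch_size: int, optimizer_batch_size: int, ranks: int) -> int:
    """Mini-batches accumulated per optimizer step (assumed semantics, see lead-in)."""
    if optimizer_batch_size < 0:  # assumed: -1 disables gradient accumulation
        return 1
    per_step = batch_size * ranks
    if optimizer_batch_size % per_step != 0:
        raise ValueError("optimizer_batch_size must be a multiple of batch_size * ranks")
    return optimizer_batch_size // per_step


def samples_per_optimizer_step(batch_size: int, optimizer_batch_size: int, ranks: int) -> int:
    """Samples contributing to one weight update across all ranks."""
    return batch_size * ranks * accumulation_steps(batch_size, optimizer_batch_size, ranks)


# Values used by the pipeline below: batch_size=16, optimizer_batch_size=-1, instance_count=1.
ranks = world_size(instance_count=1, processes_per_node=1)  # processes_per_node is assumed
print(samples_per_optimizer_step(16, -1, ranks))  # 16
print(samples_per_optimizer_step(16, -1, world_size(4, 1)))  # 64
```

Under these assumptions, scaling `instance_count` up to the cluster's `max_instances=4` would quadruple the samples per step. In that case the learning rate often needs to be revisited.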

# 3. Basic pipeline job

## 3.1. Build pipeline

In [None]:

# define pipeline
@dsl.pipeline(
    name='image_classification_with_densenet',
    description='E2E image classification pipeline with densenet',
    default_compute=cpu_compute_target,
)
def image_classification_with_densenet():
    convert_train = convert(input_path=train_image_dataset)

    convert_val = convert(input_path=val_image_dataset)

    init_image_transformation = init_transform(
        resize='False',
        size=256,
        center_crop='False',
        crop_size=224,
        pad='False',
        padding=0,
        color_jitter='False',
        grayscale='False',
        random_resized_crop='False',
        random_resized_crop_size=256,
        random_crop='False',
        random_crop_size=224,
        random_horizontal_flip='True',
        random_vertical_flip='False',
        random_rotation='False',
        random_rotation_degrees=0,
        random_affine='False',
        random_affine_degrees=0,
        random_grayscale='False',
        random_perspective='False',
    )

    apply_trans_on_train = apply_transform(
        mode='For training',
        input_image_transform_path=init_image_transformation.outputs.output_path,
        input_image_dir_path=convert_train.outputs.output_path,
    )

    apply_trans_on_val = apply_transform(
        mode='For inference',
        input_image_transform_path=init_image_transformation.outputs.output_path,
        input_image_dir_path=convert_val.outputs.output_path,
    )

    imagecnn_train_gpu = imagecnn_train(
        train_data=apply_trans_on_train.outputs.output_path,
        valid_data=apply_trans_on_val.outputs.output_path,
        data_backend='pytorch',
        arch='resnet50',
        model_config='classic',
        workers=5,
        epochs=4,
        batch_size=16,
        optimizer_batch_size=-1,
        lr=0.1,
        lr_schedule='step',
        warmup=0,
        label_smoothing=0.0,
        mixup=0.0,
        momentum=0.9,
        weight_decay=0.0001,
        print_freq=10,
        resume='',
        pretrained_weights='',
        static_loss_scale=1.0,
        prof=-1,
        seed=123,
        raport_file='experiment_raport.json',
        save_checkpoint_epochs=2,
    )
    imagecnn_train_gpu.compute = gpu_compute_target
    imagecnn_train_gpu.resources.instance_count = 1

# create a pipeline
pipeline = image_classification_with_densenet()

In [None]:
print(pipeline)
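The pipeline above connects each step's outputs to the next step's inputs, and those data dependencies determine the execution order. The following sketch derives that order with the standard library's `graphlib` (Python 3.9+). The step names match the variables in `image_classification_with_densenet`, but the dependency dictionary is written by hand here rather than extracted from the pipeline object.

```python
from graphlib import TopologicalSorter

# step -> steps whose outputs it consumes (hand-copied from the pipeline definition)
dependencies = {
    "convert_train": set(),
    "convert_val": set(),
    "init_image_transformation": set(),
    "apply_trans_on_train": {"init_image_transformation", "convert_train"},
    "apply_trans_on_val": {"init_image_transformation", "convert_val"},
    "imagecnn_train_gpu": {"apply_trans_on_train", "apply_trans_on_val"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()
waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # steps whose inputs are all available
    waves.append(ready)
    sorter.done(*ready)

for i, wave in enumerate(waves):
    print(i, wave)
# 0 ['convert_train', 'convert_val', 'init_image_transformation']
# 1 ['apply_trans_on_train', 'apply_trans_on_val']
# 2 ['imagecnn_train_gpu']
```

The steps within one wave do not depend on each other. This means they can in principle run concurrently on `cpu-cluster`, subject to its node limit. Only the last wave needs the GPU cluster.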

## 3.2. Submit pipeline job

In [None]:
pipeline_job = ml_client.jobs.create_or_update(pipeline, experiment_name='pipeline_samples')
pipeline_job

In [None]:
# Wait until the job completes
ml_client.jobs.stream(pipeline_job.name)

# Next Steps
You can find further examples of running pipeline jobs [here](/sdk/jobs/pipelines/).