In [1]:
from azureml.core import Workspace
subscription_id = '<subscription_id>'
resource_group = '<resource_group>'

ws = Workspace(
    workspace_name='WORKSPACE-NAME',
    subscription_id=subscription_id,
    resource_group=resource_group,
)

Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.scriptrun = azureml.core.script_run:ScriptRun._from_run_dto with exception The 'msrestazure<=0.6.4,>=0.4.33' distribution was not found and is required by the application.


## Create AKS cluster

Before proceeding, please provision an AKS cluster within the same subscription as the Azure ML Workspace. It does not need to be in the same Resource Group.

## Prepare the cluster for Azure ML usage

Before we attach it to the workspace, we need to set up the cluster for use with Azure ML training. This means enabling the Azure ML Extension on AKS.
See [these docs](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-attach-arc-kubernetes?tabs=studio#aks-prerequisites) for more details.

In [None]:
!az provider register --namespace Microsoft.KubernetesConfiguration
!az provider register --namespace Microsoft.ContainerService

Provider registration can take several minutes, so allow about 10 minutes before moving on. You can track progress with these commands:
```
az provider show -n Microsoft.KubernetesConfiguration -o table
az provider show -n Microsoft.ContainerService -o table
```
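Instead of waiting a fixed 10 minutes, the check can be polled. The following is a minimal sketch, assuming the `az` CLI is on the PATH and already logged in. The helper names `az_registration_state` and `wait_for_registration` are hypothetical, and the poll interval and timeout are arbitrary defaults.

```python
import subprocess
import time


def az_registration_state(namespace):
    """Return the registrationState that `az provider show` reports for a namespace."""
    result = subprocess.run(
        ["az", "provider", "show", "-n", namespace,
         "--query", "registrationState", "-o", "tsv"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_for_registration(namespace, get_state=az_registration_state,
                          poll_seconds=30, timeout_seconds=900, sleep=time.sleep):
    """Poll until the provider reports 'Registered'; return False on timeout."""
    waited = 0
    while True:
        state = get_state(namespace)
        if state == "Registered":
            return True
        if waited >= timeout_seconds:
            return False
        # Not registered yet: pause before asking again.
        sleep(poll_seconds)
        waited += poll_seconds
```

Calling `wait_for_registration("Microsoft.KubernetesConfiguration")` and then `wait_for_registration("Microsoft.ContainerService")` blocks until both providers are ready or the timeout elapses. The `get_state` and `sleep` parameters exist only so the loop can be exercised without calling Azure.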

In [2]:
!az feature register --namespace "Microsoft.ContainerService" --name "AKS-ExtensionManager"

Run the provider registration commands again to make sure the registration has refreshed.

In [None]:
!az provider register --namespace Microsoft.KubernetesConfiguration
!az provider register --namespace Microsoft.ContainerService

In [None]:
!az k8s-extension create --name arcml-extension --extension-type Microsoft.AzureML.Kubernetes --config enableTraining=True --cluster-type managedClusters --cluster-name merlionaks --resource-group rg-merlion-feature-store-project --scope cluster --auto-upgrade-minor-version False
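The command above hard-codes the cluster name and resource group. A minimal sketch of building the same argument list from variables follows, so it can be reused for other clusters. The function name `k8s_extension_create_args` is hypothetical, the flags are the ones used in the cell above, and the cluster type is passed explicitly rather than assumed.

```python
import shlex


def k8s_extension_create_args(cluster_name, resource_group, cluster_type,
                              extension_name="arcml-extension"):
    """Build the `az k8s-extension create` argument list used in this notebook."""
    return [
        "az", "k8s-extension", "create",
        "--name", extension_name,
        "--extension-type", "Microsoft.AzureML.Kubernetes",
        "--config", "enableTraining=True",   # enable Azure ML training workloads
        "--cluster-type", cluster_type,
        "--cluster-name", cluster_name,
        "--resource-group", resource_group,
        "--scope", "cluster",
        "--auto-upgrade-minor-version", "False",
    ]


def as_shell_command(args):
    """Render the argument list as a single shell-quoted string for review."""
    return shlex.join(args)
```

The list can be passed to `subprocess.run(args, check=True)`, or printed with `as_shell_command(args)` and reviewed before it is run.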

## Attach AKS cluster to Azure ML

Now we can attach the AKS cluster to the workspace and use it for training as a Compute Target. This step uses the experimental KubernetesCompute class.

In [3]:
from azureml.core.compute import KubernetesCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

# Name the compute target will have inside the workspace
cluster_name = 'merlionaks'
# ARM resource ID of the AKS cluster to attach
resource_id = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}/providers/Microsoft.ContainerService/managedClusters/<AKS-CLUSTER-NAME>"
# Reuse the compute target if it is already attached to the workspace
try:
    aks_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    # Not attached yet: attach the AKS cluster by its resource ID,
    # running jobs in the given Kubernetes namespace
    attach_config = KubernetesCompute.attach_configuration(
        resource_id=resource_id,
        namespace="default",
        )
    aks_target = ComputeTarget.attach(ws, cluster_name, attach_config)


aks_target.wait_for_completion(show_output=True)

Class KubernetesCompute: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.


Found existing cluster, use it.

Final state of "Succeeded" has been reached
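The `resource_id` used above follows the standard ARM pattern for an AKS managed cluster. The sketch below builds that ID from its parts and rejects values that still look like unfilled template placeholders such as `<AKS-CLUSTER-NAME>`. The helper name `aks_resource_id` is hypothetical and not part of the Azure ML SDK.

```python
def aks_resource_id(subscription_id, resource_group, cluster_name):
    """Build the ARM resource ID of an AKS managed cluster."""
    parts = (
        ("subscription_id", subscription_id),
        ("resource_group", resource_group),
        ("cluster_name", cluster_name),
    )
    for label, value in parts:
        # Catch empty values and placeholders like '<resource_group>'.
        if not value or "<" in value or ">" in value:
            raise ValueError(f"{label} looks like an unfilled placeholder: {value!r}")
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.ContainerService"
        f"/managedClusters/{cluster_name}"
    )
```

The returned string can be passed as `resource_id` to `KubernetesCompute.attach_configuration`, which makes a forgotten placeholder fail early with a clear message.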



# Create a standard Compute Cluster

Alongside our AKS cluster, now we'll need to create a standard Azure ML Compute Cluster.
These clusters are auto-scaling VM instances that are managed entirely by Azure ML.

In [2]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
cpu_cluster_name = "merlion-cpu"

# Verify that cluster does not exist already
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    # To use a different region for the compute, add a location='<region>' parameter
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS3_V2',
                                                           max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
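Both compute cells above use the same "look it up, create it if missing" pattern. A minimal sketch of that pattern as a reusable helper is shown below. The name `get_or_create` is hypothetical, and the comments describe how it might be wired to the SDK calls already used in this notebook.

```python
def get_or_create(lookup, create, not_found):
    """Return (resource, created): try lookup(), fall back to create().

    lookup    -- zero-argument callable that returns the existing resource
    create    -- zero-argument callable that provisions a new resource
    not_found -- exception type that lookup raises when nothing exists
    """
    try:
        return lookup(), False
    except not_found:
        return create(), True


# With the SDK objects from the cells above, a call might look like:
#   cpu_cluster, created = get_or_create(
#       lambda: ComputeTarget(workspace=ws, name=cpu_cluster_name),
#       lambda: ComputeTarget.create(ws, cpu_cluster_name, compute_config),
#       ComputeTargetException,
#   )
```

Returning the `created` flag lets the caller decide whether to print a message or wait for provisioning.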


# Perform training on AKS or Compute Cluster

One of the beauties of the Azure ML SDK is that moving a training job from one compute type to another (in our example, from AKS to AML Compute) is as simple as choosing a different compute target for the training job.

This notebook follows the example in the [official docs](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets).

In [4]:
from azureml.core import Experiment

experiment_name = 'aks_vs_amlcompute'
experiment = Experiment(workspace=ws, name=experiment_name)

In [5]:
# We will use a curated environment, which is a Microsoft-managed Docker image
from azureml.core import Environment

myenv = Environment.get(workspace=ws, name="AzureML-Tutorial")

In [6]:
from azureml.core import ScriptRunConfig

# Train on AKS
aks_src = ScriptRunConfig(source_directory='../scripts',
                          script='train.py',
                          compute_target=aks_target,
                          environment=myenv)

In [7]:
run = experiment.submit(aks_src)

In [8]:
run.wait_for_completion(show_output=True)

RunId: aks_vs_amlcompute_1646355381_203c4099
Web View: https://ml.azure.com/runs/aks_vs_amlcompute_1646355381_203c4099?wsid=/subscriptions/12f4bdb4-aa23-4f3d-bff0-7eec97b0443f/resourcegroups/rg-merlion-feature-store-project/workspaces/merlion-feature-store-workspace&tid=72f988bf-86f1-41af-91ab-2d7cd011db47

Streaming azureml-logs/75_job_post-tvmps_a39395b957fe43c9a11ec9dc9916fe7c-master-0_d.txt

[2022-03-04T00:57:44.121148] Entering job release
Failure while loading azureml_run_type_providers. Failed to load entrypoint automl = azureml.train.automl.run:AutoMLRun._from_run_dto with exception (pyarrow 6.0.1 (/azureml-envs/azureml_1296d9ccb6d6509a0126eeef4e26fcc9/lib/python3.6/site-packages), Requirement.parse('pyarrow<4.0.0,>=0.17.0'), {'azureml-dataset-runtime'}).
Cannot provide tracer without any exporter configured.
[2022-03-04T00:57:46.155326] Starting job release
[2022-03-04T00:57:46.194524] Logging experiment finalizing status in history service.
Starting the daemon thread to refre

{'runId': 'aks_vs_amlcompute_1646355381_203c4099',
 'target': 'merlionaks',
 'status': 'Completed',
 'startTimeUtc': '2022-03-04T00:57:22.723444Z',
 'endTimeUtc': '2022-03-04T00:58:22.384531Z',
 'services': {},
 'properties': {'_azureml.ComputeTargetType': 'kubernetes',
  'ContentSnapshotId': '597b4f84-d6dd-4d50-b667-d09c76fcb988',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json',
  'azureml.git.repository_uri': 'https://gitlab.com/merlion-crew/feature-store.git',
  'mlflow.source.git.repoURL': 'https://gitlab.com/merlion-crew/feature-store.git',
  'azureml.git.branch': 'story/270/tarockey',
  'mlflow.source.git.branch': 'story/270/tarockey',
  'azureml.git.commit': '5ae8cc0ddb325a6c1a6b38d49d8892f418fad6ec',
  'mlflow.source.git.commit': '5ae8cc0ddb325a6c1a6b38d49d8892f418fad6ec',
  'azureml.git.dirty': 'False',
  'JobType': 'RegularJob',
  'GpuCount': '0',
  'Cluster': 'merlionaks'},
 'inputDatasets': [],
 'outputDataset

In [9]:
# Train on Azure ML Compute Cluster
compute_src = ScriptRunConfig(source_directory='../scripts',
                              script='train.py',
                              compute_target=cpu_cluster,
                              environment=myenv)

In [10]:
compute_run = experiment.submit(compute_src)
compute_run.wait_for_completion(show_output=True)

RunId: aks_vs_amlcompute_1646355508_5f917981
Web View: https://ml.azure.com/runs/aks_vs_amlcompute_1646355508_5f917981?wsid=/subscriptions/12f4bdb4-aa23-4f3d-bff0-7eec97b0443f/resourcegroups/rg-merlion-feature-store-project/workspaces/merlion-feature-store-workspace&tid=72f988bf-86f1-41af-91ab-2d7cd011db47

Execution Summary
RunId: aks_vs_amlcompute_1646355508_5f917981
Web View: https://ml.azure.com/runs/aks_vs_amlcompute_1646355508_5f917981?wsid=/subscriptions/12f4bdb4-aa23-4f3d-bff0-7eec97b0443f/resourcegroups/rg-merlion-feature-store-project/workspaces/merlion-feature-store-workspace&tid=72f988bf-86f1-41af-91ab-2d7cd011db47



{'runId': 'aks_vs_amlcompute_1646355508_5f917981',
 'target': 'merlion-cpu',
 'status': 'Completed',
 'startTimeUtc': '2022-03-04T01:00:05.672558Z',
 'endTimeUtc': '2022-03-04T01:01:00.637673Z',
 'services': {},
 'properties': {'_azureml.ComputeTargetType': 'amlcompute',
  'ContentSnapshotId': '597b4f84-d6dd-4d50-b667-d09c76fcb988',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json',
  'azureml.git.repository_uri': 'https://gitlab.com/merlion-crew/feature-store.git',
  'mlflow.source.git.repoURL': 'https://gitlab.com/merlion-crew/feature-store.git',
  'azureml.git.branch': 'story/270/tarockey',
  'mlflow.source.git.branch': 'story/270/tarockey',
  'azureml.git.commit': '5ae8cc0ddb325a6c1a6b38d49d8892f418fad6ec',
  'mlflow.source.git.commit': '5ae8cc0ddb325a6c1a6b38d49d8892f418fad6ec',
  'azureml.git.dirty': 'False'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'script': 'train.py',
  'command': '',
  'use
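The run details printed for both targets include `startTimeUtc` and `endTimeUtc`, so the wall-clock time of the two runs can be compared directly. The sketch below copies those timestamps from the outputs above into plain dictionaries. The helper name `run_duration_seconds` is hypothetical. In a live session the dictionaries could instead come from `run.get_details()`.

```python
from datetime import datetime

# Timestamp layout used in the run details shown above.
UTC_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def run_duration_seconds(details):
    """Seconds between a run's reported start and end timestamps."""
    start = datetime.strptime(details["startTimeUtc"], UTC_FORMAT)
    end = datetime.strptime(details["endTimeUtc"], UTC_FORMAT)
    return (end - start).total_seconds()


# Values copied from the two run outputs in this notebook.
aks_details = {
    "target": "merlionaks",
    "startTimeUtc": "2022-03-04T00:57:22.723444Z",
    "endTimeUtc": "2022-03-04T00:58:22.384531Z",
}
amlcompute_details = {
    "target": "merlion-cpu",
    "startTimeUtc": "2022-03-04T01:00:05.672558Z",
    "endTimeUtc": "2022-03-04T01:01:00.637673Z",
}

for details in (aks_details, amlcompute_details):
    print(f"{details['target']}: {run_duration_seconds(details):.1f} s")
```

For these two runs the difference is roughly 59.7 s on AKS against 55.0 s on the compute cluster. A single run per target is not a benchmark, so treat the numbers only as a sanity check that both targets behave comparably.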