### Introduction

Azure Machine Learning is a platform for operating machine learning workloads in the cloud.

Azure Machine Learning is part of the Microsoft Azure platform and enables users to manage:
- Scalable on-demand compute for machine learning workloads.
- Data storage and connectivity from a wide range of sources.
- Machine learning workflow orchestration to automate model training, deployment, and management processes.
- Model registration and management.
- Metrics and monitoring for training experiments, datasets and published services.
- Model deployment for real-time and batch inference.

### Workspace

A _workspace_ is a context for the experiments, data, compute targets, and other assets associated with a machine learning workload. It defines the boundary for machine learning assets. It includes:
- Compute targets
- Data
- Notebooks containing shared code and documentation.
- Experiments
- Pipelines that define orchestrated multi-step processes.
- Models

A workspace can be created through:
- The Azure portal.
- The Azure Machine Learning Python SDK, by running code that creates a workspace.

#### Workspace

In [1]:
from azureml.core import Workspace

In [None]:
# Create a new workspace.
ws = Workspace.create(name = 'aml-workspace',
                      subscription_id = '#####',
                      resource_group = '####',
                      create_resource_group = False,
                      location = '####')

To create a compute resource, go to Azure ML studio and choose among:
- _Compute instances_: development workstations that data scientists can use to work with data and models.
- _Compute clusters_: scalable clusters of VMs.
- _Inference clusters_: deployment targets for predictive services that use your trained model.
- _Attached compute_: links to other Azure compute resources, such as Virtual Machines or Azure Databricks clusters.
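A compute cluster can also be provisioned from the Python SDK instead of the studio. The following is a minimal sketch, assuming a connected workspace `ws`; the helper name `get_or_create_cluster`, the cluster name, the VM size, and the node count are illustrative assumptions, not values taken from this notebook's setup.

```python
def get_or_create_cluster(ws, name="my-compute", vm_size="STANDARD_DS11_V2", max_nodes=2):
    """Return the compute target called `name`, creating a cluster if it is missing."""
    # Imported here so the helper can be defined even where the SDK is not installed.
    from azureml.core.compute import AmlCompute, ComputeTarget
    from azureml.core.compute_target import ComputeTargetException

    try:
        # Reuse the compute target if it already exists in the workspace.
        return ComputeTarget(workspace=ws, name=name)
    except ComputeTargetException:
        # Otherwise provision a new cluster (VM size and node count are assumptions).
        config = AmlCompute.provisioning_configuration(vm_size=vm_size, max_nodes=max_nodes)
        target = ComputeTarget.create(ws, name, config)
        target.wait_for_completion(show_output=True)
        return target
```

Calling `get_or_create_cluster(ws)` more than once is safe, because the existing cluster is returned when one is found.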

In [2]:
# Connect to an existing workspace.
ws = Workspace.from_config()
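`Workspace.from_config()` reads the connection details from a `config.json` file, which it searches for in the current directory or in an `.azureml` subfolder. The sketch below writes such a file; the helper name `write_workspace_config` is hypothetical and the values passed to it are placeholders to be replaced with your own.

```python
import json
from pathlib import Path


def write_workspace_config(folder, subscription_id, resource_group, workspace_name):
    """Write the config.json file that Workspace.from_config() looks for."""
    config_dir = Path(folder) / ".azureml"
    config_dir.mkdir(parents=True, exist_ok=True)
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    config_path = config_dir / "config.json"
    config_path.write_text(json.dumps(config, indent=2))
    return config_path
```

For example, `write_workspace_config('.', '####', '####', 'aml-workspace')` creates `./.azureml/config.json`. The same file can be downloaded from the workspace overview page in the Azure portal.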

In [None]:
ws = Workspace.get(name='aml-workspace',
                   subscription_id='####',
                   resource_group='####')

In [3]:
for elem in ws.compute_targets:
    compute = ws.compute_targets[elem]
    print(compute.name, ':', compute.type)

#### Inline Experiments

An experiment is a process, usually the run of a script or a pipeline, that generates results tracked by Azure.

In [8]:
from azureml.core import Experiment
# Create the experiment.
experiment = Experiment(workspace = ws, name = 'my-first-experiment')
# Start the experiment.
run = experiment.start_logging()
# End the experiment.
run.complete()

Each run generates log files whose results you can print. To compare metrics across different runs, however, store them with one of the run's logging methods:

- _log_: record a single named value.
- _log_list_: record a named list of values.
- _log_row_: record a row with multiple columns.
- _log_table_: record a dictionary as a table.
- _log_image_: record an image file or a plot.
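The logging methods listed above can be sketched together as follows. This is a minimal illustration, assuming `run` is an active run such as the one returned by `experiment.start_logging()`; the helper name `log_dataset_summary` and the metric names are made up for the example, and `log_image` is left out because it needs an image file or a plot.

```python
def log_dataset_summary(run, values, label="sepal_length"):
    """Record a few summaries of a numeric column with the run's logging methods."""
    values = list(values)
    # log: a single named value.
    run.log("row_count", len(values))
    # log_list: a named list of values.
    run.log_list(f"{label}_first_five", values[:5])
    # log_row: one row with named columns.
    run.log_row(f"{label}_range", minimum=min(values), maximum=max(values))
    # log_table: a dictionary that maps column names to lists of values.
    run.log_table(f"{label}_table", {"index": list(range(len(values))), "value": values})
```

All four calls must happen before `run.complete()`.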

In [8]:
from azureml.core import Experiment
import pandas as pd

experiment = Experiment(workspace = ws, name = 'my-first-experiment')
run = experiment.start_logging()
data = pd.read_csv('https://gist.githubusercontent.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv')

n_rows = data.shape[0]

run.log('# of rows:', n_rows)

run.complete()

In [9]:
from azureml.widgets import RunDetails

RunDetails(run).show()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

In [10]:
run.get_metrics()

{'# of rows:': 150}

Files can also be stored with the run, under its _outputs_ folder.

In [26]:
appo = data.sample(10)
appo.to_csv('./sample.csv')

In [27]:
run.upload_file(name = 'outputs/sample.csv', path_or_stream = './sample.csv')

AzureMLAggregatedException: AzureMLAggregatedException:
	Message: UserError: Resource Conflict: ArtifactId ExperimentRun/dcid.9bda24a1-b0bf-44f7-b3ab-f568f948ff20/outputs/sample.csv already exists.
	InnerException None
	ErrorResponse 
{
    "error": {
        "message": "UserError: Resource Conflict: ArtifactId ExperimentRun/dcid.9bda24a1-b0bf-44f7-b3ab-f568f948ff20/outputs/sample.csv already exists."
    }
}

In [28]:
run.get_file_names()

['outputs/sample.csv']
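The `Resource Conflict` error above is raised because an artifact named `outputs/sample.csv` had already been uploaded to this run. One way to avoid it is to check `run.get_file_names()` before uploading. The helper below is a hypothetical sketch of that check, not part of the SDK.

```python
def upload_if_missing(run, name, path):
    """Upload `path` as artifact `name` unless the run already holds that artifact."""
    if name in run.get_file_names():
        # Uploading the same artifact name twice raises a Resource Conflict error.
        return False
    run.upload_file(name=name, path_or_stream=path)
    return True
```

The return value tells the caller whether an upload actually happened, for example `upload_if_missing(run, 'outputs/sample.csv', './sample.csv')`.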

The `target` field in the run details below is `local`, which means the previous experiment ran on my local machine and not on a compute target in Azure ML.

In [11]:
run.get_details_with_logs()

{'runId': '1a5ef708-b8fd-47e4-a836-64c807f30f6e',
 'target': 'local',
 'status': 'Completed',
 'startTimeUtc': '2022-12-18T10:29:19.554827Z',
 'endTimeUtc': '2022-12-18T10:29:24.440202Z',
 'services': {},
 'properties': {'azureml.git.repository_uri': 'https://github.com/LuciaRavazzi/AzureML.git',
  'mlflow.source.git.repoURL': 'https://github.com/LuciaRavazzi/AzureML.git',
  'azureml.git.branch': 'main',
  'mlflow.source.git.branch': 'main',
  'azureml.git.commit': 'df4c86328c098a60581020fc4d0fc488fee33532',
  'mlflow.source.git.commit': 'df4c86328c098a60581020fc4d0fc488fee33532',
  'azureml.git.dirty': 'True',
  'ContentSnapshotId': 'e588f4a2-5e55-4f69-afb1-b0050b0bb8d9'},
 'inputDatasets': [],
 'outputDatasets': [],
 'logFiles': {},
 'submittedBy': 'Lucia Ravazzi'}

#### Running a Script as an Experiment

It is usually better to run an experiment from a Python script than inline, as in the previous example.

The `ScriptRunConfig` defines the script to run, the environment, and the compute target, which defaults to the local machine.
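The environment used in the next cell is built from a conda specification file named `environment.yml`, whose contents are not shown in this notebook. The sketch below writes one plausible version of that file; the Python version and the package list are assumptions chosen for illustration, not the file actually used here.

```python
from pathlib import Path

# Hypothetical conda specification; adjust the packages to what the script imports.
ENVIRONMENT_YML = """\
name: experiment_env
dependencies:
  - python=3.8
  - pip
  - pip:
      - azureml-defaults
      - azureml-mlflow
      - pandas
"""


def write_environment_file(path="environment.yml"):
    """Write the conda specification to `path` and return the path."""
    path = Path(path)
    path.write_text(ENVIRONMENT_YML)
    return path
```

`Environment.from_conda_specification` takes the environment name and the path to this file.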

In [14]:
from azureml.core import Experiment, ScriptRunConfig, Environment
from azureml.core.runconfig import DockerConfiguration

# Create a Python environment for the experiment
env = Environment.from_conda_specification("experiment_env", "environment.yml")

script_config = ScriptRunConfig(source_directory = './Script/',
                                script = '1_Script.py',
                                environment=env,
                                docker_runtime_config=DockerConfiguration(use_docker=True),
                                compute_target = 'my-compute')

experiment = Experiment(workspace = ws, name = 'my-first-experiment')
run = experiment.submit(config = script_config)
run.wait_for_completion(show_output = True)

RunId: my-first-experiment_1671360447_363a01c9
Web View: https://ml.azure.com/runs/my-first-experiment_1671360447_363a01c9?wsid=/subscriptions/d12c1b85-0a70-4232-b483-12d1ffcfc148/resourcegroups/ResourceGroupRavazzi/workspaces/aml-workspace&tid=b00367e2-193a-4f48-94de-7245d45c0947

Streaming user_logs/std_log.txt

  from cryptography.hazmat.backends import default_backend
  import mlflow
Cleaning up all outstanding Run operations, waiting 300.0 seconds
1 items cleaning up...
Cleanup took 0.05613422393798828 seconds

Execution Summary
RunId: my-first-experiment_1671360447_363a01c9
Web View: https://ml.azure.com/runs/my-first-experiment_1671360447_363a01c9?wsid=/subscriptions/d12c1b85-0a70-4232-b483-12d1ffcfc148/resourcegroups/ResourceGroupRavazzi/workspaces/aml-workspace&tid=b00367e2-193a-4f48-94de-7245d45c0947



{'runId': 'my-first-experiment_1671360447_363a01c9',
 'target': 'my-compute',
 'status': 'Completed',
 'startTimeUtc': '2022-12-18T10:47:36.460173Z',
 'endTimeUtc': '2022-12-18T10:49:12.960801Z',
 'services': {},
 'properties': {'_azureml.ComputeTargetType': 'amlctrain',
  'ContentSnapshotId': '8ad7eef6-fc5d-49b0-a5a2-e1a9ecd33863',
  'azureml.git.repository_uri': 'https://github.com/LuciaRavazzi/AzureML.git',
  'mlflow.source.git.repoURL': 'https://github.com/LuciaRavazzi/AzureML.git',
  'azureml.git.branch': 'main',
  'mlflow.source.git.branch': 'main',
  'azureml.git.commit': 'df4c86328c098a60581020fc4d0fc488fee33532',
  'mlflow.source.git.commit': 'df4c86328c098a60581020fc4d0fc488fee33532',
  'azureml.git.dirty': 'True',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'script': '1_Script.py',
  'command': '',
  'useAbsolutePath': False,
  'arguments': [

In [15]:
# view log files.
run.get_details_with_logs()

{'runId': 'my-first-experiment_1671360447_363a01c9',
 'target': 'my-compute',
 'status': 'Completed',
 'startTimeUtc': '2022-12-18T10:47:36.460173Z',
 'endTimeUtc': '2022-12-18T10:49:12.960801Z',
 'services': {},
 'properties': {'_azureml.ComputeTargetType': 'amlctrain',
  'ContentSnapshotId': '8ad7eef6-fc5d-49b0-a5a2-e1a9ecd33863',
  'azureml.git.repository_uri': 'https://github.com/LuciaRavazzi/AzureML.git',
  'mlflow.source.git.repoURL': 'https://github.com/LuciaRavazzi/AzureML.git',
  'azureml.git.branch': 'main',
  'mlflow.source.git.branch': 'main',
  'azureml.git.commit': 'df4c86328c098a60581020fc4d0fc488fee33532',
  'mlflow.source.git.commit': 'df4c86328c098a60581020fc4d0fc488fee33532',
  'azureml.git.dirty': 'True',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'script': '1_Script.py',
  'command': '',
  'useAbsolutePath': False,
  'arguments': [

In [19]:
# It is easier to inspect the logs after downloading them to files.
import os

log_folder = 'downloaded-logs'

# Download all files
run.get_all_logs(destination=log_folder)

# Verify the files have been downloaded
# for root, directories, filenames in os.walk(log_folder):
#    for filename in filenames:
#        print (os.path.join(root,filename))

['downloaded-logs\\user_logs/std_log.txt',
 'downloaded-logs\\system_logs/cs_capability/cs-capability.log',
 'downloaded-logs\\system_logs/hosttools_capability/hosttools-capability.log',
 'downloaded-logs\\system_logs/lifecycler/execution-wrapper.log',
 'downloaded-logs\\system_logs/lifecycler/lifecycler.log',
 'downloaded-logs\\system_logs/lifecycler/vm-bootstrapper.log',
 'downloaded-logs\\system_logs/metrics_capability/metrics-capability.log',
 'downloaded-logs\\system_logs/snapshot_capability/snapshot-capability.log']
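The commented-out check in the cell above can be written as a small helper that returns the downloaded paths instead of printing them. This is a sketch; the function name `list_downloaded_files` is an assumption.

```python
import os


def list_downloaded_files(folder):
    """Return the path of every file under `folder`, sorted for stable output."""
    paths = []
    for root, _directories, filenames in os.walk(folder):
        for filename in filenames:
            paths.append(os.path.join(root, filename))
    return sorted(paths)
```

Calling `list_downloaded_files(log_folder)` after `run.get_all_logs(destination=log_folder)` confirms that the files are on disk.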

#### View versions of experiments

The same experiment has been run several times. The following lines list each run along with the metrics it logged.

In [20]:
experiment = ws.experiments['my-first-experiment']
for logged_run in experiment.get_runs():
    print('Run ID:', logged_run.id)
    metrics = logged_run.get_metrics()
    for key in metrics.keys():
        print('-', key, metrics.get(key))

Run ID: my-first-experiment_1671360447_363a01c9
- # of rows: 150
Run ID: my-first-experiment_1671360057_2352e96a
Run ID: 1a5ef708-b8fd-47e4-a836-64c807f30f6e
- # of rows: 150
Run ID: e033743a-5c51-4529-8221-779d1529fe49
- # of rows: 150
Run ID: my-first-experiment_1670877546_8df17ea7
- # of rows: 150
Run ID: my-first-experiment_1670876681_894b6eae
- # of rows: 150
Run ID: my-first-experiment_1670876500_d742894d
Run ID: my-first-experiment_1670875862_d9d617dd
Run ID: my-first-experiment_1670875832_3fdcf2ca
Run ID: my-first-experiment_1670875867_20ff7df1
Run ID: my-first-experiment_1670872296_24e91cbd
Run ID: my-first-experiment_1670872249_011c19a0
Run ID: my-first-experiment_1670872134_4c8ed57a
Run ID: my-first-experiment_1670872100_721bdf3d
Run ID: my-first-experiment_1670872006_9b9eb4bd
Run ID: my-first-experiment_1670871995_514d0e5a
Run ID: my-first-experiment_1670871964_a7f56680
Run ID: my-first-experiment_1670871914_3ef9f1c5
Run ID: 9bda24a1-b0bf-44f7-b3ab-f568f948ff20
- # of rows:
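To compare runs side by side rather than reading printed lines, the same loop can collect the metrics into records. A minimal sketch, assuming each item exposes `id` and `get_metrics()` as the runs above do; the helper name `collect_metrics` is hypothetical.

```python
def collect_metrics(runs):
    """Build one record per run: the run ID plus every metric that run logged."""
    records = []
    for logged_run in runs:
        record = {"run_id": logged_run.id}
        record.update(logged_run.get_metrics())
        records.append(record)
    return records
```

Passing the result to `pd.DataFrame(collect_metrics(experiment.get_runs()))` gives one row per run, with empty cells for runs that logged no metrics.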

#### MLflow

MLflow is an open-source platform for managing machine learning processes.

In [23]:
# !pip install azureml-mlflow

In [25]:
# --- Inline experiment tracked by MLFlow.

from azureml.core import Experiment
import pandas as pd
import mlflow

# Set the MLflow tracking URI to the workspace
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())

# Create an Azure ML experiment in your workspace
experiment = Experiment(workspace=ws, name='my-first-experiment')
mlflow.set_experiment(experiment.name)

# start the MLflow experiment
with mlflow.start_run():

    print("Starting experiment:", experiment.name)

    # Load data
    data = pd.read_csv('https://gist.githubusercontent.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv')

    # Count the rows and log the result
    row_count = len(data)
    mlflow.log_metric('observations', row_count)
    print("Run complete")

Starting experiment: my-first-experiment
Run complete


In [26]:
# Get the latest run of the experiment
run = list(experiment.get_runs())[0]

# Get logged metrics
print("\nMetrics:")
metrics = run.get_metrics()
for key in metrics.keys():
    print(key, metrics.get(key))

# Get a link to the experiment in Azure ML studio
experiment_url = experiment.get_portal_url()
print('See details at', experiment_url)


Metrics:
observations 150.0
See details at https://ml.azure.com/experiments/id/bc781008-74a3-4cb4-bd22-f1e3c90a9026?wsid=/subscriptions/d12c1b85-0a70-4232-b483-12d1ffcfc148/resourcegroups/ResourceGroupRavazzi/workspaces/aml-workspace&tid=b00367e2-193a-4f48-94de-7245d45c0947


In [27]:
# --- Experiment with Script tracked by MLFlow.

from azureml.core import Experiment, ScriptRunConfig, Environment
from azureml.core.runconfig import DockerConfiguration

# Create a Python environment for the experiment
env = Environment.from_conda_specification("experiment_env", "environment.yml")

script_config = ScriptRunConfig(source_directory = './Script/',
                                script = '3_Script.py',
                                environment=env,
                                docker_runtime_config=DockerConfiguration(use_docker=True),
                                compute_target = 'my-compute')

experiment = Experiment(workspace = ws, name = 'my-first-experiment')
run = experiment.submit(config = script_config)
run.wait_for_completion(show_output = True)

RunId: my-first-experiment_1671361661_7e507ad8
Web View: https://ml.azure.com/runs/my-first-experiment_1671361661_7e507ad8?wsid=/subscriptions/d12c1b85-0a70-4232-b483-12d1ffcfc148/resourcegroups/ResourceGroupRavazzi/workspaces/aml-workspace&tid=b00367e2-193a-4f48-94de-7245d45c0947

Streaming user_logs/std_log.txt

  from cryptography.hazmat.backends import default_backend
  import mlflow
observations: 150
Cleaning up all outstanding Run operations, waiting 300.0 seconds
1 items cleaning up...
Cleanup took 0.055298805236816406 seconds

Execution Summary
RunId: my-first-experiment_1671361661_7e507ad8
Web View: https://ml.azure.com/runs/my-first-experiment_1671361661_7e507ad8?wsid=/subscriptions/d12c1b85-0a70-4232-b483-12d1ffcfc148/resourcegroups/ResourceGroupRavazzi/workspaces/aml-workspace&tid=b00367e2-193a-4f48-94de-7245d45c0947



{'runId': 'my-first-experiment_1671361661_7e507ad8',
 'target': 'my-compute',
 'status': 'Completed',
 'startTimeUtc': '2022-12-18T11:07:53.317817Z',
 'endTimeUtc': '2022-12-18T11:08:04.671185Z',
 'services': {},
 'properties': {'_azureml.ComputeTargetType': 'amlctrain',
  'ContentSnapshotId': '7a00713c-d6af-4cc9-ac1e-a5f0fbd177e3',
  'azureml.git.repository_uri': 'https://github.com/LuciaRavazzi/AzureML.git',
  'mlflow.source.git.repoURL': 'https://github.com/LuciaRavazzi/AzureML.git',
  'azureml.git.branch': 'main',
  'mlflow.source.git.branch': 'main',
  'azureml.git.commit': 'df4c86328c098a60581020fc4d0fc488fee33532',
  'mlflow.source.git.commit': 'df4c86328c098a60581020fc4d0fc488fee33532',
  'azureml.git.dirty': 'True',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'script': '3_Script.py',
  'command': '',
  'useAbsolutePath': False,
  'arguments': [

In [28]:
# Get logged metrics
metrics = run.get_metrics()
for key in metrics.keys():
    print(key, metrics.get(key))

observations 150.0


In [29]:
experiment = ws.experiments['my-first-experiment']
for logged_run in experiment.get_runs():
    print('Run ID:', logged_run.id)
    metrics = logged_run.get_metrics()
    for key in metrics.keys():
        print('-', key, metrics.get(key))

Run ID: my-first-experiment_1671361661_7e507ad8
- observations 150.0
Run ID: 5e763043-2b73-4f97-9ecc-d9d42cc1637d
- observations 150.0
Run ID: my-first-experiment_1671360447_363a01c9
- # of rows: 150
Run ID: my-first-experiment_1671360057_2352e96a
Run ID: 1a5ef708-b8fd-47e4-a836-64c807f30f6e
- # of rows: 150
Run ID: e033743a-5c51-4529-8221-779d1529fe49
- # of rows: 150
Run ID: my-first-experiment_1670877546_8df17ea7
- # of rows: 150
Run ID: my-first-experiment_1670876681_894b6eae
- # of rows: 150
Run ID: my-first-experiment_1670876500_d742894d
Run ID: my-first-experiment_1670875862_d9d617dd
Run ID: my-first-experiment_1670875832_3fdcf2ca
Run ID: my-first-experiment_1670875867_20ff7df1
Run ID: my-first-experiment_1670872296_24e91cbd
Run ID: my-first-experiment_1670872249_011c19a0
Run ID: my-first-experiment_1670872134_4c8ed57a
Run ID: my-first-experiment_1670872100_721bdf3d
Run ID: my-first-experiment_1670872006_9b9eb4bd
Run ID: my-first-experiment_1670871995_514d0e5a
Run ID: my-first-e