### Environment and Compute targets

The runtime context for each experiment run consists of two elements:
- The _environment_
- The _compute target_

#### Environments

Python code runs in a _virtual environment_ that defines both the Python version and the installed packages. In most Python installations, packages are installed and managed by Conda or pip.

To improve portability, environments are usually created in Docker containers hosted on the compute target.

In general, AzureML creates the environment inside the Docker container and manages the Python version and packages for you. The environment is encapsulated in the _Environment_ class, which can be registered in the workspace.

Creating an environment from a specification file

In [10]:
from azureml.core import Environment

env = Environment.from_conda_specification(name='training_environment',
                                           file_path='environment.yml')
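
A specification file such as `environment.yml` is an ordinary Conda YAML file. The sketch below writes a minimal one to the working directory so that the cell above has something to load. The package list mirrors the packages used elsewhere in this section, and the Python version is an assumption; adapt both to your project.

```python
from pathlib import Path

# Hypothetical contents for environment.yml. The Python version and the
# package list are assumptions made for illustration.
ENVIRONMENT_YML = """\
name: training_environment
dependencies:
  - python=3.8
  - scikit-learn
  - pandas
  - numpy
  - pip            # pip itself is a Conda dependency
  - pip:           # packages installed with pip go in this sub-list
      - azureml-defaults
"""

spec_path = Path("environment.yml")
spec_path.write_text(ENVIRONMENT_YML, encoding="utf-8")
print(spec_path.read_text(encoding="utf-8"))
```

Note that this overwrites any existing `environment.yml` in the current directory.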

Creating an environment from an existing Conda environment

In [12]:
env = Environment.from_existing_conda_environment(name='azure_env',
                                                  conda_environment_name='azure_env')

FileNotFoundError: [WinError 2] The system cannot find the file specified

Creating an environment by specifying packages

In [3]:
from azureml.core.conda_dependencies import CondaDependencies

env = Environment('training_environment')
deps = CondaDependencies.create(conda_packages = ['scikit-learn','pandas','numpy'],
                                pip_packages=['azureml-defaults'])
env.python.conda_dependencies = deps

Experiments usually host the environment in a Docker container; if Docker is disabled (use_docker=False), the environment is created directly on the compute target.

In [14]:
from azureml.core import Experiment, ScriptRunConfig
from azureml.core.runconfig import DockerConfiguration

docker_config = DockerConfiguration(use_docker=True)

script_config = ScriptRunConfig(source_directory='Script',
                                script='1_Script.py',
                                environment=env,
                                docker_runtime_config=docker_config)

AzureML provides several base Docker images and chooses one according to the type of compute target: for example, a compute target with a GPU gets a Docker image that includes CUDA.
Moreover, it is possible to create and register your own Docker image.

In [None]:
# env.docker.base_image='my-base-image'
# env.docker.base_image_registry='myregistry.azurecr.io/myimage'

Alternatively, you can have the image built on demand from a Dockerfile that adds layers on top of a base image.

In [16]:
# env.docker.base_image=None
# env.docker.base_dockerfile='./Dockerfile'

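If your image already includes the required Python packages, you can tell AzureML not to manage the dependencies and point it to the Python interpreter inside the image.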

In [17]:
# env.python.user_managed_dependencies=True
# env.python.interpreter_path = '/opt/miniconda/bin/python'

An environment can be registered in the workspace.

In [11]:
from azureml.core import Workspace

ws = Workspace.from_config()

env.register(workspace=ws)

{
    "assetId": "azureml://locations/francecentral/workspaces/0ee9671f-7144-4125-a665-a1b9cffe5395/environments/training_environment/versions/1",
    "databricks": {
        "eggLibraries": [],
        "jarLibraries": [],
        "mavenLibraries": [],
        "pypiLibraries": [],
        "rcranLibraries": []
    },
    "docker": {
        "arguments": [],
        "baseDockerfile": null,
        "baseImage": "mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:20221101.v1",
        "baseImageRegistry": {
            "address": null,
            "password": null,
            "registryIdentity": null,
            "username": null
        },
        "buildContext": null,
        "enabled": false,
        "platform": {
            "architecture": "amd64",
            "os": "Linux"
        },
        "sharedVolumes": true,
        "shmSize": null
    },
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "inferencingStackVersion": null,
    "name": "training_e

In [20]:
for env_name in Environment.list(workspace = ws):
    print('Name:', env_name)

Name: training_environment
Name: AzureML-VowpalWabbit-8.8.0
Name: AzureML-PyTorch-1.3-CPU
Name: AzureML-Dask-CPU
Name: AzureML-Dask-GPU
Name: AzureML-Triton
Name: AzureML-lightgbm-3.2-ubuntu18.04-py37-cpu
Name: AzureML-sklearn-0.24-ubuntu18.04-py37-cpu
Name: AzureML-sklearn-1.0-ubuntu20.04-py38-cpu
Name: AzureML-tensorflow-2.4-ubuntu18.04-py37-cuda11-gpu
Name: AzureML-tensorflow-2.6-ubuntu20.04-py38-cuda11-gpu
Name: AzureML-tensorflow-2.5-ubuntu20.04-py38-cuda11-gpu
Name: AzureML-tensorflow-2.7-ubuntu20.04-py38-cuda11-gpu
Name: AzureML-pytorch-1.7-ubuntu18.04-py37-cuda11-gpu
Name: AzureML-pytorch-1.8-ubuntu18.04-py37-cuda11-gpu
Name: AzureML-pytorch-1.9-ubuntu18.04-py37-cuda11-gpu
Name: AzureML-pytorch-1.10-ubuntu18.04-py38-cuda11-gpu
Name: AzureML-minimal-ubuntu18.04-py37-cpu-inference
Name: AzureML-responsibleai-0.20-ubuntu20.04-py38-cpu
Name: AzureML-responsibleai-0.21-ubuntu20.04-py38-cpu
Name: AzureML-PTA-pytorch-1.11-py38-cuda11.3-gpu
Name: AzureML-PTA-pytorch-1.11-py38-cuda11.5-
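
The listing mixes your own environments with the curated ones, whose names start with the reserved prefix AzureML. A small helper can separate the two groups. This is a sketch: `split_environments` is a hypothetical helper, and the sample names are copied from the output above rather than fetched from a workspace.

```python
from typing import Iterable, List, Tuple

CURATED_PREFIX = "AzureML"  # prefix reserved for curated environments


def split_environments(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split environment names into (curated, custom) by the reserved prefix."""
    curated: List[str] = []
    custom: List[str] = []
    for name in names:
        if name.startswith(CURATED_PREFIX):
            curated.append(name)
        else:
            custom.append(name)
    return curated, custom


# Sample names taken from the listing above.
sample_names = [
    "training_environment",
    "AzureML-Dask-CPU",
    "AzureML-Triton",
    "AzureML-sklearn-1.0-ubuntu20.04-py38-cpu",
]
curated_names, custom_names = split_environments(sample_names)
print("Custom:", custom_names)
print("Curated:", curated_names)
```

With a real workspace, the names would come from `Environment.list(workspace=ws)` as in the cell above.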

Retrieve a registered environment.

Note that environments are also registered automatically when you submit a run that uses them, so an environment used in an earlier run can be retrieved even if you never registered it explicitly.

In [12]:
training_env = Environment.get(workspace=ws, name='training_environment')

### Compute

In Azure Machine Learning, _compute targets_ are physical or virtual computers on which experiments are run.

There are several kinds of compute target, such as the following:
- Local compute: the computer on which the code that initiates the experiment runs. It can be your own workstation or, when you use notebooks in AzureML, a virtual machine.
- Compute cluster: a multi-node cluster of virtual machines that scales up or down to meet demand. It is useful when scalability or parallel processing is required.
- Attached compute: if you already use an Azure-based compute environment for data science, such as a virtual machine or an Azure Databricks cluster, you can attach it to your Azure Machine Learning workspace and use it as a compute target for certain types of workload.
- Inference cluster: an Azure Kubernetes Service cluster, which can only be used to deploy trained models as inferencing services.

Using several compute targets has advantages:
- You can develop and test your code on a low-cost machine before moving to a cluster for deployment.
- You can choose the best compute for each task. For instance, it can be more effective to train a DNN on a compute target with a powerful GPU and then switch to a CPU-based compute target for the steps that follow training.

Moreover, the cost of a compute target depends on the resources you actually use, so you can have it scale automatically with the workload and stop automatically when it is idle.
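
To make the scaling idea concrete, the sketch below models a cluster that keeps its node count between a minimum and a maximum. It is a deliberately simplified model, not how the service actually schedules work: it assumes one node per queued job, and `target_node_count` is a hypothetical helper. The `min_nodes` and `max_nodes` names follow the parameters of `AmlCompute.provisioning_configuration`.

```python
def target_node_count(queued_jobs: int, min_nodes: int, max_nodes: int) -> int:
    """Clamp the demand (one node per queued job, an assumption) to the cluster limits."""
    if min_nodes < 0 or max_nodes < min_nodes:
        raise ValueError("expected 0 <= min_nodes <= max_nodes")
    return max(min_nodes, min(max_nodes, queued_jobs))


# With min_nodes=0 the cluster scales down to zero nodes when nothing is queued,
# so no node is kept running while the cluster is idle.
for jobs in (0, 1, 3, 10):
    print(jobs, "queued ->", target_node_count(jobs, min_nodes=0, max_nodes=4), "nodes")
```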

A compute target can be created through AzureML studio or with the SDK.

A _managed_ compute target is one that is managed by AzureML, such as a cluster.

In [2]:
from azureml.core import Workspace

ws = Workspace.from_config()

We would like to create a cluster of STANDARD_DS11_V2 virtual machines that scales from zero up to four nodes.

The _dedicated_ priority means that the nodes are reserved for us. With lowpriority instead, the nodes could be taken away (pre-empted) whenever a higher-priority workload needs them.

In [3]:
from azureml.core.compute import ComputeTarget, AmlCompute

compute_name = 'aml-cluster'
compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2',
                                                       min_nodes=0,
                                                       max_nodes=4,
                                                       vm_priority='dedicated')

aml_cluster = ComputeTarget.create(ws, compute_name, compute_config)
aml_cluster.wait_for_completion(show_output=True)

InProgress.
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


An _unmanaged_ compute target is one that is not managed by AzureML, such as an Azure virtual machine or Azure Databricks cluster.

In the code below, an Azure Databricks cluster is attached to the workspace.

In [5]:
# from azureml.core.compute import ComputeTarget, DatabricksCompute

# Specify a name for the compute (unique within the workspace)
# compute_name = 'db_cluster'

# Define configuration for existing Azure Databricks cluster
# db_workspace_name = 'db_workspace'
# db_resource_group = 'db_resource_group'
# db_access_token = '1234-abc-5678-defg-90...'
# db_config = DatabricksCompute.attach_configuration(resource_group=db_resource_group,
#                                                   workspace_name=db_workspace_name,
#                                                   access_token=db_access_token)
# Create the compute
# databricks_compute = ComputeTarget.attach(ws, compute_name, db_config)
# databricks_compute.wait_for_completion(True)

Check if a compute target already exists.

In [8]:
from azureml.core.compute_target import ComputeTargetException

compute_name = "abl-cluster"

try:
    aml_cluster = ComputeTarget(workspace=ws, name=compute_name)
    print('Found existing cluster.')
except ComputeTargetException:
    print('Cluster not found.')

Cluster not found.
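
The existence check is usually combined with creation into a single "get or create" step. The sketch below shows that pattern in a form that runs without a workspace: `get_or_create` is a hypothetical helper, and a plain dictionary stands in for the workspace. The commented lines indicate how the SDK objects from the cells above would plug in.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def get_or_create(lookup: Callable[[], T],
                  create: Callable[[], T],
                  not_found: Type[BaseException]) -> Tuple[T, bool]:
    """Return (target, created); create() is called only if lookup() raises not_found."""
    try:
        return lookup(), False
    except not_found:
        return create(), True


# Intended use with the SDK objects from the cells above (not executed here):
# aml_cluster, created = get_or_create(
#     lookup=lambda: ComputeTarget(workspace=ws, name=compute_name),
#     create=lambda: ComputeTarget.create(ws, compute_name, compute_config),
#     not_found=ComputeTargetException,
# )
# if created:
#     aml_cluster.wait_for_completion(show_output=True)

# Stand-in demonstration: a dictionary plays the role of the workspace.
existing_targets = {"aml-cluster": "existing-cluster-object"}

found, found_created = get_or_create(
    lambda: existing_targets["aml-cluster"], lambda: "new-cluster-object", KeyError)
print(found, found_created)

missing, missing_created = get_or_create(
    lambda: existing_targets["abl-cluster"], lambda: "new-cluster-object", KeyError)
print(missing, missing_created)
```

Passing the exception type explicitly keeps unrelated errors (for example authentication failures) from being mistaken for "not found".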


To run an experiment on a particular compute target, specify its name in the ScriptRunConfig.

When the run is submitted, the compute target is started and the environment is created first; only then does the job start.

In [None]:
from azureml.core import Environment, ScriptRunConfig

compute_name = 'aml-cluster'

training_env = Environment.get(workspace=ws, name='training_environment')

script_config = ScriptRunConfig(source_directory='Script',
                                script='1_Script.py',
                                environment=training_env,
                                compute_target=compute_name)

It is also possible to specify the compute target object.

In [15]:
cluster = ComputeTarget(workspace=ws, name=compute_name)

training_env = Environment.get(workspace=ws, name='training_environment')

script_config = ScriptRunConfig(source_directory='Script',
                                script='1_Script.py',
                                environment=training_env,
                                compute_target=cluster)

### Full example

Retrieve data.

In [21]:
from azureml.core import Dataset
from azureml.data.datapath import DataPath

default_ds = ws.get_default_datastore()
name_data = 'diabetes dataset'

if name_data not in ws.datasets:
    Dataset.File.upload_directory(src_dir='./Script/data',
                                  target=DataPath(default_ds, 'diabetes-data'))

    tab_data_set = Dataset.Tabular.from_delimited_files(path=(default_ds, 'diabetes-data/*.csv'))

    try:
        tab_data_set = tab_data_set.register(workspace=ws,
                                name='diabetes dataset',
                                description='diabetes data',
                                tags = {'format':'CSV'},
                                create_new_version=True)
        print('Dataset registered.')

    except Exception as ex:

        print(ex)

Validating arguments.
Arguments validated.
Uploading file to diabetes-data
Uploading an estimated of 1 files
Target already exists. Skipping upload for diabetes-data\diabetes.csv
Uploaded 0 files
Creating new dataset
Dataset registered.


AzureML creates a default environment if you don't specify one. In particular, it consists of azureml-defaults, numpy and pandas.

In the specification file, the pip packages are listed in a pip sub-list of the Conda dependencies, so pip itself should also be listed as a Conda dependency.

If you reuse an environment, the first run caches it, so later runs retrieve it more quickly.
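
One way to picture this caching is as a lookup keyed by a fingerprint of the environment definition: an unchanged definition maps to the same key and can reuse the cached build, while any change to the packages produces a new key. The sketch below is an illustrative model only, not AzureML's actual implementation; the dictionary layout and the extra `matplotlib` package are hypothetical.

```python
import hashlib
import json


def definition_fingerprint(definition: dict) -> str:
    """Hash a definition in a canonical form so that key order does not matter."""
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical, simplified environment definitions.
base_definition = {
    "conda": ["scikit-learn", "pandas", "numpy"],
    "pip": ["azureml-defaults"],
}
reordered_definition = dict(reversed(list(base_definition.items())))  # same content
changed_definition = {**base_definition, "pip": ["azureml-defaults", "matplotlib"]}

# Same content gives the same fingerprint, so a cached build could be reused.
print(definition_fingerprint(base_definition) == definition_fingerprint(reordered_definition))
# A changed package list gives a different fingerprint, so a new build is needed.
print(definition_fingerprint(base_definition) == definition_fingerprint(changed_definition))
```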

You cannot use the prefix AzureML for your own environments because it is reserved for the curated environments.

In [39]:
from azureml.core import Environment

env = Environment.from_conda_specification('experiment-env', 'environment.yml')
# Let Azure ML manage dependencies
# env.python.user_managed_dependencies = False

env.register(workspace=ws)

In [43]:
from azureml.core import Experiment, ScriptRunConfig
from azureml.core.runconfig import DockerConfiguration
from azureml.widgets import RunDetails

data = ws.datasets.get(name_data)

# Create a script config
script_config = ScriptRunConfig(source_directory='Script',
                                script='5_Train_Dataset.py',
                                arguments = ['--regularization', 0.1, # Regularization rate parameter
                                             '--input-data', data.as_named_input('training_data')], # Reference to dataset
                                environment=env,
                                compute_target='ravazzil-compute') # Name of an existing compute target
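
On the receiving side, the training script has to parse the two arguments passed in `arguments`. The script `5_Train_Dataset.py` is not shown here, so the following is only a sketch of what its argument handling might look like: the `dest` names and the default value are assumptions.

```python
import argparse
from typing import List, Optional


def parse_training_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the arguments that ScriptRunConfig passes to the training script."""
    parser = argparse.ArgumentParser(description="Training script arguments (sketch)")
    # Default value is an assumption, used when the argument is omitted.
    parser.add_argument("--regularization", type=float, dest="reg_rate", default=0.01,
                        help="regularization rate")
    # For a named dataset input, the script receives an identifier string here.
    parser.add_argument("--input-data", type=str, dest="training_dataset_id",
                        help="identifier of the training dataset")
    return parser.parse_args(argv)


# Simulated command line; '<dataset-id>' is a placeholder, not a real identifier.
args = parse_training_args(["--regularization", "0.1", "--input-data", "<dataset-id>"])
print(args.reg_rate, args.training_dataset_id)
```

Inside a real run, the dataset itself would typically be read from the run context through the name given to `as_named_input` ('training_data' above) rather than from the identifier string.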


In [45]:
# submit the experiment
experiment_name = 'mslearn-train-diabetes'
experiment = Experiment(workspace=ws, name=experiment_name)
run = experiment.submit(config=script_config)
RunDetails(run).show()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

In [46]:
metrics = run.get_metrics()
for key in metrics.keys():
    print(key, metrics.get(key))
print('\n')
for file in run.get_file_names():
    print(file)



outputs/5_Train_with_parameters_and_dataset.py
outputs/diabetes_model.pkl
outputs/my_diabetes_model.pkl
system_logs/cs_capability/cs-capability.log
system_logs/hosttools_capability/hosttools-capability.log
system_logs/lifecycler/lifecycler.log
system_logs/lifecycler/vm-bootstrapper.log
system_logs/metrics_capability/metrics-capability.log
system_logs/snapshot_capability/snapshot-capability.log
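
The file list mixes model artifacts with system logs. A short filter keeps only the files under `outputs/`, for example to pick out the saved models. This is a sketch: `files_under` is a hypothetical helper, and the file names are copied from the output above instead of being read from a live run.

```python
from typing import Iterable, List


def files_under(file_names: Iterable[str], folder: str = "outputs") -> List[str]:
    """Return the names located under the given top-level folder."""
    prefix = folder.rstrip("/") + "/"
    return [name for name in file_names if name.startswith(prefix)]


# Names copied from the listing above (a subset of the system logs is enough).
run_files = [
    "outputs/5_Train_with_parameters_and_dataset.py",
    "outputs/diabetes_model.pkl",
    "outputs/my_diabetes_model.pkl",
    "system_logs/cs_capability/cs-capability.log",
    "system_logs/lifecycler/lifecycler.log",
]

model_files = [name for name in files_under(run_files) if name.endswith(".pkl")]
print(model_files)

# With a real run (not executed here), the list would come from
# run.get_file_names() and each file could be fetched with
# run.download_file(name=..., output_file_path=...).
```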


In [51]:
from azureml.core.compute import ComputeTarget, AmlCompute

compute_name = 'aml-cluster'

if compute_name not in ws.compute_targets:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2',
                                                           min_nodes=0,
                                                           max_nodes=2)

    aml_cluster = ComputeTarget.create(ws, compute_name, compute_config)
    aml_cluster.wait_for_completion(show_output=True)

    print('Compute target successfully created.')

else:
    print('Compute target already created.')

InProgress.
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


In [52]:
# Create a script config
script_config = ScriptRunConfig(source_directory='Script',
                                script='5_Train_Dataset.py',
                                arguments = ['--regularization', 0.1, # Regularization rate parameter
                                             '--input-data', data.as_named_input('training_data')], # Reference to dataset
                                environment=env,
                                compute_target=compute_name) # Run on the cluster created above

In [53]:
# submit the experiment
experiment_name = 'mslearn-train-diabetes'
experiment = Experiment(workspace=ws, name=experiment_name)
run = experiment.submit(config=script_config)
RunDetails(run).show()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

Running the whole experiment takes a while: the container image must first be built with the Conda environment, then the cluster nodes must be started, and finally the image must be deployed to them.

In [58]:
cluster_state = aml_cluster.get_status()
print(cluster_state.allocation_state, cluster_state.current_node_count)

Steady 1
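
Instead of checking the status by hand, you can poll it until the cluster settles. The sketch below assumes only that the status object exposes `allocation_state`, as in the cell above; `wait_for_allocation_state` is a hypothetical helper, and a stand-in status function replaces the real cluster so that the example runs anywhere.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable


def wait_for_allocation_state(get_status: Callable[[], Any],
                              target_state: str = "Steady",
                              timeout_s: float = 600.0,
                              poll_s: float = 10.0) -> Any:
    """Call get_status() repeatedly until allocation_state equals target_state."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status.allocation_state == target_state:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"cluster did not reach {target_state!r} within {timeout_s} s")
        time.sleep(poll_s)


# Stand-in for aml_cluster.get_status: reports two intermediate states, then Steady.
_fake_states = iter(["Resizing", "Resizing", "Steady"])


def fake_get_status() -> SimpleNamespace:
    return SimpleNamespace(allocation_state=next(_fake_states), current_node_count=1)


final_status = wait_for_allocation_state(fake_get_status, poll_s=0.0)
print(final_status.allocation_state, final_status.current_node_count)

# With the real cluster (not executed here):
# final_status = wait_for_allocation_state(aml_cluster.get_status)
```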


In [None]:
# Blocks the kernel of the local notebook until the run completes.
run.wait_for_completion()