# Train using Azure Machine Learning Compute

 - Initialize a Workspace
 - Create an Experiment
 - Introduction to AmlCompute
 - Submit an AmlCompute script run using a persistent compute target
 - Download the fitted model from the run output artifacts
 
### Prerequisites:
 
If you are using an Azure Machine Learning Compute Instance, the SDK should already be available. Otherwise, please ensure azureml-core is installed on the machine running Jupyter.


In [1]:
# Check core SDK version number
import azureml.core

print("SDK version:", azureml.core.VERSION)

SDK version: 1.42.0


#### Initialize Workspace

In [2]:
from azureml.core import Workspace

# The workspace information from the previous experiment has been pre-filled for you.
subscription_id = "afb6ac2c-d7e2-4e20-80e7-b216c765d922"
resource_group = "rg-az220"
workspace_name = "Aakriti_ML"

ws = Workspace(subscription_id=subscription_id, resource_group=resource_group, workspace_name=workspace_name)
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

Performing interactive authentication. Please follow the instructions on the terminal.


The default web browser has been opened at https://login.microsoftonline.com/organizations/oauth2/v2.0/authorize. Please continue the login in the web browser. If no web browser is available or if the web browser fails to open, use device code flow with `az login --use-device-code`.
The following tenants don't contain accessible subscriptions. Use 'az login --allow-no-subscriptions' to have tenant level access.
f5222e6c-5fc6-48eb-8f03-73db18203b63 'University of Cincinnati'


Interactive authentication successfully completed.
Aakriti_ML
rg-az220
eastus
afb6ac2c-d7e2-4e20-80e7-b216c765d922
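
Hard-coding workspace identifiers in a notebook is convenient, but they are easy to leak when the notebook is shared. As a sketch of an alternative, the same three values can be stored in a `config.json` file that `Workspace.from_config()` reads. The helper name `write_workspace_config`, the placeholder values, and the choice of the `.azureml` folder are assumptions made for illustration.

```python
import json
from pathlib import Path


def write_workspace_config(subscription_id, resource_group, workspace_name,
                           folder=".azureml"):
    """Write the three workspace identifiers to <folder>/config.json."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(folder) / "config.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2))
    return path


# Placeholder values; substitute your own identifiers.
config_path = write_workspace_config(
    "<subscription-id>", "<resource-group>", "<workspace-name>"
)
print("Wrote", config_path)

# With azureml-core installed, the workspace can then be loaded with:
#   from azureml.core import Workspace
#   ws = Workspace.from_config()
```

Keeping the file out of version control is usually a good idea, since it identifies your subscription.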


#### Create An Experiment 

- An Experiment is a logical container in an Azure ML Workspace. It hosts run records, which can include run metrics and output artifacts from your experiments.

In [3]:
from azureml.core import Experiment

# The experiment name has been pre-filled for you.
experiment_name = "mslearn-bike-rental"
experiment = Experiment(workspace = ws, name = experiment_name)
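
Because an experiment hosts run records, it can be handy to list what it already contains before submitting anything new. The following is a minimal sketch that relies on `Experiment.get_runs()` and `Run.get_status()`; the helper name `summarize_runs` and its `limit` parameter are hypothetical.

```python
from itertools import islice


def summarize_runs(experiment, limit=5):
    """Return (run id, status) pairs for the first `limit` runs of an experiment."""
    return [
        (run.id, run.get_status())
        for run in islice(experiment.get_runs(), limit)
    ]


# Usage with the experiment object created above:
#   for run_id, status in summarize_runs(experiment):
#       print(run_id, status)
```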

#### Introduction to AmlCompute 

- Azure Machine Learning Compute is managed compute infrastructure that lets you easily create single-node to multi-node compute of the appropriate VM family. It is created within your workspace region and is a resource that other users in your workspace can use. By default it autoscales up to max_nodes when a job is submitted, and it executes jobs in a containerized environment that packages the dependencies you specify.
- Since it is managed compute, job scheduling and cluster management are handled internally by the Azure Machine Learning service.
- For more information on Azure Machine Learning Compute, please read this article.
- Note: As with other Azure services, there are limits on certain resources (e.g. AmlCompute quota) associated with the Azure Machine Learning service. Please read this article on the default limits and how to request more quota.
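
To choose an appropriate VM family, one option is to filter the list returned by `AmlCompute.supported_vmsizes(workspace=ws)`. The sketch below assumes each record is a dictionary with `name`, `vCPUs`, and `memoryGB` keys; the helper `pick_vm_sizes` and the sample records are hypothetical and are not real query output.

```python
def pick_vm_sizes(vm_sizes, min_vcpus=2, min_memory_gb=8.0):
    """Filter VM size records by core count and memory, smallest first."""
    matches = [
        vm for vm in vm_sizes
        if vm.get("vCPUs", 0) >= min_vcpus
        and vm.get("memoryGB", 0) >= min_memory_gb
    ]
    return sorted(matches, key=lambda vm: (vm["vCPUs"], vm["memoryGB"]))


# Illustrative records only (assumed shape, made-up names).
sample_sizes = [
    {"name": "example-small", "vCPUs": 1, "memoryGB": 3.5},
    {"name": "example-large", "vCPUs": 8, "memoryGB": 28.0},
    {"name": "example-medium", "vCPUs": 2, "memoryGB": 14.0},
    {"name": "example-lowmem", "vCPUs": 4, "memoryGB": 4.0},
]

for vm in pick_vm_sizes(sample_sizes):
    print(vm["name"], vm["vCPUs"], vm["memoryGB"])

# Against a real workspace:
#   from azureml.core.compute import AmlCompute
#   candidates = pick_vm_sizes(AmlCompute.supported_vmsizes(workspace=ws))
```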

The training script is already created for you. Let's have a look.

#### Create project directory

Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on.

In [4]:
import os
import shutil

project_folder = os.path.join(".", experiment_name)
os.makedirs(project_folder, exist_ok=True)
shutil.copy('script.py', project_folder)

'./mslearn-bike-rental/script.py'

#### Create environment 
- Retrieve a Docker-based curated environment (AzureML-AutoML, version 114) that already has scikit-learn installed.

In [5]:
from azureml.core import Environment
from azureml.core.runconfig import DockerConfiguration

# Retrieve version 114 of the AzureML-AutoML environment from the workspace.
myenv = Environment.get(ws, 'AzureML-AutoML', '114')

# Enable Docker
docker_config = DockerConfiguration(use_docker=True)
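
If the curated environment does not fit your script, a custom environment can be defined instead. The sketch below is one possible way to do that with `CondaDependencies.create`; the function name `build_sklearn_environment`, the environment name, and the package list are assumptions, not what this notebook uses.

```python
def build_sklearn_environment(name="sklearn-env"):
    """Create an Environment that installs scikit-learn through conda."""
    # Imported lazily so the function can be defined without the SDK installed.
    from azureml.core import Environment
    from azureml.core.conda_dependencies import CondaDependencies

    env = Environment(name=name)
    env.python.conda_dependencies = CondaDependencies.create(
        conda_packages=["scikit-learn", "pandas"],
        pip_packages=["azureml-defaults"],
    )
    return env


# Usage (requires azureml-core):
#   myenv = build_sklearn_environment()
```

A custom environment is built into a Docker image on first use, so the first run with it typically takes longer than later runs.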

In [6]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
cluster_name = "Aakriti-MLcompute"

# Reuse the cluster if it already exists; otherwise create it
try:
    cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2',
                                                           max_nodes=4)
    cluster = ComputeTarget.create(ws, cluster_name, compute_config)

cluster.wait_for_completion(show_output=True)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


#### Provision as a persistent compute target (Basic) 

You can provision a persistent AmlCompute resource by defining just two parameters, thanks to smart defaults. By default it autoscales from 0 nodes and provisions dedicated VMs to run your job in a container. This is useful when you want to continuously re-use the same target, debug it between jobs, or simply share the resource with other users of your workspace.

- vm_size: VM family of the nodes provisioned by AmlCompute. Choose one of the sizes returned by supported_vmsizes().
- max_nodes: Maximum number of nodes to autoscale to while running a job on AmlCompute.
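
Beyond the two required parameters, `AmlCompute.provisioning_configuration` also accepts `min_nodes` and `idle_seconds_before_scaledown`, which control how the cluster scales back down. The sketch below collects these settings in a dictionary and validates them first; the helper name `provisioning_settings` and the 600-second idle timeout are illustrative assumptions.

```python
def provisioning_settings(vm_size, max_nodes, min_nodes=0, idle_seconds=600):
    """Build keyword arguments for AmlCompute.provisioning_configuration."""
    if min_nodes < 0 or max_nodes < 1 or max_nodes < min_nodes:
        raise ValueError("expected 0 <= min_nodes <= max_nodes and max_nodes >= 1")
    return {
        "vm_size": vm_size,
        "min_nodes": min_nodes,   # 0 lets the cluster scale down to no nodes
        "max_nodes": max_nodes,
        "idle_seconds_before_scaledown": idle_seconds,
    }


settings = provisioning_settings("STANDARD_DS11_V2", max_nodes=4)
print(settings)

# With azureml-core installed:
#   from azureml.core.compute import AmlCompute
#   compute_config = AmlCompute.provisioning_configuration(**settings)
```

Keeping `min_nodes` at 0 means the cluster holds no nodes while idle, at the price of a short provisioning delay when the next job arrives.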

In [7]:
import uuid
from azureml.core import ScriptRunConfig
from azureml._restclient.models import RunTypeV2
from azureml._restclient.models.create_run_dto import CreateRunDto
from azureml._restclient.run_client import RunClient

codegen_runid = str(uuid.uuid4())
client = RunClient(experiment.workspace.service_context, experiment.name, codegen_runid, experiment_id=experiment.id)

# To test with new training / validation datasets, replace the default dataset id(s) taken from parent run below
training_dataset_id = '91196f95-8433-4381-8568-1688a8905a5c'
dataset_arguments = ['--training_dataset_id', training_dataset_id]

create_run_dto = CreateRunDto(run_id=codegen_runid,
                              parent_run_id='AutoML_84707285-f304-41bd-8de6-2078576ef649_0',
                              description='AutoML Codegen Script Run',
                              target=cluster_name,
                              run_type_v2=RunTypeV2(
                                  orchestrator='Execution', traits=['automl-codegen']))
src = ScriptRunConfig(source_directory=project_folder, 
                      script='script.py', 
                      arguments=dataset_arguments, 
                      compute_target=cluster, 
                      environment=myenv,
                      docker_runtime_config=docker_config)
run_dto = client.create_run(run_id=codegen_runid, create_run_dto=create_run_dto)
 
run = experiment.submit(config=src, run_id=codegen_runid)
run

Experiment,Id,Type,Status,Details Page,Docs Page
mslearn-bike-rental,0a88a6fe-8883-44b6-9878-8b4f347271ce,azureml.scriptrun,Preparing,Link to Azure Machine Learning studio,Link to Documentation


#### Note: if you need to cancel a run, you can follow these instructions.
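
As a rough sketch of cancelling from code, `Run.cancel()` can be called on the run object returned by `experiment.submit`. The helper `cancel_if_active` and the set of status strings treated as active are assumptions; adjust them to the statuses you actually observe.

```python
# Status strings assumed to mean the run has not finished yet.
ACTIVE_STATUSES = {
    "NotStarted", "Starting", "Provisioning", "Preparing", "Queued", "Running",
}


def cancel_if_active(run):
    """Cancel the run if it is still in progress; return True if cancel was requested."""
    if run.get_status() in ACTIVE_STATUSES:
        run.cancel()
        return True
    return False


# Usage with the run submitted above:
#   cancel_if_active(run)
```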

In [8]:
%%time
# Shows output of the run on stdout.
run.wait_for_completion(show_output=True)

RunId: 0a88a6fe-8883-44b6-9878-8b4f347271ce
Web View: https://ml.azure.com/runs/0a88a6fe-8883-44b6-9878-8b4f347271ce?wsid=/subscriptions/afb6ac2c-d7e2-4e20-80e7-b216c765d922/resourcegroups/rg-az220/workspaces/Aakriti_ML&tid=f3308007-477c-4a70-8889-34611817c55a

Execution Summary
RunId: 0a88a6fe-8883-44b6-9878-8b4f347271ce
Web View: https://ml.azure.com/runs/0a88a6fe-8883-44b6-9878-8b4f347271ce?wsid=/subscriptions/afb6ac2c-d7e2-4e20-80e7-b216c765d922/resourcegroups/rg-az220/workspaces/Aakriti_ML&tid=f3308007-477c-4a70-8889-34611817c55a

CPU times: user 323 ms, sys: 96.1 ms, total: 419 ms
Wall time: 6.62 s


{'runId': '0a88a6fe-8883-44b6-9878-8b4f347271ce',
 'target': 'Aakriti-MLcompute',
 'status': 'Completed',
 'startTimeUtc': '2022-05-30T15:27:19.333614Z',
 'endTimeUtc': '2022-05-30T15:28:45.599623Z',
 'services': {},
 'properties': {'_azureml.ComputeTargetType': 'amlctrain',
  'ContentSnapshotId': 'a343bf20-536e-4802-908e-709561cbf68f',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json'},
 'inputDatasets': [{'dataset': {'id': '91196f95-8433-4381-8568-1688a8905a5c'}, 'consumptionDetails': {'type': 'Reference'}}],
 'outputDatasets': [],
 'runDefinition': {'script': 'script.py',
  'command': '',
  'useAbsolutePath': False,
  'arguments': ['--training_dataset_id',
   '91196f95-8433-4381-8568-1688a8905a5c'],
  'sourceDirectoryDataStore': None,
  'framework': 'Python',
  'communicator': 'None',
  'target': 'Aakriti-MLcompute',
  'dataReferences': {},
  'data': {},
  'outputData': {},
  'datacaches': [],
  'jobName': None,
  'maxRu

In [9]:
run.get_metrics()

{'residuals': "{'schema_type': 'residuals', 'schema_version': '1.0.0', 'data': {'bin_edges': [-1715.3899999999999, -1372.31, -1029.23, -686.16, -343.08, 0.0, 343.08, 686.16, 1029.23, 1372.31, 1715.3899999999999], 'bin_counts': [0, 1, 2, 8, 78, 85, 7, 2, 0, 0], 'mean': -7.229347949136273, 'stddev': 243.2250842016752, 'res_count': 183}}",
 'normalized_mean_absolute_error': 0.047189559962206765,
 'normalized_root_mean_squared_log_error': nan,
 'predicted_true': "{'schema_type': 'predictions', 'schema_version': '1.0.0', 'data': {'bin_edges': [2.0, 42.82, 256.25, 469.67, 683.1, 896.53, 1109.95, 1323.38, 1536.81, 1750.23, 1963.66, 2177.09, 2390.51, 2603.94, 2827.0], 'bin_counts': [2, 44, 32, 15, 27, 27, 13, 10, 2, 2, 1, 2, 4, 2], 'bin_averages': [106.96215204327328, 196.56127474097366, 379.30418086804946, 664.8489072506758, 783.8316265149052, 1016.9138579976866, 1238.868540926408, 1402.862427489684, 1473.163148314953, 2032.9201345585166, 2106.724464461851, 1484.3620235883532, 1890.4814344693
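
The dictionary returned by `run.get_metrics()` mixes scalar scores, NaN values, and chart data serialized as strings. A small filter, sketched below, keeps only the finite numeric metrics; the helper name `scalar_metrics` is hypothetical, and the sample dictionary is an abbreviated stand-in for the output above.

```python
import math


def scalar_metrics(metrics):
    """Keep only finite numeric metrics, dropping NaN values and serialized charts."""
    return {
        name: value
        for name, value in metrics.items()
        if isinstance(value, (int, float))
        and not isinstance(value, bool)
        and math.isfinite(value)
    }


# Abbreviated stand-in for run.get_metrics().
sample_metrics = {
    "normalized_mean_absolute_error": 0.047189559962206765,
    "normalized_root_mean_squared_log_error": float("nan"),
    "residuals": "{'schema_type': 'residuals'}",
}

for name, value in scalar_metrics(sample_metrics).items():
    print(f"{name}: {value:.4f}")
```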

#### Download Fitted Model

In [13]:
import joblib

# Load the fitted model from the script run.

# Note that if training dependencies are not installed on the machine
# this notebook is being run from, this step can fail.
try:
    run.download_file("outputs/model.pkl", "model.pkl")
    model = joblib.load("model.pkl")
except ImportError:
    print('Required dependencies are missing; please run pip install azureml-automl-runtime.')
    raise

Required dependencies are missing; please run pip install azureml-automl-runtime.


ModuleNotFoundError: No module named 'sklearn_pandas'

#### You can now run inference using this model.

- For classification/regression, call model.predict()
- For forecasting, call model.forecast()
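
The two cases above can be wrapped in a single helper, sketched here under the assumption that `features` is a pandas DataFrame with the same columns the model was trained on. The function name `run_inference` and its `task` parameter are hypothetical.

```python
def run_inference(model, features, task="regression"):
    """Call forecast() for forecasting models and predict() for everything else."""
    if task == "forecasting":
        return model.forecast(features)
    return model.predict(features)


# Usage once model.pkl has been loaded successfully:
#   predictions = run_inference(model, X_new)
```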