# Exercise 3 - Compute Contexts

In the previous exercise, you used *datastores* and *datasets* to define shared sources of data that can be consumed in *experiments* and used to train machine learning models. In this exercise, you'll extend your experiments beyond the local compute context and take advantage of the cloud to run experiments in dynamically created compute contexts.

> **Important**: This exercise assumes you have completed the previous exercises in this series - specifically, you must have:
>
> - Created an Azure ML Workspace.
> - Uploaded the diabetes.csv data file to the workspace's default datastore.
> - Registered a **Diabetes Dataset** dataset in the workspace.

## Task 1: Connect to Your Workspace

The first thing you need to do is to connect to your workspace using the Azure ML SDK. Let's start by ensuring you still have the latest version installed.

In [1]:
#!pip install --upgrade azureml-sdk[notebooks]
import azureml.core
print("Ready to use Azure ML", azureml.core.VERSION)

Ready to use Azure ML 1.0.83


Now you're ready to connect to your workspace. When you created it in the previous exercise, you saved its configuration; so now you can simply load the workspace from its configuration file.

> **Note**: If the authenticated session with your Azure subscription has expired since you completed the previous exercise, you'll be prompted to reauthenticate.

In [2]:
from azureml.core import Workspace

# Load the workspace from the saved config file
ws = Workspace.from_config()
print('Ready to work with', ws.name)

Ready to work with contosoml
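
`Workspace.from_config()` relies on a small `config.json` file that identifies the workspace by subscription, resource group, and workspace name. As a rough illustration of what that file holds, the sketch below writes and reads one using only the standard library. The subscription ID is a placeholder, and the resource group and workspace names are simply the ones that appear in this exercise's output.

```python
import json
import tempfile
from pathlib import Path

# Placeholder values; a real file contains your own subscription ID.
example_config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "contosoml_RG",
    "workspace_name": "contosoml",
}

with tempfile.TemporaryDirectory() as folder:
    config_path = Path(folder) / "config.json"
    config_path.write_text(json.dumps(example_config, indent=4))

    # Read the file back, as any tool that consumes it would
    loaded = json.loads(config_path.read_text())

print("Workspace:", loaded["workspace_name"])
print("Keys:", sorted(loaded))
```

If the workspace cannot be loaded, checking that this file exists and contains these three keys is a reasonable first step.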


## Task 2: Run an Experiment on Remote Compute

In many cases, your local compute resources may not be sufficient for a complex or long-running experiment that processes a large volume of data. In these cases, you can take advantage of the ability to dynamically create and use compute resources in the cloud.

Azure ML supports a range of compute targets, which you can define in your workspace and use to run experiments, paying for the resources only when you use them.

> **Note**: In this exercise, you'll use an *Azure Machine Learning Compute* container cluster. For more details of the options for compute targets, see the [Azure ML documentation](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-compute-target).

In [3]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
cpu_cluster_name = "cpu-cluster"

# Use the cluster if it already exists; otherwise, create it
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    # Create an AzureMl Compute resource (a container cluster)
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', 
                                                           vm_priority='dedicated', 
                                                           max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)

Creating
Succeeded
AmlCompute wait for completion finished
Minimum number of nodes requested have been provisioned
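
Paying only for what you use depends on the cluster releasing nodes when it is idle. `AmlCompute.provisioning_configuration` accepts `min_nodes` and `idle_seconds_before_scaledown` parameters for this purpose. The sketch below gathers a set of settings in a dictionary; the `min_nodes` and `idle_seconds_before_scaledown` values are assumptions chosen for illustration, and `build_compute_config` is a hypothetical helper that imports the SDK only when it is called.

```python
# Illustrative settings; the min_nodes and idle_seconds_before_scaledown values
# are assumptions chosen for this sketch, not values from the exercise.
cluster_settings = {
    "vm_size": "STANDARD_D2_V2",
    "vm_priority": "dedicated",
    "min_nodes": 0,                         # allow the cluster to scale to zero
    "max_nodes": 4,
    "idle_seconds_before_scaledown": 1800,  # release idle nodes after 30 minutes
}

def build_compute_config(settings):
    """Hypothetical helper: turn the settings into a provisioning configuration."""
    # Imported here so the settings can be inspected without the SDK installed
    from azureml.core.compute import AmlCompute
    return AmlCompute.provisioning_configuration(**settings)

print(cluster_settings)
```

Passing the result of `build_compute_config(cluster_settings)` to `ComputeTarget.create`, in place of `compute_config` in the cell above, would apply these settings when the cluster is created.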


Look at the **Compute** tab in the workspace in the [Azure portal](https://portal.azure.com) to verify that the compute resource has been created.

You can also use the following code to enumerate the compute targets in your workspace.

In [4]:
for target_name in ws.compute_targets:
    target = ws.compute_targets[target_name]
    print(target.name, target.type)

cpu-cluster AmlCompute


In [5]:
import os
from azureml.core import Experiment

# Create an experiment
experiment_name = 'diabetes_training'
experiment = Experiment(workspace = ws, name = experiment_name)

# Create a folder for the experiment files
experiment_folder = './' + experiment_name
os.makedirs(experiment_folder, exist_ok=True)

print("Experiment:", experiment.name)

Experiment: diabetes_training


The previous cell created an experiment and a local folder for the files needed to run it. Next, you'll write the training script to that folder.

Keeping the experiment code in a separate Python script file gives you greater flexibility over the Python environment, or even the compute platform, on which the code runs. It also makes it easier to manage experiment scripts in a source-controlled environment.

In [6]:
%%writefile $experiment_folder/diabetes_training.py
# Import libraries
import os
import argparse
from azureml.core import Workspace, Dataset, Experiment, Run
import pandas as pd
import numpy as np
import joblib
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve

# Set regularization parameter
parser = argparse.ArgumentParser()
parser.add_argument('--regularization', type=float, dest='reg_rate', default=0.01, help='regularization rate')
args = parser.parse_args()
reg = args.reg_rate

# Get the experiment run context
run = Run.get_context()

# load the diabetes dataset
dataset_name = 'Diabetes Dataset'
print("Loading data from " + dataset_name)
diabetes = Dataset.get_by_name(workspace=run.experiment.workspace, name=dataset_name).to_pandas_dataframe()

# Separate features and labels
X, y = diabetes[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, diabetes['Diabetic'].values

# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Train a logistic regression model
print('Training a logistic regression model with regularization rate of', reg)
run.log('Regularization Rate', float(reg))
model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)

# calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
print('Accuracy:', acc)
run.log('Accuracy', float(acc))

# calculate AUC
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test,y_scores[:,1])
print('AUC: ' + str(auc))
run.log('AUC', float(auc))

# plot ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_scores[:,1])
fig = plt.figure(figsize=(6, 4))
# Plot the diagonal 50% line
plt.plot([0, 1], [0, 1], 'k--')
# Plot the FPR and TPR achieved by our model
plt.plot(fpr, tpr)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
run.log_image(name = "ROC", plot = fig)
plt.show()

os.makedirs('outputs', exist_ok=True)
# Note: files saved in the outputs folder are automatically uploaded to the experiment record
joblib.dump(value=model, filename='outputs/diabetes_model.pkl')

run.complete()

Writing ./diabetes_training/diabetes_training.py
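
The training script reads its regularization rate from a `--regularization` argument and passes the inverse to `LogisticRegression` as `C`, because scikit-learn's `C` is the inverse of the regularization strength. The sketch below isolates that step using only the standard library so you can see how the default and a supplied value are handled. `parse_reg_rate` is a hypothetical helper name used for illustration.

```python
import argparse

def parse_reg_rate(argv):
    """Parse --regularization the same way the training script does."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--regularization', type=float, dest='reg_rate',
                        default=0.01, help='regularization rate')
    return parser.parse_args(argv).reg_rate

# No argument supplied: the default of 0.01 applies
default_rate = parse_reg_rate([])
# Argument supplied, as in a run submitted with --regularization 0.1
custom_rate = parse_reg_rate(['--regularization', '0.1'])

for rate in (default_rate, custom_rate):
    # scikit-learn's C is the inverse of the regularization strength
    print('reg_rate =', rate, '-> C =', 1 / rate)
```

A larger regularization rate therefore produces a smaller `C`, which means a more strongly regularized model.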


Now you're ready to run the experiment on the remote compute.

This time, rather than the generic **Estimator** class, you'll use the **SKLearn** class, an estimator designed specifically for scikit-learn model training. You'll also specify the package dependencies in the estimator's constructor. This simply demonstrates that there are different ways to accomplish essentially the same task.

> **Note**: Once again, this will take a while to run as the nodes in the remote compute must be started and configured before the experiment script is run.

In [7]:
from azureml.train.sklearn import SKLearn

# Set the script parameters
script_params = {
    '--regularization': 0.1
}


# Create a new estimator that uses the remote compute
remote_estimator = SKLearn(source_directory=experiment_folder,
                           script_params=script_params,
                           compute_target = cpu_cluster,
                           conda_packages=['pandas','ipykernel','matplotlib'],
                           pip_packages=['azureml-sdk','argparse','pyarrow'],
                           entry_script='diabetes_training.py')

# Run the experiment
run = experiment.submit(config=remote_estimator)
run.wait_for_completion(show_output=True)


RunId: diabetes_training_1581123592_51156155
Web View: https://ml.azure.com/experiments/diabetes_training/runs/diabetes_training_1581123592_51156155?wsid=/subscriptions/7d48758f-d40b-4252-854c-e7d8f2ed7645/resourcegroups/contosoml_RG/workspaces/contosoml

Streaming azureml-logs/20_image_build_log.txt

2020/02/08 01:00:12 Downloading source code...
2020/02/08 01:00:13 Finished downloading source code
2020/02/08 01:00:14 Creating Docker network: acb_default_network, driver: 'bridge'
2020/02/08 01:00:14 Successfully set up Docker network: acb_default_network
2020/02/08 01:00:14 Setting up Docker configuration...
2020/02/08 01:00:15 Successfully set up Docker configuration
2020/02/08 01:00:15 Logging in to registry: contosoml.azurecr.io
2020/02/08 01:00:16 Successfully logged into contosoml.azurecr.io
2020/02/08 01:00:16 Executing step ID: acb_step_0. Timeout(sec): 5400, Working directory: '', Network: 'acb_default_network'
2020/02/08 01:00:16 Scanning for dependencies...
2020/02/08 01:00:


libiconv-1.15        | 2.0 MB    |            |   0%
libiconv-1.15        | 2.0 MB    | ########   |  81%
libiconv-1.15        | 2.0 MB    | ########## | 100%

jupyter_core-4.6.1   | 70 KB     |            |   0%
jupyter_core-4.6.1   | 70 KB     | ########## | 100%

openssl-1.0.2u       | 3.2 MB    |            |   0%
openssl-1.0.2u       | 3.2 MB    | #######6   |  76%
openssl-1.0.2u       | 3.2 MB    | #########6 |  97%
openssl-1.0.2u       | 3.2 MB    | ########## | 100%

fontconfig-2.13.1    | 340 KB    |            |   0%
fontconfig-2.13.1    | 340 KB    | ########## | 100%

pygments-2.5.2       | 669 KB    |            |   0%
pygments-2.5.2       | 669 KB    | ########2  |  82%
pygments-2.5.2       | 669 KB    | ########## | 100%

tk-8.6.10            | 3.2 MB    |            |   0%
tk-8.6.10            | 3.2 MB    | #######6   |

Executing transaction: ...working... 
done
Collecting azureml-sdk
  Downloading azureml_sdk-1.0.85-py3-none-any.whl (4.6 kB)
Collecting argparse
  Downloading argparse-1.4.0-py2.py3-none-any.whl (23 kB)
Collecting pyarrow
  Downloading pyarrow-0.16.0-cp36-cp36m-manylinux1_x86_64.whl (62.9 MB)
Collecting azureml-defaults
  Downloading azureml_defaults-1.0.85-py2.py3-none-any.whl (2.9 kB)
Collecting scikit-learn==0.20.3
  Downloading scikit_learn-0.20.3-cp36-cp36m-manylinux1_x86_64.whl (5.4 MB)
Collecting scipy==1.2.1
  Downloading scipy-1.2.1-cp36-cp36m-manylinux1_x86_64.whl (24.8 MB)
Collecting numpy==1.16.2
  Downloading numpy-1.16.2-cp36-cp36m-manylinux1_x86_64.whl (17.3 MB)
Collecting joblib==0.13.2
  Downloading joblib-0.13.2-py2.py3-none-any.whl (278 kB)
Collecting azureml-train-automl-client==1.0.85.*
  Downloading azureml_train_automl_client-1.0.85.1-py3-none-any.whl (69 kB)
Collecting azureml-train==1.0.85.*
  Downloading azureml_train-1.0.85-py3-none-any.whl (3.2 kB)
Collectin

  Created wheel for liac-arff: filename=liac_arff-2.4.0-py3-none-any.whl size=13333 sha256=c1b3324dbd6bf76e7d32df1bc571ad7f7e31000850d543be1dbed8d904ba0929
  Stored in directory: /root/.cache/pip/wheels/ba/2a/e1/6f7be2e2ea150e2486bff64fd6f0670f4f35f4c8f31c819fb8
  Building wheel for dill (setup.py): started
  Building wheel for dill (setup.py): finished with status 'done'
  Created wheel for dill: filename=dill-0.3.1.1-py3-none-any.whl size=78530 sha256=0d36bea6e5f89864c62598b2442a2aa74dd922d0e88c25d684e27d9f6f7dda13
  Stored in directory: /root/.cache/pip/wheels/09/84/74/d2b4feb9ac9488bc83c475cb2cbe8e8b7d9cea8320d32f3787
  Building wheel for pycparser (setup.py): started
  Building wheel for pycparser (setup.py): finished with status 'done'
  Created wheel for pycparser: filename=pycparser-2.19-py2.py3-none-any.whl size=111031 sha256=60769fa2a37801d9c39136ffdf77e46c53bdaaf9816a391a8117b5523c5c6fbd
  Stored in directory: /root/.cache/pip/wheels/c6/6b/83/2608afaa57ecfb0a66ac89191a8d9bad

6504d449e70c: Verifying Checksum
6504d449e70c: Download complete
b0a763e8ee03: Verifying Checksum
b0a763e8ee03: Download complete
11917a028ca4: Verifying Checksum
11917a028ca4: Download complete
6cc007ad9140: Verifying Checksum
6cc007ad9140: Download complete
6c1698a608f3: Verifying Checksum
6c1698a608f3: Download complete
ec6fc499aa91: Verifying Checksum
ec6fc499aa91: Download complete
96154ae90d38: Verifying Checksum
96154ae90d38: Download complete
29e72d77d02b: Verifying Checksum
29e72d77d02b: Download complete
a6c378d11cbf: Verifying Checksum
a6c378d11cbf: Download complete
781fcb033dd7: Verifying Checksum
781fcb033dd7: Download complete
c7311f3b29e5: Verifying Checksum
c7311f3b29e5: Download complete
997d732be66a: Verifying Checksum
997d732be66a: Download complete
a1298f4ce990: Pull complete
04a3282d9c4b: Pull complete
9b0d3db6dc03: Pull complete
8269c605f3f1: Pull complete
6504d449e70c: Pull complete
4e38f320d0d4: Pull complete
b0a763e8ee03: Pull complete
11917a028ca4: Pull compl

{'runId': 'diabetes_training_1581123592_51156155',
 'target': 'cpu-cluster',
 'status': 'Completed',
 'startTimeUtc': '2020-02-08T01:12:29.143075Z',
 'endTimeUtc': '2020-02-08T01:15:12.061253Z',
 'properties': {'_azureml.ComputeTargetType': 'amlcompute',
  'ContentSnapshotId': 'e42b9a67-8e81-4250-9a2e-0c15fbcbf44d',
  'azureml.git.repository_uri': 'https://github.com/Sahiep/AzureMLGettingStarted',
  'mlflow.source.git.repoURL': 'https://github.com/Sahiep/AzureMLGettingStarted',
  'azureml.git.branch': 'master',
  'mlflow.source.git.branch': 'master',
  'azureml.git.commit': '9449b28c8526aae14250b181eeeb12728cdcc838',
  'mlflow.source.git.commit': '9449b28c8526aae14250b181eeeb12728cdcc838',
  'azureml.git.dirty': 'True',
  'AzureML.DerivedImageName': 'azureml/azureml_79a615d298dd7cdccaf71fa038da2aa2',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json'},
 'inputDatasets': [{'dataset': {'id': 'edf7c34e-47c8-4773-9bfb-f657ee746b

In [8]:
run

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| diabetes_training | diabetes_training_1581123592_51156155 | azureml.scriptrun | Completed | Link to Azure Machine Learning studio | Link to Documentation |
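
The training script saves the model to the `outputs` folder, and files in that folder are uploaded with the run. The `Run` object's `get_file_names()` method lists the files stored for a run, so you can confirm that the model file is there before registering it. In the sketch below, `list_output_files` is a hypothetical helper, and `FakeRun` is a stand-in used only so the example runs without a workspace.

```python
def list_output_files(run):
    """Hypothetical helper: return the run's files stored under outputs/."""
    return [name for name in run.get_file_names() if name.startswith('outputs/')]

class FakeRun:
    """Stand-in for a completed run, used only to make this sketch runnable."""
    def get_file_names(self):
        return ['azureml-logs/process_info.json', 'outputs/diabetes_model.pkl']

output_files = list_output_files(FakeRun())
print(output_files)
```

With the real `run` object from this notebook, you would call `list_output_files(run)` instead.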


Now let's register the new version of the model.

In [9]:
from azureml.core import Model

# Register model
metrics = run.get_metrics()
run.register_model(model_path='outputs/diabetes_model.pkl',
                   model_name='diabetes_model',
                   tags={'Training context': 'remote compute'},
                   properties={'AUC': metrics['AUC'], 'Accuracy': metrics['Accuracy']})

# List registered models
for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name in model.tags:
        tag = model.tags[tag_name]
        print ('\t',tag_name, ':', tag)
    for prop_name in model.properties:
        prop = model.properties[prop_name]
        print ('\t',prop_name, ':', prop)
    print('\n')

diabetes_model version: 1
	 Training context : remote compute
	 AUC : 0.8568632924585982
	 Accuracy : 0.7893333333333333
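
The two properties recorded with the model measure different things: accuracy is the fraction of test cases classified correctly, while AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes both in plain Python on a handful of made-up labels and scores. It is an illustration of the definitions, not the scikit-learn implementation used in the training script.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)

def auc(y_true, scores):
    """Probability that a random positive case outscores a random negative one."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    # A tie between a positive and a negative score counts as half a win
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Made-up labels and predicted probabilities, for illustration only
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
# Classify as positive when the score reaches 0.5
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print('Accuracy:', accuracy(y_true, y_pred))
print('AUC:', auc(y_true, scores))
```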




So far, you've trained the model using a variety of compute options, but always using the same basic algorithm and parameters. As a result, the performance of the model has remained fairly consistent no matter how you've run the training script - and it's not really all that good!

Now that you've seen how to control compute options for a model training experiment, it's time to see how you can leverage the compute scalability of the cloud to experiment with different algorithms and parameters in order to find the best possible model for your data.