# Exercise 3 - Compute Contexts

In [the previous exercise](./02%20-%20From%20Data%20to%20Model.ipynb), you used *datastores* and *datasets* to define shared sources of data that can be consumed in *experiments* and used to train machine learning models. In this exercise, you'll extend your experiments beyond the local compute context and take advantage of the cloud to run experiments in dynamically created compute contexts.

> **Important**: This exercise assumes you have completed the previous exercises in this series - specifically, you must have:
>
> - Created an Azure ML Workspace.
> - Uploaded the diabetes.csv data file to the workspace's default datastore.
> - Registered a **Diabetes Dataset** dataset in the workspace.

## Task 1: Connect to Your Workspace

First, connect to your workspace using the Azure ML SDK. Start by checking which version of the SDK is installed; to upgrade to the latest version, uncomment the `pip install` line in the following cell.

In [None]:
#!pip install --upgrade azureml-sdk[notebooks]
import azureml.core
print("Ready to use Azure ML", azureml.core.VERSION)

Now you're ready to connect to your workspace. When you created it in the previous exercise, you saved its configuration, so now you can simply load the workspace from its configuration file.

> **Note**: If the authenticated session with your Azure subscription has expired since you completed the previous exercise, you'll be prompted to reauthenticate.

In [None]:
from azureml.core import Workspace

# Load the workspace from the saved config file
ws = Workspace.from_config()
print('Ready to work with', ws.name)

## Task 2: Run an Experiment on Remote Compute

Your local compute resources may not be sufficient for a complex or long-running experiment that processes a large volume of data. In these cases, you can take advantage of the ability to dynamically create and use compute resources in the cloud.

Azure ML supports a range of compute targets, which you can define in your workspace and use to run experiments, paying for the resources only when you use them.

> **Note**: In this exercise, you'll use an *Azure Machine Learning Compute* container cluster. For more details on the options for compute targets, see the [Azure ML documentation](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-compute-target).

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
cpu_cluster_name = "cpu-cluster"

# Use the cluster if it already exists; otherwise, create it
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    # Create an Azure ML Compute resource (a container cluster)
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           vm_priority='lowpriority',
                                                           max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)
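
The configuration above sets only the maximum node count. As a minimal sketch (not part of the exercise), the scale-down behavior could be made explicit so that idle nodes are released and you pay only while experiments run. The `min_nodes` and `idle_seconds_before_scaledown` values here are illustrative assumptions, and the import is guarded so the snippet still runs where the SDK is not installed.

```python
# Illustrative settings: min_nodes and idle_seconds_before_scaledown are
# assumptions added for this sketch; the other values match the cell above.
cluster_settings = {
    'vm_size': 'STANDARD_D2_V2',
    'vm_priority': 'lowpriority',
    'min_nodes': 0,                         # release all nodes when idle
    'max_nodes': 4,
    'idle_seconds_before_scaledown': 1200,  # wait 20 minutes before scaling down
}

try:
    from azureml.core.compute import AmlCompute
    compute_config = AmlCompute.provisioning_configuration(**cluster_settings)
except ImportError:
    # The Azure ML SDK is not installed; keep only the plain settings
    compute_config = None

print(cluster_settings)
```

Keeping the settings in a plain dictionary makes them easy to review or reuse before they are passed to `AmlCompute.provisioning_configuration`.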

Look at the **Compute** tab in the workspace in the [Azure portal](https://portal.azure.com) to verify that the compute resource has been created.

You can also use the following code to enumerate the compute targets in your workspace.

In [None]:
for target_name, target in ws.compute_targets.items():
    print(target.name, target.type)

Now you're ready to run the experiment on the remote compute.

This time, rather than the generic **Estimator** class, you'll use the **SKLearn** class, an estimator designed specifically for scikit-learn model training. You'll also specify the package dependencies in the estimator's constructor. This is simply to show that there are different ways to accomplish essentially the same task.

> **Note**: Once again, this will take a while to run as the nodes in the remote compute must be started and configured before the experiment script is run.

In [None]:
from azureml.train.sklearn import SKLearn

# Set the script parameters
script_params = {
    '--regularization': 0.1
}

# Create a new estimator that uses the remote compute
# (experiment_folder and experiment are not defined in this notebook; they are
# assumed to be set up as in the previous exercise)
remote_estimator = SKLearn(source_directory=experiment_folder,
                           script_params=script_params,
                           compute_target=cpu_cluster,
                           conda_packages=['pandas','ipykernel','matplotlib'],
                           pip_packages=['azureml-sdk','argparse','pyarrow'],
                           entry_script='diabetes_training.py')

# Run the experiment
run = experiment.submit(config=remote_estimator)
run.wait_for_completion(show_output=True)
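
The entries in `script_params` are passed to the entry script as command-line arguments. The contents of `diabetes_training.py` are not shown in this exercise, so the following is only a sketch of how such a script might read the `--regularization` value. The function name `parse_training_args`, the destination name `reg`, and the default of 0.01 are hypothetical.

```python
import argparse

def parse_training_args(argv=None):
    """Parse the command-line arguments passed to the training script."""
    parser = argparse.ArgumentParser()
    # 'reg' and the default of 0.01 are hypothetical choices for this sketch
    parser.add_argument('--regularization', type=float, dest='reg', default=0.01,
                        help='regularization rate')
    return parser.parse_args(argv)

# Simulate the arguments produced by script_params = {'--regularization': 0.1}
args = parse_training_args(['--regularization', '0.1'])
print('Regularization rate:', args.reg)
```

Inside a real training script you would call `parse_training_args()` with no arguments so that it reads from the actual command line.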


In [None]:
run

Now let's register the new version of the model.

In [None]:
from azureml.core import Model

# Retrieve the run metrics once and reuse them
metrics = run.get_metrics()

# Register model
run.register_model(model_path='outputs/diabetes_model.pkl',
                   model_name='diabetes_model',
                   tags={'Training context': 'remote compute'},
                   properties={'AUC': metrics['AUC'], 'Accuracy': metrics['Accuracy']})

# List registered models with their tags and properties
for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name, tag in model.tags.items():
        print('\t', tag_name, ':', tag)
    for prop_name, prop in model.properties.items():
        print('\t', prop_name, ':', prop)
    print('\n')
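
Because each registered version carries its metrics as properties, you can compare versions after listing them. The following is a minimal sketch that uses plain dictionaries in place of real `Model` objects. The helper `best_by_auc` is hypothetical, and the AUC values are made-up placeholders, not results from this exercise. Property values may come back as strings, so the sketch converts them with `float()`.

```python
def best_by_auc(records):
    """Return the record with the highest AUC, skipping records without one."""
    scored = [r for r in records if 'AUC' in r['properties']]
    return max(scored, key=lambda r: float(r['properties']['AUC']), default=None)

# Placeholder records standing in for model.name, model.version and
# model.properties; the AUC values are invented for illustration only.
records = [
    {'name': 'diabetes_model', 'version': 1, 'properties': {'AUC': '0.80'}},
    {'name': 'diabetes_model', 'version': 2, 'properties': {'AUC': '0.85'}},
    {'name': 'diabetes_model', 'version': 3, 'properties': {}},
]

best = best_by_auc(records)
print(best['name'], 'version:', best['version'])
```

With real models, you could build the same records from `Model.list(ws)` before calling the helper.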

So far, you've trained the model using a variety of compute options, but always using the same basic algorithm and parameters. As a result, the performance of the model has remained fairly consistent no matter how you've run the training script - and it's not really all that good!

Now that you've seen how to control compute options for a model training experiment, it's time to see how you can leverage the compute scalability of the cloud to experiment with different algorithms and parameters in order to find the best possible model for your data.