# Train with RAPIDS on AzureML

## Prerequisites

* Install the Azure Machine Learning Python SDK and create an Azure ML Workspace

In [None]:
# verify installation and check Azure ML SDK version
import azureml.core

print('SDK version:', azureml.core.VERSION)

In [None]:
# Local path to the airline dataset (only needed if you upload the data to the datastore below)
# data_dir = '../../data_airline_updated'

## Initialize workspace

Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`.

In [None]:
from azureml.core.workspace import Workspace

# If no locally saved config.json is available, load the workspace explicitly instead:
# ws = Workspace(subscription_id=subscription_id, resource_group=resource_group, workspace_name=workspace_name)

ws = Workspace.from_config()
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

datastore = ws.get_default_datastore()
print("Default datastore's name: {}".format(datastore.name))
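`Workspace.from_config()` reads the workspace details from `config.json`. As a minimal sketch, assuming the file holds the three standard keys `subscription_id`, `resource_group`, and `workspace_name` (the values below are placeholders), the hypothetical helper `load_workspace_config` checks a config file before you hand it to the SDK:

```python
import json
import os
import tempfile

# Keys the workspace config file is assumed to contain
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path):
    """Hypothetical helper: read config.json and fail early if a required key is missing."""
    with open(path) as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("config.json is missing: " + ", ".join(missing))
    return config


# Placeholder values for illustration only
example = {
    "subscription_id": "<subscription-id>",
    "resource_group": "<resource-group>",
    "workspace_name": "<workspace-name>",
}

# Write the example to a temporary config.json and read it back
with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "config.json")
    with open(config_path, "w") as f:
        json.dump(example, f, indent=2)
    config = load_workspace_config(config_path)

print(config["workspace_name"])  # <workspace-name>
```

Failing on a missing key up front gives a clearer error than a later authentication or lookup failure inside the SDK.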

In [None]:
# Uncomment to upload the local dataset to the default datastore (only needed once)
# datastore.upload(src_dir='../../data_airline_updated', target_path='data_airline', overwrite=False, show_progress=True)

In [None]:
path_on_datastore = 'data_airline'
ds_data = datastore.path(path_on_datastore)
print(ds_data)

## Create AmlCompute

You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource.

As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.

The next cell looks up an existing cluster by name and, if none is found, creates a new `AmlCompute` cluster of `Standard_NC6s_v3` GPU VMs.

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your cluster
gpu_cluster_name = "gpu-cluster"

if gpu_cluster_name in ws.compute_targets:
    gpu_cluster = ws.compute_targets[gpu_cluster_name]
    if gpu_cluster and isinstance(gpu_cluster, AmlCompute):
        print('Found compute target. Will use {0}'.format(gpu_cluster_name))
else:
    print("Creating new cluster")
    # The vm_size parameter below can be changed to another RAPIDS-supported GPU VM type
    provisioning_config = AmlCompute.provisioning_configuration(vm_size='Standard_NC6s_v3', max_nodes=1)

    # Create the cluster
    gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, provisioning_config)

    # Optionally wait for a minimum number of nodes, with a timeout.
    # If no minimum node count is provided, the cluster's scale settings are used.
    gpu_cluster.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

# get_status() returns a detailed status for the current cluster
print(gpu_cluster.get_status().serialize())
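The cell above follows a get-or-create pattern: reuse a cluster if its name is already registered, otherwise provision a new one. A minimal pure-Python sketch of that control flow, assuming a plain dict stands in for `ws.compute_targets` and a hypothetical `FakeCluster` class stands in for both `AmlCompute` and the `ComputeTarget.create(...)` call:

```python
def get_or_create(targets, name, create_cluster, expected_type):
    """Return (target, created): reuse a registered target or create and register a new one."""
    existing = targets.get(name)
    if existing is not None:
        if not isinstance(existing, expected_type):
            # Another kind of compute already uses this name; fail instead of reusing it
            raise TypeError(f"{name!r} exists but is a {type(existing).__name__}")
        return existing, False
    created = create_cluster(name)
    targets[name] = created
    return created, True


class FakeCluster:
    """Hypothetical stand-in for an AmlCompute cluster."""

    def __init__(self, name):
        self.name = name


targets = {}  # stand-in for ws.compute_targets
first, created_first = get_or_create(targets, "gpu-cluster", FakeCluster, FakeCluster)
second, created_second = get_or_create(targets, "gpu-cluster", FakeCluster, FakeCluster)
print(created_first, created_second, first is second)  # True False True
```

Raising on a type mismatch is a design choice made in this sketch; the notebook cell simply keeps whatever object is registered under that name.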

## Train model on the remote compute

Now that the workspace, data, and compute target are set up, you can prepare the training code and run it on the remote compute.

Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on.

In [None]:
import os

project_folder = './train_rapids'
os.makedirs(project_folder, exist_ok=True)

### Prepare training script

Next, create your training script. The script logs its parameters and the highest accuracy the model achieves to the run; the accuracy is logged like this:

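```python
from azureml.core.run import Run

run = Run.get_context()  # handle to the current Azure ML run
# `accuracy` is computed earlier in the script; np.float was removed in NumPy 1.24, so use float()
run.log('Accuracy', float(accuracy))
```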

These run metrics become particularly important when tuning the model's hyperparameters in the "Tune model hyperparameters" section.
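The estimator created later passes `--data_dir`, `--n_estimators`, `--max_depth`, `--n_bins`, and `--max_features` to the script, so `train_rapids.py` has to parse them. The real script is not shown here; the following is a hypothetical sketch of its argument parsing with `argparse`, where the defaults are assumptions chosen to match the values in `script_params`:

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the script_params keys passed by the estimator
    parser = argparse.ArgumentParser(description="Train a RAPIDS model (sketch)")
    parser.add_argument("--data_dir", type=str, required=True, help="mounted dataset path")
    parser.add_argument("--n_estimators", type=int, default=100)
    parser.add_argument("--max_depth", type=int, default=8)
    parser.add_argument("--n_bins", type=int, default=8)
    parser.add_argument("--max_features", type=float, default=0.6)
    return parser


# Example command line; the data path is a placeholder
args = build_parser().parse_args(
    ["--data_dir", "/mnt/data_airline", "--n_estimators", "50", "--max_features", "0.5"]
)
print(args.n_estimators, args.max_depth, args.max_features)  # 50 8 0.5
```

Inside the script, each parsed value could then be logged next to the accuracy, for example `run.log('max_depth', args.max_depth)`.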

Once your script is ready, copy the training script `train_rapids.py` and the supporting file `rapids_csp_azure.py` into your project directory.

In [None]:
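# Notebooks have no __file__, so resolve the sibling `code` folder from the current working directory
code_dir = os.path.realpath(os.path.join(os.getcwd(), '..', 'code'))
rapids_script = os.path.join(code_dir, 'train_rapids.py')
azure_script = os.path.join(code_dir, 'rapids_csp_azure.py')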

In [None]:
import shutil

shutil.copy(rapids_script, project_folder)
shutil.copy(azure_script, project_folder)
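Before submitting, it can help to confirm that both files actually landed in `project_folder`, because the remote run only sees what is in the estimator's `source_directory`. A small sketch; `missing_files` is a hypothetical helper, and the demo uses a temporary directory rather than the real project folder:

```python
import os
import tempfile


def missing_files(folder, required):
    """Return the required file names that are not present in folder."""
    return [name for name in required if not os.path.isfile(os.path.join(folder, name))]


REQUIRED = ("train_rapids.py", "rapids_csp_azure.py")

with tempfile.TemporaryDirectory() as tmp:
    # Simulate a project folder where only the entry script was copied
    open(os.path.join(tmp, "train_rapids.py"), "w").close()
    result = missing_files(tmp, REQUIRED)

print(result)  # ['rapids_csp_azure.py']
```

In the notebook itself, `assert not missing_files(project_folder, REQUIRED)` would stop before a submission that is bound to fail.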

### Create an experiment

Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace.

In [None]:
from azureml.core import Experiment

experiment_name = 'train_rapids'
experiment = Experiment(ws, name=experiment_name)

### Create environment

In [None]:
from azureml.core import Environment

# Create the environment
rapids_env = Environment('rapids_env')


# Specify docker steps as a string. Alternatively, load the string from a file

dockerfile = """
FROM rapidsai/rapidsai:0.15-cuda10.2-runtime-ubuntu18.04-py3.7
RUN source activate rapids && \
pip install azureml-sdk==1.13.0 && \
pip install azureml-widgets
"""
# Older alternative base image (nightly build): FROM rapidsai/rapidsai-nightly:0.13-cuda10.0-runtime-ubuntu18.04-py3.7

# Set base image to None, because the image is defined by dockerfile
rapids_env.docker.base_image = None
rapids_env.docker.base_dockerfile = dockerfile

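# The image already provides the Python packages, so Azure ML should not build its own conda environment
rapids_env.python.user_managed_dependencies = True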
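One subtlety in the `dockerfile` string: it is an ordinary (non-raw) Python string, so each backslash at the end of a line is consumed by Python as a line continuation before Docker ever sees it. The `RUN` instruction therefore reaches Docker as a single line, which Docker accepts. A small check of that behavior, reusing the same string:

```python
# Same string as in the environment cell
dockerfile = """
FROM rapidsai/rapidsai:0.15-cuda10.2-runtime-ubuntu18.04-py3.7
RUN source activate rapids && \
pip install azureml-sdk==1.13.0 && \
pip install azureml-widgets
"""

# Non-empty lines as Docker receives them
lines = [line for line in dockerfile.splitlines() if line.strip()]
for line in lines:
    print(line)
print(len(lines))  # 2
```

Prefixing the literal with `r` would instead keep the backslashes and leave the continuation to Docker; both forms produce the same `RUN` command.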

### Create a RAPIDS estimator

In [None]:
from azureml.train.estimator import Estimator

script_params = {
    '--data_dir': ds_data.as_mount(),
    '--n_estimators': 100,
    '--max_depth': 8,
    '--n_bins': 8,
    '--max_features': 0.6,
}

estimator = Estimator(source_directory=project_folder,
                      script_params=script_params,
                      compute_target=gpu_cluster,
                      entry_script='train_rapids.py',
                      environment_definition=rapids_env)
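The estimator turns `script_params` into command-line arguments for `entry_script`. A sketch of that conversion, assuming the dictionary is flattened into alternating flag/value tokens in insertion order (the SDK's exact handling, for example of the mounted data reference, may differ); the mount path below is a placeholder standing in for `ds_data.as_mount()`:

```python
def to_argv(script_params):
    """Flatten {'--flag': value} into ['--flag', 'value', ...] (sketch)."""
    argv = []
    for flag, value in script_params.items():
        argv.extend([flag, str(value)])
    return argv


script_params = {
    "--data_dir": "/mnt/azureml/data_airline",  # placeholder for ds_data.as_mount()
    "--n_estimators": 100,
    "--max_depth": 8,
    "--n_bins": 8,
    "--max_features": 0.6,
}

# Approximate command line the remote node would run
print(" ".join(["python", "train_rapids.py"] + to_argv(script_params)))
```

Seeing the flattened command makes it easy to check that every key matches an argument the training script actually parses.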

### Submit job

Run your experiment by submitting your estimator object. Note that this call is asynchronous.

In [None]:
run = experiment.submit(estimator)
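Because `experiment.submit` returns immediately, later cells must wait on the run themselves; the SDK's `run.wait_for_completion(show_output=True)` blocks until the run finishes. The pure-Python sketch below illustrates the polling idea behind such a wait, assuming a hypothetical `get_status` callable and the terminal status names `Completed`, `Failed`, and `Canceled`:

```python
import time

TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_for_terminal_status(get_status, poll_seconds=10.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll get_status() until it returns a terminal state or the timeout elapses."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"run still {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Simulated status sequence; a no-op sleep keeps the demo instant
statuses = iter(["Queued", "Preparing", "Running", "Running", "Completed"])
final = wait_for_terminal_status(lambda: next(statuses), sleep=lambda s: None)
print(final)  # Completed
```

Injecting `sleep` keeps the function testable without real delays; in practice the SDK's own wait call is the simpler option.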

## Monitor your run

Monitor the progress of the run with a Jupyter widget. The widget is asynchronous and provides live updates every 10-15 seconds until the job completes.

In [None]:
from azureml.widgets import RunDetails

RunDetails(run).show()

In [None]:
# Uncomment to cancel the run if needed
# run.cancel()