# Azure ML Reinforcement Learning Sample

Azure ML Reinforcement Learning is a managed service for running distributed reinforcement learning (RL) simulation and training using the Ray framework.

This example uses Ray RLlib to train a Pong-playing agent.

### Import libraries

In [None]:
# Azure ML Core imports
import azureml.core
from azureml.core import Workspace
from azureml.core import Experiment
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.runconfig import EnvironmentDefinition
from azureml.widgets import RunDetails
from azureml.tensorboard import Tensorboard

# Azure ML Reinforcement Learning imports
from azureml.contrib.train.rl import ReinforcementLearningEstimator, Ray
from azureml.contrib.train.rl import WorkerConfiguration

# check core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)

### Get Azure ML Workspace

Get the Azure ML workspace you created in the 'Workspace Setup' notebook.

Currently, the workspace must be in one of the following regions: `eastus`, `westeurope`, or `westus2`.

In [None]:
ws = Workspace.from_config()
ws.get_details()
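The region constraint above can be checked before any compute is provisioned. The following is a minimal sketch, assuming the region list stated in this notebook is still current; `is_supported_region` is a hypothetical helper, not part of the Azure ML SDK.

```python
# Regions listed in this notebook; the supported set may change over time.
SUPPORTED_REGIONS = {"eastus", "westeurope", "westus2"}


def is_supported_region(location: str) -> bool:
    """Return True if the region name appears in SUPPORTED_REGIONS.

    Case and spaces are normalized so that "East US" matches "eastus".
    """
    return location.replace(" ", "").lower() in SUPPORTED_REGIONS


# Example with plain strings (no workspace required):
for region in ("westus2", "northeurope"):
    print(region, is_supported_region(region))
```

With a loaded workspace, the call would be `is_supported_region(ws.location)`.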

### Specify the name of your vnet

The resource group you use must contain a virtual network (vnet). Specify the name of the vnet here.

In [None]:
vnet_name = 'your_vnet'

### Define head computing cluster

In this example, we show how to set up separate compute clusters for the Ray head and the Ray workers. First, we define the head cluster.

In [None]:
# choose a name for the Ray head cluster
compute_name = 'head-gpu'
compute_min_nodes = 0
compute_max_nodes = 2

# This example uses a GPU VM. To use a CPU VM instead, set the size to STANDARD_D2_V2
vm_size = 'STANDARD_NC6'

if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
    if compute_target and type(compute_target) is AmlCompute:
        print(f'Found existing compute target: {compute_name}')
else:
    print('creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(
        vm_size=vm_size,
        min_nodes=compute_min_nodes,
        max_nodes=compute_max_nodes,
        vnet_resourcegroup_name=ws.resource_group,
        vnet_name=vnet_name,
        subnet_name='default')

    # create the cluster
    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)
    
    # You can poll for a minimum number of nodes and for a specific timeout.
    # If no min_node_count is provided, the cluster's scale settings are used.
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

    # For a more detailed view of current AmlCompute status, use get_status()
    print(compute_target.get_status().serialize())

### Define worker computing cluster

Now create a compute cluster for the Ray workers. These are the virtual machines that run worker jobs. Ray can distribute multiple worker tasks on each worker virtual machine.

In [None]:
# choose a name for your Ray worker cluster
worker_compute_name = 'worker-cpu'
worker_compute_min_nodes = 0 
worker_compute_max_nodes = 5

# This example uses a CPU VM. To use a GPU VM instead, set the size to STANDARD_NC6
worker_vm_size = 'STANDARD_D2_V2'

# Create the compute target if it hasn't been created already
if worker_compute_name in ws.compute_targets:
    worker_compute_target = ws.compute_targets[worker_compute_name]
    if worker_compute_target and type(worker_compute_target) is AmlCompute:
        print(f'Found existing compute target: {worker_compute_name}')
else:
    print('creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(
        vm_size=worker_vm_size,
        min_nodes=worker_compute_min_nodes,
        max_nodes=worker_compute_max_nodes,
        vnet_resourcegroup_name=ws.resource_group,
        vnet_name=vnet_name,
        subnet_name='default')

    # create the cluster
    worker_compute_target = ComputeTarget.create(ws, worker_compute_name, provisioning_config)
    
    # You can poll for a minimum number of nodes and for a specific timeout.
    # If no min_node_count is provided, the cluster's scale settings are used.
    worker_compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

    # For a more detailed view of current AmlCompute status, use get_status()
    print(worker_compute_target.get_status().serialize())
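The head and worker cells above repeat the same "reuse if present, otherwise create" logic. A minimal sketch of factoring that out follows. `get_or_create_compute` is a hypothetical helper written for this illustration; it takes a mapping such as `ws.compute_targets` and a zero-argument callable that does the provisioning, so it does not import the SDK itself.

```python
from typing import Any, Callable, Mapping


def get_or_create_compute(
    existing: Mapping[str, Any],
    name: str,
    create: Callable[[], Any],
) -> Any:
    """Return existing[name] if present; otherwise call create() and return its result."""
    if name in existing:
        print(f"Found existing compute target: {name}")
        return existing[name]
    print(f"Creating a new compute target: {name}")
    return create()


# Hypothetical usage with the names defined in the cells above:
# worker_compute_target = get_or_create_compute(
#     ws.compute_targets,
#     worker_compute_name,
#     lambda: ComputeTarget.create(ws, worker_compute_name, provisioning_config),
# )
```

The helper does not wait for provisioning to finish; the caller would still call `wait_for_completion` on a newly created target, as the cells above do.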

### Create Azure ML Experiment

In [None]:
experiment_name='rllib-pong'

exp = Experiment(workspace=ws, name=experiment_name)

### Create Reinforcement Learning Estimator

The `ReinforcementLearningEstimator` is used to submit a job to Azure Machine Learning to start the Ray experiment run.

You define a `WorkerConfiguration` to specify your worker compute cluster, the number of virtual machines you want to use, whether to use GPU, and any dependencies needed by the workers.

In our case, we define the same pip packages as dependencies for both head and worker nodes. For this problem, the game simulations run directly on the worker compute.

In [None]:
# The pip packages we will use for both head and worker
pip_packages={
}

# Specify the Ray worker configuration
worker_conf = WorkerConfiguration(
    
    # Azure ML compute cluster to run Ray workers
    compute_target=worker_compute_target, 
    
    # Number of workers
    node_count = 4,
    
    # GPU
    use_gpu=False, 
    
    # PIP packages to use
    pip_packages=pip_packages
)

estimator = ReinforcementLearningEstimator(
    
    # Location of source files
    source_directory='files',
    
    # Python script file
    entry_script="rllib-pong.py",
    
    # Parameters to pass to the script file (none in this example)
    script_params={},
    
    # The Azure ML compute target set up for Ray head nodes
    compute_target=compute_target,
    
    # Pip packages
    pip_packages=pip_packages,
    
    # GPU usage
    use_gpu=True,
    
    # RL framework. Currently must be Ray.
    rl_framework=Ray(version="0.7.2"),
    
    # Ray worker configuration defined above.
    worker_configuration=worker_conf,
    
    # Simulator configuration (future use)
    simulator_configuration=None,
    
    # How long to wait for job to start
    job_queue_timeout=3600,
    
    # How long to wait for whole cluster to start
    cluster_coordination_timeout_seconds=3600,
    
    # Maximum time for the whole Ray job to run
    # This will cut off the run after an hour
    max_run_duration_seconds=3600
)
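The worker configuration above requests `node_count = 4` from a cluster whose `max_nodes` is 5. A request for more nodes than the cluster can scale to cannot be satisfied, so it can help to compare the two values before submitting. This is a minimal sketch; `check_node_count` is a hypothetical helper, not an SDK function.

```python
def check_node_count(node_count: int, max_nodes: int) -> None:
    """Raise ValueError if the requested node count cannot fit in the cluster."""
    if node_count < 1:
        raise ValueError(f"node_count must be at least 1, got {node_count}")
    if node_count > max_nodes:
        raise ValueError(
            f"node_count ({node_count}) exceeds the cluster's max_nodes ({max_nodes})"
        )


# Values used in this notebook: 4 worker nodes on a cluster of up to 5.
check_node_count(node_count=4, max_nodes=5)
```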

### Submit the estimator to start experiment

In [None]:
run = exp.submit(config=estimator)

### Monitor experiment

Azure ML provides a Jupyter widget to show the real-time status of the experiment.

In [None]:
RunDetails(run).show()
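Outside a notebook, where the widget is not available, the run status can be polled instead. The sketch below assumes the terminal status strings `Completed`, `Failed`, and `Canceled`; `wait_until_finished` is a hypothetical helper that accepts any zero-argument callable returning a status string, such as `run.get_status`.

```python
import time
from typing import Callable

# Status strings assumed to mean the run has ended.
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_until_finished(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
) -> str:
    """Poll get_status() until it returns a terminal state, then return that state.

    Raises TimeoutError if no terminal state is seen before the deadline.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"run still in state {status!r} after {timeout_seconds} seconds"
            )
        time.sleep(poll_seconds)


# Hypothetical usage with the run submitted above:
# final_state = wait_until_finished(run.get_status)
```

The SDK's own `run.wait_for_completion(show_output=True)` blocks in a similar way and is usually the simpler choice; the helper above is useful mainly when a custom timeout or polling interval is needed.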

### Stop experiment

To cancel the run, call `run.cancel()`.

If you want to cancel the run from the Azure Workspace portal, cancel one of the child runs.  
Canceling a `ReinforcementLearningEstimator` run in the portal is not currently supported.

In [None]:
# run.cancel()

### TensorBoard

You can also monitor details of your experiment with TensorBoard.

In [None]:
# Wait until the Ray worker run is in the Running state before starting TensorBoard.
tb = Tensorboard([list(run.get_children())[0]])
tb.start()
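The cell above assumes the first child run already exists and is running. A minimal sketch of waiting for that condition follows. `wait_for_running_child` is a hypothetical helper; it relies only on `get_children()` on the parent run and `get_status()` on each child, and, like the cell above, it takes the first child returned.

```python
import time


def wait_for_running_child(run, poll_seconds=10.0, timeout_seconds=1800.0):
    """Return the first child of `run` once its status is "Running".

    Raises TimeoutError if that does not happen before the deadline.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        children = list(run.get_children())
        # The child run may not exist yet right after submission.
        if children and children[0].get_status() == "Running":
            return children[0]
        if time.monotonic() >= deadline:
            raise TimeoutError("first child run did not reach the Running state in time")
        time.sleep(poll_seconds)


# Hypothetical usage in place of the direct indexing above:
# tb = Tensorboard([wait_for_running_child(run)])
# tb.start()
```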

In [None]:
# Stop Tensorboard
tb.stop()