Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Multi-step forecasting using a simple encoder-decoder

The hyperparameters of the recurrent neural network are tuned using Hyperdrive, a feature of the Azure Machine Learning (Azure ML) service. To run this notebook, follow the instructions in the [configuration notebook](../configuration.ipynb) and provision an Azure ML workspace.

The running time depends on the size of your Azure ML cluster. When running with a cluster of xx VMs of NC6 size, the experiment finishes within YY hours.

To clean up all provisioned Azure services, delete the resource group either from Azure Portal or by running:
```bash
az group delete -n <resource-group-name>
```


In [None]:
import azureml
from azureml.core import Workspace

# check core SDK version number
print("You are using Azure ML SDK Version: ", azureml.core.VERSION)

## Initialize workspace
Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the configuration notebook. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`.
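
For reference, `config.json` is a small JSON file holding three identifiers. The sketch below writes such a file to a temporary directory and reads it back using only the standard library. The values are placeholders, and the helper `load_workspace_config` is a hypothetical name used for illustration; the configuration notebook normally produces the real file for you.

```python
import json
import tempfile
from pathlib import Path

# Placeholder values; substitute the identifiers of your own workspace.
config = {
    "subscription_id": "<subscription-id>",
    "resource_group": "<resource-group-name>",
    "workspace_name": "<workspace-name>",
}

REQUIRED_KEYS = {"subscription_id", "resource_group", "workspace_name"}


def load_workspace_config(path):
    """Read a config.json-style file and check that the expected keys exist."""
    with open(path) as f:
        loaded = json.load(f)
    missing = REQUIRED_KEYS - loaded.keys()
    if missing:
        raise KeyError(f"config.json is missing keys: {sorted(missing)}")
    return loaded


# Write to a temporary directory so the sketch does not touch a real config.
config_path = Path(tempfile.mkdtemp()) / "config.json"
config_path.write_text(json.dumps(config, indent=4))

loaded_config = load_workspace_config(config_path)
print(loaded_config["workspace_name"])
```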

In [None]:
try:
    ws = Workspace.from_config()
    print(
        "Workspace name: " + ws.name,
        "Azure region: " + ws.location,
        "Subscription id: " + ws.subscription_id,
        "Resource group: " + ws.resource_group,
        sep="\n",
    )
except Exception:
    print(
        "Workspace not accessible. Change your parameters or create a new workspace using configuration.ipynb notebook."
    )

## Create an Azure ML experiment
Let's create an experiment named `rnn-multistep` and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure.

In [None]:
from azureml.core import Experiment

exp = Experiment(workspace=ws, name='rnn-multistep')

## Upload data and scripts to default datastore 
A [datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) is a place where data can be stored and then made accessible to a Run, either by mounting it on the compute target or by copying the data to it. A datastore can be backed by either Azure Blob Storage or an Azure File Share (ADLS will be supported in the future). For simple data handling, each workspace provides a default datastore that can be used in case the data is not already in Blob Storage or a File Share.

In [None]:
ds = ws.get_default_datastore()

In this next step, we upload the training and test sets to the workspace's default datastore, which we will later mount on an AmlCompute cluster for training.

In [None]:
ds.upload_files(
    files=["../../data/GEFCom2014.zip"],
    target_path="energy",
    overwrite=True,
    show_progress=True,
)
ds.upload_files(
    files=["../../common/extract_data.py", "../../common/utils.py"],
    target_path="common",
    overwrite=True,
    show_progress=True,
)

## Create or Attach existing AmlCompute
You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.

If no cluster with the given name is found, we create a new `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:
1. create the configuration (this step is local and only takes a second)
2. create the cluster (this step will take about **20 seconds**)
3. provision the VMs to bring the cluster to its initial size (set by `min_nodes`, which is 0 in the cell below). This step can take about **3-5 minutes** and produces only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# choose a name for your cluster
cluster_name = "gpucluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print("Found existing compute target")
except ComputeTargetException:
    print("Creating a new compute target...")
    compute_config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_NC6", min_nodes=0, max_nodes=4
    )
    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    # can poll for a minimum number of nodes and for a specific timeout.
    # if no min node count is provided it uses the scale settings for the cluster
    compute_target.wait_for_completion(
        show_output=True, min_node_count=None, timeout_in_minutes=20
    )

# use get_status() to get a detailed status for the current cluster.
print(compute_target.get_status().serialize())

### Azure ML concepts  
Please note the following three things about the training script:
1. The script accepts arguments using the argparse package. In this case there is an argument `--datadir` which specifies the file system folder in which the script can find the energy data
```
    parser = argparse.ArgumentParser()
    parser.add_argument('--datadir')
```
2. The script accesses the Azure ML `Run` object by executing `run = Run.get_context()`. Further down, the script uses `run` to report the loss and accuracy at the end of each epoch via a callback.
```
    run.log('Loss', log['loss'])
    run.log('Accuracy', log['acc'])
```
3. When running the script on Azure ML, you can write files out to a folder `./outputs` that is relative to the root directory. This folder is specially tracked by Azure ML in the sense that any files written to that folder during script execution on the remote target will be picked up by Run History; these files (known as artifacts) will be available as part of the run history record.
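
The first and third points can be sketched without any Azure dependency. The parser below mirrors the flag names passed through `script_params` later in this notebook; the argument types and defaults are assumptions, not taken from the actual training script, and a temporary directory stands in for the run's working directory.

```python
import argparse
import os
import tempfile


def build_parser():
    """Argument parser mirroring the flags passed to the training script."""
    parser = argparse.ArgumentParser()
    # Mount points of the datastore folders on the compute target.
    parser.add_argument("--datadir", type=str)
    parser.add_argument("--scriptdir", type=str)
    # Hyperparameters; the types and defaults here are assumptions.
    parser.add_argument("--latent-dim-1", type=int, default=5)
    parser.add_argument("--latent-dim-2", type=int, default=0)
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--T", type=int, default=72)
    parser.add_argument("--learning-rate", type=float, default=0.01)
    parser.add_argument("--alpha", type=float, default=0.0)
    return parser


# Example command line; the path is a placeholder, not a real mount point.
args = build_parser().parse_args(
    ["--datadir", "/tmp/energy", "--latent-dim-1", "10", "--learning-rate", "0.001"]
)

# argparse turns dashes into underscores: --latent-dim-1 becomes args.latent_dim_1.
print(args.latent_dim_1, args.T, args.learning_rate)

# Files written under ./outputs are collected as run artifacts on Azure ML.
# A temporary directory stands in for the run's working directory here.
output_dir = os.path.join(tempfile.mkdtemp(), "outputs")
os.makedirs(output_dir, exist_ok=True)
with open(os.path.join(output_dir, "params.txt"), "w") as f:
    f.write(str(vars(args)))
```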

## Create TensorFlow estimator & add Keras
Next, we construct an `azureml.train.dnn.TensorFlow` estimator object, use `gpucluster` as the compute target, and pass the mount point of the datastore to the training code as a parameter.
The TensorFlow estimator provides a simple way of launching a TensorFlow training job on a compute target. It automatically provides a Docker image that has TensorFlow installed. In this case, we add the `keras` package (for the Keras framework) and the additional packages required for running the training script on the compute target.

We also specify `entry_script` as one of the parameters. This is the script that is executed on the compute target. Please examine the script for a better understanding of the training process and the metrics that are logged. You'll notice that each training run executes `N_EXPERIMENTS = 5` experiments, each with a different weight initialization. We noticed considerable variation in the obtained metrics (validation and test MAPE) across the experiments, so we use the average validation MAPE over all `N_EXPERIMENTS` as the evaluation metric.
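
To make the evaluation metric concrete, here is a minimal sketch of averaging validation MAPE over several experiments. The function names and the numbers are illustrative assumptions and are not taken from the training script; MAPE is expressed here as a fraction, whereas the script may report it differently.

```python
from statistics import mean


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.05 means 5%)."""
    return mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


def mean_validation_mape(actual, predictions_per_experiment):
    """Average the MAPE of each experiment (one per weight initialization)."""
    return mean(mape(actual, preds) for preds in predictions_per_experiment)


# Made-up validation targets and predictions from two experiments.
actual = [100.0, 200.0, 400.0]
predictions_per_experiment = [
    [110.0, 190.0, 400.0],
    [90.0, 220.0, 380.0],
]

score = mean_validation_mape(actual, predictions_per_experiment)
print(round(score, 4))
```

Averaging over initializations trades extra training time for a metric that is less sensitive to a single lucky or unlucky run, which matters when Hyperdrive compares configurations.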

In [None]:
from azureml.train.dnn import TensorFlow

script_folder = './rnn_multistep_aml'

script_params = {
    "--datadir": ds.path("energy").as_mount(),
    "--scriptdir": ds.path("common").as_mount(),
    "--latent-dim-1": 5,
    "--latent-dim-2": 0,
    "--batch-size": 32,
    "--T": 72,
    "--learning-rate": 0.01,
    "--alpha": 0,
}


est = TensorFlow(
    source_directory=script_folder,
    script_params=script_params,
    compute_target=compute_target,
    conda_packages=["pandas", "numpy"],
    pip_packages=["keras", "matplotlib", "scikit-learn", "xlrd", "azureml-sdk"],
    entry_script="RNN_multi_step_vector_output.py",
    use_gpu=True,
)

## Submit job to run
Submit the estimator to the Azure ML experiment to kick off the execution.

In [None]:
run = exp.submit(est)

In [None]:
from azureml.widgets import RunDetails
RunDetails(run).show()

In [None]:
run.get_metrics()

In [None]:
run.get_file_names()

## Hyperparameter tuning
We have trained the model with one set of hyperparameters. Now let's see how we can do hyperparameter tuning by launching multiple runs on the cluster. First, let's define the parameter space using random sampling.
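
As a rough illustration of what random sampling over discrete choices means, the standard-library sketch below draws each hyperparameter independently from its list of candidate values. It is not Hyperdrive's implementation; the helper `sample_configuration` and the fixed seed are assumptions made for the example.

```python
import random

# Candidate values for each hyperparameter swept in this notebook.
search_space = {
    "--latent-dim-1": [5, 10, 15],
    "--latent-dim-2": [0, 5, 10],
    "--batch-size": [8, 16, 32],
    "--T": [72, 168, 336],
    "--learning-rate": [0.01, 0.001, 0.0001],
    "--alpha": [0.1, 0.001, 0],
}


def sample_configuration(space, rng):
    """Pick one value per hyperparameter, uniformly and independently."""
    return {name: rng.choice(values) for name, values in space.items()}


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_configuration(search_space, rng) for _ in range(20)]

# Size of the full grid, for comparison with the number of sampled runs.
n_combinations = 1
for values in search_space.values():
    n_combinations *= len(values)

print(n_combinations, len(samples))
```

With six hyperparameters of three values each, the full grid has 729 combinations, so a budget of 20 runs covers only a small fraction of it.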

In [None]:
from azureml.train.hyperdrive import (
    RandomParameterSampling,
    HyperDriveConfig,
    PrimaryMetricGoal,
    choice,
)

ps = RandomParameterSampling(
    {
        "--latent-dim-1": choice(5,10,15),
        "--latent-dim-2": choice(0,5,10),
        "--batch-size": choice(8,16,32),
        "--T": choice(72,168,336),
        "--learning-rate": choice(0.01, 0.001, 0.0001),
        "--alpha": choice(0.1,0.001,0), 
    }
)


Next, we create a new estimator without the above parameters, since they will be passed in later by the Hyperdrive configuration. Note that we still need to keep the `datadir` and `scriptdir` parameters, since they are not hyperparameters that we sweep.

In [None]:
from azureml.train.dnn import TensorFlow

est = TensorFlow(
    source_directory=script_folder,
    script_params={
        "--datadir": ds.path("energy").as_mount(),
        "--scriptdir": ds.path("common").as_mount(),
    },
    compute_target=compute_target,
    conda_packages=["pandas", "numpy"],
    pip_packages=["keras", "matplotlib", "scikit-learn", "xlrd", "azureml-sdk"],
    entry_script="RNN_multi_step_vector_output.py",
    use_gpu=True,
)

Now we are ready to create the Hyperdrive run configuration object and specify the primary metric `meanValidationMAPE` that is recorded in the training runs. We tell the service that we want to minimize this value. We also set the total number of runs to 20 and the maximum number of concurrent runs to 4, which is the same as the maximum number of nodes in our compute cluster.

In [None]:
hdc = HyperDriveConfig(
    estimator=est,
    hyperparameter_sampling=ps,
    primary_metric_name="meanValidationMAPE",
    primary_metric_goal=PrimaryMetricGoal.MINIMIZE,
    max_total_runs=20,
    max_concurrent_runs=4,
    max_duration_minutes=30,
)

Finally, let's launch the hyperparameter tuning job.

In [None]:
hdr = exp.submit(config=hdc)

We can use a run history widget to show the progress. Be patient as this might take a while to complete.

In [None]:
from azureml.widgets import RunDetails

RunDetails(hdr).show()

## Find and register the best model
When all the jobs finish, we can find the run that has the lowest mean validation MAPE.
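
Conceptually, picking the best run under a minimization goal amounts to the following sketch. The run names and metric values are hypothetical placeholders, not results from this experiment, and `select_best_run` is an illustrative helper rather than part of the Azure ML SDK. Runs that never logged the metric (for example, because they were cancelled) are skipped.

```python
# Hypothetical metrics for four child runs; not results from this experiment.
child_run_metrics = {
    "run_a": {"meanValidationMAPE": 0.081},
    "run_b": {"meanValidationMAPE": 0.064},
    "run_c": {},  # a run that did not log the primary metric
    "run_d": {"meanValidationMAPE": 0.072},
}


def select_best_run(metrics_by_run, metric_name):
    """Return the id of the run with the lowest logged value of metric_name."""
    scored = {
        run_id: metrics[metric_name]
        for run_id, metrics in metrics_by_run.items()
        if metric_name in metrics
    }
    if not scored:
        raise ValueError(f"no run logged {metric_name}")
    return min(scored, key=scored.get)


best_run_id = select_best_run(child_run_metrics, "meanValidationMAPE")
print(best_run_id)
```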

In [None]:
best_run = hdr.get_best_run_by_primary_metric()
print(best_run.get_details()['runDefinition']['arguments'])

In [None]:
best_run.get_metrics()

Now let's list the model files uploaded during the run.

In [None]:
print(best_run.get_file_names())

We can then register the folder (and all files in it) as a model named `rnn-multistep-best` under the workspace for deployment.

In [None]:
model = best_run.register_model(model_name='rnn-multistep-best', model_path='outputs/model')