# Train the Best Model
In this notebook, we recover the HyperDrive run that was created in the previous notebook, find the best child run discovered by the HyperDrive search, train a model with that run's hyperparameters, and register the trained model.

The steps in this notebook are
- [import libraries](#import),
- [read in the Azure ML workspace](#workspace),
- [recover the HyperDrive run](#recover),
- [use the best hyperparameters found to train a model](#results), and
- [register the trained model](#register).

## Imports  <a id='import'></a>

In [None]:
import os
import pandas as pd
from azureml.core import Workspace, Experiment
from azureml.train.hyperdrive import HyperDriveRun
from azureml.train.estimator import Estimator
from azureml.widgets import RunDetails
import azureml.core
from get_auth import get_auth
print('azureml.core.VERSION={}'.format(azureml.core.VERSION))

## Read in the Azure ML workspace  <a id='workspace'></a>
Read in the workspace created in a previous notebook.

In [None]:
auth = get_auth()
ws = Workspace.from_config(auth=auth)
ws_details = ws.get_details()
print('Name:\t\t{}\nLocation:\t{}'
      .format(ws_details['name'],
              ws_details['location']))

## Recover the run  <a id='recover'></a>
Get the experiment that ran the search.

In [None]:
exp = Experiment(workspace=ws, name='hypetuning')

Get the ID of the HyperDrive run created in the last notebook. That ID was printed when the run was submitted, and we also saved it in a file. You can also find the ID in the Azure portal on your experiment's page. To see it, you may need to add a `RunId` column to the experiment's table of runs.

In [None]:
run_id_path = "run_id.txt"
with open(run_id_path, "r") as fp:
    run_id = fp.read()
run_id
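If the file was written with a trailing newline, the raw contents would not match the run ID exactly. The following is a minimal sketch of a more defensive read, assuming the file holds nothing but the ID. The helper name `read_run_id` and the demo ID are hypothetical and not part of this notebook.

```python
import os
import tempfile


def read_run_id(path):
    """Return the run ID stored in `path`, without surrounding whitespace."""
    with open(path, 'r') as fp:
        run_id = fp.read().strip()
    if not run_id:
        raise ValueError('No run ID found in {}'.format(path))
    return run_id


# Demonstrate with a temporary file holding a made-up ID and a trailing newline.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'run_id.txt')
    with open(demo_path, 'w') as fp:
        fp.write('example_run_id_123\n')
    demo_run_id = read_run_id(demo_path)

print(demo_run_id)
```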

Use the ID of the HyperDrive run to get a handle to it.

In [None]:
run = HyperDriveRun(exp, run_id)
run

## Use the best hyperparameters to train a model <a id='results'></a>
We can automatically select the best run according to the search's primary metric.

In [None]:
best_run = run.get_best_run_by_primary_metric()
if best_run is None:
    raise Exception("No best run was found")
best_run

Here is the best run's hyperparameter set.

In [None]:
parameter_values = best_run.get_details()['runDefinition']['arguments']
best_parameters = dict(zip(parameter_values[::2], parameter_values[1::2]))
pd.Series(best_parameters, name='Value').to_frame()
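The slicing above assumes that the argument list alternates between a flag and its value. The following sketch shows that pairing on a made-up list. The flag names and values are illustrative only, and `pair_arguments` is a hypothetical helper.

```python
# Hypothetical argument list in the alternating flag/value layout
# that the slicing relies on.
example_arguments = ['--learning-rate', '0.1', '--estimators', '100']


def pair_arguments(arguments):
    """Pair each flag (even index) with the value that follows it (odd index)."""
    if len(arguments) % 2 != 0:
        raise ValueError('Expected alternating flag/value pairs.')
    return dict(zip(arguments[::2], arguments[1::2]))


example_parameters = pair_arguments(example_arguments)
print(example_parameters)
```

In this sketch the values are strings, so any numeric use needs an explicit conversion such as `int(...)`.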

We can use these parameters to train and save the best model. We will train with eight times the number of estimators used by the best run.

In [None]:
estimators = 8 * int(best_parameters['--estimators'])
estimators

Create a new set of parameters to train the best model.

In [None]:
ds = ws.get_default_datastore()
model_parameters = best_parameters.copy()
model_parameters['--data-folder'] = ds.as_mount()
model_parameters['--estimators'] = estimators
model_parameters['--save'] = 'FAQ_ranker'
pd.Series(model_parameters, name='Value').to_frame()
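The copy-and-override pattern used here can be sketched with plain dictionaries. The values below are made up, and a string path stands in for the datastore mount. Copying first leaves the search result unchanged.

```python
# Made-up values standing in for the best run's parameters.
example_best = {'--learning-rate': '0.1', '--estimators': '100'}

# Copy first so the original search result is not modified.
example_model = example_best.copy()
example_model['--estimators'] = 8 * int(example_best['--estimators'])
# Placeholder path; the notebook uses the datastore mount here instead.
example_model['--data-folder'] = '/hypothetical/mount/point'
example_model['--save'] = 'FAQ_ranker'

print(example_best['--estimators'], example_model['--estimators'])
```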

Get the compute target.

In [None]:
cluster_name = 'hypetuning'
compute_target = ws.compute_targets[cluster_name]

Train and save the best model.

In [None]:
model_est = Estimator(source_directory=os.path.join('.', 'scripts'),
                      entry_script='TrainClassifier.py',
                      script_params=model_parameters,
                      compute_target=compute_target,
                      conda_packages=['pandas==0.23.4',
                                      'scikit-learn==0.21.3',
                                      'lightgbm==2.2.1'])
model_run = exp.submit(model_est)
model_run

Wait for the model to be created and saved. This step can take up to two hours.

In [None]:
%%time

model_run_status = model_run.wait_for_completion(wait_post_processing=True)
print(model_run_status['status'])
if model_run_status['status'] not in ['Completed', 'Finalizing']:
    raise Exception('The run did not successfully complete.')
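The status check can also be factored into a small helper that reports the status it rejected. This is only a sketch: `ensure_run_succeeded` is a hypothetical name, and the dictionaries below stand in for the details returned by `wait_for_completion`.

```python
def ensure_run_succeeded(run_details, accepted=('Completed', 'Finalizing')):
    """Return the run status, or raise if it is not an accepted one."""
    status = run_details.get('status')
    if status not in accepted:
        raise RuntimeError('The run ended with status {!r}.'.format(status))
    return status


# Stand-in for the dictionary returned when a run finishes.
example_details = {'status': 'Completed'}
checked_status = ensure_run_succeeded(example_details)
print(checked_status)
```

Including the rejected status in the error message makes a failed or canceled run easier to diagnose from the notebook output.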

## Register the best model <a id='register'></a>

In [None]:
model = model_run.register_model(
    model_name='FAQ_ranker',
    model_path=os.path.join('outputs', 'FAQ_ranker.pkl'))
print(model.name, model.version, sep='\t')

The [next notebook](06_Test_Best_Model.ipynb) applies the best model to the test data.