Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Train Locally
In this notebook, you will perform the following using Azure Machine Learning.
* Load workspace.
* Configure & execute a local run in a user-managed Python environment.
* Configure & execute a local run in a system-managed Python environment.
* Configure & execute a local run in a Docker environment.
* Register model for operationalization.

In [None]:
import os

from azure_utils.machine_learning.utils import get_workspace_from_config
from azureml.core import Experiment
from azureml.core import ScriptRunConfig
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import RunConfiguration

## Initialize Model Hyperparameters

This notebook trains a model with a script built on 
[lightgbm](https://lightgbm.readthedocs.io/en/latest/Python-API.html#scikit-learn-api). 
Here we set the number of estimators. The value is a string because it is passed to the script as a command-line argument.

In [None]:
num_estimators = "10"

## Initialize Workspace

Initialize a workspace object from a persisted configuration file.

In [None]:
ws = get_workspace_from_config()
print(ws.name, ws.resource_group, ws.location, sep="\n")

## Create An Experiment
An **Experiment** is a logical container in an Azure ML Workspace. It hosts run records, which can include run metrics 
and output artifacts from your experiments.

In [None]:
experiment_name = "mlaks-train-on-local"
exp = Experiment(workspace=ws, name=experiment_name)

## Configure & Run

For demonstration purposes, this section shows three ways to train your model locally through the Azure ML SDK. 
A single run is sufficient to register the model.


### User-managed environment
Below, we use a user-managed run, which means you are responsible for ensuring that all the necessary packages are 
available in the Python environment you choose to run the script. We will use the environment created for this 
tutorial, which has the Azure ML SDK and other dependencies installed.

In [None]:
# Edit run configuration properties on the fly.
run_config_user_managed = RunConfiguration()

run_config_user_managed.environment.python.user_managed_dependencies = True

# Choose the specific Python environment of this tutorial by pointing to the Python path
run_config_user_managed.environment.python.interpreter_path = (
    "/anaconda/envs/az-ml-realtime-score/bin/python"
)
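
Because a user-managed run installs nothing for you, it can be worth checking up front that the chosen interpreter can import what the training script needs. The following is a minimal sketch, meant to be run with the interpreter you plan to use. The `missing_packages` helper and the list of module names are illustrative assumptions, not part of the Azure ML SDK.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list of modules the training script imports; adjust as needed.
required = ["lightgbm", "sklearn", "pandas", "azureml"]
print("Missing packages:", missing_packages(required))
```

An empty list suggests the environment is ready; any names printed would need to be installed before submitting the run.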

#### Submit script to run in the user-managed environment
Note that the whole `scripts` folder is submitted for execution, including the `item_selector.py` and `label_rank.py` 
files. The model will be written to the `outputs` directory, a special directory whose contents are automatically 
uploaded to your workspace. 
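
As a minimal sketch of what this means for the training script, the snippet below serializes an object to `outputs/model.pkl`. The dictionary stands in for a trained model, the use of `pickle` is an assumption, and the file is written under a temporary directory so the example does not touch your project. The actual `create_model.py` may serialize its model differently.

```python
import os
import pickle
import tempfile


def save_model(model, output_dir="outputs", file_name="model.pkl"):
    """Serialize `model` into `output_dir` and return the file path."""
    os.makedirs(output_dir, exist_ok=True)
    model_path = os.path.join(output_dir, file_name)
    with open(model_path, "wb") as handle:
        pickle.dump(model, handle)
    return model_path


# Placeholder object standing in for a trained model.
placeholder_model = {"n_estimators": 10}

# Use a temporary working location for this demonstration only.
demo_root = tempfile.mkdtemp()
saved_path = save_model(placeholder_model, output_dir=os.path.join(demo_root, "outputs"))
print("Model written to", saved_path)
```

Inside a submitted run, the script would write to the relative `outputs` directory instead.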

In [None]:
scrpt = "create_model.py"
args = [
    "--inputs",
    os.path.abspath("./data_folder"),
    "--outputs",
    "outputs",
    "--estimators",
    num_estimators,
    "--match",
    "5",
]
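
Each entry in `args` reaches the script as a command-line argument, which is why `num_estimators` is defined as a string. A minimal sketch of how a script could parse these flags with `argparse` follows. The flag names come from the list above, while the types and help texts are assumptions about `create_model.py`, not its actual code.

```python
import argparse


def parse_args(argv):
    """Parse the training flags from a list of command-line strings."""
    parser = argparse.ArgumentParser(description="Training script arguments (sketch)")
    parser.add_argument("--inputs", type=str, help="Directory containing the data")
    parser.add_argument("--outputs", type=str, help="Directory for the model file")
    parser.add_argument("--estimators", type=int, help="Number of estimators")
    parser.add_argument("--match", type=int, help="Assumed here to be an integer")
    return parser.parse_args(argv)


# The input path is a placeholder for the absolute path built in the notebook.
example = parse_args(
    ["--inputs", "/tmp/data_folder", "--outputs", "outputs",
     "--estimators", "10", "--match", "5"]
)
print(type(example.estimators).__name__, example.estimators)
```

Declaring `type=int` converts the string at parse time, so the rest of the script can treat the value as a number.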

In [None]:
src = ScriptRunConfig(
    source_directory="./scripts",
    script=scrpt,
    arguments=args,
    run_config=run_config_user_managed,
)
#run = exp.submit(src)

#### Get run history details

In [None]:
#run

Block until the run finishes.

In [None]:
#run.wait_for_completion(show_output=True)

Let's check that the model is now available in your workspace.

In [None]:
#run.get_file_names()

Let's retrieve the accuracy of the model from run logs by querying the run metrics.

In [None]:
#run.get_metrics()

### System-managed environment
You can also ask the system to build a new conda environment and execute your scripts in it. The environment is built 
once and will be reused in subsequent executions as long as the conda dependencies remain unchanged. 

In [None]:
run_config_system_managed = RunConfiguration()
run_config_system_managed.environment.python.user_managed_dependencies = False
run_config_system_managed.auto_prepare_environment = True

Let's specify the conda and pip dependencies.

In [None]:
# Specify conda dependencies with scikit-learn and pandas
conda_pack = ["scikit-learn==0.19.1", "pandas==0.23.3"]
requirements = ["lightgbm==2.1.2", "azureml-defaults==1.0.57"]

In [None]:
cd = CondaDependencies.create(conda_packages=conda_pack, pip_packages=requirements)
run_config_system_managed.environment.python.conda_dependencies = cd
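
For orientation, the dependencies above correspond roughly to a conda environment specification like the one built below. This is a hand-written sketch of the usual conda environment file layout, not the exact output of `CondaDependencies`, which may add entries such as a Python version or channels. The environment name is a placeholder.

```python
import json

conda_pack = ["scikit-learn==0.19.1", "pandas==0.23.3"]
requirements = ["lightgbm==2.1.2", "azureml-defaults==1.0.57"]

# Conda packages are listed directly; pip packages go in a nested "pip" entry.
environment_spec = {
    "name": "example_environment",  # placeholder name
    "dependencies": conda_pack + ["pip", {"pip": requirements}],
}
print(json.dumps(environment_spec, indent=2))
```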

#### Submit script to run in the system-managed environment
A new conda environment is built based on the conda dependencies object. If you are running this for the first time, 
this might take up to 5 minutes. The environment is then reused as long as you don't change the conda 
dependencies.

In [None]:
src = ScriptRunConfig(
    source_directory="./scripts",
    script=scrpt,
    arguments=args,
    run_config=run_config_system_managed,
)
run = exp.submit(src)
run

Block until the run finishes.

In [None]:
run.wait_for_completion(show_output=True)

In [None]:
run.get_file_names()

In [None]:
run.get_metrics()

### Docker-based execution
**IMPORTANT**: You must have Docker engine installed locally in order to use this execution mode. If your kernel is 
already running in a Docker container, such as **Azure Notebooks**, this mode will **NOT** work.

You can also ask the system to pull down a Docker image and execute your scripts in it. We will use the 
`continuumio/miniconda3` image for that purpose.

In [None]:
run_config_docker = RunConfiguration()
run_config_docker.environment.python.user_managed_dependencies = False
run_config_docker.auto_prepare_environment = True
run_config_docker.environment.docker.enabled = True
run_config_docker.environment.docker.base_image = "continuumio/miniconda3"

# Specify conda and pip dependencies
cd = CondaDependencies.create(conda_packages=conda_pack, pip_packages=requirements)
run_config_docker.environment.python.conda_dependencies = cd

Here, we map the local `data_folder`, which contains the training and testing data, to the Docker container using the 
`-v` flag.

In [None]:
host_dir = os.path.abspath("./data_folder")
container_dir = "/data_folder"
docker_arg = "{}:{}".format(host_dir, container_dir)
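
To make the mapping concrete, the sketch below builds the same `host:container` string and translates a host path into the path the script sees inside the container. The `to_container_path` helper, the host location, and the file name `train.csv` are hypothetical, and POSIX-style paths are assumed.

```python
import posixpath


def to_container_path(host_path, host_dir, container_dir):
    """Map a path under `host_dir` to its location under `container_dir`."""
    relative = posixpath.relpath(host_path, start=host_dir)
    if relative == ".." or relative.startswith("../"):
        raise ValueError("{} is not inside {}".format(host_path, host_dir))
    return posixpath.normpath(posixpath.join(container_dir, relative))


host_dir = "/home/user/project/data_folder"  # hypothetical host location
container_dir = "/data_folder"
volume_arg = "{}:{}".format(host_dir, container_dir)

print(volume_arg)
print(to_container_path(host_dir + "/train.csv", host_dir, container_dir))
```

This is why a Docker-based run should be given the container-side path rather than the absolute host path.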

This time, the run will look for the data files in the mapped `data_folder` inside the Docker container.

In [None]:
args = [
    "--inputs",
    "/data_folder",
    "--outputs",
    "outputs",
    "--estimators",
    num_estimators,
    "--match",
    "5",
]

In [None]:
run_config_docker.environment.docker.arguments.append("-v")
run_config_docker.environment.docker.arguments.append(docker_arg)

In [None]:
src = ScriptRunConfig(
    source_directory="./scripts",
    script=scrpt,
    arguments=args,
    run_config=run_config_docker,
)

In [None]:
run = exp.submit(src)

In [None]:
run.wait_for_completion(show_output=True)

Let's retrieve the metrics logged by the Docker-based run.

In [None]:
run.get_metrics()

## Register Model

We now register the model with the workspace so that we can later deploy the model.
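
Before registering, it can help to confirm that the serialized model was actually uploaded with the run. The sketch below checks a list of file names, such as the one returned by `run.get_file_names()`, for the model path. The `has_model_file` helper and the sample list are illustrative assumptions.

```python
import posixpath


def has_model_file(file_names, model_path):
    """Return True if `model_path` appears in `file_names` after normalization."""
    wanted = posixpath.normpath(model_path)
    return any(posixpath.normpath(name) == wanted for name in file_names)


# Sample list with made-up entries; in the notebook this would come from
# run.get_file_names().
sample_files = ["logs/example.txt", "outputs/model.pkl"]
print(has_model_file(sample_files, "./outputs/model.pkl"))
```

Normalizing both sides means `./outputs/model.pkl` and `outputs/model.pkl` are treated as the same file.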

In [None]:
# Supply a model name and the path to the serialized model file within the run's outputs.
model = run.register_model(
    model_name="question_match_model", model_path="./outputs/model.pkl"
)

In [None]:
print(model.name, model.version, model.url, sep="\n")