Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# 02. Train locally
* Create or load workspace.
* Create scripts locally.
* Create `train.py` in a folder, along with a `mylib.py` file.
* Configure & execute a local run in a user-managed Python environment.
* Configure & execute a local run in a system-managed Python environment.
* Configure & execute a local run in a Docker environment.
* Query run metrics to find the best model.
* Register model for operationalization.

## Prerequisites
Make sure you go through the [Configuration](../../../configuration.ipynb) notebook first if you haven't already.

In [1]:
# Check core SDK version number
import azureml.core

print("SDK version:", azureml.core.VERSION)

ModuleNotFoundError: No module named 'azureml.core'

## Initialize Workspace

Initialize a workspace object from persisted configuration.

In [2]:
from azureml.core.workspace import Workspace

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\n')

Found the config file in: D:\Users\sravana\divergence-sciencebox\ipynb\2019-03-MARCH-ENTERPRISE-ML\config.json
eml01-student99
eml-training
southcentralus
3c3bb71f-3a4c-436f-9e0a-7407d75a82fa
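`Workspace.from_config()` reads the workspace coordinates from the `config.json` file reported in the output above. As a rough sketch, assuming that file holds the keys `subscription_id`, `resource_group`, and `workspace_name`, you can sanity-check its contents without the SDK. `parse_workspace_config` is a hypothetical helper, and the values below are placeholders.

```python
import json

# Assumed key names for config.json; adjust if your file differs.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def parse_workspace_config(text):
    """Parse config.json text and fail early if an expected key is absent."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("config.json is missing keys: {}".format(missing))
    return config


# Placeholder content, not a real workspace.
sample_text = json.dumps({
    "subscription_id": "<subscription-id>",
    "resource_group": "<resource-group>",
    "workspace_name": "<workspace-name>",
})
config = parse_workspace_config(sample_text)
print(sorted(config))
```

To check a real file, pass it the text of `config.json`, for example `parse_workspace_config(open('config.json').read())`.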


## Create An Experiment
An **Experiment** is a logical container in an Azure ML Workspace. It hosts run records, which can include run metrics and output artifacts from your experiments.

In [3]:
from azureml.core import Experiment
experiment_name = 'train-on-local'
exp = Experiment(workspace=ws, name=experiment_name)

## View `train.py`

`train.py` is already created for you.

In [4]:
with open('./train.py', 'r') as f:
    print(f.read())

# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from azureml.core.run import Run
from sklearn.externals import joblib
import os
import numpy as np
import mylib

os.makedirs('./outputs', exist_ok=True)

X, y = load_diabetes(return_X_y=True)

run = Run.get_context()

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=0)
data = {"train": {"X": X_train, "y": y_train},
        "test": {"X": X_test, "y": y_test}}

# list of numbers from 0.0 up to (but not including) 1.0, in steps of 0.05
alphas = mylib.get_alphas()

for alpha in alphas:
    # Use Ridge algorithm to create a regression model
    reg = Ridge(alpha=alpha)
    reg.fit(data["train"]["X"], data["train

Note that `train.py` also imports the `mylib.py` module.

In [5]:
with open('./mylib.py', 'r') as f:
    print(f.read())

# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.

import numpy as np


def get_alphas():
    # list of numbers from 0.0 up to (but not including) 1.0, in steps of 0.05
    return np.arange(0.0, 1.0, 0.05)
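`np.arange` uses a half-open interval, so the grid returned by `get_alphas()` stops at 0.95 and never includes 1.0. The dependency-free sketch below reproduces the same 20-point grid with integer stepping. `alpha_grid` is a hypothetical name, and rounding to two decimals is an assumption made for readability.

```python
def alpha_grid(step=0.05, count=20):
    """Return `count` evenly spaced alphas starting at 0.0 (end point excluded)."""
    # Multiplying an integer index by the step avoids accumulating float error.
    return [round(i * step, 2) for i in range(count)]


alphas = alpha_grid()
print(len(alphas), alphas[0], alphas[-1])
```

If 1.0 should be part of the search, the stop value (or `count` here) has to be extended by one step.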



## Configure & Run
### User-managed environment
Below, we use a user-managed run, which means you are responsible for ensuring that all the necessary packages are available in the Python environment you choose to run the script in.

In [6]:
from azureml.core.runconfig import RunConfiguration

# Edit a run configuration property on the fly.
run_config_user_managed = RunConfiguration()

run_config_user_managed.environment.python.user_managed_dependencies = True

# You can choose a specific Python environment by pointing to a Python path
# run_config_user_managed.environment.python.interpreter_path = '/home/johndoe/miniconda3/envs/sdk2/bin/python'
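With a user-managed environment, nothing installs missing packages for you, so it can help to check the interpreter before submitting. The sketch below uses only the standard library. `missing_packages` is a hypothetical helper, and the package list is taken from the imports of `train.py` shown earlier. It inspects the interpreter running this notebook, which may differ from the one selected through `interpreter_path`.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level packages from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Treat lookup errors as "not available".
            found = False
        if not found:
            missing.append(name)
    return missing


print("Interpreter:", sys.executable)
print("Missing:", missing_packages(["sklearn", "numpy", "azureml"]))
```

An empty list means the imports at the top of `train.py` should at least resolve. It does not check package versions.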

#### Submit script to run in the user-managed environment
Note that the whole script folder is submitted for execution, including the `mylib.py` file.

In [7]:
from azureml.core import ScriptRunConfig

src = ScriptRunConfig(source_directory='./', script='train.py', run_config=run_config_user_managed)
run = exp.submit(src)

#### Get run history details

In [8]:
run

| Experiment | Id | Type | Status | Details Page | Docs Page |
| --- | --- | --- | --- | --- | --- |
| train-on-local | train-on-local_1551996286_0273ebdd | azureml.scriptrun | Running | Link to Azure Portal | Link to Documentation |


Block until the run finishes.

In [9]:
run.wait_for_completion(show_output=True)

RunId: train-on-local_1551996286_0273ebdd

Execution Summary
RunId: train-on-local_1551996286_0273ebdd



{'runId': 'train-on-local_1551996286_0273ebdd',
 'target': 'local',
 'status': 'Completed',
 'startTimeUtc': '2019-03-07T22:04:47.393135Z',
 'endTimeUtc': '2019-03-07T22:05:10.595056Z',
 'properties': {'azureml.runsource': 'experiment',
  'ContentSnapshotId': 'bfae406c-950d-457a-891e-3fc1cb35ad64'},
 'runDefinition': {'Script': 'train.py',
  'Arguments': [],
  'SourceDirectoryDataStore': None,
  'Framework': 0,
  'Communicator': 0,
  'Target': 'local',
  'DataReferences': {},
  'JobName': None,
  'AutoPrepareEnvironment': True,
  'MaxRunDurationSeconds': None,
  'NodeCount': 1,
  'Environment': {'Python': {'InterpreterPath': 'python',
    'UserManagedDependencies': True,
    'CondaDependencies': {'name': 'project_environment',
     'dependencies': ['python=3.6.2', {'pip': ['azureml-defaults']}]}},
   'EnvironmentVariables': {'EXAMPLE_ENV_VAR': 'EXAMPLE_VALUE'},
   'Docker': {'BaseImage': 'mcr.microsoft.com/azureml/base:0.2.0',
    'Enabled': False,
    'SharedVolumes': True,
    'Prepa
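The dictionary displayed above can also be inspected programmatically, for example to confirm the final status and compute how long the run took. This is a minimal sketch that copies the three relevant values from the output above. In the notebook you would presumably obtain the dictionary from the run itself (for instance via `run.get_details()`), and `run_duration_seconds` is a hypothetical helper.

```python
from datetime import datetime

# Values copied from the run details shown above.
details = {
    "status": "Completed",
    "startTimeUtc": "2019-03-07T22:04:47.393135Z",
    "endTimeUtc": "2019-03-07T22:05:10.595056Z",
}


def run_duration_seconds(details):
    """Return the elapsed time between the start and end timestamps."""
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
    start = datetime.strptime(details["startTimeUtc"], fmt)
    end = datetime.strptime(details["endTimeUtc"], fmt)
    return (end - start).total_seconds()


print(details["status"], run_duration_seconds(details))
```

For this run the elapsed time is roughly 23 seconds, which fits a user-managed run where no environment has to be built.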

### System-managed environment
You can also ask the system to build a new conda environment and execute your scripts in it. The environment is built once and will be reused in subsequent executions as long as the conda dependencies remain unchanged. 

In [10]:
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies

run_config_system_managed = RunConfiguration()

run_config_system_managed.environment.python.user_managed_dependencies = False
run_config_system_managed.auto_prepare_environment = True

# Specify conda dependencies with scikit-learn
cd = CondaDependencies.create(conda_packages=['scikit-learn'])
run_config_system_managed.environment.python.conda_dependencies = cd

#### Submit script to run in the system-managed environment
A new conda environment is built based on the conda dependencies object. If you are running this for the first time, this might take up to 5 minutes. The environment is then reused as long as you don't change the conda dependencies.

In [11]:
src = ScriptRunConfig(source_directory="./", script='train.py', run_config=run_config_system_managed)
run = exp.submit(src)

#### Get run history details

In [12]:
run

| Experiment | Id | Type | Status | Details Page | Docs Page |
| --- | --- | --- | --- | --- | --- |
| train-on-local | train-on-local_1551996345_51d00a9a | azureml.scriptrun | Preparing | Link to Azure Portal | Link to Documentation |


Block until the run finishes.

In [13]:
run.wait_for_completion(show_output=True)

RunId: train-on-local_1551996345_51d00a9a

Streaming azureml-logs/60_control_log.txt

Streaming log file azureml-logs/60_control_log.txt
Logging experiment preparation status in history service.
Running ['conda', '--version']
Creating Conda environment...
Solving environment: ...working... done


  current version: 4.5.11
  latest version: 4.6.7

Please update conda by running

    $ conda update -n base -c defaults conda



Downloading and Extracting Packages
numpy-1.16.2         | 49 KB     | ########## | 100% 
scipy-1.2.1          | 14.0 MB   | ########## | 100% 
python-3.6.2         | 17.1 MB   | ########## | 100% 
numpy-base-1.16.2    | 4.1 MB    | ########## | 100% 
scikit-learn-0.20.2  | 5.2 MB    | ########## | 100% 
wheel-0.33.1         | 57 KB     | ########## | 100% 
pip-19.0.3           | 1.9 MB    | ########## | 100% 
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Collecting azureml-defaults==1.

Collecting azure-mgmt-containerregistry>=2.0.0 (from azureml-core==1.0.2.*->azureml-defaults==1.0.2->-r C:\Users\sravana\AppData\Local\Temp\azureml_runs\train-on-local_1551996345_51d00a9a\azureml-setup\condaenv.tivrpd5o.requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/7a/4b/06040d992f93531e32c5f7cf7884f3edfec11f76f802dd9224c1116c3129/azure_mgmt_containerregistry-2.7.0-py2.py3-none-any.whl (509kB)
Collecting azure-mgmt-storage>=1.5.0 (from azureml-core==1.0.2.*->azureml-defaults==1.0.2->-r C:\Users\sravana\AppData\Local\Temp\azureml_runs\train-on-local_1551996345_51d00a9a\azureml-setup\condaenv.tivrpd5o.requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/d9/87/ab44b9d9627ff91825ba5f5a39092ebfe97a90679008609db4c479036591/azure_mgmt_storage-3.1.1-py2.py3-none-any.whl (696kB)
Collecting azure-graphrbac>=0.40.0 (from azureml-core==1.0.2.*->azureml-defaults==1.0.2->-r C:\Users\sravana\AppData\Local\Temp\azureml_runs\train-on-local_

Collecting humanfriendly>=4.7 (from azure-cli-core>=2.0.38->azureml-core==1.0.2.*->azureml-defaults==1.0.2->-r C:\Users\sravana\AppData\Local\Temp\azureml_runs\train-on-local_1551996345_51d00a9a\azureml-setup\condaenv.tivrpd5o.requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/90/df/88bff450f333114680698dc4aac7506ff7cab164b794461906de31998665/humanfriendly-4.18-py2.py3-none-any.whl (73kB)
Collecting knack==0.5.1 (from azure-cli-core>=2.0.38->azureml-core==1.0.2.*->azureml-defaults==1.0.2->-r C:\Users\sravana\AppData\Local\Temp\azureml_runs\train-on-local_1551996345_51d00a9a\azureml-setup\condaenv.tivrpd5o.requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/27/46/0a6d7471efcc519e392640f6933c0f644bbf602971e64797108292cb3623/knack-0.5.1-py2.py3-none-any.whl
Collecting jmespath (from azure-cli-core>=2.0.38->azureml-core==1.0.2.*->azureml-defaults==1.0.2->-r C:\Users\sravana\AppData\Local\Temp\azureml_runs\train-on-local_1551996345

Collecting pycparser (from cffi!=1.11.3,>=1.8->cryptography!=1.9,!=2.0.*,!=2.1.*,!=2.2.*->azureml-core==1.0.2.*->azureml-defaults==1.0.2->-r C:\Users\sravana\AppData\Local\Temp\azureml_runs\train-on-local_1551996345_51d00a9a\azureml-setup\condaenv.tivrpd5o.requirements.txt (line 1))
Collecting oauthlib>=3.0.0 (from requests-oauthlib>=0.5.0->msrest>=0.5.1->azureml-core==1.0.2.*->azureml-defaults==1.0.2->-r C:\Users\sravana\AppData\Local\Temp\azureml_runs\train-on-local_1551996345_51d00a9a\azureml-setup\condaenv.tivrpd5o.requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/16/95/699466b05b72b94a41f662dc9edf87fda4289e3602ecd42d27fcaddf7b56/oauthlib-3.0.1-py2.py3-none-any.whl (142kB)
Building wheels for collected packages: futures, antlr4-python3-runtime
  Building wheel for futures (setup.py): started
  Building wheel for futures (setup.py): finished with status 'done'
  Stored in directory: C:\Users\sravana\AppData\Local\pip\Cache\wheels\f3\f9\c7\4fbf1faa6038f

{'runId': 'train-on-local_1551996345_51d00a9a',
 'target': 'local',
 'status': 'Finalizing',
 'startTimeUtc': '2019-03-07T22:10:36.759832Z',
 'properties': {'azureml.runsource': 'experiment',
  'ContentSnapshotId': 'b5b34175-e9e0-4f6e-ac05-6e60c865ae95'},
 'runDefinition': {'Script': 'train.py',
  'Arguments': [],
  'SourceDirectoryDataStore': None,
  'Framework': 0,
  'Communicator': 0,
  'Target': 'local',
  'DataReferences': {},
  'JobName': None,
  'AutoPrepareEnvironment': True,
  'MaxRunDurationSeconds': None,
  'NodeCount': 1,
  'Environment': {'Python': {'InterpreterPath': 'python',
    'UserManagedDependencies': False,
    'CondaDependencies': {'name': 'project_environment',
     'dependencies': ['python=3.6.2',
      {'pip': ['azureml-defaults==1.0.2']},
      'scikit-learn']}},
   'EnvironmentVariables': {'EXAMPLE_ENV_VAR': 'EXAMPLE_VALUE'},
   'Docker': {'BaseImage': 'mcr.microsoft.com/azureml/base:0.2.0',
    'Enabled': False,
    'SharedVolumes': True,
    'Preparation': 

### Docker-based execution
**IMPORTANT**: You must have Docker engine installed locally in order to use this execution mode. If your kernel is already running in a Docker container, such as **Azure Notebooks**, this mode will **NOT** work.
**NOTE**: The GPU base image must be used only on Microsoft Azure services such as ACI, AML Compute, Azure VMs, and AKS.

You can also ask the system to pull down a Docker image and execute your scripts in it.

In [None]:
run_config_docker = RunConfiguration()
run_config_docker.environment.python.user_managed_dependencies = False
run_config_docker.auto_prepare_environment = True
run_config_docker.environment.docker.enabled = True
run_config_docker.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE

# Specify conda dependencies with scikit-learn
cd = CondaDependencies.create(conda_packages=['scikit-learn'])
run_config_docker.environment.python.conda_dependencies = cd

src = ScriptRunConfig(source_directory="./", script='train.py', run_config=run_config_docker)

#### Submit script to run in the Docker environment
A new conda environment is built inside the Docker image, based on the conda dependencies object. If you are running this for the first time, this might take up to 5 minutes. The environment is then reused as long as you don't change the conda dependencies.




In [None]:
import subprocess

# Check that Docker is installed and that Linux containers are enabled
if subprocess.run("docker -v", shell=True).returncode == 0:
    out = subprocess.check_output("docker system info", shell=True, encoding="ascii").split("\n")
    # Lines of `docker system info` may be indented, so strip them before comparing
    if "OSType: linux" not in [line.strip() for line in out]:
        print("Switch Docker engine to use Linux containers.")
    else:
        run = exp.submit(src)
else:
    print("Docker engine not installed.")
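The `OSType` check can also be factored into a small helper that is testable without Docker being installed. This is a minimal sketch, assuming `docker system info` prints `Key: value` lines that may be indented. `parse_os_type` is a hypothetical helper, and the sample text is illustrative rather than real command output.

```python
def parse_os_type(info_text):
    """Return the value of the OSType field from `docker system info` text, or None."""
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "OSType":
            return value.strip()
    return None


# Illustrative input only; real output contains many more fields.
sample = "Server:\n Containers: 0\n OSType: linux\n Architecture: x86_64"
print(parse_os_type(sample))
```

Stripping each line before comparing keeps the check independent of how the Docker CLI indents its output.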

In [None]:
# Get run history details
run

In [None]:
run.wait_for_completion(show_output=True)

## Query run metrics

In [None]:
# Get all metrics logged in the run
metrics = run.get_metrics()

Let's find the model that has the lowest MSE value logged.

In [None]:
import numpy as np

best_alpha = metrics['alpha'][np.argmin(metrics['mse'])]

print('When alpha is {1:0.2f}, we have min MSE {0:0.2f}.'.format(
    min(metrics['mse']), 
    best_alpha
))
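The lookup above relies on `metrics['alpha']` and `metrics['mse']` being parallel lists with one entry per trained model. The following sketch shows the same selection without NumPy. The toy values are made up for illustration and are not results from an actual run, and `best_by_lowest` is a hypothetical helper.

```python
def best_by_lowest(metrics, key="alpha", score="mse"):
    """Return (key value, score) for the entry with the lowest score."""
    scores = metrics[score]
    # Index of the smallest score; ties resolve to the first occurrence.
    best_index = min(range(len(scores)), key=scores.__getitem__)
    return metrics[key][best_index], scores[best_index]


# Toy values for illustration only.
toy_metrics = {"alpha": [0.0, 0.2, 0.4, 0.6], "mse": [5.0, 3.5, 2.0, 2.8]}
best_alpha, best_mse = best_by_lowest(toy_metrics)
print("When alpha is {:0.2f}, we have min MSE {:0.2f}.".format(best_alpha, best_mse))
```

If the two lists ever differ in length, the index-based lookup fails or returns the wrong pairing, so comparing `len(metrics['alpha'])` with `len(metrics['mse'])` first is a cheap safeguard.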

You can also list all the files that are associated with this run record.

In [None]:
run.get_file_names()

We know from the earlier queries that `ridge_0.40.pkl` is the best-performing model, so let's register it with the workspace.

In [None]:
# Supply a model name and the full path to the serialized model file.
model = run.register_model(model_name='best_ridge_model', model_path='./outputs/ridge_0.40.pkl')

In [None]:
print(model.name, model.version, model.url)
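The model path passed to `register_model` is hardcoded from the earlier query. As a sketch, it could instead be derived from `best_alpha`, assuming the training script names its files after the pattern suggested by `ridge_0.40.pkl`. That pattern is an assumption, since the `train.py` listing above is truncated before the save step, and `model_path_for` is a hypothetical helper.

```python
def model_path_for(alpha, output_dir="./outputs"):
    """Build the assumed model file path for a given alpha value."""
    # Two-decimal formatting mirrors the file name `ridge_0.40.pkl`.
    return "{}/ridge_{:0.2f}.pkl".format(output_dir, alpha)


print(model_path_for(0.4))
```

Checking that the resulting path appears in `run.get_file_names()` before registering would catch a mismatch between this assumed pattern and the files the script actually wrote.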

Now you can deploy this model following the example in the 01 notebook.