# 2. Layering Azure ML on the data scientist's training code

This notebook is a sample you can use to show a data scientist how to use the Azure Machine Learning (AML) service to provision transient training compute, train a model, and register the trained model in the AML model registry. The training script created in section 8 is reused as a .py script in the DevOps pipeline.

### 1. Declare environment related variables
Make changes to match your environment.

In [9]:
# Azure subscription
subscription_id = "<your subscription GUID>" 

# Resource Group 
resource_group = "ncr-mlops-rg" 

# Workspace Name and Azure Region of the Azure Machine Learning Workspace
workspace_name = "ncramlws" 
workspace_region = "eastus2" 

# Other variables
experiment_name = 'chd-prediction-manual'
project_dir = './chd'
deployment_dir = './deploy'
model_name = 'chd-predictor-manual'
model_description = 'Model to predict coronary heart disease'

# AML managed compute to be spun up for training
vm_name = "chd-manual"
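
When this logic moves into the DevOps pipeline, values like these are often supplied as pipeline variables instead of being hard-coded. The following is a minimal sketch of that pattern. The environment variable names (`AML_SUBSCRIPTION_ID`, `AML_RESOURCE_GROUP`, `AML_WORKSPACE_NAME`, `AML_WORKSPACE_REGION`) are hypothetical, and any variable that is unset falls back to the value shown above.

```python
import os

# Defaults mirror the notebook values above; the environment variable
# names are hypothetical and only illustrate the pattern.
DEFAULTS = {
    "AML_SUBSCRIPTION_ID": "<your subscription GUID>",
    "AML_RESOURCE_GROUP": "ncr-mlops-rg",
    "AML_WORKSPACE_NAME": "ncramlws",
    "AML_WORKSPACE_REGION": "eastus2",
}

def load_settings(environ=os.environ):
    """Return each setting from the environment, or its default if unset."""
    return {key: environ.get(key, default) for key, default in DEFAULTS.items()}

settings = load_settings()
print(settings["AML_RESOURCE_GROUP"])
```
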

### 2. Load necessary packages

In [2]:
import os

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.core.compute import ComputeTarget
from azureml.core.model import Model
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.widgets import RunDetails

### 3. Instantiate Azure Machine Learning Workspace reference

In [3]:
# Get the existing AML workspace, or create it if it does not exist yet
ws = Workspace.create(
    name = workspace_name,
    subscription_id = subscription_id,
    resource_group = resource_group,
    location = workspace_region,
    exist_ok = True) # reuse the workspace if it already exists

ws.write_config()
print('Workspace configuration succeeded')

Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code ALEZ64NA8 to authenticate.
Interactive authentication successfully completed.
Workspace configuration succeeded
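
`ws.write_config()` saves the workspace connection details to a `config.json` file (by default in a `.azureml` folder under the current directory). Later cells or scripts can then call `Workspace.from_config()` instead of repeating the subscription, resource group, and workspace name. The sketch below uses only the standard library to write and read back a file of that shape in a temporary directory. The three-key layout is an assumption, so check the file your SDK version actually produces.

```python
import json
import os
import tempfile

def write_workspace_config(folder, subscription_id, resource_group, workspace_name):
    """Write a config.json shaped like the one Workspace.write_config() produces (assumed layout)."""
    config_dir = os.path.join(folder, ".azureml")
    os.makedirs(config_dir, exist_ok=True)
    path = os.path.join(config_dir, "config.json")
    with open(path, "w") as f:
        json.dump({"subscription_id": subscription_id,
                   "resource_group": resource_group,
                   "workspace_name": workspace_name}, f, indent=4)
    return path

def read_workspace_config(path):
    """Read the connection details back, as a later script would."""
    with open(path) as f:
        return json.load(f)

with tempfile.TemporaryDirectory() as tmp:
    path = write_workspace_config(tmp, "<your subscription GUID>", "ncr-mlops-rg", "ncramlws")
    config = read_workspace_config(path)

print(config["workspace_name"])
```
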


### 4. Instantiate experiment 

In [4]:
# Instantiate an experiment in the AML workspace
experiment = Experiment(ws, experiment_name)

### 5. Configure project directory

In [5]:
# Create the project directory (no error if it already exists)
os.makedirs(project_dir, exist_ok=True)

### 6. Provision Azure Machine Learning Managed Compute for training

In [10]:
# Provision AML managed compute 
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

try:
    compute_target = ComputeTarget(workspace=ws, name=vm_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
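    # min_nodes=1 keeps one node running (and billed) while the cluster exists;
    # min_nodes=0 would let the cluster scale down to zero nodes when idle
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D12_V2',
                                                           min_nodes=1, max_nodes=1)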

    # create the cluster
    compute_target = ComputeTarget.create(ws, vm_name, compute_config)
    # Show output
    compute_target.wait_for_completion(show_output=True)

Creating a new compute target...
Creating
Succeeded..................
AmlCompute wait for completion finished
Minimum number of nodes requested have been provisioned


### 7. Configure environment 

In [12]:
# Create a Docker-based environment with the packages train.py imports
training_venv = Environment("training_venv")

training_venv.docker.enabled = True
training_venv.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn', 'pandas'])

### 8. Create training script
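
The script's preprocessing fills missing `cigsPerDay` values with the mean over current smokers only, so that non-smokers' zero counts don't pull the average down. It fills the other gaps with column means or fixed defaults. Here is a minimal sketch of the smoker-only fill on a small hypothetical frame (not the Framingham data), which assigns each filled column back rather than using `inplace=True`:

```python
import numpy as np
import pandas as pd

# Small hypothetical frame using the same column names as the script
df = pd.DataFrame({
    "currentSmoker": [1, 1, 1, 0, 0],
    "cigsPerDay":    [10.0, 20.0, np.nan, 0.0, np.nan],
    "BPMeds":        [0.0, np.nan, 1.0, 0.0, 0.0],
    "glucose":       [80.0, 100.0, np.nan, 90.0, 90.0],
})

# Mean of cigsPerDay over smokers only (NaNs skipped): (10 + 20) / 2 = 15
smoke = df["currentSmoker"] == 1
df.loc[smoke, "cigsPerDay"] = df.loc[smoke, "cigsPerDay"].fillna(
    df.loc[smoke, "cigsPerDay"].mean())

# Assign each filled column back instead of calling fillna(inplace=True) on it
df["BPMeds"] = df["BPMeds"].fillna(0)
df["glucose"] = df["glucose"].fillna(df["glucose"].mean())

print(df)
```

Row 4 shows that the smoker-only fill leaves a missing `cigsPerDay` for a non-smoker untouched.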

In [13]:
%%writefile $project_dir/train.py

# Load necessary packages
import pandas as pd
import numpy as np
import pickle
import os

import sklearn
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import roc_auc_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Read training dataset into pandas dataframe
# Replace with your dataset URL
dataset_url = ('https://ncrmlopssa.blob.core.windows.net/chd-dataset/framingham.csv')
df = pd.read_csv(dataset_url)

# create a boolean array of smokers
smoke = (df['currentSmoker']==1)
# Apply mean to NaNs in cigsPerDay but using a set of smokers only
df.loc[smoke,'cigsPerDay'] = df.loc[smoke,'cigsPerDay'].fillna(df.loc[smoke,'cigsPerDay'].mean())

# Fill in the remaining missing values (assign back instead of inplace=True on a column)
df['BPMeds'] = df['BPMeds'].fillna(0)
df['glucose'] = df['glucose'].fillna(df.glucose.mean())
df['totChol'] = df['totChol'].fillna(df.totChol.mean())
df['education'] = df['education'].fillna(1)
df['BMI'] = df['BMI'].fillna(df.BMI.mean())
df['heartRate'] = df['heartRate'].fillna(df.heartRate.mean())

# Features and label
features = df.iloc[:,:-1]
result = df.iloc[:,-1] # the last column is the label we want to predict

# Train & Test split
X_train, X_test, y_train, y_test = train_test_split(features, result, test_size = 0.2, random_state = 14)

# RandomForest classifier
clf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
clf.fit(X_train, y_train)

# Create a selector object that uses the random forest classifier to keep
# features whose importance is at least 0.12
sfm = SelectFromModel(clf, threshold=0.12)

# Train the selector
sfm.fit(X_train, y_train)

# Features selected
feat_labels = list(features.columns.values) # creating a list with features' names
for feature_list_index in sfm.get_support(indices=True):
    print(feat_labels[feature_list_index])

# Feature importance
importances = clf.feature_importances_
std = np.std([tree.feature_importances_ for tree in clf.estimators_],
             axis=0)
indices = np.argsort(importances)[::-1]

print("Feature ranking:")
for f in range(X_train.shape[1]):
    print("%d. feature %d (%f)" % (f + 1, indices[f], importances[indices[f]]))

# Keep only the important features (X_important_train.shape[1] gives their count)
X_important_train = sfm.transform(X_train)
X_important_test = sfm.transform(X_test)

clf_important = RandomForestClassifier(n_estimators=10000, random_state=0, n_jobs=-1)
clf_important.fit(X_important_train, y_train)

# Save the model to disk
os.makedirs('./outputs/model', exist_ok=True)

filename = './outputs/model/chd-rf-model'
with open(filename, 'wb') as f:
    pickle.dump(clf_important, f)
print("Model saved to ./outputs/model/chd-rf-model")
print("Saving model completed")

Writing ./chd/train.py
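
The script imports `accuracy_score` and `roc_auc_score` but never reports them. The sketch below shows one way to score the selected-feature model on the held-out split. It runs on synthetic data from `make_classification` rather than the Framingham dataset, and it uses far fewer trees than the script's 10,000 so that it finishes quickly. It also checks that `SelectFromModel` keeps exactly the features whose importance is greater than or equal to the threshold.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the CHD features and label
X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=14)

# Same selection step as the script: shallow forest, importance threshold 0.12
clf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
sfm = SelectFromModel(clf, threshold=0.12).fit(X_train, y_train)
selected = sfm.get_support(indices=True)

# Retrain on the selected columns only (fewer trees than the script, for speed)
clf_important = RandomForestClassifier(n_estimators=200, random_state=0, n_jobs=-1)
clf_important.fit(sfm.transform(X_train), y_train)

# Score on the held-out split
X_important_test = sfm.transform(X_test)
accuracy = accuracy_score(y_test, clf_important.predict(X_important_test))
auc = roc_auc_score(y_test, clf_important.predict_proba(X_important_test)[:, 1])
print("Selected feature indices:", selected)
print("Accuracy: %.3f  ROC AUC: %.3f" % (accuracy, auc))
```

Inside `train.py`, the same values could also be recorded on the run with `Run.get_context().log(...)` from `azureml.core`, which makes them visible in the studio.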


### 9. Start the model training/experiment using compute and script from earlier

In [14]:
from azureml.core import ScriptRunConfig

src = ScriptRunConfig(source_directory=project_dir, script='train.py')

# Set compute target to the one created in previous step
src.run_config.target = compute_target.name

# Set environment
src.run_config.environment = training_venv
 
run = experiment.submit(config=src)
run

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| chd-prediction-manual | chd-prediction-manual_1580157944_33127674 | azureml.scriptrun | Starting | Link to Azure Machine Learning studio | Link to Documentation |


### 10. Poll for model training/experiment completion

In [15]:
%%time
# Shows output of the run on stdout
run.wait_for_completion(show_output=True)

RunId: chd-prediction-manual_1580157944_33127674
Web View: https://ml.azure.com/experiments/chd-prediction-manual/runs/chd-prediction-manual_1580157944_33127674?wsid=/subscriptions/8911c4ed-b897-45e1-b9e3-78b46acf7a6d/resourcegroups/ncr-mlops-rg/workspaces/ncramlws

Streaming azureml-logs/55_azureml-execution-tvmps_cafebedcfce63c21aa8a9da85b59ce03e087c956699c1d6ec2747319f634e677_d.txt

2020-01-27T20:45:58Z Starting output-watcher...
Login Succeeded
Using default tag: latest
latest: Pulling from azureml/azureml_19232feefffd599043edd6ed03eca15c
a1298f4ce990: Pulling fs layer
04a3282d9c4b: Pulling fs layer
9b0d3db6dc03: Pulling fs layer
8269c605f3f1: Pulling fs layer
6504d449e70c: Pulling fs layer
4e38f320d0d4: Pulling fs layer
b0a763e8ee03: Pulling fs layer
11917a028ca4: Pulling fs layer
a6c378d11cbf: Pulling fs layer
6cc007ad9140: Pulling fs layer
6c1698a608f3: Pulling fs layer
5e0fb50547ec: Pulling fs layer
eec42ec654a6: Pulling fs layer
c83a6da04974: Pulling fs layer
41223a6c4331: Pul

{'runId': 'chd-prediction-manual_1580157944_33127674',
 'target': 'chd-manual',
 'status': 'Completed',
 'startTimeUtc': '2020-01-27T20:45:58.869692Z',
 'endTimeUtc': '2020-01-27T20:48:27.42488Z',
 'properties': {'_azureml.ComputeTargetType': 'amlcompute',
  'ContentSnapshotId': '11917454-a0bf-4182-94e4-a64f29838f1d',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json'},
 'inputDatasets': [],
 'runDefinition': {'script': 'train.py',
  'arguments': [],
  'sourceDirectoryDataStore': None,
  'framework': 'Python',
  'communicator': 'None',
  'target': 'chd-manual',
  'dataReferences': {},
  'data': {},
  'jobName': None,
  'maxRunDurationSeconds': None,
  'nodeCount': 1,
  'environment': {'name': 'training_venv',
   'version': 'Autosave_2020-01-27T20:45:46Z_0f665ad0',
   'python': {'interpreterPath': 'python',
    'userManagedDependencies': False,
    'condaDependencies': {'channels': ['conda-forge'],
     'dependencies': ['pyth

### 11. Check experiment status

In [16]:
RunDetails(run).show()


### 12. Check for success and register the model in the model registry

In [17]:
if run.get_status() == 'Completed':
    print("Training completed successfully!")
    model_run = run.register_model(model_name=model_name,
                                   model_path="outputs/model/chd-rf-model",
                                   tags={"type": "classification", "description": model_description, "run_id": run.id})
    print("Model registered with version number: ", model_run.version)
else:
    print("Training failed!")
    raise Exception("Training failed!")

Training completed successfully!
Model registered with version number:  1
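
The registered model is the pickle file that `train.py` wrote to `outputs/model/chd-rf-model`. The sketch below shows the save/load round trip that a scoring script would rely on. It uses a small model trained on synthetic data and a temporary directory in place of the run's `outputs` folder, and it opens the file in `with` blocks so that the handle is always closed.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small stand-in model trained on synthetic data
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    # Same relative layout as train.py: outputs/model/chd-rf-model
    model_dir = os.path.join(tmp, "outputs", "model")
    os.makedirs(model_dir, exist_ok=True)
    filename = os.path.join(model_dir, "chd-rf-model")
    with open(filename, "wb") as f:
        pickle.dump(model, f)

    # A scoring script would load the downloaded file the same way
    with open(filename, "rb") as f:
        loaded = pickle.load(f)

same = (loaded.predict(X) == model.predict(X)).all()
print("Predictions match:", same)
```

Unpickling generally needs a compatible scikit-learn version, so the scoring environment should match the training environment's packages.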


## Next
Return to the lab guide for the next step.