# Aggregated Overview

## 1 Getting Started
- Connect to your Workspace
- Run an Experiment Script
- View Experiment Results
- View Experiment Run History

## 2 Training Models
v1
- Create a Training Script
- Use an Estimator to run the Script as an Experiment
- Register the Trained Model

v2
- Create a Parameterized Training Script
- Use a Framework-Specific Estimator
- Register a New Version of the Model

## 3 Working with Data
- View Datastores
- Upload Data to a Datastore
- Train a Model from a Datastore
- Create a Tabular Dataset
- Create a Files Dataset
- Register a Dataset
- Train a Model from a Tabular Dataset
- Train a Model from a Files Dataset

## 4 Working with Compute
- Prepare Dataset
- Create a Training Script
- Define an Environment
- Run an Experiment on a Remote Compute Cluster

## 5 Creating a Pipeline
- Prepare Training Data
- Create Scripts for Pipeline Steps (train model, register model)
- Prepare Compute Target & Environment for Pipeline Steps
- Create and Run a Pipeline
- Publish the Pipeline

## 6 Deploying a Model
- Train and Register a Model
- Deploy a Model as a Webservice
- Use the Webservice

# Connect to Your Workspace
1. Connect to your Workspace

In [None]:
from azureml.core import Workspace

ws = Workspace.from_config()
print(ws.name, "loaded")

# View and Create Datastores and Datasets

In [None]:
# VIEW DATASTORES

# Get the default datastore
default_ds = ws.get_default_datastore()

# Enumerate all datastores, indicating which is the default
for ds_name in ws.datastores:
    print(ds_name, "- Default =", ds_name == default_ds.name)

In [None]:
# UPLOAD DATA TO DATASTORE

default_ds.upload_files(files=['./data/diabetes.csv', './data/diabetes2.csv'], # Upload the diabetes csv files in /data
                        target_path='diabetes-data/', # Put it in a folder path in the datastore
                        overwrite=True, # Replace existing files of the same name
                        show_progress=True)

In [None]:
# CREATE A TABULAR DATASET

from azureml.core import Dataset

# Get the default datastore
default_ds = ws.get_default_datastore()

# Create a tabular dataset from the path on the datastore (this may take a short while)
tab_data_set = Dataset.Tabular.from_delimited_files(path=(default_ds, 'diabetes-data/*.csv'))

# Display the first 20 rows as a Pandas dataframe
tab_data_set.take(20).to_pandas_dataframe()

In [None]:
# CREATE A FILES DATASET

# Create a file dataset from the path on the datastore (this may take a short while)
file_data_set = Dataset.File.from_files(path=(default_ds, 'diabetes-data/*.csv'))

# Get the files in the dataset
for file_path in file_data_set.to_path():
    print(file_path)

In [None]:
# REGISTER DATASETS

# Register the tabular dataset
try:
    tab_data_set = tab_data_set.register(workspace=ws, 
                                         name='diabetes dataset',
                                         description='diabetes data',
                                         tags = {'format':'CSV'},
                                         create_new_version=True)
except Exception as ex:
    print(ex)

# Register the file dataset
try:
    file_data_set = file_data_set.register(workspace=ws,
                                           name='diabetes file dataset',
                                           description='diabetes files',
                                           tags = {'format':'CSV'},
                                           create_new_version=True)
except Exception as ex:
    print(ex)

print('Datasets registered')

In [None]:
# UPLOAD AND REGISTER DATASETS

from azureml.core import Dataset

default_ds = ws.get_default_datastore()

if 'diabetes dataset' not in ws.datasets:
    default_ds.upload_files(
        files=['./data/diabetes.csv', './data/diabetes2.csv'], # Upload the diabetes csv files in /data
        target_path='diabetes-data/', # Put it in a folder path in the datastore
        overwrite=True, # Replace existing files of the same name
        show_progress=True)

    # Create a tabular dataset from the path on the datastore (this may take a short while)
    tab_data_set = Dataset.Tabular.from_delimited_files(path=(default_ds, 'diabetes-data/*.csv'))

    # Register the tabular dataset
    try:
        tab_data_set = tab_data_set.register(workspace=ws, 
                                             name='diabetes dataset',
                                             description='diabetes data',
                                             tags = {'format':'CSV'},
                                             create_new_version=True)
        print('Dataset registered.')
    except Exception as ex:
        print(ex)
else:
    print('Dataset already registered.')

In [None]:
# VIEW DATASETS

print("Datasets:")
for dataset_name in list(ws.datasets.keys()):
    dataset = Dataset.get_by_name(ws, dataset_name)
    print("\t", dataset.name, 'version', dataset.version)

# Train Models in Experiments

(Lots of redundancy in these code blocks. Need to stitch them together.)

1. Run an Experiment Script
2. Create a Training Script
2. Use an Estimator to run the Script as an Experiment
2. Create a Parameterized Training Script
2. Use a Framework-Specific Estimator
3. Train a Model from a Datastore
3. Train a Model from a Tabular Dataset
3. Train a Model from a Files Dataset
4. Create a Training Script
5. Create Scripts for Pipeline Steps (train model, register model)
4. Define an Environment
4. Run an Experiment on a Remote Compute Cluster

In [None]:
%%writefile $experiment_folder/diabetes_training.py
# PYTHON SCRIPT CONTAINING CODE FOR THE EXPERIMENT, SAVED IN THE EXPERIMENT FOLDER

# Import libraries
from azureml.core import Run
import pandas as pd
import numpy as np
import joblib
import argparse
import os
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve

# Get the experiment run context
run = Run.get_context()

# Set regularization hyperparameter
parser = argparse.ArgumentParser()
parser.add_argument('--reg_rate', type=float, dest='reg', default=0.01, help='regularization rate')
parser.add_argument('--output_folder', type=str, dest='output_folder', default="diabetes_model", help='output folder')

args = parser.parse_args()
output_folder = args.output_folder
reg = args.reg

# Load the diabetes dataset passed to the run as the named input 'diabetes'
print("Loading Data...")
diabetes = run.input_datasets['diabetes'].to_pandas_dataframe()

# Separate features and labels
X, y = diabetes[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness',
                 'SerumInsulin','BMI','DiabetesPedigree','Age']].values, diabetes['Diabetic'].values

# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Train a logistic regression model
# (np.float was removed in NumPy 1.24, so the built-in float is used for logging)
print('Training a logistic regression model with regularization rate of', reg)
run.log('Regularization Rate', float(reg))
model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)

# Calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
print('Accuracy:', acc)
run.log('Accuracy', float(acc))

# Calculate AUC
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test, y_scores[:,1])
print('AUC: ' + str(auc))
run.log('AUC', float(auc))

# Save the trained model
# Note: only files written under 'outputs' are uploaded to the experiment record automatically;
# pass --output_folder outputs if the model file should appear there
os.makedirs(output_folder, exist_ok=True)
output_path = os.path.join(output_folder, "model.pkl")
joblib.dump(value=model, filename=output_path)

run.complete()
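
The way the script above turns `--reg_rate` into the `C` value for `LogisticRegression` can be tried outside Azure ML. The following is a minimal sketch, not part of the original notebooks: `parse_training_args` is a hypothetical helper that repeats the two `add_argument` calls from the script, and the argument list stands in for what `script_params = {'--reg_rate': 0.1}` would pass on the command line.

```python
import argparse


def parse_training_args(argv):
    """Parse the same two options that the training script defines."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--reg_rate', type=float, dest='reg', default=0.01,
                        help='regularization rate')
    parser.add_argument('--output_folder', type=str, dest='output_folder',
                        default='diabetes_model', help='output folder')
    return parser.parse_args(argv)


# Simulate the arguments produced by script_params = {'--reg_rate': 0.1}
args = parse_training_args(['--reg_rate', '0.1'])

# The script passes the inverse of the regularization rate to LogisticRegression as C,
# so a larger reg_rate means a smaller C and therefore stronger regularization
inverse_strength = 1 / args.reg

print('reg_rate:', args.reg)
print('output_folder:', args.output_folder)
print('C:', inverse_strength)
```

Because `dest='reg'` is set, the value is read as `args.reg` rather than `args.reg_rate`, which matches how the training script uses it.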

In [None]:
# CONFIGURE AND SUBMIT THE EXPERIMENT

from azureml.core import Environment, Experiment
from azureml.train.estimator import Estimator
from azureml.widgets import RunDetails


script_params = {'--reg_rate': 0.1}

# Get the training dataset
diabetes_ds = ws.datasets.get("diabetes dataset")

# Get the environment
diabetes_env = Environment.get(ws, 'diabetes-experiment-env')

# Create an estimator
estimator = Estimator(source_directory=experiment_folder,
                      entry_script='diabetes_training.py',
                      script_params=script_params,
                      compute_target='local',  # or cluster name
                      inputs=[diabetes_ds.as_named_input('diabetes')],
                      environment_definition=diabetes_env)

# Create an experiment
experiment_name = 'diabetes-training'
experiment = Experiment(workspace = ws, name = experiment_name)

# Run the experiment
run = experiment.submit(config=estimator)

# Show the run details while running
RunDetails(run).show()
run.wait_for_completion()

# Review Experiments and Register Models

In [None]:
# VIEW EXPERIMENT PROGRESS

from azureml.widgets import RunDetails

RunDetails(run).show()

In [None]:
# VIEW EXPERIMENT RESULTS

import json

# Get run details
details = run.get_details()
print(details)

# Get logged metrics
metrics = run.get_metrics()
print(json.dumps(metrics, indent=2))

# Get output files
files = run.get_file_names()
print(json.dumps(files, indent=2))

In [None]:
# VIEW EXPERIMENT RUN HISTORY

from azureml.core import Experiment, Run

# Use the name the experiment was submitted under ('diabetes-training' in the cell above)
diabetes_experiment = ws.experiments['diabetes-training']
for logged_run in diabetes_experiment.get_runs():
    print('Run ID:', logged_run.id)
    metrics = logged_run.get_metrics()
    for key in metrics.keys():
        print('-', key, metrics.get(key))
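
Once the metrics of each run are available, a common next step is to pick the run that scored best on one metric. The sketch below is an illustration under stated assumptions, not part of the original notebooks: `best_run_by_metric` is a hypothetical helper, and the dictionary holds placeholder values. In a real workspace the dictionary could be built as `{r.id: r.get_metrics() for r in diabetes_experiment.get_runs()}`.

```python
def best_run_by_metric(run_metrics, metric_name):
    """Return (run_id, value) for the run with the highest metric_name.

    Runs that did not log the metric are skipped.
    Returns None if no run logged it.
    """
    scored = [(run_id, metrics[metric_name])
              for run_id, metrics in run_metrics.items()
              if metric_name in metrics]
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[1])


# Placeholder values for illustration only, not real experiment results
example_metrics = {
    'run-a': {'Accuracy': 0.70, 'AUC': 0.80},
    'run-b': {'Accuracy': 0.75, 'AUC': 0.85},
    'run-c': {},  # e.g. a failed run that logged nothing
}

best = best_run_by_metric(example_metrics, 'AUC')
print(best)
```

Skipping runs without the metric avoids a `KeyError` for failed or cancelled runs, which typically have an empty metrics dictionary.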

In [None]:
# REGISTER TRAINED MODEL

from azureml.core import Model

# Register the model
# (model_path must match the location under 'outputs' where the training script saved the file)
run.register_model(model_path='outputs/diabetes_model.pkl', model_name='diabetes_model',
                   tags={'Training context':'Estimator'},
                   properties={'AUC': run.get_metrics()['AUC'], 'Accuracy': run.get_metrics()['Accuracy']})

# List registered models
for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name in model.tags:
        tag = model.tags[tag_name]
        print ('\t',tag_name, ':', tag)
    for prop_name in model.properties:
        prop = model.properties[prop_name]
        print ('\t',prop_name, ':', prop)
    print('\n')

In [None]:
# VIEW REGISTERED MODELS

from azureml.core import Model

for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name in model.tags:
        tag = model.tags[tag_name]
        print ('\t',tag_name, ':', tag)
    for prop_name in model.properties:
        prop = model.properties[prop_name]
        print ('\t',prop_name, ':', prop)
    print('\n')

In [None]:
%%writefile $experiment_folder/register_diabetes.py
# REGISTER MODEL AS A PIPELINE STEP

# Import libraries
import argparse
import joblib
from azureml.core import Workspace, Model, Run

# Get parameters
parser = argparse.ArgumentParser()
parser.add_argument('--model_folder', type=str, dest='model_folder', default="diabetes_model", help='model location')
args = parser.parse_args()
model_folder = args.model_folder

# Get the experiment run context
run = Run.get_context()

# load the model
print("Loading model from " + model_folder)
model_file = model_folder + "/model.pkl"
model = joblib.load(model_file)

Model.register(workspace=run.experiment.workspace,
               model_path = model_file,
               model_name = 'diabetes_model',
               tags={'Training context':'Pipeline'})

run.complete()

# Build and Run Pipelines

In [None]:
# PREPARE A COMPUTE TARGET

from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

cluster_name = "aml-cluster"

# Verify that cluster exists
try:
    pipeline_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    # If not, create it
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           max_nodes=4,
                                                           idle_seconds_before_scaledown=1800)
    pipeline_cluster = ComputeTarget.create(ws, cluster_name, compute_config)

pipeline_cluster.wait_for_completion(show_output=True)

In [None]:
# CREATE PYTHON ENVIRONMENT AND RUN CONFIGURATION

from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import RunConfiguration

# Create a Python environment for the experiment
diabetes_env = Environment("diabetes-pipeline-env")
diabetes_env.python.user_managed_dependencies = False # Let Azure ML manage dependencies
diabetes_env.docker.enabled = True # Use a docker container

# Create a set of package dependencies
diabetes_packages = CondaDependencies.create(conda_packages=['scikit-learn','pandas'],
                                             pip_packages=['azureml-sdk'])

# Add the dependencies to the environment
diabetes_env.python.conda_dependencies = diabetes_packages

# Register the environment (just in case you want to use it again)
diabetes_env.register(workspace=ws)
registered_env = Environment.get(ws, 'diabetes-pipeline-env')

# Create a new runconfig object for the pipeline
pipeline_run_config = RunConfiguration()

# Use the compute you created above. 
pipeline_run_config.target = pipeline_cluster

# Assign the environment to the run configuration
pipeline_run_config.environment = registered_env

print ("Run configuration created.")

In [None]:
# DEFINE PIPELINE STEPS

from azureml.pipeline.core import PipelineData
from azureml.pipeline.steps import PythonScriptStep, EstimatorStep
from azureml.train.estimator import Estimator

# Get the training dataset
diabetes_ds = ws.datasets.get("diabetes dataset")

# Create a PipelineData (temporary Data Reference) for the model folder
model_folder = PipelineData("model_folder", datastore=ws.get_default_datastore())

# entry_script must name the training script written earlier (diabetes_training.py)
estimator = Estimator(source_directory=experiment_folder,
                      compute_target = pipeline_cluster,
                      environment_definition=pipeline_run_config.environment,
                      entry_script='diabetes_training.py')

# Step 1, run the estimator to train the model
train_step = EstimatorStep(name = "Train Model",
                           estimator=estimator, 
                           estimator_entry_script_arguments=['--output_folder', model_folder],
                           inputs=[diabetes_ds.as_named_input('diabetes')],  # name read by the training script
                           outputs=[model_folder],
                           compute_target = pipeline_cluster,
                           allow_reuse = True)

# Step 2, run the model registration script
register_step = PythonScriptStep(name = "Register Model",
                                 source_directory = experiment_folder,
                                 script_name = "register_diabetes.py",
                                 arguments = ['--model_folder', model_folder],
                                 inputs=[model_folder],
                                 compute_target = pipeline_cluster,
                                 runconfig = pipeline_run_config,
                                 allow_reuse = True)

print("Pipeline steps defined")

In [None]:
# RUN PIPELINE AS EXPERIMENT

from azureml.core import Experiment
from azureml.pipeline.core import Pipeline
from azureml.widgets import RunDetails

# Construct the pipeline
pipeline_steps = [train_step, register_step]
pipeline = Pipeline(workspace = ws, steps=pipeline_steps)
print("Pipeline is built.")

# Create an experiment and run the pipeline
experiment = Experiment(workspace = ws, name = 'diabetes-training-pipeline')
pipeline_run = experiment.submit(pipeline, regenerate_outputs=True)
print("Pipeline submitted for execution.")

RunDetails(pipeline_run).show()
pipeline_run.wait_for_completion()

# Publish the Pipeline

In [None]:
# PUBLISH THE PIPELINE AS A REST SERVICE

published_pipeline = pipeline.publish(name="Diabetes_Training_Pipeline",
                                      description="Trains diabetes model",
                                      version="1.0")
rest_endpoint = published_pipeline.endpoint
print(rest_endpoint)

In [None]:
# GET AUTHORIZATION HEADER

from azureml.core.authentication import InteractiveLoginAuthentication

interactive_auth = InteractiveLoginAuthentication()
auth_header = interactive_auth.get_authentication_header()

In [None]:
# CALL THE REST INTERFACE

import requests
experiment_name = 'Run-diabetes-pipeline'

response = requests.post(rest_endpoint, 
                         headers=auth_header, 
                         json={"ExperimentName": experiment_name})
run_id = response.json()["Id"]
run_id
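
The cell above reads `"Id"` from the response without checking whether the request succeeded. A minimal sketch of a more defensive version follows. `extract_run_id` is a hypothetical helper, not part of the Azure ML SDK or of `requests`, and the sample status codes and body are placeholders.

```python
def extract_run_id(status_code, body):
    """Return the run ID from a parsed pipeline response, or raise if the call failed."""
    if not 200 <= status_code < 300:
        raise RuntimeError(f"Pipeline request failed with HTTP {status_code}: {body}")
    if "Id" not in body:
        raise KeyError("Response body has no 'Id' field")
    return body["Id"]


# Placeholder response values for illustration only
print(extract_run_id(200, {"Id": "example-run-id"}))

try:
    extract_run_id(401, {"error": "unauthorized"})
except RuntimeError as err:
    print(err)
```

With the `response` object from the cell above, this could be called as `extract_run_id(response.status_code, response.json())`. An error response may not contain JSON, in which case passing `response.text` as the body is a safer choice.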