# Lab01 - Training and deploying a model in AML

In this lab, we will be using a subset of the NYC Taxi & Limousine Commission green taxi trip records available from [Azure Open Datasets](https://azure.microsoft.com/en-us/services/open-datasets/). The data is enriched with holiday and weather data. We will use data transformations and the GradientBoostingRegressor algorithm from the scikit-learn library to train a regression model that predicts taxi fares in New York City from input features such as the number of passengers, trip distance, datetime, holiday information, and weather information.

The primary goal of this lab is to learn how to leverage Azure Machine Learning (AML) to provision compute resources to train machine learning models, and then deploy the trained models either to a managed Azure Container Instance (ACI) or to a containerized platform such as Azure Kubernetes Service (AKS).

**Import required libraries**

In [None]:
import os
import numpy as np
import pandas as pd
import pickle
import sklearn
import joblib
import math
import json

import azureml
from azureml.core import Workspace, Experiment, Run
from azureml.core import ScriptRunConfig
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.model import Model, InferenceConfig
from azureml.core.dataset import Dataset
from azureml.core.datastore import Datastore
from azureml.data.datapath import DataPath
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.core.compute import ComputeTarget, AmlCompute, AksCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.widgets import RunDetails
from azureml.core.webservice import Webservice, AciWebservice, AksWebservice
from azureml.exceptions import WebserviceException

print('The azureml.core version is {}'.format(azureml.core.VERSION))

## Connect to the Azure Machine Learning Workspace

The AML Python SDK is required for leveraging the experimentation, model management and model deployment capabilities of Azure Machine Learning services. Run the following cell to connect to the AML **Workspace**.

In [None]:
ws = Workspace.from_config()
print("The workspace name is: {}".format(ws.name))
print("The workspace resource group is: {}".format(ws.resource_group))
print("The workspace region is: {}".format(ws.location))
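
`Workspace.from_config()` reads the workspace connection details (subscription, resource group, and workspace name) from a `config.json` file, which you can download from the Azure portal. A minimal sketch of that lookup, assuming the file sits in the current directory, one of its parents, or an `.azureml` subfolder of one of them; `find_workspace_config` is a hypothetical helper for troubleshooting, not part of the SDK, and the SDK's exact search rules may differ.

```python
import json
from pathlib import Path

# Keys that a workspace config.json is expected to contain
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def find_workspace_config(start="."):
    """Hypothetical helper: walk up from `start` looking for config.json
    (directly or inside an .azureml folder) and return its parsed contents."""
    current = Path(start).resolve()
    for folder in [current, *current.parents]:
        for candidate in (folder / "config.json", folder / ".azureml" / "config.json"):
            if candidate.is_file():
                config = json.loads(candidate.read_text())
                missing = [key for key in REQUIRED_KEYS if key not in config]
                if missing:
                    raise KeyError("config.json is missing: {}".format(missing))
                return config
    raise FileNotFoundError("No config.json found from {} upward".format(current))
```

If `from_config()` fails, calling `find_workspace_config()` from the notebook folder shows which file (if any) would be picked up and whether it has all three keys.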

### Upload the training data to the blob store

In [None]:
input_location = "./data"
target_path = "training-data"
datastore = ws.get_default_datastore()
datastore.upload(input_location, 
                 target_path = target_path, 
                 overwrite = True, 
                 show_progress = True)

### Create a Tabular dataset and review the training data

In [None]:
training_data_path = DataPath(datastore=datastore, 
                              path_on_datastore=target_path + "/nyc-taxi-data.csv",  # datastore paths use forward slashes on every OS
                              name="training-data")
train_ds = Dataset.Tabular.from_delimited_files(path=training_data_path)
train_ds.to_pandas_dataframe().head()
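
Paths on a blob datastore always use forward slashes, whereas `os.path.join` uses the separator of the machine running the notebook, which is a backslash on Windows. The standard library's `posixpath` module joins with `/` on every platform. A small comparison (the file and folder names are the ones used above):

```python
import ntpath
import posixpath

target_path = "training-data"
file_name = "nyc-taxi-data.csv"

# What os.path.join produces on a Windows machine (os.path is ntpath there)
windows_style = ntpath.join(target_path, file_name)
# Forward-slash join, suitable for a blob path on any OS
blob_style = posixpath.join(target_path, file_name)

print(windows_style)  # training-data\nyc-taxi-data.csv
print(blob_style)     # training-data/nyc-taxi-data.csv
```

Building `path_on_datastore` with forward slashes keeps the cell portable between Windows and Linux notebook hosts.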

### Register the training dataset

In [None]:
dataset_name = "nyc-taxi-dataset"
description = "Dataset to predict NYC taxi fares."
registered_dataset = train_ds.register(ws, dataset_name, description=description, create_new_version=True)
print('Registered dataset name {} and version {}'.format(registered_dataset.name, registered_dataset.version))

## Create New Compute Cluster

AML Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. In Azure Machine Learning there are two options to create a compute cluster to run your model training jobs: the first is to use AML Studio, and the second is to use the AML Python SDK. Let’s review both approaches below.

### Option #1: Create compute cluster from AML Studio

- From within the AML Studio, navigate to **Compute, Compute clusters** and then select **+ New**

![Create new compute cluster](./images/create_amlcompute_01.png 'Create New Compute Cluster')

- In the **Select virtual machine** dialog, make the following selections and then select **Next**:
    - Location: `Select a location closest to your AML workspace location`
    - Virtual machine size: **Standard_DS12_v2**
    
![Create new compute cluster - Select virtual machine](./images/create_amlcompute_02.png 'Select Virtual Machine')

- In the **Configure Settings** dialog, make the following selections and then select **Create**:
    - Compute name: **amlcompute-ad**
    - Minimum number of nodes: **0**
    - Maximum number of nodes: **2**
    
![Create new compute cluster - Configure Settings](./images/create_amlcompute_03.png 'Configure Settings')
    
It takes a few minutes to provision the AML compute cluster.

### Option #2: Create compute cluster using AML Python SDK

Run the following cell to create a new AML compute cluster named `amlcompute-ad`. If you already created this cluster from AML Studio, the code simply retrieves the existing cluster; otherwise, it creates a new one.

In [None]:
cluster_name = "amlcompute-ad"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_DS12_v2', min_nodes=0, max_nodes=2)
    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
    compute_target.wait_for_completion(show_output=True)

# Use the 'status' property to get a detailed status for the current AmlCompute. 
print(compute_target.status.serialize())

## Remotely train the machine learning model using the AML Compute Cluster

### Create the training script

The training script builds and trains the machine learning model. Review the code below to see how we use the `GradientBoostingRegressor` algorithm from the scikit-learn library to train a regression model that predicts taxi fares in New York City from input features such as the number of passengers, trip distance, datetime, holiday information, and weather information. After training, the script registers the model in the AML model registry under the name `nyc-taxi-fare-predictor`.
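
The `DataFrameMapper` in the script applies, per column, a median `SimpleImputer` followed by a `StandardScaler` to each numeric feature, and a `OneHotEncoder(handle_unknown='ignore')` to each categorical feature. A plain-Python sketch of what those transforms compute for a single column, for illustration only: the script itself uses the scikit-learn classes, and the category list in the example is made up.

```python
import statistics


def impute_and_scale(values):
    """Median-impute missing values (None), then standardize to zero mean and unit variance."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    filled = [median if v is None else v for v in values]
    mean = statistics.fmean(filled)
    # StandardScaler divides by the population standard deviation;
    # a constant column is mapped to zeros
    std = statistics.pstdev(filled)
    return [(v - mean) / std if std else 0.0 for v in filled]


def one_hot(values, categories):
    """One-hot encode against categories learned at fit time; unknown values
    become all zeros, mirroring handle_unknown='ignore'."""
    return [[1 if v == c else 0 for c in categories] for v in values]


print(impute_and_scale([1.0, None, 3.0]))
print(one_hot(['Memorial Day', 'Labor Day'], ['Memorial Day', 'None']))  # [[1, 0], [0, 0]]
```

The `handle_unknown='ignore'` choice matters at scoring time: a holiday name never seen during training produces an all-zero encoding instead of an error.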

Run the next two cells to create and save the training script `train.py` in the `scripts` folder.

In [None]:
script_file_folder = './scripts'
script_file_name = 'train.py'
script_file_full_path = os.path.join(script_file_folder, script_file_name)
os.makedirs(script_file_folder, exist_ok=True)

In [None]:
%%writefile $script_file_full_path
import argparse
import os
import pandas as pd
import numpy as np
import math
import pickle

import azureml.core
from azureml.core import Workspace, Experiment, Run
from azureml.core import Dataset
from azureml.core.model import Model

import sklearn
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn_pandas import DataFrameMapper
from sklearn.metrics import mean_squared_error

print("In train.py")
print("As a data scientist, this is where I write my training code.")

parser = argparse.ArgumentParser("train")

parser.add_argument("--dataset_name", type=str, help="dataset name", dest="dataset_name", required=True)
parser.add_argument("--model_name", type=str, help="model name", dest="model_name", required=True)
parser.add_argument("--model_description", type=str, help="model desc", dest="model_description", required=True)

args = parser.parse_args()

print("Argument 1: %s" % args.dataset_name)
print("Argument 2: %s" % args.model_name)
print("Argument 3: %s" % args.model_description)

run = Run.get_context()
ws = run.experiment.workspace

input_dataset = ws.datasets[args.dataset_name]
print('Dataset name {} and version {}'.format(args.dataset_name, input_dataset.version))
data = input_dataset.to_pandas_dataframe()
print('Training data loaded!')

x_df = data.drop(['totalAmount'], axis=1)
y_df = data['totalAmount']

X_train, X_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.2, random_state=0)

numerical = ['vendorID', 'passengerCount', 'tripDistance', 'hour_of_day', 
             'day_of_week', 'day_of_month', 'month_num', 
             'snowDepth', 'precipTime', 'precipDepth', 'temperature']

categorical = ['normalizeHolidayName', 'isPaidTimeOff']

numeric_transformations = [([f], Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])) for f in numerical]

categorical_transformations = [([f], OneHotEncoder(handle_unknown='ignore', sparse=False)) for f in categorical]

transformations = numeric_transformations + categorical_transformations

# df_out=True returns a DataFrame; default=None passes any columns not listed in `transformations` through unchanged
mapper = DataFrameMapper(transformations, input_df=True, df_out=True, default=None, sparse=False)

clf = Pipeline(steps=[('preprocessor', mapper),
                      ('regressor', GradientBoostingRegressor())])

clf.fit(X_train, y_train)

y_predict = clf.predict(X_test)
y_actual = y_test.values.flatten().tolist()
rmse = math.sqrt(mean_squared_error(y_actual, y_predict))
run.log('RMSE', rmse, 'Model RMSE on test set')
print('The RMSE score on test data for GradientBoostingRegressor: ', rmse)

output_folder = './outputs'
os.makedirs(output_folder, exist_ok=True)
output_filename = os.path.join(output_folder, 'nyc-taxi-fare.pkl')
with open(output_filename, 'wb') as model_file:
    pickle.dump(clf, model_file)
print('Model file nyc-taxi-fare.pkl saved!')

modelfiles_folder = output_folder
model_name = args.model_name
model_description = args.model_description

os.chdir(modelfiles_folder)
datasheet = {"Type": "GradientBoostingRegressor", 
             "Run id": run.id, 
             "Training dataset name": input_dataset.name, 
             "Training dataset version": input_dataset.version, 
             "RMSE score": rmse}

model = Model.register(
    model_path='nyc-taxi-fare.pkl',  # this points to a local file
    model_name=model_name,  # this is the name the model is registered as
    tags=datasheet,
    description=model_description, 
    datasets=[('training data', input_dataset)], 
    workspace=ws
)

print("Model registered: {} \nModel Description: {} \nModel Version: {}".format(model.name, 
                                                                                model.description, 
                                                                                model.version))
print('Done!')
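
The script logs the root mean squared error (RMSE) on the 20% test split. RMSE is in the same units as `totalAmount`, so it reads as a typical error in dollars, with large errors weighted more heavily than small ones. The script computes it with scikit-learn's `mean_squared_error` followed by `math.sqrt`; the same computation written out, with made-up fares:

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: sqrt(mean((actual - predicted)^2))."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical fares (in dollars) and model predictions
print(rmse([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]))
```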

### Create and register the Model Training Environment

AML environments are an encapsulation of the environment where your machine learning training happens. They define Python packages, environment variables, Docker settings and other attributes in a declarative fashion. Environments are versioned: you can update them and retrieve old versions to revisit and review your work.

Environments allow you to:
* Encapsulate dependencies of your training process, such as Python packages and their versions.
* Reproduce the Python environment on your local computer in a remote run on a VM or an AML compute cluster.
* Reproduce your experimentation environment in a production setting.
* Revisit and audit the environment in which an existing model was trained.

The environment, compute target, and training script together form the run configuration: the full specification of a training run.
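
Reproducibility matters most at the hand-off between training and deployment: the model is pickled, and a pickle is generally only safe to load with the same scikit-learn (and here sklearn-pandas) versions that created it. This lab pins those versions identically in the training environment and in the inferencing environment created later. A small, hypothetical check that two pip lists agree; the lists mirror the `add_pip_package` calls in this notebook, and `version_mismatches` is not an AML API.

```python
def pinned_versions(requirements):
    """Map package name -> version for 'name==version' entries; unpinned entries are skipped."""
    pins = {}
    for requirement in requirements:
        if "==" in requirement:
            name, version = requirement.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def version_mismatches(train_reqs, inference_reqs):
    """Return {package: (training_pin, inference_pin)} for every package pinned
    in training whose inference pin is missing (None) or different."""
    train_pins = pinned_versions(train_reqs)
    inference_pins = pinned_versions(inference_reqs)
    return {
        name: (version, inference_pins.get(name))
        for name, version in train_pins.items()
        if inference_pins.get(name) != version
    }


train = ["numpy", "pandas", "joblib", "scikit-learn==0.24.1", "sklearn-pandas==2.2.0"]
inference = ["azureml-defaults", "inference-schema", "numpy", "pandas", "joblib",
             "scikit-learn==0.24.1", "sklearn-pandas==2.2.0"]
print(version_mismatches(train, inference))  # {}
```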

In [None]:
train_env = Environment.get(workspace=ws, name='AzureML-Minimal').clone('Custom-Train-Env')
cd = train_env.python.conda_dependencies
cd.add_pip_package("numpy")
cd.add_pip_package("pandas")
cd.add_pip_package("joblib")
cd.add_pip_package("scikit-learn==0.24.1")
cd.add_pip_package("sklearn-pandas==2.2.0")
train_env.register(workspace=ws)
print('Registered training env.')

### Create the ScriptRunConfig with the custom Environment

The following arguments are passed to the training script:

- **dataset_name**: Name of the registered dataset to use for model training
- **model_name**: Name of the model to use in the AML model registry
- **model_description**: Model description to save with the registered model
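
Each pair in the `arguments` list becomes a command-line flag and value for `train.py`. Before submitting a run, you can check locally that the script's parser accepts the list by handing it to `argparse` directly; `parse_args` takes an explicit list instead of reading `sys.argv`. A quick sanity check that needs no compute, reusing the parser definition from `train.py`:

```python
import argparse

# The same arguments list that the ScriptRunConfig passes to train.py
arguments = ['--dataset_name', 'nyc-taxi-dataset',
             '--model_name', 'nyc-taxi-fare-predictor',
             '--model_description', 'Model to predict taxi fares in NYC.']

parser = argparse.ArgumentParser("train")
parser.add_argument("--dataset_name", type=str, dest="dataset_name", required=True)
parser.add_argument("--model_name", type=str, dest="model_name", required=True)
parser.add_argument("--model_description", type=str, dest="model_description", required=True)

# Each list element is one argument, so the description keeps its spaces
args = parser.parse_args(arguments)
print(args.model_name)  # nyc-taxi-fare-predictor
```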

In [None]:
dataset_name = 'nyc-taxi-dataset'
model_name = 'nyc-taxi-fare-predictor'
model_description = 'Model to predict taxi fares in NYC.'

src = ScriptRunConfig(source_directory=script_file_folder, 
                      script=script_file_name, 
                      arguments=['--dataset_name', dataset_name,
                                 '--model_name', model_name,
                                 '--model_description', model_description
                                ], 
                      compute_target=compute_target, 
                      environment=train_env)

### Submit the training run

The code pattern to submit a training run to Azure Machine Learning compute targets is always:

- Create an experiment to run.
- Submit the experiment.
- Wait for the run to complete.
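
In a plain Python script you would typically do the waiting step with `run.wait_for_completion(show_output=True)`. Conceptually, waiting means polling the run's status until it reaches a terminal state (`Completed`, `Failed`, or `Canceled`). A generic sketch of that loop; the helper name, poll interval, and timeout policy are assumptions, and it relies only on `run.get_status()` returning the status string.

```python
import time

# Terminal states of an AML run
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_for_terminal_state(get_status, poll_seconds=30.0, timeout_seconds=3600.0):
    """Poll a status callable until it reports a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("Run still '{}' after {} s".format(status, timeout_seconds))
        time.sleep(poll_seconds)


# With a submitted run: status = wait_for_terminal_state(run.get_status)
```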

In [None]:
experiment_name = 'lab01-exp'
experiment = Experiment(ws, experiment_name)
run = experiment.submit(src)

### Monitor the Run Metrics

The AML Jupyter widget (`RunDetails`) lets you monitor the training run from the notebook. Run the cell below, and wait until the model training finishes and the run status is **Completed** before running any further cells.

In [None]:
RunDetails(run).show()

## Deploy the model to Azure Container Instance as a Web Service

You can deploy a model as a real-time web service to several kinds of compute targets, including local compute, an Azure Machine Learning compute instance, an Azure Container Instance (ACI), an Azure Kubernetes Service (AKS) cluster, an Azure Function, or an Internet of Things (IoT) module. In this section, we deploy the model to **ACI**, which is typically used for low-scale, CPU-based workloads that require less than 48 GB of RAM.

### Create the scoring web service

When deploying models for scoring with Azure Machine Learning services, you need to define the code for a simple web service that loads your model and uses it for scoring. By convention, the scoring script defines two functions: `init()`, which loads the model once when the service starts, and `run()`, which scores incoming data using the loaded model.
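
The contract is easiest to see with a stand-in model. A minimal, local sketch of the same `init()`/`run()` shape, assuming a hypothetical `_StubModel` in place of the registered pipeline and dict-shaped input rather than the positional lists the real `score.py` expects. It only shows the call order (`init` once, `run` per request) and the JSON round trip; try it in a scratch Python session, since it defines its own `run`, which would shadow the notebook's experiment `run` variable.

```python
import json


class _StubModel:
    """Hypothetical stand-in for the trained pipeline: predicts 2.5 dollars per mile."""

    def predict(self, rows):
        return [2.5 * row["tripDistance"] for row in rows]


trained_model = None


def init():
    # Called once when the service starts: load the model into a global
    global trained_model
    trained_model = _StubModel()


def run(input_json):
    # Called for each request with the raw request body (a JSON string)
    rows = json.loads(input_json)
    predictions = trained_model.predict(rows)
    # The return value must be JSON-serializable
    return {"predictions": list(predictions)}


init()
print(json.dumps(run(json.dumps([{"tripDistance": 4.0}]))))  # {"predictions": [10.0]}
```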

This scoring service code will later be deployed inside of a specially prepared Docker container. Run the cell below to create the scoring script `score.py`.

In [None]:
%%writefile score.py
import json
import numpy as np
import pandas as pd
import sklearn
import joblib
from azureml.core.model import Model

columns = ['vendorID', 'passengerCount', 'tripDistance', 'hour_of_day', 'day_of_week', 
            'day_of_month', 'month_num', 'snowDepth', 'precipTime', 'precipDepth', 
            'temperature', 'normalizeHolidayName', 'isPaidTimeOff']

def init():

    global trained_model
    model_path = Model.get_model_path('nyc-taxi-fare-predictor')
    trained_model = joblib.load(model_path)
    print('model loaded')

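def run(input_json):
    # Parse the request: a single record (flat list) or a batch (list of lists)
    inputs = json.loads(input_json)
    if inputs and not isinstance(inputs[0], list):
        inputs = [inputs]
    # Build the DataFrame from Python lists so each column keeps its own type
    # (np.array on mixed values would convert every value to a string)
    data_df = pd.DataFrame(inputs, columns=columns)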
    # Make prediction
    predictions = trained_model.predict(data_df)
    # You can return any data type as long as it is JSON-serializable
    return {'predictions': predictions.tolist()}

### Create and register the Inferencing Environment

In [None]:
inference_env = Environment(name='Custom-Inference-Env')
cd = CondaDependencies()
cd.add_pip_package("azureml-defaults")
cd.add_pip_package("inference-schema")
cd.add_pip_package("numpy")
cd.add_pip_package("pandas")
cd.add_pip_package("joblib")
cd.add_pip_package("scikit-learn==0.24.1")
cd.add_pip_package("sklearn-pandas==2.2.0")
inference_env.python.conda_dependencies = cd
inference_env.register(workspace=ws)
print('Registered inferencing env.')

### Load the Registered Model

Load the model that was registered during model training from the AML model registry.

In [None]:
model_name = 'nyc-taxi-fare-predictor'
registered_model = Model(ws, name=model_name)
registered_model

### Package Model and deploy to ACI

The steps involved are:
- Create the inference config, which specifies the scoring script and the deployment environment.
- Create the deployment configuration, which specifies the characteristics of the compute that hosts the service.
- Deploy the model, specifying the registered model, the inference config, and the deployment config.

Run the following two cells. Deployment can take 5-15 minutes; while it is in progress, the _Running_ status adds progress dots.

You will see output similar to the following when your web service is ready: 

`
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy
`

In [None]:
inference_config = InferenceConfig(entry_script='score.py', source_directory='./', environment=inference_env)

description = 'NYC Taxi Fare Predictor ACI Service'

aci_config = AciWebservice.deploy_configuration(
                        cpu_cores=3, 
                        memory_gb=15, 
                        location='eastus', 
                        description=description, 
                        auth_enabled=False, 
                        tags = {'name': 'ACI container', 
                                'model_name': registered_model.name, 
                                'model_version': registered_model.version
                                }
                        )

In [None]:
aci_service_name='nyc-taxi-aci-service'

aci_service = Model.deploy(workspace=ws,
                           name=aci_service_name,
                           models=[registered_model],
                           inference_config=inference_config,
                           deployment_config= aci_config, 
                           overwrite=True)

aci_service.wait_for_deployment(show_output=True)
print(aci_service.state)

### Test Deployment

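Test the deployed web service by sending it one record (`data1`) and a batch of two records (`data2`). Each record lists the 13 feature values in the same order as `columns` in `score.py`.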
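
To check a payload locally before calling the service, the stdlib sketch below maps a single record or a batch back onto named features by position, using the column order from `score.py`. `to_records` is a hypothetical helper. Note that JSON keeps numbers and `true`/`false` as their own types, so the boolean `isPaidTimeOff` survives the round trip.

```python
import json

columns = ['vendorID', 'passengerCount', 'tripDistance', 'hour_of_day', 'day_of_week',
           'day_of_month', 'month_num', 'snowDepth', 'precipTime', 'precipDepth',
           'temperature', 'normalizeHolidayName', 'isPaidTimeOff']


def to_records(payload):
    """Normalize a single record (flat list) or a batch (list of lists) into dicts keyed by column."""
    rows = json.loads(payload)
    if rows and not isinstance(rows[0], list):
        rows = [rows]
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("expected {} values, got {}".format(len(columns), len(row)))
    return [dict(zip(columns, row)) for row in rows]


data1 = [1, 2, 5, 9, 4, 27, 5, 0, 0.0, 0.0, 65, 'Memorial Day', True]
records = to_records(json.dumps(data1))
print(records[0]['normalizeHolidayName'], records[0]['isPaidTimeOff'])  # Memorial Day True
```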

In [None]:
data1 = [1, 2, 5, 9, 4, 27, 5, 0, 0.0, 0.0, 65, 'Memorial Day', True]

data2 = [[1, 3, 10, 15, 4, 27, 7, 0, 2.0, 1.0, 80, 'None', False], 
         [1, 2, 5, 9, 4, 27, 5, 0, 0.0, 0.0, 65, 'Memorial Day', True]]

In [None]:
result = aci_service.run(json.dumps(data1))
print('Predictions for data1')
print(result)
result = aci_service.run(json.dumps(data2))
print('Predictions for data2')
print(result)

## Deploy the model to Azure Kubernetes Service as a Web Service

**Azure Kubernetes Service (AKS)** is used for high-scale production deployments. In this section we will review how to provision an AKS cluster and then how to deploy the registered model as a scoring web service to the AKS cluster.

### Create AKS Cluster for Production Deployment

In Azure Machine Learning there are two options to create an AKS cluster to deploy your trained models: the first is to use AML Studio, and the second is to use the AML Python SDK. Let’s review both approaches below.

### Option #1: Create an AKS cluster from AML Studio

- From within the Azure Machine Learning Studio, navigate to **Compute, Inference Clusters** and select **+ New**

![Create new inference cluster](./images/setup-aks-01.png 'Create New Inference Cluster')

- In the **Select virtual machine** dialog, make the following selections and then select **Next**:
    - Location: `Select a location closest to your AML workspace location`
    - Virtual machine size: **Standard_D3_v2**
    
![Create new inference cluster - Select virtual machine](./images/setup-aks-02.png 'Select Virtual Machine')

- In the **Configure Settings** dialog, make the following selections and then select **Create**:
    - Compute name: **aks-cluster01**
    - Cluster purpose: **Dev-test**
    - Number of nodes: **1**
    - Network configuration: **Basic**
    
![Create new inference cluster - Configure Settings](./images/setup-aks-03.png 'Configure Settings')

**Note**: It can take several minutes to provision the inference cluster. Please wait for the cluster to be ready before proceeding.

### Option #2: Create an AKS cluster using AML Python SDK

Run the following cell to create an AKS cluster named `aks-cluster01`. If you already created this cluster from AML Studio, the code simply retrieves the existing cluster; otherwise, it creates a new one.

In the cell below, set **aks_region** to the region closest to your AML workspace.

In [None]:
aks_name = "aks-cluster01"
aks_region = "eastus2"
compute_list = ws.compute_targets
aks_target = None
if aks_name in compute_list:
    aks_target = compute_list[aks_name]
else:
    print("No AKS found. Creating new AKS: {} for model deployment.".format(aks_name))
    prov_config = AksCompute.provisioning_configuration(location=aks_region)
    # Create the cluster
    aks_target = ComputeTarget.create(workspace=ws, name=aks_name, provisioning_configuration=prov_config)
    aks_target.wait_for_completion(show_output=True)
    print(aks_target.provisioning_state)
    print(aks_target.provisioning_errors)

### Package Model and deploy to AKS

Run the following two cells. Deployment can take 5-15 minutes; while it is in progress, the _Running_ status adds progress dots.

You will see output similar to the following when your web service is ready: 

`
Succeeded
AKS service creation operation finished, operation "Succeeded"
Healthy
`

In [None]:
description = 'NYC Taxi Fare Predictor AKS Service'
aks_config = AksWebservice.deploy_configuration(description = description, 
                                                tags = {'name': 'AKS container', 
                                                        'model_name': registered_model.name, 
                                                        'model_version': registered_model.version
                                                       }
                                               )

In [None]:
aks_service_name='nyc-taxi-aks-service'

aks_service = Model.deploy(workspace=ws,
                           name=aks_service_name,
                           models=[registered_model],
                           inference_config=inference_config,
                           deployment_config= aks_config, 
                           deployment_target=aks_target, 
                           overwrite=True)

aks_service.wait_for_deployment(show_output=True)
print(aks_service.state)

### Test Deployment

Finally, test your deployed web service.
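
`aks_service.run()` wraps an HTTP call to the service. Unlike the ACI service above (deployed with `auth_enabled=False`), AKS web services have key-based authentication enabled by default, so other clients must send a key in an `Authorization: Bearer` header. A stdlib sketch that builds such a request without sending it; the URI and key are placeholders you would replace with `aks_service.scoring_uri` and the first key returned by `aks_service.get_keys()`, and `build_scoring_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, api_key, records):
    """Build (but do not send) a POST request for a scoring endpoint with key auth."""
    body = json.dumps(records).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = "Bearer " + api_key
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Placeholder URI and key for illustration only
request = build_scoring_request("http://example.com/api/v1/service/nyc-taxi-aks-service/score",
                                "<primary-key>",
                                [[1, 2, 5, 9, 4, 27, 5, 0, 0.0, 0.0, 65, 'Memorial Day', True]])
print(request.get_method(), request.get_header("Authorization"))  # POST Bearer <primary-key>

# To actually call the service with real values:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```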

In [None]:
data1 = [1, 2, 5, 9, 4, 27, 5, 0, 0.0, 0.0, 65, 'Memorial Day', True]

data2 = [[1, 3, 10, 15, 4, 27, 7, 0, 2.0, 1.0, 80, 'None', False], 
         [1, 2, 5, 9, 4, 27, 5, 0, 0.0, 0.0, 65, 'Memorial Day', True]]

In [None]:
result = aks_service.run(json.dumps(data1))
print('Predictions for data1')
print(result)
result = aks_service.run(json.dumps(data2))
print('Predictions for data2')
print(result)