# Objective 

Illustrate mechanisms within AWS SageMaker and Azure ML to deploy models including:

- API deployment
- Safe rollout using A/B testing

# Introduction

Model deployment involves taking a trained machine learning model and making it available for real-time predictions or inferences. When it comes to deploying models, there are two primary modes to consider: APIs (Application Programming Interfaces) and edge deployment. These two modes offer distinct advantages and cater to different use cases.

API deployment involves hosting the machine learning model on a server and exposing it as a service through an API. This allows clients or applications to send requests to the API and receive predictions or inferences in response. API deployment is ideal for scenarios where there is a centralized infrastructure and clients have reliable network connectivity. Examples of API deployment include using frameworks like Flask or FastAPI to build a RESTful API for deploying a natural language processing (NLP) model or an image recognition model.

On the other hand, edge deployment brings the model closer to the data source or the client device itself, reducing latency and enabling real-time predictions without relying on a network connection. In this mode, the model runs directly on edge devices such as smartphones, IoT devices, or embedded systems. Edge deployment is particularly valuable in scenarios where low latency, privacy, or intermittent network connectivity is crucial. Examples of edge deployment include deploying a computer vision model on a surveillance camera to detect anomalies in real-time or deploying a speech recognition model on a smartphone for offline voice commands.

## What are APIs?

A common method to deploy a model on the web is to wrap the saved model as an API service and allow users (clients) to send requests. Incoming requests are parsed into the appropriate input format by the service and presented to the model for inference. The resulting prediction is returned to the user as a response. Each request is handled by a specific resource (in our case a model) that is identified by a unique *endpoint*.

Think of an endpoint as the unique URL that is shared with the client to interface with the model; all they can do is send a request to the endpoint (i.e., they have no access to any detail on how the response is generated). Intuitively, it is like a storefront (a unique address) where they come to collect their predictions. They do not worry about *how* the predictions are made. An endpoint separates the user-facing "front end" from the "back end" that hosts the predictive model.

But what exactly is an Application Programming Interface (API)?

APIs prescribe the mechanism through which any two computers can exchange information over a network. Since there are many ways to carry out this exchange, it is prudent to formalize it as a set of rules that both parties agree on. A widely used set of such rules is REST.

![rest-api](assets/rest-api.drawio.png)

REpresentational State Transfer (REST) is programming language agnostic; an API that follows its rules is called REST-ful. For our purposes, the key rules are:

- Clients make requests using standard HTTP methods, most commonly POST, GET, PUT, or DELETE
- These requests can contain an optional payload (usually a [JSON object](https://www.json.org/json-en.html))
- All requests should return a response with a code indicating the status of the response (200s: success, 400s: client-side errors such as an improper request, 500s: server-side errors)

In the context of ML deployment, clients send a POST request with a payload containing the input data needed by the model to make a prediction. For example, to get a price prediction, showrooms attach the features of a diamond as a payload and send it to the unique URL of the endpoint. The server parses this input, presents it to the model, collects the prediction and sends a response (along with a status code) back to the client. In sum, customers *post* an input and the business serves a response.
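The request and response cycle described above can be sketched with the Python standard library alone. This is a minimal illustration and not how SageMaker or Azure ML implement their servers: `predict_price`, the `/predict` path, and the pricing rule are hypothetical stand-ins for a real model and endpoint.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict_price(features):
    """Stand-in for a trained model (hypothetical pricing rule)."""
    return 1000.0 * float(features["carat"])


class PredictionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON payload attached to the POST request
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = {"prediction": predict_price(json.loads(body))}
            status = 200  # success
        except (ValueError, KeyError, TypeError):
            payload = {"error": "malformed request"}
            status = 400  # improper request from the client
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def post_json(url, payload):
    """Send a JSON payload with POST and return (status code, parsed body)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The server is local, so bypass any proxy configured in the environment
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return response.status, json.loads(response.read())


# Port 0 asks the operating system for any free port
server = HTTPServer(("127.0.0.1", 0), PredictionHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
endpoint_url = f"http://127.0.0.1:{server.server_port}/predict"

status, result = post_json(endpoint_url, {"carat": 0.5, "cut": "Ideal"})
print(status, result)

server.shutdown()
server.server_close()
```

The client only sees the URL, the status code, and the response body; everything inside `predict_price` stays hidden behind the endpoint.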

Common Python web frameworks used in production to build REST APIs are [Flask](https://palletsprojects.com/p/flask/) and [FastAPI](https://fastapi.tiangolo.com/). Flask is a popular choice and is used by [SageMaker](https://aws.amazon.com/blogs/machine-learning/part-2-model-hosting-patterns-in-amazon-sagemaker-getting-started-with-deploying-real-time-models-on-sagemaker/) to create a web server for ML models. Owing to its longer existence, Flask enjoys a wider ecosystem than FastAPI. Beyond these general-purpose frameworks, specialized model servers also exist. For example, [TensorFlow Serving](https://www.tensorflow.org/tfx/guide/serving) serves TensorFlow models over REST from a server implemented in C++ and hence can be more performant for deep learning models.

## Models as APIs

<div class="alert alert-block alert-warning">

<b>Business Context (Review)</b> 
    
For this session consider the case of a popular diamond jeweller - Brilliant Earth - with 30 showrooms across the US facing a price prediction problem. A common customer question that echoes in their retail outlets is the impact on price because of changes in some aspects of the ornament. For example, usually customers ask: "If I decreased the carat of the diamonds used in this design, by how much would the price reduce?". Such queries often require an expert intervention on the shopfloor and result in a subdued customer experience. The company also wants to implement a price predictor tool on their website so customers can engage with the brand better. At the moment, no such tool exists and the business team estimates that a price predictor will improve traffic to the website and also improve the time spent on the website.

The dataset used in this session is scraped from the [Brilliant Earth website](https://www.brilliantearth.com/) and hosted on [Open ML](https://www.openml.org/search?type=data&status=active&id=43355).

</div>

An example of a fully fleshed out endpoint for the diamond price prediction problem is [here](https://pgurazada1-diamond-price-predictor.hf.space/). In the rest of this session, we present the details behind building a REST API with the models estimated at their core.

# Setup

**General Imports**

In [1]:
import logging

import pandas as pd

**AWS imports & authentication**

In [2]:
import sagemaker
import boto3

from sagemaker.sklearn.estimator import SKLearn
from sagemaker.sklearn.model import SKLearnModel

from sagemaker.session import production_variant

A `sagemaker` session is a cloud equivalent to a fully functional local development setup (i.e., access enabled to data and compute). We can point a session to a default bucket that will host all the artifacts accessed and created during the session (remember nothing stays local). 

In [3]:
deployment_session = sagemaker.Session(
    default_bucket="sagemaker-deployment-examples"
)

In [4]:
try:
    aws_role = sagemaker.get_execution_role()
except ValueError:
    print("Config file not found on local machine, use SageMaker Studio")

From within SageMaker Studio, the execution role is inherited. Outside the Studio environment, the execution role should be explicitly specified. This execution role should have [AmazonSageMakerFullAccess](https://docs.aws.amazon.com/sagemaker/latest/dg/security-iam-awsmanpol.html) permissions. Local compute [access](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html) should also be [enabled](https://stackoverflow.com/a/47767351).

In [5]:
print(f"AWS execution role associated with the account {aws_role}")

AWS execution role associated with the account arn:aws:iam::321112151583:role/default-sagemaker-access


**Azure imports & authentication**

In [6]:
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

from azure.ai.ml import Input
from azure.ai.ml import command

from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    CodeConfiguration
)

In [7]:
subscription_id = "5bcad9c4-40fb-4136-b614-cc90116dd8b3"
resource_group = "tf"
workspace = "cloud-teach"

In [8]:
logger = logging.getLogger("azure.core.pipeline.policies.http_logging_policy")
logger.setLevel(logging.WARNING)

From VMs within the Azure ML workspace, the default Azure credentials are inherited. Outside the workspace, interactive browser credentials can be used to authenticate an Azure account to the Azure ML workspace.

In [9]:
az_credentials = DefaultAzureCredential(
    exclude_interactive_browser_credential=False
)

In [10]:
ml_client = MLClient(
    az_credentials, subscription_id, resource_group, workspace
)

# Data

## AWS

In [11]:
diamonds_df = pd.read_csv('s3://sagemaker-ap-south-1-321112151583/prices/diamond-prices.csv')

In [12]:
diamonds_df.head()

Unnamed: 0,id,url,shape,price,carat,cut,color,clarity,report,type,date_fetched
0,10086429.0,https://www.brilliantearth.com//loose-diamonds...,Round,400.0,0.3,Very Good,J,SI2,GIA,natural,2020-11-29 12-26 PM
1,10016334.0,https://www.brilliantearth.com//loose-diamonds...,Emerald,400.0,0.31,Ideal,I,SI1,GIA,natural,2020-11-29 12-26 PM
2,9947216.0,https://www.brilliantearth.com//loose-diamonds...,Emerald,400.0,0.3,Ideal,I,VS2,GIA,natural,2020-11-29 12-26 PM
3,10083437.0,https://www.brilliantearth.com//loose-diamonds...,Round,400.0,0.3,Ideal,I,SI2,GIA,natural,2020-11-29 12-26 PM
4,9946136.0,https://www.brilliantearth.com//loose-diamonds...,Emerald,400.0,0.3,Ideal,I,SI1,GIA,natural,2020-11-29 12-26 PM


## Azure

In [13]:
for registered_data in ml_client.data.list():
    print(registered_data.name)

winequality-local
winequality-red
user-likes-media
socialmediaengagement
imdb_reviews
diamond-prices-jan
diamond-prices-feb
wine-quality-indicator
diamond-prices-may


In [14]:
diamond_prices_data = ml_client.data.get(
    name="diamond-prices-jan",
    version=1
)

In [15]:
diamonds_df = pd.read_csv(diamond_prices_data.path)

In [16]:
diamonds_df.head()

Unnamed: 0,id,url,shape,price,carat,cut,color,clarity,report,type,date_fetched
0,10086429,https://www.brilliantearth.com//loose-diamonds...,Round,400,0.3,'Very Good',J,SI2,GIA,natural,'2020-11-29 12-26 PM'
1,10016334,https://www.brilliantearth.com//loose-diamonds...,Emerald,400,0.31,Ideal,I,SI1,GIA,natural,'2020-11-29 12-26 PM'
2,9947216,https://www.brilliantearth.com//loose-diamonds...,Emerald,400,0.3,Ideal,I,VS2,GIA,natural,'2020-11-29 12-26 PM'
3,10083437,https://www.brilliantearth.com//loose-diamonds...,Round,400,0.3,Ideal,I,SI2,GIA,natural,'2020-11-29 12-26 PM'
4,9946136,https://www.brilliantearth.com//loose-diamonds...,Emerald,400,0.3,Ideal,I,SI1,GIA,natural,'2020-11-29 12-26 PM'


# Model Training 

## AWS

We estimate two models for the diamond prices data - a decision tree regressor (`dt.py`) and a gradient boosted regressor (`gb.py`).

The input data is hosted in the default S3 bucket of the `sagemaker` session as an unprocessed csv file. 

In [17]:
sklearn_dt_estimator = SKLearn(
    entry_point="aws/train/dt.py",
    framework_version="1.2-1",
    instance_type="ml.m5.xlarge",
    instance_count=1,
    volume_size=1,
    role=aws_role,
    sagemaker_session=deployment_session
)

In [18]:
sklearn_dt_estimator.fit(
    inputs={
    'train': 's3://sagemaker-ap-south-1-321112151583/prices/'
    },
    wait=False,
    job_name='2023-06-12-estimate-dt-003'
)

Using provided s3_resource


INFO:sagemaker:Creating training-job with name: 2023-06-12-estimate-dt-003


In [19]:
sklearn_dt_estimator.logs()

2023-06-12 07:56:43 Starting - Starting the training job...
2023-06-12 07:56:58 Starting - Preparing the instances for training...
2023-06-12 07:57:46 Downloading - Downloading input data...
2023-06-12 07:58:06 Training - Downloading the training image...
2023-06-12 07:58:36 Training - Training image download completed. Training in progress.
2023-06-12 07:58:42,200 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
2023-06-12 07:58:42,204 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2023-06-12 07:58:42,212 sagemaker_sklearn_container.training INFO     Invoking user training script.
2023-06-12 07:58:42,396 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2023-06-12 07:58:42,408 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2023-06-12 07:58:42,420 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2023-06-12 07:58:4

In [20]:
sklearn_gb_estimator = SKLearn(
    entry_point="aws/train/gb.py",
    framework_version="1.2-1",
    role=aws_role,
    sagemaker_session=deployment_session,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    volume_size=1
)

In [22]:
sklearn_gb_estimator.fit(
    inputs={
    'train': 's3://sagemaker-ap-south-1-321112151583/prices/'
    },
    wait=False,
    job_name='2023-06-12-estimate-gb-003'
)

Using provided s3_resource


INFO:sagemaker:Creating training-job with name: 2023-06-12-estimate-gb-003


In [23]:
sklearn_gb_estimator.logs()

2023-06-12 08:00:08 Starting - Starting the training job...
2023-06-12 08:00:27 Starting - Preparing the instances for training......
2023-06-12 08:01:23 Downloading - Downloading input data..
2023-06-12 08:02:36 Training - Training image download completed. Training in progress.
2023-06-12 08:02:36 Uploading - Uploading generated training model
2023-06-12 08:02:22,619 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
2023-06-12 08:02:22,623 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2023-06-12 08:02:22,631 sagemaker_sklearn_container.training INFO     Invoking user training script.
2023-06-12 08:02:22,820 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2023-06-12 08:02:22,832 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2023-06-12 08:02:22,844 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2023-06-12 08:

There are two key aspects of the training scripts (`dt.py` and `gb.py`) that are new here:

**1. The training workflow is encapsulated within a "main guard"** 

```python
if __name__ == "__main__":
    main()
```

This ensures that the training workflow runs only when the script is executed directly (for example, from the command line). It is a good practice because the training process then does not execute when the script is imported as part of a larger pipeline.
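A minimal skeleton of such a script is sketched below. It is not the contents of `dt.py`; the argument names and the local fallback folders (`data`, `model`) are assumptions. `SM_CHANNEL_TRAIN` and `SM_MODEL_DIR` are the environment variables that SageMaker training containers set for the `train` input channel and the model output folder.

```python
import argparse
import os


def parse_args(argv=None):
    """Read job settings, falling back to SageMaker's environment variables."""
    parser = argparse.ArgumentParser()
    # Folder holding the 'train' channel (local default is an assumption)
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "data"))
    # Folder whose contents are packaged as model.tar.gz after training
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    # Tolerate extra arguments that a platform may append to the command line
    args, _unknown = parser.parse_known_args(argv)
    return args


def main(argv=None):
    args = parse_args(argv)
    # A real script would load the data, fit the pipeline and save it here
    print(f"Would train on '{args.train}' and save the model to '{args.model_dir}'")
    return args


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported
    main()
```

Importing this file from another module makes `parse_args` and `main` available without starting a training run.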

**2. Model pipelines are estimated rather than the models themselves**

```python
preprocessor = make_column_transformer(
        (StandardScaler(), numeric_features),
        (OneHotEncoder(handle_unknown='ignore'), categorical_features)
)

model_dt = DecisionTreeRegressor()

model_pipeline = make_pipeline(preprocessor, model_dt)
```

By estimating a preprocessing pipeline along with the model, we ensure that the data processing is "packaged" along with the model estimation. This is a good practice if the preprocessing involves standard, lightweight steps, since it avoids potentially costly data transfers between two separate steps (preprocessing and model estimation). Extensive preprocessing steps are best handled through a pipeline job. Packaging preprocessing with the model also helps avoid complex pipeline patterns during inference, because the endpoint can accept raw features.
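To see why this packaging helps at inference time, the sketch below fits a pipeline on a few made-up rows, saves it, reloads it, and predicts directly on raw features. The toy data and the file name `model.joblib` are assumptions for illustration and are not taken from the training scripts.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Toy rows, made up for illustration (not from the diamond data set)
train_df = pd.DataFrame({
    "carat": [0.3, 0.5, 0.7, 1.0],
    "cut": ["Ideal", "Very Good", "Ideal", "Good"],
    "price": [400, 900, 1800, 4000],
})

preprocessor = make_column_transformer(
    (StandardScaler(), ["carat"]),
    (OneHotEncoder(handle_unknown="ignore"), ["cut"]),
)
model_pipeline = make_pipeline(preprocessor, DecisionTreeRegressor(random_state=0))
model_pipeline.fit(train_df[["carat", "cut"]], train_df["price"])

# Persist and reload the whole pipeline as a single artifact
with tempfile.TemporaryDirectory() as model_dir:
    path = os.path.join(model_dir, "model.joblib")
    joblib.dump(model_pipeline, path)
    restored = joblib.load(path)

# Raw features go straight in; scaling and encoding happen inside the pipeline.
# The unseen category is tolerated because of handle_unknown="ignore".
new_rows = pd.DataFrame({"carat": [0.5], "cut": ["Super Ideal"]})
predictions = restored.predict(new_rows)
print(predictions)
```

Because the scaler and encoder travel with the model, the serving code does not need to reimplement any preprocessing.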

Output from the training script is persisted to the bucket allocated for the training job within the `output` folder.

In [24]:
sklearn_dt_estimator.model_data

's3://sagemaker-deployment-examples/2023-06-12-estimate-dt-003/output/model.tar.gz'

In [25]:
sklearn_gb_estimator.model_data

's3://sagemaker-deployment-examples/2023-06-12-estimate-gb-003/output/model.tar.gz'

Note that in this stage, we could have extracted the best model through hyperparameter tuning. However, for the purpose of model deployment, we are only concerned with obtaining the final model file that represents the best model for the training data.

## Azure

In [26]:
dt_train_job = command(
    inputs={
        "data": Input(type="uri_file", path="azureml:diamond-prices-jan:1")
    },
    code="azure/train/dt.py",
    command="python dt.py --data ${{inputs.data}}",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    display_name="2023-06-12-decision-tree-regression-example-003",
    experiment_name="2023-06-12-estimate-dt-003"
)

In [27]:
ml_client.create_or_update(dt_train_job)

Class AutoDeleteSettingSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class AutoDeleteConditionSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class BaseAutoDeleteSettingSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class IntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class ProtectionLevelSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class BaseIntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
INFO:azure.identity._internal.interact

Experiment,Name,Type,Status,Details Page
2023-06-12-estimate-dt-003,modest_salt_hvgbf8s4sk,command,Starting,Link to Azure Machine Learning studio


In [28]:
gb_train_job = command(
    inputs={
        "data": Input(type="uri_file", path="azureml:diamond-prices-jan:1")
    },
    code="azure/train/gb.py",
    command="python gb.py --data ${{inputs.data}}",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    display_name="2023-06-12-gradient-boosting-regression-example-003",
    experiment_name="2023-06-12-estimate-gb-003"
)

In [29]:
ml_client.create_or_update(gb_train_job)

Uploading gb.py (< 1 MB): 100%|██████████| 1.79k/1.79k [00:00<00:00, 51.4kB/s]

INFO:azure.identity._internal.interactive:InteractiveBrowserCredential.get_token succeeded
INFO:azure.identity._credentials.default:DefaultAzureCredential acquired a token from InteractiveBrowserCredential


Experiment,Name,Type,Status,Details Page
2023-06-12-estimate-gb-003,elated_seal_7h7br1q8z1,command,Starting,Link to Azure Machine Learning studio


In [39]:
ml_client.jobs.get("modest_salt_hvgbf8s4sk")

INFO:azure.identity._internal.interactive:InteractiveBrowserCredential.get_token succeeded
INFO:azure.identity._credentials.default:DefaultAzureCredential acquired a token from InteractiveBrowserCredential


Experiment,Name,Type,Status,Details Page
2023-06-12-estimate-dt-003,modest_salt_hvgbf8s4sk,command,Completed,Link to Azure Machine Learning studio


In [37]:
ml_client.jobs.get("elated_seal_7h7br1q8z1")

INFO:azure.identity._internal.interactive:InteractiveBrowserCredential.get_token succeeded
INFO:azure.identity._credentials.default:DefaultAzureCredential acquired a token from InteractiveBrowserCredential


Experiment,Name,Type,Status,Details Page
2023-06-12-estimate-gb-003,elated_seal_7h7br1q8z1,command,Completed,Link to Azure Machine Learning studio


There are three key aspects of the training scripts (`dt.py` and `gb.py`) that are new here:

**1. The training workflow is encapsulated within a "main guard"** 

```python
if __name__ == "__main__":
    main()
```

This ensures that the training workflow runs only when the script is executed directly (for example, from the command line). It is a good practice because the training process then does not execute when the script is imported as part of a larger pipeline.

**2. Model pipelines are estimated rather than the models themselves**

```python
preprocessor = make_column_transformer(
        (StandardScaler(), numeric_features),
        (OneHotEncoder(handle_unknown='ignore'), categorical_features)
)

model_dt = DecisionTreeRegressor()

model_pipeline = make_pipeline(preprocessor, model_dt)
```

By estimating a preprocessing pipeline along with the model, we ensure that the data processing is "packaged" along with the model estimation. This is a good practice if the preprocessing involves standard, lightweight steps, since it avoids potentially costly data transfers between two separate steps (preprocessing and model estimation). Extensive preprocessing steps are best handled through a pipeline job. Packaging preprocessing with the model also helps avoid complex pipeline patterns during inference, because the endpoint can accept raw features.

**3. Given the deep integration of `mlflow` within Azure ML, we can log and register models during the estimation process itself**

```python
mlflow.sklearn.log_model(
        sk_model=model_pipeline,
        registered_model_name="gbr-diamond-price-predictor",
        artifact_path="gbr-diamond-price-predictor"
    )
```

The advantage here is that if a model with the registered name exists within the Azure ML workspace, it automatically gets updated with a new version.

# Creating an Endpoint

Since the gradient boosted model has a better R-squared, let us deploy the gradient boosted model as the first version of the diamond price predictor.

## AWS

### Create a `Model` object

In [40]:
sklearn_dt_estimator.model_data, sklearn_gb_estimator.model_data

('s3://sagemaker-deployment-examples/2023-06-12-estimate-dt-003/output/model.tar.gz',
 's3://sagemaker-deployment-examples/2023-06-12-estimate-gb-003/output/model.tar.gz')

In [41]:
model_gb = SKLearnModel(
    model_data=sklearn_gb_estimator.model_data,
    entry_point="aws/infer/inference.py",
    framework_version="1.2-1",
    role=aws_role,
    sagemaker_session=deployment_session
)

In [42]:
model_dt = SKLearnModel(
    model_data=sklearn_dt_estimator.model_data,
    entry_point="aws/infer/inference.py",
    framework_version="1.2-1",
    role=aws_role,
    sagemaker_session=deployment_session
)

### Prepare an inference script

The inference script (`inference.py`) guides the `sagemaker` model server on input handling and generating model predictions. SageMaker defines clear guidelines on the functions within this script that will be invoked when a prediction request is received (the file `inference.py` presents detailed comments that delineate what each function in the script accomplishes).

![aws-inference](assets/aws-inference.drawio.png)
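As a rough guide to what such a script contains, here is a minimal sketch built around the four hooks that the SageMaker scikit-learn model server looks for (`model_fn`, `input_fn`, `predict_fn`, `output_fn`). It is not the `inference.py` used in this session; in particular, the artifact name `model.joblib` and the JSON output format are assumptions.

```python
import io
import json
import os

import joblib
import pandas as pd


def model_fn(model_dir):
    """Load the fitted pipeline once, when the model server starts."""
    # Assumed file name; it must match what the training script saved
    return joblib.load(os.path.join(model_dir, "model.joblib"))


def input_fn(request_body, request_content_type):
    """Parse the request payload into the format the model expects."""
    if request_content_type != "text/csv":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    if isinstance(request_body, bytes):
        request_body = request_body.decode("utf-8")
    # The first line of the payload is expected to hold the column names
    return pd.read_csv(io.StringIO(request_body))


def predict_fn(input_data, model):
    """Run the parsed input through the loaded model."""
    return model.predict(input_data)


def output_fn(prediction, accept):
    """Serialize predictions into the response body."""
    return json.dumps([float(value) for value in prediction])
```

Each incoming request flows through `input_fn`, `predict_fn` and `output_fn` in that order, while `model_fn` runs only at startup.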

### Infrastructure and Execution

Once the server logic is implemented in the inference script, we define the infrastructure we need to host and serve the model. SageMaker handles the resources needed to create the model server and generates an endpoint with the name specified. 

In [43]:
predictor_gb = model_gb.deploy(
    endpoint_name='diamond-price-gb',
    instance_type="ml.m5.xlarge", 
    initial_instance_count=1,
    wait=False
)

INFO:sagemaker:Creating model with name: sagemaker-scikit-learn-2023-06-12-08-08-48-902
INFO:sagemaker:Creating endpoint-config with name diamond-price-gb
INFO:sagemaker:Creating endpoint with name diamond-price-gb


### Testing

In order to test the endpoint created in the previous step, we collect test data as traffic and present it to the endpoint. This helps iron out potential errors before the endpoint is rolled out to customers. Usually, data that the model has never seen before is used to test deployments.

The input type for a prediction request to our model as defined in `inference.py` is `csv`.

In [47]:
sample_df = diamonds_df.sample(2)

In [48]:
sample_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, 15586 to 1706
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   id            2 non-null      int64  
 1   url           2 non-null      object 
 2   shape         2 non-null      object 
 3   price         2 non-null      int64  
 4   carat         2 non-null      float64
 5   cut           2 non-null      object 
 6   color         2 non-null      object 
 7   clarity       2 non-null      object 
 8   report        2 non-null      object 
 9   type          2 non-null      object 
 10  date_fetched  2 non-null      object 
dtypes: float64(1), int64(2), object(8)
memory usage: 192.0+ bytes


In [49]:
numeric_features = ['carat']
categorical_features = ['shape', 'cut', 'color', 'clarity', 'report', 'type']

In [50]:
features = numeric_features + categorical_features

In [51]:
sample_Xtest = sample_df[features]
sample_ytest = sample_df['price']

In [52]:
sample_Xtest

Unnamed: 0,carat,shape,cut,color,clarity,report,type
15586,0.5,Pear,Ideal,J,VS2,GIA,natural
1706,0.3,Round,'Very Good',F,SI1,GIA,natural


Note that at this point the endpoint is in service but is not publicly accessible. However, it can be invoked within the domain using the `sagemaker` runtime. As the code below indicates, we serialize the sample data frame created in the previous step to a `csv` payload and present it to the endpoint.

In [53]:
runtime = boto3.client("sagemaker-runtime")

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials


Let us look at the response from the Gradient Boosted Regressor.

In [54]:
response = runtime.invoke_endpoint(
    EndpointName=predictor_gb.endpoint_name,
    Body=sample_Xtest.to_csv(header=True, index=False).encode("utf-8"),
    ContentType="text/csv"
)

To confirm that the request succeeded, we can check the HTTP status code of the response.

In [55]:
response['ResponseMetadata']['HTTPStatusCode']

200

In [56]:
print(response["Body"].read())

b'[719.0960325112374, 798.6152914735877]'


We can compare this response with the ground truth.

In [57]:
sample_ytest

15586    880
1706     550
Name: price, dtype: int64
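The comparison can be made explicit by parsing the response body and computing the error per row. The sketch below hard-codes the bytes and prices shown above so that it runs without an endpoint; in the notebook the bytes would come from a single call to `response["Body"].read()`, since the body is a stream that can be read only once.

```python
import json

# Bytes as printed above from response["Body"].read()
raw_body = b"[719.0960325112374, 798.6152914735877]"
# Ground truth from sample_ytest, keyed by row index
actual_prices = {15586: 880, 1706: 550}

predictions = json.loads(raw_body)

for (row_id, actual), predicted in zip(actual_prices.items(), predictions):
    error = predicted - actual
    print(f"row {row_id}: predicted {predicted:.2f}, actual {actual}, error {error:+.2f}")

# Average size of the miss across the sampled rows
mean_absolute_error = sum(
    abs(predicted - actual)
    for predicted, actual in zip(predictions, actual_prices.values())
) / len(predictions)
print(f"mean absolute error: {mean_absolute_error:.2f}")
```

Two rows are far too few to judge the model; the point here is only to check that the endpoint returns one sensible prediction per input row.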

### Cleanup

At this point we have a model that can receive external traffic. However, there are further steps to go before a full rollout happens. To avoid costs incurred on idle endpoints during the testing phase, it is a good practice to delete endpoints. Production endpoints should ideally be created and maintained by a separate team (even if they are using the same code).

In [58]:
predictor_gb.delete_endpoint(delete_endpoint_config=True)

INFO:sagemaker:Deleting endpoint configuration with name: diamond-price-gb
INFO:sagemaker:Deleting endpoint with name: diamond-price-gb


## Azure

### Create endpoint

In [59]:
online_endpoint_name = "diamond-price-predictor-001"

In [60]:
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="Model to predict diamond prices",
    auth_mode="aml_token"
)

By creating a `ManagedOnlineEndpoint` we let Azure handle all the resource creation and management.

In [61]:
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

INFO:azure.identity._internal.interactive:InteractiveBrowserCredential.get_token succeeded
INFO:azure.identity._credentials.default:DefaultAzureCredential acquired a token from InteractiveBrowserCredential


ManagedOnlineEndpoint({'public_network_access': 'Enabled', 'provisioning_state': 'Succeeded', 'scoring_uri': 'https://diamond-price-predictor-001.centralindia.inference.ml.azure.com/score', 'openapi_uri': 'https://diamond-price-predictor-001.centralindia.inference.ml.azure.com/swagger.json', 'name': 'diamond-price-predictor-001', 'description': 'Model to predict diamond prices', 'tags': {}, 'properties': {'azureml.onlineendpointid': '/subscriptions/5bcad9c4-40fb-4136-b614-cc90116dd8b3/resourcegroups/tf/providers/microsoft.machinelearningservices/workspaces/cloud-teach/onlineendpoints/diamond-price-predictor-001', 'AzureAsyncOperationUri': 'https://management.azure.com/subscriptions/5bcad9c4-40fb-4136-b614-cc90116dd8b3/providers/Microsoft.MachineLearningServices/locations/centralindia/mfeOperationsStatus/oe:3150d3d4-458f-4007-bee4-187ff3e17685:7b9b00a2-38d4-48bf-96c5-7a06397d3a20?api-version=2022-02-01-preview'}, 'print_as_yaml': True, 'id': '/subscriptions/5bcad9c4-40fb-4136-b614-cc901

### Collect registered model

In [62]:
registered_model_gb = ml_client.models.get(
    name="gbr-diamond-price-predictor-june", 
    version=1
)

In [63]:
registered_model_gb.version

'1'

### Prepare a scoring script

Scoring scripts guide the Azure ML model server on input handling and generating model predictions. Azure ML defines clear guidelines on the functions within this script that will be invoked when a prediction request is received (the file `score.py` presents detailed comments that delineate what each function in the script accomplishes).

![azure-score](assets/azure-score.drawio.png)
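For orientation, a scoring script is built around two functions that Azure ML calls: `init()` once at startup and `run(raw_data)` for every request. The sketch below is not the `score.py` used in this session. The model file name `model.pkl`, the `score` helper, and the assumption that requests arrive as JSON in pandas' `split` orientation are all illustrative choices.

```python
import json
import os

import joblib
import pandas as pd

model = None  # populated by init()


def init():
    """Called once when the deployment starts: load the registered model."""
    global model
    # AZUREML_MODEL_DIR points at the folder holding the registered model files.
    # The file name below is an assumption; it depends on how the model was saved.
    model = joblib.load(os.path.join(os.environ["AZUREML_MODEL_DIR"], "model.pkl"))


def score(fitted_model, raw_data):
    """Turn a 'split'-oriented JSON payload into a data frame and predict."""
    payload = json.loads(raw_data)
    frame = pd.DataFrame(
        payload["data"],
        columns=payload["columns"],
        index=payload.get("index"),
    )
    return [float(value) for value in fitted_model.predict(frame)]


def run(raw_data):
    """Called for every request with the request body."""
    return score(model, raw_data)
```

Keeping the parsing logic in a separate helper makes it easy to test locally with a stand-in model before creating a deployment.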

### Infrastructure & Execution

The base model that we will deploy is referred to as the "blue" model by convention. After creation, this endpoint is intended to serve 100% of the traffic with the variant tagged as the blue version (the gradient boosted model in this case).

Once the server logic is implemented in the scoring script, we define the infrastructure we need to host and serve the model. Azure ML handles the resources needed to create the model server and attaches it to the endpoint with the name specified (note that the managed endpoint was created in the first step).

In [64]:
blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=online_endpoint_name,
    model=registered_model_gb,
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    code_configuration=CodeConfiguration(
        code='./azure/infer',
        scoring_script='score.py'
    ),
    instance_type="Standard_DS1_v2",
    instance_count=1
)

In [65]:
ml_client.online_deployments.begin_create_or_update(blue_deployment).result()

Instance type Standard_DS1_v2 may be too small for compute resources. Minimum recommended compute SKU is Standard_DS3_v2 for general purpose endpoints. Learn more about SKUs here: https://learn.microsoft.com/en-us/azure/machine-learning/referencemanaged-online-endpoints-vm-sku-list
Check: endpoint diamond-price-predictor-001 exists
Uploading infer (0.0 MBs): 100%|██████████| 3159/3159 [00:00<00:00, 23238.26it/s]

INFO:azure.identity._internal.interactive:InteractiveBrowserCredential.get_token succeeded
INFO:azure.identity._credentials.default:DefaultAzureCredential acquired a token from InteractiveBrowserCredential


...........................................................................

ManagedOnlineDeployment({'private_network_connection': None, 'provisioning_state': 'Succeeded', 'endpoint_name': 'diamond-price-predictor-001', 'type': 'Managed', 'name': 'blue', 'description': None, 'tags': {}, 'properties': {'AzureAsyncOperationUri': 'https://management.azure.com/subscriptions/5bcad9c4-40fb-4136-b614-cc90116dd8b3/providers/Microsoft.MachineLearningServices/locations/centralindia/mfeOperationsStatus/od:3150d3d4-458f-4007-bee4-187ff3e17685:3581e241-04f8-41dc-a63c-553b4c834be8?api-version=2023-04-01-preview'}, 'print_as_yaml': True, 'id': '/subscriptions/5bcad9c4-40fb-4136-b614-cc90116dd8b3/resourceGroups/tf/providers/Microsoft.MachineLearningServices/workspaces/cloud-teach/onlineEndpoints/diamond-price-predictor-001/deployments/blue', 'Resource__source_path': None, 'base_path': '/home/pavankumargurazada/Desktop/GL/DSC/Week-5_ Model Serving/v2', 'creation_context': None, 'serialize': <msrest.serialization.Serializer object at 0x7ff293735d30>, 'model': '/subscriptions/5b

### Testing

In [66]:
(diamonds_df.drop(columns='price')
            .sample(100)
            .to_json('sample-data.json', orient='split', lines=False))

In [67]:
print(
    ml_client.online_endpoints.invoke(
        endpoint_name=online_endpoint_name,
        deployment_name="blue",
        request_file="sample-data.json"
    )
)

[152.9277669073601, 798.6152914735893, 4022.365207032341, 1439.691008250427, 1197.4015941284324, 1108.3858642166927, 1577.7379552616715, 3171.3579188826207, 4999.8662936063965, 2307.5045354318954, 2948.9061456475197, 2333.2150066103704, 4096.100902295317, 1889.494974623238, 1500.2930144413015, 1102.7523382470047, 1088.932965459935, 742.7988689915061, 2096.129780038896, 1019.0269711759134, 796.54852758088, 1054.7336263488771, 1038.3866050911945, 1223.8797379320433, 2780.2852277075385, 2999.0049581332178, 1199.2215664314206, 5194.840443474771, -61.119358780374895, 2493.298258199252, 1862.231771198827, 2346.728378275172, 1387.892208169646, 2402.153791313322, 4349.2777062581845, 1185.6498914772485, -132.04901106903355, 2282.9415083406657, 2918.3942297423755, 6629.414736625384, 3116.1294663900035, 1351.401920308371, 2531.62638093825, 5895.899333682611, 1193.1810363865072, 187.52219709164356, 561.986474339491, 1955.6022694136961, 775.9229815184757, 6974.977163181508, 5173.376336537412, 1152.

# Canary Deployment

An important scenario in model deployment is upgrading an existing baseline model to a newer version. A canary deployment is a recommended way to make this transition carefully: a controlled portion of the live traffic is directed to the upgraded model, and A/B testing on that traffic determines whether the upgraded version performs better than the baseline on live data.
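As a rough illustration of the A/B comparison step, the sketch below runs a one-sided permutation test on the mean absolute error of two variants. It assumes that the true prices for the served requests become available later, so that absolute errors can be computed per variant. The error values, the function name `permutation_test_mae` and the number of permutations are all hypothetical choices made for this example, not part of the deployment code in this notebook.

```python
import random
from statistics import mean


def permutation_test_mae(errors_a, errors_b, n_permutations=2000, seed=0):
    """Compare mean absolute error of two variants with a permutation test.

    errors_a: absolute errors of the baseline variant
    errors_b: absolute errors of the candidate variant
    Returns (observed_difference, p_value), where the difference is
    mean(errors_a) - mean(errors_b); positive values favour the candidate.
    """
    rng = random.Random(seed)
    observed = mean(errors_a) - mean(errors_b)
    pooled = list(errors_a) + list(errors_b)
    n_a = len(errors_a)
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Randomly reassign errors to the two groups.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_large += 1
    # Add-one smoothing keeps the estimate away from exactly zero.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical absolute errors (in price units) collected per variant.
baseline_errors = [420.0, 380.0, 510.0, 460.0, 395.0, 440.0, 505.0, 470.0]
candidate_errors = [210.0, 250.0, 190.0, 305.0, 220.0, 260.0, 240.0, 280.0]

difference, p_value = permutation_test_mae(baseline_errors, candidate_errors)
print(f"MAE reduction: {difference:.1f}, one-sided p-value: {p_value:.4f}")
```

A small p-value suggests the lower error of the candidate is unlikely to be due to chance alone. In practice far more observations than shown here would be needed, especially when the candidate receives only a small share of the traffic.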

The canary deployment process starts by diverting a small percentage of live traffic, typically between 1% and 5%, to the upgraded version. This share is then increased gradually as long as no errors appear. This approach allows for incremental testing and monitoring of the new model's performance in a real-world environment.
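The gating logic described above can be sketched in a few lines. This is a minimal illustration, not a prescribed policy: the stage schedule, the 1% error threshold and the function name `next_canary_weight` are assumptions chosen for the example.

```python
def next_canary_weight(current_weight, error_rate, max_error_rate=0.01,
                       schedule=(0.01, 0.05, 0.25, 0.5, 1.0)):
    """Return the traffic share the new variant should receive next.

    If the observed error rate exceeds the threshold, roll back to 0.0.
    Otherwise advance to the next stage in the schedule, or stay at the
    final stage once it has been reached.
    """
    if error_rate > max_error_rate:
        return 0.0
    for stage in schedule:
        if stage > current_weight:
            return stage
    return schedule[-1]


# Walk through three healthy stages followed by a rollback.
weight = 0.0
history = []
for observed_error_rate in (0.0, 0.002, 0.004, 0.03):
    weight = next_canary_weight(weight, observed_error_rate)
    history.append(weight)

print(history)
```

The last observation (3% errors) exceeds the threshold, so the new variant's share drops back to zero instead of advancing.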

Let's take a closer look at how canary deployment works in action. We begin by creating two model variants, each representing one of the two models we estimated on the data.

## AWS

### Create variants

To create variants from the model binaries, we reference the container configuration used by the `SKLearnModel` objects. Containerization is a popular method to package the model and server, along with all the runtime requirements, into a standalone resource. This approach ensures that the server can be deployed easily on any virtual machine without the need for manual duplication of the configuration options required to run the server.

There are popular containerization tools available that allow us to quickly package all the runtime requirements into a reusable container. Two commonly used tools are [Docker](https://www.docker.com/) and [Podman](https://podman.io/). These tools simplify the process of creating containers, making it easier to manage and deploy the model and server components as a single unit.

In [68]:
model_dt.prepare_container_def()

INFO:sagemaker.image_uris:Defaulting to only supported image scope: cpu.


{'Image': '720646828776.dkr.ecr.ap-south-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3',
 'Environment': {'SAGEMAKER_PROGRAM': 'inference.py',
  'SAGEMAKER_SUBMIT_DIRECTORY': 's3://sagemaker-deployment-examples/sagemaker-scikit-learn-2023-06-12-08-26-20-294/sourcedir.tar.gz',
  'SAGEMAKER_CONTAINER_LOG_LEVEL': '20',
  'SAGEMAKER_REGION': 'ap-south-1'},
 'ModelDataUrl': 's3://sagemaker-deployment-examples/2023-06-12-estimate-dt-003/output/model.tar.gz'}

In [69]:
model_gb.prepare_container_def()

INFO:sagemaker.image_uris:Defaulting to only supported image scope: cpu.


{'Image': '720646828776.dkr.ecr.ap-south-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3',
 'Environment': {'SAGEMAKER_PROGRAM': 'inference.py',
  'SAGEMAKER_SUBMIT_DIRECTORY': 's3://sagemaker-deployment-examples/sagemaker-scikit-learn-2023-06-12-08-08-48-902/sourcedir.tar.gz',
  'SAGEMAKER_CONTAINER_LOG_LEVEL': '20',
  'SAGEMAKER_REGION': 'ap-south-1'},
 'ModelDataUrl': 's3://sagemaker-deployment-examples/2023-06-12-estimate-gb-003/output/model.tar.gz'}

As the above output indicates, both model objects reference the image `720646828776.dkr.ecr.ap-south-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3`, which is managed by AWS. Running this image as a container gives us an environment where Python 3, scikit-learn 1.2.1 and its dependencies (e.g., numpy and scipy) are preinstalled. When the container starts, we get a Python runtime that executes the script `inference.py`, with all its requirements (i.e., packages and model data) copied into that runtime.
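To make the anatomy of the image reference explicit, the sketch below splits an ECR image URI into its account, region, repository and tag. The helper name `parse_ecr_image` and the regular expression are written for this example only; they assume the usual `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>` layout.

```python
import re

# Assumed layout: <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
ECR_IMAGE_PATTERN = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:]+):(?P<tag>.+)$"
)


def parse_ecr_image(image_uri):
    """Split an ECR image URI into account, region, repository and tag."""
    match = ECR_IMAGE_PATTERN.match(image_uri)
    if match is None:
        raise ValueError(f"Not a recognised ECR image URI: {image_uri}")
    return match.groupdict()


image = parse_ecr_image(
    "720646828776.dkr.ecr.ap-south-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3"
)
print(image)
```

The tag carries the framework version, the processor type and the Python version, which is why both models resolve to the same image.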

Now that we have all the information on the infrastructure the model needs to serve predictions, we can register the two model binaries against a common endpoint as variants, using the corresponding container configurations.

We begin by registering the models and their container environments.

In [70]:
deployment_session.create_model(
    name='decision-tree-regressor',
    role=aws_role,
    container_defs=model_dt.prepare_container_def()
)

INFO:sagemaker.image_uris:Defaulting to only supported image scope: cpu.
INFO:sagemaker:Creating model with name: decision-tree-regressor


'decision-tree-regressor'

In [71]:
deployment_session.create_model(
    name='gradient-boosted-regressor',
    role=aws_role,
    container_defs=model_gb.prepare_container_def()
)

INFO:sagemaker.image_uris:Defaulting to only supported image scope: cpu.
INFO:sagemaker:Creating model with name: gradient-boosted-regressor


'gradient-boosted-regressor'

Now, we create two variants by referencing these two registered models.

In [72]:
variant1 = production_variant(
    model_name='decision-tree-regressor',
    instance_type="ml.m5.xlarge",
    initial_instance_count=1,
    variant_name="Variant1",
    initial_weight=0.99,  # baseline receives 99% of the traffic
    volume_size=1
)

In [73]:
variant2 = production_variant(
    model_name='gradient-boosted-regressor',
    instance_type="ml.m5.xlarge",
    initial_instance_count=1,
    variant_name="Variant2",
    initial_weight=0.01,  # upgraded model receives 1% of the traffic
    volume_size=1
)

In [74]:
(variant1, variant2)

({'ModelName': 'decision-tree-regressor',
  'VariantName': 'Variant1',
  'InitialVariantWeight': 0.99,
  'InitialInstanceCount': 1,
  'InstanceType': 'ml.m5.xlarge',
  'VolumeSizeInGB': 1},
 {'ModelName': 'gradient-boosted-regressor',
  'VariantName': 'Variant2',
  'InitialVariantWeight': 0.01,
  'InitialInstanceCount': 1,
  'InstanceType': 'ml.m5.xlarge',
  'VolumeSizeInGB': 1})

As the output above shows, the two variants are initially configured to receive 99% (decision tree regressor) and 1% (gradient boosted regressor) of the traffic, respectively.
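Variant weights are relative: each variant's share of traffic is its weight divided by the sum of all weights on the endpoint. The short sketch below makes that arithmetic explicit for variant dictionaries shaped like the output above. The helper name `traffic_shares` is an assumption made for this example.

```python
def traffic_shares(variants):
    """Map each variant name to its share of traffic.

    A variant's share is its weight divided by the sum of all weights.
    """
    total = sum(v["InitialVariantWeight"] for v in variants)
    if total <= 0:
        raise ValueError("At least one variant needs a positive weight")
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


variants = [
    {"VariantName": "Variant1", "InitialVariantWeight": 0.99},
    {"VariantName": "Variant2", "InitialVariantWeight": 0.01},
]
shares = traffic_shares(variants)
print(shares)
```

Because only the ratio matters, weights of 99 and 1 would produce the same split as 0.99 and 0.01.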

### Deploy variants

We can now deploy both variants behind the same endpoint, so that SageMaker routes incoming traffic to them in a 99:1 ratio.

In [75]:
canary_endpoint_name = "diamond-price-pred-2023-06-12"
print(f"EndpointName = {canary_endpoint_name}")

EndpointName = diamond-price-pred-2023-06-12


In [76]:
deployment_session.endpoint_from_production_variants(
    name=canary_endpoint_name, 
    production_variants=[variant1, variant2]
)

INFO:sagemaker:Creating endpoint-config with name diamond-price-pred-2023-06-12
INFO:sagemaker:Creating endpoint with name diamond-price-pred-2023-06-12


----!

'diamond-price-pred-2023-06-12'

We can verify the specification of the canary endpoint from the UI to ensure that the traffic flow is correctly configured.

### Test deployment

In [78]:
for invocation_num in range(100):
    
    sample_df = diamonds_df.sample(1)
    sample_Xtest = sample_df[features]
    
    response = runtime.invoke_endpoint(
        EndpointName=canary_endpoint_name,
        Body=sample_Xtest.to_csv(header=True, index=False).encode("utf-8"),
        ContentType="text/csv"
    )

We can check the traffic allocation patterns by looking at the invocation traffic to the endpoint on CloudWatch (expect a slight lag for data to land).
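The same check can be done programmatically. The sketch below builds the arguments for a CloudWatch `get_metric_statistics` call on the `Invocations` metric in the `AWS/SageMaker` namespace, filtered by endpoint and variant. The helper names, the 30-minute window and the one-minute period are assumptions made for this example, and the actual call is left commented out so the sketch runs without AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def invocation_query(endpoint_name, variant_name, minutes=30, now=None):
    """Build keyword arguments for CloudWatch `get_metric_statistics`."""
    end_time = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/SageMaker",
        "MetricName": "Invocations",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "StartTime": end_time - timedelta(minutes=minutes),
        "EndTime": end_time,
        "Period": 60,  # one datapoint per minute
        "Statistics": ["Sum"],
    }


def total_invocations(response):
    """Add up the per-period sums in a `get_metric_statistics` response."""
    return sum(point["Sum"] for point in response.get("Datapoints", []))


query = invocation_query("diamond-price-pred-2023-06-12", "Variant2")

# With AWS credentials configured, the query could be run like this:
# import boto3
# cloudwatch = boto3.client("cloudwatch")
# print(total_invocations(cloudwatch.get_metric_statistics(**query)))
```

With 100 invocations and a 1% weight, only about one request is expected to reach `Variant2`, so a count of zero for that variant is not surprising at this stage.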

### Safe rollout

Once the updated variant has been tested, we can gradually increase the weight assigned to it, eventually shifting all the traffic over to the new variant.

In [84]:
sagemaker_client = boto3.Session().client('sagemaker')

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials


In [92]:
sagemaker_client.update_endpoint_weights_and_capacities(
    EndpointName=canary_endpoint_name,
    DesiredWeightsAndCapacities=[
        {'VariantName': 'Variant1', 'DesiredWeight': 0.8},
        {'VariantName': 'Variant2', 'DesiredWeight': 0.2}
    ]
)

{'EndpointArn': 'arn:aws:sagemaker:ap-south-1:321112151583:endpoint/diamond-price-pred-2023-06-12',
 'ResponseMetadata': {'RequestId': '370aae92-806a-4d61-9851-2d887ebad59b',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '370aae92-806a-4d61-9851-2d887ebad59b',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '98',
   'date': 'Mon, 12 Jun 2023 08:55:44 GMT'},
  'RetryAttempts': 0}}
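A full rollout repeats the call above with progressively larger weights for the new variant. The sketch below generates the `DesiredWeightsAndCapacities` payload for each stage. The stage values and the helper name `weight_updates` are assumptions for illustration, and the loop that would apply them is left as comments because it needs a live endpoint.

```python
def weight_updates(old_variant, new_variant, new_shares):
    """Yield one `DesiredWeightsAndCapacities` list per rollout stage."""
    for share in new_shares:
        if not 0.0 <= share <= 1.0:
            raise ValueError(f"Share must be between 0 and 1, got {share}")
        yield [
            {"VariantName": old_variant, "DesiredWeight": round(1.0 - share, 4)},
            {"VariantName": new_variant, "DesiredWeight": share},
        ]


stages = list(weight_updates("Variant1", "Variant2", [0.2, 0.5, 1.0]))
for desired in stages:
    print(desired)

# Applying the stages would look roughly like this (not executed here):
# for desired in stages:
#     sagemaker_client.update_endpoint_weights_and_capacities(
#         EndpointName=canary_endpoint_name,
#         DesiredWeightsAndCapacities=desired,
#     )
#     sagemaker_client.get_waiter("endpoint_in_service").wait(
#         EndpointName=canary_endpoint_name
#     )
#     # Check error rates and metrics before moving to the next stage.
```

Waiting for the endpoint to return to service between stages leaves room to inspect metrics and stop the rollout if the new variant misbehaves.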

## Azure

### Create variants

In [96]:
registered_model_dt = ml_client.models.get(
    name="dt-diamond-price-predictor-june", 
    version=1
)

In [97]:
registered_model_dt.version

'1'

### Green deployment

In [98]:
green_deployment = ManagedOnlineDeployment(
    name="green",
    endpoint_name=online_endpoint_name,
    model=registered_model_dt,
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    code_configuration=CodeConfiguration(
        code='./azure/infer',
        scoring_script='score.py'
    ),
    instance_type="Standard_DS1_v2",
    instance_count=1
)

In [99]:
ml_client.online_deployments.begin_create_or_update(green_deployment).result()

Instance type Standard_DS1_v2 may be too small for compute resources. Minimum recommended compute SKU is Standard_DS3_v2 for general purpose endpoints. Learn more about SKUs here: https://learn.microsoft.com/en-us/azure/machine-learning/referencemanaged-online-endpoints-vm-sku-list
Check: endpoint diamond-price-predictor-001 exists
Uploading infer (0.0 MBs): 100%|██████████| 1578/1578 [00:00<00:00, 47906.80it/s]



.................

INFO:azure.identity._internal.interactive:InteractiveBrowserCredential.get_token succeeded
INFO:azure.identity._credentials.default:DefaultAzureCredential acquired a token from InteractiveBrowserCredential


.

INFO:azure.identity._internal.interactive:InteractiveBrowserCredential.get_token succeeded
INFO:azure.identity._credentials.default:DefaultAzureCredential acquired a token from InteractiveBrowserCredential


...........................................................

ManagedOnlineDeployment({'private_network_connection': None, 'provisioning_state': 'Succeeded', 'endpoint_name': 'diamond-price-predictor-001', 'type': 'Managed', 'name': 'green', 'description': None, 'tags': {}, 'properties': {'AzureAsyncOperationUri': 'https://management.azure.com/subscriptions/5bcad9c4-40fb-4136-b614-cc90116dd8b3/providers/Microsoft.MachineLearningServices/locations/centralindia/mfeOperationsStatus/od:3150d3d4-458f-4007-bee4-187ff3e17685:e55655b8-9aff-44cd-8e6f-99ce180cb55d?api-version=2023-04-01-preview'}, 'print_as_yaml': True, 'id': '/subscriptions/5bcad9c4-40fb-4136-b614-cc90116dd8b3/resourceGroups/tf/providers/Microsoft.MachineLearningServices/workspaces/cloud-teach/onlineEndpoints/diamond-price-predictor-001/deployments/green', 'Resource__source_path': None, 'base_path': '/home/pavankumargurazada/Desktop/GL/DSC/Week-5_ Model Serving/v2', 'creation_context': None, 'serialize': <msrest.serialization.Serializer object at 0x7ff291388ac0>, 'model': '/subscriptions/

### Testing

At this stage, the endpoint has a "green" deployment that we can invoke directly by name, but that deployment is not yet receiving any live traffic.
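Outside the SDK, a specific deployment can be targeted over plain HTTP by adding the `azureml-model-deployment` header to the scoring request. The sketch below only builds such a request and does not send it. It assumes key-based authentication, and the API key and the payload contents are placeholders invented for this example; the real payload must match what `score.py` expects.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, api_key, payload, deployment=None):
    """Build (but do not send) a POST request for a managed online endpoint."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    if deployment is not None:
        # Route this request to a named deployment regardless of the
        # endpoint's traffic split.
        headers["azureml-model-deployment"] = deployment
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        scoring_uri, data=body, headers=headers, method="POST"
    )


request = build_scoring_request(
    "https://diamond-price-predictor-001.centralindia.inference.ml.azure.com/score",
    api_key="<endpoint-key>",  # placeholder, not a real key
    payload={"columns": ["carat"], "data": [[0.5]]},  # hypothetical payload
    deployment="green",
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

Leaving out the `deployment` argument produces a request that is routed according to the endpoint's traffic split instead.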

In [101]:
ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name="green",
    request_file="sample-data.json"
)

'[370.0, 646.2162162162163, 5003.076923076923, 1347.0588235294117, 991.1538461538462, 889.8809523809524, 1020.0, 3842.9411764705883, 4380.0, 2238.5185185185187, 2780.0, 2203.3333333333335, 3572.5, 1824.0, 1050.0, 843.6036036036036, 913.5416666666666, 923.3333333333334, 2239.6428571428573, 869.3069306930693, 763.3333333333334, 1150.0, 822.8571428571429, 1004.0, 3352.5, 3329.4594594594596, 750.0, 4380.0, 370.0, 2282.5, 1556.6666666666667, 2050.0, 929.0, 3066.6666666666665, 6880.0, 967.5, 488.8888888888889, 2157.5, 2500.0, 5736.666666666667, 2315.0, 1270.1980198019803, 2328.75, 7812.727272727273, 921.9469026548672, 390.0, 580.0, 1932.820512820513, 690.0, 7265.0, 2980.0, 1251.7391304347825, 2420.0, 2482.5, 3942.5, 5273.333333333333, 1776.6666666666667, 4100.0, 899.5652173913044, 2300.0, 16630.0, 2315.0, 1297.361111111111, 1457.142857142857, 1876.6666666666667, 5420.0, 2144.84375, 2967.5, 430.0, 6000.0, 690.0, 828.5227272727273, 660.0, 2191.3333333333335, 2260.0, 1795.7142857142858, 1570.0,

### Safe rollout

Once the green deployment has been tested, we can allocate a share of the live traffic to it and increase that share gradually, until green eventually receives all the traffic.

In [102]:
endpoint.traffic = {"blue": 99, "green": 1}

In [103]:
ml_client.begin_create_or_update(endpoint).result()

ManagedOnlineEndpoint({'public_network_access': 'Enabled', 'provisioning_state': 'Succeeded', 'scoring_uri': 'https://diamond-price-predictor-001.centralindia.inference.ml.azure.com/score', 'openapi_uri': 'https://diamond-price-predictor-001.centralindia.inference.ml.azure.com/swagger.json', 'name': 'diamond-price-predictor-001', 'description': 'Model to predict diamond prices', 'tags': {}, 'properties': {'azureml.onlineendpointid': '/subscriptions/5bcad9c4-40fb-4136-b614-cc90116dd8b3/resourcegroups/tf/providers/microsoft.machinelearningservices/workspaces/cloud-teach/onlineendpoints/diamond-price-predictor-001', 'AzureAsyncOperationUri': 'https://management.azure.com/subscriptions/5bcad9c4-40fb-4136-b614-cc90116dd8b3/providers/Microsoft.MachineLearningServices/locations/centralindia/mfeOperationsStatus/oe:3150d3d4-458f-4007-bee4-187ff3e17685:cf18496f-4d33-406d-a0a8-d998a7fe8a45?api-version=2022-02-01-preview'}, 'print_as_yaml': True, 'id': '/subscriptions/5bcad9c4-40fb-4136-b614-cc901

In [104]:
for i in range(20):
    ml_client.online_endpoints.invoke(
        endpoint_name=online_endpoint_name,
        request_file="sample-data.json"
    )
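The traffic dictionary for each stage of the rollout can be generated rather than typed by hand. The sketch below assumes that the percentages are integers that must add up to 100. The stage values and the helper name `traffic_split` are illustrative choices.

```python
def traffic_split(green_percent, blue="blue", green="green"):
    """Return a traffic dictionary whose percentages add up to 100."""
    if not isinstance(green_percent, int) or not 0 <= green_percent <= 100:
        raise ValueError("green_percent must be an integer between 0 and 100")
    return {blue: 100 - green_percent, green: green_percent}


rollout = [traffic_split(percent) for percent in (1, 10, 50, 100)]
for traffic in rollout:
    print(traffic)

# Each stage could then be applied with the calls used above (not executed here):
# for traffic in rollout:
#     endpoint.traffic = traffic
#     ml_client.begin_create_or_update(endpoint).result()
```

At a 1% share, 20 invocations reach the green deployment only 0.2 times on average, so many more requests are needed before green shows meaningful traffic.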

# Cleanup

## AWS

To avoid costs incurred on idle endpoints during the testing phase, it is good practice to delete endpoints once testing is complete. Production endpoints should ideally be created and maintained by a separate team (even if they are using the same code). Data Science teams should not have edit access to production endpoints.

In [93]:
deployment_session.delete_endpoint(canary_endpoint_name)

INFO:sagemaker:Deleting endpoint with name: diamond-price-pred-2023-06-12


In [94]:
deployment_session.delete_endpoint_config(canary_endpoint_name)

INFO:sagemaker:Deleting endpoint configuration with name: diamond-price-pred-2023-06-12


In [95]:
for model_name in ['decision-tree-regressor', 'gradient-boosted-regressor']:
    deployment_session.delete_model(model_name)

INFO:sagemaker:Deleting model with name: decision-tree-regressor
INFO:sagemaker:Deleting model with name: gradient-boosted-regressor
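To spot endpoints that were left running, the entries returned by SageMaker's `list_endpoints` call can be filtered by age. The sketch below works on a list shaped like the `Endpoints` field of that response. The helper name `stale_endpoint_names`, the two-hour cut-off, the timestamps and the second endpoint name are hypothetical values used to exercise the helper offline.

```python
from datetime import datetime, timedelta, timezone


def stale_endpoint_names(endpoints, max_age, now=None):
    """Return names of endpoints created more than `max_age` ago.

    `endpoints` is the list found under the "Endpoints" key of a
    SageMaker `list_endpoints` response.
    """
    now = now or datetime.now(timezone.utc)
    return [
        entry["EndpointName"]
        for entry in endpoints
        if now - entry["CreationTime"] > max_age
    ]


# Hypothetical response entries used to exercise the helper offline.
now = datetime(2023, 6, 12, 12, 0, tzinfo=timezone.utc)
endpoints = [
    {"EndpointName": "diamond-price-pred-2023-06-12",
     "CreationTime": datetime(2023, 6, 12, 8, 50, tzinfo=timezone.utc)},
    {"EndpointName": "scratch-endpoint",  # hypothetical name
     "CreationTime": datetime(2023, 6, 12, 11, 30, tzinfo=timezone.utc)},
]
stale = stale_endpoint_names(endpoints, timedelta(hours=2), now=now)
print(stale)

# With credentials configured, the live list would come from:
# boto3.client("sagemaker").list_endpoints()["Endpoints"]
```

The live response is paginated, so an account with many endpoints would need to follow the `NextToken` field as well.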


## Azure

In [105]:
ml_client.online_endpoints.begin_delete(name=online_endpoint_name)

<azure.core.polling._poller.LROPoller at 0x7ff293d910a0>

..............................................................................