# MLOps with CDK
This notebook walks through the steps needed to demonstrate MLOps with <a href='https://docs.aws.amazon.com/cdk/latest/guide/home.html'>CDK</a>. We use a simple use case, predicting house prices, to create a Machine Learning model that is registered in the SageMaker Model Registry and manually approved by the user. The approval step kicks off the automated CDK-powered pipeline that deploys this model to Production. 

#### Index
[Data](#data)  
[Training](#training)  
[Export Artefacts](#export)  
[Build Container](#container)  
[Register Model](#register)   
[Approve Model](#approve)  

## Variables

In [None]:
import boto3 

# Application prefix: used as the Model Package Group name and as the S3 bucket name prefix
APP_PREFIX = 'cdk-blog'
region = 'eu-west-1'
account = boto3.client('sts').get_caller_identity().get('Account')

## Data <a id='data'></a>

For the purpose of this notebook we use the <a href='https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html'>Boston</a> dataset from scikit-learn, consisting of 506 samples of houses, where the goal is to predict the house price (a regression task) from 13 other features. Note that `load_boston` was removed in scikit-learn 1.2, so this notebook requires an earlier version.

In [None]:
from sklearn.datasets import load_boston

# Load the Boston housing dataset
boston = load_boston()

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Create the feature array and the target array
X = boston.data
y = boston.target

# Perform standard train-test split with a 20% test size
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
sc = StandardScaler()

X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

## Training <a id='training'></a>

We create a simple Random Forest regressor with default hyperparameters, and fit it to the Training set.  
The resulting Test set MSE should average around 9.

In [None]:
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor().fit(X_train, y_train)

In [None]:
from sklearn.metrics import mean_squared_error

# Evaluate the model's mean squared error (MSE) on the Training and Test sets
train_mse = mean_squared_error(y_train, model.predict(X_train))
test_mse = mean_squared_error(y_test, model.predict(X_test))

print(f'MSE on the Training set: {train_mse:.2f}')
print(f'MSE on the Test set: {test_mse:.2f}')
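MSE is expressed in squared units of the target, so its square root (RMSE) is often easier to read because it is in the same units as the house prices. The following is a minimal sketch of that relationship in plain Python; the helper name and the four price values are made up for illustration and are not taken from the notebook's data.

```python
import math


def mean_squared_error_demo(y_true, y_pred):
    """Plain-Python MSE: the mean of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical prices (in $1000s), for illustration only
y_true_demo = [24.0, 22.0, 35.0, 33.0]
y_pred_demo = [25.0, 21.0, 34.0, 35.0]

mse_demo = mean_squared_error_demo(y_true_demo, y_pred_demo)
rmse_demo = math.sqrt(mse_demo)  # RMSE is simply the square root of MSE

print(f"MSE: {mse_demo:.2f}, RMSE: {rmse_demo:.2f}")
```

In the notebook itself, the same conversion would be `math.sqrt(test_mse)`.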

## Export Artefacts <a id='export'></a>
We export the model artefact by saving the Estimator with the joblib library. This file is then compressed into the .tar.gz format required by SageMaker containers and uploaded to an S3 bucket named after the application prefix and the AWS account ID, which is created if it does not already exist. Finally, we save a simple test case locally to test the Inference process at the end.

Precondition  
If you are running this notebook on a SageMaker Notebook instance or in SageMaker Studio, make sure that its IAM role has the AmazonSageMakerFullAccess policy attached.

In [None]:
import os
import joblib
from pathlib import Path
import tarfile

# Define the path to the 'model' folder within the current directory, and create it if not present
model_dir = Path("cdk_pipelines", "local", "model")
model_dir.mkdir(exist_ok=True, parents=True)

# Save the model in a joblib format using the joblib library
model_joblib_directory = model_dir / "model.joblib"
joblib.dump(model, str(model_joblib_directory))
print("Model saved to {}".format(model_joblib_directory))

model_output_directory = model_dir / "model.tar.gz"
with tarfile.open(model_output_directory, "w:gz") as tar:
    tar.add(model_joblib_directory, arcname=os.path.basename(model_joblib_directory))

In [None]:
import boto3

s3_client = boto3.client('s3')

bucket = f"{APP_PREFIX}-{account}"

try:
    # Create the S3 bucket for this application in the configured region
    response = s3_client.create_bucket(
        Bucket=bucket,
        CreateBucketConfiguration={
            'LocationConstraint': region
        },
    )

    print(response)

    response = s3_client.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration={
            'Rules': [
                {
                    'ApplyServerSideEncryptionByDefault': {
                        'SSEAlgorithm': 'AES256'
                    },
                    'BucketKeyEnabled': True
                },
            ]
        },
    )
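except (s3_client.exceptions.BucketAlreadyOwnedByYou, s3_client.exceptions.BucketAlreadyExists):
    # Only the "bucket exists" errors are handled here; any other failure is raised
    print(f"bucket already exists with name - {bucket}")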
            

# Upload the model artefact to the S3 bucket, using the local file path as the object key
s3_client.upload_file(str(model_output_directory), bucket, str(model_output_directory))
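Because the object key mirrors the local path, the S3 location of the artefact can be derived from the bucket name and that path. The helper below is a small sketch of this; `s3_uri` and the example bucket name are hypothetical. It uses `as_posix()` so the key keeps forward slashes even on systems where `str(path)` would produce backslashes.

```python
from pathlib import Path


def s3_uri(bucket_name, local_path):
    """Build an s3:// URI whose key mirrors a relative local path."""
    key = Path(local_path).as_posix()  # always forward slashes
    return f"s3://{bucket_name}/{key}"


# Hypothetical bucket name, for illustration only
demo_uri = s3_uri(
    "cdk-blog-123456789012",
    Path("cdk_pipelines", "local", "model", "model.tar.gz"),
)
print(demo_uri)
```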

We store a single sample from the Test set, to be used to test the deployed endpoint at a later stage.

In [None]:
import json

# Dump a simple test case to a .json file, used to showcase Inference
test_dir = Path("cdk_pipelines", "local", "test")
test_dir.mkdir(exist_ok=True, parents=True)

payload = {'features': X_test[0].tolist()}

with open(test_dir / 'payload.json', 'w') as f:
    json.dump(payload, f)
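When the payload is read back later, a quick sanity check is that it still carries one value per feature. The sketch below writes a made-up payload to a temporary file and reloads it; the helper `load_payload` and the constant `N_FEATURES` are hypothetical names, and the value 13 comes from the dataset description above.

```python
import json
import tempfile
from pathlib import Path

N_FEATURES = 13  # the Boston dataset has 13 input features


def load_payload(path, n_features=N_FEATURES):
    """Load a payload file and check that it has the expected number of features."""
    with open(path) as f:
        payload = json.load(f)
    features = payload["features"]
    if len(features) != n_features:
        raise ValueError(f"expected {n_features} features, got {len(features)}")
    return features


with tempfile.TemporaryDirectory() as tmp:
    # Made-up payload with the same structure as the one saved above
    demo_path = Path(tmp) / "payload.json"
    demo_path.write_text(json.dumps({"features": [0.1] * N_FEATURES}))
    demo_features = load_payload(demo_path)

print(len(demo_features))
```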

## Build Container <a id='container'></a>

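The inference code is packaged as a Docker image. The cells below display the Dockerfile and the build script, then run the script to build the image and push it to an ECR repository named `cdk-blog`.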

In [None]:
!pygmentize cdk_pipelines/code/container/Dockerfile

In [None]:
!pygmentize cdk_pipelines/code/container/build_and_push.sh

In [None]:
!cd cdk_pipelines/code/container; bash build_and_push.sh cdk-blog

## Register Model <a id='register'></a>
After uploading the artefacts to S3, we leverage <a href='https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry.html'>SageMaker Model Registry</a> to manage model versions and deploy them to production. The steps we perform are:
* We create a SageMaker Model Package Group for our current use case
* We register the newly trained model to this Group
* We approve the new Model 

Note  
The following steps can be performed either with the AWS SDK for Python (boto3) or through the SageMaker Studio UI. We create the Model Package Group and register the model using boto3; the approval can then be done in either SageMaker Studio or boto3.

In [None]:
sm_client = boto3.client("sagemaker")

# Define the input payload to create a Model Package Group with name APP_PREFIX
model_package_group_input_dict = {
    "ModelPackageGroupName" : APP_PREFIX,
    "ModelPackageGroupDescription" : f"Model package group for {APP_PREFIX}"
}

try:
    create_model_package_group_response = sm_client.create_model_package_group(**model_package_group_input_dict)
    print("ModelPackageGroup Arn : {}".format(create_model_package_group_response["ModelPackageGroupArn"]))
except Exception:
    print(f"Model Package Group {APP_PREFIX} already created")

In [None]:
# Define the ECR image with the Inference code for the model
account = boto3.client('sts').get_caller_identity().get('Account')
INFERENCE_IMAGE = f'{account}.dkr.ecr.{region}.amazonaws.com/cdk-blog:latest' # check your docker image was pushed to the right region

# Create the Inference Specification for the model
modelpackage_inference_specification = {
    "InferenceSpecification": {
        "Containers": [
            {
                "Image": INFERENCE_IMAGE,
            }
        ],
        "SupportedContentTypes": ["application/json"],
        "SupportedResponseMIMETypes": ["application/json"],
    }
}

# Add to the Specification the url where the model is stored
model_url = f"s3://{bucket}/{model_output_directory.as_posix()}"
modelpackage_inference_specification["InferenceSpecification"]["Containers"][0]["ModelDataUrl"] = model_url

# Define the input payload to register a Model Package in the Group
create_model_package_input_dict = {
    "ModelPackageGroupName": APP_PREFIX,
    "ModelPackageDescription": f"Model for {APP_PREFIX} stored at {model_url}",
    "ModelApprovalStatus": "PendingManualApproval",
}
create_model_package_input_dict.update(modelpackage_inference_specification)

# Invoke the SageMaker client to register the model with the given payload
create_model_package_response = sm_client.create_model_package(**create_model_package_input_dict)

# Fetch the ARN of the Model Package
model_package_arn = create_model_package_response["ModelPackageArn"]
print("ModelPackage Version ARN : {}".format(model_package_arn))

In [None]:
# Verify that the Model Package has been published to the Group
sm_client.list_model_packages(ModelPackageGroupName=APP_PREFIX)
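The response of `list_model_packages` holds one summary per registered version under `ModelPackageSummaryList`. The sketch below picks the most recent version from such a response. The helper `latest_model_package` and the sample response are hypothetical, with made-up ARNs, and the sketch assumes all versions fit in a single page of results.

```python
def latest_model_package(response):
    """Return the summary with the highest ModelPackageVersion, or None if empty."""
    summaries = response.get("ModelPackageSummaryList", [])
    if not summaries:
        return None
    return max(summaries, key=lambda s: s["ModelPackageVersion"])


# Made-up response with the same shape as the list_model_packages output
demo_response = {
    "ModelPackageSummaryList": [
        {
            "ModelPackageArn": "arn:aws:sagemaker:eu-west-1:123456789012:model-package/cdk-blog/1",
            "ModelPackageVersion": 1,
            "ModelApprovalStatus": "Approved",
        },
        {
            "ModelPackageArn": "arn:aws:sagemaker:eu-west-1:123456789012:model-package/cdk-blog/2",
            "ModelPackageVersion": 2,
            "ModelApprovalStatus": "PendingManualApproval",
        },
    ]
}

latest = latest_model_package(demo_response)
print(latest["ModelPackageVersion"], latest["ModelApprovalStatus"])
```

In the notebook, the same helper could be applied to `sm_client.list_model_packages(ModelPackageGroupName=APP_PREFIX)`.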

## Approve Model <a id='approve'></a>
If you followed all the steps correctly, you will have registered a model in the Model Package Group you created. Now, you have to approve the model before triggering the pipeline that deploys it to Production.  

To approve the model using SageMaker Studio, follow the simple steps highlighted <a href='https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-approve.html'>here</a>. Otherwise, execute the following cell to approve it using boto3.

Once the model deployment pipeline has been deployed, approving the model automatically triggers that pipeline, which then deploys the model. 

In [None]:
# Define the input payload to approve a Model Package in the Group
model_package_update_input_dict = {
    "ModelPackageArn" : model_package_arn,
    "ModelApprovalStatus" : "Approved"
}

# Invoke the SageMaker client to approve the model with the given payload
model_package_update_response = sm_client.update_model_package(**model_package_update_input_dict)
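To confirm that the approval took effect, the status can be read back with `describe_model_package`, which accepts the Model Package ARN as `ModelPackageName`. The sketch below wraps that call in a hypothetical helper, `is_approved`, and exercises it with a stub client so that it runs without AWS credentials; the stub class and its ARN are illustrative only.

```python
def is_approved(sagemaker_client, package_arn):
    """Return True if the Model Package's approval status is 'Approved'."""
    response = sagemaker_client.describe_model_package(ModelPackageName=package_arn)
    return response["ModelApprovalStatus"] == "Approved"


class StubSageMakerClient:
    """Stand-in for boto3.client('sagemaker'), for illustration only."""

    def __init__(self, status):
        self.status = status

    def describe_model_package(self, ModelPackageName):
        return {"ModelPackageArn": ModelPackageName, "ModelApprovalStatus": self.status}


demo_arn = "arn:aws:sagemaker:eu-west-1:123456789012:model-package/cdk-blog/1"
approved = is_approved(StubSageMakerClient("Approved"), demo_arn)
pending = is_approved(StubSageMakerClient("PendingManualApproval"), demo_arn)
print(approved, pending)
```

With the real client, the call would be `is_approved(sm_client, model_package_arn)`.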