# Automate deployments with Infrastructure-as-Code and CI/CD pipelines

1. [Introduction](#Introduction)
1. [Setup](#Setup)
1. [Creating a pipeline to build our Docker container image](#Creating-a-pipeline-to-build-our-Docker-container-image)
1. [Adding the Step Functions to the infrastructure pipeline](#Adding-the-Step-Functions-to-the-infrastructure-pipeline)
1. [Creating the ML Training and Deployment Pipeline](#Creating-the-ML-Training-and-Deployment-Pipeline)
1. [Parametrizing our ML Workflow further](#Parametrizing-our-ML-Workflow-further)

## Introduction

In this notebook, we use the content developed in labs 3 and 4 (the Docker image for model training and hosting, and the Step Functions-based ML workflows) together with CodePipeline, so that ML models are deployed consistently and repeatably through a pipeline that defines how models move from development to production. Manual approval steps can be included in the pipeline so that a person evaluates whether a model is ready to deploy to a target environment.

There will be two pipelines: an infrastructure pipeline, and a training and model management pipeline.

## Setup

### Creating a pipeline to build our Docker container image

1. Download the CloudFormation template under `mlops/infra_pipeline.yml`.
2. Open the Amazon [CloudFormation console](https://console.aws.amazon.com/cloudformation/).
3. Select **Create stack** and choose **With new resources (standard)**.
4. Choose **Template is ready** and **Upload a template file**, then browse to the template you downloaded above.
5. Click **Next** and choose a stack name, e.g. `sagemaker-immersionday-mlops-infra`.
6. Note the two parameters: the path of the **Buildspec** file used by CodeBuild, and `ModelName`.
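
The console steps above can also be scripted. The following is a minimal sketch, assuming `boto3` is installed and the notebook role is allowed to create CloudFormation stacks. The helper names are illustrative, and the parameter value in the usage comment is a placeholder rather than something taken from the template.

```python
from typing import Dict, List


def to_cfn_parameters(overrides: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert a plain dict into the Parameters list that create_stack expects."""
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in sorted(overrides.items())
    ]


def create_infra_stack(stack_name: str, template_path: str, overrides: Dict[str, str]) -> str:
    """Create a stack from a local template file and return the new stack ID."""
    import boto3  # imported lazily so to_cfn_parameters works without AWS access

    with open(template_path) as fp:
        template_body = fp.read()

    response = boto3.client("cloudformation").create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=to_cfn_parameters(overrides),
    )
    return response["StackId"]


# Example (not executed here); the ModelName value is a placeholder:
# create_infra_stack(
#     "sagemaker-immersionday-mlops-infra",
#     "mlops/infra_pipeline.yml",
#     {"ModelName": "sagemaker-decision-trees"},
# )
```

Parameters that are left out of `overrides` fall back to the defaults declared in the template, which mirrors what happens when you accept the pre-filled values in the console.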

Run the cell below to check the contents of our CloudFormation template:

In [None]:
!cat mlops/infra_pipeline.yml | pygmentize -l yaml

Note above that the pipeline consists of a `Source` stage and a `Build` stage, as well as an S3 bucket that holds the pipeline's artifacts as they move through the different stages. We use the `MLOpsRole` role created in our main workshop stack, and the `Source` stage is the `master` branch of the CodeCommit repo attached to this notebook. The actual build commands performed by the CodeBuild **build project** are described in our buildspec file under `mlops/docker_buildspec.yml`:

In [None]:
!cat mlops/docker_buildspec.yml | pygmentize -l yaml

### Checking docker build results

After creating the stack above, the pipeline will run once with the contents of the `master` branch of our CodeCommit repo. The Docker image will be built, tagged according to the version defined in the `container/version.txt` file, and pushed to the same ECR repository created in lab 3, **sagemaker-decision-trees**.

Every time a new commit is made to the `master` branch of this repo, the pipeline will run and a new Docker image will be published to the ECR repo.
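
To make the tagging concrete, here is a small sketch of how the contents of `container/version.txt` could map to an image URI. It assumes the buildspec uses `NAME` as the repository name and `VERSION` as the tag; the account ID is a placeholder and the helper names are hypothetical.

```python
from typing import Dict


def parse_version_file(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines such as those in container/version.txt."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def image_uri(account_id: str, region: str, version_info: Dict[str, str]) -> str:
    """Build the ECR image URI for the name and version read from the file."""
    return "{}.dkr.ecr.{}.amazonaws.com/{}:{}".format(
        account_id, region, version_info["NAME"], version_info["VERSION"]
    )


info = parse_version_file("VERSION=0.1.0\nNAME=sagemaker-decision-trees")
print(image_uri("123456789012", "eu-central-1", info))
```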

To test this, change the version of the Docker image and push to CodeCommit. After the pipeline runs, a new image with the new version tag should appear in the [ECR console](https://console.aws.amazon.com/ecr):

In [None]:
!printf "VERSION=0.2.0\nNAME=sagemaker-decision-trees" > container/version.txt
!git add container/version.txt
!git commit -m "Change version of container image"
!git push

### Adding the Step Functions to the infrastructure pipeline

The state machines created in the previous notebook should also be versioned and deployed as infrastructure as code (IaC). To do that, we can use the Step Functions SDK to output a CloudFormation template that describes the state machine, as we did in the final step of the previous lab.

Check the Step Functions CloudFormation template (YAML):

In [None]:
!cat mlops/train_workflow.yml | pygmentize -l yaml

Let's add the template to our Git repository so that the updated infrastructure pipeline can use it. First, edit the template and change the name of the state machine so that it does not conflict with the one created previously.
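
If you prefer to script the rename instead of editing the file by hand, the sketch below does a plain text substitution. It assumes `StateMachineName` appears on a single line as `StateMachineName: <value>`; it deliberately avoids a YAML parser, because CloudFormation templates can contain tags such as `!Ref` that a generic loader rejects. The function name and the new state machine name are illustrative.

```python
import re


def rename_state_machine(template_text: str, new_name: str) -> str:
    """Replace the value of the first StateMachineName property in a YAML template."""
    pattern = re.compile(r"^([ \t]*StateMachineName:[ \t]*).*$", flags=re.MULTILINE)
    updated, count = pattern.subn(lambda m: m.group(1) + new_name, template_text, count=1)
    if count == 0:
        raise ValueError("No StateMachineName property found in the template")
    return updated


sample = (
    "Resources:\n"
    "  StateMachineComponent:\n"
    "    Type: AWS::StepFunctions::StateMachine\n"
    "    Properties:\n"
    "      StateMachineName: MyTrainingWorkflow\n"
    "      RoleArn: arn:aws:iam::123456789012:role/example\n"
)
renamed = rename_state_machine(sample, "mlops-train-workflow")
print(renamed)

# To apply it to the real file (not executed here):
# path = "mlops/train_workflow.yml"
# with open(path) as fp:
#     text = fp.read()
# with open(path, "w") as fp:
#     fp.write(rename_state_machine(text, "mlops-train-workflow"))
```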

After you've changed the `StateMachineName` property in the template `mlops/train_workflow.yml` above, commit it to our CodeCommit repo.

In [None]:
!git add mlops/train_workflow.yml
!git commit -m "Add first version of training state machine"
!git push

You can see the various steps in the state machine under `DefinitionString`. Let's update the infrastructure pipeline that builds our container image to include a step that deploys this CloudFormation template. Note that this could also be part of a separate pipeline; the best workflow for building and deploying infrastructure changes depends on the use case.

1. Download the CloudFormation template under `mlops/infra_pipeline_stepfunctions.yml`.
2. Open the Amazon [CloudFormation console](https://console.aws.amazon.com/cloudformation/).
3. Select the stack previously created and click on **Update**.
4. Choose **Replace current template**, then browse to the template you downloaded above.
5. Click **Next** and keep the same stack name and parameters defined before.
6. Click **Next** again and finally **Update**.

After the stack finishes updating, you can check the changed pipeline in the CodePipeline console. After the pipeline finishes running, an extra stack should be visible in the CloudFormation console, named after the first pipeline stack with **train_workflow** appended.
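
The same check can be done from the notebook. The sketch below lists every stack whose name starts with a given prefix, along with its status, by paging through `describe_stacks`. The client is passed in as an argument so the function can be exercised with a stub; the function name and the prefix in the usage comment are assumptions based on the stack name suggested earlier.

```python
from typing import Dict


def stacks_with_prefix(cfn_client, prefix: str) -> Dict[str, str]:
    """Return {stack name: status} for every stack whose name starts with prefix."""
    found = {}
    kwargs = {}
    while True:
        page = cfn_client.describe_stacks(**kwargs)
        for stack in page.get("Stacks", []):
            if stack["StackName"].startswith(prefix):
                found[stack["StackName"]] = stack["StackStatus"]
        token = page.get("NextToken")
        if not token:
            return found
        kwargs = {"NextToken": token}  # request the next page of results


# Example (not executed here):
# import boto3
# print(stacks_with_prefix(boto3.client("cloudformation"), "sagemaker-immersionday-mlops-infra"))
```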

### Creating the ML Training and Deployment Pipeline

CodePipeline has a native integration with Step Functions. Furthermore, you can use the zipped CodePipeline artifact in S3 to provide input parameters to the pipeline stages.

We will now create a separate pipeline that triggers on the upload to a specific S3 location of a set of parameters for a training job. This pipeline will execute the Step Functions state machine deployed above.

Fill in the variables below to pass to the training pipeline CloudFormation template (we will use the SageMaker default bucket for the location of the trigger). `TrainingStepFunctionName` is the `StateMachineName` defined above in our Step Functions template.

In [None]:
import sagemaker # Amazon SageMaker's Python SDK provides many helper functions
import boto3

session = sagemaker.Session()
region = boto3.Session().region_name
bucket = session.default_bucket()

S3Bucket=bucket
S3Key="mlops/config.zip"
TrainingStepFunctionName="<Name of state machine created in Step Functions by our infra pipeline>"

Now launch the stack creation:

In [None]:
!aws cloudformation deploy --template-file ./mlops/train_pipeline.yml --stack-name sm-id-train-pipeline --parameter-overrides S3DataBucket=$S3Bucket S3DataKey=$S3Key TrainingStepFunctionName=$TrainingStepFunctionName

You can check the new pipeline in the [CodePipeline console](https://console.aws.amazon.com/codepipeline). It will have failed in the **Source** action, since we haven't created the trigger .zip file yet. Let's do that now. From the Step Functions state machine we previously created, we know that we can pass three parameters at runtime:

```
'JobName': str, 
'ModelName': str,
'EndpointName': str
```

These are the parameters that need to be included in the `config.json` file for the training to be launched successfully.
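
Because a missing or malformed parameter only surfaces once the pipeline is already running, it can help to check the config before uploading it. The following is a minimal sketch: the three required keys come from the list above, while the name pattern reflects the SageMaker naming rule as I understand it (alphanumerics and hyphens, at most 63 characters) and should be checked against the API reference. `validate_config` is a hypothetical helper.

```python
import re

REQUIRED_KEYS = ("JobName", "ModelName", "EndpointName")
# Assumed SageMaker naming rule: alphanumerics and hyphens, at most 63 characters.
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")


def validate_config(params: dict) -> list:
    """Return a list of problems found in the config; an empty list means none were found."""
    problems = []
    for key in REQUIRED_KEYS:
        value = params.get(key)
        if not isinstance(value, str):
            problems.append("{} is missing or not a string".format(key))
        elif not NAME_PATTERN.match(value):
            problems.append("{} has an invalid name: {!r}".format(key, value))
    return problems


# An underscore in JobName and a missing EndpointName are both reported.
print(validate_config({"JobName": "bad_name", "ModelName": "mlops-model-01"}))
```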

In [None]:
import datetime, json

timestamp = datetime.datetime.now(tz=None).strftime("%d-%m-%Y-%H-%M-%S")

params = {
    'JobName': "mlops-job-{}".format(timestamp),
    'ModelName': "mlops-model-{}".format(timestamp),
    'EndpointName': "mlops-endpoint-{}".format(timestamp)
}

with open('config.json', 'w') as fp:
    json.dump(params, fp)

Now, let's zip the config file and upload it to the trigger location:

In [None]:
!zip config.zip config.json

!aws s3 cp config.zip s3://$S3Bucket/$S3Key

You can follow the [CodePipeline](https://console.aws.amazon.com/codepipeline) execution and [State Machine](https://console.aws.amazon.com/states) execution in the consoles.
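
The zip-and-upload step can also be done without shelling out. This sketch builds the archive in memory with the standard library and uploads it with `boto3`; the function names are illustrative, and `upload_config` assumes the notebook role can write to the trigger bucket.

```python
import io
import json
import zipfile


def build_config_zip(params: dict) -> bytes:
    """Return the bytes of a zip archive that contains params as config.json."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("config.json", json.dumps(params))
    return buffer.getvalue()


def upload_config(params: dict, bucket: str, key: str) -> None:
    """Upload the zipped config to the S3 location that triggers the pipeline."""
    import boto3  # imported lazily so build_config_zip works without AWS access

    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=build_config_zip(params))


# Example (not executed here): upload_config(params, S3Bucket, S3Key)
```

Building the archive in memory avoids leaving a stale `config.zip` in the working directory between runs.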

### Parametrizing our ML Workflow further

Currently we allow users to pass three variables at runtime by specifying them in a JSON document (`config.json`). We could update our state machine CloudFormation template to allow more variables to be passed at runtime, e.g. the instance type for training, the container used for training, hyperparameters and so on.

- Edit the file **mlops/train_workflow.yml**
- Replace `ml.m4.4xlarge` with `$$.Execution.Input['InstanceType']`
- Replace `813361260812.dkr.ecr.eu-central-1.amazonaws.com/xgboost:1` with `$$.Execution.Input['ContainerImage']`
- Replace

```json
    "HyperParameters": {
      "max_depth": "5",
      "eta": "0.2",
      "gamma": "4",
      "min_child_weight": "6",
      "subsample": "0.8",
      "silent": "0",
      "objective": "binary:logistic",
      "num_round": "100"
    }
```

with:

```json
    "HyperParameters": {
      "max_depth.$": "$$.Execution.Input['HyperParameters']['max_depth']",
      "eta.$": "$$.Execution.Input['HyperParameters']['eta']",
      "gamma.$": "$$.Execution.Input['HyperParameters']['gamma']",
      "min_child_weight.$": "$$.Execution.Input['HyperParameters']['min_child_weight']",
      "subsample.$": "$$.Execution.Input['HyperParameters']['subsample']",
      "silent.$": "$$.Execution.Input['HyperParameters']['silent']",
      "objective.$": "$$.Execution.Input['HyperParameters']['objective']",
      "num_round.$": "$$.Execution.Input['HyperParameters']['num_round']"
    }
```

Don't forget to add `.$` to the name of every field in the template that will be resolved at runtime:

e.g. `"InstanceType.$": "$$.Execution.Input['InstanceType']"`

Now, commit and push the modified state machine template. The CloudFormation stack in our infra pipeline will update with the new set of runtime parameters.

In [None]:
!git add mlops/train_workflow.yml
!git commit -m "Add more runtime inputs to training state machine"
!git push

We can now trigger a new ML training and deployment workflow, using our custom Docker container image and a different instance type.

In [None]:
timestamp = datetime.datetime.now(tz=None).strftime("%d-%m-%Y-%H-%M-%S")

params = {
    'JobName': "mlops-job-{}".format(timestamp),
    'ModelName': "mlops-model-{}".format(timestamp),
    'EndpointName': "mlops-endpoint-{}".format(timestamp),
    'InstanceType': "ml.m5.4xlarge",
    'ContainerImage': "<account-id>.dkr.ecr.<region>.amazonaws.com/sagemaker-decision-trees:0.1.0",
    'HyperParameters': {
      "max_depth": "10",
      "eta": "0.4",
      "gamma": "4",
      "min_child_weight": "5",
      "subsample": "0.8",
      "silent": "0",
      "objective": "binary:logistic",
      "num_round": "50"
    }
}

with open('config.json', 'w') as fp:
    json.dump(params, fp)
    
!zip config.zip config.json

!aws s3 cp config.zip s3://$S3Bucket/$S3Key

The pipeline is now running again. You can confirm that the training job is using the container defined above and our changed hyperparameters by navigating to the [SageMaker training jobs console](https://eu-central-1.console.aws.amazon.com/sagemaker/home?region=eu-central-1#/jobs).
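
The same confirmation can be scripted with the SageMaker `describe_training_job` call. In this sketch the client is passed in so the function can be tried with a stub; `summarize_training_job` is an illustrative name, and the usage comment assumes the `params` dictionary from the cell above.

```python
def summarize_training_job(sm_client, job_name: str) -> dict:
    """Return the image, instance type, hyperparameters and status of a training job."""
    job = sm_client.describe_training_job(TrainingJobName=job_name)
    return {
        "image": job["AlgorithmSpecification"].get("TrainingImage"),
        "instance_type": job["ResourceConfig"]["InstanceType"],
        "hyperparameters": job.get("HyperParameters", {}),
        "status": job["TrainingJobStatus"],
    }


# Example (not executed here):
# import boto3
# print(summarize_training_job(boto3.client("sagemaker"), params["JobName"]))
```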

If you navigate to [SageMaker Models](https://eu-central-1.console.aws.amazon.com/sagemaker/home?region=eu-central-1#/models), you will find the trained model with the container used in training, as well as the location of the model artifact on S3. Notice that the new model uses our custom container. We could also track each of the training runs by using [SageMaker Experiments](https://docs.aws.amazon.com/sagemaker/latest/dg/experiments-create.html) when launching the training job.

At the end of the run, an endpoint is created to serve inference requests. You can find it under [Endpoints](https://eu-central-1.console.aws.amazon.com/sagemaker/home?region=eu-central-1#/endpoints).

The endpoint creation could be extracted from the state machine and defined as a CloudFormation template in a later stage of the pipeline. The same template could then be deployed to a Prod account after an approval stage in the pipeline.

---