# Step 4: Add a model building CI/CD pipeline

<div class="alert alert-warning"> This notebook was last tested on a SageMaker Studio JupyterLab instance using the <code>SageMaker Distribution Image 3.0.1</code> and the SageMaker Python SDK version <code>2.245.0</code>.</div>

In this step you create an automated CI/CD pipeline for model building using [Amazon SageMaker Projects](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects.html). 

**From idea to production in six steps:**
||||
|---|---|---|
|1. |Experiment in a notebook ||
|2. |Scale with SageMaker AI processing jobs and SageMaker SDK ||
|3. |Operationalize with ML pipeline, model registry, and feature store ||
|4. |Add a model building CI/CD pipeline |**<<<< YOU ARE HERE**|
|5. |Add a model deployment pipeline ||
|6. |Add model and data monitoring ||

You are going to use [SageMaker AI-Provided Project Templates](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-templates-sm.html#sagemaker-projects-templates-code-commit) to provision a CI/CD workflow automation with [AWS CodePipeline](https://aws.amazon.com/codepipeline/) and a GitHub code repository.

SageMaker project templates offer you the following choice of code repositories, workflow automation tools, and pipeline stages:
- **Code repository**: Third-party Git repositories such as GitHub and Bitbucket are supported
- **CI/CD workflow automation**: AWS CodePipeline or Jenkins
- **Pipeline stages**: Model building, training, and deployment

<div class="alert alert-info"> Make sure you are using the <code>Python 3</code> kernel in JupyterLab for this notebook.</div>

In [94]:
import boto3
import sagemaker 
import json
from time import gmtime, strftime, sleep
from IPython.display import HTML

In [7]:
%store -r 

%store

try:
    initialized
except NameError:
    print("+++++++++++++++++++++++++++++++++++++++++++++++++")
    print("[ERROR] YOU HAVE TO RUN 00-start-here notebook   ")
    print("+++++++++++++++++++++++++++++++++++++++++++++++++")

Stored variables and their in-db values:
baseline_s3_url                         -> 's3://sagemaker-us-east-1-906545278380/from-idea-t
bucket_name                             -> 'sagemaker-us-east-1-906545278380'
bucket_prefix                           -> 'from-idea-to-prod/xgboost'
dataset_feature_group_name              -> 'from-idea-to-prod-12-06-30-53'
dataset_file_local_path                 -> 'data/bank-additional/bank-additional-full.csv'
domain_id                               -> 'd-igloxuzrs3z2'
evaluation_s3_url                       -> 's3://sagemaker-us-east-1-906545278380/from-idea-t
experiment_name                         -> 'from-idea-to-prod-experiment-11-21-33-07'
feature_store_bucket_prefix             -> 'from-idea-to-prod/feature-store'
initialized                             -> True
input_s3_url                            -> 's3://sagemaker-us-east-1-906545278380/from-idea-t
mlflow_arn                              -> 'arn:aws:sagemaker:us-east-1:906545278380:mlflow

In [3]:
%load_ext autoreload
%autoreload 2

In [4]:
sm = boto3.client("sagemaker")

## Create an MLOps project

### Preparation

<div class="alert alert-info"><b>Prerequisites:</b> You need to establish an AWS CodeConnections connection (formerly AWS CodeStar Connections) from your AWS account to your GitHub user or organization as described below.<br/><b>Add a tag with the key <code>sagemaker</code> and value <code>true</code> to this connection.</b></div>

For the detailed instructions see [Walk Through a SageMaker AI MLOps Project Using Third-party Git Repos](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-walkthrough-3rdgit.html) in the SageMaker Developer Guide.

<div class="alert alert-info">You need to create two new empty private repositories in your GitHub account: one for the model build pipeline and one for the model deploy pipeline. For example, you can name the repositories <code>model-build</code> and <code>model-deploy</code>.</div>

#### Set up the GitHub connection

You need a GitHub personal account that you can access and connect to AWS via an [AWS CodeConnections connection](https://docs.aws.amazon.com/dtconsole/latest/userguide/welcome-connections.html).

Connect to your GitHub account by following these instructions (see more details in the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-walkthrough-3rdgit.html#sagemaker-proejcts-walkthrough-connect-3rdgit)).

Start by creating two empty repositories in your private GitHub account. The project populates these repositories with the seed code for the model build workflow and the model deploy workflow. Navigate to [GitHub.com](https://github.com/) and choose **Your repositories** from the user drop-down menu.
Create a model-build repository.

![](img/github-model-build.png)

Repeat the same process for the model-deploy repository.

![](img/github-model-deploy.png)

Now you need to connect your private GitHub account to your AWS account via [Connections](https://docs.aws.amazon.com/dtconsole/latest/userguide/welcome-connections.html) (part of the Developer Tools in the AWS console) to enable workflow triggers on code changes. Navigate to [Connections](https://console.aws.amazon.com/codesuite/settings/connections) and create a new connection.

![](img/crete-connection.png)

Now select GitHub as the provider, assign the connection a name, e.g. `mlops-connection`, and add a new tag with the key `sagemaker` and the value `true`.

![](img/github-connection.png)

You will be prompted to authenticate with your GitHub account. As a best practice, we recommend installing a new GitHub App so you can limit its permissions to only the two repositories you have just created, i.e., `model-build` and `model-deploy`.

![](img/install-github-app.png)
<img src="img/connection-permissions.png" width="500" height="800" alt="Description">

Create the connection and paste its ARN into the following cell.
N.B. You can follow the same process for any other supported Git provider, as long as you create the empty repositories first.

In [62]:
# set this variable to the ARN of the code connection you created
code_connection_arn = "<SET TO THE ARN OF THE CREATED CODE CONNECTION>"
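Before you continue, you can optionally sanity-check the pasted value. The following is a minimal sketch, not part of the workshop code: `check_code_connection` is a hypothetical helper, and it assumes the ARN formats `arn:aws:codeconnections:<region>:<account-id>:connection/<id>` (current) and `arn:aws:codestar-connections:...` (legacy), plus a boto3 `codeconnections` client whose `get_connection` and `list_tags_for_resource` responses carry `Connection.ConnectionStatus` and `Tags`. It reports an invalid ARN, a connection that isn't `AVAILABLE` yet (for example, still `PENDING`), and a missing `sagemaker=true` tag.

```python
import re

# Current (codeconnections) and legacy (codestar-connections) ARN formats:
# arn:<partition>:<service>:<region>:<account-id>:connection/<connection-id>
CONNECTION_ARN_PATTERN = re.compile(
    r"^arn:aws[a-z-]*:(codeconnections|codestar-connections):"
    r"[a-z0-9-]+:\d{12}:connection/[0-9a-f-]+$"
)


def check_code_connection(client, connection_arn):
    """Return a list of problems with the connection; an empty list means none were found.

    `client` is assumed to be a boto3 "codeconnections" client (or any object
    exposing the same get_connection and list_tags_for_resource methods).
    """
    if not CONNECTION_ARN_PATTERN.match(connection_arn):
        return [f"Not a CodeConnections ARN: {connection_arn}"]

    problems = []
    connection = client.get_connection(ConnectionArn=connection_arn)["Connection"]
    status = connection["ConnectionStatus"]
    if status != "AVAILABLE":
        problems.append(f"Connection status is {status}, expected AVAILABLE")

    tags = client.list_tags_for_resource(ResourceArn=connection_arn).get("Tags", [])
    # The project template requires the tag sagemaker=true on the connection
    if not any(t.get("Key") == "sagemaker" and t.get("Value") == "true" for t in tags):
        problems.append("Missing tag sagemaker=true")
    return problems
```

For example, `check_code_connection(boto3.client("codeconnections"), code_connection_arn)` should return an empty list when the connection looks ready, assuming your boto3 version includes the `codeconnections` client (older versions only provide `codestar-connections`).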

You can create a project programmatically in this notebook - **Option 1** or in Studio UI - **Option 2**.

Option 1 is recommended because it doesn't depend on the Studio UI.<br/>
Option 2 demonstrates the [**Create Project** UI flow](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-create.html).

### Option 1: Create project programmatically - recommended
In this section you use `boto3` to create an MLOps project via a SageMaker API.

In [63]:
sc = boto3.client("servicecatalog")

sc_provider_name = "Amazon SageMaker"
sc_product_name = "MLOps template for model building, training, and deployment with third-party Git repositories using CodePipeline"

In [64]:
# find a Service Catalog product with the specific SageMaker project template
p_ids = [p['ProductId'] for p in sc.search_products(
    Filters={
        'FullTextSearch': [sc_product_name]
    },
)['ProductViewSummaries'] if p["Name"]==sc_product_name]

In [65]:
p_ids

['prod-cwbsact3annui']

In [66]:
# If you get any exception from this code, go to the Option 2 and create a project in Studio UI
if not len(p_ids):
    raise Exception("No Amazon SageMaker ML Ops products found!")
elif len(p_ids) > 1:
    raise Exception("Too many matching Amazon SageMaker ML Ops products found!")
else:
    product_id = p_ids[0]
    print(f"ML Ops product id: {product_id}")

ML Ops product id: prod-cwbsact3annui


In [67]:
# output what this project is about
sc.describe_product(Id=product_id)['ProductViewSummary']['ShortDescription']

'Use this template to automate the entire model lifecycle that includes both model buidling and deployment workflows. Ideally suited for continuous integration and continuous deployment (CI/CD) of ML models. Process data, extract features, train and test models, and register them in the model registry. Attach your own Git repository to the project for checking in and managing code versions. Kick off the model deployment workflow by approving the model registered in the model registry for deployment either manually or automatically. You can customize the seed code and the configuration files to suit your requirements. AWS CodePipeline is used to orchestrate the model deployment.\n\nModel building pipeline: SageMaker Pipelines\nCode repository: Third party Git\nOrchestration: AWS CodePipeline\n'

In [68]:
# get the latest template version
provisioning_artifact_id = sorted(
    [i for i in sc.list_provisioning_artifacts(
        ProductId=product_id
    )['ProvisioningArtifactDetails'] if i['Guidance']=='DEFAULT'],
    key=lambda d: d['Name'], reverse=True)[0]['Id']
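The cell above compares `Name` values as strings, so a hypothetical future `v10.0` would sort below `v2.0`. If you'd rather pick the newest `DEFAULT` version by creation time, here is a minimal sketch. `latest_default_artifact_id` is a hypothetical helper, and it assumes each entry carries the `Id`, `Guidance`, and `CreatedTime` fields that appear in the `describe_provisioning_artifact` output of this notebook.

```python
def latest_default_artifact_id(artifacts):
    """Return the Id of the most recently created DEFAULT provisioning artifact.

    `artifacts` is a list shaped like the 'ProvisioningArtifactDetails' field
    returned by Service Catalog list_provisioning_artifacts.
    """
    candidates = [a for a in artifacts if a.get("Guidance") == "DEFAULT"]
    if not candidates:
        raise ValueError("No provisioning artifact with DEFAULT guidance found")
    # Compare creation timestamps instead of version-name strings
    return max(candidates, key=lambda a: a["CreatedTime"])["Id"]
```

Usage: `provisioning_artifact_id = latest_default_artifact_id(sc.list_provisioning_artifacts(ProductId=product_id)['ProvisioningArtifactDetails'])`.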

In [69]:
provisioning_artifact_id

'pa-jt7oyaklc3eym'

In [70]:
sc.describe_provisioning_artifact(ProductId=product_id, ProvisioningArtifactId=provisioning_artifact_id)

{'ProvisioningArtifactDetail': {'Id': 'pa-jt7oyaklc3eym',
  'Name': 'v2.0',
  'Description': 'Adding error handling for access denied for seed code checkin',
  'Type': 'CLOUD_FORMATION_TEMPLATE',
  'CreatedTime': datetime.datetime(2024, 12, 10, 5, 6, 31, tzinfo=tzlocal()),
  'Active': True,
  'Guidance': 'DEFAULT'},
 'Info': {'TemplateUrl': 'https://s3.us-east-1.amazonaws.com/ciclo-us-east-1-prod-product-templates/model_build_deploy_toolchain_3p_git_template.yml'},
 'Status': 'AVAILABLE',
 'ResponseMetadata': {'RequestId': '1166418b-fb61-4e1b-b6a4-9cb05bd4b008',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '1166418b-fb61-4e1b-b6a4-9cb05bd4b008',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '414',
   'date': 'Fri, 14 Feb 2025 11:02:36 GMT'},
  'RetryAttempts': 0}}

In [75]:
# set unique project name
project_name = f"mlops-{strftime('%m-%d-%H-%M-%S', gmtime())}"

Set the parameters of the project template. Set the variables `model_build_code_repository_full_name` and `model_deploy_code_repository_full_name` to the full names of your GitHub repositories. These must be new empty repositories.

In [76]:
# Branch name for the Model Building and Training Code Repository
model_build_code_repository_branch = 'main'
# Full repository name of the Model Building and Training Code Repository, which would be username/reponame or organizationname/reponame
model_build_code_repository_full_name = "<ENTER THE FULL NAME OF YOUR MODEL BUILD GITHUB REPO HERE>"  # e.g. username/reponame

# Branch name for the Model Deployment Code Repository
model_deploy_code_repository_branch = 'main'
# Full repository name of the Model Deployment Code Repository, which would be username/reponame or organizationname/reponame
model_deploy_code_repository_full_name = "<ENTER THE FULL NAME OF YOUR MODEL DEPLOY GITHUB REPO HERE>"  # e.g. username/reponame
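As an optional safeguard, you can check that each value has the `owner/repo` form before you create the project; a later cell also relies on this form when it takes the part after `/` as the repository name. The sketch below is illustrative: `check_repo_full_name` is a hypothetical helper, and its regular expression is a simplified assumption rather than GitHub's exact naming rules.

```python
import re

# Simplified owner/repo check (an assumption, not GitHub's full rules):
# owner: letters, digits, '-'; repo: letters, digits, '.', '_', '-'
REPO_FULL_NAME_PATTERN = re.compile(r"^[A-Za-z0-9-]+/[A-Za-z0-9._-]+$")


def check_repo_full_name(full_name):
    """Return (owner, repo) if full_name looks like 'owner/repo', else raise ValueError."""
    if not REPO_FULL_NAME_PATTERN.match(full_name):
        raise ValueError(f"Expected 'owner/repo', got {full_name!r}")
    owner, repo = full_name.split("/")
    return owner, repo


# Usage in this notebook:
# for name in (model_build_code_repository_full_name, model_deploy_code_repository_full_name):
#     check_repo_full_name(name)
```

A full URL such as `https://github.com/username/model-build` or an unreplaced placeholder is rejected, which catches the most common copy-paste mistakes.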

In [77]:
# set project parameters
project_parameters = [
    {
        'Key': 'ModelBuildCodeRepositoryBranch',
        'Value': model_build_code_repository_branch,
    },
    {
        'Key': 'ModelBuildCodeRepositoryFullname',
        'Value': model_build_code_repository_full_name,
    },
    {
        'Key': 'ModelDeployCodeRepositoryBranch',
        'Value': model_deploy_code_repository_branch,
    },
    {
        'Key': 'ModelDeployCodeRepositoryFullname',
        'Value': model_deploy_code_repository_full_name,
    },
    {
        'Key': 'CodeConnectionArn',
        'Value': code_connection_arn,
    },
]

Finally, create a SageMaker project from the service catalog product template:

In [91]:
print(f'''Creating a {project_name} using {sc_product_name} with the following parameters:
{json.dumps(project_parameters, indent=2)}
''')

Creating a mlops-02-14-11-03-33 using MLOps template for model building, training, and deployment with third-party Git repositories using CodePipeline with the following parameters:
[
  {
    "Key": "ModelBuildCodeRepositoryBranch",
    "Value": "main"
  },
  {
    "Key": "ModelBuildCodeRepositoryFullname",
    "Value": "yevgeniyilyin/sagemaker-ai-model-build-2"
  },
  {
    "Key": "ModelDeployCodeRepositoryBranch",
    "Value": "main"
  },
  {
    "Key": "ModelDeployCodeRepositoryFullname",
    "Value": "yevgeniyilyin/sagemaker-ai-model-deploy-2"
  },
  {
    "Key": "CodeConnectionArn",
    "Value": "arn:aws:codeconnections:us-east-1:906545278380:connection/f76f091f-f02a-4390-8abd-184d1735ca3a"
  }
]



In [78]:
# create SageMaker project
r = sm.create_project(
    ProjectName=project_name,
    ProjectDescription="Model build and deploy project",
    ServiceCatalogProvisioningDetails={
        'ProductId': product_id,
        'ProvisioningArtifactId': provisioning_artifact_id,
        'ProvisioningParameters': project_parameters
    },
)

print(r)
project_id = r["ProjectId"]

{'ProjectArn': 'arn:aws:sagemaker:us-east-1:906545278380:project/mlops-02-14-11-03-33', 'ProjectId': 'p-ca7phmcraepa', 'ResponseMetadata': {'RequestId': '954c8ede-e181-4959-8ed0-f04946e002fb', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '954c8ede-e181-4959-8ed0-f04946e002fb', 'content-type': 'application/x-amz-json-1.1', 'content-length': '115', 'date': 'Fri, 14 Feb 2025 11:03:38 GMT'}, 'RetryAttempts': 0}}


<div class="alert alert-info">Wait until project creation is completed by running the next cell</div>




In [84]:
# Project creation takes about 3-5 min
while sm.describe_project(ProjectName=project_name)['ProjectStatus'] not in ['CreateCompleted', 'CreateFailed']:
    print("Waiting for project creation completion")
    sleep(10)
    
print(f"MLOps project {project_name} creation completed")

MLOps project mlops-02-14-11-03-33 creation completed




In [85]:
assert sm.describe_project(ProjectName=project_name)['ProjectStatus'] == 'CreateCompleted', 'Project status must be CreateCompleted!'
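The waiting loop above has no upper bound: if the project never reaches `CreateCompleted` or `CreateFailed`, the cell keeps polling. A minimal sketch of the same polling with a timeout follows. `wait_for_status` is a hypothetical helper, and the 15-minute default is an arbitrary assumption chosen to be well above the 3-5 minutes the notebook mentions.

```python
import time


def wait_for_status(get_status, done_states, timeout_s=900, poll_s=10, sleep=time.sleep):
    """Poll get_status() until it returns a value in done_states.

    Returns the final status, or raises TimeoutError after timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in done_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Status is still {status} after {timeout_s} s")
        print(f"Status: {status}, checking again in {poll_s} s")
        sleep(poll_s)


# Usage in this notebook (assumes `sm` and `project_name` from the cells above):
# final_status = wait_for_status(
#     lambda: sm.describe_project(ProjectName=project_name)["ProjectStatus"],
#     {"CreateCompleted", "CreateFailed"},
# )
# assert final_status == "CreateCompleted", "Project status must be CreateCompleted!"
```

The `sleep` parameter is injectable so the helper can be exercised without real waiting.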

### End of Option 1: Create project programmatically
Now you have provisioned a project template in your SageMaker environment. Navigate to the section **Configure the MLOps project**.

---

### Option 2: Create a project in Studio UI
<div class="alert alert-info"><b>Skip this section if you have already created a project programmatically</b></div>

<div class="alert alert-info">You need to complete the <b>Preparation</b> step before creating an MLOps project</div>

Follow the [instructions for **Step 2: Create the Project**](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-walkthrough-3rdgit.html#sagemaker-proejcts-walkthrough-create-3rdgit) in the Developer Guide to create an MLOps CI/CD project.

For the template choose the **Model building, training, and deployment with third-party Git repositories using CodePipeline**:

![](img/mlops-project-name.png)

Wait until the project is created.

### Resolve issues with project creation

#### Error messages
❗ If you see an error message similar to:
```
Your project couldn't be created
Studio encountered an error when creating your project. Try recreating the project again.

CodeBuild is not authorized to perform: sts:AssumeRole on arn:aws:iam::XXXX:role/service-role/AmazonSageMakerServiceCatalogProductsCodeBuildRole (Service: AWSCodeBuild; Status Code: 400; Error Code: InvalidInputException; Request ID: 4cf59a54-0c59-476a-a970-0ac656db4402; Proxy: null)
```

see steps 5-6 of [SageMaker Studio Permissions Required to Use Projects](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-studio-updates.html). Make sure you have all required project roles listed in the **Apps** card under **Projects**. 

Alternatively, you can create the required roles by using the provided CloudFormation template [`cfn-templates/sagemaker-project-templates-roles.yaml`](cfn-templates/sagemaker-project-templates-roles.yaml).
Run the following command from the root of your repository clone, in a command line terminal where you have the required IAM permissions:

```sh
aws cloudformation deploy \
    --template-file cfn-templates/sagemaker-project-templates-roles.yaml \
    --stack-name sagemaker-project-template-roles \
    --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
    --parameter-overrides \
    CreateCloudFormationRole=YES \
    CreateCodeBuildRole=YES \
    CreateCodePipelineRole=YES \
    CreateEventsRole=YES \
    CreateProductsExecutionRole=YES 
```

### End of Option 2: Create a project in Studio UI
Now that you have created the project, move to the section **Configure the MLOps project**.

---

## Configure the MLOps project
The provisioned MLOps project implements two CodePipeline pipelines: one for model building and one for model deployment. The project also populates the two GitHub repositories you specified with seed code. This notebook explains and configures the model building pipeline. The next notebook, `05-deploy`, covers the second part: the model deployment pipeline.

### Model building pipeline

The project provisions and runs a default model building pipeline automatically as soon as it has been created. This pipeline is a sample placeholder in the project for your own custom pipeline. Ignore the default pipeline for the moment.
The project template deploys the following resources in your AWS account:

![](img/mlops-model-build-train.png)

The main components are:
1. The project template is made available through SageMaker Projects and AWS Service Catalog portfolio
2. A CodePipeline pipeline with two stages - `Source` to download the source code from your GitHub repository and `Build` to create and execute a SageMaker pipeline
3. A default SageMaker pipeline with model build, train, and register workflow
4. A seed code repository in GitHub with a provided default version of a seed code

This project contains all the required code and infrastructure to implement an automated CI/CD pipeline from a predefined template.
To start using the project with your pipeline, you need to complete the following steps:
1. Clone the project GitHub repository to your notebook local storage
2. Replace the ML pipeline template sample code with your actual pipeline construction code, as implemented in the step 3 notebook
3. Modify the `codebuild-buildspec.yml` file to reference the correct Python module name and to set project parameters

The next sections guide you through these steps. For detailed instructions and a hands-on example, refer to [SageMaker MLOps Project Walkthrough](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-walkthrough-3rdgit.html) in the SageMaker Developer Guide.

If you used Option 1 (`boto3`) to create the MLOps project, `project_name` and `project_id` are already set. Run the following code cell to print their values. If you followed the UI instructions to create the project, you must set `project_name` and `model_build_code_repository_full_name` manually, and make sure `code_connection_arn` is set.

In [122]:
try:
    print(project_name)
    print(project_id)
    print(model_build_code_repository_full_name)
    print(code_connection_arn)
except NameError:
    print("++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
    print("Set the code_connection_arn, project_name, and repository full name in the following code cell")
    print("++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")

mlops-02-14-11-03-33
p-ca7phmcraepa
yevgeniyilyin/sagemaker-ai-model-build-2
arn:aws:codeconnections:us-east-1:906545278380:connection/f76f091f-f02a-4390-8abd-184d1735ca3a


In [187]:
# project_name = "<SET TO THE NAME OF THE CREATED PROJECT>" # Keep commented out if you used option 1 to create a project
# model_build_code_repository_full_name = "<SET TO THE FULL NAME OF MODEL BUILD REPO>" # Keep commented out if you used option 1 to create a project

r = sm.describe_project(ProjectName=project_name)
project_id = r['ProjectId']
project_arn = r['ProjectArn']
repository_name = model_build_code_repository_full_name.split('/')[1]
git_folder = project_name
project_folder = f'sagemaker-{project_id}-modelbuild'
project_path = f'{git_folder}/{project_folder}'

%store project_name
%store project_arn
%store project_id
    
print(f"Project path: {project_path}")

Stored 'project_name' (str)
Stored 'project_arn' (str)
Stored 'project_id' (str)
Project path: mlops-02-14-11-03-33/sagemaker-p-ca7phmcraepa-modelbuild


### Explore the project in the Studio UI

Click on the link constructed by the following code cell. Explore the project and project components in the Studio UI.

In [105]:
# Show the project link
display(
    HTML('<b>See <a target="top" href="https://studio-{}.studio.{}.sagemaker.aws/projects/{}/">the project</a> in the Studio UI</b>'.format(
            domain_id, region, project_name))
)

### 1. Clone the project seed code to the JupyterLab file system
You need to clone the project code from the GitHub repository by using a JupyterLab terminal and Git.

1. Open a new terminal window via **File** > **New** > **Terminal**
2. Copy the output of the following cell and paste it into the terminal command line
3. The pasted commands configure Git to use the AWS CLI credential helper and then clone your model build repository with `git clone` through the AWS CodeConnections HTTPS endpoint

In [158]:
def copy_output(text):
    return HTML(f'''
        <div style="position: relative;">
            <pre>{text}</pre>
            <button onclick="navigator.clipboard.writeText(`{text}`)"
                    style="position: absolute; top: 5px; right: 5px;">
                Copy
            </button>
        </div>
    ''')
    
cmd = f'''
git config --global credential.UseHttpPath true
git config --global credential.helper 'cache --timeout=720000'
git config --global credential.helper '!aws codecommit credential-helper $@'

mkdir -p $HOME/{project_path}

git clone https://codeconnections.{region}.amazonaws.com/git-http/{code_connection_arn.split(':')[4]}/{region}/{code_connection_arn.split(':')[-1].split('/')[-1]}/{model_build_code_repository_full_name}.git  $HOME/{project_path}
'''

copy_output(cmd)
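The clone URL in the previous cell is assembled from fields of the connection ARN: the account ID (the fifth `:`-separated field) and the connection ID (the part after `connection/`). The sketch below wraps the same assembly in a standalone function, mirroring the f-string in that cell. `codeconnections_clone_url` is a hypothetical name, and by default it takes the region from the ARN itself, assuming the connection lives in the same region as your notebook.

```python
def codeconnections_clone_url(connection_arn, repo_full_name, region=None):
    """Build the git-http clone URL that the previous cell constructs inline.

    ARN layout: arn:<partition>:<service>:<region>:<account-id>:connection/<connection-id>
    """
    fields = connection_arn.split(":")
    account_id = fields[4]
    connection_id = fields[5].split("/")[-1]
    region = region or fields[3]  # default: the region encoded in the ARN
    return (
        f"https://codeconnections.{region}.amazonaws.com/git-http/"
        f"{account_id}/{region}/{connection_id}/{repo_full_name}.git"
    )
```

For example, `codeconnections_clone_url(code_connection_arn, model_build_code_repository_full_name)` should produce the same URL as the `git clone` line above when your notebook runs in the connection's region.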

### 2. Replace pipeline construction code

The following steps customize the seed code in the project. The next code cell performs them, so you don't need to do anything manually. This list is for your information only.

- The seed source code is in the folder `<project_name>/sagemaker-<project-id>-modelbuild`.
- The original files `codebuild-buildspec.yml` and `setup.py` are renamed to `codebuild-buildspec-original.yml` and `setup-original.py`.
- The folder containing the pipeline code in the MLOps project's code repository is renamed from `abalone` to `fromideatoprod`.
- The original template pipeline file `pipeline.py` is renamed to `pipeline-original.py`.
- The `pipeline_steps` Python modules are copied to the `pipelines` folder in the MLOps project's code repository folder (and to a local `pipelines` folder in the workshop folder).
- The `requirements.txt` file created in notebook 3 is copied to the root of the MLOps project's code repository folder.
- The SageMaker Python SDK default configuration file `config.yaml` from notebook 3 is copied to the root of the MLOps project's code repository folder.
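If you prefer Python over shell commands, the file operations performed by the `mkdir`/`mv`/`cp` cell that follows can be sketched with `pathlib` and `shutil`. This is an illustrative alternative, not the workshop's code: `customize_seed_code` is a hypothetical function, and it assumes the same folder layout as that cell. Unlike the shell version, it skips renames that were already applied and copies only regular files, so re-running it neither fails on the missing `abalone` folder nor warns about `__pycache__`.

```python
import shutil
from pathlib import Path


def customize_seed_code(project_dir, workshop_dir):
    """Apply the seed-code changes made by the shell cell; safe to re-run."""
    project_dir, workshop_dir = Path(project_dir), Path(workshop_dir)
    (workshop_dir / "pipelines").mkdir(parents=True, exist_ok=True)

    # Keep the template originals under new names; skip renames already done
    renames = [
        ("codebuild-buildspec.yml", "codebuild-buildspec-original.yml"),
        ("setup.py", "setup-original.py"),
        ("pipelines/abalone", "pipelines/fromideatoprod"),
        ("pipelines/fromideatoprod/pipeline.py", "pipelines/fromideatoprod/pipeline-original.py"),
    ]
    for src, dst in renames:
        if (project_dir / src).exists() and not (project_dir / dst).exists():
            (project_dir / src).rename(project_dir / dst)

    # Copy only regular files, which skips __pycache__ and other folders
    for f in (workshop_dir / "pipeline_steps").iterdir():
        if f.is_file():
            shutil.copy2(f, project_dir / "pipelines" / f.name)
            shutil.copy2(f, workshop_dir / "pipelines" / f.name)

    # requirements.txt and config.yaml go to the repository root, as in the shell cell
    for name in ("requirements.txt", "config.yaml"):
        shutil.copy2(workshop_dir / name, project_dir / name)


# Usage in this notebook (assumes project_path and workshop_folder from the cells above):
# customize_seed_code(Path.home() / project_path, Path.home() / workshop_folder)
```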

In [161]:
# see the workshop folder name
!pwd

/home/sagemaker-user/amazon-sagemaker-from-idea-to-production


In [162]:
# if your local path for the workshop folder is different, set workshop_folder to its path relative to your home directory
workshop_folder = "amazon-sagemaker-from-idea-to-production"

In [194]:
!mkdir -p ~/{workshop_folder}/pipelines
!mv ~/{project_path}/codebuild-buildspec.yml ~/{project_path}/codebuild-buildspec-original.yml
!mv ~/{project_path}/setup.py ~/{project_path}/setup-original.py
!mv ~/{project_path}/pipelines/abalone ~/{project_path}/pipelines/fromideatoprod
!mv ~/{project_path}/pipelines/fromideatoprod/pipeline.py ~/{project_path}/pipelines/fromideatoprod/pipeline-original.py
!cp ~/{workshop_folder}/pipeline_steps/* ~/{project_path}/pipelines/
!cp ~/{workshop_folder}/pipeline_steps/* ~/{workshop_folder}/pipelines/
!cp ~/{workshop_folder}/requirements.txt ~/{project_path}
!cp ~/{workshop_folder}/config.yaml ~/{project_path}

mv: cannot stat '/home/sagemaker-user/mlops-02-14-11-03-33/sagemaker-p-ca7phmcraepa-modelbuild/pipelines/abalone': No such file or directory
cp: -r not specified; omitting directory '/home/sagemaker-user/amazon-sagemaker-from-idea-to-production/pipeline_steps/__pycache__'
cp: -r not specified; omitting directory '/home/sagemaker-user/amazon-sagemaker-from-idea-to-production/pipeline_steps/__pycache__'


Execute the following cell to write the pipeline construction code to the file `pipeline.py`. The code reuses the pipeline definition from the step 3 notebook, wrapped in the function `get_pipeline()`.

<div style="border: 4px solid coral; text-align: center; margin: auto;">
    <p style=" text-align: center; margin: auto;">The pipeline construction code works with either a raw input dataset in S3 or a feature set from the Feature Store. Pass the corresponding input parameter to the pipeline.
    </p>
</div>
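The choice between the two inputs happens inside `get_pipeline()` in the following cell: it returns `None` when neither `input_s3_url` nor `feature_group_name` is set, and otherwise builds the first step from `preprocess` if `input_s3_url` is given, else from `prepare_datasets`. The standalone sketch below mirrors only that decision, returning the step names used in the code (`preprocess` and `extract-featureset`). `select_input_source` is a hypothetical helper, not part of the pipeline module. Note that if you pass both inputs, the S3 dataset takes precedence.

```python
def select_input_source(input_s3_url=None, feature_group_name=None):
    """Mirror how get_pipeline() chooses its first step.

    Returns "preprocess" for a raw S3 dataset, "extract-featureset" for a
    Feature Store feature group, or None when neither input is given.
    """
    if feature_group_name is None and input_s3_url is None:
        return None
    # get_pipeline() checks input_s3_url first, so it wins if both are set
    return "preprocess" if input_s3_url else "extract-featureset"
```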

In [195]:
%%writefile pipeline.py

import pandas as pd
import json
import boto3
import pathlib
import io
import os
import sagemaker
import mlflow
from time import gmtime, strftime, sleep
from sagemaker.deserializers import CSVDeserializer
from sagemaker.serializers import CSVSerializer

from sagemaker.workflow.execution_variables import ExecutionVariables
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.xgboost.estimator import XGBoost
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import (
    ProcessingInput, 
    ProcessingOutput, 
    ScriptProcessor
)
from sagemaker.inputs import TrainingInput

from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import (
    ProcessingStep, 
    TrainingStep, 
    CreateModelStep,
    CacheConfig
)
from sagemaker.workflow.check_job_config import CheckJobConfig
from sagemaker.workflow.parameters import (
    ParameterInteger, 
    ParameterFloat, 
    ParameterString, 
    ParameterBoolean
)
from sagemaker.workflow.quality_check_step import (
    DataQualityCheckConfig,
    ModelQualityCheckConfig,
    QualityCheckStep,
)
from sagemaker.workflow.clarify_check_step import (
    ModelBiasCheckConfig, 
    ClarifyCheckStep, 
    ModelExplainabilityCheckConfig
)
from sagemaker import Model
from sagemaker.inputs import CreateModelInput
from sagemaker.workflow.model_step import ModelStep
from sagemaker.workflow.fail_step import FailStep
from sagemaker.workflow.conditions import (
    ConditionGreaterThan,
    ConditionGreaterThanOrEqualTo
)
from sagemaker.workflow.parallelism_config import ParallelismConfiguration
from sagemaker.workflow.properties import PropertyFile
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.functions import (
    Join,
    JsonGet
)
from sagemaker.workflow.lambda_step import (
    LambdaStep,
    LambdaOutput,
    LambdaOutputTypeEnum,
)
from sagemaker.lambda_helper import Lambda

from sagemaker.model_metrics import (
    MetricsSource, 
    ModelMetrics, 
    FileSource
)
from sagemaker.drift_check_baselines import DriftCheckBaselines
from sagemaker.workflow.pipeline_definition_config import PipelineDefinitionConfig 
from sagemaker.image_uris import retrieve
from sagemaker.workflow.function_step import step
from sagemaker.workflow.step_outputs import get_step
from sagemaker.model_monitor import DatasetFormat, model_monitoring

from pipelines.preprocess import preprocess
from pipelines.evaluate import evaluate
from pipelines.register import register
from pipelines.extract import prepare_datasets

def get_sagemaker_client(region):
    return boto3.Session(region_name=region).client("sagemaker")

def get_pipeline_session(region, bucket_name):
    """Gets the pipeline session based on the region.

    Args:
        region: the aws region to start the session
        bucket_name: the bucket to use for storing the artifacts

    Returns:
        PipelineSession instance
    """

    boto_session = boto3.Session(region_name=region)
    sagemaker_client = boto_session.client("sagemaker")

    return PipelineSession(
        boto_session=boto_session,
        sagemaker_client=sagemaker_client,
        default_bucket=bucket_name,
    )

def get_pipeline_custom_tags(new_tags, region, sagemaker_project_name=None):
    try:
        print(f"Getting project tags for {sagemaker_project_name}")
        
        sm_client = get_sagemaker_client(region)
        
        project_arn = sm_client.describe_project(ProjectName=sagemaker_project_name)['ProjectArn']
        project_tags = sm_client.list_tags(ResourceArn=project_arn)['Tags']

        print(f"Project tags: {project_tags}")
        
        for project_tag in project_tags:
            new_tags.append(project_tag)
            
    except Exception as e:
        print(f"Error getting project tags: {e}")
        
    return new_tags
    
def get_pipeline(
    region,
    sagemaker_project_id=None,
    sagemaker_project_name=None,
    role=None,
    bucket_name=None,
    bucket_prefix="from-idea-to-prod/xgboost",
    input_s3_url=None,
    feature_group_name=None,
    model_package_group_name="from-idea-to-prod-model-group",
    pipeline_name_prefix="from-idea-to-prod-pipeline",
    process_instance_type="ml.m5.large",
    train_instance_type="ml.m5.xlarge",
    test_score_threshold=0.70,
    tracking_server_arn=None,
):
    """Gets a SageMaker ML Pipeline instance.
    
    Returns:
        an instance of a pipeline
    """
    if feature_group_name is None and input_s3_url is None:
        print("One of feature_group_name or input_s3_url must be provided. Exiting...")
        return None

    session = get_pipeline_session(region, bucket_name)
    sm = session.sagemaker_client
    
    if role is None:
        role = sagemaker.session.get_execution_role(session)

    print(f"sagemaker version: {sagemaker.__version__}")
    print(f"Execution role: {role}")
    print(f"Input S3 URL: {input_s3_url}")
    print(f"Feature group: {feature_group_name}")
    print(f"Model package group: {model_package_group_name}")
    print(f"Pipeline name prefix: {pipeline_name_prefix}")
    print(f"Tracking server ARN: {tracking_server_arn}")
    
    pipeline_name = f"{pipeline_name_prefix}-{sagemaker_project_id}"
    experiment_name = pipeline_name

    output_s3_prefix = f"s3://{bucket_name}/{bucket_prefix}"
    # Set the output S3 url for model artifact
    output_s3_url = f"{output_s3_prefix}/output"
    # Set the output S3 url for feature store query results
    output_query_location = f'{output_s3_prefix}/offline-store/query_results'
    
    # Set the output S3 urls for processed data
    train_s3_url = f"{output_s3_prefix}/train"
    validation_s3_url = f"{output_s3_prefix}/validation"
    test_s3_url = f"{output_s3_prefix}/test"
    evaluation_s3_url = f"{output_s3_prefix}/evaluation"
    
    baseline_s3_url = f"{output_s3_prefix}/baseline"
    prediction_baseline_s3_url = f"{output_s3_prefix}/prediction_baseline"
    
    xgboost_image_uri = sagemaker.image_uris.retrieve(
            "xgboost", 
            region=region,
            version="1.5-1"
    )

    # If no tracking server ARN, try to find an active MLflow server
    if tracking_server_arn is None:
        r = sm.list_mlflow_tracking_servers(
            TrackingServerStatus='Created',
        )['TrackingServerSummaries']
    
        if len(r) < 1:
            print("You don't have any running MLflow servers. Exiting...")
            return None
        else:
            tracking_server_arn = r[0]['TrackingServerArn']
            print(f"Use the tracking server ARN: {tracking_server_arn}")
        
    # Parameters for pipeline execution
    
    # Set processing instance type
    process_instance_type_param = ParameterString(
        name="ProcessingInstanceType",
        default_value=process_instance_type,
    )

    # Set training instance type
    train_instance_type_param = ParameterString(
        name="TrainingInstanceType",
        default_value=train_instance_type,
    )

    # Set model approval param
    model_approval_status_param = ParameterString(
        name="ModelApprovalStatus",
        default_value="PendingManualApproval"
    )

    # Minimal threshold for model performance on the test dataset
    test_score_threshold_param = ParameterFloat(
        name="TestScoreThreshold", 
        default_value=test_score_threshold
    )

    # S3 url for the input dataset
    input_s3_url_param = ParameterString(
        name="InputDataUrl",
        default_value=input_s3_url if input_s3_url else "None",
    )

    # Feature group name for the input featureset
    feature_group_name_param = ParameterString(
        name="FeatureGroupName",
        default_value=feature_group_name if feature_group_name else "None",
    )
    
    # Model package group name
    model_package_group_name_param = ParameterString(
        name="ModelPackageGroupName",
        default_value=model_package_group_name,
    )

    # MLflow tracking server ARN
    tracking_server_arn_param = ParameterString(
        name="TrackingServerARN",
        default_value=tracking_server_arn,
    )
    
    # Define step cache config
    cache_config = CacheConfig(
        enable_caching=True,
        expire_after="P30d" # 30-day
    )

    # Construct the pipeline
    
    # Get datasets
    step_get_datasets = step(
            preprocess, 
            role=role,
            instance_type=process_instance_type_param,
            name=f"preprocess",
            keep_alive_period_in_seconds=3600,
    )(
        input_data_s3_path=input_s3_url_param,
        output_s3_prefix=output_s3_prefix,
        tracking_server_arn=tracking_server_arn_param,
        experiment_name=experiment_name,
        pipeline_run_name=ExecutionVariables.PIPELINE_EXECUTION_ID,
    ) if input_s3_url else step(
        prepare_datasets, 
        role=role,
        instance_type=process_instance_type_param,
        name=f"extract-featureset",
        keep_alive_period_in_seconds=3600,
    )(
        feature_group_name=feature_group_name_param,
        output_s3_prefix=output_s3_prefix,
        query_output_s3_path=output_query_location,
        tracking_server_arn=tracking_server_arn_param,
        experiment_name=experiment_name,
        pipeline_run_name=ExecutionVariables.PIPELINE_EXECUTION_ID,
    )
    
    # Instantiate an XGBoost estimator object
    estimator = sagemaker.estimator.Estimator(
        image_uri=xgboost_image_uri,
        role=role, 
        instance_type=train_instance_type_param,
        instance_count=1,
        output_path=output_s3_url,
        sagemaker_session=session,
        base_job_name=f"{pipeline_name}-train"
    )
    
    # Define algorithm hyperparameters
    estimator.set_hyperparameters(
        num_round=100, # the number of rounds to run the training
        max_depth=3, # maximum depth of a tree
        eta=0.5, # step size shrinkage used in updates to prevent overfitting
        alpha=2.5, # L1 regularization term on weights
        objective="binary:logistic",
        eval_metric="auc", # evaluation metrics for validation data
        subsample=0.8, # subsample ratio of the training instance
        colsample_bytree=0.8, # subsample ratio of columns when constructing each tree
        min_child_weight=3, # minimum sum of instance weight (hessian) needed in a child
        early_stopping_rounds=10, # the model trains until the validation score stops improving
        verbosity=1, # verbosity of printing messages
    )
    
    # train step
    step_train = TrainingStep(
        name=f"train",
        step_args=estimator.fit(
            {
                "train": TrainingInput(
                    step_get_datasets['train_data'],
                    content_type="text/csv",
                ),
                "validation": TrainingInput(
                    step_get_datasets['validation_data'],
                    content_type="text/csv",
                ),
            }
        ),
        cache_config=cache_config,
    )   
    
    # Evaluation step
    step_evaluate = step(
        evaluate,
        role=role,
        instance_type=process_instance_type_param,
        name=f"evaluate",
        keep_alive_period_in_seconds=3600,
    )(
        test_x_data_s3_path=step_get_datasets['test_x_data'],
        test_y_data_s3_path=step_get_datasets['test_y_data'],
        model_s3_path=step_train.properties.ModelArtifacts.S3ModelArtifacts,
        output_s3_prefix=output_s3_prefix,
        tracking_server_arn=tracking_server_arn_param,
        experiment_name=step_get_datasets['experiment_name'],
        pipeline_run_id=step_get_datasets['pipeline_run_id'],
    )

    # register model step
    step_register = step(
        register,
        role=role,
        instance_type=process_instance_type_param,
        name=f"register",
        keep_alive_period_in_seconds=3600,
    )(
        training_job_name=step_train.properties.TrainingJobName,
        model_package_group_name=model_package_group_name_param,
        model_approval_status=model_approval_status_param,
        evaluation_result=step_evaluate['evaluation_result'],
        output_s3_prefix=output_s3_url,
        tracking_server_arn=tracking_server_arn_param,
        experiment_name=step_get_datasets['experiment_name'],
        pipeline_run_id=step_get_datasets['pipeline_run_id'],
    )

    # fail the pipeline execution step
    step_fail = FailStep(
        name=f"fail",
        error_message=Join(on=" ", values=["Execution failed due to AUC Score <", test_score_threshold_param]),
    )
    
    # condition to check in the condition step
    condition_gte = ConditionGreaterThanOrEqualTo(
            left=step_evaluate['evaluation_result']['classification_metrics']['auc_score']['value'],  
            right=test_score_threshold_param,
    )
    
    # conditional register step
    step_conditional_register = ConditionStep(
        name=f"check-metrics",
        conditions=[condition_gte],
        if_steps=[step_register],
        else_steps=[step_fail],
    )   

    # Create a pipeline object
    pipeline = Pipeline(
        name=f"{pipeline_name}",
        parameters=[
            input_s3_url_param,
            feature_group_name_param,
            process_instance_type_param,
            train_instance_type_param,
            model_approval_status_param,
            test_score_threshold_param,
            model_package_group_name_param,
            tracking_server_arn_param,
        ],
        steps=[step_conditional_register],
        pipeline_definition_config=PipelineDefinitionConfig(use_custom_job_prefix=True)
    )
    
    return pipeline

Overwriting pipeline.py
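The `check-metrics` condition step in `pipeline.py` gates model registration: the `register` step runs only when the AUC score returned by `evaluate` is greater than or equal to `TestScoreThreshold`; otherwise the `fail` step stops the execution. The following plain-Python sketch mirrors that branching so you can reason about it without running the pipeline. The function name `next_step` is hypothetical, and the shape of the `evaluation_result` dictionary is assumed from the property path `['classification_metrics']['auc_score']['value']` used in `pipeline.py`. At run time the decision is made by SageMaker Pipelines, not by this code.

```python
def next_step(evaluation_result: dict, test_score_threshold: float) -> str:
    """Return the name of the branch the condition step would take.

    Mirrors ConditionGreaterThanOrEqualTo(left=auc, right=threshold):
    'register' if auc >= threshold, otherwise 'fail'.
    """
    # Same lookup path as step_evaluate['evaluation_result'][...] in pipeline.py
    auc = evaluation_result["classification_metrics"]["auc_score"]["value"]
    return "register" if auc >= test_score_threshold else "fail"


# Hypothetical evaluation result with an AUC of 0.78
result = {"classification_metrics": {"auc_score": {"value": 0.78}}}
print(next_step(result, 0.70))  # register
print(next_step(result, 0.80))  # fail
```

Because the comparison is "greater than or equal", a model whose AUC exactly equals the threshold is still registered.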


Copy `pipeline.py` from the workshop folder to the `pipelines/fromideatoprod` folder in the project's code repository:

In [196]:
!cp ~/{workshop_folder}/pipeline.py ~/{project_path}/pipelines/fromideatoprod/

Test the `get_pipeline` function locally to make sure everything works before running it remotely.

In [197]:
from pipeline import get_pipeline

In [None]:
# If you created a feature group in notebook 3, you can set the feature_group_name parameter instead of input_s3_url to read the data from the feature store
p = get_pipeline(
    region=region,
    sagemaker_project_id=project_id,
    sagemaker_project_name=project_name,
    role=sm_role,
    bucket_name=bucket_name,
    bucket_prefix=bucket_prefix,
    input_s3_url=input_s3_url,
    # feature_group_name=dataset_feature_group_name,
    model_package_group_name=model_package_group_name,
    pipeline_name_prefix=pipeline_name,
    process_instance_type="ml.m5.large",
    train_instance_type="ml.m5.xlarge",
    test_score_threshold=0.70,
    tracking_server_arn=mlflow_arn,
)

In [None]:
p.definition()
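`p.definition()` returns the pipeline definition as a JSON string, which can be long. The following sketch is one way to review it quickly by listing the pipeline parameters and every step, including the steps nested inside the `check-metrics` condition. It assumes the definition has top-level `Parameters` and `Steps` lists, that each step has `Name` and `Type` keys, and that condition steps keep their branches under `Arguments.IfSteps` and `Arguments.ElseSteps`. The helper name `summarize_definition` is hypothetical.

```python
import json


def summarize_definition(definition_json: str) -> dict:
    """Collect parameter names and (step name, step type) pairs from a pipeline definition."""
    definition = json.loads(definition_json)
    parameters = [param["Name"] for param in definition.get("Parameters", [])]
    steps = []

    def walk(step_list):
        for step_def in step_list:
            steps.append((step_def["Name"], step_def["Type"]))
            arguments = step_def.get("Arguments")
            # Assumption: condition steps nest their branches under Arguments.IfSteps/ElseSteps
            if step_def["Type"] == "Condition" and isinstance(arguments, dict):
                walk(arguments.get("IfSteps", []))
                walk(arguments.get("ElseSteps", []))

    walk(definition.get("Steps", []))
    return {"parameters": parameters, "steps": steps}


# In the notebook (hypothetical usage):
# summary = summarize_definition(p.definition())
# print(summary["parameters"])
# for name, step_type in summary["steps"]:
#     print(f"{name}: {step_type}")
```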

In [None]:
p.upsert(role_arn=sm_role)

To see the created pipeline in the Studio UI, click on the link constructed by the code cell below:

In [201]:
from IPython.display import HTML

# Show the pipeline link
display(
    HTML('<b>See <a target="top" href="https://studio-{}.studio.{}.sagemaker.aws/pipelines/{}/graph">the pipeline</a> in the Studio UI</b>'.format(
            domain_id, region, p.describe()['PipelineName']))
)

At this point, you have tested locally that the pipeline construction code works and creates a pipeline. You can see this pipeline in the Studio **Pipelines** widget. Now you are ready to create a CI/CD pipeline.

#### Control project ownership with resource tags
Project-owned resources are automatically tagged with `sagemaker:project-name` and `sagemaker:project-id` tags for cost control, attribute-based security control, and governance. 

If you need to attach an existing repository, pipeline, model registry group, or endpoint to the project, you can add these two tags with the project name and id to the resource.

For example, the next code cell adds the `sagemaker:*` tags to the model package group you created in the previous notebook. As a result, the model package group becomes visible as a project resource.

In [202]:
from botocore.exceptions import ClientError

# describe_model_package_group raises a ClientError if the group doesn't exist
try:
    model_package_group_arn = sm.describe_model_package_group(
        ModelPackageGroupName=model_package_group_name
    )["ModelPackageGroupArn"]
except ClientError:
    model_package_group_arn = None

if model_package_group_arn:
    print(f"Adding tags {project_arn.split('/')[-1]} and {project_id} for model package group {model_package_group_arn}")
    r = sm.add_tags(
        ResourceArn=model_package_group_arn,
        Tags=[
            {
                'Key': 'sagemaker:project-name',
                'Value': project_arn.split("/")[-1]
            },
            {
                'Key': 'sagemaker:project-id',
                'Value': project_id
            },
        ]
    )
    print(r)
else:
    print(f"The model package group {model_package_group_name} doesn't exist")

# Show the tags attached to the model package group (empty list if the group doesn't exist)
sm.list_tags(ResourceArn=model_package_group_arn)["Tags"] if model_package_group_arn else []

Adding tags mlops-02-14-11-03-33 and p-ca7phmcraepa for model package group arn:aws:sagemaker:us-east-1:906545278380:model-package-group/from-idea-to-prod-pipeline-model-12-06-30-22
{'Tags': [{'Key': 'sagemaker:project-name', 'Value': 'mlops-02-14-11-03-33'}, {'Key': 'sagemaker:project-id', 'Value': 'p-ca7phmcraepa'}], 'ResponseMetadata': {'RequestId': 'be9be1bf-da7a-4cdf-944f-0ae71254b32a', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'be9be1bf-da7a-4cdf-944f-0ae71254b32a', 'content-type': 'application/x-amz-json-1.1', 'content-length': '130', 'date': 'Sun, 16 Feb 2025 21:18:54 GMT'}, 'RetryAttempts': 0}}


[{'Key': 'sagemaker:project-name', 'Value': 'mlops-02-14-11-03-33'},
 {'Key': 'sagemaker:project-id', 'Value': 'p-ca7phmcraepa'}]
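To attach other existing resources to the project, or to check whether a resource already belongs to it, you can build and compare the two tags in plain Python. The following sketch uses the same tag keys as the cell above; the helper names `project_tags` and `is_project_resource` are hypothetical.

```python
def project_tags(project_name: str, project_id: str) -> list:
    """Build the two tags that associate a resource with a SageMaker project."""
    return [
        {"Key": "sagemaker:project-name", "Value": project_name},
        {"Key": "sagemaker:project-id", "Value": project_id},
    ]


def is_project_resource(tags: list, project_name: str, project_id: str) -> bool:
    """Return True if the tag list carries both project tags with the expected values."""
    tag_map = {tag["Key"]: tag["Value"] for tag in tags}
    return (
        tag_map.get("sagemaker:project-name") == project_name
        and tag_map.get("sagemaker:project-id") == project_id
    )


# Hypothetical usage in this notebook:
# sm.add_tags(ResourceArn=some_resource_arn, Tags=project_tags(project_arn.split("/")[-1], project_id))
# is_project_resource(
#     sm.list_tags(ResourceArn=model_package_group_arn)["Tags"],
#     project_arn.split("/")[-1],
#     project_id,
# )
```

After the previous cell runs successfully, the check should return `True` for the model package group.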

### 3. Modify the build specification file
Now modify the `codebuild-buildspec.yml` file in the project folder to reflect the new name of the Python module with your pipeline and set other project-specific parameters.

You need to pass the following parameters to the pipeline creation script. They correspond to the parameters of the `get_pipeline` function you've just created:
- `input_s3_url`: an S3 URL for the raw input dataset. If you created a feature group in notebook 3, you can use the `feature_group_name` parameter instead
- `feature_group_name`: the name of the feature group you created in notebook 3. If you use this parameter, you don't need to provide `input_s3_url`
- `model_package_group_name`: the model package group in the model registry where the trained model is registered
- `pipeline_name_prefix`: a name prefix for the pipeline. The pipeline name is constructed as `<pipeline_name_prefix>-<project-id>`
- `role`: the pipeline execution role
- `tracking_server_arn`: the MLflow tracking server ARN for pipeline execution tracking

The following cell prints the values of these parameters:

In [203]:
try:
    print(f"""
        INPUT-S3-URL: {input_s3_url}
        FEATURE-GROUP-NAME: {dataset_feature_group_name}
        MODEL-PACKAGE-GROUP-NAME: {project_name}-{project_id}
        PIPELINE-NAME-PREFIX: {pipeline_name}
        ROLE: {sm_role}
        TRACKING-SERVER-ARN: {mlflow_arn}
        """)
except NameError:
    print(f"""
        Dataset feature group name is not defined, use input_s3_url instead:
        ********************************************************************
        
        INPUT-S3-URL: {input_s3_url}
        MODEL-PACKAGE-GROUP-NAME: {project_name}-{project_id}
        PIPELINE-NAME-PREFIX: {pipeline_name}
        ROLE: {sm_role}
        TRACKING-SERVER-ARN: {mlflow_arn}
        """)


        INPUT-S3-URL: s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/input/bank-additional-full.csv
        FEATURE-GROUP-NAME: from-idea-to-prod-12-06-30-53
        MODEL-PACKAGE-GROUP-NAME: mlops-02-14-11-03-33-p-ca7phmcraepa
        PIPELINE-NAME-PREFIX: from-idea-to-prod-pipeline-12-06-30-22
        ROLE: arn:aws:iam::906545278380:role/mlops-workshop-domain-SageMakerExecutionRole-rIgas55nwmQD
        TRACKING-SERVER-ARN: arn:aws:sagemaker:us-east-1:906545278380:mlflow-tracking-server/test-mlflow
        


Now replace the values of these parameters in the following code cell with the values printed by the cell above.

To do this, locate the `--kwargs` argument in the following code cell, which starts with `%%writefile codebuild-buildspec.yml`:

```
--kwargs "{ \
    \"input_s3_url\":\"<INPUT-S3-URL>\", \
    \"feature_group_name\":\"<FEATURE-GROUP-NAME>\", \
    \"model_package_group_name\":\"<MODEL-PACKAGE-GROUP-NAME>\", \
    \"pipeline_name_prefix\":\"<PIPELINE-NAME-PREFIX>\", \
    \"role\":\"<SAGEMAKER-EXECUTION-ROLE-ARN>\", \
    \"tracking_server_arn\":\"<TRACKING-SERVER-ARN>\", \
    \"region\":\"${AWS_REGION}\", \
    \"sagemaker_project_name\":\"${SAGEMAKER_PROJECT_NAME}\", \
    \"sagemaker_project_id\":\"${SAGEMAKER_PROJECT_ID}\", \
    \"bucket_name\":\"${ARTIFACT_BUCKET}\" \
        }"
```

and replace the values of `input_s3_url` OR `feature_group_name`, `model_package_group_name`, `pipeline_name_prefix`, `role`, and `tracking_server_arn` parameters with the values printed by the previous cells.

Replace only one of `input_s3_url` or `feature_group_name`, depending on which dataset input method you'd like to use: a raw input dataset from S3 or the processed featureset from the feature store. You can use the feature store only if you created it in the previous notebook.

<div class="alert alert-info">Delete the line with the parameter you don't use, either <code>input_s3_url</code> or <code>feature_group_name</code>, from the cell code.</div>
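Editing escaped JSON by hand is error-prone. As an alternative, the following sketch builds the `--kwargs` JSON object in Python and enforces the rule stated in the text: exactly one of `input_s3_url` or `feature_group_name` is set. The `${...}` values are kept as literal strings so that the shell in CodeBuild expands them, as in the build spec template. The function name `build_kwargs` is hypothetical and is not part of the project template.

```python
import json


def build_kwargs(model_package_group_name, pipeline_name_prefix, role,
                 tracking_server_arn, input_s3_url=None, feature_group_name=None):
    """Return the JSON object passed to run-pipeline via --kwargs."""
    # Exactly one dataset source must be set
    if bool(input_s3_url) == bool(feature_group_name):
        raise ValueError("Set exactly one of input_s3_url or feature_group_name")
    kwargs = {
        "model_package_group_name": model_package_group_name,
        "pipeline_name_prefix": pipeline_name_prefix,
        "role": role,
        "tracking_server_arn": tracking_server_arn,
        # Literal ${...} placeholders; the CodeBuild shell expands them at build time
        "region": "${AWS_REGION}",
        "sagemaker_project_name": "${SAGEMAKER_PROJECT_NAME}",
        "sagemaker_project_id": "${SAGEMAKER_PROJECT_ID}",
        "bucket_name": "${ARTIFACT_BUCKET}",
    }
    if input_s3_url:
        kwargs["input_s3_url"] = input_s3_url
    else:
        kwargs["feature_group_name"] = feature_group_name
    return json.dumps(kwargs)


# Hypothetical usage with the notebook variables printed above.
# Escape the double quotes because the value sits inside a double-quoted shell string:
# print(build_kwargs(f"{project_name}-{project_id}", pipeline_name, sm_role, mlflow_arn,
#                    input_s3_url=input_s3_url).replace('"', '\\"'))
```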

![](img/codebuild-buildspec-edit.png)

After you replace the parameter values, execute the cell to write the build spec file.

In [204]:
%%writefile codebuild-buildspec.yml

version: 0.2

phases:
  install:
    runtime-versions:
      python: "3.10"
    commands:
      - pip install --upgrade --force-reinstall . "awscli>1.20.30"
      - pip install --upgrade mlflow sagemaker-mlflow s3fs xgboost
    
  build:
    commands:
      - export SAGEMAKER_USER_CONFIG_OVERRIDE="./config.yaml"
      - export PYTHONUNBUFFERED=TRUE
      - export SAGEMAKER_PROJECT_NAME_ID="${SAGEMAKER_PROJECT_NAME}-${SAGEMAKER_PROJECT_ID}"
      - |
        run-pipeline --module-name pipelines.fromideatoprod.pipeline \
          --role-arn $SAGEMAKER_PIPELINE_ROLE_ARN \
          --tags "[{\"Key\":\"sagemaker:project-name\",\"Value\":\"${SAGEMAKER_PROJECT_NAME}\"}, {\"Key\":\"sagemaker:project-id\", \"Value\":\"${SAGEMAKER_PROJECT_ID}\"}]" \
          --kwargs "{ \
                \"input_s3_url\":\"s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/input/bank-additional-full.csv\", \
                \"model_package_group_name\":\"mlops-02-14-11-03-33-p-ca7phmcraepa\",\
                \"pipeline_name_prefix\":\"from-idea-to-prod-pipeline-12-06-30-22\",\
                \"role\":\"arn:aws:iam::906545278380:role/mlops-workshop-domain-SageMakerExecutionRole-rIgas55nwmQD\",\
                \"tracking_server_arn\":\"arn:aws:sagemaker:us-east-1:906545278380:mlflow-tracking-server/test-mlflow\", \
                \"region\":\"${AWS_REGION}\", \
                \"sagemaker_project_name\":\"${SAGEMAKER_PROJECT_NAME}\",\
                \"sagemaker_project_id\":\"${SAGEMAKER_PROJECT_ID}\",\
                \"bucket_name\":\"${ARTIFACT_BUCKET}\"\
                    }"
      - echo "Create/update of the SageMaker Pipeline and a pipeline execution completed."

Overwriting codebuild-buildspec.yml


Copy the `codebuild-buildspec.yml` file from the workshop folder to the project's code repository folder:

In [205]:
!cp ~/{workshop_folder}/codebuild-buildspec.yml ~/{project_path}/codebuild-buildspec.yml

To summarize, you made three changes to the build spec file:
1. Changed the `run-pipeline` `--module-name` value from `pipelines.abalone.pipeline` to the new module path `pipelines.fromideatoprod.pipeline`
2. Removed some parameters from the `kwargs` list so that `get_pipeline()` uses their default values
3. Added the parameters your pipeline needs to the `kwargs` list
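A wrong `--module-name` value only surfaces as an import error in CodeBuild after you push. The following sketch is a quick local check that extracts the module name from the build spec text and maps it to the Python file it should resolve to. The helper name `module_file_from_buildspec` is hypothetical. Note that `setuptools.find_packages()` in `setup.py` only picks up folders that contain an `__init__.py`, so `pipelines/fromideatoprod/` needs one as well.

```python
import re
from pathlib import Path


def module_file_from_buildspec(buildspec_text: str) -> Path:
    """Extract the --module-name value and map it to the expected .py file path."""
    match = re.search(r"--module-name\s+([\w.]+)", buildspec_text)
    if match is None:
        raise ValueError("No --module-name argument found in the build spec")
    module = match.group(1)
    # pipelines.fromideatoprod.pipeline -> pipelines/fromideatoprod/pipeline.py
    return Path(*module.split(".")).with_suffix(".py")


# Hypothetical usage in this notebook:
# repo = Path.home() / project_path
# module_file = module_file_from_buildspec((repo / "codebuild-buildspec.yml").read_text())
# print(module_file, (repo / module_file).exists())
```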

### 4. Create the `setup.py` file
Finally, you need to provide the `setup.py` file in the project's code repository folder.

In [206]:
%%writefile setup.py
import os
import setuptools


about = {}
here = os.path.abspath(os.path.dirname(__file__))
with open(os.path.join(here, "pipelines", "__version__.py")) as f:
    exec(f.read(), about)


with open("README.md", "r") as f:
    readme = f.read()


required_packages = ["sagemaker"]
extras = {
    "test": [
        "black",
        "coverage",
        "flake8",
        "mock",
        "pydocstyle",
        "pytest",
        "pytest-cov",
        "sagemaker",
        "tox",
    ]
}
setuptools.setup(
    name=about["__title__"],
    description=about["__description__"],
    version=about["__version__"],
    author=about["__author__"],
    author_email=about["__author_email__"],
    long_description=readme,
    long_description_content_type="text/markdown",
    url=about["__url__"],
    license=about["__license__"],
    packages=setuptools.find_packages(),
    include_package_data=True,
    python_requires=">=3.6",
    install_requires=required_packages,
    extras_require=extras,
    entry_points={
        "console_scripts": [
            "get-pipeline-definition=pipelines.get_pipeline_definition:main",
            "run-pipeline=pipelines.run_pipeline:main",
        ]
    },
    classifiers=[
        "Development Status :: 3 - Alpha",
        "Intended Audience :: Developers",
        "Natural Language :: English",
        "Programming Language :: Python",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.6",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
    ],
)

Overwriting setup.py


In [207]:
!cp ~/{workshop_folder}/setup.py ~/{project_path}/setup.py
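Two parts of `setup.py` matter for the build: it reads the package metadata by executing `pipelines/__version__.py`, and its `console_scripts` entry `run-pipeline=pipelines.run_pipeline:main` is what makes the `run-pipeline` command in the build spec available after `pip install .`. The following sketch reproduces both locally so you can catch a missing metadata key or a mistyped entry point before pushing. The helper names are hypothetical, and `REQUIRED_KEYS` is taken from the `about[...]` lookups in `setup.py`.

```python
REQUIRED_KEYS = ("__title__", "__description__", "__version__", "__author__",
                 "__author_email__", "__url__", "__license__")


def load_about(version_py_source: str) -> dict:
    """Execute __version__.py source the same way setup.py does and check the keys."""
    about = {}
    exec(version_py_source, about)
    missing = [key for key in REQUIRED_KEYS if key not in about]
    if missing:
        raise KeyError(f"__version__.py is missing: {missing}")
    return about


def parse_console_script(spec: str) -> tuple:
    """Split 'name=package.module:function' into (name, module, function)."""
    name, target = (part.strip() for part in spec.split("=", 1))
    module, func = target.split(":", 1)
    return name, module, func


# Hypothetical usage in this notebook:
# from pathlib import Path
# load_about((Path.home() / project_path / "pipelines" / "__version__.py").read_text())
# parse_console_script("run-pipeline=pipelines.run_pipeline:main")
```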

---

## Run the CI/CD for the model building pipeline
To launch the CI/CD for the model building pipeline, push the changed code to the project GitHub repository.

<div class="alert alert-info">Make sure you are in the folder that contains the repository code in JupyterLab terminal when running git commands. The folder name looks like <code>[project-name]/sagemaker-[project-id]-modelbuild</code>.</div>

Open a system terminal window via the JupyterLab menu **File** > **New** > **Terminal** and enter the commands generated by the following code cell.
You can keep the `user.email` and `user.name` values or replace them with your own.

In [188]:
cmd = f'''
cd ~/{project_path}

git config --global user.email "you@example.com"
git config --global user.name "Your Name"
  
git add -A
git commit -am "customize project"
git push
'''

copy_output(cmd)

After you push your code changes, the project starts a run of the CodePipeline pipeline that constructs, upserts, and executes the SageMaker model building pipeline. This new pipeline execution creates a new model version in the model package group in the SageMaker model registry.

You can follow the execution of the pipeline in the Studio **Pipelines** widget.

Wait until the pipeline execution finishes. The execution takes about 10 minutes to complete.
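Besides watching the Studio UI, you can wait for the execution from the notebook. The following sketch polls the most recent execution of the pipeline with the boto3 SageMaker client calls `list_pipeline_executions` and `describe_pipeline_execution` until its status is no longer `Executing` or `Stopping`. The function name and the polling defaults are assumptions. Because CodeBuild needs about a minute to start the new execution, call it only after the new execution appears; otherwise it may pick up an older execution of the same pipeline.

```python
import time


def wait_for_latest_execution(sm_client, pipeline_name, poll_seconds=30, timeout_seconds=3600):
    """Poll the most recent execution of a SageMaker pipeline until it stops running."""
    deadline = time.monotonic() + timeout_seconds
    summaries = sm_client.list_pipeline_executions(
        PipelineName=pipeline_name,
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )["PipelineExecutionSummaries"]
    if not summaries:
        raise RuntimeError(f"No executions found for pipeline {pipeline_name}")
    arn = summaries[0]["PipelineExecutionArn"]
    while True:
        status = sm_client.describe_pipeline_execution(
            PipelineExecutionArn=arn
        )["PipelineExecutionStatus"]
        # Terminal statuses are Succeeded, Failed, and Stopped
        if status not in ("Executing", "Stopping"):
            return arn, status
        if time.monotonic() > deadline:
            raise TimeoutError(f"{arn} is still {status} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)


# Hypothetical usage with the boto3 SageMaker client `sm` used earlier in this notebook:
# arn, status = wait_for_latest_execution(sm, p.describe()["PipelineName"])
# print(arn, status)
```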

To see the pipeline executions, click the link constructed by the code cell below. Note that CodeBuild takes about one minute to build the project, upsert the pipeline, and start the execution. Refresh the Studio UI page to see the started execution.

In [190]:
# Show the pipeline execution link
display(
    HTML('<b>See <a target="top" href="https://studio-{}.studio.{}.sagemaker.aws/pipelines/{}/executions/">the pipeline executions</a> in the Studio UI</b>'.format(
            domain_id, region, p.describe()['PipelineName']))
)

## View the model package in the model registry

This new pipeline creates a new model package group named `<project-name>-<project-id>` in the SageMaker model registry.

To see the model package versions in the Studio UI, click the link constructed by the code cell below.
<div class="alert alert-info">You need to wait until the pipeline execution finishes to see a registered version of the model in the model package.</div>

On the model version tab that opens, you can browse activity, [model version details](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-details.html), and [data lineage](https://docs.aws.amazon.com/sagemaker/latest/dg/lineage-tracking.html). 

In a real-world project, you add various model attributes and additional model version metadata such as [model quality metrics](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-metrics.html), [explainability](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-model-explainability.html) and [bias](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-measure-post-training-bias.html) reports, load test data, and [Inference Recommender](https://docs.aws.amazon.com/sagemaker/latest/dg/inference-recommender.html) results.

In [191]:
# Show the model package link
display(
    HTML('<b>See <a target="top" href="https://studio-{}.studio.{}.sagemaker.aws/models/registered-models/{}-{}/versions">the model package versions</a> in the Studio UI</b>'.format(
            domain_id, region, project_name, project_id))
)
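You can also query the model registry from the notebook. The following sketch lists the versions in a model package group, newest first, using the boto3 SageMaker client call `list_model_packages` and following `NextToken` for additional pages. The function name `list_model_versions` is hypothetical; the group name follows the `<project-name>-<project-id>` convention described above.

```python
def list_model_versions(sm_client, model_package_group_name):
    """Return (version, approval status, ARN) for each model package in a group, newest first."""
    versions, token = [], None
    while True:
        request = {
            "ModelPackageGroupName": model_package_group_name,
            "SortBy": "CreationTime",
            "SortOrder": "Descending",
        }
        if token:
            request["NextToken"] = token
        page = sm_client.list_model_packages(**request)
        for summary in page["ModelPackageSummaryList"]:
            versions.append((
                summary.get("ModelPackageVersion"),
                summary.get("ModelApprovalStatus"),
                summary["ModelPackageArn"],
            ))
        token = page.get("NextToken")
        if not token:
            return versions


# Hypothetical usage with the boto3 SageMaker client `sm` used earlier in this notebook:
# for version, status, arn in list_model_versions(sm, f"{project_name}-{project_id}"):
#     print(version, status, arn)
```

With the default `ModelApprovalStatus` parameter of the pipeline, newly registered versions should show `PendingManualApproval`.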

## Summary
In this notebook you implemented a CI/CD pipeline with the following features:
- The model building ML pipeline is under source control in a GitHub repository
- Every push to the code repository starts a new CodePipeline execution, which constructs, upserts, and executes the ML pipeline
- The end-to-end model development process is now automated
- The SageMaker project is a logical construct in Studio that holds the metadata about related ML pipelines, repositories, models, experiments, and inference endpoints

---

## Continue with step 5
In the next notebook [05-deploy](05-deploy.ipynb) you test the second part of the MLOps project - the model deployment pipeline.

## Further development ideas for your real-world projects
- You can use a SageMaker-provided [MLOps template for model building, training, and deployment with third-party Git repositories using Jenkins](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-templates-sm.html#sagemaker-projects-templates-git-jenkins)
- Create a [custom SageMaker project template](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-templates-custom.html) to cover your specific project requirements

## Additional resources
- [Amazon SageMaker Pipelines lab in SageMaker Immersion Day](https://catalog.us-east-1.prod.workshops.aws/workshops/63069e26-921c-4ce1-9cc7-dd882ff62575/en-US/lab6)
- [Enhance your machine learning development by using a modular architecture with Amazon SageMaker projects](https://aws.amazon.com/blogs/machine-learning/enhance-your-machine-learning-development-by-using-a-modular-architecture-with-amazon-sagemaker-projects/)
- [Dive deep into automating MLOps](https://www.youtube.com/watch?v=3_cHnk9VSfQ)
- [SageMaker MLOps Project Walkthrough](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-walkthrough.html)
- [`aws-samples` GitHub repository with custom project template examples](https://github.com/aws-samples/sagemaker-custom-project-templates)

# Shutdown kernel

In [None]:
%%html

<p><b>Shutting down your kernel for this notebook to release resources.</b></p>
<button class="sm-command-button" data-commandlinker-command="kernelmenu:shutdown" style="display:none;">Shutdown Kernel</button>
        
<script>
try {
    els = document.getElementsByClassName("sm-command-button");
    els[0].click();
}
catch(err) {
    // NoOp
}    
</script>