# Automate LLM training and deployment pipeline using SageMaker Pipelines

Amazon SageMaker Pipelines offers machine learning (ML) application developers and operations engineers the ability to orchestrate SageMaker jobs and author reproducible ML pipelines. It also enables them to deploy custom-built models for inference in real-time with low latency, run offline inferences with Batch Transform, and track lineage of artifacts. They can institute sound operational practices in deploying and monitoring production workflows, deploying model artifacts, and tracking artifact lineage through a simple interface, adhering to safety and best practice paradigms for ML application development.

In the previous lab, we saw how to use SageMaker to simplify data processing, finetune an LLM, deploy the finetuned model, and run inference. In this lab, we'll use SageMaker Pipelines to automate the entire LLM training and deployment process using a serverless, event-driven architecture.

Here's a high level architecture diagram for the LLMOps workflow:

<img src="images/mlops-llm.drawio.png" width="1500">

## Overview
This notebook shows how to:

* Define a set of Pipeline parameters that can be used to parametrize a SageMaker Pipeline.
* Define a Processing step that performs feature engineering on a Hugging Face dataset and splits it into training and evaluation datasets.
* Define a Training step that finetunes a llama2-7b model on the preprocessed dataset.
* Define a Create Model step that creates a model from the model artifacts produced by training.
* Define a Register Model step that creates a model package from the estimator and model artifacts used to finetune the model.
* Define and create a Pipeline definition in a DAG, with the defined parameters and steps.
* Start a Pipeline execution and wait for execution to complete.

In [6]:
%pip install 'sagemaker' --upgrade -q

Note: you may need to restart the kernel to use updated packages.


In [7]:
import sys

import boto3
import sagemaker
from sagemaker.workflow.pipeline_context import LocalPipelineSession, PipelineSession

pipeline_session = PipelineSession()
region = pipeline_session.boto_region_name
default_bucket = pipeline_session.default_bucket()
role = sagemaker.get_execution_role()

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml


In [8]:
from sagemaker.workflow.parameters import ParameterString, ParameterFloat, ParameterInteger, ParameterBoolean
from sagemaker.huggingface import HuggingFaceProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker import get_execution_role
from sagemaker.workflow.steps import CacheConfig
import uuid
from datetime import datetime

processing_instance_count = 1
training_instance_count = 1
transform_instance_count = 1
processing_instance_type = "ml.m5.xlarge"
rand_id = uuid.uuid4().hex[:5] # this is the random-id assigned for each run. 
training_dataset_s3_loc = f"s3://{default_bucket}/data/workshop-{rand_id}/train"
validation_dataset_s3_loc = f"s3://{default_bucket}/data/workshop-{rand_id}/eval"
model_output_s3_loc = f"s3://{default_bucket}/data/workshop-{rand_id}/model"
model_eval_s3_loc = f"s3://{default_bucket}/data/workshop-{rand_id}/modeleval"
model_id = "NousResearch/Llama-2-7b-chat-hf"
hf_dataset_name = "hotpot_qa"

In [9]:
cache_config = CacheConfig(enable_caching=True, expire_after="T12H")

# Define Parameters to Parametrize SageMaker Pipeline Executions
Define Pipeline parameters that you can use to parametrize the pipeline. Parameters enable custom pipeline executions and schedules without having to modify the Pipeline definition.

The supported parameter types include:

* ParameterString - represents a str Python type
* ParameterInteger - represents an int Python type
* ParameterFloat - represents a float Python type
* ParameterBoolean - represents a bool Python type

These parameters support providing a default value, which can be overridden on pipeline execution. The default value specified should be an instance of the type of the parameter.

The parameters defined in this workflow include:

* processing_output_s3_location_param - processing job output S3 location. 
* model_id_param - Hugging Face model ID to identify the base model
* epochs_param - number of epochs to finetune the model
* per_device_train_batch_size_param - training dataset batch size 
* per_device_eval_batch_size_param - evaluation dataset batch size  
* learning_rate_param - QLoRA finetuning learning rate
* optimizer_param - the model optimizer
* logging_steps_param - number of steps for logging
* lora_r_param - LoRA attention dimension
* lora_alpha_param - Alpha parameter for LoRA scaling
* lora_dropout_param - LoRA dropout rate
* use_4bit_param - load base model in 4bit
* bnb_4bit_compute_dtype_param - default compute type for quantization
* bnb_4bit_quant_type_param - default quantization type
* training_job_instance_type_param - training job instance type
* model_output_s3_loc_param - model artifact output S3 location
* training_dataset_s3_loc_param - S3 location for training dataset
* eval_dataset_s3_loc_param - S3 location for evaluation dataset
* training_dataset_split_param - training dataset split configuration
* eval_dataset_split_param - evaluation dataset split configuration
* base_model_group_name_param - base model package group name
* region_param - AWS region name
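* model_eval_s3_loc_param - S3 location for model evaluation metrics JSON file
* processing_job_instance_type_param - processing job instance type
* hf_dataset_name_param - name of the Hugging Face dataset to preprocess
* base_model_package_group_name_param - base model package group name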
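Because each parameter has a type, it can help to check override values before starting an execution. The following is a minimal sketch, not part of the workshop code: `PARAMETER_TYPES` and `build_overrides` are hypothetical names, and the spec covers only a few of the parameters listed above.

```python
# Hypothetical helper: type-check overrides before starting an execution.
# The names and types mirror a few of the parameters defined in this notebook.
PARAMETER_TYPES = {
    "Epochs": int,
    "LearningRate": float,
    "Use4Bit": bool,
    "TrainingJobInstanceType": str,
}


def build_overrides(**overrides):
    """Return a dict of overrides after checking names and Python types."""
    checked = {}
    for name, value in overrides.items():
        if name not in PARAMETER_TYPES:
            raise KeyError(f"unknown pipeline parameter: {name}")
        expected = PARAMETER_TYPES[name]
        # bool is a subclass of int in Python, so reject it for Integer parameters.
        if expected is int and isinstance(value, bool):
            raise TypeError(f"{name} expects int, got bool")
        if not isinstance(value, expected):
            raise TypeError(
                f"{name} expects {expected.__name__}, got {type(value).__name__}"
            )
        checked[name] = value
    return checked


overrides = build_overrides(Epochs=2, LearningRate=1e-4)
print(overrides)
```

Once the pipeline object exists later in this notebook, a dict like this could be passed as `pipeline.start(parameters=overrides)`; parameters that are not overridden keep their default values.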

In [10]:
%store -r

In [11]:
if "base_model_pkg_group_name" not in locals():
    base_model_pkg_group_name = "None"

In [13]:
processing_output_s3_location_param = ParameterString(name="ProcessingOutputS3Location", default_value=training_dataset_s3_loc)
model_id_param = ParameterString(name="ModelId", default_value=model_id)
epochs_param = ParameterInteger(name="Epochs", default_value=1)
per_device_train_batch_size_param = ParameterInteger(name="PerDeviceTrainBatchSize", default_value=8)
per_device_eval_batch_size_param = ParameterInteger(name="PerDeviceEvalBatchSize", default_value=8)
learning_rate_param = ParameterFloat(name="LearningRate", default_value=2e-4)
optimizer_param = ParameterString(name="Optimizer", default_value="paged_adamw_32bit")
logging_steps_param = ParameterInteger(name="LoggingSteps", default_value=25)
lora_r_param = ParameterInteger(name="LoraR", default_value=64)
lora_alpha_param = ParameterInteger(name="LoraAlpha", default_value=16)
lora_dropout_param = ParameterFloat(name="LoraDropout", default_value=0.1)
use_4bit_param = ParameterBoolean(name="Use4Bit", default_value=True)
bnb_4bit_compute_dtype_param = ParameterString(name="BnB4BitComputeType", default_value="float16")
bnb_4bit_quant_type_param = ParameterString(name="BnB4BitQuantType", default_value="nf4")
training_job_instance_type_param = ParameterString(name="TrainingJobInstanceType", default_value="ml.g5.2xlarge")
processing_job_instance_type_param = ParameterString(name="ProcessingJobInstanceType", default_value="ml.m5.xlarge")
model_output_s3_loc_param = ParameterString(name="ModelOutputS3LocParam", default_value=model_output_s3_loc)
training_dataset_s3_loc_param = ParameterString(name="TrainingDatasetS3LocParam", default_value=training_dataset_s3_loc)
eval_dataset_s3_loc_param = ParameterString(name="EvalDatasetS3LocParam", default_value=validation_dataset_s3_loc)
training_dataset_split_param = ParameterString(name="TrainingDatasetSplitParam", default_value="1:50")
eval_dataset_split_param = ParameterString(name="EvalDatasetSplitParam", default_value="51:100")
base_model_group_name_param = ParameterString(name="BaseModelRegistryGroupName", default_value=base_model_pkg_group_name)
region_param = ParameterString(name="RegionNameParam", default_value=region)  # default to the session's region
model_eval_s3_loc_param = ParameterString(name="ModelEvalS3LocParam", default_value=model_eval_s3_loc)
hf_dataset_name_param = ParameterString(name="HFDataSetNameParam", default_value=hf_dataset_name)
base_model_package_group_name_param = ParameterString(name="BaseModelPkgGoupName", default_value=base_model_pkg_group_name)

# Define a Processing Step
A processing step is used for triggering a processing job for data processing. 
In this example, we are going to use a processing job to perform feature engineering on a public dataset available on Huggingface Hub. The output from the processing step will be stored in the specified S3 location. 

In [14]:
from sagemaker.pytorch.processing import PyTorchProcessor
from sagemaker.workflow.steps import ProcessingStep

torch_processor = PyTorchProcessor(
    framework_version='2.0',
    role=role,
    instance_type=processing_job_instance_type_param,
    instance_count=1,
    base_job_name=f'frameworkprocessor-PT-{rand_id}',
    py_version="py310",
    sagemaker_session=pipeline_session
)

The input argument instance_type of function (sagemaker.image_uris.get_training_image_uri) is a pipeline variable (<class 'sagemaker.workflow.parameters.ParameterString'>), which is interpreted in pipeline execution time only. As the function needs to evaluate the argument value in SDK compile time, the default_value of this Parameter object will be used to override it. Please make sure the default_value is valid.


In [15]:
processor_args = torch_processor.run(
    code="preprocess.py",
    source_dir="src/preprocess",
    outputs=[
        ProcessingOutput(output_name="train_data",
                         source="/opt/ml/processing/train",
                         destination=training_dataset_s3_loc_param),
        ProcessingOutput(output_name="eval_data",
                         source="/opt/ml/processing/eval",
                         destination=eval_dataset_s3_loc_param),

    ],
    arguments=["--train-data-split", training_dataset_split_param,
               "--eval-data-split", eval_dataset_split_param,
               "--hf-dataset-name", hf_dataset_name_param]
)
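The contents of `preprocess.py` are not shown in this notebook. As a rough sketch of how such a script might read the three arguments passed above, the following uses `argparse`. The function names and the interpretation of a split such as `1:50` as a start/end pair are assumptions made for illustration.

```python
import argparse


def parse_split(spec):
    """Parse a 'start:end' string such as '1:50' into a pair of ints."""
    start, end = spec.split(":")
    start, end = int(start), int(end)
    if start > end:
        raise ValueError(f"split start {start} exceeds end {end}")
    return start, end


def parse_args(argv=None):
    """Hypothetical argument parser for a preprocessing script."""
    parser = argparse.ArgumentParser(description="Hypothetical preprocess.py arguments")
    # String defaults are run through `type`, so they are parsed like CLI values.
    parser.add_argument("--train-data-split", type=parse_split, default="1:50")
    parser.add_argument("--eval-data-split", type=parse_split, default="51:100")
    parser.add_argument("--hf-dataset-name", type=str, default="hotpot_qa")
    return parser.parse_args(argv)


args = parse_args([
    "--train-data-split", "1:50",
    "--eval-data-split", "51:100",
    "--hf-dataset-name", "hotpot_qa",
])
print(args.train_data_split, args.eval_data_split, args.hf_dataset_name)
```

How the script turns each pair into a slice of the dataset is left to the actual implementation in `src/preprocess`.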



Define a processing step here

In [16]:
step_process = ProcessingStep(name="LLMDataPreprocess",
                              step_args=processor_args,
                              cache_config=cache_config)

# Training Step
In this section, we define a training step to finetune a Llama2-7b model on the given dataset. 
Configure a HuggingFace Estimator and its input dataset. 
A typical training script loads data from the input channels, configures training with 
hyperparameters, trains a model, and saves a model to model_dir so that it can be hosted later.

The model path where the models from training are saved is also specified.

**Note:** the instance_type parameter may be used in multiple places in the pipeline. In this case, the instance_type is passed into the estimator.
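The `train.py` script itself is not shown here. As a sketch of the "typical training script" pattern described above, the following shows how a script could read hyperparameters and locate its input channels and model directory. It assumes the usual SageMaker script-mode conventions (hyperparameters passed as command-line arguments, and the `SM_CHANNEL_TRAINING`, `SM_CHANNEL_VALIDATION`, and `SM_MODEL_DIR` environment variables); the function names are hypothetical, and only a subset of this notebook's hyperparameters is declared.

```python
import argparse
import os


def str_to_bool(value):
    """Hyperparameters arrive as strings, so 'True'/'False' need explicit parsing."""
    return str(value).lower() in ("true", "1", "yes")


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hyperparameters (a subset of those defined in this notebook).
    parser.add_argument("--model_id", type=str, default="NousResearch/Llama-2-7b-chat-hf")
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--learning_rate", type=float, default=2e-4)
    parser.add_argument("--use_4bit", type=str_to_bool, default=True)
    # Input channels and output directory provided by the training container.
    parser.add_argument(
        "--training_dir",
        default=os.environ.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training"),
    )
    parser.add_argument(
        "--validation_dir",
        default=os.environ.get("SM_CHANNEL_VALIDATION", "/opt/ml/input/data/validation"),
    )
    parser.add_argument("--model_dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    # Ignore hyperparameters that this sketch does not declare.
    args, _unknown = parser.parse_known_args(argv)
    return args


args = parse_args(["--epochs", "3", "--use_4bit", "False", "--lora_r", "64"])
print(args.epochs, args.use_4bit, args.model_dir)
```

The channel names `training` and `validation` correspond to the keys of the input dictionary passed to the estimator's `fit()` call.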

In [17]:
import time
from sagemaker.huggingface import HuggingFace

# define Training Job Name 
job_name = f'huggingface-qlora-{time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())}-{rand_id}'

# hyperparameters, which are passed into the training job
hyperparameters ={
  'model_id': model_id_param,                                # pre-trained model
  'epochs': epochs_param,                                     # number of training epochs
  'per_device_train_batch_size': per_device_train_batch_size_param,
  'per_device_eval_batch_size' : per_device_eval_batch_size_param,
  'learning_rate' : learning_rate_param,
  'optimizer' : optimizer_param,  
  'logging_steps' : logging_steps_param,
  'lora_r': lora_r_param,
  'lora_alpha' : lora_alpha_param,
  'lora_dropout' : lora_dropout_param, 
  'use_4bit' : use_4bit_param,
  'bnb_4bit_compute_dtype' : bnb_4bit_compute_dtype_param,
  'bnb_4bit_quant_type' : bnb_4bit_quant_type_param,
  'base_model_group_name' : base_model_group_name_param,
  'region' : region_param,
  'model_eval_s3_loc' : model_eval_s3_loc_param,
  'run_experiment' : "False"
}

# create the Estimator
huggingface_estimator = HuggingFace(
    entry_point='train.py',         # train script
    source_dir='src/train',         # directory which includes all the files needed for training
    instance_type=training_job_instance_type_param, # instances type used for the training job
    instance_count=1,               # the number of instances used for training
    base_job_name=job_name,         # the name of the training job
    role=role,      # IAM role used in training job to access AWS resources, e.g. S3
    volume_size=300,    # the size of the EBS volume in GB
    transformers_version='4.28.1',    # the transformers version used in the training job
    pytorch_version='2.0.0',          # the pytorch version used in the training job
    py_version='py310',             # the python version used in the training job
    hyperparameters= hyperparameters,
    environment={ "HUGGINGFACE_HUB_CACHE": "/tmp/.cache" }, # set env variable to cache models in /tmp
    sagemaker_session=pipeline_session,         # specifies a sagemaker session object
    output_path=model_output_s3_loc_param # s3 location for model artifact
)

In [18]:
data = {'training': step_process.properties.ProcessingOutputConfig.Outputs["train_data"].S3Output.S3Uri,
        'validation': step_process.properties.ProcessingOutputConfig.Outputs["eval_data"].S3Output.S3Uri
       }

time_suffix = datetime.now().strftime('%y%m%d%H%M')
run_name = f"qlora-finetune-run-{time_suffix}-{rand_id}"
# with a PipelineSession, fit() does not start a job; it returns the arguments for the training step
train_args = huggingface_estimator.fit(data, wait=True)

Define the Training step here

In [19]:
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep

step_train = TrainingStep(
    name="Llama2Train",
    step_args=train_args,
)
step_train.add_depends_on([step_process])

# Register Model
SageMaker Model Registry supports the following features and functionality:

* Catalog models for production.
* Manage model versions. 
* Associate metadata, such as training metrics, with a model.
* Manage the approval status of a model.
* Deploy models to production.
* Automate model deployment with CI/CD.

In this workshop, we are going to register the finetuned Llama2 model as a model package using SageMaker Model Registry. 

A model package is an abstraction of reusable model artifacts that packages all ingredients required for inference. 
Primarily, it consists of an inference specification that defines the inference image to use along with an optional model weights location.

A model package group is a collection of model packages. A model package group can be created for a specific ML business problem, and new versions of the model packages can be added to it. Typically, customers are expected to create a ModelPackageGroup for a SageMaker pipeline so that model package versions can be added to the group for every SageMaker Pipeline run.

In [20]:
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.huggingface import get_huggingface_llm_image_uri
import json
from sagemaker.workflow.model_step import ModelStep


# retrieve the llm image uri
llm_image = sagemaker.image_uris.retrieve(
    "djl-deepspeed", region=region, version="0.23.0"
)

# print ecr image uri
print(f"llm image uri: {llm_image}")

inference_instance_type = "ml.g5.2xlarge"
number_of_gpu = 1
health_check_timeout = 3600


# create HuggingFaceModel with the image uri
huggingface_model = HuggingFaceModel(
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    image_uri=llm_image,
    transformers_version="4.28.1",
    pytorch_version="2.0.0",
    py_version="py310",
    model_server_workers=1,
    role=role,
    name=f"HuggingFaceModel-Llama2-7b-{rand_id}",
    sagemaker_session=pipeline_session
)

create_step_args = huggingface_model.create(instance_type=inference_instance_type)
step_create_model = ModelStep(
    name="CreateModel",
    step_args=create_step_args
)

llm image uri: 763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.23.0-deepspeed0.9.5-cu118


# Model Metrics
To capture the model training and evaluation metrics from a SageMaker Training job, we use the `ModelMetrics` class. The training job writes the model evaluation metrics to an `evaluation.json` file stored in the specified S3 location. With that information, we create a `ModelMetrics` object that references the metrics. The object is then used when registering the finetuned model.
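The layout of `evaluation.json` is determined by the training script, which is not shown in this notebook. Purely as an illustration of producing such a file, here is a sketch that writes a metrics dictionary to `evaluation.json`. The function name, the JSON layout, and the metric name and value are placeholders, not the workshop's actual output.

```python
import json
import os
import tempfile


def write_evaluation_report(metrics, output_dir):
    """Write metrics as evaluation.json in output_dir and return the file path."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, "evaluation.json")
    # Hypothetical layout: one entry per metric under a top-level "metrics" key.
    report = {"metrics": {name: {"value": float(value)} for name, value in metrics.items()}}
    with open(path, "w") as f:
        json.dump(report, f, indent=2)
    return path


# The metric name and value are placeholders, not results from this workshop.
with tempfile.TemporaryDirectory() as tmp:
    report_path = write_evaluation_report({"eval_loss": 1.0}, tmp)
    with open(report_path) as f:
        loaded = json.load(f)
print(loaded)
```

In a real training job, the file would then be uploaded to the S3 location given by the model evaluation parameter so that `MetricsSource` can point to it.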

In [21]:
from sagemaker.model_metrics import MetricsSource, ModelMetrics
import os 

model_package_group_name = f"NousResearch-Llama-2-7b-chat-hf-{rand_id}"
model_metrics = ModelMetrics(
    model_statistics=MetricsSource(
        s3_uri=f"{model_eval_s3_loc}/evaluation.json",  # build the S3 URI directly rather than with os.path.join
        content_type="application/json",
    )
)

The input argument instance_type of function (sagemaker.image_uris.get_training_image_uri) is a pipeline variable (<class 'sagemaker.workflow.parameters.ParameterString'>), which is interpreted in pipeline execution time only. As the function needs to evaluate the argument value in SDK compile time, the default_value of this Parameter object will be used to override it. Please make sure the default_value is valid.


In [54]:
register_args = huggingface_model.register(
    content_types=["application/json"],
    response_types=["application/json"],
    inference_instances=[
        "ml.p2.16xlarge", 
        "ml.p3.16xlarge", 
        "ml.g4dn.4xlarge", 
        "ml.g4dn.8xlarge", 
        "ml.g4dn.12xlarge", 
        "ml.g4dn.16xlarge", 
        "ml.g5.2xlarge",
        "ml.g5.12xlarge",
    ],
    model_package_group_name=model_package_group_name,
    customer_metadata_properties = {"training-image-uri": huggingface_estimator.training_image_uri()},  #Store the training image url
    approval_status="PendingManualApproval",
    model_metrics=model_metrics
)
step_register = ModelStep(name="RegisterModel", step_args=register_args)

INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.


# Define a Pipeline of Parameters and Steps 
In this section, we combine all the steps into a Pipeline so it can be executed.
A pipeline requires a name, parameters, and steps. Names must be unique within an (account, region) pair.

Note:

* All the parameters used in the definitions must be present.
* Steps passed into the pipeline do not have to be listed in the order of execution. The SageMaker Pipelines service resolves the data dependencies between steps into a DAG and uses it to determine the execution order.
* Step names must be unique across the pipeline step list.
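To illustrate why the listing order of steps does not matter, the sketch below resolves an execution order from dependencies alone, using the standard library's `graphlib`. The step names come from this notebook; the dependency edges are an assumption based on how each step consumes the previous step's outputs, and this is not how the SageMaker service is implemented internally.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on.
# The dict is deliberately listed out of execution order: only the edges matter.
dependencies = {
    "RegisterModel": {"Llama2Train"},
    "CreateModel": {"Llama2Train"},
    "Llama2Train": {"LLMDataPreprocess"},
    "LLMDataPreprocess": set(),
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The preprocessing step always comes first and the training step second; the two model steps depend only on training, so their relative order is not fixed.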

In [22]:
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.execution_variables import ExecutionVariables
from sagemaker.workflow.pipeline_experiment_config import PipelineExperimentConfig

pipeline_name = f"Llama2FMOpsPipeline-{rand_id}"
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        processing_output_s3_location_param,
        model_id_param,
        epochs_param,
        per_device_train_batch_size_param,
        per_device_eval_batch_size_param,
        learning_rate_param,
        optimizer_param,
        logging_steps_param,
        lora_r_param,
        lora_alpha_param,
        lora_dropout_param,
        use_4bit_param,
        bnb_4bit_compute_dtype_param,
        bnb_4bit_quant_type_param,
        training_job_instance_type_param,
        model_output_s3_loc_param,
        training_dataset_s3_loc_param,
        eval_dataset_s3_loc_param,
        training_dataset_split_param,
        eval_dataset_split_param,
        base_model_group_name_param,
        region_param,
        model_eval_s3_loc_param,
        hf_dataset_name_param,
        processing_job_instance_type_param,
        base_model_package_group_name_param
    ],
    steps=[step_process, step_train, step_create_model,step_register],
    sagemaker_session=pipeline_session
)

# Examining the pipeline definition
The JSON of the pipeline definition can be examined to confirm the pipeline is well-defined and the parameters and step properties resolve correctly.

In [23]:
import json

definition = json.loads(pipeline.definition())
definition

Using provided s3_resource


INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.


Using provided s3_resource




{'Version': '2020-12-01',
 'Metadata': {},
 'Parameters': [{'Name': 'ProcessingOutputS3Location',
   'Type': 'String',
   'DefaultValue': 's3://sagemaker-us-west-2-349674815289/data/workshop-6e737/train'},
  {'Name': 'ModelId',
   'Type': 'String',
   'DefaultValue': 'NousResearch/Llama-2-7b-chat-hf'},
  {'Name': 'Epochs', 'Type': 'Integer', 'DefaultValue': 1},
  {'Name': 'PerDeviceTrainBatchSize', 'Type': 'Integer', 'DefaultValue': 8},
  {'Name': 'PerDeviceEvalBatchSize', 'Type': 'Integer', 'DefaultValue': 8},
  {'Name': 'LearningRate', 'Type': 'Float', 'DefaultValue': 0.0002},
  {'Name': 'Optimizer', 'Type': 'String', 'DefaultValue': 'paged_adamw_32bit'},
  {'Name': 'LoggingSteps', 'Type': 'Integer', 'DefaultValue': 25},
  {'Name': 'LoraR', 'Type': 'Integer', 'DefaultValue': 64},
  {'Name': 'LoraAlpha', 'Type': 'Integer', 'DefaultValue': 16},
  {'Name': 'LoraDropout', 'Type': 'Float', 'DefaultValue': 0.1},
  {'Name': 'Use4Bit', 'Type': 'Boolean', 'DefaultValue': True},
  {'Name': 'Bn
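The full definition is long, so a condensed view can be easier to check. The following sketch summarizes a definition dictionary into parameter defaults and step names. `summarize_definition` is a hypothetical helper, and `sample` is an abbreviated stand-in for the real definition, with a `Steps` list assumed to hold `Name` and `Type` entries.

```python
def summarize_definition(definition):
    """Return parameter defaults by name and a list of (step name, step type)."""
    params = {p["Name"]: p.get("DefaultValue") for p in definition.get("Parameters", [])}
    steps = [(s["Name"], s["Type"]) for s in definition.get("Steps", [])]
    return params, steps


# Abbreviated stand-in for the dict returned by json.loads(pipeline.definition()).
sample = {
    "Version": "2020-12-01",
    "Metadata": {},
    "Parameters": [
        {"Name": "Epochs", "Type": "Integer", "DefaultValue": 1},
        {"Name": "LearningRate", "Type": "Float", "DefaultValue": 0.0002},
    ],
    "Steps": [
        {"Name": "LLMDataPreprocess", "Type": "Processing"},
        {"Name": "Llama2Train", "Type": "Training"},
    ],
}

params, steps = summarize_definition(sample)
print(params)
print(steps)
```

In the notebook, the same helper could be applied to the `definition` variable loaded above.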

# Submit the pipeline to SageMaker and start execution
Submit the pipeline definition to the Pipeline service. The Pipeline service uses the role that is passed in to create all the jobs defined in the steps.

In [24]:
pipeline.upsert(role_arn=role)

INFO:sagemaker.processing:Uploaded src/preprocess to s3://sagemaker-us-west-2-349674815289/Llama2FMOpsPipeline-6e737/code/5098549aa8d474da89fa5af4cc87ec60/sourcedir.tar.gz
INFO:sagemaker.processing:runproc.sh uploaded to s3://sagemaker-us-west-2-349674815289/Llama2FMOpsPipeline-6e737/code/bc2536a25d34e1ecae5238f42f4207c2/runproc.sh


Using provided s3_resource
Using provided s3_resource


INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.


{'PipelineArn': 'arn:aws:sagemaker:us-west-2:349674815289:pipeline/Llama2FMOpsPipeline-6e737',
 'ResponseMetadata': {'RequestId': 'aaf1e96b-7301-4201-be7e-aad6fad77b40',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'aaf1e96b-7301-4201-be7e-aad6fad77b40',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '93',
   'date': 'Fri, 17 Nov 2023 18:48:48 GMT'},
  'RetryAttempts': 0}}

Start the pipeline and accept all the default parameters.

In [25]:
execution = pipeline.start()

Wait for the pipeline to complete

In [28]:
execution.wait()
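After the execution finishes, it is useful to see the status of each step. The sketch below condenses a list of step records into a status map and a map of failure reasons. `summarize_steps` is a hypothetical helper; the record keys (`StepName`, `StepStatus`, `FailureReason`) follow the shape returned by `execution.list_steps()`, and the sample records are made up for illustration.

```python
def summarize_steps(steps):
    """Return ({step name: status}, {failed step name: failure reason})."""
    statuses = {s["StepName"]: s["StepStatus"] for s in steps}
    failed = {
        s["StepName"]: s.get("FailureReason", "")
        for s in steps
        if s["StepStatus"] == "Failed"
    }
    return statuses, failed


# Made-up records for illustration; real ones come from execution.list_steps().
sample_steps = [
    {"StepName": "Llama2Train", "StepStatus": "Failed", "FailureReason": "example reason"},
    {"StepName": "LLMDataPreprocess", "StepStatus": "Succeeded"},
]

statuses, failed = summarize_steps(sample_steps)
print(statuses)
print(failed)
```

In the notebook this could be called as `summarize_steps(execution.list_steps())`.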

# SageMaker Project
SageMaker Projects help organizations set up and standardize developer environments for data scientists and CI/CD systems for MLOps engineers. 

Projects also help organizations set up dependency management, code repository management, build reproducibility, and artifact sharing.
You can provision SageMaker Projects from the AWS Service Catalog using custom or SageMaker-provided templates. 

For information about the AWS Service Catalog, see [What Is AWS Service Catalog](https://docs.aws.amazon.com/servicecatalog/latest/dg/what-is-service-catalog.html). 

With SageMaker Projects, MLOps engineers and organization admins can define their own templates or use SageMaker-provided templates. 

The SageMaker-provided templates bootstrap the ML workflow with source version control, automated ML pipelines, and a set of code to quickly start iterating over ML use cases.

Here's an architecture diagram that shows the components of a SageMaker Project:

![sagemaker project](images/sagemaker-project.png)

# Create a SageMaker Project
In this lab, we'll use a SageMaker project to help us create the infrastructure for creating an LLM deployment pipeline. Specifically, we'll leverage SageMaker project to create the following main components:

1. A Git repository using AWS CodeCommit for code and configuration for the model deployment. 
2. A CI/CD pipeline using AWS CodePipeline that orchestrates the deployment process.
3. A build project using AWS CodeBuild to create a CloudFormation template based on the given configuration.
4. An Amazon EventBridge rule that triggers an LLM deployment when a model's approval status is updated in the SageMaker Model Registry.
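The exact rule created by the project template is not shown in this lab. As a rough illustration of the idea, the sketch below defines an event pattern for SageMaker model package state changes and a small matcher. The pattern contents are an assumption, and `matches` implements only a simplified subset of EventBridge matching (exact values, with lists meaning "any of").

```python
# Assumed event pattern; the rule in the project template may differ.
event_pattern = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Model Package State Change"],
    "detail": {"ModelPackageGroupName": ["NousResearch-Llama-2-7b-chat-hf-6e737"]},
}


def matches(pattern, event):
    """Simplified matcher: every pattern key must match; lists mean 'any of'."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Illustrative event, trimmed to the fields the pattern inspects.
sample_event = {
    "source": "aws.sagemaker",
    "detail-type": "SageMaker Model Package State Change",
    "detail": {
        "ModelPackageGroupName": "NousResearch-Llama-2-7b-chat-hf-6e737",
        "ModelApprovalStatus": "Approved",
    },
}

print(matches(event_pattern, sample_event))
```

An event for a different model package group would not match this pattern, which is why using the correct group name when creating the project matters.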

    
In the following section, we'll walk through the steps for creating a SageMaker Project. Before that, we need to capture the ```model registry package group name``` which was created in the SageMaker Pipeline step. 

<div style="background-color: #FFDDDD; border-left: 5px solid red; padding: 10px; color: black;">
    <strong>Important:</strong> Please make sure to use the correct SageMaker model package group name when creating a SageMaker project. You can find the model registry package group name in a variable called <strong>model_package_group_name</strong> defined previously in the lab. The following cell prints the value so you can reference it. 
</div>


In [41]:
print(f"The model package group name to be used for the SageMaker Project: \x1b[6;30;42m{model_package_group_name}\x1b[0m")

The model package group name to be used for the SageMaker Project: NousResearch-Llama-2-7b-chat-hf-6e737


# Setting Up SageMaker Project

In this lab, we'll leverage a SageMaker Project to create a CI/CD pipeline that integrates with the LLM training pipeline. We are going to go through the following steps: 

#### 1. Retrieve the `model_package_group_name` that was registered in SageMaker Model Registry. This value should be stored in `model_package_group_name` variable in this notebook.

#### 2. Navigate to SageMaker project from Studio as shown in the following diagram:

<img src="images/sagemaker-project-studio-ui.png" width="250">

#### 3. Create a new SageMaker Project by clicking on *Create Project* button

#### 4. Select `Model deployment` in the list, as shown in the following diagram:

<img src="images/sm-project-create.png" width="500">

#### 5. Provide a *unique project name* and the model package group name value shown in the previous cell:

<div style="background-color: #FFDDDD; border-left: 5px solid red; padding: 10px; color: black;">
    <strong>Important:</strong> Please make sure to use the correct SageMaker model package group name in this step. You can find the model registry package group name in the variable named <strong>model_package_group_name</strong> defined previously in this notebook. As shown in the screenshot below, my model registry package group name is <strong>NousResearch-Llama-2-7b-chat-hf-67140</strong>. Yours will be different from mine, but the naming convention should be similar.
</div>

<img src="images/sm-project-template.png" width="500">

*Note:* Make sure your project name is unique to avoid conflicts with other projects that already exist in the AWS environment.

#### 6. Click on the newly created project and clone the repository into your SageMaker Studio environment

<img src="images/sm-project-clone-repository.png" width="500">
<img src="images/sm-project-clone-repo.png" width="500">


#### 7. After cloning the repository, you should see the cloned repository folder:

<img src="images/sm-project-local-clone.png" width="1200">

#### 8. Since the SageMaker project repository was created from a template, we need to make a minor change to the configuration so that the CodePipeline can deploy the LLM we've been working with. In the project git repository folder that we cloned in the earlier step, open the file named `staging-config.json` in an editor as shown in the following:
<img src="images/sm-project-staging-update.png" width="500">


#### 9. Update the following variables:

* EndpointInstanceType: "ml.g5.2xlarge"
* EnableDataCapture: "true"

<img src="images/sm-project-staging-change.png" width="500">
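If you prefer to make this change programmatically rather than in the editor, a sketch along the following lines could work. It assumes the two settings live under a top-level `Parameters` key in `staging-config.json`, which should be checked against your cloned file. `update_stage_config` is a hypothetical helper, and the starting values in `sample_config` are placeholders.

```python
import json


def update_stage_config(config, **overrides):
    """Return a copy of config with entries under 'Parameters' overridden."""
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    updated.setdefault("Parameters", {}).update(overrides)
    return updated


# Placeholder starting values; the layout under "Parameters" is an assumption.
sample_config = {
    "Parameters": {
        "EndpointInstanceType": "ml.m5.large",
        "EnableDataCapture": "false",
    }
}

new_config = update_stage_config(
    sample_config,
    EndpointInstanceType="ml.g5.2xlarge",
    EnableDataCapture="true",
)
print(json.dumps(new_config, indent=2))
```

To apply it to the real file, load the file with `json.load`, pass the result to the helper, and write the output back with `json.dump`.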

#### 10. Save the changes (Ctrl-S on Windows, Command-S on macOS, or File->Save JSON File).

#### 11. Navigate to Git console from the left panel and stage the changes by clicking the '+' sign as shown in the following diagram:

![stage-changes](images/sm-project-stage.png)

#### 12. Commit the changes by adding a brief description of the change, and hit the `Commit` button at the end.

<img src="images/sm-project-commit.png" width="300">
<img src="images/sm-project-commit-email.png" width="300">

#### 13. Push the changes to AWS CodeCommit as shown in the diagram below. A successful push message will appear in the lower right-hand corner of the screen.

<img src="images/sm-project-git-push.png" width="300">

# SageMaker Model Registry Status Update 
With the AWS CodePipeline configuration updated, we next verify the deployment process by approving the model that the SageMaker Pipeline execution registered in the SageMaker Model Registry.

#### 1. Locate the Model Package Group Name that you've created in the SageMaker Pipeline Run:

<div style="background-color: #FFDDDD; border-left: 5px solid red; padding: 10px; color: black;">
<strong>Important:</strong> You can find the model registry package group name in the variable named <strong>model_package_group_name</strong> defined previously in this notebook. As shown in the screenshot below, my model registry package group name is <strong>NousResearch-Llama-2-7b-chat-hf-67140</strong>. Yours will be different from mine, but the naming convention should be similar.
</div>

Your model package group name is shown below:

In [52]:
print(f"Your model package group name is: {model_package_group_name}")

Your model package group name is: NousResearch-Llama-2-7b-chat-hf-6e737


![sm-model-registry-ui](images/model-registry-ui.png)

#### 2. By default, the status of the registered model is "Pending Approval" (set as `PendingManualApproval` in the register step). When the model is approved, it'll trigger a model deployment job in CodePipeline. To update the status, edit the model version as shown below:

<img src="images/model-registry-approval-edit.png" width="800">

#### 3. Change the status to `Approve`, and save the changes.

<img src="images/model-registry-approve-save.png" width="800">
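The same approval can also be made through the API instead of the Studio UI. The sketch below approves the most recent model package in a group using the SageMaker client calls `list_model_packages` and `update_model_package`. The helper name is hypothetical, and `_StubClient` exists only so the sketch runs offline; it is not part of any AWS library.

```python
def approve_latest_model_package(sm_client, group_name):
    """Approve the most recently created model package in the given group."""
    response = sm_client.list_model_packages(
        ModelPackageGroupName=group_name,
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    summaries = response["ModelPackageSummaryList"]
    if not summaries:
        raise ValueError(f"no model packages found in group {group_name}")
    arn = summaries[0]["ModelPackageArn"]
    sm_client.update_model_package(ModelPackageArn=arn, ModelApprovalStatus="Approved")
    return arn


class _StubClient:
    """Offline stand-in for a SageMaker client, used only to exercise the sketch."""

    def __init__(self):
        self.updated = []

    def list_model_packages(self, **kwargs):
        return {"ModelPackageSummaryList": [{"ModelPackageArn": "arn:example:model-package/demo/1"}]}

    def update_model_package(self, **kwargs):
        self.updated.append(kwargs)
        return {"ModelPackageArn": kwargs["ModelPackageArn"]}


stub = _StubClient()
approved_arn = approve_latest_model_package(stub, "demo-group")
print(approved_arn)
```

Against a real account, the call would look like `approve_latest_model_package(boto3.client("sagemaker"), model_package_group_name)`, and the approval would trigger the same deployment described in the next steps.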

#### 4. A new CodePipeline job should be triggered by the approval status update from the previous step. To verify the pipeline execution, navigate to the AWS CodePipeline console.

![code-pipeline console](images/codepipeline-aws-ui.png)

#### 5. Click on the pipeline that was created for your project:

![code-pipeline-status-check](images/codepipeline-status-check.png)

#### 6. The LLM deployment should be triggered. It takes about 10 minutes to deploy the model in SageMaker Hosting. You can track the status of the pipeline directly in the CodePipeline UI as shown in the following:

![code-pipeline-deploy](images/codepipeline-staging-deploy.png)

#### 7. Congratulations! You've successfully completed the model deployment pipeline for the LLM.

#### 8. Find out the endpoint name
After the endpoint is deployed, you can find the endpoint name from the SageMaker Studio:

Go to ![home](images/home.png) -> Deployment -> Endpoints

![endpoint](images/sm-studio-deployment-endpoint.png)



<div style="background-color: #FFDDDD; border-left: 5px solid red; padding: 10px; color: black;">
<strong>Note:</strong> Capture the endpoint name shown for your endpoint. We will use this endpoint in the subsequent labs to build a generative AI application. In the example above, the endpoint name for my deployment is <strong>hf-llama2-b987c-pipeline-staging</strong>; yours will be different. Copy your endpoint name; the next cells store it in a variable so that we can reference it in later labs. 
</div>


In [61]:
llm_endpoint_name = "hf-llama2-b987c-pipeline-staging".strip() # put your endpoint name here.

In [62]:
%store llm_endpoint_name

Stored 'llm_endpoint_name' (str)


# Next Step
In this lab, we saw how to use SageMaker Pipelines to automate the LLM training and finetuning pipeline, and how to integrate it with a CI/CD pipeline, created using a SageMaker Project, that orchestrates the model deployment process at scale using AWS CodePipeline. 

We'll keep the endpoint running for the remainder of the workshop.
