## Journey 5: Orchestrate with MLOps

After completing Journeys 1-4, Kaitlin has identified the right model for her application and customized it to produce accurate text summaries.  
In the future, Kaitlin would prefer not to repeat the entire journey one step at a time whenever a new FM becomes available in JumpStart or a new fine-tuning dataset arrives.  
Instead, she wants to codify her journey into a repeatable end-to-end ML workflow that can be run later, either user-initiated or event-triggered.  
  
The goal of this notebook is to implement a multi-step SageMaker pipeline that evaluates multiple models, selects the best one, and registers it in the SageMaker Model Registry.  
This example uses **Llama-2-7b** models, either with their default weights or after fine-tuning. All models are instantiated and fine-tuned with the [Amazon SageMaker JumpStart SDK](https://aws.amazon.com/sagemaker/jumpstart/).  

This notebook also uses other Amazon SageMaker components:  

[SageMaker Pipelines](https://aws.amazon.com/sagemaker/pipelines/) is a purpose-built workflow orchestration service to automate all phases of machine learning (ML) from data pre-processing to model monitoring. With an intuitive UI and Python SDK you can manage repeatable end-to-end ML pipelines at scale. The native integration with multiple AWS services allows you to customize the ML lifecycle based on your MLOps requirements.

[Amazon SageMaker Model Registry](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry.html) is a purpose-built metadata store to manage the entire lifecycle of ML models from training to inference. Whether you prefer to store your model artifacts (model framework files, container image) in AWS (Amazon ECR) or outside of AWS in any third party Docker repository, you can now track them all in Amazon SageMaker Model Registry. You also have the flexibility to register a model without read/write permissions to the associated container image. If you want to track an ML model in a private repository, set the optional `SkipModelValidation` parameter to `All` at the time of registration. Later you can also deploy these models for inference in Amazon SageMaker.

[Amazon SageMaker Clarify](https://aws.amazon.com/sagemaker/clarify/) provides purpose-built tools to gain greater insights into your ML models and data, based on metrics such as accuracy, robustness, toxicity, and bias, to improve model quality and support responsible AI initiatives. With the rise of generative AI, data scientists and ML engineers can leverage publicly available foundation models (FMs) to accelerate speed-to-market. To remove the heavy lifting of evaluating and selecting the right FM for your use case, Amazon SageMaker Clarify supports FM evaluation to help you quickly evaluate, compare, and select the best FM for your use case based on a variety of criteria across different tasks within minutes. It allows you to adopt FMs faster and with confidence.
To perform the evaluation we use the open source library [FMEval](https://github.com/aws/fmeval), which powers SageMaker Clarify FM evaluation.

This example was built by following the best practices explained in the blog post [Operationalize LLM Evaluation at Scale using Amazon SageMaker Clarify and MLOps services](https://aws.amazon.com/blogs/machine-learning/operationalize-llm-evaluation-at-scale-using-amazon-sagemaker-clarify-and-mlops-services/). 

### Environment setup
Select the `Data Science 3.0` kernel with an `ml.t3.medium` instance to run this notebook.

First we install the required dependencies and import the required libraries.  
We also make the SageMaker SDK aware of the configuration file *config.yml*. 
This file contains general pipeline parameters, such as the default instance type for the pipeline containers and the path to the *requirements.txt* file that lists the required dependencies.
These dependencies are installed automatically in the pipeline container at the start of each pipeline step. We will create the *requirements.txt* file later in the notebook.
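
As a rough sketch of the kind of settings such a configuration file carries, the snippet below expresses them as a Python dictionary. The nesting follows the layout the SageMaker Python SDK documents for its defaults configuration file; the values (instance type, relative path) and the helper `remote_function_defaults` are assumptions for illustration and may differ from the file shipped with this notebook.

```python
import json

# Hypothetical values; only the nesting mirrors the SDK's documented
# defaults layout for remote functions and function-based pipeline steps.
pipeline_config = {
    "SchemaVersion": "1.0",
    "SageMaker": {
        "PythonSDK": {
            "Modules": {
                "RemoteFunction": {
                    "InstanceType": "ml.m5.xlarge",        # assumed default step instance
                    "Dependencies": "./requirements.txt",  # installed when each step starts
                    "IncludeLocalWorkDir": True,           # ship local code such as steps/
                }
            }
        }
    },
}


def remote_function_defaults(config):
    """Return the section that holds the step defaults."""
    return config["SageMaker"]["PythonSDK"]["Modules"]["RemoteFunction"]


print(json.dumps(remote_function_defaults(pipeline_config), indent=2))
```

Any of these defaults can still be overridden per step, as the `keep_alive_period_in_seconds` argument used later in this notebook shows.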

In [1]:
!pip3 install fmeval==0.3.0
!pip3 install sagemaker

Collecting fmeval==0.3.0
  Using cached fmeval-0.3.0-py3-none-any.whl.metadata (5.7 kB)
Collecting IPython (from fmeval==0.3.0)
  Downloading ipython-8.21.0-py3-none-any.whl.metadata (5.9 kB)
Collecting bert-score<0.4.0,>=0.3.13 (from fmeval==0.3.0)
  Using cached bert_score-0.3.13-py3-none-any.whl (61 kB)
Collecting detoxify<0.6.0,>=0.5.1 (from fmeval==0.3.0)
  Downloading detoxify-0.5.2-py3-none-any.whl.metadata (13 kB)
Collecting evaluate<0.5.0,>=0.4.0 (from fmeval==0.3.0)
  Using cached evaluate-0.4.1-py3-none-any.whl.metadata (9.4 kB)
Collecting ipykernel<7.0.0,>=6.26.0 (from fmeval==0.3.0)
  Downloading ipykernel-6.29.2-py3-none-any.whl.metadata (6.0 kB)
Collecting jiwer<4.0.0,>=3.0.3 (from fmeval==0.3.0)
  Using cached jiwer-3.0.3-py3-none-any.whl.metadata (2.6 kB)
Collecting markdown (from fmeval==0.3.0)
  Using cached Markdown-3.5.2-py3-none-any.whl.metadata (7.0 kB)
Collecting matplotlib<4.0.0,>=3.8.0 (from fmeval==0.3.0)
  Using cached matplotlib-3.8.2-cp310

In [None]:
import sagemaker
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.function_step import step
from steps.deploy_llama7b import deploy_llama7b
from steps.finetune_llama7b import finetune_llama7b
from steps.deploy_finetuned_llama7b import deploy_finetuned_llama7b
from steps.selection import selection
from steps.preprocess import preprocess
from steps.evaluation import evaluation
from steps.register import register
from steps.cleanup import cleanup
from steps.utils import create_training_job_name
import os

os.environ["SAGEMAKER_USER_CONFIG_OVERRIDE"] = os.getcwd()

### Evaluation dataset preparation - preprocess step
We store the path of the evaluation dataset in *evaluation_data_path* and the path for the pipeline outputs in *output_data_path*.  
We then configure **preprocess**, our first pipeline step. This step takes care of any preprocessing that must be done
on the evaluation dataset. The output data path after processing is returned in *preprocess_step_ret*.  
Remember the *pipeline_name*, as it also identifies our pipeline in SageMaker Studio.  
Also note the S3 output path so that you can review the generated artifacts later.

This example uses a subsample of the [SciQ](https://huggingface.co/datasets/sciq) dataset.
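
The evaluation dataset is stored as a JSON Lines file, one JSON object per line. As a minimal sketch of how SciQ-style records could be written in that format, consider the helper below. The output field names `model_input` and `target_output`, the helper `to_jsonl`, and the two sample records are assumptions for illustration, not necessarily what the **preprocess** step in this repository expects.

```python
import json


def to_jsonl(records, input_key="question", target_key="correct_answer"):
    """Serialize SciQ-style records to JSON Lines text (one object per line)."""
    lines = []
    for record in records:
        row = {
            "model_input": record[input_key],     # hypothetical output field name
            "target_output": record[target_key],  # hypothetical output field name
        }
        lines.append(json.dumps(row, ensure_ascii=False))
    return "\n".join(lines) + "\n"


# Made-up records that only imitate the SciQ field layout.
sample = [
    {"question": "What gas do plants absorb from the air?", "correct_answer": "carbon dioxide"},
    {"question": "What force pulls objects toward Earth?", "correct_answer": "gravity"},
]
jsonl_text = to_jsonl(sample)
print(jsonl_text)
```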

In [None]:
pipeline_name = "genai-for-builders-fmops-pipeline"

default_bucket = sagemaker.Session().default_bucket()
main_data_path = f"s3://{default_bucket}"
evaluation_data_path = main_data_path + "/datasets/sciq/evaluation/automatic/dataset_evaluation.jsonl"
output_data_path = main_data_path + "/datasets/sciq/output_" + pipeline_name

# You can add your own evaluation dataset code into this step
preprocess_step_ret = step(preprocess, name="preprocess")(evaluation_data_path, output_data_path)

print("The pipeline name is "+pipeline_name)
# Mark the name of this bucket for reviewing the artifacts generated by this pipeline at the end of the execution
print("Output S3 bucket: "+output_data_path)

### Setup models
We are now going to add different models to the pipeline. Each model has an optional **finetune** step, a **deploy** step, and finally an **evaluation** step.
Before starting the setup, we instantiate two supporting lists.  
*model_list* holds the models, each defined as a dictionary of parameters.  
*evaluation_results_ret_list* holds the results produced by each **evaluation** step.

In [None]:
model_list = []
evaluation_results_ret_list = []

### Setup first model: Llama-2-7b from SageMaker JumpStart
For the first model we use Llama-2-7b, available in Amazon SageMaker JumpStart.
We collect all the required parameters in a dictionary and add it to *model_list* for later use.  
We will use one *ml.g5.2xlarge* instance for inference.

In [None]:
# We setup required model parameters
model_1 = {"model_id": "meta-textgeneration-llama-2-7b",
           "model_version": "3.0.2",
           "model_name": "llama-2-7b",
           "endpoint_name": "genai-for-builders-fmops-meta-textgeneration-llama-2-7b",
           "instance_type": "ml.g5.2xlarge",
           "num_instances": 1}

# We save the information of the model in the model_list array for later use
model_list.append(model_1)

We then configure the **deploy** and **evaluation** steps. Note that the **evaluation** step depends on both the **preprocess** and **deploy** steps because it takes their return values as inputs.

In [None]:
deploy_llama7b_ret = step(deploy_llama7b, name="deploy_llama7b")(model_1)

# Evaluation step is using the output from preprocess (the S3 location of the evaluation dataset file) 
# and the output of the deploy step (the endpoint name)
evaluate_llama7b_ret = step(evaluation,
                    name="evaluate_llama7b",
                    keep_alive_period_in_seconds=1200
                    )(model_1,
                      preprocess_step_ret,
                      deploy_llama7b_ret)

# We save the evaluation output details in the evaluation_results_ret_list array for later use
evaluation_results_ret_list.append(evaluate_llama7b_ret)
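
The ordering of steps is inferred from the data flow: passing one step's return value into another step makes the second wait for the first. The toy model below illustrates that idea in plain Python. It is not how SageMaker implements `step`; the `Delayed` class and the `toy_step` and `execution_order` functions are hypothetical names used only for this sketch.

```python
class Delayed:
    """Placeholder for a step's future return value (toy stand-in)."""

    def __init__(self, name, args):
        self.name = name
        # Any argument that is itself a Delayed is an upstream dependency.
        self.depends_on = [a for a in args if isinstance(a, Delayed)]


def toy_step(name):
    """Return a callable that records its arguments instead of running anything."""
    return lambda *args: Delayed(name, args)


def execution_order(final):
    """Depth-first walk that lists every step after its dependencies."""
    order = []

    def visit(node):
        for upstream in node.depends_on:
            visit(upstream)
        if node.name not in order:
            order.append(node.name)

    visit(final)
    return order


# Placeholder arguments; nothing is read from or written to S3 here.
preprocess_ret = toy_step("preprocess")("s3://bucket/in", "s3://bucket/out")
deploy_ret = toy_step("deploy")({"model_name": "llama-2-7b"})
evaluate_ret = toy_step("evaluate")({"model_name": "llama-2-7b"}, preprocess_ret, deploy_ret)

print(execution_order(evaluate_ret))  # ['preprocess', 'deploy', 'evaluate']
```

Because only the final return values need to be handed over, a pipeline built this way can discover every upstream step by walking the dependencies backwards.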

### Setup second model: Llama-2-7b from SageMaker JumpStart to be instruction fine-tuned
The second model in this example is a Llama-2-7b from SageMaker JumpStart that we fine-tune on an instruction dataset.  
For this model we set the parameters required by the fine-tuning job, such as:
- *finetune_instance_type*: the instance type used to fine-tune the model
- *epoch*: the number of fine-tuning epochs
- *max_input_length*: the maximum input sequence length
- *per_device_train_batch_size*: the batch size per device
- *instruction_tuned*: when set to True, the model is instruction fine-tuned
- *training_data_path*: the S3 path containing the training dataset

We also set the training job name manually so that we can track it during the pipeline execution.
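
The helper `create_training_job_name` lives in `steps/utils` and is not shown in this notebook. As a sketch of what such a helper might do, the function below derives a unique name from the model id and a UTC timestamp. It assumes the SageMaker naming rule for training jobs (alphanumeric characters and hyphens, at most 63 characters); `make_training_job_name` is a hypothetical stand-in, not the repository's implementation.

```python
import re
import time


def make_training_job_name(model_id, max_length=63, now=None):
    """Build a job name from the model id plus a UTC timestamp suffix.

    `now` is seconds since the epoch; None means the current time.
    """
    suffix = time.strftime("%Y%m%d-%H%M%S", time.gmtime(now))
    # Replace every run of non-alphanumeric characters with a single hyphen.
    base = re.sub(r"[^a-zA-Z0-9]+", "-", model_id).strip("-")
    # Trim the base so that base + "-" + suffix fits within max_length.
    base = base[: max_length - len(suffix) - 1].rstrip("-")
    return f"{base}-{suffix}"


print(make_training_job_name("meta-textgeneration-llama-2-7b"))
```

Fixing the name up front, rather than letting the service generate one, means a later step can look up the training job by that name.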

In [2]:
# We setup required model parameters
model_2 = {
    "model_id": "meta-textgeneration-llama-2-7b",
    "model_version": "3.0.2",
    "model_name": "llama-2-7b-instruction-tuned",
    "endpoint_name": "genai-for-builders-fmops-meta-llama-2-7b-instr-finetuned",
    "finetune_instance_type": "ml.g5.12xlarge",
    "finetune_num_instances": 1,
    "instance_type": "ml.g5.2xlarge",
    "num_instances": 1,
    "epoch": 1,
    "max_input_length": 512,
    "per_device_train_batch_size": 4,
    "instruction_tuned": "True",
    "chat_dataset": "False",
    "training_data_path": f"s3://{default_bucket}/datasets/sciq/fine_tuning/instruction_fine_tuning",
    "is_finetuned_model": True
}
model_2["training_job_name"] = create_training_job_name(model_2["model_id"])

# We save the information of the model in the model_list array for later use
model_list.append(model_2)

We now create the pipeline steps for the second model. For model 2 we add a **finetune** step before the **deploy** and **evaluation** steps.  
As before, we save the evaluation results in the *evaluation_results_ret_list* list.

In [None]:
finetune_ret = step(finetune_llama7b, name="finetune_llama7b_instruction")(model_2)

# Deploy step is using the output from the finetune step (the training job name)
deploy_finetuned_llama7b_ret = step(deploy_finetuned_llama7b, 
                                    name="deploy_finetuned_llama7b_instruction")(model_2, finetune_ret)

# Evaluation step is using the output from preprocess (the S3 location of the evaluation dataset file) 
# and the output of the deploy step (the endpoint name)
evaluate_finetuned_llama7b_instruction_ret = step(evaluation,
                    name="evaluate_finetuned_llama7b_instr",
                    keep_alive_period_in_seconds=1200,
                    )(model_2,
                      preprocess_step_ret,
                      deploy_finetuned_llama7b_ret)

# We save the evaluation output details in the evaluation_results_ret_list array for later use
evaluation_results_ret_list.append(evaluate_finetuned_llama7b_instruction_ret)

### Setup third model: Llama-2-7b-chat from SageMaker JumpStart to be domain fine-tuned
The third model in this example is a Llama-2-7b-chat from SageMaker JumpStart that we fine-tune
on a domain dataset.  
For this model we set the parameters required by the fine-tuning job, such as:
- *finetune_instance_type*: the instance type used to fine-tune the model
- *epoch*: the number of fine-tuning epochs
- *max_input_length*: the maximum input sequence length
- *instruction_tuned*: set to False for this model, because it is fine-tuned on a domain dataset rather than an instruction dataset
- *training_data_path*: the S3 path containing the training dataset
- *per_device_train_batch_size*: the batch size per device
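
SageMaker training jobs receive hyperparameters as string-to-string pairs. As a sketch, assuming the dictionary keys listed above map one-to-one to the hyperparameter names of the fine-tuning job (an assumption, since the `finetune_llama7b` step is not shown here), a helper could extract and stringify them as follows. `build_hyperparameters` and `HYPERPARAMETER_KEYS` are hypothetical names.

```python
# Keys assumed to be fine-tuning hyperparameters rather than infrastructure settings.
HYPERPARAMETER_KEYS = (
    "epoch",
    "max_input_length",
    "per_device_train_batch_size",
    "instruction_tuned",
    "chat_dataset",
)


def build_hyperparameters(model, keys=HYPERPARAMETER_KEYS):
    """Pick the fine-tuning keys present in `model` and convert values to strings."""
    return {key: str(model[key]) for key in keys if key in model}


example_model = {
    "model_name": "llama-2-7b-chat-domain-tuned",
    "instance_type": "ml.g5.2xlarge",
    "epoch": 5,
    "max_input_length": 512,
    "per_device_train_batch_size": 4,
    "instruction_tuned": "False",
    "chat_dataset": "False",
}
print(build_hyperparameters(example_model))
```

Keeping infrastructure settings (instance types, endpoint names) apart from hyperparameters in this way lets one dictionary describe a model end to end while each step reads only the keys it needs.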

In [None]:
# We setup required model parameters
model_3 = {
    "model_id": "meta-textgeneration-llama-2-7b-f",
    "model_version": "3.0.2",
    "model_name": "llama-2-7b-chat-domain-tuned",
    "endpoint_name": "genai-for-builders-fmops-meta-llama-2-7b-chat-dom-finetuned",
    "finetune_instance_type": "ml.g5.12xlarge",
    "finetune_num_instances": 1,
    "instance_type": "ml.g5.2xlarge",
    "num_instances": 1,
    "epoch": 5,
    "max_input_length": 512,
    "per_device_train_batch_size": 4,
    "instruction_tuned": "False",
    "chat_dataset": "False",
    "training_data_path": f"s3://{default_bucket}/datasets/sciq/fine_tuning/domain_adaptation_fine_tuning",
    "is_finetuned_model": True
}
model_3["training_job_name"] = create_training_job_name(model_3["model_id"])

# We save the information of the model in the model_list array for later use
model_list.append(model_3)

We are now going to create the pipeline steps for model 3 like we did for model 2.

In [None]:
domain_finetune_ret = step(finetune_llama7b, name="finetune_llama7b_domain")(model_3)

# Deploy step is using the output from the finetune step (the training job name)
deploy_finetuned_llama7b_dom_ret = step(deploy_finetuned_llama7b, 
                                    name="deploy_finetuned_llama7b_domain")(model_3, domain_finetune_ret)

# Evaluation step is using the output from preprocess (the S3 location of the evaluation dataset file) 
# and the output of the deploy step (the endpoint name)
evaluate_finetuned_llama7b_domain_ret = step(evaluation,
                    name="evaluate_finetuned_llama7b_dom",
                    keep_alive_period_in_seconds=1200,
                    )(model_3,
                      preprocess_step_ret,
                      deploy_finetuned_llama7b_dom_ret)

# We save the evaluation output details in the evaluation_results_ret_list array for later use
evaluation_results_ret_list.append(evaluate_finetuned_llama7b_domain_ret)

### Select best model and register it in SageMaker Model Registry
Now it's time to select the best model. To do so, we create a pipeline step dedicated to best-model **selection**.
The selection uses the evaluation output of all the models.
The output of the **selection** step is the best model name, which we use in the **register** step.  
The **register** step also needs a model package group name and description.

In [None]:
# Selection step is using the output from the evaluation steps of all the models
selection_ret = step(selection, name="best_model_selection")(*evaluation_results_ret_list)

# Set a package group name and description
model_package_group_name = "GenAIForBuilderFMOpsEvaluationPipeline"
model_package_group_description = "GenAI For Builder FMOps Evaluation Pipeline Model Registry"

# We will register the best model in the model registry. The best model name is contained in the return object of the selection step
register_ret = step(register, name="best_model_register")(model_list,
                                                          output_data_path,
                                                          model_package_group_name,
                                                          model_package_group_description,
                                                          selection_ret,
                                                          *evaluation_results_ret_list)
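
The logic inside the **selection** step is not shown in this notebook. A minimal sketch of one plausible approach is to pick the model with the highest value of a single metric. The result layout (`model_name` and `score` keys), the function `select_best_model`, and the numbers below are all assumptions made for illustration.

```python
def select_best_model(*evaluation_results, metric="score"):
    """Return the name of the model with the highest value for `metric`."""
    if not evaluation_results:
        raise ValueError("at least one evaluation result is required")
    best = max(evaluation_results, key=lambda result: result[metric])
    return best["model_name"]


# Placeholder scores, not measured results.
results = [
    {"model_name": "llama-2-7b", "score": 0.41},
    {"model_name": "llama-2-7b-instruction-tuned", "score": 0.58},
    {"model_name": "llama-2-7b-chat-domain-tuned", "score": 0.47},
]
print(select_best_model(*results))
```

Accepting the results as variadic arguments matches how the step is called above with `*evaluation_results_ret_list`, so adding a fourth model requires no change to the selection code.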

### Cleanup
The last pipeline step cleans up all the resources that the pipeline instantiates.
For each model we create a **cleanup** step, and these steps run in parallel. All **cleanup** steps fan out after the **register** step because they depend on its output.

In [None]:
# We need to create a cleanup step for each model. We collect the return objects to add them later in the pipeline creation function
cleanup_ret_list = []

for model in model_list:
    # We append register_ret to connect the register and cleanup steps together
    cleanup_ret = step(cleanup, name="cleanup_"+model["model_name"])(model["endpoint_name"], register_ret)
    cleanup_ret_list.append(cleanup_ret)
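
The body of the **cleanup** step is not shown in this notebook. As a sketch of what deleting an endpoint and its related resources could look like with the boto3 SageMaker client, consider the function below. `cleanup_endpoint` is a hypothetical name, and the second parameter exists only so that a stub client can be injected for testing; the actual step receives `register_ret` in that position to create the dependency.

```python
def cleanup_endpoint(endpoint_name, client=None):
    """Delete an endpoint together with its endpoint config and models.

    Sketch only: `client` is expected to behave like boto3.client("sagemaker").
    """
    if client is None:
        import boto3  # imported lazily; needs AWS credentials when called
        client = boto3.client("sagemaker")

    # Look up the related resources before deleting anything.
    endpoint = client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]

    client.delete_endpoint(EndpointName=endpoint_name)
    client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        client.delete_model(ModelName=model_name)

    return {"endpoint": endpoint_name, "config": config_name, "models": model_names}
```

Deleting the endpoint is what stops the instance charges; removing the endpoint config and model afterwards simply avoids leaving orphaned resources behind.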

### Creating and launching the pipeline
We are finally ready to create and launch the pipeline, but first we need to create a *requirements.txt* file.
As a best practice, we read the version of the sagemaker library that we are using to create the pipeline and pin it in the requirements file.
Using the same sagemaker version when creating and when running the pipeline helps avoid deserialization issues.

In [None]:
if os.path.exists("requirements.txt"):
    os.remove("requirements.txt")

with open('requirements.txt', 'w') as req_file:
    req_file.write("fmeval==0.3.0\n")
    req_file.write("sagemaker==" + str(sagemaker.__version__) + "\n")
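
As a sketch of how the pin could be verified before launching the pipeline, the helpers below parse requirements text and compare the pinned version with an installed one. `pinned_version` and `versions_match` are hypothetical helpers, and the version number in the example is a placeholder.

```python
def pinned_version(requirements_text, package):
    """Return the version pinned with '==' for `package`, or None if absent."""
    for line in requirements_text.splitlines():
        name, separator, version = line.strip().partition("==")
        if separator and name.strip().lower() == package.lower():
            return version.strip()
    return None


def versions_match(requirements_text, package, installed_version):
    """True when the requirements text pins exactly the installed version."""
    return pinned_version(requirements_text, package) == installed_version


requirements = "fmeval==0.3.0\nsagemaker==2.200.0\n"  # sagemaker version is a placeholder
print(pinned_version(requirements, "sagemaker"))
```

In the notebook, the same check could be run as `versions_match(open("requirements.txt").read(), "sagemaker", sagemaker.__version__)`.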
    

In the last cell of this notebook we create the pipeline and serialize it to S3. 
Don't forget to pass an execution role with sufficient permissions, along with the return values of the last steps of our pipeline (here, the **cleanup** steps).
We are now ready to start the pipeline execution!

In [None]:
from sagemaker import get_execution_role
role = get_execution_role()

pipeline = Pipeline(name=pipeline_name, steps=cleanup_ret_list)
pipeline.upsert(role)
pipeline.start()
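
Once the pipeline is running, its progress can be followed in SageMaker Studio or from code. As a sketch, the execution object returned by `pipeline.start()` exposes a `list_steps()` method whose entries carry `StepName` and `StepStatus` keys; the helper below groups such entries by status. `summarize_steps` is a hypothetical name, and the sample entries are made up for illustration.

```python
from collections import Counter


def summarize_steps(step_summaries):
    """Count steps per status from a list of step summary dictionaries."""
    return dict(Counter(summary["StepStatus"] for summary in step_summaries))


# Made-up entries in the shape returned for pipeline execution steps.
sample_steps = [
    {"StepName": "preprocess", "StepStatus": "Succeeded"},
    {"StepName": "deploy_llama7b", "StepStatus": "Succeeded"},
    {"StepName": "evaluate_llama7b", "StepStatus": "Executing"},
]
print(summarize_steps(sample_steps))
```

In the notebook this could be used as `execution = pipeline.start()` followed by `summarize_steps(execution.list_steps())`.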