# Fine-tuning Llama 3 for Amazon Bedrock using SageMaker Pipeline
## Introduction
This notebook demonstrates how to fine-tune a Llama 3 model with a SageMaker Pipeline and deploy it to Amazon Bedrock. It covers the entire machine learning workflow, including:

- Data preparation and preprocessing
- Model training using SageMaker Pipeline
- Model registration
- Deployment to Amazon Bedrock
- Inference comparison

## Architecture
The following diagram illustrates the end-to-end ML workflow:

![Architecture Diagram](Llama3_finetuning_bedrock.png)

This pipeline processes, trains, and evaluates a model using Hugging Face containers, then registers it before deploying to Amazon Bedrock through a Lambda function for inference. Model artifacts are stored in S3 throughout the process.

## Prerequisites

- An AWS account with appropriate permissions
- SageMaker Studio or a SageMaker Notebook instance
- Access to the Llama 3 model on Hugging Face (meta-llama/Llama-3.2-3B-Instruct)
- Necessary Python libraries installed (transformers, sagemaker, boto3, etc.)

## Setup

### Install required libraries

In [None]:
%pip install transformers  sagemaker seaborn sentence-transformers nltk scikit-learn "huggingface_hub[cli]" --upgrade --quiet

### Import required libraries

In [None]:
import boto3
import botocore
import importlib.util
import json
import logging
import os
import sys
import time

from huggingface_hub import HfApi
from sagemaker.huggingface import HuggingFace, HuggingFaceModel, get_huggingface_llm_image_uri
from sagemaker.lambda_helper import Lambda
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.s3 import S3Downloader, S3Uploader
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.lambda_step import LambdaStep, LambdaOutput, LambdaOutputTypeEnum
from sagemaker.workflow.model_step import ModelStep
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.properties import PropertyFile
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.workflow.steps import ProcessingStep, TrainingStep, CacheConfig, CreateModelStep

import sagemaker

# Import custom modules
spec = importlib.util.spec_from_file_location("iam_role_helper", "iam_role_helper.py")
iam_role_manager = importlib.util.module_from_spec(spec)
sys.modules["iam_role_manager"] = iam_role_manager
spec.loader.exec_module(iam_role_manager)

spec = importlib.util.spec_from_file_location("utils", "utils.py")
utils = importlib.util.module_from_spec(spec)
sys.modules["utils"] = utils
spec.loader.exec_module(utils)

# Import specific functions from the custom modules
from iam_role_helper import create_lambda_execution_role, create_or_update_role, create_boto3_layer
from utils import monitor_pipeline_execution, wait_for_model_availability, run_model_evaluation


### Set Up Hugging Face Credentials

To access the Llama 3 model weights, you need to authenticate with Hugging Face. This step requires an API token and access to the `meta-llama/Llama-3.2-3B-Instruct` model.

> ⚠️ **Important**: You must have explicit permission to access the Llama 3 model. If you don't have access, you'll need to request it from Meta AI.

Steps:

1. Obtain a Hugging Face API token:
   - Visit the [Hugging Face Access Tokens page](https://huggingface.co/settings/tokens)
   - Create a new token or use an existing one
   
2. Ensure you have access to the Llama 3 model:
   - Check your access [here](https://huggingface.co/meta-llama/Meta-Llama-3-8B)
   - If you don't have access, follow the instructions in this [discussion thread](https://huggingface.co/meta-llama/Meta-Llama-3-8B/discussions/172)

3. Set your Hugging Face API token:


In [None]:
os.environ['HUGGINGFACE_TOKEN'] = 'your_huggingface_token_here'

4. Verify your access:

In [None]:
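# Pass the token explicitly so the check runs with your credentials
api = HfApi(token=os.environ['HUGGINGFACE_TOKEN'])
model_info = api.model_info("meta-llama/Llama-3.2-3B-Instruct")
print(f"You have access to: {model_info.id}")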

📘 Note: For more information on Hugging Face access tokens and security, refer to the [official documentation](https://huggingface.co/docs/hub/en/security-tokens).
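
Hard-coding a token in a notebook makes it easy to leak through version control. A minimal sketch of an alternative, assuming the token has already been exported as `HUGGINGFACE_TOKEN` in the environment (the helper name `get_hf_token` is hypothetical and not part of this notebook's modules):

```python
import os


def get_hf_token(env_var: str = "HUGGINGFACE_TOKEN") -> str:
    """Return the Hugging Face token from the environment or raise a clear error."""
    token = os.environ.get(env_var, "").strip()
    # Reject empty values and the placeholder string used in the cell above.
    if not token or token == "your_huggingface_token_here":
        raise RuntimeError(f"Set the {env_var} environment variable to a valid token.")
    return token
```

Calling `get_hf_token()` early surfaces a missing token before any training job is launched.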

### Configure SageMaker session and IAM role

In [None]:
sagemaker_session = sagemaker.Session()
sagemaker_execution_role = "your_sagemaker_execution_role"
# SageMaker session bucket, used for uploading data, models, and logs.
# SageMaker creates the default bucket automatically if it does not exist.
sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sagemaker_session is not None:
    # fall back to the default bucket if no bucket name is given
    sagemaker_session_bucket = sagemaker_session.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    # Running outside SageMaker (for example, locally): look up the role ARN by name
    iam = boto3.client('iam')
    role = iam.get_role(RoleName=sagemaker_execution_role)['Role']['Arn']

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)
sm_client = boto3.client('sagemaker', region_name=sess.boto_region_name)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

# Contents

1. [Setup and Environment Configuration](#setup)
2. [Data Preparation](#data-preparation)
3. [SageMaker Pipeline Creation](#sagemaker-pipeline)
4. [Deployment to Amazon Bedrock](#bedrock-deployment)
5. [Pipeline Job Creation](#pipeline-execution)
6. [Inference Analysis](#model-evaluation)
7. [Resource Cleanup](#clean-up-resources)

Follow along with this notebook to learn how to leverage AWS's machine learning ecosystem for fine-tuning and deploying custom language models.

## Data Preparation
In this section, we'll prepare our dataset for fine-tuning the Llama 3 model. 
Proper data preparation is crucial for effective model training. We'll be:

1. Downloading a pre-prepared dataset
2. Uploading it to an S3 bucket for use in our SageMaker pipeline

### About the Dataset

We're using a shortened version of the [OASST1](https://huggingface.co/datasets/OpenAssistant/oasst1) (OpenAssistant Conversations Dataset) from Hugging Face.
This dataset has been specifically curated to focus on Question-Answer (QA) examples, 
making it ideal for fine-tuning our model for QA tasks. The dataset is currently 
stored in an Amazon S3 bucket for easy access and integration with our SageMaker pipeline.

Using this pre-processed and focused dataset ensures that:
1. Our data is relevant to the QA task we're training for
2. The dataset size is manageable for this demonstration
3. We can efficiently access the data through Amazon S3

This preprocessing step saves time and computational resources while still providing 
a rich dataset for fine-tuning our Llama 3 model.

### Download the dataset

In [None]:
dataset_S3Uri="s3://jumpstart-cache-prod-us-west-2/training-datasets/oasst_top/train/"

In [None]:
train_dataset_path = S3Downloader.download(s3_uri=dataset_S3Uri, local_path="dataset/")
print("Training dataset downloaded to:")
print(train_dataset_path)
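
Before uploading, it can help to confirm that the downloaded file parses as JSON Lines. The sketch below is an illustration only: `summarize_jsonl` is a hypothetical helper, and it makes no assumption about the field names in the dataset, only that each non-empty line is a JSON object.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path: str) -> dict:
    """Count records and tally the top-level keys seen in a JSON Lines file."""
    key_counts = Counter()
    num_records = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)  # raises on malformed JSON
            num_records += 1
            key_counts.update(record.keys())
    return {"num_records": num_records, "keys": dict(key_counts)}
```

For example, `summarize_jsonl("dataset/train.jsonl")` reports how many records were downloaded and which fields they carry.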

### Upload the processed data to S3

In [None]:
input_path = f's3://{sess.default_bucket()}/datasets/llama3'
# upload the training dataset to S3
train_dataset_path = "dataset/train.jsonl"
train_s3_path = S3Uploader.upload(local_path=train_dataset_path, desired_s3_uri=f"{input_path}/dataset")

print("Training dataset uploaded to:")
print(train_s3_path)

# SageMaker Pipeline
## Why Use a SageMaker Pipeline?

SageMaker Pipelines allow us to create reusable workflows for machine learning tasks. 
By defining our process as a pipeline, we gain several benefits:

1. Reproducibility: The entire workflow can be easily recreated and rerun
2. Automation: Steps are executed automatically in sequence
3. Scalability: Pipeline can handle large-scale data processing and model training
4. Versioning: Each run of the pipeline can be tracked and compared

In this section, we'll define the steps of our pipeline, including data preprocessing, 
model training, and model registration.

## Define pipeline parameters

In this section, we set up essential parameters for our SageMaker pipeline. These parameters define:

- The pipeline session for managing our workflow
- AWS region and model naming for resource management
- Compute resources for data preprocessing
- Caching configuration to improve pipeline efficiency

These settings help us optimize our pipeline's performance, ensure consistency across runs, and manage computational resources effectively. Adjusting these parameters allows us to fine-tune the pipeline for different scenarios or datasets.

In [None]:
pipeline_session = PipelineSession()

# Define pipeline parameters
region = sagemaker_session.boto_region_name
model_name = "llama3-qa-model"
instance_type_preprocessing = "ml.m5.large"
instance_count = 1
# Cache configuration to improve pipeline execution time
cache_config = CacheConfig(enable_caching=True, expire_after="30d")

## Create preprocessing step
This section defines the preprocessing step of our SageMaker pipeline. Here's what we're doing:

- Setting up a SKLearnProcessor to handle our data preprocessing
- Configuring input and output paths for our data
- Defining the preprocessing script location

The preprocessing step is crucial for preparing our data before model training. It ensures our dataset is in the correct format and structure for the Llama 3 model fine-tuning process.

By using SageMaker's built-in SKLearnProcessor, we leverage AWS-optimized containers for efficient and scalable data processing, streamlining our ML workflow.

  

In [None]:
preprocessing_processor = SKLearnProcessor(
    framework_version="1.0-1",
    instance_type=instance_type_preprocessing,
    instance_count=instance_count,
    base_job_name="llama3-qa-preprocessing",
    role=role,
    max_runtime_in_seconds=3600,  # maximum runtime of 1 hour
    sagemaker_session=pipeline_session
)


In [None]:
inputs = [
    ProcessingInput(source=train_s3_path, destination="/opt/ml/processing/input"),
]

outputs = [
    ProcessingOutput(output_name="train", source="/opt/ml/processing/output/train"),
    ProcessingOutput(output_name="test", source="/opt/ml/processing/output/test")
]


In [None]:
preprocessing_step = ProcessingStep(
    name="PreprocessQADataset",
    processor=preprocessing_processor,
    inputs=inputs,
    outputs=outputs,
    code="scripts/preprocessing/preprocess.py",
)
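
The preprocessing script itself (`scripts/preprocessing/preprocess.py`) is not shown in this notebook. As a rough sketch of the kind of logic such a script could contain, the hypothetical function below shuffles JSON Lines records with a fixed seed and writes `train` and `test` splits into subdirectories matching the processing outputs defined above. The 90/10 ratio, the seed, and the function name are assumptions.

```python
import json
import random
from pathlib import Path


def split_jsonl(input_file, output_dir, test_fraction=0.1, seed=42):
    """Shuffle records deterministically and write train/test JSONL splits."""
    text = Path(input_file).read_text(encoding="utf-8")
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for ln in lines:
        json.loads(ln)  # fail fast on malformed records
    random.Random(seed).shuffle(lines)
    num_test = max(1, int(len(lines) * test_fraction))
    splits = {"test": lines[:num_test], "train": lines[num_test:]}
    for name, rows in splits.items():
        target = Path(output_dir) / name
        target.mkdir(parents=True, exist_ok=True)
        (target / f"{name}.jsonl").write_text("\n".join(rows) + "\n", encoding="utf-8")
    return {name: len(rows) for name, rows in splits.items()}
```

Inside a processing container this might be called as `split_jsonl("/opt/ml/processing/input/train.jsonl", "/opt/ml/processing/output")`, assuming the input file keeps its original name.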

## Create training step
This section sets up the core training step of our SageMaker pipeline. Key aspects include:

- Defining the training configuration using a YAML file
- Setting up a HuggingFace estimator for training the Llama 3 model
- Configuring compute resources, environment variables, and hyperparameters
- Creating a TrainingStep that integrates with our pipeline

The training step is where the actual fine-tuning of the Llama 3 model occurs. We're using SageMaker's integration with HuggingFace to simplify the process of training large language models.

This configuration allows us to leverage distributed training techniques like FSDP (Fully Sharded Data Parallel) for efficient training of our large model.


In [None]:
%%writefile llama_3_2_3B_fsdp_lora.yaml
# script parameters
model_id: "meta-llama/Llama-3.2-3B-Instruct" # Hugging Face model id
max_seq_length:  512 #2048              # max sequence length for model and packing of the dataset
# sagemaker specific parameters
train_dataset_path: "/opt/ml/input/data/train" # path to where SageMaker saves train dataset
test_dataset_path: "/opt/ml/input/data/test"   # path to where SageMaker saves test dataset
#output_dir: "/opt/ml/model"            # path to where SageMaker will upload the model 
output_dir: "/tmp/llama3"            # local output directory used by the training script
# training parameters
report_to: "tensorboard"               # report metrics to tensorboard
learning_rate: 0.0002                  # learning rate 2e-4
lr_scheduler_type: "constant"          # learning rate scheduler
num_train_epochs: 10                   # number of training epochs
per_device_train_batch_size: 16         # batch size per device during training
per_device_eval_batch_size: 16          # batch size for evaluation
gradient_accumulation_steps: 2         # number of steps before performing a backward/update pass
optim: adamw_torch                     # use torch adamw optimizer
logging_steps: 10                      # log every 10 steps
save_strategy: epoch                   # save checkpoint every epoch
evaluation_strategy: epoch             # evaluate every epoch
max_grad_norm: 0.3                     # max gradient norm
warmup_ratio: 0.03                     # warmup ratio
bf16: true                             # use bfloat16 precision
tf32: false                             # use tf32 precision
gradient_checkpointing: true           # use gradient checkpointing to save memory
# FSDP parameters: https://huggingface.co/docs/transformers/main/en/fsdp
fsdp: "full_shard auto_wrap offload" # remove offload if enough GPU memory
fsdp_config:
  backward_prefetch: "backward_pre"
  forward_prefetch: "false"
  use_orig_params: "false"

In [None]:
# upload the model yaml file to s3
model_yaml = "llama_3_2_3B_fsdp_lora.yaml"
train_config_s3_path = S3Uploader.upload(local_path=model_yaml, desired_s3_uri=f"{input_path}/config")

print("Training config uploaded to:")
print(train_config_s3_path)

In [None]:
# define Training Job Name with timestamp

timestamp = time.strftime('%Y%m%d-%H%M%S')
job_name = f'llama3-3B-exp1-{timestamp}'

# create the Estimator
huggingface_estimator = HuggingFace(
    entry_point          = 'training/train_fsdp_lora.py',      # train script
    model_dir            = '/opt/ml/model',
    source_dir           = 'scripts/',  # directory which includes all the files needed for training
    instance_type        = 'ml.g5.12xlarge',  # instances type used for the training job
    #instance_type        = 'ml.g5.48xlarge',  # instances type used for the training job
    #instance_type        = 'ml.g5.16xlarge',  # instances type used for the training job
    instance_count       = 2,                 # the number of instances used for training
    max_run              = 2*24*60*60,        # maximum runtime in seconds (days * hours * minutes * seconds)
    base_job_name        = job_name,          # the name of the training job
    role                 = role,              # IAM role used in training job to access AWS resources, e.g. S3
    volume_size          = 500,               # the size of the EBS volume in GB
    transformers_version = '4.36.0',          # the transformers version used in the training job
    pytorch_version      = '2.1.0',           # the PyTorch version used in the training job
    py_version           = 'py310',           # the python version used in the training job
    hyperparameters      =  {
        "config": "/opt/ml/input/data/config/llama_3_2_3B_fsdp_lora.yaml" # path to TRL config which was uploaded to s3
    },
    sagemaker_session=pipeline_session,
    disable_output_compression = True,        # do not compress output, to save training time and cost
    distribution={"torch_distributed": {"enabled": True}},   # enables torchrun
    environment  = {
        "HUGGINGFACE_HUB_CACHE": "/tmp/.cache", # set env variable to cache models in /tmp
        "HF_TOKEN": os.environ['HUGGINGFACE_TOKEN'],       # huggingface token to access gated models, e.g. llama 3
        "ACCELERATE_USE_FSDP": "1",             # enable FSDP
        "FSDP_CPU_RAM_EFFICIENT_LOADING": "1"   # enable CPU RAM efficient loading
    },
)

training_step = TrainingStep(
    name=job_name,
    estimator=huggingface_estimator,
    inputs={
        "train": sagemaker.inputs.TrainingInput(
            s3_data=preprocessing_step.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
        ),
        "config": sagemaker.inputs.TrainingInput(
            s3_data=train_config_s3_path,
        ),
        "test": sagemaker.inputs.TrainingInput(
            s3_data=preprocessing_step.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri,
        )
    },
)
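
To see how these settings combine, the number of samples per optimizer update is the per-device batch size times the gradient accumulation steps times the total number of GPUs. The sketch below works through that arithmetic for the configuration above, using the fact that an `ml.g5.12xlarge` instance has 4 GPUs; the function name is illustrative.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, gpus_per_instance, instance_count):
    """Number of samples contributing to one optimizer update across all GPUs."""
    return per_device_batch * grad_accum_steps * gpus_per_instance * instance_count


# Values from the YAML config and estimator above: batch size 16 per device,
# 2 accumulation steps, 4 GPUs per ml.g5.12xlarge instance, 2 instances.
print(effective_batch_size(16, 2, 4, 2))  # 256
```

If you change the instance type or count, recomputing this value helps decide whether the learning rate or accumulation steps should change with it.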

## Create model registration step
This section focuses on registering our trained model in SageMaker. Key points include:

- Setting up a HuggingFaceModel for deployment
- Creating a CreateModelStep to integrate model creation into our pipeline
- Configuring a RegisterModel step to add our model to the SageMaker Model Registry

Model registration is crucial for version control and deployment management. It allows us to:
- Track different versions of our fine-tuned Llama 3 model
- Manage model approvals and transitions between stages (e.g., testing to production)
- Simplify model deployment and updates in production environments

By integrating this step into our pipeline, we ensure that each successful training run results in a properly registered and trackable model version.

In [None]:
image_uri = get_huggingface_llm_image_uri(
    backend="huggingface",
    region=region,
    version="2.0",
)

In [None]:
llm_model = HuggingFaceModel(
    transformers_version="4.37.0",
    pytorch_version="1.10.2",
    py_version="py310",
    role=role,
    image_uri=image_uri,  # serving container resolved in the previous cell
)

In [None]:
# Create model step
llama_model_step = CreateModelStep(
    name="CreateLlama3ModelStep",
    model=llm_model,
    inputs=training_step.properties.ModelArtifacts.S3ModelArtifacts,
    depends_on=[training_step],
)
    
# Create a RegisterModel step, which registers the model with the SageMaker Model Registry.
model_package_group_name = "Llama3Models"
step_register_model = RegisterModel(
    name="RegisterModel",
    model=llm_model,
    model_data=training_step.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.g5.12xlarge"],
    transform_instances=["ml.g5.12xlarge"],
    model_package_group_name=model_package_group_name,
    depends_on=[training_step],
    approval_status="Approved",
)

## Bedrock Deployment
## Why Deploy to Amazon Bedrock?

Amazon Bedrock provides a serverless environment optimized for running large language models. 
By deploying our fine-tuned Llama 3 model to Bedrock, we gain:

1. Scalability: Bedrock can handle varying inference loads efficiently
2. Cost-effectiveness: Pay only for the compute resources used during inference
3. Easy integration: Simplified API for model invocation in production environments
4. Optimized performance: Bedrock is tuned for running large language models

This section will walk through the process of importing our trained model into Bedrock.

### Create IAM roles and policies for Bedrock access
This section focuses on setting up the necessary IAM (Identity and Access Management) roles and policies for Amazon Bedrock integration. We create two key roles:

1. Lambda role
   - Allows our Lambda function to interact with Bedrock and other AWS services
   - Includes permissions for model import and S3 access

2. Bedrock custom import role
   - Enables Bedrock to access our trained model in S3
   - Ensures secure and controlled access during the model import process

Creating these roles is crucial for:
- Maintaining the principle of least privilege
- Enabling secure communication between different AWS services
- Allowing our pipeline to programmatically import models into Bedrock

By carefully defining these roles, we ensure our deployment process is both secure and functional.

- Lambda role

In [None]:
# Import the module
spec = importlib.util.spec_from_file_location("iam_role_helper", "iam_role_helper.py")
iam_role_manager = importlib.util.module_from_spec(spec)
sys.modules["iam_role_manager"] = iam_role_manager
spec.loader.exec_module(iam_role_manager)

# Now you can use it
from iam_role_helper import create_lambda_execution_role

# Get account information
account_id = boto3.client('sts').get_caller_identity()['Account']
region = "us-west-2"
training_bucket = sagemaker_session_bucket
role_name = "LambdaBedrockExecutionRole"

# Define trust relationship
trust_relationship = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "lambda.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "bedrock.amazonaws.com"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": account_id
                },
                "ArnEquals": {
                    "aws:SourceArn": f"arn:aws:bedrock:{region}:{account_id}:model-import-job/*"
                }
            }
        }
    ]
}

# Define policies
bedrock_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:CreateModelImportJob",
                "bedrock:GetModelImportJob",
                "bedrock:ListModelImportJobs"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": ["iam:PassRole"],
            "Resource": f"arn:aws:iam::{account_id}:role/*",
            "Condition": {
                "StringEquals": {
                    "iam:PassedToService": "bedrock.amazonaws.com"
                }
            }
        }
    ]
}

s3_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                f"arn:aws:s3:::{training_bucket}",
                f"arn:aws:s3:::{training_bucket}/*"
            ]
        }
    ]
}

# Configure policies
policies_config = {
    'inline_policies': {
        'BedrockAccessPolicy': bedrock_policy,
        'S3AccessPolicy': s3_policy
    },
    'managed_policies': [
        "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
    ]
}

# Create the role
execution_role_arn = create_lambda_execution_role(
    role_name=role_name,
    trust_relationship=trust_relationship,
    policies_config=policies_config
)

print(f"Execution Role ARN: {execution_role_arn}")

- Bedrock custom import role

In [None]:
# Set up variables
account_id = boto3.client('sts').get_caller_identity()['Account']
region = "us-west-2"
training_bucket = sagemaker_session_bucket
role_name = "Sagemaker_Bedrock_import_role"

# Define policies
trust_relationship = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "bedrock.amazonaws.com"},
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {"aws:SourceAccount": account_id},
                "ArnEquals": {"aws:SourceArn": f"arn:aws:bedrock:{region}:{account_id}:model-import-job/*"}
            }
        },
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole"
        }
    ]
}

permission_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{training_bucket}", f"arn:aws:s3:::{training_bucket}/*"],
            "Condition": {"StringEquals": {"aws:ResourceAccount": account_id}}
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        }
    ]
}

# Create or update the role
bedrock_role_arn = create_or_update_role(
    role_name=role_name,
    trust_relationship=trust_relationship,
    permission_policy=permission_policy
)

print(f"Role ARN: {bedrock_role_arn}")
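
The helper `create_or_update_role` lives in `iam_role_helper.py`, which is not reproduced here. As an assumption about what such a helper might do, the sketch below creates the role if it is missing, otherwise refreshes its trust policy, and then writes the permissions as an inline policy. The function name `ensure_role`, the inline policy name, and the injected `iam_client` argument are illustrative; the client is expected to behave like `boto3.client("iam")`.

```python
import json


def ensure_role(iam_client, role_name, trust_relationship, permission_policy,
                policy_name="InlinePermissions"):
    """Create the IAM role or update its policies, then return the role ARN."""
    trust_doc = json.dumps(trust_relationship)
    try:
        role_arn = iam_client.get_role(RoleName=role_name)["Role"]["Arn"]
        # Role exists: bring its trust policy in line with the desired document.
        iam_client.update_assume_role_policy(RoleName=role_name, PolicyDocument=trust_doc)
    except iam_client.exceptions.NoSuchEntityException:
        response = iam_client.create_role(
            RoleName=role_name, AssumeRolePolicyDocument=trust_doc
        )
        role_arn = response["Role"]["Arn"]
    # put_role_policy creates or overwrites the inline policy with this name.
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(permission_policy),
    )
    return role_arn
```

Making the operation idempotent in this way means the cell can be re-run without failing on a role that already exists.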

### Set up Lambda function for model import to Bedrock
This section prepares a Lambda function to automate the process of importing our trained model into Amazon Bedrock. Key aspects include:

1. Creating a Lambda layer
   - Adds necessary dependencies (like boto3) to our Lambda environment
   - Ensures our function has access to required libraries

2. Creating the Lambda function
   - Defines a function that will handle the model import process
   - Configures function parameters like memory, timeout, and execution role

3. Integrating the Lambda function into our pipeline
   - Creates a LambdaStep to include this function in our SageMaker pipeline
   - Defines inputs (model URI, role ARN) and outputs (model ARN) for the step

This Lambda function acts as a bridge between our SageMaker pipeline and Amazon Bedrock, automating the model deployment process. It allows us to programmatically import our fine-tuned Llama 3 model into Bedrock as soon as training is complete.


- Create Lambda layer

In [None]:
lambda_client = boto3.client('lambda', region_name=region)
# Create the boto3 layer first
layer_arn = create_boto3_layer(lambda_client)
print(f"Layer ARN: {layer_arn}")

- Create Lambda functions

In [None]:
# Create Lambda function instance
lambda_func = Lambda(
    function_name="bedrock-model-import",
    execution_role_arn=execution_role_arn,
    script="scripts/lambda/bedrock_model_import.py",
    handler='bedrock_model_import.lambda_handler',
    timeout=900,  # 15 minutes, adjust as needed
    memory_size=128,
    runtime='python3.12',
    layers=[layer_arn],  # Your boto3 layer ARN
)

In [None]:
# Define the outputs
lambda_outputs = [
    LambdaOutput(output_name="model_arn", output_type=LambdaOutputTypeEnum.String)
]

# Create the Lambda step
lambda_step = LambdaStep(
    name="BedrockModelImport",
    lambda_func=lambda_func,
    inputs={
        "model_uri": training_step.properties.ModelArtifacts.S3ModelArtifacts,  # Use the output from the training step
        "role_arn": bedrock_role_arn,
        "model_name": model_name
    },
    outputs=lambda_outputs,
    cache_config=CacheConfig(enable_caching=True, expire_after="1d"),
    depends_on=[step_register_model]
)
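
The handler source (`scripts/lambda/bedrock_model_import.py`) is not included in this notebook. The following is a minimal sketch of what such a handler could look like, assuming it starts a Bedrock custom model import job from the event fields passed by the `LambdaStep` above (`model_uri`, `role_arn`, `model_name`). The optional `bedrock_client` argument and the job-name format are assumptions added so the function can be exercised with a stub. Note that the value returned under `model_arn` in this sketch is the import job ARN, since the imported model's own ARN is only known once the job completes.

```python
import time


def lambda_handler(event, context, bedrock_client=None):
    """Start a Bedrock model import job for the artifacts produced by training."""
    if bedrock_client is None:
        import boto3  # imported lazily so the function can be tested with a stub

        bedrock_client = boto3.client("bedrock")

    # Hypothetical naming scheme: model name plus a Unix timestamp.
    job_name = f"{event['model_name']}-import-{int(time.time())}"
    response = bedrock_client.create_model_import_job(
        jobName=job_name,
        importedModelName=event["model_name"],
        roleArn=event["role_arn"],
        modelDataSource={"s3DataSource": {"s3Uri": event["model_uri"]}},
    )
    # Keys returned here must match the LambdaOutput names declared on the step.
    return {"model_arn": response["jobArn"]}
```

Because the import runs asynchronously, the pipeline step can finish well before the model is usable, which is why availability is checked separately later in the notebook.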

# Pipeline Execution
### Executing the Pipeline: What to Expect

Now that we've defined our pipeline, it's time to run it. This process will:

1. Preprocess our data
2. Train the Llama 3 model on our dataset
3. Register the model in the SageMaker Model Registry
4. Deploy the model to Amazon Bedrock

Depending on the size of your dataset and the complexity of your model, 
this process may take several hours to complete. We'll monitor the progress 
and check the results of each step.

### Create and run the SageMaker Pipeline
This section brings together all the previously defined steps to create and execute our complete SageMaker pipeline. Key points include:

- Assembling the pipeline by combining preprocessing, training, model registration, and Lambda steps
- Defining pipeline parameters and the execution role
- Using error handling to manage potential issues during pipeline creation and execution
- Initiating the pipeline execution

The pipeline encapsulates our entire workflow, from data preprocessing to model deployment in Bedrock. By using a pipeline, we ensure:

- Reproducibility of our ML workflow
- Automated execution of all steps in sequence
- Easy monitoring and management of the entire process

Running this pipeline will kick off the end-to-end process of fine-tuning our Llama 3 model and deploying it to Amazon Bedrock.

In [None]:
logging.basicConfig(level=logging.INFO)

try:
    pipeline = Pipeline(
        name="Llama3-QAPipeline",
        steps=[preprocessing_step, training_step, step_register_model, lambda_step],
        parameters=[role, model_name],
        sagemaker_session=pipeline_session,
    )
    logging.info("Pipeline created successfully")

    pipeline.upsert(role_arn=role)
    logging.info("Pipeline upserted successfully")

    execution = pipeline.start()
    logging.info("Pipeline started successfully")

except ValueError as ve:
    logging.error(f"ValueError occurred: {str(ve)}")
    # `pipeline` is undefined if the Pipeline constructor itself raised
    if "pipeline" in locals():
        logging.error(f"Error occurred in pipeline definition: {pipeline.definition()}")
except Exception as e:
    logging.error(f"An error occurred: {str(e)}")
    logging.error(f"Error type: {type(e).__name__}")

### Monitor pipeline execution progress
This section focuses on tracking the progress of our SageMaker pipeline execution. Key aspects include:

- Using a custom function `monitor_pipeline_execution` to observe the pipeline's status
- Extracting information about individual step outputs, particularly the preprocessing step
- Retrieving the S3 URI for the test dataset, which will be used later for model evaluation

Monitoring the pipeline execution is crucial because:
- It allows us to track the progress of our end-to-end ML workflow in real-time
- We can quickly identify and troubleshoot any issues that arise during execution
- It provides valuable information about the location of output artifacts (like the test dataset)

This monitoring step ensures we have visibility into our pipeline's performance and prepares us for the subsequent model evaluation phase.

In [None]:
# get the pipeline execution status
monitor_pipeline_execution(execution)
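
`monitor_pipeline_execution` is defined in `utils.py`, which is not shown here. A plausible minimal version, assuming it polls `execution.describe()` until the pipeline reaches a terminal status, could look like the sketch below. The function name `wait_for_pipeline`, the polling interval, and the injectable `sleep` argument are illustrative choices.

```python
import time

TERMINAL_STATUSES = {"Succeeded", "Failed", "Stopped"}


def wait_for_pipeline(execution, poll_seconds=30, max_polls=1000, sleep=time.sleep):
    """Poll a pipeline execution until it finishes and return the final status."""
    status = None
    for _ in range(max_polls):
        status = execution.describe()["PipelineExecutionStatus"]
        print(f"Pipeline status: {status}")
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"Pipeline still {status} after {max_polls} polls")
```

Returning the final status lets later cells decide whether to continue with evaluation or stop and inspect the failed step.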

In [None]:
# Get the preprocessing step
preprocessing_step_name = "PreprocessQADataset"  # Make sure this matches your step name
steps = execution.list_steps()
preprocessing_step_output = next((step for step in steps if step['StepName'] == preprocessing_step_name), None)

In [None]:
# Extract the processing job ARN
processing_job_arn = preprocessing_step_output['Metadata']['ProcessingJob']['Arn']

# Create a SageMaker client
sagemaker_client = boto3.client('sagemaker')

# Describe the processing job to get its details
processing_job_details = sagemaker_client.describe_processing_job(ProcessingJobName=processing_job_arn.split('/')[-1])

# Extract the S3 URI for the test output
test_output_uri = None
for output in processing_job_details['ProcessingOutputConfig']['Outputs']:
    if output['OutputName'] == 'test':
        test_output_uri = output['S3Output']['S3Uri']
        break
if test_output_uri:
    print(f"Extracted test output S3 URI: {test_output_uri}")
else:
    print("Could not find test output S3 URI in the processing job details")

# Model Evaluation
### Why Evaluate the Model?

After training and deploying our model, it's crucial to evaluate its performance. 
This helps us:

1. Understand how well the model has learned from our dataset
2. Identify any areas where the model might be struggling
3. Compare the performance of our fine-tuned model to the base Llama 3 model
4. Make informed decisions about whether the model is ready for production use

In this section, we'll run some sample inputs through our deployed model and analyze the results.

### Check Model Availability in Amazon Bedrock

After initiating the model import job, we need to verify that our model has been successfully imported and is available in Amazon Bedrock. This step is crucial before we can proceed with using the model for inference.

The following function, `wait_for_model_availability`, periodically checks Bedrock for the presence of our imported model. It continues checking at regular intervals until either:

1. The model is found and its details are returned, or
2. The maximum number of attempts is reached without finding the model.

This approach is necessary because the import process can take several minutes to complete, and the exact duration can vary depending on factors such as model size and current Bedrock workload.

> ⚠️ **Note:** The default settings check for the model every 60 seconds for up to 30 attempts (30 minutes total). You may need to adjust these parameters based on your specific model and use case.
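
The polling behaviour described above can be sketched as follows. This is a minimal illustration, not the notebook's own `wait_for_model_availability`: the name `poll_for_imported_model` and the injected `bedrock_client` and `sleep` parameters are assumptions made so the logic can be read and tested in isolation. It assumes the Bedrock `list_imported_models` call with its `nameContains` filter and, for brevity, ignores result pagination.

```python
import time


def poll_for_imported_model(bedrock_client, name_filter, max_attempts=30, delay=60, sleep=time.sleep):
    """Return the first imported-model summary matching name_filter, or None.

    Hypothetical helper: bedrock_client is expected to expose
    list_imported_models(nameContains=...), as boto3.client("bedrock") does.
    """
    for attempt in range(1, max_attempts + 1):
        response = bedrock_client.list_imported_models(nameContains=name_filter)
        summaries = response.get("modelSummaries", [])
        if summaries:
            return summaries[0]
        print(f"Attempt {attempt}/{max_attempts}: model not listed yet")
        if attempt < max_attempts:
            sleep(delay)  # No need to wait after the final attempt
    return None
```

With real credentials this could be called as `poll_for_imported_model(boto3.client("bedrock"), "llama3-qa-model")`. Passing `sleep` as a parameter is a design choice that lets the loop be exercised without actually waiting.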

Let's run this function to confirm our model's availability:

In [None]:
# Poll Bedrock until the imported model is listed
model_name_filter = "llama3-qa-model"  # Replace with your model name
model_info = wait_for_model_availability(model_name_filter, max_attempts=30, delay=60)

model_arn = None  # Stays None if the model was not found
if model_info:
    model_arn = model_info["modelArn"]
    print(f"Model is now available in Bedrock: {model_arn}")
else:
    print("Failed to find the model in Bedrock within the specified attempts.")

> 📌 **Tip:** While waiting for the import job to complete, you can take a short break. The import process typically takes between 10 and 30 minutes, depending on the model size and current Bedrock workload.

### Compare model outputs with expected results
In this section, we'll evaluate our newly imported Bedrock model by comparing its outputs with expected results from our test dataset.

> ⚠️ **Important Note:** After a successful import, Amazon Bedrock requires some time to prepare your custom model for inference. This process typically takes 10-15 minutes but may occasionally take longer.

> 📌 **Tip:** If you encounter errors like "Model is not in a valid state for invocation", it means Bedrock is still preparing your model. Wait for a few minutes and try again.
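
The wait-and-retry advice above can be automated. The sketch below assumes that invoking a model that is still being prepared raises an exception carrying a botocore-style `response["Error"]["Code"]` of `ModelNotReadyException`; if your error reports a different code, pass it in `retry_codes`. The helper name `invoke_when_ready` and its parameters are hypothetical, and `invoke` stands for any zero-argument callable that performs the Bedrock call.

```python
import time


def invoke_when_ready(invoke, retry_codes=("ModelNotReadyException",), max_attempts=5, delay=60, sleep=time.sleep):
    """Call invoke() and retry while the service reports the model as not ready.

    Hypothetical helper: errors are recognised by a botocore-style
    error.response["Error"]["Code"]; any other exception is re-raised at once.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return invoke()
        except Exception as error:
            response = getattr(error, "response", None) or {}
            code = response.get("Error", {}).get("Code")
            # Give up on unrelated errors or when attempts are exhausted
            if code not in retry_codes or attempt == max_attempts:
                raise
            print(f"Model not ready (attempt {attempt}/{max_attempts}); retrying in {delay}s")
            sleep(delay)
```

For example, `invoke_when_ready(lambda: bedrock_runtime.invoke_model(modelId=model_arn, body=payload))`, where `bedrock_runtime` and `payload` are placeholders for a `bedrock-runtime` client and a JSON request body.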

In [None]:
# Example usage
S3_URI = test_output_uri
results_df = run_model_evaluation(
    model_id=model_arn,
    s3_uri=S3_URI,
    num_samples=10,
    batch_size=5,
    display_examples=5
)
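
`run_model_evaluation` is defined elsewhere in the notebook, and its metrics are not shown in this section. As an illustration of what comparing a generated answer with an expected one can involve, the sketch below computes a token-overlap F1 score. Both the metric and the helper name `token_f1` are assumptions chosen for illustration; they are not necessarily what `run_model_evaluation` computes.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-overlap F1 between two strings (lowercased, whitespace-tokenised)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A score of 1.0 means the two answers contain exactly the same tokens, while 0.0 means they share none. Token overlap is a coarse signal, so it is best read alongside a manual look at a few examples.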


# Clean Up Resources

After completing your experiments and evaluations, it's crucial to clean up the resources you've created to avoid ongoing charges. This section will guide you through the process of deleting all the resources used in this notebook.

> ⚠️ **Warning:** The following steps will permanently delete resources. Make sure you've saved any important data or model artifacts before proceeding.

### 1. Delete the Bedrock Custom Model

First, let's remove the custom model from Amazon Bedrock:

In [None]:
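def delete_bedrock_custom_model(model_name):
    """Delete a model imported into Bedrock, identified by name or ARN."""
    bedrock_client = boto3.client('bedrock')
    try:
        # Imported models are removed with delete_imported_model
        bedrock_client.delete_imported_model(modelIdentifier=model_name)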
        print(f"Successfully deleted Bedrock custom model: {model_name}")
    except botocore.exceptions.ClientError as error:
        error_code = error.response['Error']['Code']
        if error_code == 'ValidationException':
            print(f"Error deleting Bedrock custom model: The provided model name is invalid. Model Name: {model_name}")
        elif error_code == 'ResourceNotFoundException':
            print(f"Error: The model '{model_name}' was not found in Bedrock.")
        elif error_code == 'AccessDeniedException':
            print("Error: You do not have permission to delete this model.")
        elif error_code == 'ConflictException':
            print("Error: The model is currently in use or in a state that doesn't allow deletion.")
        else:
            print(f"Error deleting Bedrock custom model: {error}")

# Replace with your actual model name
MODEL_NAME = "llama3-qa-model"
delete_bedrock_custom_model(MODEL_NAME)


### 2. Delete IAM Roles

Now, let's remove the IAM roles we created specifically for this project:

In [None]:
def delete_iam_role(role_name):
    iam = boto3.client('iam')
    try:
        # Delete inline policies
        inline_policies = iam.list_role_policies(RoleName=role_name)['PolicyNames']
        for policy in inline_policies:
            iam.delete_role_policy(RoleName=role_name, PolicyName=policy)
            
        # Detach managed policies
        attached_policies = iam.list_attached_role_policies(RoleName=role_name)['AttachedPolicies']
        for policy in attached_policies:
            iam.detach_role_policy(RoleName=role_name, PolicyArn=policy['PolicyArn'])
            
        # Delete permissions boundary if it exists
        try:
            iam.delete_role_permissions_boundary(RoleName=role_name)
        except iam.exceptions.NoSuchEntityException:
            pass
        
        # Finally delete the role
        iam.delete_role(RoleName=role_name)
        print(f"Successfully deleted IAM role: {role_name}")
    except botocore.exceptions.ClientError as error:
        print(f"Error deleting IAM role: {error}")

# Delete LambdaBedrockExecutionRole
delete_iam_role("LambdaBedrockExecutionRole")

# Delete Sagemaker_Bedrock_import_role
delete_iam_role("Sagemaker_Bedrock_import_role")

### 3. Delete Lambda Functions

If you created any Lambda functions, delete them as well:

In [None]:
def delete_lambda_function(function_name):
    lambda_client = boto3.client('lambda')
    try:
        lambda_client.delete_function(FunctionName=function_name)
        print(f"Successfully deleted Lambda function: {function_name}")
    except botocore.exceptions.ClientError as error:
        print(f"Error deleting Lambda function: {error}")

# Replace with your actual Lambda function name
LAMBDA_FUNCTION_NAME = "bedrock-model-import"
delete_lambda_function(LAMBDA_FUNCTION_NAME)


### 4. Delete Lambda Layers

If you created any Lambda layers, remove them:

In [None]:
def delete_lambda_layer(layer_name):
    lambda_client = boto3.client('lambda')
    try:
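        # List every version of the layer (results are paginated)
        paginator = lambda_client.get_paginator('list_layer_versions')
        versions = [
            version
            for page in paginator.paginate(LayerName=layer_name)
            for version in page['LayerVersions']
        ]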
        
        # Delete each version
        for version in versions:
            lambda_client.delete_layer_version(
                LayerName=layer_name,
                VersionNumber=version['Version']
            )
        print(f"Successfully deleted all versions of Lambda layer: {layer_name}")
    except botocore.exceptions.ClientError as error:
        print(f"Error deleting Lambda layer: {error}")

# Replace with your actual Lambda layer name
LAYER_NAME = "boto3-latest"
delete_lambda_layer(LAYER_NAME)

> 📌 **Tip:** Always double-check your AWS Console to ensure all resources have been properly deleted.

### Verification

After running these cleanup steps, it's a good practice to manually verify in the AWS Console that all resources have been successfully deleted. Pay special attention to:

- Bedrock custom models
- IAM roles (especially LambdaBedrockExecutionRole and Sagemaker_Bedrock_import_role)
- Lambda functions and layers

> 🚨 **Important:** If you encounter any issues or have resources that weren't covered in this cleanup process, refer to the AWS documentation or contact AWS Support for help removing them.
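
The manual check above can be complemented by a programmatic one. The sketch below is an illustration under stated assumptions: the helper `resource_exists` and its `lookup` argument are hypothetical, and it treats a botocore-style error code of `NoSuchEntity` (IAM) or `ResourceNotFoundException` (Lambda, Bedrock) as evidence that the resource is gone.

```python
NOT_FOUND_CODES = {"NoSuchEntity", "ResourceNotFoundException"}


def resource_exists(lookup):
    """Return False if lookup() fails with a not-found error, True if it succeeds.

    Hypothetical helper: lookup is a zero-argument callable wrapping a
    get/describe call; errors other than not-found are re-raised.
    """
    try:
        lookup()
        return True
    except Exception as error:
        response = getattr(error, "response", None) or {}
        if response.get("Error", {}).get("Code") in NOT_FOUND_CODES:
            return False
        raise
```

With boto3 clients this could be used as `resource_exists(lambda: iam.get_role(RoleName="LambdaBedrockExecutionRole"))` or `resource_exists(lambda: lambda_client.get_function(FunctionName="bedrock-model-import"))`, where `iam` and `lambda_client` are placeholder client objects. Each call should return `False` after a successful cleanup.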

# Conclusion

In this notebook, we've walked through the entire process of fine-tuning a Llama 3 model and deploying it to Amazon Bedrock. Let's recap the key steps and learnings:

### Key Accomplishments

1. **Data Preparation**: We preprocessed and prepared a dataset suitable for fine-tuning the Llama 3 model.

2. **SageMaker Pipeline**: We constructed a SageMaker pipeline that encompassed:
   - Data preprocessing
   - Model training using Llama 3
   - Model evaluation
   - Model registration

3. **Bedrock Deployment**: We successfully imported our fine-tuned model into Amazon Bedrock, making it ready for inference.

4. **Custom Inference**: We demonstrated how to use the custom Bedrock model for inference, comparing its outputs with expected results.

5. **Resource Management**: We created and managed various AWS resources, including IAM roles, Lambda functions, and S3 buckets.

### Key Learnings

- **Fine-tuning Large Language Models**: We gained hands-on experience in fine-tuning a state-of-the-art language model like Llama 3 for specific use cases.

- **SageMaker Pipelines**: We leveraged SageMaker Pipelines to create a reproducible and scalable ML workflow.

- **Bedrock Integration**: We learned how to bridge the gap between SageMaker and Bedrock, enabling us to use custom models in the Bedrock environment.

- **AWS Service Orchestration**: This project demonstrated the seamless integration of multiple AWS services (SageMaker, Bedrock, Lambda, IAM, S3) to create an end-to-end ML solution.

### Potential Next Steps

1. **Model Optimization**: Experiment with different hyperparameters or training datasets to further improve model performance.

2. **Scalability Testing**: Assess the model's performance under various loads to ensure it meets production requirements.

3. **Monitoring and Logging**: Implement comprehensive monitoring and logging for the deployed model in Bedrock.

4. **A/B Testing**: Compare the performance of your fine-tuned model against the base Llama 3 model or other variants.

5. **Continuous Learning**: Explore ways to implement continuous learning or periodic re-training to keep the model up-to-date.

> 💡 **Insight**: The combination of SageMaker's robust training capabilities and Bedrock's inference optimizations provides a powerful platform for deploying custom large language models.

### Final Thoughts

This notebook has demonstrated the power and flexibility of AWS's machine learning ecosystem. By leveraging SageMaker for training and Bedrock for deployment, we've created a custom language model that can be easily integrated into various applications.

Remember to clean up your resources as shown in the previous section to avoid unnecessary costs. As you continue to explore and build with these technologies, always keep best practices in mind, especially regarding data security and model governance.

Thank you for following along with this notebook. Happy modeling!