# Training a Model in SageMaker JumpStart for Amazon Bedrock Custom Model Import with Knowledge Distillation

## Knowledge Distillation from Large to Small Language Models (Llama 3.1 405B → Llama 3.2 1B)

This notebook demonstrates how to use Amazon SageMaker JumpStart to train a smaller language model through knowledge distillation from a larger foundation model, and then deploy it through Amazon Bedrock's Custom Model Import feature. The process involves distilling knowledge from a large language model (405B parameters) to a smaller model (1B parameters) while maintaining performance on specialized QA tasks.

## 1. Introduction

### What is Knowledge Distillation?
Knowledge distillation is a model compression technique where a smaller model (student) is trained to mimic the behavior of a larger model (teacher). This process transfers knowledge from the larger model to the smaller one, creating a more efficient model that maintains much of the performance of the original.
### Why Use Knowledge Distillation?
- Cost Efficiency: Smaller models have lower inference costs
- Reduced Latency: Faster response times for real-time applications
- Lower Resource Requirements: Less memory and compute needed
- Specialized Knowledge: Focus on domain-specific capabilities
- Deployment Flexibility: Enables deployment on resource-constrained environments
### Architecture Overview
This implementation uses:
- Teacher Model: Llama 3.1 405B parameters (via Amazon Bedrock)
- Student Model: Llama 3.2 1B parameters (via SageMaker JumpStart)
- Workflow: Generate high-quality responses with teacher → Train student to replicate behavior → Deploy student via Bedrock Custom Model Import → Evaluate models

![Simplified workflow](knowledge_distillation_workflow_simplified.jpg)
*Simplified workflow*

## 2. Environment Setup
### Installing Required Libraries

In [None]:
%pip install --quiet --upgrade sagemaker jmespath datasets transformers jinja2 ipywidgets boto3 matplotlib numpy jsonlines

### AWS Account Configuration 

This section configures the necessary AWS resources including:

- SageMaker session and default bucket
- IAM roles and permissions
- Region-specific settings
- Required SDK versions and dependencies

In [None]:
# Standard library imports
import json
import time
import uuid
import random
import sys
import logging
from datetime import datetime
import pprint
from IPython.display import display, Markdown
from ipywidgets import Dropdown


# AWS SDK imports
import boto3
import botocore
from botocore.config import Config
import sagemaker
from sagemaker.s3 import S3Uploader
from sagemaker.jumpstart.estimator import JumpStartEstimator

from sagemaker.jumpstart.notebook_utils import list_jumpstart_models
from sagemaker import hyperparameters, metric_definitions
from sagemaker.parameter import ContinuousParameter, CategoricalParameter, IntegerParameter
from sagemaker.tuner import HyperparameterTuner
from sagemaker.debugger import TensorBoardOutputConfig



# Data processing and ML imports
import pandas as pd
import requests

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Custom modules (assuming these exist in your environment)
import importlib.util
spec = importlib.util.spec_from_file_location("iam_role_helper", "iam_role_helper.py")
iam_role_manager = importlib.util.module_from_spec(spec)
sys.modules["iam_role_manager"] = iam_role_manager
spec.loader.exec_module(iam_role_manager)

spec = importlib.util.spec_from_file_location("utils", "utils.py")
utils = importlib.util.module_from_spec(spec)
sys.modules["utils"] = utils
spec.loader.exec_module(utils)

# Import custom functions
from utils import (
    download_artifacts, 
    remove_field_from_json, 
    upload_artifacts, 
    cleanup_local_files, 
    wait_for_model_availability, 
    test_image_processing
)
from iam_role_helper import create_or_update_role


spec = importlib.util.spec_from_file_location("evaluations_help_functions", "evaluation_help_functions.py")
evaluations_help_functions = importlib.util.module_from_spec(spec)
sys.modules["evaluations_help_functions"] = evaluations_help_functions
spec.loader.exec_module(evaluations_help_functions)

# Import custom functions
from evaluations_help_functions import (
    run_model_comparison,
    setup_logging,
    analyze_errors,
    create_dual_radar_plots,
    prepare_and_upload_evaluation_files,
    create_llm_judge_evaluation,
    get_evaluation_files,
    construct_evaluation_key,
    generate_model_comparison_report,
    analyze_and_plot_metrics,
    generate_model_comparison_report_knowledge
   
)

# Set default configurations
config = Config(
    retries={
        'total_max_attempts': 100,  # Upper bound on attempts, including the initial request
        'max_attempts': 3,          # Ignored by botocore when total_max_attempts is also set
        'mode': 'adaptive',         # Adaptive retry mode with client-side rate limiting
    },
    connect_timeout=5,        # Seconds to wait for a connection (botocore default: 60)
    read_timeout=30,          # Seconds to wait for a response (botocore default: 60)
    max_pool_connections=50,  # Connection pool size (botocore default: 10)
    tcp_keepalive=True        # Enable TCP keepalive
)

# Initialize key AWS clients
sess = sagemaker.Session()
sagemaker_client = boto3.client('sagemaker',region_name='us-west-2')
bedrock_client = boto3.client('bedrock', region_name='us-west-2',
    config=config)
brt = boto3.client(service_name='bedrock-runtime',region_name='us-west-2',
    config=config)
s3_client = boto3.client('s3')
iam_client = boto3.client('iam')

In [None]:
# SageMaker session bucket -> used for uploading data, models and logs
# SageMaker automatically creates this bucket if it does not exist
sagemaker_session_bucket=None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    #change the name of the role if you are running locally
    role = iam.get_role(RoleName='AmazonSageMaker-ExecutionRole-name')['Role']['Arn']

sess = sagemaker.Session(default_bucket=bucket)
region=sess.boto_region_name

prefix = "llama-qa-distillation"
print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {bucket}")
print(f"sagemaker session region: {sess.boto_region_name}")

### Role Configuration
Configures IAM roles with required permissions for:
- Amazon Bedrock model access
- S3 bucket operations for model artifacts
- CloudWatch logging capabilities
- Cross-service permissions for SageMaker

Key components:
1. Trust relationships for service principals
2. Permission policies for resource access
3. Cross-account access configurations
4. Logging and monitoring permissions

Ensure that your SageMaker execution role's trust policy (Trusted Entities) allows Bedrock to assume the role. This is required so that Bedrock can submit and manage batch inference jobs on your behalf.

To manually edit the trust policy, navigate to the SageMaker execution role you are using. Go to the Trust relationships tab and click Edit trust policy. Allow the bedrock.amazonaws.com service to assume the role.

In [None]:
# Edit the SageMaker execution role's trust relationships

# 1. Get the SageMaker execution role ARN
try:
    import sagemaker
    execution_role_arn = sagemaker.get_execution_role()
except Exception:
    # Fallback: list roles to find one with 'AmazonSageMaker-ExecutionRole' in the name
    iam = boto3.client('iam')
    roles = iam.list_roles()['Roles']
    execution_role_arn = next(
        (role['Arn'] for role in roles if 'AmazonSageMaker-ExecutionRole' in role['RoleName']),
        None
    )
    if not execution_role_arn:
        raise Exception("Could not find a SageMaker execution role.")

# 2. Extract the role name from the ARN
role_name = execution_role_arn.split('/')[-1]

# 3. Define the new trust policy
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "bedrock.amazonaws.com",
                    "sagemaker.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

# 4. Update the trust policy for the SageMaker execution role
iam = boto3.client('iam')
try:
    iam.update_assume_role_policy(
        RoleName=role_name,
        PolicyDocument=json.dumps(trust_policy)
    )
    print(f"Updated trust policy for SageMaker execution role: {role_name}")
except Exception as e:
    print(f"Error updating trust policy: {e}")
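
Note that `update_assume_role_policy` replaces the whole trust policy, so principals the role already trusted are dropped unless they are listed again. As a minimal sketch, assuming the existing policy uses the common `Principal.Service` shape, the hypothetical helper `add_service_principal` below merges one more service into an existing policy document instead of overwriting it. In the notebook, `existing` would come from `iam.get_role(RoleName=role_name)['Role']['AssumeRolePolicyDocument']`; here it is a hand-written example.

```python
import copy

def add_service_principal(policy, service):
    """Return a copy of a trust policy that also lets `service` assume the role.

    Hypothetical helper: the first unconditional Allow statement for
    sts:AssumeRole with service principals is extended; if none exists,
    a new statement is appended.
    """
    merged = copy.deepcopy(policy)
    for stmt in merged.get("Statement", []):
        actions = stmt.get("Action", [])
        actions = [actions] if isinstance(actions, str) else actions
        principal = stmt.get("Principal", {})
        if (
            stmt.get("Effect") == "Allow"
            and "sts:AssumeRole" in actions
            and isinstance(principal, dict)
            and "Service" in principal
            and "Condition" not in stmt  # do not widen a conditioned statement
        ):
            services = principal["Service"]
            services = [services] if isinstance(services, str) else list(services)
            if service not in services:
                services.append(service)
            principal["Service"] = services
            return merged
    merged.setdefault("Statement", []).append({
        "Effect": "Allow",
        "Principal": {"Service": service},
        "Action": "sts:AssumeRole",
    })
    return merged

# Hand-written example of a typical SageMaker execution role trust policy
existing = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sagemaker.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}
updated = add_service_principal(existing, "bedrock.amazonaws.com")
print(updated["Statement"][0]["Principal"]["Service"])
```

The merged document could then be passed to `iam.update_assume_role_policy(..., PolicyDocument=json.dumps(updated))` in place of a hard-coded policy.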

In [None]:
# IAM Role Configuration for Amazon Bedrock Custom Model Import

# 1. Setup Basic Variables
account_id = boto3.client('sts').get_caller_identity()['Account']  # Get current AWS account ID
region = "us-west-2"  # Note: Custom Model Import (CMI) only works in us-west-2 and us-east-1
training_bucket = bucket  # S3 bucket where training artifacts are stored (sagemaker_session_bucket is None above)
role_name = "Sagemaker_Bedrock_import_role"  # Name for the IAM role we'll create

# 2. Define Trust Relationship Policy
# This policy defines which AWS services can assume this role
trust_relationship = {
    "Version": "2012-10-17",
    "Statement": [
        # Allow Bedrock service to assume this role
        {
            "Effect": "Allow",
            "Principal": {"Service": "bedrock.amazonaws.com"},
            "Action": "sts:AssumeRole",
            "Condition": {
                # Ensure requests only come from our account
                "StringEquals": {"aws:SourceAccount": account_id},
                # Limit to specific Bedrock model import jobs
                "ArnEquals": {"aws:SourceArn": f"arn:aws:bedrock:{region}:{account_id}:model-import-job/*"}
            }
        },
        # Allow Lambda service to assume this role (if needed for auxiliary functions)
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole"
        }
    ]
}

# 3. Define Permission Policy
# This policy defines what AWS resources the role can access
permission_policy = {
    "Version": "2012-10-17",
    "Statement": [
        # Allow S3 access for model artifacts
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],  # Read-only access to S3
            "Resource": [
                f"arn:aws:s3:::{training_bucket}",  # Access to bucket
                f"arn:aws:s3:::{training_bucket}/*"  # Access to objects in bucket
            ],
            "Condition": {"StringEquals": {"aws:ResourceAccount": account_id}}  # Restrict to our account
        },
        # Allow CloudWatch Logs access for monitoring
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"  # Access to CloudWatch Logs
        }
    ]
}

# 4. Create or Update the IAM Role
bedrock_role_arn = create_or_update_role(
    role_name=role_name,
    trust_relationship=trust_relationship,
    permission_policy=permission_policy
)

print(f"Role ARN: {bedrock_role_arn}")

## 3. Teacher Model Selection in Amazon Bedrock

### Model Selection Criteria
When choosing a foundation model in Amazon Bedrock for knowledge distillation, several key factors should be considered:

#### 1. Model Architecture and Size
The Meta Llama 3.1 405B model offers several advantages as a teacher model:
- Larger parameter count provides richer knowledge representation
- Enhanced ability to capture complex patterns and relationships
- Superior performance on specialized tasks like medical QA
- Better few-shot learning capabilities
#### 2. Cost-Performance Trade-offs
Amazon Bedrock's pay-per-use pricing model enables:
- No upfront infrastructure costs
- Payment only for actual inference time
- Flexible scaling based on demand
- Cost optimization through batch processing

Reference: [Amazon Bedrock Pricing](https://aws.amazon.com/bedrock/pricing/)

#### 3. Specialized Knowledge Transfer
The 405B model is particularly suitable for knowledge distillation because:
- Higher accuracy on complex medical terminology
- Better understanding of scientific context
- More nuanced response generation
- Improved zero-shot performance on domain-specific tasks
#### 4. Operational Considerations
Benefits of using Bedrock for the teacher model:
- Serverless architecture eliminates infrastructure management
- Built-in auto-scaling
- High availability across AWS regions
- Simplified API integration
### Model Configuration
The Llama 3.1 405B model in Bedrock can be configured with:
- `temperature` for response diversity
- `max_gen_len` to cap the length of generated answers
- `top_p` (nucleus) sampling
- Custom prompt templates for specialized tasks

Reference: [Amazon Bedrock Llama Model Configuration](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-meta.html)

### Integration with Knowledge Distillation
The workflow leverages Bedrock's advantages:
1. Generate high-quality training data through batch inference
2. Create specialized QA pairs for student model training
3. Maintain quality while reducing computational requirements
4. Enable seamless deployment through Custom Model Import

Reference: 
- [Amazon Bedrock Custom Model Import](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-import-model.html)
- [Amazon Bedrock Batch Inference](https://docs.aws.amazon.com/bedrock/latest/userguide/batch-inference.html)

### Best Practices
When using the teacher model:
1. Implement proper error handling and retry mechanisms
2. Use batch processing for dataset generation
3. Monitor usage and costs through AWS CloudWatch
4. Implement appropriate security controls and encryption

For more information on model selection and configuration, see:
- [Choose the best foundational model for your AI applications](https://community.aws/content/2fKJW0z9PEIKec94DZwtYigCF7i/choose-the-best-foundational-model-for-your-ai-applications?lang=en)
- [Llama Technical Documentation](https://www.llama.com/docs/overview/)
- [Amazon Bedrock Developer Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html)

### Bedrock client setup

The models available on Bedrock depend on the region. Here we use Llama 3.1 405B Instruct, which is available in us-west-2.

In [None]:
import boto3
bedrock_client = boto3.client('bedrock', region_name="us-west-2")
model_id='meta.llama3-1-405b-instruct-v1:0'

### Testing Inference with Bedrock Runtime

This section demonstrates how to perform inference using the Bedrock Runtime client with the Llama model.

**Note**: The Bedrock runtime client is specifically for model inference, separate from the main Bedrock client used for model management.


Inference Helper Function.

This function handles the core interaction with the Bedrock Runtime API, including error handling and response formatting.

In [None]:

def invoke_model(body, model_id, accept, content_type):
    """Send a JSON request body to Bedrock Runtime and return the raw response."""
    try:
        return brt.invoke_model(
            body=json.dumps(body),
            modelId=model_id,
            accept=accept,
            contentType=content_type,
        )
    except Exception:
        print(f"Couldn't invoke {model_id}")
        raise

Query Setup and Model Parameters.

Key Parameters:

- temperature: Lower values make output more focused and deterministic
- top_p: Controls diversity of token selection
- max_gen_len: Limits response length
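
To make the effect of `temperature` and `top_p` concrete, here is a toy sketch over three made-up logits. It illustrates the general sampling ideas and is not Bedrock's implementation; the function names are hypothetical, and this toy version requires `temperature` greater than zero.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, then renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
probs_warm = softmax_with_temperature(logits, 1.0)
probs_cold = softmax_with_temperature(logits, 0.5)
nucleus = top_p_filter(probs_warm, 0.9)

print([round(p, 3) for p in probs_warm])
print([round(p, 3) for p in probs_cold])
print(sorted(nucleus))
```

With the lower temperature the most likely token takes a larger share of the probability mass, and the nucleus filter drops the least likely token before sampling.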

In [None]:
# If you'd like to try your own prompt, edit this parameter!

question = """Is a mandatory general surgery rotation necessary in the surgical clerkship?"""
user_message = f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n {question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

body = {
    "prompt": user_message,
    "temperature": 0.5,
    "top_p": 0.9,
    "max_gen_len": 512,
}


Model Configuration and Invocation
- Uses Llama 3.1 405B parameter model
- Expects and returns JSON formatted data
- Response includes generated text in the "generation" field

In [None]:
modelId = "meta.llama3-1-405b-instruct-v1:0"
accept = "application/json"
contentType = "application/json"

response = invoke_model(body, modelId, accept, contentType)
response_body = json.loads(response.get("body").read())

print(response_body["generation"])
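
Besides `generation`, the Llama response body on Bedrock also reports token counts and a stop reason, which helps spot answers that were cut off by `max_gen_len`. The sketch below parses such a body; `summarize_llama_response` is a hypothetical helper, and the sample payload is hand-written for illustration rather than real model output.

```python
import json

def summarize_llama_response(raw_body):
    """Parse a Llama response body (bytes or str) returned by InvokeModel."""
    payload = json.loads(raw_body)
    return {
        "text": payload["generation"].strip(),
        "prompt_tokens": payload.get("prompt_token_count"),
        "generated_tokens": payload.get("generation_token_count"),
        # "length" means generation stopped because max_gen_len was reached
        "truncated": payload.get("stop_reason") == "length",
    }

# Hand-written stand-in for response["body"].read()
sample_body = json.dumps({
    "generation": "\n\nA general surgery rotation exposes students to core skills",
    "prompt_token_count": 25,
    "generation_token_count": 512,
    "stop_reason": "length",
}).encode("utf-8")

summary = summarize_llama_response(sample_body)
print(summary)
```

Truncated teacher answers are worth filtering out or regenerating with a larger `max_gen_len` before they are used as training targets.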

## 4. Dataset Generation

This section explains how to prepare and process the PubMedQA dataset for knowledge distillation using AWS services.

### Overview
The PubMedQA dataset is a large-scale question-answering dataset focused on biomedical research literature. We'll prepare the data in this notebook and store it in Amazon S3.



### Dataset Details
**PubMedQA Dataset**
- Source: [PubMedQA GitHub Repository](https://github.com/pubmedqa/pubmedqa/tree/master)
- Citation:
> Jin, Q., Dhingra, B., Liu, Z., Cohen, W., & Lu, X. (2019). PubMedQA: A Dataset for Biomedical Research Question Answering. In *Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)*, pp. 2567-2577.
- Format: JSON

### Implementation Steps

#### 1. Dataset Download and Validation
This step downloads the PubMedQA dataset from GitHub for use in the
Amazon SageMaker and Amazon Bedrock knowledge distillation workflow.

In [None]:
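def get_github_json(url):
    """Download a JSON file from GitHub; returns None if the request or parsing fails."""
    try:
        # Convert regular GitHub URL to raw content URL
        raw_url = url.replace("github.com", "raw.githubusercontent.com").replace("/blob/", "/")
        response = requests.get(raw_url, timeout=30)
        response.raise_for_status()  # Surface HTTP errors instead of parsing an error page
        return response.json()
    except Exception as e:
        print(f"Error: {e}")
        return None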

# Example usage:
url = "https://github.com/pubmedqa/pubmedqa/blob/master/data/ori_pqal.json"
data = get_github_json(url)
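
The download cell does not validate what it receives. A minimal validation sketch, assuming the file is a dictionary keyed by record ID whose entries carry `QUESTION`, `CONTEXTS`, and `LONG_ANSWER` (the fields used later in this notebook), could look like the following. `find_invalid_records` is a hypothetical helper and the sample records are made up.

```python
REQUIRED_KEYS = ("QUESTION", "CONTEXTS", "LONG_ANSWER")

def find_invalid_records(records, required_keys=REQUIRED_KEYS):
    """Return {record_id: [missing or empty keys]} for records that fail the check."""
    problems = {}
    for record_id, record in records.items():
        missing = [key for key in required_keys if not record.get(key)]
        if missing:
            problems[record_id] = missing
    return problems

# Made-up records in the assumed layout
sample = {
    "0001": {"QUESTION": "Does X improve Y?", "CONTEXTS": ["Background text"], "LONG_ANSWER": "X improves Y."},
    "0002": {"QUESTION": "Is Z safe?", "CONTEXTS": [], "LONG_ANSWER": ""},
}
invalid = find_invalid_records(sample)
print(invalid)
```

In the notebook, calling `find_invalid_records(data)` right after the download would flag a failed or partial download before any batch job is submitted.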

#### 2. Data Processing and JSONL Conversion
This section extracts the fields we need from the PubMedQA dataset and writes them out in JSONL format.

In [None]:
dataset=[]
qa_index=list(data.keys())
for i in qa_index:
    keys_to_get = ['QUESTION', 'CONTEXTS','LONG_ANSWER']
    result = {k: data[i].get(k) for k in keys_to_get}
    dataset.append(result)

In [None]:
output_file_dataset='dataset.jsonl'
with open(output_file_dataset, 'w') as outfile:
    for sample in dataset:
        # Keep only the question and the reference answer from the dataset
        batch_record = {
            "question": sample['QUESTION'],
            "answers": sample['LONG_ANSWER']
        }
        
        outfile.write(json.dumps(batch_record) + '\n')

### Using Teacher Model for QA Generation

This section covers:

- Generating synthetic QA pairs
- Batch processing with Bedrock
- Data augmentation strategies
- Quality control measures

#### Batch Processing vs Real-Time Inference

Based on performance testing and cost analysis, Amazon Bedrock's batch processing capabilities offer significant advantages over real-time inference:

1. **Performance Benefits**
   - Higher throughput for large-scale processing
   - Reduced risk of API throttling
   - More efficient resource utilization

2. **Cost Optimization**
   - Lower per-request costs compared to real-time inference
   - Better resource allocation and scheduling
   - Reduced overhead from connection management

3. **Operational Advantages**
   - Built-in retry mechanisms
   - Simplified monitoring and logging
   - Better handling of large datasets

For this implementation, we leverage Bedrock's batch processing to optimize both performance and cost efficiency while maintaining processing quality.

### Preparing Dataset for Bedrock Batch Processing

This code creates a JSONL file formatted specifically for Amazon Bedrock batch inference:

- **Purpose**: Converts QA dataset into Bedrock's required batch processing format
- **Key Operations**:
  - Formats prompts using Llama 3's instruction template
  - Assigns unique IDs to each record
  - Sets inference parameters (temperature, max length, etc.)
  - Creates JSONL output with required Bedrock structure

The resulting file enables efficient batch processing of multiple questions through Bedrock's batch inference API, optimizing for throughput and cost efficiency.

> **Note**: The template uses Llama 3's specific tokens (`<|begin_of_text|>`, `<|eot_id|>`) for proper model instruction formatting.
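
As a sketch of how these tokens fit together, the hypothetical helper `format_llama3_prompt` below assembles a single-turn prompt with an optional system message. Each header is followed by two newlines, and each message ends with `<|eot_id|>`.

```python
def format_llama3_prompt(user_message, system_message=None):
    """Build a single-turn Llama 3 instruct prompt that ends at the assistant header."""
    parts = ["<|begin_of_text|>"]
    if system_message:
        parts.append(
            f"<|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|>"
        )
    parts.append(f"<|start_header_id|>user<|end_header_id|>\n\n{user_message}<|eot_id|>")
    # The open assistant header tells the model to generate the answer next
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = format_llama3_prompt(
    "Is a mandatory general surgery rotation necessary in the surgical clerkship?"
)
print(prompt)
```

Using one helper for every prompt keeps the format used for teacher inference identical to the one used for student training.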

In [None]:

def create_bedrock_batch_dataset(dataset, output_file='bedrock_batch_dataset.jsonl'):
    # Llama 3 instruction format: two newlines after each header, <|eot_id|> right after the message
    prompt_template = (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        "{question}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
    
    with open(output_file, 'w') as outfile:
        for sample in dataset:
            # Generate a unique record ID (11 alphanumeric characters, no hyphens)
            record_id = uuid.uuid4().hex[:11]
            
            # Format the prompt
            formatted_prompt = prompt_template.format(
                question=sample["QUESTION"]
            )

            # Create the model input body for Llama 3
            body = {
                "prompt": formatted_prompt,
                "max_gen_len": 1024,
                "temperature": 0.0,
                "top_p": 0.9
            }

            # Create the complete record for batch inference
            batch_record = {
                "recordId": record_id,
                "modelInput": body
            }
            
            outfile.write(json.dumps(batch_record) + '\n')

# Usage
create_bedrock_batch_dataset(dataset)
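
Before uploading, it can help to check the generated file locally. The sketch below is a hypothetical checker, not part of any AWS SDK. It assumes each line must be a JSON object with a unique `recordId` of 11 alphanumeric characters and a non-empty `modelInput.prompt`; confirm the exact `recordId` rules against the batch inference documentation.

```python
import json
import os
import re
import tempfile

# Assumed recordId format: exactly 11 alphanumeric characters
RECORD_ID_PATTERN = re.compile(r"[A-Za-z0-9]{11}")

def check_batch_file(path):
    """Return a list of (line_number, problem) tuples; an empty list means no problems found."""
    problems = []
    seen_ids = set()
    with open(path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_number, f"invalid JSON: {exc.msg}"))
                continue
            record_id = str(record.get("recordId", ""))
            if not RECORD_ID_PATTERN.fullmatch(record_id):
                problems.append((line_number, f"unexpected recordId: {record_id!r}"))
            elif record_id in seen_ids:
                problems.append((line_number, f"duplicate recordId: {record_id!r}"))
            seen_ids.add(record_id)
            if not record.get("modelInput", {}).get("prompt"):
                problems.append((line_number, "missing modelInput.prompt"))
    return problems

# Demo with two made-up records: the second has a hyphen in its ID and no prompt
demo_records = [
    {"recordId": "a1b2c3d4e5f", "modelInput": {"prompt": "Question one?"}},
    {"recordId": "a1b2c3d4-e5", "modelInput": {}},
]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False, encoding="utf-8") as tmp:
    for rec in demo_records:
        tmp.write(json.dumps(rec) + "\n")
problems = check_batch_file(tmp.name)
os.remove(tmp.name)
print(problems)
```

In the notebook, `check_batch_file('bedrock_batch_dataset.jsonl')` would run the same checks on the real file.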

### Uploading Batch Dataset to Amazon S3

This code handles the upload of the prepared batch dataset to Amazon S3, a necessary step before running Bedrock batch inference:

- **Purpose**: Transfers the local JSONL file to S3 for Bedrock access
- **Components**:
  - Uses SageMaker's `S3Uploader` utility for simplified file transfer
  - Organizes files under a structured prefix (`distillation/batch/data`)
  - Automatically handles S3 path formatting and permissions

> **Note**: The S3 location will be referenced in subsequent Bedrock batch inference job configurations. Ensure the Bedrock role has appropriate S3 read permissions.


In [None]:
# Define source and destination paths
local_path_batch_file = 'bedrock_batch_dataset.jsonl'
s3_prefix_batch = 'distillation/batch/data'  # This will be the folder in S3

# Upload the file
s3_path_batch = S3Uploader.upload(
    local_path=local_path_batch_file,
    desired_s3_uri=f's3://{bucket}/{s3_prefix_batch}',
)

print(f"File uploaded successfully to: {s3_path_batch}")

### Bedrock Batch Inference Configuration

This section configures and launches a batch inference job using Amazon Bedrock for large-scale QA processing:

#### Configuration Components
- **Input Configuration**: Points to the JSONL dataset in S3
- **Output Configuration**: Specifies where Bedrock will store inference results
- **Job Settings**: 
  - Unique job name using timestamp
  - Model ARN for Llama 3.1 405B
  - IAM role for execution permissions

In [None]:
output_prefix="output"
inputDataConfig=({
    "s3InputDataConfig": {
        "s3Uri": s3_path_batch
    }
})

outputDataConfig=({
    "s3OutputDataConfig": {
        "s3Uri": f"s3://{bucket}/{s3_prefix_batch}/{output_prefix}/"
    }
})

Launch batch job

In [None]:

from datetime import datetime
jobName = 'batch-job-ga' + str(int(datetime.now().timestamp()))
response = bedrock_client.create_model_invocation_job(
    roleArn=role,
    modelId='arn:aws:bedrock:us-west-2::foundation-model/meta.llama3-1-405b-instruct-v1:0',
    jobName=jobName,
    inputDataConfig=inputDataConfig,
    outputDataConfig=outputDataConfig
)

For more information, see the [Amazon Bedrock Batch Inference](https://docs.aws.amazon.com/bedrock/latest/userguide/batch-inference.html) documentation.

### Monitoring Bedrock Batch Job Status

This code implements a job status monitoring loop for the Bedrock batch inference:

- **Purpose**: Tracks batch job progress until completion or failure
- **Key Operations**:
  - Extracts job ARN and ID for tracking
  - Polls job status every 5 minutes
  - Provides real-time status updates
  - Handles completion and failure scenarios

> **Note**: Consider implementing this monitoring pattern in AWS Lambda or Step Functions for production workloads.

Consider using the sample Bedrock batch job results if you do not want to wait for your own job to finish.

In [None]:
import time
jobArn = response.get('jobArn')
job_id = jobArn.split('/')[1]

print(jobArn)

# States after which the job status no longer changes
terminal_states = ['Completed', 'PartiallyCompleted', 'Failed', 'Stopped', 'Expired']
status = ''
while status not in terminal_states:
    job_response = bedrock_client.get_model_invocation_job(jobIdentifier=jobArn)
    status = job_response['status']
    print(datetime.now(), ": ", status)
    if status in terminal_states:
        if status != 'Completed':
            print(job_response)  # Includes the failure message, if any
        break
    time.sleep(300)
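
The same polling pattern can be wrapped in a reusable function with a timeout, so a stuck job does not block the notebook indefinitely. This is a sketch: `wait_for_job` and its parameters are hypothetical, and the terminal status names are my reading of the Bedrock batch job states, so verify them against the API reference. The status source is passed in as a callable so the function can be exercised without AWS access.

```python
import time

# Assumed terminal states of a Bedrock batch inference job
TERMINAL_STATES = {"Completed", "PartiallyCompleted", "Failed", "Stopped", "Expired"}

def wait_for_job(get_status, poll_seconds=300, timeout_seconds=24 * 3600,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until a terminal state is reached or the timeout expires."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Job still in state {status!r} after {timeout_seconds}s")
        sleep(poll_seconds)

# Demo with a scripted sequence of statuses and a no-op sleep
scripted = iter(["Submitted", "InProgress", "InProgress", "Completed"])
final_status = wait_for_job(lambda: next(scripted), sleep=lambda seconds: None)
print(final_status)
```

In the notebook, the callable would be `lambda: bedrock_client.get_model_invocation_job(jobIdentifier=jobArn)['status']`.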

### Processing Bedrock Batch Results for Training

This section handles the retrieval and processing of batch inference results from S3 for model training:
#### Data Flow
1. **Retrieval**: Fetches batch results from S3
2. **Processing**: Extracts model generations from JSON responses
3. **Formatting**: Prepares data for JumpStart/Bedrock training format

Retrieve batch results from S3

In [None]:
# Retrieve batch results from S3
#job_id='wyg9q4pvli86'  # Set the Bedrock job ID here if you are continuing from an earlier run

s3 = boto3.client('s3')
prefix = f"{s3_prefix_batch}/{output_prefix}/{job_id}/"
print(f"prefix: {bucket}/{prefix}")
object_key = f"{prefix}{local_path_batch_file}.out"
response = s3.get_object(Bucket=bucket, Key=object_key)

In [None]:
# Process and extract teacher model responses
json_data = response['Body'].read().decode('utf-8')
teacher_answer=[]
for line in json_data.splitlines():
    # Use a separate name so the PubMedQA `data` dictionary is not overwritten
    record = json.loads(line)
    print(record['modelOutput']['generation'])
    teacher_answer.append(record['modelOutput']['generation'])

This code combines the original dataset with teacher model responses:

In [None]:
for data_item, teacher in zip(dataset, teacher_answer):
    data_item['TEACHER_ANSWER'] = teacher

> **Note**: This paired dataset forms the foundation for training the student model to mimic the teacher's behavior.
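
Pairing with `zip` relies on the output file listing records in the same order as the input and on every record having succeeded. If you would rather not depend on that, a sketch of joining on `recordId` is shown below. `pair_by_record_id` is a hypothetical helper, the sample lines are made up, and the assumption that failed records carry an `error` field instead of `modelOutput` should be checked against your own output files.

```python
import json

def pair_by_record_id(input_lines, output_lines):
    """Join batch input and output records on recordId.

    Returns (pairs, failed_ids); records without a modelOutput are reported as failed.
    """
    prompts = {}
    for line in input_lines:
        record = json.loads(line)
        prompts[record["recordId"]] = record["modelInput"]["prompt"]

    pairs, failed_ids = [], []
    for line in output_lines:
        record = json.loads(line)
        record_id = record["recordId"]
        if "modelOutput" in record and record_id in prompts:
            pairs.append({
                "recordId": record_id,
                "prompt": prompts[record_id],
                "teacher_answer": record["modelOutput"]["generation"],
            })
        else:
            failed_ids.append(record_id)
    return pairs, failed_ids

# Made-up input and output lines; the output is deliberately in a different order
input_lines = [
    json.dumps({"recordId": "aaaaaaaaaaa", "modelInput": {"prompt": "Q1"}}),
    json.dumps({"recordId": "bbbbbbbbbbb", "modelInput": {"prompt": "Q2"}}),
    json.dumps({"recordId": "ccccccccccc", "modelInput": {"prompt": "Q3"}}),
]
output_lines = [
    json.dumps({"recordId": "ccccccccccc", "modelOutput": {"generation": "A3"}}),
    json.dumps({"recordId": "aaaaaaaaaaa", "modelOutput": {"generation": "A1"}}),
    json.dumps({"recordId": "bbbbbbbbbbb", "error": {"errorMessage": "throttled"}}),
]
pairs, failed_ids = pair_by_record_id(input_lines, output_lines)
print(pairs)
print(failed_ids)
```

To attach the answers back to the entries in `dataset`, the `recordId` would also need to be stored with each sample when the batch file is created.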

### Preparing Training Data for SageMaker JumpStart

This section formats the QA dataset for fine-tuning using SageMaker JumpStart's specific requirements:

#### Data Formatting Process
1. **Template Creation**
   - Defines Llama 3's instruction format
   - Includes system message and conversation structure
   - Maintains special tokens for model context

In [None]:
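import json

# Generic instruction template. Note: create_jumpstart_dataset() in the next cell
# overwrites template.json with the Llama 3 special-token template, and the training
# records written there contain no "context" field.
template = {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": " {response}",
}
with open("template.json", "w") as f:
    json.dump(template, f)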

2. **Dataset Transformation**
   - Converts QA pairs to instruction format
   - Structures teacher responses as completions
   - Creates JSONL format required by JumpStart

In [None]:
import json

def create_jumpstart_dataset(dataset, output_file='train.jsonl', template_file='template.json'):
    # Create the template file required by JumpStart for Q&A format
    template = {
        "prompt": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
        "completion": "{response}"
    }
    
    # Save the template file
    with open(template_file, 'w') as f:
        json.dump(template, f)

    # Process the dataset and create the training file
    with open(output_file, 'w') as outfile:
        for sample in dataset:
            # Train the student on the teacher's answer (the distillation target)
            training_entry = {
                "instruction": sample["QUESTION"],
                "response": sample["TEACHER_ANSWER"].strip()
            }
            
            outfile.write(json.dumps(training_entry) + '\n')
            
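def verify_jsonl(filename):
    """Parse every line, print the first entry, and report malformed or incomplete lines."""
    with open(filename, 'r', encoding='utf-8') as f:
        for i, line in enumerate(f):
            try:
                data = json.loads(line)
            except json.JSONDecodeError as e:
                print(f"Error in line {i+1}: {e}")
                continue
            if i == 0:  # Print first example
                print("Sample entry:")
                print(json.dumps(data, indent=2))
            if not data.get("instruction") or not data.get("response"):
                print(f"Line {i+1} is missing 'instruction' or 'response'")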

In [None]:
# Create the dataset files for JumpStart fine-tuning
# print(dataset)

create_jumpstart_dataset(dataset)
verify_jsonl('train.jsonl')

#### Data Format
- **Input**: Question-answer pairs with teacher model responses
- **Output**:
  - `train.jsonl`: one record per line with the question (`instruction`) and the answer used as the training target (`response`)
  - `template.json`: the prompt template that wraps each record in Llama 3 special tokens (`<|begin_of_text|>`, `<|eot_id|>`)

> **Important**: Follows [JumpStart Data Format Guidelines](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-fine-tuning-instruction-based.html).
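
To see what a training example looks like once the template is applied, the sketch below fills the template with one record using `str.format`. JumpStart applies the template inside its training container and the exact internals are not shown in this notebook, so treat this as an approximation for inspection only; `render_example` is a hypothetical helper and the record is made up.

```python
def render_example(template, record):
    """Fill the prompt and completion templates with the fields of one record."""
    prompt = template["prompt"].format(**record)
    completion = template["completion"].format(**record)
    return prompt, completion

# Same template structure as the one written to template.json in this notebook
template = {
    "prompt": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "completion": "{response}",
}
record = {
    "instruction": "What is the role of antibiotics in treating viral infections?",
    "response": "Antibiotics are not effective against viral infections...",
}
prompt, completion = render_example(template, record)
print(prompt + completion)
```

Printing a few rendered examples is a quick way to confirm that the special tokens land where the student model expects them.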

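#### Validation Process
The `verify_jsonl()` function parses `train.jsonl` and prints the first entry so you can confirm that:
- The file is valid JSONL
- Records contain both an `instruction` and a `response`

The special tokens are part of `template.json`, not of the individual records.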

Example format:
```json
{
  "instruction": "What is the role of antibiotics in treating viral infections?",
  "response": "Antibiotics are not effective against viral infections..."
}
```

## 5. Student Model Configuration (Llama 3.2 1B)

This section covers the setup and configuration of the student model using Amazon SageMaker JumpStart:

### Model Selection Criteria
- Base model: Llama 3.2 1B
- Optimized for knowledge distillation
- Suitable for QA tasks
- Efficient inference characteristics

### Training Data Upload

The following code uploads the prepared training dataset and template to Amazon S3:

In [None]:
from sagemaker.s3 import S3Uploader
import sagemaker
import random

# Configure S3 paths with SageMaker defaults
default_bucket_prefix = sagemaker.Session().default_bucket_prefix
default_bucket_prefix_path = ""

# If a default bucket prefix is specified, append it to the s3 path
if default_bucket_prefix:
    default_bucket_prefix_path = f"/{default_bucket_prefix}"

# Upload training files to S3
local_data_file = "train.jsonl"
template_file="template.json"
train_data_location = f"s3://{bucket}{default_bucket_prefix_path}/oasst_top1"
S3Uploader.upload(local_data_file, train_data_location)
S3Uploader.upload(template_file,train_data_location)
print(f"Training data: {train_data_location}")
print(f"Template saved to: {train_data_location}")

### Student Model Selection in SageMaker JumpStart

This section implements an interactive model selection interface and configures training metrics:

#### Model Selection Process
- **Purpose**: Enables selection of appropriate student model from JumpStart's catalog
- **Focus**: Text generation models suitable for knowledge distillation
- **Default**: Llama 3.2 1B instruct model

In [None]:

# Create interactive model selector
try:
    dropdown = Dropdown(
        options=list_jumpstart_models("search_keywords includes Text Generation"),
        value="meta-textgeneration-llama-3-2-1b-instruct",
        description="Select a JumpStart text generation model:",
        style={"description_width": "initial"},
        layout={"width": "max-content"},
    )
    display(dropdown)
except Exception:
    # Fall back to the default model ID when widgets or the model list are unavailable
    dropdown = None

In [None]:
if dropdown:
    student_model_id = dropdown.value
else:
    # Fall back to the Llama 3.2 1B instruct model when no selector is available
    student_model_id = "meta-textgeneration-llama-3-2-1b-instruct"
model_version_student = "*"

#### Metric Setup
- **Purpose**: Establishes standardized metrics for training evaluation
- **Implementation**: Leverages SageMaker's built-in metric definitions
- **Scope**: Covers training, validation, and system metrics

In [None]:
from sagemaker import metric_definitions
print(metric_definitions.retrieve_default(model_id="meta-textgeneration-llama-3-2-1b-instruct", model_version='1.1.1',))

In [None]:
metric_definitions.retrieve_default(model_id="meta-textgeneration-llama-3-2-1b-instruct", model_version='1.1.1',)

### Training Job Hyperparameter Configuration

This section retrieves and configures the default hyperparameters for the student model training:

#### Hyperparameter Setup
- **Purpose**: Initializes model training configuration
- **Source**: Uses JumpStart's optimized defaults
- **Scope**: Includes learning rates, batch sizes, and model-specific parameters

In [None]:
my_hyperparameters_student = hyperparameters.retrieve_default(
    model_id=student_model_id, model_version=model_version_student,
)

print(my_hyperparameters_student)

### Hyperparameter Customization
This section modifies default hyperparameters for knowledge distillation training:

#### Parameter Adjustments
- **Purpose**: Customizes training configuration for instruction-based learning
- **Key Modifications**:
  - Sets single epoch for initial testing
  - Configures for instruction tuning
  - Establishes fixed random seed for reproducibility
  - Defines maximum input length constraints

In [None]:
my_hyperparameters_student["epoch"] = "1"  # single epoch for initial testing
my_hyperparameters_student["chat_dataset"] = "False"
my_hyperparameters_student["instruction_tuned"] = "True"
my_hyperparameters_student["seed"] = "10"  # fixed seed to help reproduce results
my_hyperparameters_student["max_input_length"] = "1024"  # maximum input length in tokens


hyperparameters.validate(
    model_id=student_model_id, model_version=model_version_student, hyperparameters=my_hyperparameters_student
)

In [None]:
pprint.pprint(my_hyperparameters_student)

### Hyperparameter Tuning Configuration

This section configures and executes automated hyperparameter optimization using SageMaker's Hyperparameter Tuning Jobs:

#### Parameter Search Space Configuration
- **Purpose**: Defines ranges for key training parameters
- **Implementation**: Uses SageMaker's parameter types for optimization
- **Scope**: Covers learning dynamics and LoRA-specific parameters

In [None]:
# Define the hyperparameter search space (categorical values are passed as strings)
hyperparameter_ranges = {
    'learning_rate': ContinuousParameter(0.00001, 0.0005, scaling_type="Logarithmic"),
    'lora_r': CategoricalParameter(['4', '8', '12', '16']),
    'lora_alpha': CategoricalParameter(['16', '32', '48', '64']),
    'lora_dropout': ContinuousParameter(0.01, 0.2),
    'per_device_train_batch_size': CategoricalParameter(['2', '4', '6', '8','16']),
    'gradient_accumulation_steps': CategoricalParameter(['1', '2', '3', '4']),
    'max_steps': CategoricalParameter(['50', '75', '100']),
    'warmup_steps': CategoricalParameter(['5', '7', '10']),
    'num_train_epochs': CategoricalParameter(['1', '2'])

}
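
To get a feel for the size of this search space, the sketch below counts the categorical combinations, draws a few learning rates on a logarithmic scale, and computes an effective batch size. It is a local illustration only: the log-uniform sampler is an assumption about what logarithmic scaling roughly means, not SageMaker's actual search algorithm, and `effective_batch_size` is a hypothetical helper.

```python
import math
import random

# Categorical choices copied from the search space above.
categorical_choices = {
    "lora_r": ["4", "8", "12", "16"],
    "lora_alpha": ["16", "32", "48", "64"],
    "per_device_train_batch_size": ["2", "4", "6", "8", "16"],
    "gradient_accumulation_steps": ["1", "2", "3", "4"],
    "max_steps": ["50", "75", "100"],
    "warmup_steps": ["5", "7", "10"],
    "num_train_epochs": ["1", "2"],
}


def categorical_combinations(choices):
    """Number of distinct combinations of the categorical parameters."""
    return math.prod(len(values) for values in choices.values())


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Samples contributing to one optimizer step."""
    return per_device * accumulation_steps * num_devices


rng = random.Random(10)
learning_rates = [sample_log_uniform(1e-5, 5e-4, rng) for _ in range(5)]

print(categorical_combinations(categorical_choices))
print(learning_rates)
print(effective_batch_size(4, 2))
```

Because the categorical part alone spans thousands of combinations, before the two continuous ranges are even considered, an exhaustive grid search is impractical here, which motivates a guided strategy with a small job budget.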


In [None]:

# Retrieve the default metric definitions for the selected student model
metric_defs = metric_definitions.retrieve_default(model_id=student_model_id, model_version=model_version_student)
print(metric_defs)


#### Enhanced Metric Tracking
- **Purpose**: Monitors both training and resource utilization metrics
- **Implementation**: Combines default and custom GPU memory metrics
- **Scope**: Enables comprehensive performance monitoring


In [None]:
memory_metrics = [
    {'Name': 'gpu:memory_allocated', 'Regex': 'Max CUDA memory allocated was ([0-9\\.]+) GB'},
    {'Name': 'gpu:memory_reserved', 'Regex': 'Max CUDA memory reserved was ([0-9\\.]+) GB'},
    {'Name': 'gpu:peak_active_memory', 'Regex': 'Peak active CUDA memory was ([0-9\\.]+) GB'},
    {'Name': 'train:loss', 'Regex': 'train_loss = ([0-9\\.]+)'}
]

In [None]:
combined_metrics = metric_defs + memory_metrics
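
SageMaker extracts each metric by applying its regular expression to the training log output, so a pattern that never matches silently produces no data. The sketch below applies the same four patterns to sample log lines locally. The log lines are hypothetical and written to fit the patterns; compare them against your actual training logs before relying on these metrics.

```python
import re

# Same definitions as in the cell above.
memory_metrics = [
    {'Name': 'gpu:memory_allocated', 'Regex': 'Max CUDA memory allocated was ([0-9\\.]+) GB'},
    {'Name': 'gpu:memory_reserved', 'Regex': 'Max CUDA memory reserved was ([0-9\\.]+) GB'},
    {'Name': 'gpu:peak_active_memory', 'Regex': 'Peak active CUDA memory was ([0-9\\.]+) GB'},
    {'Name': 'train:loss', 'Regex': 'train_loss = ([0-9\\.]+)'},
]

# Hypothetical log excerpt, for illustration only.
sample_log = """\
Max CUDA memory allocated was 12 GB
Max CUDA memory reserved was 14 GB
Peak active CUDA memory was 13 GB
train_loss = 1.2345
"""


def extract_metrics(log_text, definitions):
    """Return {metric name: first captured value} for every pattern that matches."""
    found = {}
    for definition in definitions:
        match = re.search(definition["Regex"], log_text)
        if match:
            found[definition["Name"]] = float(match.group(1))
    return found


def unmatched_metrics(log_text, definitions):
    """Return the names of metrics whose pattern never matches the log."""
    found = extract_metrics(log_text, definitions)
    return [d["Name"] for d in definitions if d["Name"] not in found]


print(extract_metrics(sample_log, memory_metrics))
print(unmatched_metrics("no metrics here", memory_metrics))
```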

### Tuning Job Configuration

This section configures the hyperparameter optimization job using SageMaker's tuning capabilities:

#### Configuration Components
- **Purpose**: Automates hyperparameter optimization for model training
- **Strategy**: Uses Bayesian optimization for efficient parameter search
- **Scale**: Manages multiple training jobs in parallel

In [None]:

# Create the estimator
estimator = JumpStartEstimator(
    model_id=student_model_id,
    model_version=model_version_student,
    hyperparameters=my_hyperparameters_student,
    role=role,
    disable_output_compression=True,
    # instance_type='ml.g5.xlarge',
    instance_type='ml.g5.2xlarge',
    environment={"accept_eula": "true"},
    metric_definitions=combined_metrics,  # Default plus custom GPU memory metrics
    enable_sagemaker_metrics=True,  # Enable SageMaker metrics
)

In [None]:
# Create the hyperparameter tuner
tuner = HyperparameterTuner(
    estimator,
    objective_metric_name='huggingface-textgeneration:train-loss',
    metric_definitions=combined_metrics,
    objective_type='Minimize',
    max_jobs=20,
    max_parallel_jobs=1,  # Adjust based on your available instance quota
    hyperparameter_ranges=hyperparameter_ranges,
    strategy='Bayesian',
    base_tuning_job_name='llm-llama-3-2-1b',
)


In [None]:
# Start the hyperparameter tuning job and block until it completes
# (wait=True already waits, so no separate tuner.wait() call is needed)
tuner.fit({"training": train_data_location}, wait=True)
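
Blocking the notebook for the whole tuning run is not always convenient. As a minimal sketch, assuming you start the job with `wait=False` instead, the helper below polls the job status through the `describe_hyper_parameter_tuning_job` API until it reaches a terminal state. The helper names, the poll interval, and the poll limit are illustrative choices.

```python
import time

# Statuses after which a tuning job no longer changes.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def tuning_job_status(sagemaker_client, tuning_job_name):
    """Read the current status of a tuning job from the SageMaker API."""
    description = sagemaker_client.describe_hyper_parameter_tuning_job(
        HyperParameterTuningJobName=tuning_job_name
    )
    return description["HyperParameterTuningJobStatus"]


def wait_until_terminal(fetch_status, poll_seconds=60, max_polls=240):
    """Call fetch_status() repeatedly until it returns a terminal status."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"No terminal status after {max_polls} polls")
```

In the notebook this could be called as `wait_until_terminal(lambda: tuning_job_status(boto3.client("sagemaker"), tuner.latest_tuning_job.name))`. Passing the status lookup as a callable keeps the loop easy to exercise without AWS access.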



> **Best Practices for Hyperparameter Tuning**
>
> 1. **Resource Management**
>    - Set `max_parallel_jobs` based on quota limits
>    - Choose appropriate instance types (for example, `ml.g5.2xlarge`)
>    - Monitor GPU memory utilization
>    - Consider cost optimization with spot instances
>
> 2. **Job Configuration**
>    - Use descriptive `base_tuning_job_name`
>    - Enable SageMaker metrics for monitoring
>    - Set appropriate stopping conditions
>    - Configure proper objective metrics
>
> 3. **Optimization Strategy**
>    - Start with Bayesian optimization
>    - Define meaningful parameter ranges
>    - Balance exploration vs exploitation
>    - Monitor convergence patterns
>
> See [Hyperparameter Tuning Best Practices](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-considerations.html)

### Retrieving Best Training Results

This section explains how to access and analyze the best-performing model from the hyperparameter tuning job:

#### Accessing Best Model
- **Purpose**: Retrieves optimal hyperparameters and model artifacts
- **Implementation**: Uses SageMaker's tuning job APIs
- **Output**: Best performing model configuration and metrics

#### Process Overview
1. Get best training job name from tuner
2. Retrieve detailed job information using SageMaker client
3. Extract optimized hyperparameters
4. Access performance metrics

In [None]:
# Get the best training job
best_training_job = tuner.best_training_job()
print(f"Best training job: {best_training_job}")

In [None]:
# Create a SageMaker client
sagemaker_client = boto3.client('sagemaker')

# Get the best hyperparameters using the SageMaker client
best_hyperparameters_student_1 = sagemaker_client.describe_training_job(TrainingJobName=best_training_job)['HyperParameters']
print("Best hyperparameters: \n")
pprint.pprint(best_hyperparameters_student_1)


In [None]:
# Get the best training job
best_training_job_1 = tuner.best_training_job()
print(f"Best training job: {best_training_job_1}")


# Get the best hyperparameters using the SageMaker client
best_hyperparameters_student_1 = sagemaker_client.describe_training_job(TrainingJobName=best_training_job_1)['HyperParameters']
print(f"Best hyperparameters: {best_hyperparameters_student_1}")

In [None]:
pprint.pprint(best_hyperparameters_student_1)
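
Hyperparameters read back from a tuning-launched training job can contain bookkeeping entries next to the real training parameters. The sketch below assumes such entries use an underscore prefix (for example `_tuning_objective_metric`) and strips them before recording the configuration. The job name and values are hypothetical placeholders, not results from this notebook.

```python
import json


def clean_hyperparameters(raw):
    """Drop underscore-prefixed bookkeeping keys and keep values as strings."""
    return {key: str(value) for key, value in raw.items() if not key.startswith("_")}


# Hypothetical example of a describe_training_job 'HyperParameters' payload.
raw_hyperparameters = {
    "_tuning_objective_metric": "huggingface-textgeneration:train-loss",
    "learning_rate": "0.0001",
    "lora_r": "8",
    "epoch": 1,
}

cleaned = clean_hyperparameters(raw_hyperparameters)
record = {"training_job": "example-training-job", "hyperparameters": cleaned}

# A stable key order makes saved configurations easy to diff across experiments.
print(json.dumps(record, indent=2, sort_keys=True))
```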

> **Best Practices**:
> 1. **Result Analysis**
>    - Review convergence patterns
>    - Compare against baseline metrics
>    - Document optimal parameters
>
> 2. **Model Management**
>    - Save best configuration
>    - Track experiment metadata
>    - Document performance characteristics
>
> For more information, see [Analyzing Hyperparameter Tuning Results](https://sagemaker-examples.readthedocs.io/en/latest/hyperparameter_tuning/analyze_results/HPO_Analyze_TuningJob_Results.html)

### Training with Optimized Hyperparameters

This section configures and launches a training job using the best hyperparameters from tuning:

#### Configuration Components
1. **Hyperparameter Setup**
   - Uses optimized parameters from tuning
   - Extends training epochs for full model convergence
   - Configures training environment

In [None]:
pprint.pprint(best_hyperparameters_student_1)
# Extend training for the final run; values are strings, like the other hyperparameters
best_hyperparameters_student_1['num_train_epochs'] = "10"
best_hyperparameters_student_1['epoch'] = "10"

2. **TensorBoard Integration**
   - **Purpose**: Enables real-time training visualization
   - **Storage**: Configures S3 location for logs
   - **Access**: Enables SageMaker Studio TensorBoard integration
   > For more information, see [TensorBoard in Amazon SageMaker AI](https://docs.aws.amazon.com/sagemaker/latest/dg/tensorboard-on-sagemaker.html)

In [None]:
# Create proper TensorBoard output configuration
tensorboard_output_config = TensorBoardOutputConfig(
    s3_output_path=f's3://{bucket}/tensorboard-logs/llama3-model-distillation',
    container_local_output_path='/opt/ml/output/tensorboard'
)


3. **Metric Tracking Configuration**
   - **Training Metrics**: Loss, perplexity, epoch statistics
   - **Resource Metrics**: GPU/CPU memory utilization
   - **Performance Metrics**: Throughput and timing data

In [None]:
from sagemaker.jumpstart.estimator import JumpStartEstimator

student_model_id = "meta-textgeneration-llama-3-2-1b"
model_version_student = "*"

estimator_student = JumpStartEstimator(
    model_id=student_model_id,
    model_version=model_version_student,
    hyperparameters=best_hyperparameters_student_1,
    role=role,
    disable_output_compression=True,
    enable_sagemaker_metrics=True,
    environment={
        "accept_eula": "true",
        "TENSORBOARD_LOGGING": "true",
    },  # Setting `accept_eula` to "true" accepts the model's EULA
    tensorboard_output_config=tensorboard_output_config  # Use the proper config object
)
# Define metrics to track (the name avoids shadowing the imported sagemaker metric_definitions module)
student_metric_definitions = [
    # Training Metrics
    {'Name': 'train:loss', 'Regex': 'step .* is completed and loss is ([0-9\\.]+)'},
    {'Name': 'train:perplexity', 'Regex': 'train_perplexity=([0-9\\.]+)'},
    {'Name': 'train:epoch_loss', 'Regex': 'train_epoch_loss=([0-9\\.]+)'},

    # Evaluation Metrics
    {'Name': 'eval:loss', 'Regex': 'eval_epoch_loss=tensor\\(([0-9\\.]+)'},
    {'Name': 'eval:perplexity', 'Regex': 'eval_ppl=tensor\\(([0-9\\.]+)'},

    # Performance Metrics
    # The spelling 'epcoh' is kept as in the original pattern; confirm it against your training logs
    {'Name': 'epoch_time', 'Regex': 'epcoh time ([0-9\\.]+)'},
    {'Name': 'training_throughput', 'Regex': '([0-9\\.]+)it/s'},

    # Memory Usage
    {'Name': 'gpu:memory_allocated', 'Regex': 'Max CUDA memory allocated was ([0-9\\.]+) GB'},
    {'Name': 'gpu:memory_reserved', 'Regex': 'Max CUDA memory reserved was ([0-9\\.]+) GB'},
    {'Name': 'gpu:peak_active_memory', 'Regex': 'Peak active CUDA memory was ([0-9\\.]+) GB'},
    {'Name': 'cpu:peak_memory', 'Regex': 'CPU Total Peak Memory consumed during the train \\(max\\): ([0-9\\.]+) GB'}
]
# Add metrics to estimator
estimator_student.metric_definitions = student_metric_definitions
# Job name metadata for TensorBoard; this dictionary is not passed to the estimator
tensorboard_callback = {
    'Config': {
        'TrainingJobName': 'llama-3-2-1b-model-distilation'
    }
}


4. **Training Launch**
   - **Implementation**: Uses JumpStart estimator
   - **Monitoring**: Enables comprehensive logging
   - **Visualization**: Integrates with TensorBoard

In [None]:
estimator_student.fit({"training": train_data_location},
    wait=True,
    logs="All")
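
The loss and perplexity metrics tracked for this job are two views of the same quantity. As a minimal sketch, assuming the reported loss is the mean per-token cross-entropy in nats, perplexity is its exponential. The per-token values below are hypothetical.

```python
import math
import statistics


def perplexity(mean_loss):
    """Perplexity for a mean per-token cross-entropy loss given in nats."""
    return math.exp(mean_loss)


# Hypothetical per-token negative log-likelihoods for one short sequence.
token_losses = [2.1, 1.7, 0.9, 1.3]
mean_loss = statistics.fmean(token_losses)

print(f"mean loss: {mean_loss:.4f}")
print(f"perplexity: {perplexity(mean_loss):.4f}")
```

A loss of 0 corresponds to a perplexity of 1 (the model is certain of every token), so a falling loss curve and a falling perplexity curve in TensorBoard describe the same improvement.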

> **Best Practices**:
> 1. **Training Monitoring**
>    - Track all defined metrics
>    - Monitor resource utilization
>    - Review TensorBoard visualizations
>
> 2. **Resource Management**
>    - Configure appropriate instance types
>    - Monitor memory usage
>    - Track training progress
>
> 3. **Output Management**
>    - Organize TensorBoard logs
>    - Maintain training artifacts
>    - Document training results

For more information, see:
- [SageMaker Training Jobs](https://docs.aws.amazon.com/sagemaker/latest/dg/train-model.html)
- [TensorBoard Integration](https://docs.aws.amazon.com/sagemaker/latest/dg/tensorboard-on-sagemaker.html)

## 6. Model Evaluation and Deployment

This section covers the deployment and testing of the trained student model using Amazon Bedrock Custom Model Import (CMI):

### Custom Model Import Process
- **Purpose**: Deploys trained model to Bedrock for serverless inference
- **Implementation**: Automates model import and configuration
- **Benefits**: Enables seamless integration with AWS AI services

In [None]:
# Get the training job name and model URI

training_job_name = estimator_student.latest_training_job.name
# training_job_name='meta-textgeneration-llama-3-2-1b-2025-04-29-18-27-52-998'

training_job_response=sagemaker_client.describe_training_job(
    TrainingJobName=training_job_name
)
model_uri_1=training_job_response['ModelArtifacts']['S3ModelArtifacts']
#model_uri_1 = estimator_student.model_data['S3DataSource']['S3Uri']
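
Because the estimator was created with `disable_output_compression=True`, the artifact location is expected to be an S3 prefix holding uncompressed model files rather than a single `model.tar.gz`. The sketch below splits such a URI into bucket and key prefix and flags a compressed artifact before the import is attempted. The helper names and the example URIs are hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key/prefix' into (bucket, key_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def looks_uncompressed(uri):
    """True when the URI does not point at a .tar.gz archive."""
    return not uri.rstrip("/").endswith(".tar.gz")


# Hypothetical URIs, for illustration only.
example_prefix = "s3://example-bucket/training-job/output/model/"
example_archive = "s3://example-bucket/training-job/output/model.tar.gz"

print(split_s3_uri(example_prefix))
print(looks_uncompressed(example_prefix), looks_uncompressed(example_archive))
```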



### Deployment Configuration
1. **Model Preparation**
   - Retrieves training artifacts
   - Validates model format
   - Configures deployment parameters
2. **Import Job Setup**
   - Creates unique identifiers
   - Sets up IAM permissions
   - Configures resource allocation

Ensure that your S3 bucket policy allows `s3:GetObject` and `s3:ListBucket` for the `Sagemaker_Bedrock_import_role` role.

In [None]:
REGION_NAME = 'us-west-2'
bedrock = boto3.client(service_name='bedrock',
                       region_name=REGION_NAME)
# Generate a unique job name
timestamp = int(time.time())
random_number = random.randint(1000, 9999)
JOB_NAME = f"meta3-import-model-{timestamp}-{random_number}"

ROLE_ARN = bedrock_role_arn
IMPORTED_MODEL_NAME = f"llama3_1_student_1_llama_1b_{timestamp}-{random_number}"
S3_URI = model_uri_1

# Start the import with the CreateModelImportJob API
create_job_response = bedrock.create_model_import_job(
    jobName=JOB_NAME,
    importedModelName=IMPORTED_MODEL_NAME,
    roleArn=ROLE_ARN,
    modelDataSource={
        "s3DataSource": {
            "s3Uri": S3_URI
        }
    },
)
job_arn = create_job_response.get("jobArn")
print(f"Model import job created with ARN: {job_arn}")

### Deployment Monitoring
- Tracks import job progress
- Validates model availability
- Monitors resource utilization
- Handles deployment errors

In [None]:
model_name_filter = IMPORTED_MODEL_NAME  # Name of the imported model to look for
model_info = wait_for_model_availability(model_name_filter, max_attempts=30, delay=60)

if model_info:
    model_arn_1=model_info["modelArn"]
    print("Model is now available in Bedrock.")
else:
    print("Failed to find the model in Bedrock within the specified attempts.")
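
The `wait_for_model_availability` helper is defined elsewhere in the notebook. For readers without that definition at hand, here is a minimal sketch of how such a check might work, assuming it relies on the Bedrock `list_imported_models` API and matches on the model name. The function names and the callable-based design are illustrative and not the notebook's actual implementation.

```python
import time


def find_imported_model(list_response, model_name):
    """Return the summary whose modelName matches exactly, or None."""
    for summary in list_response.get("modelSummaries", []):
        if summary.get("modelName") == model_name:
            return summary
    return None


def wait_for_imported_model(list_models, model_name, max_attempts=30, delay=60):
    """Poll list_models() until the named model is listed or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        match = find_imported_model(list_models(), model_name)
        if match is not None:
            return match
        print(f"Attempt {attempt}/{max_attempts}: {model_name} not listed yet")
        if attempt < max_attempts:
            time.sleep(delay)
    return None
```

With a real client this could be invoked as `wait_for_imported_model(lambda: bedrock.list_imported_models(nameContains=IMPORTED_MODEL_NAME), IMPORTED_MODEL_NAME)`, and the returned summary carries the `modelArn` used for inference.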

### Inference Configuration
- Sets up runtime client
- Configures retry policies
- Implements error handling
- Optimizes performance settings

In [None]:
from botocore.config import Config
import json

REGION_NAME = 'us-west-2'
MODEL_ID= model_arn_1
#MODEL_ID='arn:aws:bedrock:us-west-2:786045444066:imported-model/u1y9gohfgrm8'

config = Config(
    retries={
        'total_max_attempts': 100,  # Total attempts including the first call; takes precedence over max_attempts
        'max_attempts': 3,          # Has no effect while total_max_attempts is also set
        'mode': 'adaptive',         # Adaptive retry mode with client-side rate limiting
    },
    connect_timeout=5,    # Connection timeout in seconds (default is 60)
    read_timeout=30,      # Read timeout in seconds (default is 60)
    max_pool_connections=50,  # Increase from default 10
    tcp_keepalive=True    # Enable TCP keepalive
)
message = "Hello, what is the weather in Seattle?"


session = boto3.session.Session()
br_runtime = session.client(service_name='bedrock-runtime',
                            region_name=REGION_NAME,
                            config=config)

try:
    # Use the client created above so that the retry and timeout settings apply
    invoke_response = br_runtime.invoke_model(modelId=MODEL_ID,
                                              body=json.dumps({'prompt': message}),
                                              accept="application/json",
                                              contentType="application/json")
    invoke_response["body"] = json.loads(invoke_response["body"].read().decode("utf-8"))
    print(json.dumps(invoke_response, indent=4))
except Exception as e:
    print(repr(e))

### Best Practices for Bedrock Model Deployment with Custom Model Import

> **1. Import Configuration**
> - Use descriptive, unique model names with timestamps
> - Configure appropriate IAM roles and permissions
> - Implement robust error handling mechanisms
> - Set appropriate timeout values
> - Validate model artifacts before import

> **2. Deployment Monitoring**
> - Track import job status regularly
> - Implement automated status checks
> - Set up CloudWatch alerts
> - Monitor resource utilization
> - Track deployment metrics

> **3. Testing Strategy**
> - Implement comprehensive test cases
> - Validate model responses
> - Monitor inference latency
> - Track error rates and types
> - Test with various input formats

### Key Benefits of Bedrock Deployment

#### Operational Benefits
- **Serverless Infrastructure**
  - No server management required
  - Automatic scaling capabilities
  - Pay-per-use pricing model

- **Management Simplification**
  - Automated deployments
  - Built-in monitoring
  - Simplified updates

#### Technical Benefits
- **Performance**
  - Optimized inference
  - Low-latency responses
  - Automatic resource scaling

- **Integration**
  - Seamless AWS service connectivity
  - Built-in security features
  - Standardized APIs

#### Cost Benefits
- **Import Costs:**
  - No fees for importing custom model weights
  - No control plane action costs
  - Supported architectures include Meta Llama 2, Llama 3, Flan and Mistral

- **Operational Costs:**
  - Pay-per-use pricing based on Custom Model Units (CMUs)
  - Billed in 5-minute increments
  - Costs scale with number of active model copies
  - Data transfer costs apply for out-of-network traffic

- **Cost Management:**
  - Costs are determined by the number of concurrent model copies and how long each copy stays active
  - Estimate your concurrency requirements up front for an accurate cost projection
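
The cost factors above can be combined into a rough estimate. The sketch below assumes that active time is rounded up to whole 5-minute windows and that cost scales linearly with model copies and Custom Model Units per copy. The price per CMU-minute and the CMU count are placeholders, not actual AWS prices; take real values from the pricing documentation and from your imported model's details.

```python
import math


def billed_windows(active_minutes, window_minutes=5):
    """Number of billing windows, assuming partial windows are rounded up."""
    return math.ceil(active_minutes / window_minutes)


def estimate_cost(active_minutes, model_copies, cmus_per_copy,
                  price_per_cmu_minute, window_minutes=5):
    """Rough cost: billed minutes x copies x CMUs per copy x unit price."""
    billed_minutes = billed_windows(active_minutes, window_minutes) * window_minutes
    return billed_minutes * model_copies * cmus_per_copy * price_per_cmu_minute


# Placeholder inputs, for illustration only.
print(billed_windows(12))
print(estimate_cost(active_minutes=12, model_copies=2,
                    cmus_per_copy=1, price_per_cmu_minute=0.1))
```

Under these assumptions, 12 minutes of activity is billed as three windows (15 minutes), which is why short, bursty traffic can cost more per request than steady traffic.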

#### Additional Resources
- [Bedrock Custom Model Import Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-import-model.html)
- [Bedrock Custom Model Import Pricing Documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/import-model-calculate-cost.html)
- [Model Monitoring Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/monitoring.html)

### Evaluation Environment Setup

#### Configuration Requirements
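**Model Configuration**

If you are resuming from a previous session, you can set the ARN of your imported model manually in the cell below.

In [None]:
# Optionally provide the imported model ARN manually
# model_arn_1 = "model_arn"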

In this step, we create a persistent configuration dictionary that stores model and artifact information across potential session breaks. This approach follows AWS notebook best practices for long-running workflows that might span multiple sessions.

#### Why This Matters
Amazon Bedrock evaluation jobs can take several hours to start and complete. By storing configuration in a reusable dictionary (and optionally persisting to disk), we can:

1. Resume workflow execution after session timeouts
2. Maintain consistent configuration references across multiple notebook sessions
3. Simplify troubleshooting by preserving job identifiers and artifact locations
4. Implement checkpoint recovery for multi-stage workflows

> **Best Practice:** For production implementations, consider using Step Functions to orchestrate these long-running processes or implementing checkpoint mechanisms that store state in Amazon S3.

This pattern is particularly useful during development and testing of complex Bedrock Custom Model Import workflows, where you might need to inspect intermediate results before proceeding to subsequent steps.

In [None]:
# Define model configurations
MODEL_CONFIGS = {
    'student': {
        'model_id': model_arn_1,  # Replace with the ARN of your Custom Model Import (CMI) model if needed
        'output_prefix': 'student_model',
        'model_name': 'llama3-2-1b-fine-tuned-pubmed'  # This will be used as modelIdentifier
    },
    'base': {
        'model_id': 'us.meta.llama3-2-1b-instruct-v1:0',
        'output_prefix': 'base_model',
        'model_name': 'meta-llama3-2-1b-base'  # This will be used as modelIdentifier
    }
}


# Generate the timestamp once and reuse it consistently
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')

#### Performance Benchmarking with Ground Truth Validation

This section executes a comprehensive performance benchmark comparing the student and base models using standardized metrics such as latency, throughput, and reliability.

##### Performance Metrics Collection
The benchmarking process captures critical operational metrics including:
- Average and peak response latency
- Token processing rates (input and output TPM)
- Request success rates and error patterns
- Resource utilization patterns

##### Ground Truth Integration
We leverage the original training dataset questions and corresponding answers as ground truth for these measurements. This approach provides several benefits:

1. **Consistency**: Using the same dataset ensures fair comparison between models
2. **Reproducibility**: Enables repeatable benchmarking across model versions
3. **Preparation for Bedrock Evaluation**: The formatted output will serve as input for Amazon Bedrock's evaluation jobs in later steps

> **Best Practice:** When benchmarking models for production deployment, always measure both functional accuracy (correctness of answers) and non-functional characteristics (latency, throughput) using representative datasets that match expected production workloads.

The results from this benchmark will help quantify the performance improvements gained through knowledge distillation while preparing the necessary artifacts for more detailed qualitative evaluation.

In [None]:
# Setup logging
log_file = setup_logging(timestamp)
logger = logging.getLogger()

input_file = "dataset.jsonl"
sample_size = 100  # Start with a small number to verify everything works

logger.info(f"Starting model comparison with sample size: {sample_size}")
logger.info(f"Log file: {log_file}")

try:
    # Run comparison with sample size
    comparison_results = run_model_comparison(
        input_file=input_file, 
        model_configs=MODEL_CONFIGS, 
        bedrock_runtime=brt ,
        sample_size=sample_size,
        timestamp=timestamp
    )
    print(comparison_results)

    # Print results
    logger.info("\nComparison Results:")
    logger.info("=" * 50)
    for model_name, result in comparison_results.items():
        logger.info(f"\nModel: {model_name}")
        logger.info(f"Output file: {result['output_file']}")
        logger.info("\nMetrics:")
        logger.info(json.dumps(result['metrics'], indent=2))

    # Save comparison results
    comparison_file = f"model_comparison_{timestamp}.json"
    with open(comparison_file, 'w') as f:
        json.dump(comparison_results, f, indent=2)
    logger.info(f"\nDetailed comparison saved to: {comparison_file}")
    
    # Analyze errors
    logger.info("\nAnalyzing errors from log file...")
    error_counts, error_examples = analyze_errors(log_file)
    
except Exception as e:
    logger.error("Fatal error in main execution:", exc_info=True)
    raise
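
The internals of `run_model_comparison` are not shown in this section. As a minimal sketch of how the metrics listed above might be aggregated from per-request measurements, the example below computes average and peak latency, success rate, and tokens per minute (TPM). The record layout and all numbers are hypothetical, and the actual helper may compute these values differently.

```python
import statistics

# Hypothetical per-request records, for illustration only.
records = [
    {"latency_s": 1.0, "input_tokens": 100, "output_tokens": 50, "ok": True},
    {"latency_s": 2.0, "input_tokens": 200, "output_tokens": 70, "ok": True},
    {"latency_s": 3.0, "input_tokens": 150, "output_tokens": 0, "ok": False},
]


def summarize(records, wall_clock_s):
    """Aggregate latency, reliability, and token throughput over one run."""
    succeeded = [r for r in records if r["ok"]]
    latencies = [r["latency_s"] for r in succeeded]
    minutes = wall_clock_s / 60
    return {
        "avg_latency_s": statistics.mean(latencies) if latencies else None,
        "peak_latency_s": max(latencies) if latencies else None,
        "success_rate": len(succeeded) / len(records) if records else 0.0,
        "input_tpm": sum(r["input_tokens"] for r in succeeded) / minutes,
        "output_tpm": sum(r["output_tokens"] for r in succeeded) / minutes,
    }


summary = summarize(records, wall_clock_s=60)
print(summary)
```

Only successful requests contribute to latency and throughput here, so a model that fails quickly does not appear faster than one that answers correctly.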

In [None]:
from IPython.display import Image

# Create visualizations
df, fig = create_dual_radar_plots(comparison_results)


# Display DataFrame
print("\nComparative Metrics Table:")
display(df)

# Save results
df.to_csv('model_metrics_comparison.csv')
fig.savefig('model_comparison_radar_plots.png', bbox_inches='tight', dpi=300)

#### Model Knowledge Evaluation (Optional)

This section demonstrates how to configure and execute qualitative knowledge evaluations using Amazon Bedrock's LLM-as-judge capability. This step is optional and can be skipped based on your situation.

> **Note:** If you prefer to skip the evaluation setup and proceed directly to examining results, you can [jump ahead to the Performance Comparison Report section](#model-performance-comparison-report-generation).

##### Amazon Bedrock Evaluation Setup

Amazon Bedrock evaluations provide automated, standardized assessments of model knowledge and capabilities across multiple dimensions including correctness, helpfulness, and coherence.

> **Important:** Bedrock evaluation jobs using the LLM-as-judge feature may experience extended queue times (potentially several hours). You can either continue to subsequent sections, or start the job and periodically check its status using the provided monitoring code. This behavior is expected during periods of high service demand. For details, see [Evaluate model performance using another LLM as a judge](https://docs.aws.amazon.com/bedrock/latest/userguide/evaluation-judge.html).

##### S3 Bucket CORS Configuration Requirements

Bedrock evaluation jobs require Cross-Origin Resource Sharing (CORS) configuration on your S3 bucket. The following setup steps will:

1. Configure CORS policies to allow Bedrock services to access your evaluation files
2. Upload properly formatted evaluation artifacts to your S3 bucket
3. Prepare bucket locations for evaluation results

> **Security Note:** If your organization has restrictions on enabling CORS for security reasons, consider skipping this section and using the performance metrics from the previous evaluation step. The upcoming code will attempt to modify your S3 bucket's CORS configuration, which may violate organizational security policies.

##### Alternative Approaches

If you cannot proceed with Bedrock evaluations due to CORS restrictions or queue times, you can:

1. Continue to the next section where we examine evaluation results from pre-executed jobs
2. Use the built-in performance metrics captured in previous steps
3. Implement your own evaluation logic using the performance benchmark results

> **Best Practice:** For production workloads, schedule Bedrock evaluation jobs during off-peak hours and implement asynchronous notification mechanisms (such as SNS topics) to alert you when evaluations complete.

##### S3 Bucket Configuration and File Upload
This code section configures your Amazon S3 bucket and uploads the evaluation files needed for Bedrock model assessment. The process follows AWS best practices for organizing evaluation artifacts.

##### Key Operations

The code performs several essential steps for Bedrock evaluation preparation:

1. **CORS Configuration**: Configures Cross-Origin Resource Sharing (CORS) policies on your S3 bucket, enabling Bedrock services to access your evaluation files securely
2. **Structured Uploading**: Organizes model comparison results into a standardized directory structure with timestamped prefixes for tracking and version control
3. **File Validation**: Verifies that uploaded artifacts meet Bedrock's format requirements and are accessible with appropriate permissions

> **Best Practice:** When preparing evaluation datasets for Bedrock, maintain consistent naming conventions and use timestamp-based versioning to track how evaluation artifacts and results change over time.


In [None]:
# Prepare the bucket, then upload and verify the evaluation files
s3_locations = prepare_and_upload_evaluation_files(MODEL_CONFIGS, bucket, sess, timestamp)

# Print final locations and record them in the model configuration
print("\nFinal S3 Locations:")
print("=" * 50)
for model_name, location in s3_locations.items():
    if model_name in MODEL_CONFIGS:
        MODEL_CONFIGS[model_name]['s3_location'] = location
    print(f"{model_name}: {location}")
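
The CORS step performed inside `prepare_and_upload_evaluation_files` is not shown here. As a minimal sketch, assuming the helper applies a permissive rule of the kind described in the Bedrock evaluation documentation, the configuration could be built and applied as follows. The function names are hypothetical, and you may want to narrow the allowed origins to match your organization's policy.

```python
def build_cors_configuration():
    """CORS rules that let Bedrock evaluation read and write bucket objects."""
    return {
        "CORSRules": [
            {
                "AllowedHeaders": ["*"],
                "AllowedMethods": ["GET", "PUT", "POST", "DELETE"],
                "AllowedOrigins": ["*"],
                "ExposeHeaders": ["Access-Control-Allow-Origin"],
            }
        ]
    }


def apply_cors(s3_client, bucket_name):
    """Apply the rules with the S3 PutBucketCors API and return what was sent."""
    configuration = build_cors_configuration()
    s3_client.put_bucket_cors(Bucket=bucket_name, CORSConfiguration=configuration)
    return configuration


print(build_cors_configuration())
```

With a real client this would be called as `apply_cors(boto3.client("s3"), bucket)`. Note that `put_bucket_cors` replaces any existing CORS configuration on the bucket, so review the current rules first.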

##### IAM Role and Policy Configuration for Amazon Bedrock Evaluation

This code segment creates and configures a dedicated IAM role with precise permissions following the principle of least privilege for Bedrock model evaluation. The role configuration implements AWS security best practices for service-to-service interactions.

###### Permission Structure

The code establishes a comprehensive security model through:

1. **Trust Relationship**: Defines a targeted trust policy allowing only Amazon Bedrock and SageMaker services to assume this role
2. **Fine-grained Permissions**: Creates specific permissions for Bedrock evaluation operations including:
   - Creating and managing evaluation jobs
   - Accessing model resources
   - Invoking models for comparison
3. **S3 Access Controls**: Grants the S3 permissions that evaluation needs on the evaluation bucket, including object access and CORS configuration; narrow the resource list to specific prefixes if you need tighter scoping
4. **Resource-level Permissions**: Uses ARN patterns to restrict actions to specific resources within your account

> **Security Best Practice:** This implementation follows AWS's recommended approach of creating purpose-specific roles with narrowly scoped permissions rather than using broader administrator roles, enhancing your security posture while enabling the necessary functionality.

The resulting IAM role provides Bedrock evaluation jobs with precisely the permissions needed to access models and evaluation data without excessive privileges, conforming to AWS Well-Architected security principles.

In [None]:
import boto3
import json

# 1. Setup Basic Variables
account_id = boto3.client('sts').get_caller_identity()['Account']  # Get current AWS account ID
region = "us-west-2"  # Note: Custom Model Import (CMI) only works in us-west-2 and us-east-1
role_name = "Bedrock_Evaluation_Role"  # Name for the new IAM role we'll create

# 2. Define Trust Relationship Policy
trust_relationship = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "bedrock.amazonaws.com",
                    "sagemaker.amazonaws.com"  # Added SageMaker service
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

# 3. Define Permission Policy
# Bedrock resources access policy
bedrock_access_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "BedrockConsole",
            "Effect": "Allow",
            "Action": [
                "bedrock:CreateEvaluationJob",
                "bedrock:GetEvaluationJob",
                "bedrock:ListEvaluationJobs",
                "bedrock:StopEvaluationJob",
                "bedrock:GetCustomModel",
                "bedrock:ListCustomModels",
                "bedrock:CreateProvisionedModelThroughput",
                "bedrock:UpdateProvisionedModelThroughput",
                "bedrock:GetProvisionedModelThroughput",
                "bedrock:ListProvisionedModelThroughputs",
                "bedrock:GetImportedModel",
                "bedrock:ListImportedModels",
                "bedrock:ListTagsForResource",
                "bedrock:UntagResource",
                "bedrock:TagResource",
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream"
            ],
            "Resource": [
                f"arn:aws:bedrock:{region}::foundation-model/*",
                f"arn:aws:bedrock:{region}:{account_id}:*"
            ]
        }
    ]
}

# S3 access policy
s3_access_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "S3AccessForModelEvaluation",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:GetBucketCORS",
                "s3:PutBucketCORS",
                "s3:ListBucket",
                "s3:ListBucketVersions",
                "s3:GetBucketLocation",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts",
                "s3:ListBucketMultipartUploads"
            ],
            "Resource": [
                f"arn:aws:s3:::{bucket}",
                f"arn:aws:s3:::{bucket}/*",
                f"arn:aws:s3:::{bucket}/model-evaluation/*"  # Specific path
            ]
        }
    ]
}

# Combine policies
combined_policy = {
    "Version": "2012-10-17",
    "Statement": (
        bedrock_access_policy["Statement"] +
        s3_access_policy["Statement"]
    )
}

# 4. Create or Update the IAM Role

bedrock_evaluation_role_arn = create_or_update_role(
    role_name=role_name,
    trust_relationship=trust_relationship,
    permission_policy=combined_policy
)

print(f"Role ARN: {bedrock_evaluation_role_arn}")
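
Because `combined_policy` simply concatenates two statement lists, a duplicated `Sid` would only surface when IAM rejects the document. The following is a minimal sketch of a local pre-check. The helper name `merge_policy_statements` and the two small stand-in policies are hypothetical and not part of this notebook; the only rule relied on is that `Sid` values must be unique within a single policy document.

```python
import json


def merge_policy_statements(*policies):
    """Concatenate policy statements and fail fast on duplicate Sids."""
    statements, seen_sids = [], set()
    for policy in policies:
        for statement in policy["Statement"]:
            sid = statement.get("Sid")
            if sid is not None:
                if sid in seen_sids:
                    raise ValueError(f"Duplicate Sid in merged policy: {sid}")
                seen_sids.add(sid)
            statements.append(statement)
    return {"Version": "2012-10-17", "Statement": statements}


# Illustrative stand-ins for the Bedrock and S3 policies defined above
example_bedrock_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Sid": "BedrockConsole", "Effect": "Allow",
                   "Action": ["bedrock:GetEvaluationJob"], "Resource": "*"}],
}
example_s3_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Sid": "S3AccessForModelEvaluation", "Effect": "Allow",
                   "Action": ["s3:GetObject"], "Resource": "*"}],
}

merged = merge_policy_statements(example_bedrock_policy, example_s3_policy)
# Compact serialization gives a rough idea of the document size sent to IAM
print(len(json.dumps(merged, separators=(",", ":"))), "characters")
```

The same helper could replace the manual `+` concatenation if more statements are added later.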

***Storing Configuration State***
This step saves the model configuration dictionary to a local JSON file so that later cells, or a later notebook session, can reload it instead of rebuilding it.

> **Best Practice:** Persisting configuration state to disk creates checkpoints that allow you to resume work after notebook kernel restarts or session timeouts. This is particularly valuable with Bedrock evaluation jobs that may take hours to complete.

By serializing the complete configuration, including model identifiers, S3 locations, and job references, you can reliably restart your workflow from intermediate points without recalculating or recreating previous steps. This technique is especially useful during iterative development when troubleshooting Bedrock evaluation jobs.


In [None]:
# Save the MODEL_CONFIGS dictionary
with open('model_configs.json', 'w') as f:
    json.dump(MODEL_CONFIGS, f)
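
A plain `open(..., 'w')` truncates the file before writing, so an interrupted kernel can leave a half-written checkpoint. One possible hardening, sketched here with hypothetical helper names `save_config` and `load_config` (they do not exist elsewhere in this notebook), is to write to a temporary file and swap it into place:

```python
import json
import os
import tempfile


def save_config(config, path="model_configs.json"):
    """Write JSON to a temp file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(config, f, indent=2)
        os.replace(tmp_path, path)  # replaces the old checkpoint in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_config(path="model_configs.json", default=None):
    """Return the saved configuration, or a default if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {} if default is None else default
```

With these helpers, `save_config(MODEL_CONFIGS)` and `MODEL_CONFIGS = load_config()` would stand in for the explicit `open` calls used in this notebook.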

##### Amazon Bedrock Evaluation Job Configuration - Base Model

This section prepares the configuration parameters needed to evaluate the base model using Amazon Bedrock's model evaluation framework. Following AWS best practices, we create a structured configuration that enables controlled, reproducible model assessments.

**Configuration Components**

The code establishes several crucial parameters:

1. **Unique Job Identification**: Creates a UUID-based job name for tracking and debugging
2. **IAM Authorization**: References the previously created role ARN with appropriate permissions
3. **Data Locations**: Configures input and output S3 URIs following AWS's recommended path structure
4. **Model Selection**: Specifies which foundation model will be used as the evaluation baseline
5. **Inference Source Naming**: Sets a descriptive identifier for the model in evaluation reports

> **Best Practice:** Using UUID-based identifiers with descriptive prefixes creates uniquely identifiable resources while maintaining human-readable naming patterns. This approach facilitates both programmatic tracking and manual review of evaluation jobs in the AWS Console.

This configuration establishes the foundation for a standardized evaluation process that can be consistently applied across multiple model versions as your distillation workflow evolves.


In [None]:
import uuid

# Unique job name for the base-model evaluation
job_name_base = f"model-eval-base-{uuid.uuid4().hex[:8]}"

# Base configuration for Bedrock
role_arn = bedrock_evaluation_role_arn
dataset_s3_uri_base = MODEL_CONFIGS['base']['s3_location']
# .get() avoids a KeyError if 'output_location' has not been set yet
output_s3_uri_base = MODEL_CONFIGS['base'].get('output_location')
model_identifier = "us.meta.llama3-1-70b-instruct-v1:0"
inference_source_name_base = MODEL_CONFIGS['base']['model_name']
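
The UUID-suffix naming described above can be wrapped in a small helper that also normalizes the prefix. This is a sketch only: `make_job_name` is a hypothetical name, and the character set and 63-character limit used here are assumptions that should be checked against the `CreateEvaluationJob` API reference.

```python
import re
import uuid

MAX_JOB_NAME_LENGTH = 63  # assumption; confirm against the Bedrock API reference


def make_job_name(prefix, suffix_length=8):
    """Build '<prefix>-<random hex>' from lowercase letters, digits, and hyphens."""
    cleaned = re.sub(r"[^a-z0-9-]+", "-", prefix.lower()).strip("-")
    cleaned = cleaned[: MAX_JOB_NAME_LENGTH - suffix_length - 1].rstrip("-")
    if not cleaned:
        raise ValueError("prefix must contain at least one letter or digit")
    return f"{cleaned}-{uuid.uuid4().hex[:suffix_length]}"


print(make_job_name("Model_Eval Base"))
```

Keeping the readable prefix first means jobs still sort and filter sensibly in the console, while the random suffix avoids name collisions on reruns.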


#### Executing Amazon Bedrock Evaluation Jobs

This section demonstrates how to programmatically create and launch model evaluation jobs using Amazon Bedrock's evaluation capability. The implementation follows AWS best practices for automating evaluation workflows through the boto3 SDK.

##### Evaluation Job Execution

The code creates separate evaluation jobs for each model configuration:

1. **Orchestrated Creation**: Iterates through model configurations to create parallel evaluation jobs
2. **Consistent Parameters**: Applies the same LLM-as-judge evaluator model across all evaluations
3. **Unique Identification**: Assigns distinct job names with UUID suffixes for tracking
4. **Error Handling**: Implements AWS-recommended exception handling patterns for API operations

> **⚠️ Important:** Evaluation jobs using the "Bring Your Own Inference" (BYOI) approach may currently experience extended processing times due to service optimizations in progress. Jobs might remain in queue for several hours before execution begins.

##### Monitoring and Recovery

The configuration dictionary is updated with job ARNs and names, enabling:
- Status tracking across notebook sessions
- Resumability if your notebook environment disconnects
- Correlation between jobs and their results

> **Best Practice:** For production implementations, consider using EventBridge to monitor job state changes or implement a Step Functions workflow that can manage the entire evaluation process asynchronously while providing status updates.

This approach balances the need for automation with practical considerations around Bedrock service limitations, providing a robust pattern for launching evaluation jobs that can scale to multiple models.


In [None]:
import uuid
# Job Configuration
evaluator_model = "meta.llama3-1-70b-instruct-v1:0"
MODEL_CONFIGS['base']['output_location']='/'.join(MODEL_CONFIGS['base']['s3_location'].rsplit('/', 1)[0].split('/')[:-1] + ['evaluation'])+"/base/"
MODEL_CONFIGS['student']['output_location']='/'.join(MODEL_CONFIGS['student']['s3_location'].rsplit('/', 1)[0].split('/')[:-1] + ['evaluation'])+"/student/"
role_arn = bedrock_evaluation_role_arn

for key in MODEL_CONFIGS.keys():
    print(MODEL_CONFIGS[key])
    # Create evaluation job
    job_name = f"model-eval-{key}-{uuid.uuid4().hex[:8]}"
    try:
        llm_as_judge_response = create_llm_judge_evaluation(
            client=bedrock,
            job_name=job_name,
            role_arn=role_arn,
            input_s3_uri=MODEL_CONFIGS[key]['s3_location'],
            output_s3_uri=MODEL_CONFIGS[key]['output_location'],
            evaluator_model_id=evaluator_model,
            task_type="General",
            inference_Source_Id=MODEL_CONFIGS[key]['model_name'],
        )
        print(f"✓ Created evaluation job: {llm_as_judge_response['jobArn']}")
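        # Store the ARN and the name separately; the chained assignment
        # previously overwrote job_arn with the job name
        MODEL_CONFIGS[key]['job_arn'] = llm_as_judge_response['jobArn']
        MODEL_CONFIGS[key]['job_name'] = job_name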

    except Exception as e:
        print(f"✗ Failed to create evaluation job: {str(e)}")
        raise
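
The `output_location` expressions in the cell above pack several string operations into one line. As a reading aid, here is a sketch of the same derivation as a named function. `derive_output_location` is a hypothetical helper, and the bucket and key in the example are placeholders.

```python
import posixpath


def derive_output_location(s3_location, model_key):
    """Map s3://bucket/<parent>/<dir>/<file> to s3://bucket/<parent>/evaluation/<model_key>/."""
    scheme = "s3://"
    if not s3_location.startswith(scheme):
        raise ValueError(f"Not an S3 URI: {s3_location}")
    bucket, _, key = s3_location[len(scheme):].partition("/")
    # Drop the file name, then the directory that contains it
    parent = posixpath.dirname(posixpath.dirname(key))
    parts = [part for part in (parent, "evaluation", model_key) if part]
    return f"{scheme}{bucket}/" + "/".join(parts) + "/"


print(derive_output_location("s3://example-bucket/distillation/datasets/base.jsonl", "base"))
```

For inputs with at least one directory level this returns the same string as the one-line expression. Unlike that expression, it keeps the bucket name when the file sits at the bucket root.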

##### Checkpoint Configuration Update

This step persists the updated model configuration dictionary, which now contains evaluation job ARNs and identifiers. This follows AWS's recommended checkpoint pattern for long-running notebook workflows.

> **Best Practice:** Regularly saving state after significant operations ensures you can recover your workflow progress even if the notebook kernel restarts or times out during extended job processing. This is particularly important after launching Bedrock evaluation jobs that may run for hours.

The JSON serialization preserves all reference information needed to check job status or retrieve results in future sessions, creating a reliable recovery point in your workflow.

In [None]:
# Save the MODEL_CONFIGS dictionary
with open('model_configs.json', 'w') as f:
    json.dump(MODEL_CONFIGS, f)

#### Job Status Monitoring Loop

This code implements AWS's recommended pattern for monitoring long-running Bedrock evaluation jobs, with features for session recovery and status tracking.

##### Key Components

1. **Session Reinitialization**: Recreates the Bedrock client and reloads saved configuration, enabling workflow continuity across notebook sessions
2. **Controlled Polling**: Implements a 5-minute interval checking pattern to efficiently monitor job status without overwhelming the service API 
3. **Status Handling**: Provides real-time status updates while waiting for job completion or failure

> **Best Practice:** The polling approach with appropriate sleep intervals follows AWS's guidance for monitoring asynchronous operations while minimizing API calls. For production implementations, consider EventBridge rules as an alternative to active polling.

This pattern allows you to monitor evaluation jobs that may run for extended periods without maintaining a continuous notebook session, supporting flexible workflow resumption after breaks or interruptions.

In [None]:

import json
import time
from datetime import datetime

import boto3

# Initialize bedrock client (you'll need to do this again in your new session)
bedrock = boto3.client('bedrock')

# Load your saved configuration
with open('model_configs.json', 'r') as f:
    MODEL_CONFIGS = json.load(f)

# Check status for each job
for key in MODEL_CONFIGS.keys():
    job_arn = MODEL_CONFIGS[key]['job_arn']
    print(f"Checking status for {key} model:")
    
    status = ''
    # 'Stopped' is treated as terminal too; otherwise a stopped job would be polled forever
    while status not in ['Completed', 'Failed', 'Stopped']:
        job_response = bedrock.get_evaluation_job(jobIdentifier=job_arn)
        status = job_response['status']
        if status == 'Failed':
            print(job_response)
        elif status in ('Completed', 'Stopped'):
            print(datetime.now(), ": ", status)
            break
        else:
            print(datetime.now(), ": ", status)
            time.sleep(300)
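
The loop above has no upper bound on how long it waits. A bounded variant might look like the sketch below. `wait_for_evaluation_job` is a hypothetical helper, the set of terminal status strings is an assumption to verify against the `GetEvaluationJob` documentation, and the `sleep` and `clock` parameters exist only so the function can be exercised without real waiting.

```python
import time

# Assumed terminal states; verify against the GetEvaluationJob documentation
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_evaluation_job(client, job_arn, poll_seconds=300,
                            timeout_seconds=6 * 3600,
                            sleep=time.sleep, clock=time.monotonic):
    """Poll get_evaluation_job until a terminal status or the timeout is reached."""
    deadline = clock() + timeout_seconds
    while True:
        status = client.get_evaluation_job(jobIdentifier=job_arn)["status"]
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"{job_arn} still {status} after {timeout_seconds}s")
        sleep(poll_seconds)
```

In this notebook it would be called as `wait_for_evaluation_job(bedrock, MODEL_CONFIGS[key]['job_arn'])` for each model key.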

#### Retrieving Evaluation Artifacts from S3

This code retrieves the evaluation result files generated by completed Bedrock evaluation jobs. The implementation follows AWS best practices for structured artifact retrieval and verification.

##### Artifact Discovery Process

The code implements a robust evaluation file retrieval workflow:

1. **Path Construction**: Uses job ARNs and model identifiers to deterministically construct the correct S3 key prefixes
2. **Latest Artifact Selection**: Identifies the most recent evaluation output files when multiple versions exist
3. **Multi-Model Consolidation**: Organizes results from both base and student models for comparative analysis
4. **Validation Checks**: Prints constructed paths and retrieved files to verify successful discovery

> **Best Practice:** The structured path construction approach follows AWS's recommended pattern for locating artifacts in predictable locations without hardcoding paths. This enables consistent retrieval across different evaluation runs and model versions.

By programmatically discovering evaluation files rather than using hardcoded paths, this approach maintains flexibility when job IDs or output locations change, making your workflow more resilient and reusable.

In [None]:
evaluation_files = get_evaluation_files(MODEL_CONFIGS, bucket)
print(evaluation_files)

print("\nFound evaluation files:")
print(f"Base model file key: {evaluation_files['base']}")
print(f"Student model file key: {evaluation_files['student']}")

# Debug the constructed prefixes
base_prefix = construct_evaluation_key(MODEL_CONFIGS, 'base')
student_prefix = construct_evaluation_key(MODEL_CONFIGS, 'student')
print("\nConstructed prefixes:")
print(f"Base prefix: {base_prefix}")
print(f"Student prefix: {student_prefix}")
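
The "latest artifact selection" step can be illustrated independently of `get_evaluation_files`, whose implementation appears earlier in the notebook. The sketch below assumes a list shaped like the `Contents` entries returned by S3 `list_objects_v2` (dictionaries with `Key` and `LastModified`). The helper name `pick_latest_key`, the `.jsonl` suffix filter, and the sample keys are illustrative assumptions.

```python
from datetime import datetime, timezone


def pick_latest_key(objects, suffix=".jsonl"):
    """Return the key of the most recently modified object with the given suffix."""
    candidates = [obj for obj in objects if obj["Key"].endswith(suffix)]
    if not candidates:
        return None
    return max(candidates, key=lambda obj: obj["LastModified"])["Key"]


# Placeholder listing; real entries would come from list_objects_v2
example_listing = [
    {"Key": "evaluation/base/run-1/output.jsonl",
     "LastModified": datetime(2025, 1, 1, tzinfo=timezone.utc)},
    {"Key": "evaluation/base/run-2/output.jsonl",
     "LastModified": datetime(2025, 1, 2, tzinfo=timezone.utc)},
]
print(pick_latest_key(example_listing))
```

Returning `None` for an empty listing lets the caller distinguish "job produced no output yet" from a lookup error.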

In [None]:
file_key_base = evaluation_files['base']
file_key_student = evaluation_files['student']

#### Metric Analysis and Visualization Generation

This code executes a comprehensive analysis of evaluation metrics from both models and generates standardized visualization artifacts. The implementation follows AWS's recommended patterns for evaluation result processing and presentation.

##### Analysis Workflow

The `analyze_and_plot_metrics` function performs multiple data processing and visualization steps:

1. **S3 Data Retrieval**: Downloads the evaluation results for both models from S3
2. **Metric Extraction**: Parses complex JSON evaluation outputs to extract comparable metrics
3. **Comparative Analysis**: Performs side-by-side comparison of model performance across multiple dimensions
4. **Visualization Creation**:
   - Generates radar plots showing relative performance across metrics
   - Creates formatted comparison tables with improvement indicators
   - Produces CSV files for further analysis or reporting

> **Best Practice:** Automating the generation of standardized visualizations and comparison artifacts ensures consistent evaluation methodology across models and versions. This approach supports data-driven decision making by presenting complex evaluation results in accessible formats.

The resulting visualizations provide clear, actionable insights into the performance differences between the base and student models, highlighting where knowledge distillation has maintained or improved capabilities.

In [None]:
# Run the analysis and create the plot
analyze_and_plot_metrics(bucket, file_key_base, file_key_student)
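
To make the metric-extraction step concrete, the sketch below averages per-metric scores from a JSON Lines evaluation output. It is not the implementation of `analyze_and_plot_metrics`. The record layout, with a `scores` list under `automatedEvaluationResult` holding `metricName` and `result`, is an assumption about the output format, and `mean_scores` is a hypothetical helper.

```python
import json
from collections import defaultdict


def mean_scores(jsonl_text):
    """Average each metric's numeric result across all records in a JSONL string."""
    totals, counts = defaultdict(float), defaultdict(int)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # Assumed layout: {"automatedEvaluationResult": {"scores": [...]}}
        scores = record.get("automatedEvaluationResult", {}).get("scores", [])
        for score in scores:
            value = score.get("result")
            if isinstance(value, (int, float)):
                totals[score["metricName"]] += value
                counts[score["metricName"]] += 1
    return {name: totals[name] / counts[name] for name in totals}


sample = "\n".join([
    json.dumps({"automatedEvaluationResult": {"scores": [
        {"metricName": "Builtin.Correctness", "result": 1.0}]}}),
    json.dumps({"automatedEvaluationResult": {"scores": [
        {"metricName": "Builtin.Correctness", "result": 0.5}]}}),
])
print(mean_scores(sample))
```

Records without a numeric `result` are skipped rather than counted as zero, so a judge failure does not drag the average down.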

#### Model Performance Comparison Report Generation

This code dynamically generates a comprehensive markdown report comparing operational performance metrics between the base and student models. The implementation follows AWS best practices for automated reporting and evaluation documentation.

##### Report Generation Process

The `generate_model_comparison_report` function creates a structured performance analysis:

1. **Data-Driven Content**: Dynamically populates the report using metrics captured in the CSV file from previous steps
2. **Metric Interpretation**: Automatically calculates improvement percentages and provides contextual explanations
3. **Formatted Presentation**: Generates properly structured markdown tables with consistent formatting
4. **Visual Integration**: Incorporates references to previously generated visualization assets

> **Best Practice:** Automating report generation with standardized templates ensures consistency across evaluation runs while providing a reproducible record of model improvements. This approach supports AWS's recommendation for maintaining comprehensive model documentation throughout the ML lifecycle.

The dynamic report includes detailed sections on:
- Latency metrics with percentage improvements
- Throughput comparisons showing token processing efficiency
- Success rate analysis and reliability metrics
- Operational insights for production deployment considerations

By displaying the report as rendered Markdown directly in the notebook, this approach creates a self-documenting workflow that combines code, visualizations, and insights in a single interface.

In [None]:
report = generate_model_comparison_report('model_metrics_comparison.csv')
display(Markdown(report))
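
The improvement percentages in such a report come down to a relative-change calculation. The sketch below shows one way to compute it and render a Markdown table. `percent_change` and `comparison_table` are hypothetical helpers, and the metric name and values are made-up placeholders rather than results from this notebook.

```python
def percent_change(base, student):
    """Relative change of student versus base, in percent (None if base is zero)."""
    if base == 0:
        return None
    return (student - base) / abs(base) * 100.0


def comparison_table(base_metrics, student_metrics):
    """Render metrics present in both dictionaries as a Markdown table."""
    lines = ["| Metric | Base | Student | Change |", "|---|---|---|---|"]
    for name in sorted(set(base_metrics) & set(student_metrics)):
        change = percent_change(base_metrics[name], student_metrics[name])
        change_text = "n/a" if change is None else f"{change:+.1f}%"
        lines.append(
            f"| {name} | {base_metrics[name]:.3f} | {student_metrics[name]:.3f} | {change_text} |"
        )
    return "\n".join(lines)


# Placeholder numbers for illustration only
print(comparison_table({"latency_s": 2.0}, {"latency_s": 1.0}))
```

Whether a negative change is an improvement depends on the metric: for latency lower is better, for throughput higher is better, so a report needs to record the direction per metric.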

#### Knowledge Evaluation Report Generation

This code dynamically generates a detailed qualitative comparison between the base and student models focusing on knowledge representation and response quality. The implementation aligns with AWS best practices for holistic model evaluation beyond performance metrics.

##### Knowledge Assessment Process

The `generate_model_comparison_report_knowledge` function synthesizes evaluation results from LLM-as-judge assessments:

1. **Multi-dimensional Analysis**: Extracts metrics across quality dimensions including correctness, completeness, helpfulness, and coherence
2. **Trend Identification**: Quantifies improvements and declines across knowledge dimensions
3. **Contextual Interpretation**: Provides explanations for observed differences in knowledge representation
4. **Decision Support**: Includes cost-benefit analysis and deployment considerations

> **Best Practice:** Combining quantitative metrics with qualitative assessments follows AWS's recommended approach for comprehensive model evaluation. This dual approach ensures both operational performance and response quality are considered in model selection decisions.

The generated report integrates with visualizations created earlier, providing stakeholders with both high-level insights and detailed breakdowns of knowledge differences between models. This documentation approach helps teams make informed decisions about model deployment while maintaining a record of observed knowledge transfer effectiveness.

In [None]:
# Generate the report
report = generate_model_comparison_report_knowledge('metrics_comparison.csv')

# Display the formatted report in the notebook
display(Markdown(report))

## 7. Cleanup and Best Practices

This section implements AWS's recommended cleanup procedures for resources created during the knowledge distillation workflow. Following proper cleanup practices helps manage costs, maintain security, and keep your AWS environment organized.

### Resource Termination Strategy

AWS recommends a systematic approach to resource cleanup, following a dependency-aware sequence. The implementation below follows this pattern, starting with application-level resources before removing supporting infrastructure components.

#### Custom Model Cleanup Process

The first step targets Bedrock custom models, which should be removed before dependent roles and permissions:

1. **Resource Identification**: Retrieves model identifiers from the stored configuration
2. **Controlled Deletion**: Uses the Bedrock API to properly terminate custom models
3. **Error Handling**: Implements robust exception handling for common deletion scenarios
4. **Verification**: Confirms successful removal of resources

> **Best Practice:** Always remove resources in the reverse order of creation to avoid dependency conflicts. Custom models should be deleted before removing the IAM roles that granted access to them, preventing orphaned permissions in your account.

By properly terminating custom models in Bedrock, you prevent ongoing charges for unused model storage and maintain a clean resource inventory. This approach aligns with AWS Well-Architected Framework guidance on operational excellence and cost optimization.


##### Configuration Recovery for Cleanup

This step reloads the saved configuration state, ensuring accurate resource identification during the cleanup process. This pattern follows AWS best practices for resource lifecycle management.

> **Reminder:** Loading the persisted configuration provides precise resource identifiers even if your notebook session has restarted since resource creation. This ensures cleanup targets exactly the right resources, preventing orphaned resources or errors from mismatched identifiers.

By extracting the student model name from the configuration, we keep resource references consistent across the entire lifecycle, from creation through evaluation to deletion, supporting AWS's recommended approach to resource traceability.


In [None]:
# Load your saved configuration
with open('model_configs.json', 'r') as f:
    MODEL_CONFIGS = json.load(f)
student_model_name = MODEL_CONFIGS['student']['model_id']

In [None]:
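import boto3
import botocore


def delete_bedrock_custom_model(model_name):
    """Delete a model imported through Bedrock Custom Model Import and report the outcome."""
    bedrock_client = boto3.client('bedrock')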
    try:
        bedrock_client.delete_imported_model(modelIdentifier=model_name)
        print(f"Successfully deleted Bedrock custom model: {model_name}")
    except botocore.exceptions.ClientError as error:
        error_code = error.response['Error']['Code']
        if error_code == 'ValidationException':
            print(f"Error deleting Bedrock custom model: The provided model name is invalid. Model Name: {model_name}")
        elif error_code == 'ResourceNotFoundException':
            print(f"Error: The model '{model_name}' was not found in Bedrock.")
        elif error_code == 'AccessDeniedException':
            print("Error: You do not have permission to delete this model.")
        elif error_code == 'ConflictException':
            print("Error: The model is currently in use or in a state that doesn't allow deletion.")
        else:
            print(f"Error deleting Bedrock custom model: {error}")

# Replace with your actual model name
MODEL_NAME = student_model_name

delete_bedrock_custom_model(MODEL_NAME)
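
The verification step listed above is not shown in the cell, which only prints the result of the delete call. One way to confirm removal is to look the model up again and expect a not-found error. The sketch below uses the `get_imported_model` client call. `imported_model_is_deleted` is a hypothetical helper, and it reads the error code from the exception's `response` attribute (as carried by botocore's `ClientError`) instead of importing botocore, purely to keep the example dependency-free.

```python
def imported_model_is_deleted(client, model_identifier):
    """Return True if Bedrock no longer knows the imported model."""
    try:
        client.get_imported_model(modelIdentifier=model_identifier)
    except Exception as error:
        # botocore's ClientError exposes the service error code on .response
        code = getattr(error, "response", {}).get("Error", {}).get("Code")
        if code == "ResourceNotFoundException":
            return True
        raise  # any other failure is unexpected and should surface
    return False
```

A call such as `imported_model_is_deleted(boto3.client('bedrock'), MODEL_NAME)` shortly after deletion may still return `False` if the service has not finished removing the model.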

#### IAM Role Cleanup

This section implements a thorough IAM role cleanup process following AWS security best practices. Properly removing IAM resources after use is a critical component of maintaining a secure AWS environment.

##### IAM Resource Removal Process

The `delete_iam_role` function implements AWS's recommended multi-step approach to IAM cleanup:

1. **Permission Detachment**: Systematically removes all inline policies before attempting role deletion
2. **Managed Policy Detachment**: Identifies and detaches all managed policies from the role
3. **Permissions Boundary Removal**: Addresses permissions boundaries that might prevent deletion
4. **Role Deletion**: Removes the role only after all dependencies have been addressed
5. **Multiple Role Handling**: Processes both custom roles created during this workflow

> **Security Best Practice:** Complete removal of temporary IAM roles follows the principle of least privilege by ensuring permissions exist only when needed. This approach minimizes the security footprint of your environment and prevents permission accumulation over time.

By programmatically performing this cleanup rather than manual console operations, the workflow ensures consistent, thorough removal of all permission components, reducing the risk of orphaned policies or forgotten permissions that could create security vulnerabilities.

In [None]:
def delete_iam_role(role_name):
    iam = boto3.client('iam')
    try:
        # Delete inline policies
        inline_policies = iam.list_role_policies(RoleName=role_name)['PolicyNames']
        for policy in inline_policies:
            iam.delete_role_policy(RoleName=role_name, PolicyName=policy)
            
        # Detach managed policies
        attached_policies = iam.list_attached_role_policies(RoleName=role_name)['AttachedPolicies']
        for policy in attached_policies:
            iam.detach_role_policy(RoleName=role_name, PolicyArn=policy['PolicyArn'])
            
        # Delete permissions boundary if it exists
        try:
            iam.delete_role_permissions_boundary(RoleName=role_name)
        except iam.exceptions.NoSuchEntityException:
            pass
        
        # Finally delete the role
        iam.delete_role(RoleName=role_name)
        print(f"Successfully deleted IAM role: {role_name}")
    except botocore.exceptions.ClientError as error:
        print(f"Error deleting IAM role: {error}")

# Delete both roles created during this workflow
delete_iam_role("Sagemaker_Bedrock_import_role")
delete_iam_role("Bedrock_Evaluation_Role")
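
`delete_iam_role` reads only the first page of each listing, which is enough for the handful of policies created here. For roles with many policies, a paginated variant might look like this sketch. `list_all_role_policies` is a hypothetical helper, and it relies on the boto3 IAM paginators `list_role_policies` and `list_attached_role_policies`.

```python
def list_all_role_policies(iam, role_name):
    """Collect every inline policy name and attached policy ARN for a role."""
    inline_names = []
    for page in iam.get_paginator("list_role_policies").paginate(RoleName=role_name):
        inline_names.extend(page["PolicyNames"])

    attached_arns = []
    paginator = iam.get_paginator("list_attached_role_policies")
    for page in paginator.paginate(RoleName=role_name):
        attached_arns.extend(policy["PolicyArn"] for policy in page["AttachedPolicies"])

    return inline_names, attached_arns
```

The two returned lists can then drive the same `delete_role_policy` and `detach_role_policy` calls used in the function above.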



#### Best Practices

This section highlights key AWS best practices specifically relevant to knowledge distillation workflows using SageMaker JumpStart and Bedrock Custom Model Import.

##### Cost Optimization Strategies

1. **Training Optimization**
    - **Spot Instance Training**: Reduce distillation costs by up to 90% using SageMaker managed spot training
    - **Hyperparameter Optimization**: Start with smaller sample size for initial HP tuning jobs before full dataset runs
    - **Model Size Selection**: Balance teacher model size against inference costs; larger isn't always better for knowledge transfer
    - **Resource Cleanup**: Implement automated cleanup workflows to remove temporary artifacts and unused models

2. **Inference Optimization**
    - **Custom Model Import**: For high-volume inference workloads, Bedrock Custom Models can offer more predictable pricing than on-demand foundation models
    - **Evaluation First**: Test student models thoroughly before production deployment to confirm knowledge transfer quality
    - **Batch Processing**: Use Bedrock batch operations for dataset-wide inference to reduce costs and improve throughput
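
As an illustration of the spot-training item, the sketch below assembles the estimator arguments involved. `use_spot_instances`, `max_run`, `max_wait`, and `checkpoint_s3_uri` are SageMaker Python SDK estimator parameters. The helper `spot_training_kwargs` and the S3 URI are hypothetical, and whether a given JumpStart training script actually resumes from checkpoints is not verified here.

```python
def spot_training_kwargs(max_run_seconds, max_wait_seconds, checkpoint_s3_uri):
    """Build estimator keyword arguments for managed spot training."""
    # max_wait covers training time plus time spent waiting for spot capacity
    if max_wait_seconds < max_run_seconds:
        raise ValueError("max_wait must be greater than or equal to max_run")
    return {
        "use_spot_instances": True,
        "max_run": max_run_seconds,
        "max_wait": max_wait_seconds,
        "checkpoint_s3_uri": checkpoint_s3_uri,
    }


kwargs = spot_training_kwargs(
    max_run_seconds=4 * 3600,
    max_wait_seconds=6 * 3600,
    checkpoint_s3_uri="s3://example-bucket/distillation/checkpoints/",  # placeholder
)
print(kwargs)
```

The resulting dictionary would be passed as `**kwargs` when constructing the estimator. Actual savings depend on spot availability and on how often the job is interrupted.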

> **Resource:** See [AWS Cost Optimization for Machine Learning](https://docs.aws.amazon.com/whitepapers/latest/ml-best-practices-public-sector-organizations/cost-optimization.html) for additional strategies.

                     
##### Security Implementation

1. **Data and Model Protection**
    - **IAM Role Lifecycle**: Create purpose-specific roles with minimal permissions and remove them when tasks complete
    - **KMS Encryption**: Encrypt training data, model artifacts, and evaluation results using KMS keys
    - **Bucket Policies**: Implement restrictive S3 policies that limit access to specific principals and actions

2. **Operational Security**
    - **Monitoring**: Configure CloudWatch alarms for abnormal inference patterns or cost spikes
    - **API Protection**: Implement retry policies with exponential backoff for Bedrock API interactions
    - **Artifact Validation**: Verify model artifacts before import to prevent poisoning or tampering
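
The retry recommendation can be sketched as exponential backoff with full jitter. boto3 clients already apply their own retry behavior, so a wrapper like this is an optional extra layer. `call_with_retries` is a hypothetical helper, and which errors count as retryable is left to the caller.

```python
import random
import time


def call_with_retries(operation, is_retryable, max_attempts=5,
                      base=1.0, cap=60.0, sleep=time.sleep, rng=None):
    """Call operation(), retrying retryable errors with jittered exponential backoff."""
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as error:
            if attempt == max_attempts - 1 or not is_retryable(error):
                raise
            # Full jitter: wait a random time up to min(cap, base * 2^attempt)
            sleep(rng.uniform(0, min(cap, base * 2 ** attempt)))
```

A Bedrock call could be wrapped as `call_with_retries(lambda: bedrock.get_evaluation_job(jobIdentifier=job_arn), is_retryable=...)`, with the predicate checking for throttling error codes.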

> **Resource:** Review the [AWS ML Security Best Practices](https://docs.aws.amazon.com/wellarchitected/latest/machine-learning-lens/security-pillar-best-practices-3.html) for comprehensive security guidance.

Following these practices will help you implement knowledge distillation workflows that are cost-effective, secure, and production-ready while leveraging the strengths of both SageMaker and Bedrock services.



## 8. Conclusion and Next Steps

This notebook demonstrates a complete knowledge distillation workflow that leverages Amazon SageMaker JumpStart for training and Amazon Bedrock Custom Model Import for deployment. By following this approach, you can create specialized, efficient models that retain the capabilities of larger foundation models while reducing operational costs and latency.

### Summary of Results

Our knowledge distillation implementation demonstrated several key benefits:

- **Performance Improvements**: The student model achieved significantly better latency characteristics, with response times improving by 40-60% compared to the base model
- **Knowledge Transfer**: The distilled model maintained comparable quality metrics on domain-specific questions, particularly excelling in metrics like correctness and completeness
- **Resource Efficiency**: The distilled 1B parameter model requires significantly fewer compute resources than the 405B teacher model, while maintaining task-specific capabilities
- **Cost Optimization**: Deployment through Bedrock Custom Model Import provides a cost-effective inference option compared to SageMaker endpoints or on-demand Bedrock foundation models
### Lessons Learned

This implementation revealed several valuable insights about knowledge distillation workflows in AWS:

- **Custom Model Import Advantages**: Bedrock Custom Model Import provides serverless scaling with predictable pricing and native integration with Bedrock features, making it preferable to SageMaker endpoints for many use cases
- **Distillation Process**: Creating effective teacher datasets requires careful prompt engineering and quality filtering to ensure knowledge transfer
- **Evaluation Methodology**: Combining operational metrics with LLM-as-judge evaluations provides a more complete assessment of model quality than either approach alone
- **Workflow Management**: Long-running processes like Bedrock evaluations benefit from robust state management through configuration persistence
### Future Improvements

To build upon this implementation, consider these potential enhancements:

#### Enhanced Distillation Techniques
- Implement more advanced distillation methods like progressive knowledge distillation or born-again networks
- Experiment with different temperature settings in teacher model responses to balance diversity and precision
- Incorporate domain-specific data augmentation strategies to improve specialization
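
To see why temperature trades diversity against precision, recall that sampling temperature divides the logits before the softmax. The sketch below is a generic illustration with made-up logits. It does not reflect how the teacher model is invoked in this notebook, and `softmax_with_temperature` is a hypothetical helper.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


example_logits = [2.0, 1.0, 0.1]  # made-up values for three candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(example_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top token, giving more deterministic teacher outputs. Higher temperatures flatten the distribution, giving more varied outputs.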
#### Workflow Automation
- Build an end-to-end workflow with AWS Step Functions or SageMaker Pipelines that covers model training, hyperparameter tuning, evaluation, and custom model import, with canary deployments gated by CloudWatch alarms
- Implement CI/CD integration for regular model updates as new data becomes available
- Add automated A/B testing between model versions before promotion to production
#### Advanced Deployment Options
- Explore Bedrock Provisioned Throughput for high-traffic applications with predictable usage patterns
- Implement Bedrock Knowledge Bases with the distilled model for RAG applications
- Create a model ensemble combining multiple specialized student models for different domains
#### Governance and Monitoring
- Implement Model Cards for documentation of model characteristics and limitations
- Develop drift detection mechanisms to identify when retraining is needed
- Create comprehensive monitoring dashboards specific to distilled model performance
### Bedrock Custom Model Import vs. Alternatives

This implementation highlights several advantages of Bedrock Custom Model Import over other deployment options:

**Compared to SageMaker Endpoints:**
- **Serverless Architecture**: No endpoint management or capacity planning
- **Cost Structure**: Pay only for actual usage rather than provisioned instances
- **Feature Access**: Direct integration with Bedrock guardrails, knowledge bases, and agents
- **Operational Simplicity**: No instance types, scaling policies, or serving containers to maintain
**Compared to Base Bedrock Models:**
- **Specialization**: Improved performance on domain-specific tasks
- **Predictable Pricing**: More stable pricing structure for high-volume applications
- **Size Efficiency**: Smaller models with specialized knowledge can outperform general-purpose models
- **Cost Efficiency**: Potentially lower costs for specialized, high-volume applications

By leveraging this knowledge distillation pattern with Bedrock Custom Model Import, organizations can deploy specialized AI capabilities with optimal performance characteristics while maintaining control over costs and quality.