# Amazon Bedrock Model Distillation for Citations - Advanced Implementation Guide

## Learning Objectives

After completing this notebook, you will be able to:
1. Implement advanced model distillation techniques using Amazon Bedrock's APIs
2. Configure and optimize teacher-student model architectures for citation tasks
3. Monitor and evaluate distillation performance metrics
4. Deploy and manage production-grade distilled models
5. Implement best practices for model efficiency and cost optimization

## Introduction

Model distillation transfers knowledge from a large foundation model (the teacher) to a smaller, specialized model (the student), so that the student approaches the teacher's quality on a target task at lower latency and cost. This notebook demonstrates model distillation in Amazon Bedrock for a citation generation use case.

### Technical Overview

The distillation process involves several sophisticated components:

1. Knowledge Transfer Architecture
   - Teacher Model (Nova Premier): Provides high-fidelity outputs and knowledge representation
   - Student Model (Nova Lite): Optimized for efficient inference while maintaining citation accuracy
   - Distillation Job: The managed Bedrock customization job that collects teacher responses and fine-tunes the student on them

2. Training Pipeline
   - Input Processing: JSONL format with specialized citation schema
   - Teacher Inference: Generates high-quality responses with citation metadata
   - Student Training: Optimizes for both performance and efficiency
   - Validation: Ensures citation accuracy and response quality

3. Production Deployment
   - Provisioned Throughput Management
   - Performance Monitoring
   - Cost Optimization
   - High Availability Configuration

This implementation guide focuses on advanced techniques and production considerations for building robust, scalable citation systems using Amazon Bedrock's distillation capabilities.

## Production Considerations

When implementing model distillation in production:

1. Performance Optimization
   - Monitor and tune distillation hyperparameters
   - Implement robust validation pipelines
   - Optimize for both accuracy and latency

2. Resource Management
   - Scale provisioned throughput based on demand
   - Implement cost monitoring and optimization
   - Configure auto-scaling policies

3. Quality Assurance
   - Validate citation accuracy and relevance
   - Monitor model drift and performance degradation
   - Implement automated testing pipelines

### Setup and Prerequisites

Before proceeding, ensure you have:

- An active AWS account with appropriate permissions
- Amazon Bedrock access enabled in your preferred region
- An S3 bucket for storing training data and output
- Training data in JSONL format
- Sufficient service quota to use Provisioned Throughput in Bedrock
- An IAM role with the following permissions:

IAM Policy:
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::YOUR_DISTILLATION_OUTPUT_BUCKET",
                "arn:aws:s3:::YOUR_DISTILLATION_OUTPUT_BUCKET/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:CreateModelCustomizationJob",
                "bedrock:GetModelCustomizationJob",
                "bedrock:ListModelCustomizationJobs",
                "bedrock:StopModelCustomizationJob"
            ],
            "Resource": "arn:aws:bedrock:YOUR_REGION:YOUR_ACCOUNT_ID:model-customization-job/*"
        }
    ]
}
```

Trust Relationship:
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "bedrock.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "YOUR_ACCOUNT_ID"
                },
                "ArnLike": {
                    "aws:SourceArn": "arn:aws:bedrock:YOUR_REGION:YOUR_ACCOUNT_ID:model-customization-job/*"
                }
            }
        }
    ]
}
```
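
The two documents above contain placeholders (`YOUR_REGION`, `YOUR_ACCOUNT_ID`, `YOUR_DISTILLATION_OUTPUT_BUCKET`). As a minimal sketch, the helper below fills them in from Python values so the results can be serialized with `json.dumps` and passed to IAM. The function name `build_distillation_policies` is hypothetical and is not part of this notebook's `utils` module; the statements simply mirror the JSON shown above, and the example values are made up.

```python
import json


def build_distillation_policies(account_id: str, region: str, bucket: str):
    """Return (permissions_policy, trust_policy) dicts mirroring the JSON above."""
    # ARN pattern shared by the permissions policy and the trust condition
    job_arn = f"arn:aws:bedrock:{region}:{account_id}:model-customization-job/*"

    permissions_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            },
            {
                "Effect": "Allow",
                "Action": [
                    "bedrock:CreateModelCustomizationJob",
                    "bedrock:GetModelCustomizationJob",
                    "bedrock:ListModelCustomizationJobs",
                    "bedrock:StopModelCustomizationJob",
                ],
                "Resource": job_arn,
            },
        ],
    }

    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": ["bedrock.amazonaws.com"]},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "ArnLike": {"aws:SourceArn": job_arn},
                },
            }
        ],
    }
    return permissions_policy, trust_policy


# Made-up example values; replace them with your own account, region, and bucket.
permissions, trust = build_distillation_policies(
    "123456789012", "us-east-1", "my-distillation-bucket"
)
print(json.dumps(trust, indent=4))
```

Building the documents as dictionaries avoids hand-editing JSON, which is where stray commas and mismatched ARNs tend to creep in.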

#### Dataset:
For this implementation, we'll use the `Uber10K dataset`, which provides a diverse set of citation examples for model training.

First, let's set up our environment and import required libraries.

In [None]:
# upgrade boto3 
%pip install --upgrade pip --quiet
%pip install boto3 --upgrade --quiet

In [None]:
# restart kernel
from IPython.core.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

# Model Selection and Configuration

Choosing the right teacher and student models is crucial for successful distillation. Consider these key factors:

1. Performance targets
   - Citation accuracy and relevance metrics
   - Response quality and coherence
   - Context understanding and utilization

2. Latency requirements
   - Maximum acceptable inference time
   - Throughput needs (requests per second)
   - Response time consistency requirements

3. Total Cost of Ownership (TCO)
   - Model hosting costs
   - Inference costs per request
   - Training and maintenance costs

In this implementation, we're using:
- Teacher: Amazon Nova Premier (high accuracy, larger model)
  - Optimized for high-quality citation generation
  - Strong context understanding capabilities
  - Robust metadata handling

- Student: Amazon Nova Lite (faster inference, smaller footprint)
  - Tuned for efficient citation processing
  - Optimized for production deployment
  - Balanced performance-cost ratio

For production deployments, conduct thorough benchmarking across multiple model combinations. See the [Amazon Bedrock Model Selection Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/model-selection.html) for detailed guidance.

**Note**: Ensure you're operating in a [supported region](https://docs.aws.amazon.com/bedrock/latest/userguide/regions.html) for your chosen models.

In [None]:
import json
import sys
import os

current_dir = os.getcwd()
parent_dir = os.path.dirname(current_dir)
sys.path.append(parent_dir)

import boto3
from datetime import datetime
from botocore.exceptions import ClientError
from utils import (
    create_s3_bucket,
    upload_training_data_to_s3,
    delete_s3_bucket_and_contents,
    create_model_distillation_role_and_permissions,
    delete_role_and_attached_policies,
    delete_distillation_buckets,
)

# Region used by every client in this notebook
region = 'us-east-1'

# Create Bedrock client (control plane: customization jobs, Provisioned Throughput)
bedrock_client = boto3.client(service_name="bedrock", region_name=region)

# Create runtime client for inference
bedrock_runtime = boto3.client(service_name='bedrock-runtime', region_name=region)

# Session, STS client, and account ID
session = boto3.session.Session(region_name=region)
sts_client = session.client(service_name='sts', region_name=region)
account_id = sts_client.get_caller_identity()['Account']

# Define the bucket to create and upload the dataset to
BUCKET_NAME = '<BUCKET_NAME>'  # Replace with your bucket name
DATA_PREFIX = 'citations_distillation'  # Replace with your own prefix

# Configure teacher and student models
teacher_model = "us.amazon.nova-premier-v1:0"
student_model = "amazon.nova-lite-v1:0:300k"

# Prepare Dataset for Model Distillation

The quality of your training data is crucial for successful model distillation. Let's examine the required format and best practices for data preparation.

### Model Distillation Input Schema

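Training data must follow the Bedrock conversation schema in JSONL format, with one complete training example per line. The template below is pretty-printed for readability; in the actual file each record must occupy a single line: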

```json
{
    "schemaVersion": "bedrock-conversation-2024",
    "system": [
        {
            "text": <Your-System-Prompt>
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "text": <Your-Prompt-And-OR-Context>
                }
            ]
        },
        {
            "role": "assistant",
            "content": [
                {
                    "text": <Your-Ground-Truth-Response>
                }
            ]
        }
    ]
}
```
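
As a rough illustration of the schema above, the sketch below builds one record in Python and serializes it to a single JSONL line. The helper name `make_training_record` and the example prompt, context, and answer are hypothetical and are not taken from the notebook's dataset.

```python
import json


def make_training_record(system_prompt: str, user_text: str, assistant_text: str) -> str:
    """Serialize one training example to a single JSONL line."""
    record = {
        "schemaVersion": "bedrock-conversation-2024",
        "system": [{"text": system_prompt}],
        "messages": [
            {"role": "user", "content": [{"text": user_text}]},
            {"role": "assistant", "content": [{"text": assistant_text}]},
        ],
    }
    # json.dumps escapes embedded newlines, so the record stays on one line
    return json.dumps(record, ensure_ascii=False)


# Illustrative values only
example_line = make_training_record(
    "Answer using only the provided context and cite the passages you use.",
    "Context: [1] Revenue grew in 2021.\nQuestion: Did revenue grow?",
    "Yes, revenue grew in 2021 [1].",
)
print(example_line)
```

Writing each returned string followed by a newline to a file opened with `encoding="utf-8"` yields a dataset in the expected layout.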

### Data Quality Requirements

1. Schema Compliance
   - Valid JSON format per line
   - Required schemaVersion field
   - Complete message structure

2. Content Quality
   - Diverse citation examples
   - Accurate ground truth responses
   - Proper citation metadata

3. Technical Validation
   - UTF-8 encoding
   - No malformed JSON
   - Consistent formatting
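
The checks listed above can be approximated before upload with a small validator. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the rules cover only what this section lists (valid JSON per line, the `schemaVersion` field, and a complete message structure). It does not reproduce the validation that Bedrock itself performs.

```python
import json

EXPECTED_SCHEMA_VERSION = "bedrock-conversation-2024"


def validate_record_line(line: str) -> list:
    """Return a list of problems found in one JSONL line (empty list means OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"malformed JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]

    problems = []
    if record.get("schemaVersion") != EXPECTED_SCHEMA_VERSION:
        problems.append("missing or unexpected schemaVersion")

    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty list")
        return problems

    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"messages[{index}] is not an object")
            continue
        if message.get("role") not in ("user", "assistant"):
            problems.append(f"messages[{index}] has an unexpected role")
        content = message.get("content")
        has_text = (
            isinstance(content, list)
            and len(content) > 0
            and all(isinstance(part, dict) and isinstance(part.get("text"), str) for part in content)
        )
        if not has_text:
            problems.append(f"messages[{index}] needs a content list of text parts")
    return problems


def validate_jsonl(path: str) -> dict:
    """Map 1-based line numbers to problems for every invalid line in a UTF-8 file."""
    report = {}
    # Opening with encoding="utf-8" raises UnicodeDecodeError on badly encoded files
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            problems = validate_record_line(line)
            if problems:
                report[number] = problems
    return report
```

Calling `validate_jsonl('distillation_data.jsonl')` returns an empty dictionary when every line passes these checks; blank lines are reported as malformed JSON.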

In [None]:
# Generate unique names for the job and model
distillation_dataset = 'distillation_data.jsonl'
current_dt = datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
job_name = f"distill-citations-{current_dt}"
model_name = f"distilled-citations-{current_dt}"


In [None]:

# Configure models and IAM role
role_name, role_arn = create_model_distillation_role_and_permissions(bucket_name=BUCKET_NAME, account_id=account_id)

# creating training data bucket
create_s3_bucket(bucket_name=BUCKET_NAME)

# Specify S3 locations
training_data_s3_uri = upload_training_data_to_s3(BUCKET_NAME, distillation_dataset, prefix=DATA_PREFIX)
output_path = f"s3://{BUCKET_NAME}/{DATA_PREFIX}/outputs/"

# Set maximum response length
max_response_length = 1000

# Starting the Distillation Job

With our environment configured and data prepared, we'll initiate the distillation process. This section covers:

1. Job Configuration
   - Model selection and parameters
   - Resource allocation
   - Output settings

2. Performance Optimization
   - Response length tuning
   - Batch size configuration
   - Resource utilization

3. Monitoring Setup
   - Metrics configuration
   - Logging settings
   - Alert thresholds

We'll use the `create_model_customization_job` API with production-optimized settings.
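
Because the IAM role is created shortly before the job is submitted, the first submission attempt may fail until the role has propagated. One way to handle this, sketched below, is a small retry wrapper. The helper name `call_with_retry` is hypothetical, and which exception signals "role not ready yet" is an assumption: the default catches any `Exception`, and in practice you would likely narrow it, for example to botocore's `ClientError`.

```python
import time


def call_with_retry(fn, attempts=5, delay_seconds=10.0, retry_on=(Exception,), sleep=time.sleep):
    """Call fn() up to `attempts` times, sleeping between failed attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            # Re-raise on the final attempt so the caller sees the real error
            if attempt == attempts:
                raise
            sleep(delay_seconds)
```

To use it, wrap the job submission in a zero-argument callable (for instance a `lambda` around `bedrock_client.create_model_customization_job` with the arguments shown in the next cell) and pass `retry_on=(ClientError,)`.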

In [None]:
# Note: a newly created IAM role can take a few seconds to propagate.
# If this call fails right after role creation, wait briefly and rerun this cell.
response = bedrock_client.create_model_customization_job(
    jobName=job_name,
    customModelName=model_name,
    roleArn=role_arn,
    baseModelIdentifier=student_model,
    customizationType="DISTILLATION",
    trainingDataConfig={
        "s3Uri": training_data_s3_uri
    },
    outputDataConfig={
        "s3Uri": output_path
    },
    customizationConfig={
        "distillationConfig": {
            "teacherModelConfig": {
                "teacherModelIdentifier": teacher_model,
                "maxResponseLengthForInference": max_response_length 
            }
        }
    }
)

# Monitoring the Distillation Job

Effective monitoring is crucial for production-grade model distillation. The process involves several phases:

1. Data Processing Phase
   - Input validation and preprocessing
   - Schema compliance checking
   - Data quality metrics

2. Teacher Model Phase
   - Response generation monitoring
   - Quality metrics tracking
   - Error handling and recovery

3. Student Training Phase
   - Loss metrics monitoring
   - Performance optimization
   - Resource utilization tracking

### Key Metrics to Monitor

1. Quality Metrics
   - Citation accuracy
   - Response relevance
   - Context utilization

2. Performance Metrics
   - Training loss
   - Validation scores
   - Inference latency

3. Resource Metrics
   - GPU utilization
   - Memory usage
   - Network throughput

Monitor the job status using `get_model_customization_job` and track detailed metrics in CloudWatch. See [Model Customization Monitoring](https://docs.aws.amazon.com/bedrock/latest/userguide/monitoring-customization.html) for comprehensive monitoring guidance.
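
Since a distillation job can run for a long time, a polling loop is often more convenient than checking the status by hand. The following is a minimal sketch: the helper name `wait_for_customization_job` is hypothetical, and it assumes a client exposing `get_model_customization_job` (such as `bedrock_client` in this notebook) and treats `Completed`, `Failed`, and `Stopped` as terminal statuses.

```python
import time

# Statuses after which the job will not change any further
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_customization_job(client, job_identifier, poll_seconds=60.0, sleep=time.sleep):
    """Poll the job until it reaches a terminal status and return the last response."""
    while True:
        job = client.get_model_customization_job(jobIdentifier=job_identifier)
        if job["status"] in TERMINAL_STATUSES:
            return job
        print(f"status: {job['status']}, checking again in {poll_seconds:.0f}s")
        sleep(poll_seconds)
```

A call such as `wait_for_customization_job(bedrock_client, job_arn)` returns the final job description; check its `status` before deploying, since only a completed job produces an output model.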

In [None]:
# Record the distillation job arn
job_arn = response['jobArn']
print("job arn", job_arn)

# print job status
job_status = bedrock_client.get_model_customization_job(jobIdentifier=job_arn)["status"]
print(job_status)

# Deploying the Distilled Model

Production deployment requires careful consideration of several factors:

1. Infrastructure Configuration
   - Provisioned Throughput (PT) sizing
   - High availability setup
   - Auto-scaling policies

2. Performance Optimization
   - Response caching strategies
   - Load balancing configuration
   - Request routing optimization

3. Monitoring and Management
   - CloudWatch metrics integration
   - Alert configuration
   - Performance dashboards

4. Cost Management
   - Resource utilization tracking
   - Cost allocation monitoring
   - Budget alerts

For detailed deployment guidance, see [Provisioned Throughput](https://docs.aws.amazon.com/bedrock/latest/userguide/provisioned-throughput.html).

In [None]:
# Deploy the distilled model.
# The job must have status "Completed" before outputModelArn is available.
custom_model_id = bedrock_client.get_model_customization_job(jobIdentifier=job_arn)['outputModelArn']
distilled_model_name = model_name

provisioned_model_id = bedrock_client.create_provisioned_model_throughput(
    modelUnits=1,
    provisionedModelName=distilled_model_name,
    # commitmentDuration omitted for no-commitment pricing
    modelId=custom_model_id
)['provisionedModelArn']

In [None]:
provisioned_model_id

Store the provisioned model ARN and the custom model ARN so that later notebooks can reuse them for inference.

In [None]:
%store provisioned_model_id
%store custom_model_id

## Configure Production Inference Endpoint

The Provisioned Throughput endpoint provides dedicated capacity for consistent performance in production environments.
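
As a hedged sketch of how the endpoint might be called once it is ready, the helpers below check the Provisioned Throughput status and send one request through the Converse API. The function names, the default `max_tokens`, and the temperature of 0.0 are assumptions for illustration; the clients are expected to behave like `bedrock_client` and `bedrock_runtime` from the setup cell.

```python
def is_in_service(bedrock_client, provisioned_model_id: str) -> bool:
    """Return True once the Provisioned Throughput reports the InService status."""
    status = bedrock_client.get_provisioned_model_throughput(
        provisionedModelId=provisioned_model_id
    )["status"]
    return status == "InService"


def ask_distilled_model(runtime_client, model_id, system_prompt, user_text, max_tokens=1000):
    """Send a single-turn request and return the concatenated text of the reply."""
    response = runtime_client.converse(
        modelId=model_id,
        system=[{"text": system_prompt}],
        messages=[{"role": "user", "content": [{"text": user_text}]}],
        inferenceConfig={"maxTokens": max_tokens, "temperature": 0.0},
    )
    parts = response["output"]["message"]["content"]
    # Join only the text parts; other content block types are ignored
    return "".join(part.get("text", "") for part in parts)
```

For example, once `is_in_service(bedrock_client, provisioned_model_id)` returns `True`, passing `bedrock_runtime` and `provisioned_model_id` to `ask_distilled_model` together with the same system prompt used in the training data should return the distilled model's answer.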

# Resource Management

Proper cleanup of resources is essential for cost management. Use the following code to remove created resources when they're no longer needed.

In [None]:
# delete bucket and dataset:
# delete_distillation_buckets(BUCKET_NAME)

# delete role and its policy:
# delete_role_and_attached_policies(role_name=role_name)

# delete provisioned throughput:
# response = bedrock_client.delete_provisioned_model_throughput(provisionedModelId=provisioned_model_id)

# Conclusion

This notebook walked through preparing data for, launching, monitoring, and deploying a model distillation job in Amazon Bedrock for a citation generation use case.

## Next Steps

In the next notebook ([03_batch_inference.ipynb](03_batch_inference.ipynb)), we'll explore:
1. Implementing batch inference with the distilled model
2. Evaluating citation accuracy and performance metrics
3. Optimizing throughput and latency
4. Monitoring production workloads

For additional resources:
- [Amazon Bedrock Model Distillation Documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/model-distillation.html)
- [Best Practices for Production Deployments](https://docs.aws.amazon.com/bedrock/latest/userguide/best-practices.html)
- [Advanced Monitoring and Optimization](https://docs.aws.amazon.com/bedrock/latest/userguide/monitoring-customization.html)