# Amazon Bedrock Model Distillation Guide - JSONL training data available in Amazon S3 bucket

## Introduction

Model distillation is a knowledge transfer technique where a smaller 'student' model learns to mimic the behavior of a larger 'teacher' model. In Amazon Bedrock, this process allows you to create more efficient models while maintaining performance by:

1. Having the teacher model (e.g., Nova Premier) generate high-quality responses
2. Training a smaller student model (e.g., Nova Lite) to replicate these responses
3. Optimizing the student model's parameters through supervised learning

This guide demonstrates how to implement model distillation using **JSONL training data in Amazon S3**. You will learn how to:

- Set up and configure distillation jobs
- Prepare and format training data for distillation
- Upload and use training data from S3
- Manage model provisioning and deployment
- Run inference with distilled models

The guide covers essential API operations including:
- Creating and configuring distillation jobs
- Managing training data sources in S3
- Handling model deployments
- Implementing production best practices using boto3 and the Bedrock SDK

While model distillation offers benefits like improved efficiency and reduced costs, this guide focuses on the practical implementation details and API usage patterns needed to successfully execute distillation workflows in Amazon Bedrock.

## Best Practices and Considerations

When using model distillation:
1. Ensure your training data is diverse and representative of your use case
2. Monitor distillation metrics in the S3 output location
3. Evaluate the distilled model's performance against your requirements
4. Consider cost-performance tradeoffs when selecting model units for deployment

The distilled model should provide faster responses and lower costs while maintaining acceptable performance for your specific use case.

### Setup and Prerequisites

Before we begin, make sure you have the following:

- An active AWS account with appropriate permissions
- Amazon Bedrock access enabled in your preferred region
- An S3 bucket for storing training data and output
- Training data in JSONL format
- Sufficient service quota to use Provisioned Throughput in Bedrock
- An IAM role with the following permissions:

IAM Policy:
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::YOUR_DISTILLATION_OUTPUT_BUCKET",
                "arn:aws:s3:::YOUR_DISTILLATION_OUTPUT_BUCKET/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:CreateModelCustomizationJob",
                "bedrock:GetModelCustomizationJob",
                "bedrock:ListModelCustomizationJobs",
                "bedrock:StopModelCustomizationJob"
            ],
            "Resource": "arn:aws:bedrock:YOUR_REGION:YOUR_ACCOUNT_ID:model-customization-job/*"
        }
    ]
}
```

Trust Relationship:
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "bedrock.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "YOUR_ACCOUNT_ID"
                },
                "ArnLike": {
                    "aws:SourceArn": "arn:aws:bedrock:YOUR_REGION:YOUR_ACCOUNT_ID:model-customization-job/*"
                }
            }
        }
    ]
}
```
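The two policy documents above contain placeholders for the bucket, region, and account ID. As a minimal sketch, they could be filled in programmatically before the role is created. The helper names `build_permissions_policy` and `build_trust_policy` are hypothetical and are not part of this notebook's `utils` module; the example account ID is a placeholder.

```python
import json


def build_permissions_policy(bucket_name, region, account_id):
    """Return the permissions policy shown above as a dict (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            },
            {
                "Effect": "Allow",
                "Action": [
                    "bedrock:CreateModelCustomizationJob",
                    "bedrock:GetModelCustomizationJob",
                    "bedrock:ListModelCustomizationJobs",
                    "bedrock:StopModelCustomizationJob",
                ],
                "Resource": (
                    f"arn:aws:bedrock:{region}:{account_id}:model-customization-job/*"
                ),
            },
        ],
    }


def build_trust_policy(region, account_id):
    """Return the trust relationship shown above as a dict (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": ["bedrock.amazonaws.com"]},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "ArnLike": {
                        "aws:SourceArn": (
                            f"arn:aws:bedrock:{region}:{account_id}:model-customization-job/*"
                        )
                    },
                },
            }
        ],
    }


# Placeholder values; replace with your own region and account ID.
trust_json = json.dumps(build_trust_policy("us-east-1", "123456789012"), indent=4)
print(trust_json)
```

Building the documents from a function keeps the bucket, region, and account ID in one place, which avoids hand-editing mistakes in the JSON.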

#### Dataset:
As an example, this notebook uses the `Uber10K` dataset.

First, let's set up our environment and import required libraries.

In [1]:
# upgrade boto3 
%pip install --upgrade pip --quiet
%pip install boto3 --upgrade --quiet

Note: you may need to restart the kernel to use updated packages.


In [2]:
# restart kernel
from IPython.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

# Model Selection and Configuration

Choosing the right teacher and student models is crucial for successful distillation. Consider these key factors:

1. Performance targets
   - Accuracy requirements for your specific use case
   - Acceptable trade-offs between performance and efficiency
   - Quality metrics specific to your task (e.g., citation accuracy)

2. Latency requirements
   - Maximum acceptable inference time
   - Throughput needs (requests per second)
   - Response time consistency requirements

3. Total Cost of Ownership (TCO)
   - Model hosting costs
   - Inference costs per request
   - Training and maintenance costs

In this example, we're using:
- Teacher: Amazon Nova Premier (high accuracy, larger model)
- Student: Amazon Nova Lite (faster inference, smaller footprint)

For production use cases, evaluate multiple model combinations and run thorough benchmarks. See the [Amazon Bedrock Model Selection Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/model-selection.html) for detailed guidance.

**Note**: Run this code sample in a [supporting region](https://docs.aws.amazon.com/bedrock/latest/userguide/regions.html) for your chosen models.

In [2]:
import json
import sys
import os

current_dir = os.getcwd()
parent_dir = os.path.dirname(current_dir)
sys.path.append(parent_dir)

import boto3
from datetime import datetime
from botocore.exceptions import ClientError
from utils import (
    create_s3_bucket,
    upload_training_data_to_s3,
    delete_s3_bucket_and_contents,
    create_model_distillation_role_and_permissions,
    delete_role_and_attached_policies,
    delete_distillation_buckets,
)

# Create Bedrock client
bedrock_client = boto3.client(service_name="bedrock",region_name='us-east-1')

# Create runtime client for inference
bedrock_runtime = boto3.client(service_name='bedrock-runtime',region_name='us-east-1')

# Region and accountID
session = boto3.session.Session(region_name='us-east-1')
region =  'us-east-1' # session.region_name
sts_client = session.client(service_name='sts',region_name='us-east-1')
account_id = sts_client.get_caller_identity()['Account']

# define bucket you want to create and upload the dataset to:
bucket_name='905418197933-distillation' # Replace by your bucket name
data_prefix = 'citations_distillation' # Replace by your defined prefix

# configure teacher and student models
teacher_model = "us.amazon.nova-premier-v1:0"
student_model = "amazon.nova-lite-v1:0:300k"

# Prepare Dataset for Model Distillation

Before we start the distillation process, we need to prepare our dataset. We'll create a function to convert our input data into the format required by Amazon Bedrock.

#### Model Distillation Input Format

The training data must follow the Bedrock conversation schema in JSONL format. Each line should be a valid JSON object with this structure:

```json
{
    "schemaVersion": "bedrock-conversation-2024",
    "system": [
        {
            "text": <Your-System-Prompt>
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "text": <Your-Prompt-And-OR-Context>
                }
            ]
        },
        {
            "role": "assistant",
            "content": [
                {
                    "text": <Your-Ground-Truth-Response>
                }
            ]
        }
    ]
}
```

Key formatting requirements:
- Each line must be a complete JSON object
- The schemaVersion field must be specified as `bedrock-conversation-2024`
- System instructions should be included in the system array
- Messages (including any context) must include both user and assistant roles in the correct order
- All text content must be wrapped in the appropriate content structure
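The requirements above can be sketched as a small conversion helper. This is only an illustration under stated assumptions: the function names `to_bedrock_conversation`, `records_to_jsonl`, and `write_jsonl` are hypothetical, and the sample prompt and response are made up rather than taken from the dataset used in this notebook.

```python
import json


def to_bedrock_conversation(system_prompt, user_text, assistant_text):
    """Wrap one example in the bedrock-conversation-2024 structure (hypothetical helper)."""
    return {
        "schemaVersion": "bedrock-conversation-2024",
        "system": [{"text": system_prompt}],
        "messages": [
            # User turn first, then the assistant turn, as required above.
            {"role": "user", "content": [{"text": user_text}]},
            {"role": "assistant", "content": [{"text": assistant_text}]},
        ],
    }


def records_to_jsonl(records):
    """Serialize records as JSONL: exactly one complete JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def write_jsonl(records, path):
    """Write the JSONL text to a local file that can then be uploaded to S3."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(records_to_jsonl(records))


# Made-up sample; real records would come from your own dataset.
sample = to_bedrock_conversation(
    "You answer questions using only the provided context.",
    "Context: ...\nQuestion: What was reported?",
    "Placeholder ground-truth answer.",
)
print(records_to_jsonl([sample]), end="")
```

`json.dumps` escapes embedded newlines inside strings, so each record stays on a single physical line even when the prompt text spans several lines.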

In [None]:
# Generate unique names for the job and model
distillation_dataset = 'distillation_data.jsonl'
current_dt = datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
job_name = f"distill-citations-{current_dt}"
model_name = f"distilled-citations-{current_dt}"



Creating IAM role...
Creating IAM policy...
Attaching policy to role...
Successfully created role and policy!
Successfully created bucket '905418197933-distillation' in region 'us-east-1'
Bucket ARN: arn:aws:s3:::905418197933-distillation
Uploading distillation_data.jsonl to bucket 905418197933-distillation with prefix citations_distillation...
Successfully uploaded distillation_data.jsonl to S3 bucket!
File S3 URI: s3://905418197933-distillation/citations_distillation/distillation_data.jsonl


In [None]:

# Configure models and IAM role
role_name, role_arn = create_model_distillation_role_and_permissions(bucket_name=bucket_name, account_id=account_id)

# creating training data bucket
create_s3_bucket(bucket_name=bucket_name)

# Specify S3 locations
training_data_s3_uri = upload_training_data_to_s3(bucket_name, distillation_dataset, prefix=data_prefix)
# training_data = "s3://sample-data-us-east-1-228707323172-1/citations_distillation/distillation_data.jsonl"
output_path = f"s3://{bucket_name}/{data_prefix}/outputs/"

# Set maximum response length
max_response_length = 1000

# Starting the Distillation Job

With our dataset prepared, we can now start the distillation job. We'll use the `create_model_customization_job` API to do this.

In [6]:
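# Note: a newly created IAM role can take a few seconds to propagate.
# If this call fails because the role cannot be assumed yet, wait briefly and retry.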
response = bedrock_client.create_model_customization_job(
    jobName=job_name,
    customModelName=model_name,
    roleArn=role_arn,
    baseModelIdentifier=student_model,
    customizationType="DISTILLATION",
    trainingDataConfig={
        "s3Uri": training_data_s3_uri
    },
    outputDataConfig={
        "s3Uri": output_path
    },
    customizationConfig={
        "distillationConfig": {
            "teacherModelConfig": {
                "teacherModelIdentifier": teacher_model,
                "maxResponseLengthForInference": max_response_length 
            }
        }
    }
)
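A newly created IAM role can take a short while to become usable, so the job creation call may fail if it runs right after the role is created. As a minimal sketch, the call could be wrapped in a small retry helper. `call_with_retry` and `should_retry` are hypothetical names introduced here, and which exception or error code indicates a not-yet-usable role is an assumption you would need to confirm in your own account.

```python
import time


def call_with_retry(fn, should_retry, max_attempts=5, delay_seconds=10, sleep=time.sleep):
    """Call fn(); on a retryable exception, wait and try again (hypothetical helper).

    should_retry(exc) decides whether an exception is worth retrying.
    The last attempt always re-raises so failures are never swallowed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_attempts or not should_retry(exc):
                raise
            sleep(delay_seconds)


# Hypothetical usage with this notebook's variables (not executed here):
# response = call_with_retry(
#     lambda: bedrock_client.create_model_customization_job(**job_kwargs),
#     should_retry=lambda exc: isinstance(exc, ClientError),
# )
# where job_kwargs is a dict holding the same arguments as the cell above.
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting; in the notebook the default `time.sleep` is sufficient.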

# Monitoring the Distillation Job

After starting the distillation job, it's crucial to monitor both its progress and quality metrics. The distillation process involves several phases:

1. Data Preparation
   - Loading and validating training data
   - Preprocessing examples for teacher model inference

2. Teacher Model Inference
   - Generating high-quality responses for training examples
   - Validating response quality and format

3. Student Model Training
   - Fine-tuning the student model on teacher outputs
   - Optimizing for performance and efficiency

We'll use the `get_model_customization_job` API to track progress and access metrics. Key status values:
- `InProgress`: Job is actively running
- `Completed`: Distillation finished successfully
- `Failed`: Job encountered errors (check error messages)
- `Stopping`: A stop request is being processed
- `Stopped`: Job was manually terminated

Monitor the S3 output location for detailed logs and metrics. See [Model Customization Monitoring](https://docs.aws.amazon.com/bedrock/latest/userguide/monitoring-customization.html) for more details.
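Since a distillation job can run for a long time, checking the status once is rarely enough. The following is a minimal polling sketch built on `get_model_customization_job`. The function name `wait_for_customization_job`, the polling interval, and the `max_polls` limit are assumptions chosen for illustration; the client is passed in as an argument so the helper works with the `bedrock_client` created earlier.

```python
import time

# Statuses after which the job will not change any further.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_customization_job(client, job_arn, poll_seconds=60, max_polls=None,
                               sleep=time.sleep):
    """Poll the job until it reaches a terminal status and return the last response.

    Raises TimeoutError if max_polls is set and the job is still running
    after that many status checks.
    """
    polls = 0
    while True:
        job = client.get_model_customization_job(jobIdentifier=job_arn)
        status = job["status"]
        print(f"Job status: {status}")
        if status in TERMINAL_STATUSES:
            return job
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"Job still '{status}' after {polls} status checks")
        sleep(poll_seconds)


# Hypothetical usage with this notebook's variables (not executed here):
# job = wait_for_customization_job(bedrock_client, job_arn, poll_seconds=300)
# if job["status"] == "Failed":
#     print(job.get("failureMessage"))
```

A long polling interval is usually reasonable here, because the job status changes infrequently and each check is an API call.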

In [7]:
# Record the distillation job arn
job_arn = response['jobArn']

# print job status
job_status = bedrock_client.get_model_customization_job(jobIdentifier=job_arn)["status"]
print(job_status)

InProgress


In [6]:
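# Optional: when resuming in a new session, point to an existing job and model (replace with your own values)
job_arn = "arn:aws:bedrock:us-east-1:905418197933:model-customization-job/amazon.nova-lite-v1:0:300k/68vwpdrxdgrm"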
model_name = "distilled-citations-2025-06-09-15-31-33"

# Deploying the Distilled Model

Once the distillation job completes successfully, we can deploy our optimized model. Deployment involves creating a Provisioned Throughput (PT) model instance, which provides:

1. Dedicated Capacity
   - Consistent performance with dedicated resources
   - Predictable latency for production workloads
   - Ability to scale based on demand

2. Cost Management
   - Pay only for provisioned capacity
   - Option for cost savings with longer commitments
   - Ability to adjust capacity as needed

3. Monitoring & Management
   - CloudWatch metrics for performance tracking
   - Auto-scaling capabilities (if configured)
   - Health checks and automated recovery

For production deployments, consider:
- Setting up monitoring and alerts
- Implementing retry logic and fallbacks
- Regular performance evaluation

See [Provisioned Throughput](https://docs.aws.amazon.com/bedrock/latest/userguide/provisioned-throughput.html) for deployment best practices.

In [7]:
# Deploy the distilled model
custom_model_id = bedrock_client.get_model_customization_job(jobIdentifier=job_arn)['outputModelArn']
distilled_model_name = model_name

provisioned_model_id = bedrock_client.create_provisioned_model_throughput(
    modelUnits=1,
    provisionedModelName=distilled_model_name,
    # commitmentDuration # omitted for no-commit
    modelId=custom_model_id 
)['provisionedModelArn']

We need to store the Provisioned Throughput ARN and the custom model ARN for use in the invoke calls in the subsequent notebook.

In [8]:
%store provisioned_model_id
%store custom_model_id

Stored 'provisioned_model_id' (str)
Stored 'custom_model_id' (str)
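Although inference is covered in the subsequent notebook, a short sketch shows how the stored Provisioned Throughput ARN might be used with the Converse API of the `bedrock-runtime` client. The helper name `ask_distilled_model`, the default token limit, and the temperature are assumptions for illustration, and the provisioned model must have finished provisioning before it can serve requests.

```python
def ask_distilled_model(runtime_client, model_arn, system_prompt, user_text,
                        max_tokens=1000):
    """Send one user message to the model and return the text of the reply.

    runtime_client is expected to be a boto3 'bedrock-runtime' client and
    model_arn the Provisioned Throughput ARN (hypothetical helper).
    """
    response = runtime_client.converse(
        modelId=model_arn,
        system=[{"text": system_prompt}],
        messages=[{"role": "user", "content": [{"text": user_text}]}],
        inferenceConfig={"maxTokens": max_tokens, "temperature": 0.0},
    )
    # The reply is a message whose content is a list of blocks; take the first text block.
    return response["output"]["message"]["content"][0]["text"]


# Hypothetical usage with this notebook's variables (not executed here):
# answer = ask_distilled_model(
#     bedrock_runtime,
#     provisioned_model_id,
#     "You answer questions using only the provided context.",
#     "Context: ...\nQuestion: ...",
# )
```

Using the same system prompt at inference time as in the training data keeps the distilled model's inputs consistent with what it was trained on.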


## Purchase a PT endpoint to set up inferencing for evaluation

# Clean Up
Let's delete the resources that were created in this notebook. Uncomment the code below to delete them.

In [None]:
# delete bucket and dataset:
# delete_distillation_buckets(bucket_name)

# delete role and its policy:
# delete_role_and_attached_policies(role_name=role_name)

# delete provisioned throughput:
# response = bedrock_client.delete_provisioned_model_throughput(provisionedModelId=provisioned_model_id)

# Conclusion

In this guide, we've walked through the entire process of model distillation using Amazon Bedrock. We covered:

1. Setting up the environment
2. Preparing the dataset
3. Configuring and starting a distillation job
4. Monitoring the job's progress
5. Deploying the distilled model
6. Cleaning up resources

Model distillation is a powerful technique that can help you create more efficient models tailored to your specific use case. By following this guide, you should now be able to implement model distillation in your own projects using Amazon Bedrock.

Remember to always consider your specific use case requirements when selecting models and configuring the distillation process. 

**Happy distilling!**