# Supervised Fine-Tuning (SFT) of Amazon Nova with Parameter-Efficient Fine-Tuning (PEFT, LoRA) for Text-to-SQL Generation



This notebook demonstrates Supervised Fine-Tuning (SFT) with Parameter-Efficient Fine-Tuning (PEFT) of Amazon Nova Micro for text-to-SQL generation using an Amazon SageMaker training job. SFT fine-tunes a language model on a specific task using labeled examples, while PEFT makes fine-tuning efficient by updating only a small subset of the model's parameters.

## Overview

This notebook walks through the complete workflow for fine-tuning Amazon Nova Micro for text-to-SQL generation, from data preparation to model evaluation:

- **Data Preparation**: SQL dataset converted to the bedrock-conversation-2024 schema
- **Training Method**: SFT with LoRA in a SageMaker training job, using Nova-specific recipes and configurations
- **Use Case**: Text-to-SQL generation with comprehensive evaluation and inference pipeline
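
To make "updating only a small subset of the model's parameters" concrete: LoRA, the PEFT method used here, freezes each adapted weight matrix and learns a low-rank update instead. The sketch below counts trainable parameters for a single matrix under that scheme. The 4096 x 4096 shape and rank 16 are hypothetical values chosen for illustration, not Nova Micro's (unpublished) architecture or the recipe's actual LoRA settings.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> dict:
    """Trainable parameters for one weight matrix: full fine-tuning vs. LoRA.

    LoRA freezes the pretrained weight W (d_out x d_in) and trains two
    low-rank factors, B (d_out x rank) and A (rank x d_in); the layer then
    uses W + B @ A in place of W.
    """
    full = d_out * d_in            # every entry of W is updated
    lora = rank * (d_out + d_in)   # only the entries of B and A are updated
    return {"full": full, "lora": lora, "fraction": lora / full}


# Hypothetical layer shape and rank for illustration only; these are not
# Nova Micro's dimensions or the recipe's LoRA settings.
counts = lora_param_counts(d_out=4096, d_in=4096, rank=16)
print(f"full: {counts['full']:,}  LoRA: {counts['lora']:,}  fraction: {counts['fraction']:.3%}")
```

With these illustrative numbers, LoRA trains 131,072 of the matrix's 16,777,216 parameters (about 0.78%).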


## Installing Dependencies

In [49]:
!pip install sagemaker datasets pandas scikit-learn --upgrade --quiet

## Setup and Prerequisites

In [76]:
import sagemaker
import boto3
import json
import pandas as pd
from datasets import load_dataset, Dataset
from sklearn.model_selection import train_test_split
import time
import utils

sess = sagemaker.Session()
sagemaker_session_bucket = None

if sagemaker_session_bucket is None and sess is not None:
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)
bucket_name = sess.default_bucket()
default_prefix = sess.default_bucket_prefix
region = "us-east-1"  # At the time of writing, us-east-1 is the only Region that supports Nova model customization


print(f'sagemaker role arn: {role}')
print(f'sagemaker bucket: {bucket_name}')
print(f'sagemaker bucket prefix: {default_prefix}')
print(f'sagemaker session region: {region}')



sagemaker role arn: arn:aws:iam::133856113780:role/service-role/AmazonSageMaker-ExecutionRole-20250805T200090
sagemaker bucket: sagemaker-us-east-1-133856113780
sagemaker bucket prefix: None
sagemaker session region: us-east-1


In [77]:
# S3 prefix for training data
training_input_path = f's3://{sess.default_bucket()}/datasets/nova-sql-context'

## 1. Data Preparation

We'll use the [sql-create-context](https://huggingface.co/datasets/b-mc2/sql-create-context) dataset and format it according to the bedrock-conversation-2024 schema that Nova expects.

### Step 1a: Load and Explore the Dataset

In [78]:
# Load the SQL dataset
dataset = load_dataset('b-mc2/sql-create-context')
print(f'Dataset size: {len(dataset["train"])}')
print('Sample record:')
print(dataset['train'][19])

Dataset size: 78577
Sample record:
{'answer': 'SELECT Theme FROM farm_competition ORDER BY YEAR', 'question': 'What are the themes of farm competitions sorted by year in ascending order?', 'context': 'CREATE TABLE farm_competition (Theme VARCHAR, YEAR VARCHAR)'}


### Step 1b: Convert to Bedrock Conversation Format

Nova expects data in the bedrock-conversation-2024 format:

```json
{
  "schemaVersion": "bedrock-conversation-2024",
  "system": [
    {
      "text": "System prompt content"
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "User question"
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "text": "Assistant response"
        }
      ]
    }
  ]
}
```
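
To catch formatting mistakes before launching a training job, records can be checked against the structure shown above. The following is a minimal sketch that checks only what this example shows; requiring roles to alternate starting with `user` and ending with `assistant` is an assumption based on this single-turn example, not a complete statement of Nova's data requirements. The function name `validate_conversation_record` is hypothetical.

```python
def validate_conversation_record(record: dict) -> list:
    """Return a list of problems found in one bedrock-conversation-2024 record.

    Checks only the structure shown above. The role-alternation and
    final-assistant checks are assumptions based on this single-turn example.
    """
    problems = []
    if record.get("schemaVersion") != "bedrock-conversation-2024":
        problems.append("schemaVersion must be 'bedrock-conversation-2024'")
    for block in record.get("system", []):
        if not isinstance(block.get("text"), str):
            problems.append("every system block needs a 'text' string")
    messages = record.get("messages", [])
    if not messages:
        problems.append("messages must not be empty")
    for i, message in enumerate(messages):
        expected_role = "user" if i % 2 == 0 else "assistant"  # assumed alternation
        if message.get("role") != expected_role:
            problems.append(f"messages[{i}].role should be '{expected_role}'")
        content = message.get("content", [])
        if not content or not all(isinstance(c.get("text"), str) for c in content):
            problems.append(f"messages[{i}].content needs at least one text block")
    if messages and messages[-1].get("role") != "assistant":
        problems.append("the last message should be the assistant's target answer")
    return problems


example = {
    "schemaVersion": "bedrock-conversation-2024",
    "system": [{"text": "System prompt content"}],
    "messages": [
        {"role": "user", "content": [{"text": "User question"}]},
        {"role": "assistant", "content": [{"text": "Assistant response"}]},
    ],
}
print(validate_conversation_record(example))  # -> []
```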

In [79]:
# Convert each record to the bedrock-conversation-2024 format required for fine-tuning (helper from the utils module)
from utils import create_bedrock_conversation


sample_converted = create_bedrock_conversation(dataset['train'][0])
print('Sample converted record:')
print(json.dumps(sample_converted, indent=2))

Sample converted record:
{
  "schemaVersion": "bedrock-conversation-2024",
  "system": [
    {
      "text": "You are a powerful text-to-SQL model. Your job is to answer questions about a database. You can use the following table schema for context: CREATE TABLE head (age INTEGER)"
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "Return the SQL query that answers the following question: How many heads of the departments are older than 56 ?"
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "text": "SELECT COUNT(*) FROM head WHERE age > 56"
        }
      ]
    }
  ]
}
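
`create_bedrock_conversation` lives in the accompanying `utils` module, which isn't shown here. Judging only from the converted sample printed above, a function with the same observable behavior could look like the sketch below; the name `to_bedrock_conversation` is hypothetical and the prompt templates are reconstructed from that output, so the real helper may differ.

```python
# Prompt templates reconstructed from the sample output above (assumption).
SYSTEM_TEMPLATE = (
    "You are a powerful text-to-SQL model. Your job is to answer questions "
    "about a database. You can use the following table schema for context: {context}"
)
USER_TEMPLATE = "Return the SQL query that answers the following question: {question}"


def to_bedrock_conversation(record: dict) -> dict:
    """Hypothetical equivalent of utils.create_bedrock_conversation.

    Maps one sql-create-context row (question/context/answer) to a
    bedrock-conversation-2024 record: schema in the system prompt, question
    as the user turn, reference SQL as the assistant turn.
    """
    return {
        "schemaVersion": "bedrock-conversation-2024",
        "system": [{"text": SYSTEM_TEMPLATE.format(context=record["context"])}],
        "messages": [
            {"role": "user",
             "content": [{"text": USER_TEMPLATE.format(question=record["question"])}]},
            {"role": "assistant", "content": [{"text": record["answer"]}]},
        ],
    }


row = {
    "answer": "SELECT COUNT(*) FROM head WHERE age > 56",
    "question": "How many heads of the departments are older than 56 ?",
    "context": "CREATE TABLE head (age INTEGER)",
}
converted = to_bedrock_conversation(row)
```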


### Step 1c: Create Train/Test Split and Convert Dataset

In [80]:
MAX_TRAINING_SAMPLES = 200
total_samples = min(len(dataset['train']), MAX_TRAINING_SAMPLES + 2000)

print(f'Original dataset size: {len(dataset["train"])}')
print(f'Using {total_samples} samples (to get ~{MAX_TRAINING_SAMPLES} training samples)')

# Convert limited dataset
converted_data = []
for i, record in enumerate(dataset['train']):
    if i >= total_samples:
        break
    converted_data.append(create_bedrock_conversation(record))

print(f'Converted {len(converted_data)} records')

# Create train/test split
train_data, test_data = train_test_split(converted_data, test_size=0.1, random_state=42)

# Cap the training set at MAX_TRAINING_SAMPLES
if len(train_data) > MAX_TRAINING_SAMPLES:
    train_data = train_data[:MAX_TRAINING_SAMPLES]
    print(f'Trimmed training data to {MAX_TRAINING_SAMPLES} samples')

print(f'Final training samples: {len(train_data)}')
print(f'Final test samples: {len(test_data)}')

Original dataset size: 78577
Using 2200 samples (to get ~200 training samples)
Converted 2200 records
Trimmed training data to 200 samples
Final training samples: 200
Final test samples: 220


### Step 1d: Save Data in JSONL Format and upload to S3

In [81]:
import os

# Create data directory
os.makedirs('data', exist_ok=True)

# Save training data
with open('data/train_dataset.jsonl', 'w', encoding='utf-8') as f:
    for record in train_data:
        f.write(json.dumps(record, separators=(',', ':')) + '\n')

# Save test data
with open('data/test_dataset.jsonl', 'w', encoding='utf-8') as f:
    for record in test_data:
        f.write(json.dumps(record, separators=(',', ':')) + '\n')

print('Datasets saved successfully!')
print(f'Training file: data/train_dataset.jsonl ({len(train_data)} records)')
print(f'Test file: data/test_dataset.jsonl ({len(test_data)} records)')


Datasets saved successfully!
Training file: data/train_dataset.jsonl (200 records)
Test file: data/test_dataset.jsonl (220 records)
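
Before uploading, it can be worth re-reading the JSONL files to confirm that every line parses on its own. A minimal sketch (the helper name `check_jsonl` is hypothetical):

```python
import json
from collections import Counter


def check_jsonl(path: str) -> Counter:
    """Parse every line of a JSONL file and count records per schemaVersion.

    Raises ValueError naming the file and line number for blank lines or
    invalid JSON (blank lines are treated as errors to be safe).
    """
    versions = Counter()
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                raise ValueError(f"{path}:{line_no}: blank line")
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({err})") from err
            versions[record.get("schemaVersion")] += 1
    return versions
```

For the files written above, `check_jsonl('data/train_dataset.jsonl')` should return `Counter({'bedrock-conversation-2024': 200})`, and the test file should report 220 records.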


In [82]:
# Upload datasets to S3
train_s3_path = sagemaker.s3.S3Uploader.upload('data/train_dataset.jsonl', training_input_path)
test_s3_path = sagemaker.s3.S3Uploader.upload('data/test_dataset.jsonl', training_input_path)

print('Training data uploaded to:', train_s3_path)
print('Test data uploaded to:', test_s3_path)

Training data uploaded to: s3://sagemaker-us-east-1-133856113780/datasets/nova-sql-context/train_dataset.jsonl
Test data uploaded to: s3://sagemaker-us-east-1-133856113780/datasets/nova-sql-context/test_dataset.jsonl


## Nova Micro Fine-Tuning Setup 

In [83]:

from datasets import Dataset, DatasetDict
from utils import prepare_dataset_for_nova

# Convert to datasets format
train_df = pd.DataFrame(train_data)
test_df = pd.DataFrame(test_data)


train_dataset = Dataset.from_pandas(train_df)
test_dataset = Dataset.from_pandas(test_df)

dataset_dict = DatasetDict({
    'train': train_dataset,
    'test': test_dataset
})

# Apply Nova preparation; drop the original columns (columns the function returns under the same names are kept)
train_dataset_nova = dataset_dict['train'].map(
    prepare_dataset_for_nova,
    remove_columns=train_dataset.column_names
)

test_dataset_nova = dataset_dict['test'].map(
    prepare_dataset_for_nova,
    remove_columns=test_dataset.column_names
)

print(f'Prepared {len(train_dataset_nova)} training samples')
print(f'Prepared {len(test_dataset_nova)} test samples')
print('\nSample prepared record:')
print(train_dataset_nova[0])


Prepared 200 training samples
Prepared 220 test samples

Sample prepared record:
{'system': [{'text': 'You are a powerful text-to-SQL model. Your job is to answer questions about a database. You can use the following table schema for context: CREATE TABLE editor (Name VARCHAR, Age VARCHAR)'}], 'messages': [{'content': [{'text': 'Return the SQL query that answers the following question: What is the name of the youngest editor?'}], 'role': 'user'}, {'content': [{'text': 'SELECT Name FROM editor ORDER BY Age LIMIT 1'}], 'role': 'assistant'}]}


## 2. Fine-Tuning the Model

In this step we fine-tune Nova Micro with supervised fine-tuning using LoRA, a Parameter-Efficient Fine-Tuning (PEFT) technique, launched through a SageMaker PyTorch estimator. We also use a [Nova recipe](https://docs.aws.amazon.com/sagemaker/latest/dg/nova-model-recipes.html): a YAML configuration file that tells SageMaker AI how to run the model customization job. It defines the optimization settings and any additional options required to fine-tune or train the model successfully. Everything is packaged to run inside a SageMaker training job.

In [84]:
first_row_dataset = train_dataset.select(range(1))  # Select only the first row (index 0)
first_row_dataset

Dataset({
    features: ['schemaVersion', 'system', 'messages'],
    num_rows: 1
})

In [86]:
# Nova configuration
model_id = "nova-micro/prod"
recipe = "fine-tuning/nova/nova_micro_g5_g6_48x_gpu_lora_sft"
instance_type = "ml.g5.48xlarge" 
instance_count = 1 

# Nova-specific image URI
image_uri = f"708977205387.dkr.ecr.{sess.boto_region_name}.amazonaws.com/nova-fine-tune-repo:SM-TJ-SFT-latest"

print(f'Model ID: {model_id}')
print(f'Recipe: {recipe}')
print(f'Instance type: {instance_type}')
print(f'Instance count: {instance_count}')
print(f'Image URI: {image_uri}')

Model ID: nova-micro/prod
Recipe: fine-tuning/nova/nova_micro_g5_g6_48x_gpu_lora_sft
Instance type: ml.g5.48xlarge
Instance count: 1
Image URI: 708977205387.dkr.ecr.us-east-1.amazonaws.com/nova-fine-tune-repo:SM-TJ-SFT-latest


### Step 2a: Create PyTorch estimator

In [87]:
from sagemaker.pytorch import PyTorch

# Define Training Job Name
job_name = f"train-{model_id.split('/')[0].replace('.', '-')}-sql-peft-sft"

# Define OutputDataConfig path
if default_prefix:
    output_path = f"s3://{bucket_name}/{default_prefix}/{job_name}"
else:
    output_path = f"s3://{bucket_name}/{job_name}"

# Recipe overrides
recipe_overrides = {
    "run": {
        "replicas": instance_count,
    },
}

# Create PyTorch estimator
estimator = PyTorch(
    output_path=output_path,
    base_job_name=job_name,
    role=role,
    disable_output_compression=True,
    disable_profiler=True,
    debugger_hook_config=False,
    instance_count=instance_count,
    instance_type=instance_type,
    training_recipe=recipe,
    recipe_overrides=recipe_overrides,
    max_run=432000,  # maximum runtime in seconds (5 days)
    sagemaker_session=sess,
    image_uri=image_uri
)

print(f'Training job name: {job_name}')
print(f'Output path: {output_path}')
print('PyTorch estimator created successfully!')

Cloning into '/tmp/launcher_j4lg7sdf'...
INFO:sagemaker:Remote debugging, profiler and debugger hooks are disabled for Nova recipes.


Training job name: train-nova-micro-sql-peft-sft
Output path: s3://sagemaker-us-east-1-133856113780/train-nova-micro-sql-peft-sft
PyTorch estimator created successfully!
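
The `recipe_overrides` dictionary in the cell above only changes `run.replicas`; every other setting comes from the recipe YAML. The sketch below illustrates those override semantics with a plain recursive dictionary merge. It is not the SageMaker Python SDK's implementation, and the recipe fragment is hypothetical: apart from `run.replicas`, its keys are placeholders.

```python
import copy


def apply_overrides(recipe: dict, overrides: dict) -> dict:
    """Return a copy of `recipe` with `overrides` merged in recursively.

    Nested dicts are merged key by key; any other override value replaces
    the recipe value outright. Keys not mentioned in `overrides` are kept.
    """
    merged = copy.deepcopy(recipe)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical recipe fragment; only run.replicas mirrors the notebook.
base_recipe = {"run": {"name": "example-run", "replicas": 4},
               "other_section": {"some_setting": 2}}
print(apply_overrides(base_recipe, {"run": {"replicas": 1}}))
# -> {'run': {'name': 'example-run', 'replicas': 1}, 'other_section': {'some_setting': 2}}
```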


In [88]:
# Configure Data Channels
from sagemaker.inputs import TrainingInput

train_input = TrainingInput(
    s3_data=train_s3_path,
    distribution="FullyReplicated",
    s3_data_type="Converse",  # Important: Nova uses "Converse" data type
)

val_input = TrainingInput(
    s3_data=test_s3_path,
    distribution="FullyReplicated",
    s3_data_type="Converse",
)

print('Data channels configured:')
print(f'Training data: {train_s3_path}')
print(f'Validation data: {test_s3_path}')
print('Data type: Converse (Nova-specific)')

Data channels configured:
Training data: s3://sagemaker-us-east-1-133856113780/datasets/nova-sql-context/train_dataset.jsonl
Validation data: s3://sagemaker-us-east-1-133856113780/datasets/nova-sql-context/test_dataset.jsonl
Data type: Converse (Nova-specific)


### Step 2b: Begin training

In [89]:
# Start the Nova training job
print('Starting Nova Micro fine-tuning job for Text-to-SQL...')

# Launch training job with train and validation inputs
estimator.fit(inputs={"train": train_input, "validation": val_input}, wait=False)

print(f'Training job started: {estimator.latest_training_job.name}')
print('Monitor progress in SageMaker console')

INFO:sagemaker.telemetry.telemetry_logging:SageMaker Python SDK will collect telemetry to help us better understand our user's needs, diagnose issues, and deliver additional features.
To opt out of telemetry, please disable via TelemetryOptOut parameter in SDK defaults config. For more information, refer to https://sagemaker.readthedocs.io/en/stable/overview.html#configuring-and-using-defaults-with-the-sagemaker-python-sdk.
INFO:sagemaker:Creating training-job with name: train-nova-micro-sql-peft-sft-2025-11-24-21-10-30-120


Starting Nova Micro fine-tuning job for Text-to-SQL...
Training job started: train-nova-micro-sql-peft-sft-2025-11-24-21-10-30-120
Monitor progress in SageMaker console


In [90]:
# Monitor training job progress - checks every minute until completion
import time
from IPython.display import clear_output

training_job_name = estimator.latest_training_job.name
region = sess.boto_region_name

print(f'Training Job Name: {training_job_name}')
print('Monitoring job status (updates every minute)...')

# Monitor job status
sagemaker_client = boto3.client('sagemaker', region_name=region)
start_time = time.time()

while True:
    try:
        response = sagemaker_client.describe_training_job(TrainingJobName=training_job_name)
        status = response['TrainingJobStatus']
        
        # Calculate elapsed time
        elapsed_minutes = int((time.time() - start_time) / 60)
        
        # Clear previous output and show current status
        clear_output(wait=True)
        print(f'Training Job: {training_job_name}')
        print(f'Status: {status}')
        print(f'Elapsed time: {elapsed_minutes} minutes')
        
        if 'TrainingStartTime' in response:
            start_time_str = response['TrainingStartTime'].strftime('%Y-%m-%d %H:%M:%S')
            print(f'Started: {start_time_str}')
        
        if status in ['Completed', 'Failed', 'Stopped']:
            if 'TrainingEndTime' in response:
                end_time_str = response['TrainingEndTime'].strftime('%Y-%m-%d %H:%M:%S')
                print(f'Ended: {end_time_str}')
            
            if status == 'Completed':
                print('\nTraining completed successfully!')
                if 'ModelArtifacts' in response:
                    model_uri = response['ModelArtifacts']['S3ModelArtifacts']
                    print(f'Model artifacts: {model_uri}')
            elif status == 'Failed':
                print('\nTraining failed!')
                if 'FailureReason' in response:
                    print(f'Reason: {response["FailureReason"]}')
            else:
                print(f'\nTraining stopped with status: {status}')
            
            break
        
        print('\nTraining in progress... (checking again in 60 seconds)')
        time.sleep(60)  # Wait 1 minute before next check
        
    except KeyboardInterrupt:
        print('\nMonitoring stopped by user')
        break
    except Exception as e:
        print(f'\nError checking job status: {str(e)}')
        break

Training Job: train-nova-micro-sql-peft-sft-2025-11-24-21-10-30-120
Status: Completed
Elapsed time: 30 minutes
Started: 2025-11-24 21:13:05
Ended: 2025-11-24 21:40:15

Training completed successfully!
Model artifacts: s3://sagemaker-us-east-1-133856113780/train-nova-micro-sql-peft-sft/train-nova-micro-sql-peft-sft-2025-11-24-21-10-30-120/output/model


---
## Wait Until the Training Job Above Completes!
---

## Retrieve the Manifest File Containing the Custom Model Artifact URI

The model artifacts are stored in an Amazon-managed S3 escrow bucket. To deploy the model, we need to fetch the URI path from the manifest file.
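
The next cell builds the manifest key by string-replacing the bucket prefix in `output_path`. An equivalent, slightly more defensive sketch parses the S3 URI instead. It assumes the `<output_path>/<training job name>/output/output/manifest.json` layout observed in this run, which is not presented here as a documented contract; `manifest_location` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def manifest_location(output_path: str, training_job_name: str) -> tuple:
    """Return (bucket, key) of manifest.json for a Nova SFT training job.

    Assumes the layout observed in this notebook:
    <output_path>/<training_job_name>/output/output/manifest.json
    """
    parsed = urlparse(output_path)     # "s3://bucket/prefix" -> netloc="bucket", path="/prefix"
    prefix = parsed.path.strip("/")    # empty when output_path has no prefix
    parts = [p for p in (prefix, training_job_name, "output", "output", "manifest.json") if p]
    return parsed.netloc, "/".join(parts)
```

Usage: `bucket, key = manifest_location(output_path, training_job_name)`, then `s3.get_object(Bucket=bucket, Key=key)`.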





In [94]:
# Initialize S3 client
s3 = boto3.client('s3', region_name=region)

# Construct the manifest path from the output_path
manifest_key = f"{output_path.replace(f's3://{bucket_name}/', '')}/{training_job_name}/output/output/manifest.json"
print(f"Reading manifest from: s3://{bucket_name}/{manifest_key}")

try:
    # Download and read the manifest file
    response = s3.get_object(Bucket=bucket_name, Key=manifest_key)
    manifest_content = response['Body'].read().decode('utf-8')
    manifest_data = json.loads(manifest_content)
    
    # Extract the checkpoint S3 URI
    checkpoint_s3_uri = manifest_data.get('checkpoint_s3_bucket')
    
    if checkpoint_s3_uri:
        print(f"Checkpoint S3 URI found:")
        print(f"{checkpoint_s3_uri}")
        
        # Store it in a variable for later use
        checkpoint_uri = checkpoint_s3_uri
        print(f"Stored in variable: checkpoint_uri")
    else:
        print("'checkpoint_s3_bucket' key not found in manifest.json")
        print("Manifest contents:")
        print(json.dumps(manifest_data, indent=2))
        
except Exception as e:
    print(f"Error reading manifest: {str(e)}")
    print(f"Troubleshooting:")
    print(f"  - Verify the output_path variable is set correctly")
    print(f"  - Check that manifest.json exists at: s3://{bucket_name}/{manifest_key}")



Reading manifest from: s3://sagemaker-us-east-1-133856113780/train-nova-micro-sql-peft-sft/train-nova-micro-sql-peft-sft-2025-11-24-21-10-30-120/output/output/manifest.json
Checkpoint S3 URI found:
s3://customer-escrow-133856113780-smtj-23f4ff73/train-nova-micro-sql-peft-sft-2025-11-24-21-10-30-120/384
Stored in variable: checkpoint_uri


## 3. Model Deployment

Now that training has completed, we'll import the model into Amazon Bedrock and deploy it for inference.

In [95]:
# Bedrock client initialization and configuration
from botocore.config import Config
from botocore.exceptions import ClientError
my_config = Config(connect_timeout=60*3, read_timeout=60*3)
bedrock = boto3.client('bedrock', region_name=region)
bedrock_runtime = boto3.client(service_name='bedrock-runtime', region_name=region, config=my_config)


Create an IAM role that allows Amazon Bedrock to import the model.

In [96]:

import boto3
import json
from botocore.exceptions import ClientError

# Create the role to deploy model to bedrock 
iam = boto3.client('iam')
sts = boto3.client('sts')
account_id = sts.get_caller_identity()['Account']
bucket = bucket_name

role_name = 'BedrockNovaImportRole'
policy_name = 'BedrockNovaS3Access'

# Create role with error handling (the response is stored in role_response so the
# SageMaker execution role ARN held in `role` is not overwritten)
try:
    role_response = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"Service": "bedrock.amazonaws.com"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "ArnEquals": {"aws:SourceArn": f"arn:aws:bedrock:{region}:{account_id}:model-import-job/*"}
                }
            }]
        })
    )
    bedrock_role = role_response['Role']['Arn']
    print(f"Created new role: {bedrock_role}")
except ClientError as e:
    if e.response['Error']['Code'] == 'EntityAlreadyExists':
        print(f"Role {role_name} already exists, retrieving existing role...")
        role_response = iam.get_role(RoleName=role_name)
        bedrock_role = role_response['Role']['Arn']
        print(f"Using existing role: {bedrock_role}")
    else:
        print(f"Error creating role: {e}")
        raise

# Create and attach S3 policy with error handling
try:
    policy = iam.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]
            }]
        })
    )
    policy_arn = policy['Policy']['Arn']
    print(f"Created new policy: {policy_arn}")
except ClientError as e:
    if e.response['Error']['Code'] == 'EntityAlreadyExists':
        print(f"Policy {policy_name} already exists, retrieving existing policy...")
        policy_arn = f"arn:aws:iam::{account_id}:policy/{policy_name}"
        print(f"Using existing policy: {policy_arn}")
    else:
        print(f"Error creating policy: {e}")
        raise

# Attach policy to role with error handling
try:
    iam.attach_role_policy(
        RoleName=role_name,
        PolicyArn=policy_arn
    )
    print(f"Successfully attached policy to role")
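except ClientError as e:
    # Re-attaching an already-attached policy succeeds, so LimitExceeded here means
    # the role has reached its quota of attached managed policies.
    if e.response['Error']['Code'] == 'LimitExceeded':
        print("Role has reached its attached managed-policy limit; detach an unused policy and retry")
    else:
        print(f"Error attaching policy: {e}")
    raise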

print(f"Final Role ARN: {bedrock_role}")




Role BedrockNovaImportRole already exists, retrieving existing role...
Using existing role: arn:aws:iam::133856113780:role/BedrockNovaImportRole
Policy BedrockNovaS3Access already exists, retrieving existing policy...
Using existing policy: arn:aws:iam::133856113780:policy/BedrockNovaS3Access
Successfully attached policy to role
Final Role ARN: arn:aws:iam::133856113780:role/BedrockNovaImportRole


### Step 3a: Create a custom model in Bedrock

Now we create a new custom model in Amazon Bedrock from our SageMaker-trained Amazon Nova checkpoint, which is stored in the Amazon-managed S3 escrow bucket.

In [99]:

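import uuid

from utils import wait_for_model_active


def create_custom_model(bedrock_client, model_name, s3_uri, role_arn):
    """
    Import a PEFT/LoRA fine-tuned Nova checkpoint into Amazon Bedrock as a custom model.
    Returns the custom model ARN; the model is deployed for on-demand inference separately.
    """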
    try:
        client_request_token = str(uuid.uuid4())
        
        model_source_config = {
            's3DataSource': {
                's3Uri': s3_uri,  
            }
        }
        
        response = bedrock_client.create_custom_model(
            modelName=model_name,
            roleArn=role_arn,
            modelSourceConfig=model_source_config,
            clientRequestToken=client_request_token
        )
        
        print(f"Model import initiated: {response['modelArn']}")
        return response['modelArn']
        
    except ClientError as e:
        print(f"Error: {e}")
        raise

print("Deploying Model")
print(checkpoint_uri)
model_arn = create_custom_model(
    bedrock,
    model_name='nova-micro-peft-lora',
    s3_uri=checkpoint_uri,
    role_arn=bedrock_role
)

wait_for_model_active(bedrock, model_arn)

Deploying Model
s3://customer-escrow-133856113780-smtj-23f4ff73/train-nova-micro-sql-peft-sft-2025-11-24-21-10-30-120/384
Model import initiated: arn:aws:bedrock:us-east-1:133856113780:custom-model/imported/rtild21brc0d
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Active
✓ Model is ready for on-demand inference!


True
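
`wait_for_model_active` (above) and `check_deployment_status` (Step 3b) come from `utils`, which isn't shown, and the deployment loop in Step 3b has no upper bound on how long it waits. A generic polling helper with a timeout might look like the sketch below. `wait_for_status` is a hypothetical name; it takes the status lookup as a callable so it can be exercised without AWS, and the `sleep` parameter exists only for that purpose.

```python
import time
from typing import Callable


def wait_for_status(get_status: Callable[[], str],
                    success: tuple = ("Active",),
                    failure: tuple = ("Failed",),
                    interval_s: float = 30.0,
                    timeout_s: float = 3600.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until it reports a success or failure state, or time out.

    Returns the success status, raises RuntimeError on a failure status and
    TimeoutError if neither is reached within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        print(f"Status: {status}")
        if status in success:
            return status
        if status in failure:
            raise RuntimeError(f"Resource entered failure state: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

For example, assuming `get_custom_model` returns a `modelStatus` field and `get_custom_model_deployment` returns a `status` field: `wait_for_status(lambda: bedrock.get_custom_model(modelIdentifier=model_arn)['modelStatus'])`, or `wait_for_status(lambda: bedrock.get_custom_model_deployment(customModelDeploymentIdentifier=deployed_model_arn)['status'], interval_s=15)`.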

### Step 3b: Deploy the custom model for Amazon Bedrock on-demand inference
Now that we have created the custom model in Amazon Bedrock, we can deploy it for on-demand inference.

In [1]:
def create_model_deployment(custom_model_arn):
    """
    Create an on-demand inferencing deployment for the custom model
    
    Parameters:
    -----------
    custom_model_arn : str
        ARN of the custom model to deploy
        
    Returns:
    --------
    deployment_arn : str
        ARN of the created deployment
    """
    try:
        print(f"Creating on-demand inferencing deployment for model: {custom_model_arn}")
        
        # Generate a unique name for the deployment
        deployment_name = f"nova-sql-deployment-{time.strftime('%Y%m%d-%H%M%S')}"
        
        # Create the deployment
        response = bedrock.create_custom_model_deployment(
            modelArn=custom_model_arn,
            modelDeploymentName=deployment_name,
            description=f"on-demand inferencing deployment for model: {custom_model_arn}",
        )
        
        # Get the deployment ARN
        deployment_arn = response.get('customModelDeploymentArn')
        
        print(f"Deployment request submitted. Deployment ARN: {deployment_arn}")
        return deployment_arn
    
    except Exception as e:
        print(f"Error creating deployment: {e}")
        return None

from utils import check_deployment_status

deployed_model_arn = create_model_deployment(model_arn)
if deployed_model_arn:
    while True:
        status = check_deployment_status(bedrock, deployed_model_arn)
        print(f"Model is in {status} phase")
        
        if status == 'Active':
            break
        elif status == 'Failed':
            raise Exception(f"Deployment failed: {deployed_model_arn}")
        
        time.sleep(15) #sleep for 15 seconds 

print(f"Use the deployment Arn for inferencing: {deployed_model_arn}")
%store deployed_model_arn


## 4. Model Evaluation and Testing

Now that the model is deployed, we'll evaluate its text-to-SQL generation capabilities.
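
Before scoring, it helps to see what a single request to the deployed model looks like. The helpers in `utils` aren't shown, so the following is a sketch, assuming the Bedrock Runtime Converse API is called with the deployment ARN as `modelId` and the same system and user prompt templates as the training data. `build_converse_request`, `max_tokens=256`, and `temperature=0.0` are hypothetical, illustrative choices.

```python
def build_converse_request(model_id: str, schema: str, question: str,
                           max_tokens: int = 256) -> dict:
    """Build keyword arguments for bedrock-runtime `converse` (hypothetical helper).

    Reuses the prompt wording from the training data so inference inputs
    match what the model was fine-tuned on.
    """
    system_text = (
        "You are a powerful text-to-SQL model. Your job is to answer questions "
        "about a database. You can use the following table schema for context: " + schema
    )
    user_text = "Return the SQL query that answers the following question: " + question
    return {
        "modelId": model_id,
        "system": [{"text": system_text}],
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.0},
    }
```

Usage: `response = bedrock_runtime.converse(**build_converse_request(deployed_model_arn, schema, question))`, then read the SQL from `response['output']['message']['content'][0]['text']`. Keeping the prompt wording identical to the training records matters because the model was fine-tuned on exactly that format.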

### Evaluation using an LLM as a judge

Since we have the reference ("right") answer for each question, we can evaluate how similar the SQL query returned by the fine-tuned Nova Micro model is to that reference. Evaluation is tricky because no single metric captures both the semantic and the syntactic similarity of two SQL queries. One alternative is to use a more capable LLM, such as Claude 3 Sonnet, to judge the similarity between the two queries (LLM as a judge).
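
`get_score` in `utils` isn't shown. A minimal sketch of the two pieces an LLM-as-a-judge setup typically needs, a grading prompt and a strict parser for the judge's reply, might look like this. The prompt wording, the 0 to 100 scale (chosen to match the `%` printed with the average below), the `<score>` tag convention, and both function names are assumptions.

```python
import re

# Hypothetical grading prompt; the real get_score prompt may differ.
JUDGE_TEMPLATE = """You are grading a text-to-SQL model.
Schema: {schema}
Question: {question}
Reference SQL: {expected}
Candidate SQL: {generated}

On a scale of 0 to 100, how well does the candidate SQL answer the question
compared with the reference (100 = semantically equivalent)? Reply with the
number only, inside <score></score> tags."""


def build_judge_prompt(schema: str, question: str, expected: str, generated: str) -> str:
    return JUDGE_TEMPLATE.format(schema=schema, question=question,
                                 expected=expected, generated=generated)


def parse_score(judge_reply: str) -> float:
    """Extract the 0-100 score from the judge reply; raise if missing or out of range."""
    match = re.search(r"<score>\s*(\d+(?:\.\d+)?)\s*</score>", judge_reply)
    if match is None:
        raise ValueError(f"No <score> tag in judge reply: {judge_reply!r}")
    score = float(match.group(1))
    if not 0 <= score <= 100:
        raise ValueError(f"Score out of range: {score}")
    return score
```

Failing loudly on an unparseable reply, rather than silently recording 0, keeps a formatting glitch in the judge from being counted as a wrong SQL answer.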

In [2]:
# Prepare 100 evaluation samples, prompt the fine-tuned model to generate SQL, then ask the judge model to score each answer
from utils import (
    ask_nova_micro,
    ask_claude,
    prepare_evaluation_samples,
    test_sql_generation,
    get_score,
    metrics_test,
    run_cold_and_warm_benchmark,
    plot_ttft_comparison,
)

eval_samples = prepare_evaluation_samples(test_data, num_samples=100)
# Show a sample
print('\nSample evaluation record:')
print(json.dumps(eval_samples[0], indent=2))

results = test_sql_generation(eval_samples, deployed_model_arn)

scores = []
print("Grading responses with LLM judge model")
for result in results:
    if result['status'] == 'success':
        score = float(get_score(
            result['system_prompt'],
            result['query'],
            result['expected_sql'],
            result['generated_sql']
        ))
        scores.append(score)

print("Assigned scores: ", scores)
if scores:
    print(f"The average score of the fine-tuned model is: {sum(scores) / len(scores):.2f}% "
          f"({len(scores)} of {len(results)} samples graded)")
else:
    print("No successful generations to score")



## Operational Metrics for Nova Micro SFT
Now let's test the latency of our fine-tuned Nova Micro model by measuring:

* Time To First Token (TTFT): the cold-start TTFT, which includes loading the LoRA adapter and invoking the model, is expected to be around 1 second
* Output Tokens Per Second (OTPS): generation throughput

In [None]:
metrics_results = metrics_test(
    model_id=deployed_model_arn,  # invoke the on-demand deployment, as the other benchmarks do
    system="You are a powerful text-to-SQL model. Your job is to answer questions about a database. You can use the following table schema for context: CREATE TABLE table_name_6 (winner_and_score VARCHAR, week VARCHAR)",
    prompt="Return the SQL query that answers the following question: who is the winner and score for the week of august 9?"
)
print(f"TTFT: {metrics_results['ttft_ms']:.2f}ms")
print(f"OTPS: {metrics_results['otps']:.2f} tokens/s")
print(f"Total end-to-end latency: {metrics_results['total_time_ms']:.2f}ms")
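
How `metrics_test` computes these numbers isn't shown in this notebook, and OTPS in particular has more than one common definition. Assuming OTPS is measured over the generation phase only (after the first token arrives), the three metrics reduce to simple timestamp arithmetic. `compute_latency_metrics` is a hypothetical helper; the timestamps could come from `time.perf_counter()` around a streaming call, with the output token count taken from the response's usage metadata.

```python
from dataclasses import dataclass


@dataclass
class LatencyMetrics:
    ttft_ms: float        # request sent -> first output token received
    otps: float           # output tokens / seconds spent generating after the first token
    total_time_ms: float  # request sent -> last output token received


def compute_latency_metrics(t_request: float, t_first_token: float,
                            t_last_token: float, output_tokens: int) -> LatencyMetrics:
    """Derive TTFT, OTPS and end-to-end latency from three timestamps in seconds."""
    generation_s = t_last_token - t_first_token
    return LatencyMetrics(
        ttft_ms=(t_first_token - t_request) * 1000,
        otps=output_tokens / generation_s if generation_s > 0 else float("nan"),
        total_time_ms=(t_last_token - t_request) * 1000,
    )
```

Under an alternative definition that divides by the full end-to-end time, the same call would report a lower OTPS, so it's worth checking which one a benchmark uses before comparing numbers.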

Next, let's run more invocations to get average cold-start and warm-start time to first token.

In [None]:
# 5 cold starts and 10 warm calls; cold_start_wait=600 is assumed to be in seconds (10 minutes between cold starts)

results = run_cold_and_warm_benchmark(
    model_id=deployed_model_arn,
    system = "You are a powerful text-to-SQL model. Your job is to answer questions about a database. You can use the following table schema for context: CREATE TABLE table_name_6 (winner_and_score VARCHAR, week VARCHAR)", 
    prompt="Return the SQL query that answers the following question: who is the winner and score for the week of august 9?",
    num_cold_starts=5,
    num_warm_calls=10,
    cold_start_wait=600  
)

## Plot the results 

In [None]:
plot_ttft_comparison(results)

## Use case price comparison analysis

Below we compare the monthly cost of serving this workload with Bedrock on-demand inference against hosting it on a self-hosted EC2 instance and on a SageMaker real-time endpoint.
For this analysis we make the following assumptions:

* Users = 100
* Queries per user per day = 10
* Usage days per month = 30 - 8 (weekend days) = 22
* Total queries per month = users × queries per day × 22 = 22,000
* Compute hours per day (EC2 and SageMaker) = 12



In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Scenario
users = 100
queries_per_day = 10
total_queries_per_month = users * queries_per_day * 22

# Average tokens per query
avg_input_tokens = 80
avg_output_tokens = 60
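
The cost model used in the next cell can be written out explicitly (all amounts in USD per month). Let $Q$ be queries per month, $\bar{t}_{in}$ and $\bar{t}_{out}$ the average input and output tokens per query, $p_{in}$ and $p_{out}$ the on-demand prices per 1,000 tokens used in the code, $r$ an instance's hourly rate, $H = 12$ compute hours per day, and $D = 22$ usage days. The SageMaker line counts instance-hours only.

$$
\begin{aligned}
Q &= 100 \times 10 \times 22 = 22{,}000 \\
C_{\text{Bedrock}} &= \frac{Q\,(\bar{t}_{in}\,p_{in} + \bar{t}_{out}\,p_{out})}{1000} = \frac{22{,}000\,(80 \times 0.000035 + 60 \times 0.00014)}{1000} \approx 0.25 \\
C_{\text{EC2}} &= r_{\text{EC2}}\,H\,D = 5.672 \times 12 \times 22 \approx 1{,}497.41 \\
C_{\text{SageMaker}} &= r_{\text{SM}}\,H\,D = (5.672 + 1.418) \times 12 \times 22 \approx 1{,}871.76
\end{aligned}
$$

The Bedrock cost scales with $Q$, while the instance-based costs depend only on hours provisioned, which is what drives the break-even analysis further down.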


In [None]:

# Bedrock On-Demand 
# ============================================================================

input_cost = (total_queries_per_month * avg_input_tokens / 1000) * 0.000035
output_cost = (total_queries_per_month * avg_output_tokens / 1000) * 0.00014
bedrock_on_demand = input_cost + output_cost

print(f"\n Bedrock On-Demand: ${bedrock_on_demand:.2f}/month")
print(f"   Cost per query: ${bedrock_on_demand/total_queries_per_month:.6f}")

# Self-Hosted on EC2  g5.12xlarge
# ============================================================================

ec2_hourly = 5.672  
ec2_compute = ec2_hourly * 12 * 22

print(f"\n Self-Hosted (EC2 g5.12xlarge): ${ec2_compute:.2f}/month")
print(f"   Cost per query: ${ec2_compute/total_queries_per_month:.4f}")

# SageMaker Endpoint
# ============================================================================

sagemaker_hourly = ec2_hourly + 1.418  # (EC2 rate + SageMaker overhead)
sagemaker_compute = sagemaker_hourly * 12 * 22
sagemaker_total = sagemaker_compute  # only instance-hours are modeled; adding an hourly rate to a monthly total mixes units

print(f"\n SageMaker Endpoint: ${sagemaker_total:.2f}/month")
print(f"   Compute: ${sagemaker_compute:.2f}")
print(f"   Cost per query: ${sagemaker_total/total_queries_per_month:.4f}")

# ============================================================================
# VISUALIZATION
# ============================================================================
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Total Monthly Cost
options = ['Bedrock\nOn-Demand', 'Self-Hosted\nEC2', 'SageMaker\nEndpoint']
costs = [bedrock_on_demand, ec2_compute, sagemaker_total]
colors = ['#2ecc71', '#e74c3c', '#f39c12']

bars = ax1.bar(options, costs, color=colors, alpha=0.7, edgecolor='black', linewidth=2)

for bar, cost in zip(bars, costs):
    height = bar.get_height()
    ax1.text(bar.get_x() + bar.get_width()/2., height,
            f'${cost:,.0f}',
            ha='center', va='bottom', fontsize=12, fontweight='bold')

ax1.set_ylabel('Monthly Cost ($)', fontsize=13, fontweight='bold')
ax1.set_title(f'Monthly Cost Comparison (Verified Pricing)\n{users} users, {queries_per_day} queries/user/day',
             fontsize=14, fontweight='bold')
ax1.grid(axis='y', alpha=0.3)

# Highlight the cheapest option
winner_idx = np.argmin(costs)
bars[winner_idx].set_edgecolor('green')
bars[winner_idx].set_linewidth(4)

# Plot 2: Cost per Query
cost_per_query = [c / total_queries_per_month for c in costs]

bars2 = ax2.barh(options, cost_per_query, color=colors, alpha=0.7, edgecolor='black', linewidth=2)

for bar, cpq in zip(bars2, cost_per_query):
    width = bar.get_width()
    ax2.text(width, bar.get_y() + bar.get_height()/2.,
            f' ${cpq:.5f}',
            ha='left', va='center', fontsize=11, fontweight='bold')

ax2.set_xlabel('Cost Per Query ($)', fontsize=13, fontweight='bold')
ax2.set_title('Cost Efficiency', fontsize=14, fontweight='bold')
ax2.grid(axis='x', alpha=0.3)

plt.tight_layout()
plt.savefig('cost_comparison_verified.png', dpi=300, bbox_inches='tight')
plt.show()

## Break Even Scale

We can now see that serving our fine-tuned model with Bedrock on-demand inference is significantly cheaper at this scale, but at what scale do the other options start to become cost-competitive?

In [None]:
print("BREAK-EVEN ANALYSIS")

cost_per_query_bedrock = bedrock_on_demand / total_queries_per_month
fixed_self_hosted_costs = ec2_compute

break_even_queries = fixed_self_hosted_costs / cost_per_query_bedrock
break_even_users = break_even_queries / (queries_per_day * 22)  # 22 usage days/month, matching total_queries_per_month

print(f"\nSelf-hosted breaks even at:")
print(f"  {break_even_queries:,.0f} queries/month")
print(f"   = {break_even_users:,.0f} users @ {queries_per_day} queries/day")
print(f"   = {break_even_users/users:.0f}x your current scale")
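
Setting the two monthly bills equal gives the break-even volume, `Q* = C_fixed / c_query`. A small helper makes the assumptions explicit. It takes the number of usage days as a parameter (defaulting to the scenario's 22) so the user count stays consistent with the monthly totals above, and, like the cell above, it assumes the fixed-cost option can absorb any query volume, which real capacity limits would not allow. `break_even` is a hypothetical name.

```python
def break_even(fixed_monthly_cost: float, cost_per_query: float,
               queries_per_user_per_day: float, usage_days: int = 22) -> tuple:
    """Queries/month and users at which pay-per-query spend equals a fixed monthly bill.

    Assumes the fixed-cost option (for example one instance for the modeled hours)
    can serve any volume, which ignores real capacity limits.
    """
    queries = fixed_monthly_cost / cost_per_query
    users = queries / (queries_per_user_per_day * usage_days)
    return queries, users
```

With this notebook's numbers (`ec2_compute` ≈ 1,497.41 USD/month and a Bedrock cost per query of 0.2464 / 22,000 ≈ 0.0000112 USD), the helper gives roughly 133.7 million queries per month, or about 607,700 users at 10 queries per day over 22 days.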

### Get the throughput for the custom model
* To measure throughput, we average the number of output tokens generated per second across test invocations


In [None]:
from utils import test_model_throughput, visualize_throughput_results

# Run the test
throughput_results = test_model_throughput(deployed_model_arn)

#Visualize the results 
visualize_throughput_results(throughput_results)
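
`test_model_throughput` isn't shown, and "average tokens per second" can mean two different numbers: the mean of per-request rates, or total tokens divided by total time. They diverge when request lengths vary. A sketch, assuming each run yields an `(output_tokens, seconds)` pair; `summarize_throughput` is a hypothetical helper.

```python
import statistics


def summarize_throughput(runs):
    """Summarize (output_tokens, generation_seconds) pairs from repeated calls.

    Reports the mean and median of per-request tokens/s as well as the
    aggregate rate (total tokens / total seconds).
    """
    per_request = [tokens / seconds for tokens, seconds in runs if seconds > 0]
    total_tokens = sum(tokens for tokens, _ in runs)
    total_seconds = sum(seconds for _, seconds in runs)
    return {
        "mean_tps": statistics.mean(per_request),
        "median_tps": statistics.median(per_request),
        "aggregate_tps": total_tokens / total_seconds,
    }


# Two hypothetical runs: 100 tokens in 2 s (50 tok/s) and 300 tokens in 3 s (100 tok/s)
print(summarize_throughput([(100, 2.0), (300, 3.0)]))
# -> {'mean_tps': 75.0, 'median_tps': 75.0, 'aggregate_tps': 80.0}
```

The aggregate rate weights longer generations more heavily, so it is the better fit for capacity planning, while the per-request mean better reflects what an individual user experiences.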


### Compare TTFT for Base Nova Micro Model to our SFT Model

In [None]:
# Because the base model has not been trained on the SQL data, we ask both models the same generic question

from utils import compare_models

custom_model_arn = deployed_model_arn
base_model_id = "us.amazon.nova-micro-v1:0"

system_prompt = "You are a helpful assistant."
test_prompt = "What are the performance specs of a bmw x5m and how does it compare with the porsche macan turbo"

results = compare_models(
    custom_model_arn=custom_model_arn,
    base_model_id=base_model_id,
    system=system_prompt,
    prompt=test_prompt,
    num_runs=10  
)


In [None]:
# Percentage difference in TTFT and OTPS of the SFT model relative to the base model
# (values hard-coded from an earlier compare_models run; replace them with your own results)
ttft_percentage_difference = ((381.49 - 356.28) / 356.28) * 100
print(f"Our custom model's time to first token differs from the base model by {ttft_percentage_difference:.2f}%")
otps_percentage_difference = ((184.56 - 253.57) / 253.57) * 100
print(f"Our custom model's output tokens per second differs from the base model by {otps_percentage_difference:.2f}%")


---
## Cleanup Resources


In [78]:
# # Cleanup - Delete all resources
# import boto3, shutil, os

# bedrock = boto3.client('bedrock', region_name=region)
# iam = boto3.client('iam')

# # Delete deployment (deployed_model_arn was set in Step 3b)
# try:
#     bedrock.delete_custom_model_deployment(customModelDeploymentIdentifier=deployed_model_arn)
#     print("Deployment deleted")
# except: pass

# # Delete model
# try:
#     bedrock.delete_custom_model(modelIdentifier=model_arn)
#     print("Model deleted")
# except: pass

# # Delete IAM role
# try:
#     for p in iam.list_attached_role_policies(RoleName='BedrockNovaImportRole')['AttachedPolicies']:
#         iam.detach_role_policy(RoleName='BedrockNovaImportRole', PolicyArn=p['PolicyArn'])
#         iam.delete_policy(PolicyArn=p['PolicyArn'])
#     iam.delete_role(RoleName='BedrockNovaImportRole')
#     print("IAM resources deleted")
# except: pass

# # Delete local data
# if os.path.exists('data'): shutil.rmtree('data'); print("✓ Local data deleted")

# print("\nCleanup complete!")

✓ Deployment deleted
✓ Model deleted
✓ IAM resources deleted
✓ Local data deleted

Cleanup complete!
