# Amazon Nova Customization SDK - Quick Start Guide

This notebook provides a basic walkthrough of the Amazon Nova Customization SDK for fine-tuning Nova models.

## What You'll Learn

1. Loading and preparing datasets
2. Fine-tuning a Nova model with SFT (Supervised Fine-Tuning)
3. Monitoring training progress
4. Deploying your model
5. Using other training methods to customize your Nova model

## Table of Contents
- [Step 1: Import Required Modules](#step-1-import-required-modules)
- [Step 2: Configure Your AWS Resources](#step-2-configure-your-aws-resources)
- [Step 3: Prepare Your Dataset](#step-3-prepare-your-dataset)
- [Step 4: Configure Runtime Infrastructure](#step-4-configure-runtime-infrastructure)
- [Step 5: Initialize Nova Model Customizer](#step-5-initialize-nova-model-customizer)
  - [Step 5.1 Data Mixing Configuration (Optional)](#51-data-mixing-configuration-optional)
- [Step 6: Start Training](#step-6-start-training)
- [Step 7: Monitor Training Progress](#step-7-monitor-training-progress)
- [Step 8: Evaluate the custom Model (After Training Completes)](#step-8-evaluate-the-custom-model-after-training-completes)
- [Step 9: Deploy Your Model (After Training Completes)](#step-9-deploy-your-model-after-training-completes)
- [Try Additional Training Methods (Optional)](#try-additional-training-methods-optional)
- [Summary](#summary)

## Prerequisites

- AWS credentials configured
- S3 bucket for data and model artifacts
- IAM permissions for SageMaker and Bedrock
- Nova Customization SDK installed per [its README](https://github.com/aws-samples/sample-nova-customization-sdk?tab=readme-ov-file#installation)
- (If using the SMHP runtime below) Correct version of the SageMaker HyperPod CLI installed; see the SDK README for details

## Helpful Links
- If you haven't already, please take a look at the `docs/spec.md` file for more information on what parameters you can change in the code below.
- Also visit the `README.md` for a high-level overview of the Nova SDK and its capabilities.

## Step 1: Import Required Modules

In [None]:
!cd ../ && pip install .

In [None]:
import os

import boto3
from botocore.exceptions import ClientError, NoCredentialsError, ProfileNotFound


def load_credentials(profile=None):
    """
    Load AWS credentials with fallback behavior.

    Args:
        profile (str, optional): AWS profile name. If provided, loads from credentials file.
                               If None, uses current authenticated AWS session.

    Returns:
        dict: Dictionary containing AWS credentials and region

    Raises:
        RuntimeError: If credential loading fails
    """
    if profile:
        # Try loading from credentials file
        try:
            session = boto3.Session(profile_name=profile)
            credentials = session.get_credentials()

            if not credentials:
                raise RuntimeError(f"No credentials found for profile '{profile}'")

        except ProfileNotFound:
            raise RuntimeError(f"Profile '{profile}' not found in credentials file")
        except Exception as e:
            raise RuntimeError(f"Failed to load credentials from file: {e}")

    else:
        # Try loading from current authenticated session
        try:
            session = boto3.Session()
            credentials = session.get_credentials()

            if not credentials:
                raise RuntimeError("No credentials found in current AWS session")

        except NoCredentialsError:
            raise RuntimeError("No AWS credentials configured")
        except Exception as e:
            raise RuntimeError(f"Failed to load credentials from current session: {e}")

    # Validate credentials by making a test call
    try:
        sts_client = session.client("sts")
        sts_client.get_caller_identity()
    except ClientError as e:
        raise RuntimeError(f"Invalid AWS credentials: {e}")
    except Exception as e:
        raise RuntimeError(f"Failed to validate credentials: {e}")

    return {
        "aws_access_key_id": credentials.access_key,
        "aws_secret_access_key": credentials.secret_key,
        "aws_session_token": credentials.token,
        "region_name": session.region_name or "us-east-1",
    }

In [None]:
creds = load_credentials()

In [None]:
# Core import
from amzn_nova_customization_sdk import *

print("✅ SDK imported successfully!")

## Step 2: Configure Your AWS Resources

In [None]:
# TODO: Update these values for your environment
S3_BUCKET = "nova-customization-beta"  # TODO: Replace with your S3 bucket
S3_DATA_PATH = f"s3://{S3_BUCKET}/demo/input"
S3_OUTPUT_PATH = f"s3://{S3_BUCKET}/demo/output"

print(f"Data Path: {S3_DATA_PATH}")
print(f"Output Path: {S3_OUTPUT_PATH}")
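Before uploading anything, it can help to confirm that the paths above are well-formed S3 URIs. The following is a minimal sketch using only the standard library; `split_s3_uri` is a hypothetical helper written for this notebook, not part of the Nova Customization SDK.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key).

    Hypothetical helper for illustration; not an SDK function.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {uri!r}")
    # Drop the leading slash so the key matches what S3 APIs expect
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_s3_uri("s3://nova-customization-beta/demo/input")
print(f"Bucket: {bucket}")
print(f"Prefix: {prefix}")
```

A typo such as a missing `s3://` scheme then fails here with a clear message rather than later, when a job is submitted.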

## Step 3: Prepare Your Dataset

The SDK supports three formats: **JSONL**, **JSON**, and **CSV**. This example uses JSONL.

In [None]:
# Create sample training data
import json

sample_data = [
    {
        "question": "What is machine learning?",
        "answer": "Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed.",
    },
    {
        "question": "Explain what AWS is.",
        "answer": "AWS (Amazon Web Services) is a comprehensive cloud computing platform that provides on-demand computing resources and services.",
    },
    {
        "question": "What is Python used for?",
        "answer": "Python is a versatile programming language used for web development, data analysis, artificial intelligence, scientific computing, and automation.",
    },
] * 100

# Save sample data locally
with open("training_data.jsonl", "w") as f:
    for item in sample_data:
        f.write(json.dumps(item) + "\n")

print("✅ Sample data created: training_data.jsonl")
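When you replace the sample data with your own file, a quick pre-check can catch malformed lines before they reach the loader. The sketch below is independent of the SDK and assumes only the `question` and `answer` column names used in this example; `find_bad_records` is a hypothetical helper.

```python
import json


def find_bad_records(lines, required_keys=("question", "answer")):
    """Return (line_number, reason) pairs for JSONL lines that fail basic checks."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        # A key counts as missing if it is absent, not a string, or blank
        missing = [
            key
            for key in required_keys
            if not isinstance(record.get(key), str) or not record[key].strip()
        ]
        if missing:
            problems.append((number, f"missing or empty: {', '.join(missing)}"))
    return problems


sample_lines = [
    '{"question": "What is machine learning?", "answer": "A subset of AI."}',
    '{"question": "Explain what AWS is."}',
    "not json",
]
for line_number, reason in find_bad_records(sample_lines):
    print(f"Line {line_number}: {reason}")
```

To check a file on disk, pass an open file handle, for example `find_bad_records(open("training_data.jsonl"))`. This check is not a substitute for `loader.validate(...)`, which checks the transformed data for the chosen method and model.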

### Load, Transform, and Validate the Dataset

In [None]:
# Initialize dataset loader
loader = JSONLDatasetLoader(
    question="question",  # Column name for questions in your data
    answer="answer",  # Column name for answers in your data
)

# Load the data
loader.load("training_data.jsonl")

# Preview the data
print("\n📊 Dataset Preview:")
loader.show(n=3)

In [None]:
# Transform data for Nova model training
loader.transform(method=TrainingMethod.SFT_LORA, model=Model.NOVA_LITE_2)

print("✅ Data transformed to Converse format")
print("\n📊 Transformed Data Preview:")

loader.show(n=5)

In [None]:
# Validates transformed data for the method and model combination. Prints out a "Validation completed" message if successful.
loader.validate(method=TrainingMethod.SFT_LORA, model=Model.NOVA_LITE_2)

### Split and Save Dataset

In [None]:
# Split into train/validation sets
train_loader, val_loader, _ = loader.split_data(
    train_ratio=0.7, val_ratio=0.2, test_ratio=0.1
)

# Save datasets
# For production, upload to S3:
train_path = train_loader.save_data(f"{S3_DATA_PATH}/train.jsonl")
val_path = val_loader.save_data(f"{S3_DATA_PATH}/val.jsonl")

print(f"\n✅ Training data saved to: {train_path}")
print(f"✅ Validation data saved to: {val_path}")
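To make the ratios concrete, here is a plain-Python sketch of a 70/20/10 split. It only illustrates the arithmetic and is not the SDK's implementation: `split_records`, the shuffling, and the `seed` parameter are assumptions, and `split_data` may order or round differently.

```python
import random


def split_records(records, train_ratio=0.7, val_ratio=0.2, test_ratio=0.1, seed=0):
    """Shuffle records and cut them into train/validation/test lists."""
    if abs(train_ratio + val_ratio + test_ratio - 1.0) > 1e-9:
        raise ValueError("Ratios must sum to 1.0")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    # round() avoids losing a record to floating-point truncation
    n_train = round(len(shuffled) * train_ratio)
    n_val = round(len(shuffled) * val_ratio)
    train = shuffled[:n_train]
    val = shuffled[n_train : n_train + n_val]
    test = shuffled[n_train + n_val :]  # whatever remains
    return train, val, test


train, val, test = split_records(range(300))
print(len(train), len(val), len(test))
```

For the 300 sample records created above, this yields 210 training, 60 validation, and 30 test records.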

## Step 4: Configure Runtime Infrastructure

Choose between:
- **SMTJRuntimeManager**: For SageMaker Training Jobs
- **SMHPRuntimeManager**: For SageMaker HyperPod clusters

In [None]:
from amzn_nova_customization_sdk.model.model_enums import Platform

In [None]:
# Option 1: SageMaker Training Jobs (SMTJ)
runtime = SMTJRuntimeManager(
    instance_type="ml.p5.48xlarge",  # Choose appropriate instance
    instance_count=4,  # Number of instances
    # execution_role="<your execution role>",  # TODO: Choose execution role (if different from current role)
)

platform = Platform.SMTJ

print("✅ Runtime configured for SageMaker Training Jobs")
print(f"   Instance Type: {runtime.instance_type}")
print(f"   Instance Count: {runtime.instance_count}")

In [None]:
# Option 2: SageMaker HyperPod (if using HyperPod cluster)
# Uncomment and configure if using HyperPod:

# runtime = SMHPRuntimeManager(
#     instance_type="ml.p5.48xlarge",
#     instance_count=4,
#     cluster_name="your-cluster-name",
#     namespace="your-namespace"
# )
#
# platform = Platform.SMHP
#
# print("✅ Runtime configured for SageMaker HyperPod")

In [None]:
# To launch a HyperPod job, add the nemo_launcher scripts to PYTHONPATH:

# import os

# hyperpod_clone_path = "<path where you cloned the hyperpod repo>"
# os.environ['PYTHONPATH'] = f'{hyperpod_clone_path}/src/hyperpod_cli/sagemaker_hyperpod_recipes/launcher/nemo/nemo_framework_launcher/launcher_scripts:' + os.environ.get('PYTHONPATH', '')

## Step 5: Initialize Nova Model Customizer

### Create MLflow Monitor

In [None]:
# Optional: create an MLflow monitor to track training metrics
mlflow_monitor = MLflowMonitor(
    tracking_uri="<Mlflow App/Server Arn>",
    experiment_name="nova-customization-experiment",  # replace with experiment name
    run_name="nova-lite2-sft-run-1",  # replace with run name
)

# mlflow_monitor = MLflowMonitor() # uses default mlflow app, if it exists

# mlflow_monitor = MLflowMonitor(
#     experiment_name="nova-customization-experiment", # replace with experiment name
#     run_name="nova-lite2-sft-run-1" # replace with run name
# ) # uses default mlflow app, if it exists

In [None]:
# Create customizer
customizer = NovaModelCustomizer(
    model=Model.NOVA_LITE_2,  # Choose your Nova model
    method=TrainingMethod.SFT_LORA,  # Training method
    infra=runtime,  # Runtime configuration
    data_s3_path=train_path,  # Training data path
    output_s3_path=S3_OUTPUT_PATH,  # Output path for artifacts
    mlflow_monitor=mlflow_monitor,  # optional
)
print("✅ NovaModelCustomizer initialized")
print("   Model: Nova Lite 2.0")
print("   Method: SFT with LoRA")

## 5.1 Data Mixing Configuration (Optional)

Data mixing is a Nova Forge feature that allows you to blend your custom data with Nova's curated datasets. This can improve model performance by maintaining general capabilities while adding domain-specific knowledge.

When data mixing is enabled:
- Your custom data and Nova's curated data are mixed at specified percentages
- The Nova data percentages must sum to 100%, unless customer data is 100%
- Customer data can range from 0% to 100%
- If customer data is 100%, no Nova data is used and the Nova data percentages must sum to 0%
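The rules above can be sketched as a small pre-flight check. This is an illustration only: `check_data_mixing` is a hypothetical helper, and treating every key that starts with `nova_` as a Nova dataset percentage is an assumption based on the field names shown in this notebook. The SDK applies its own validation.

```python
def check_data_mixing(config):
    """Validate a data mixing config against the rules listed above.

    Returns the total of the Nova percentages; raises ValueError on a violation.
    """
    customer = config.get("customer_data_percent", 0)
    if not 0 <= customer <= 100:
        raise ValueError("customer_data_percent must be between 0 and 100")

    # Assumption: every key starting with "nova_" is a Nova dataset percentage
    nova_total = sum(value for key, value in config.items() if key.startswith("nova_"))

    expected = 0 if customer == 100 else 100
    if abs(nova_total - expected) > 1e-6:
        raise ValueError(
            f"Nova percentages sum to {nova_total}, expected {expected} "
            f"when customer_data_percent is {customer}"
        )
    return nova_total


print(check_data_mixing(
    {"customer_data_percent": 50, "nova_code_percent": 30, "nova_general_percent": 70}
))
```

Note that the Nova percentages describe how the Nova portion is divided among datasets, so they sum to 100 on their own rather than together with the customer percentage.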

In [None]:
# Example 1: Enable data mixing with 50% customer data and 50% Nova data
# Create customizer
customizer_with_mixing = NovaModelCustomizer(
    model=Model.NOVA_LITE_2,  # Choose your Nova model
    method=TrainingMethod.SFT_LORA,  # Training method
    infra=runtime,  # Runtime configuration
    data_s3_path=train_path,  # Training data path
    output_s3_path=S3_OUTPUT_PATH,  # Output path for artifacts
    mlflow_monitor=mlflow_monitor,  # optional
    data_mixing_enabled=True,
)

# Check default data mixing configuration
customizer_with_mixing.get_data_mixing_config()
"""
{
    "customer_data_percent": 50,
    "nova_code_percent": 1.0,
    "nova_general_percent": 0.10,
    ......
    ......
    "nova_chat_percent": 50
    # all Nova fields sum to 100
}
"""

# Overwrite the data mixing percentages in default config
customizer_with_mixing.set_data_mixing_config(
    {
        "customer_data_percent": 50,  # 50% customer data
        "nova_code_percent": 30,  # 30% Nova code data
        "nova_general_percent": 70,  # 70% Nova general data
        # all other Nova fields are set to 0
        # Nova percentages must sum to 100%
    }
)

# Verify your updates
customizer_with_mixing.get_data_mixing_config()
"""
{
    "customer_data_percent": 50,
    "nova_code_percent": 30,
    "nova_general_percent": 70,
    # all other Nova fields are 0
}
"""

## Step 6: Start Training

In [None]:
# Define training hyperparameters. The available settings are described in the Nova Customization public documentation, linked from ../docs/spec.md.
training_config = {
    "lr": 5e-6,  # Learning rate
    "warmup_steps": 100,  # Warmup steps
    "global_batch_size": 64,  # Batch size
    "max_length": 8192,  # Max sequence length
}

# Start training
training_result = customizer.train(
    job_name="nova-quickstart-training-nova-2", overrides=training_config
)

# "Dry run" mode is also supported. Use it to validate inputs and generate a recipe without starting a job.
# customizer.train(
#     job_name="nova-quickstart-training-dry-run",
#     dry_run=True,  # <-- Set the dry_run parameter
#     overrides=training_config
# )

print("\n🚀 Training job started!")
print(training_result)
print(
    f"   📍 Checkpoint URI where the model will be saved: {training_result.model_artifacts.checkpoint_s3_path}"
)
print(f"   🆔 Job ID: {training_result.job_id}")
print(f"   📂 Output Path: {training_result.model_artifacts.output_s3_path}")

# Save job ID for later
job_id = training_result.job_id
escrow_uri = training_result.model_artifacts.checkpoint_s3_path
output_path = training_result.model_artifacts.output_s3_path
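When choosing `global_batch_size` and `warmup_steps`, it can help to relate them to the size of your dataset. The sketch below does the arithmetic for the sample data in this notebook. It assumes one optimizer step per global batch and that a final partial batch still counts as a step; how the training recipe actually counts steps is not specified here, so treat the numbers as a rough guide.

```python
import math


def steps_per_pass(num_records: int, global_batch_size: int) -> int:
    """Optimizer steps needed to see every record once (partial last batch counted)."""
    return math.ceil(num_records / global_batch_size)


train_records = round(300 * 0.7)  # 300 sample records, 70% training split
steps = steps_per_pass(train_records, global_batch_size=64)
passes_for_warmup = math.ceil(100 / steps)  # warmup_steps = 100

print(f"Training records: {train_records}")
print(f"Steps per pass over the data: {steps}")
print(f"Passes needed to complete warmup: {passes_for_warmup}")
```

Under these assumptions the tiny sample dataset yields only a handful of steps per pass, far fewer than the configured warmup. That is one reason to revisit these values when you switch to your own data.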

## Step 7: Monitor Training Progress

### A) While training is ongoing

In [None]:
# View recent training logs
print("📋 Training Logs:")
print("=" * 80)
customizer.get_logs(limit=50, start_from_head=False)

### B) After training is completed

In [None]:
job_id = training_result.job_id

In [None]:
monitor = CloudWatchLogMonitor.from_job_id(job_id=job_id, platform=platform)
monitor.show_logs(limit=100, start_from_head=True)

## Step 8: Evaluate the custom Model (After Training Completes)

Evaluation jobs allow you to test your customized model against pre-set or custom benchmarks.

In [None]:
# TODO: Update these values for your environment (S3_BUCKET is reused from Step 2)
S3_DATA_PATH = f"s3://{S3_BUCKET}/demo/input"
S3_OUTPUT_PATH = f"s3://{S3_BUCKET}/demo/output"

infra = SMTJRuntimeManager(
    instance_type="ml.p5.48xlarge",  # Change the instance type if needed
    instance_count=1,
)

evaluator = NovaModelCustomizer(
    model=Model.NOVA_LITE_2,  # You can also use your trained model here for eval
    method=TrainingMethod.EVALUATION,
    infra=infra,
    data_s3_path=S3_DATA_PATH,  # The data_s3_path is not used in eval job
    output_s3_path=S3_OUTPUT_PATH,  # This will be your eval output path
)

### Evaluation Covers Three Dimensions
- Public benchmarks, to check whether the model's general capabilities are maintained.
- Your own data, to validate the model's performance on your tasks.
- LLM-as-a-judge, for domains where response quality is hard to evaluate.

In [None]:
mmlu_eval_result = evaluator.evaluate(
    job_name="eval-test-mmlu",  # The job name you specified
    eval_task=EvaluationTask.MMLU,  # The eval task
)

byod_eval_result = evaluator.evaluate(
    job_name="eval-test-byod",
    eval_task=EvaluationTask.GEN_QA,
    data_s3_path="s3://<data-s3-bucket>/nova-customization/gen_qa.jsonl",  # TODO: Replace with your data path
    # model_path='s3://customer-escrow-<your-model-ckpt-bucket>/your-model-path/' # TODO: Replace with your model path
    overrides={"max_new_tokens": 2048},
)

# byom_eval_result = evaluator.evaluate(
#     job_name='eval-test-byom',
#     eval_task=EvaluationTask.GEN_QA,
#     data_s3_path='s3://<your-byom-dataset-bucket>/input/eval/byom/byom_data.jsonl', # TODO: Replace with your dataset
#     processor={
#         "lambda_arn": "arn:aws:lambda:<region>:<account>:function:<lambda>" # TODO: Your byom lambda
#     }
# )

# llm_judge_eval_result = evaluator.evaluate(
#     job_name='eval-test-llm-judge',
#     eval_task=EvaluationTask.LLM_JUDGE,
#     data_s3_path='s3://<your-llm-judge-dataset-bucket>/input/eval/llm_judge/llm_judge.jsonl' # TODO: Replace with your dataset
# )

In [None]:
print("  📍 Bring Your Own Data Job ID:", byod_eval_result.job_id)
print("  📂 Bring Your Own Data Output Path:", byod_eval_result.eval_output_path)
print("  📍 MMLU Job ID:", mmlu_eval_result.job_id)
print("  📂 MMLU Output Path:", mmlu_eval_result.eval_output_path)

In [None]:
# View recent evaluation logs

print("📋 Evaluation Job Logs:")
print("=" * 80)
evaluator.get_logs(limit=50, start_from_head=False)

## Step 9: Deploy Your Model (After Training Completes)

Once training is complete, deploy your model to Amazon Bedrock.

In [None]:
# Get the model artifacts path from training result
# After training completes, use:

# Deploy to Bedrock On-Demand
deployment_result = customizer.deploy(
    endpoint_name="my-custom-nova-model", job_result=training_result
)

print("\n🚀 Model deployment started!")
print(f"   Endpoint Name: {deployment_result.endpoint.endpoint_name}")
print(f"   Status: {deployment_result.status}")

## Try Additional Training Methods (Optional)

### Continued Pre-Training (CPT)
* CPT is another training technique offered for Nova model customization.
* Expand this section for example code!
* More information can be found here: [AWS Docs on CPT](https://docs.aws.amazon.com/sagemaker/latest/dg/nova-cpt.html)

In [None]:
# Step 1: Data Creation
import json

cpt_sample_data = [{"text": "AWS stands for Amazon Web Services"}] * 100

# Save sample data locally
with open("cpt_training_data.jsonl", "w") as f:
    for item in cpt_sample_data:
        f.write(json.dumps(item) + "\n")

In [None]:
# Step 2: Load and Save Data to s3
loader = JSONLDatasetLoader(
    text="text",  # Column name for text in your data
)

# Load and save the data to s3.
loader.load("cpt_training_data.jsonl")
train_path = loader.save_data(
    f"{S3_DATA_PATH}/train.jsonl"
)  # S3_DATA_PATH is set up in Step 2 of the SFT example.

In [None]:
# Step 3: Infrastructure Setup
runtime = SMHPRuntimeManager(
    instance_type="ml.p5.48xlarge",
    instance_count=4,
    cluster_name="your-cluster-name",
    namespace="your-namespace",
)

print("✅ Runtime configured for SageMaker HyperPod")
print(f"   Instance Type: {runtime.instance_type}")
print(f"   Instance Count: {runtime.instance_count}")

# Step 4: Create customizer
customizer = NovaModelCustomizer(
    model=Model.NOVA_LITE_2,  # Choose your Nova model
    method=TrainingMethod.CPT,  # Training method
    infra=runtime,  # Runtime configuration
    data_s3_path=train_path,  # Training data path
    output_s3_path=S3_OUTPUT_PATH,  # Output path for artifacts, set up in Step 2 of the SFT example.
)
print("✅ NovaModelCustomizer initialized")
print("   Model: Nova Lite 2.0")
print("   Method: CPT")

In [None]:
# Step 5: Training

# Define training hyperparameters
training_config = {
    "lr": 5e-6,  # Learning rate
    "warmup_steps": 100,  # Warmup steps
    "global_batch_size": 64,  # Batch size
    "max_length": 8192,  # Max sequence length
}

# Start training
training_result = customizer.train(
    job_name="cpt-quickstart-training-nova-2", overrides=training_config
)

print("\n🚀 Training job started!")
print(training_result)
print(
    f"   📍 Checkpoint URI where the model will be saved: {training_result.model_artifacts.checkpoint_s3_path}"
)
print(f"   🆔 Job ID: {training_result.job_id}")
print(f"   📂 Output Path: {training_result.model_artifacts.output_s3_path}")

### Direct Preference Optimization (DPO)

- DPO is another fine-tuning method for training models. Below is a sample dataset and the code needed to run a DPO job.
- Expand this section for example code!
- More information can be found here: [AWS Docs on DPO](https://docs.aws.amazon.com/sagemaker/latest/dg/nova-dpo.html)

In [None]:
# Step 1: Data Creation
import json

dpo_sample_data = [
    {
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "text": "Question: You are configuring an AWS application that needs to handle increased traffic. If scaling horizontally adds more instances to distribute load, what does scaling vertically do to handle increased demand?"
                    }
                ],
            },
            {
                "role": "assistant",
                "candidates": [
                    {
                        "content": [
                            {
                                "text": "Scaling vertically increases the resources (CPU, memory, storage) of existing instances to handle more load."
                            }
                        ],
                        "preferenceLabel": "preferred",
                    },
                    {
                        "content": [
                            {
                                "text": "Scaling vertically distributes the workload across multiple availability zones."
                            }
                        ],
                        "preferenceLabel": "non-preferred",
                    },
                ],
            },
        ]
    }
] * 100

# Save sample data locally
with open("dpo_training_data.jsonl", "w") as f:
    for item in dpo_sample_data:
        f.write(json.dumps(item) + "\n")
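Because DPO records are nested, a structural check before upload can save a failed job. The sketch below mirrors the shape of the sample record above: a `user` message, then one `assistant` turn whose `candidates` carry one `preferred` and one `non-preferred` label. These expectations are inferred from the sample only, not taken from an official schema, and `check_dpo_record` is a hypothetical helper.

```python
def check_dpo_record(record):
    """Return a list of problems found in one DPO record (empty list means OK)."""
    problems = []
    messages = record.get("messages") or []

    if not any(m.get("role") == "user" for m in messages):
        problems.append("no user message")

    # Assistant turns that offer candidate responses to compare
    assistant_turns = [
        m for m in messages if m.get("role") == "assistant" and "candidates" in m
    ]
    if len(assistant_turns) != 1:
        problems.append("expected exactly one assistant turn with candidates")
        return problems

    labels = sorted(
        str(c.get("preferenceLabel")) for c in assistant_turns[0]["candidates"]
    )
    if labels != ["non-preferred", "preferred"]:
        problems.append(f"unexpected preference labels: {labels}")
    return problems


example = {
    "messages": [
        {"role": "user", "content": [{"text": "What does scaling vertically do?"}]},
        {
            "role": "assistant",
            "candidates": [
                {"content": [{"text": "Adds resources to existing instances."}],
                 "preferenceLabel": "preferred"},
                {"content": [{"text": "Spreads load across zones."}],
                 "preferenceLabel": "non-preferred"},
            ],
        },
    ]
}
print(check_dpo_record(example))
```

Running it over every item in `dpo_sample_data` before saving gives an early signal if a record is missing a candidate or has a mislabeled preference.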

In [None]:
# Step 2: Load and Save Data to s3
loader = JSONLDatasetLoader()  # DPO dataset transformation isn't supported yet, so column mappings don't need to be provided here.

# Load and save the data to s3.
loader.load("dpo_training_data.jsonl")
train_path = loader.save_data(
    f"{S3_DATA_PATH}/dpo_train.jsonl"
)  # S3_DATA_PATH is set up in Step 2 of the SFT example.

In [None]:
# Step 3: Infrastructure Setup
runtime = SMHPRuntimeManager(
    instance_type="ml.p5.48xlarge",
    instance_count=2,
    cluster_name="your-cluster-name",
    namespace="your-namespace",
)

print("✅ Runtime configured for SageMaker HyperPod")
print(f"   Instance Type: {runtime.instance_type}")
print(f"   Instance Count: {runtime.instance_count}")

# Step 4: Create customizer
customizer = NovaModelCustomizer(
    model=Model.NOVA_MICRO,  # Choose your Nova model
    method=TrainingMethod.DPO_LORA,  # Training method
    infra=runtime,  # Runtime configuration
    data_s3_path=train_path,  # Training data path
    output_s3_path=S3_OUTPUT_PATH,  # Output path for artifacts, set up in Step 2 of the SFT example.
)
print("✅ NovaModelCustomizer initialized")
print("   Model: Nova Micro")
print("   Method: DPO")

In [None]:
# Step 5: Training

# Define training hyperparameters
training_config = {
    "lr": 1e-6,  # Learning rate
    "warmup_steps": 10,  # Warmup steps
    "max_length": 4096,  # Max sequence length
}

# Start training
training_result = customizer.train(
    job_name="dpo-quickstart-training-nova", overrides=training_config
)

print("\n🚀 Training job started!")
print(training_result)
print(
    f"   📍 Checkpoint URI where the model will be saved: {training_result.model_artifacts.checkpoint_s3_path}"
)
print(f"   🆔 Job ID: {training_result.job_id}")
print(f"   📂 Output Path: {training_result.model_artifacts.output_s3_path}")

## Summary

You've completed the basic workflow:

✅ **Loaded and prepared data** in JSONL format  
✅ **Transformed data** to Nova's Converse format  
✅ **Configured runtime** infrastructure (SMTJ or HyperPod)  
✅ **Started training** with custom hyperparameters  
✅ **Monitored progress** via logs  
✅ **Deployed the model** to Amazon Bedrock  

## Next Steps

- **Use your own data**: Replace the sample data with your training dataset
- **Tune hyperparameters**: Adjust learning rate, batch size, epochs, etc.
- **Evaluate performance**: Use `customizer.evaluate()` to benchmark your model
- **Run batch inference**: Process large datasets with `customizer.batch_inference()`

## Available Models

- `Model.NOVA_MICRO`
- `Model.NOVA_LITE`
- `Model.NOVA_PRO`
- `Model.NOVA_LITE_2`

## Available Training Methods

- `TrainingMethod.CPT` - Continued Pre-Training
- `TrainingMethod.DPO_LORA` - Direct Preference Optimization with LoRA
- `TrainingMethod.DPO_FULL` - Full Direct Preference Optimization
- `TrainingMethod.SFT_LORA` - Supervised Fine-Tuning with LoRA
- `TrainingMethod.SFT_FULL` - Full Supervised Fine-Tuning
- `TrainingMethod.RFT_LORA` - Reinforcement Fine-Tuning with LoRA
- `TrainingMethod.RFT_FULL` - Full Reinforcement Fine-Tuning
- `TrainingMethod.EVALUATION` - Model evaluation

## Resources

- [Amazon Nova Models](https://aws.amazon.com/bedrock/nova/)
- [SageMaker Documentation](https://docs.aws.amazon.com/sagemaker/)
- [Amazon Bedrock](https://aws.amazon.com/bedrock/)