# Reinforcement Fine-Tuning Amazon Nova 2.0 Lite with GSM8K

This notebook walks through training an Amazon Nova model using Reinforcement Fine-Tuning (RFT) on the [GSM8K](https://huggingface.co/datasets/openai/gsm8k) math dataset.

## What's RFT?

Traditional fine-tuning shows a model examples and says "produce outputs like this." RFT takes a different approach: it lets the model generate its own responses, then uses a reward signal to reinforce good outputs and discourage bad ones. Think of it like training with a coach who gives feedback rather than just copying from a textbook.

For math problems, this works particularly well because we can automatically verify whether an answer is correct, with no human labeling needed.

## What's GSM8K?

GSM8K (Grade School Math 8K) is a dataset of roughly 8,500 grade-school math word problems (7,473 for training and 1,319 for testing). Each problem requires multi-step reasoning to solve. It has become a standard benchmark for testing whether language models can reason through a problem step by step rather than just pattern-match.

Example problem:
> *Janet's ducks lay 16 eggs per day. She eats three for breakfast and bakes muffins with four. She sells the rest at $2 each. How much does she make daily?*
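
To make the target concrete, here is how that problem works out, alongside a solution written in the GSM8K answer style, where the final answer follows a `####` marker. The solution string is an illustrative paraphrase rather than a verbatim dataset record; the regular expression is the same one the preprocessing cell uses later in this notebook.

```python
import re

# Illustrative solution text in the GSM8K style (not copied verbatim from the dataset)
solution = (
    "Janet has 16 - 3 - 4 = 9 eggs left to sell each day.\n"
    "She makes 9 * 2 = 18 dollars per day.\n"
    "#### 18"
)

eggs_left = 16 - 3 - 4          # eggs remaining after breakfast and baking
daily_income = eggs_left * 2    # each remaining egg sells for $2

# Pull the final answer out from after the "####" marker
match = re.search(r"####\s*(-?\d+(?:,\d+)*)", solution)
extracted = match.group(1).replace(",", "") if match else ""

print(daily_income, extracted)  # 18 18
```

Because the final answer sits in a fixed, machine-readable position, checking a model's response reduces to a string comparison.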

## What we'll build

1. Prepare GSM8K data in the format Bedrock RFT expects
2. Deploy a Lambda function that scores model responses (correct answer = reward)
3. Kick off an RFT training job on Amazon Bedrock
4. Monitor the job until completion

By the end, you'll have a Nova model that's better at step-by-step math reasoning.

## Prerequisites: SageMaker Role Permissions

**NOTE:** If you are running this notebook with an AWS profile that has administrator access, you can skip this section.

Otherwise, your SageMaker execution role needs the following IAM permissions:

| Service | Actions | Resources | Why |
|---------|---------|-----------|-----|
| **S3** | `PutObject`, `GetObject`, `ListBucket`, `DeleteObject` | `arn:aws:s3:::YOUR-BUCKET/*` and `arn:aws:s3:::YOUR-BUCKET` | Upload/download training data |
| **IAM** | `CreateRole`, `GetRole`, `AttachRolePolicy`, `PutRolePolicy`, **`PassRole`** | `arn:aws:iam::ACCOUNT:role/GSM8K-Lambda-Role`, `arn:aws:iam::ACCOUNT:role/BedrockRFT-gsm8k-Role` | Create Lambda & Bedrock roles |
| **Lambda** | `CreateFunction`, `GetFunction`, `UpdateFunctionCode`, `InvokeFunction` | `arn:aws:lambda:REGION:ACCOUNT:function:gsm8k-reward-function` | Deploy reward function |
| **Bedrock** | `CreateModelCustomizationJob`, `GetModelCustomizationJob` | `*` | Start/monitor training |
| **STS** | `GetCallerIdentity` | `*` | Get account info |

### To Add These Permissions:

1. Go to [IAM Console](https://console.aws.amazon.com/iam) (with Admin access) → Roles → Your SageMaker role
2. **Add permissions** → **Create inline policy** → **JSON** tab
3. Paste a policy that grants the actions in the table above, then save it

**Critical**: Ensure `iam:PassRole` is included. Without it, the notebook cannot hand the execution roles to Lambda and Bedrock, so creating the function or the training job fails.

If you get `AccessDenied` errors while running the notebook, you're missing one of these permissions.
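
As a starting point, the sketch below assembles an inline policy from the table above and prints it as JSON for the IAM console. The account ID, region, and bucket are placeholders, and the role and function names are the ones the configuration cell in section 1 generates (`GSM8K-Lambda-Role`, `BedrockRFT-gsm8k-Role`, `gsm8k-reward-function`). Treat it as a draft to review rather than a vetted least-privilege policy; your environment may need additional actions.

```python
import json

# Placeholders: replace with your own values before pasting the output into IAM
ACCOUNT = "123456789012"
REGION = "us-east-1"
BUCKET = "YOUR-BUCKET"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
        {"Effect": "Allow", "Action": "s3:ListBucket", "Resource": f"arn:aws:s3:::{BUCKET}"},
        {
            "Effect": "Allow",
            "Action": [
                "iam:CreateRole", "iam:GetRole", "iam:AttachRolePolicy",
                "iam:PutRolePolicy", "iam:PassRole",
            ],
            "Resource": [
                f"arn:aws:iam::{ACCOUNT}:role/GSM8K-Lambda-Role",
                f"arn:aws:iam::{ACCOUNT}:role/BedrockRFT-gsm8k-Role",
            ],
        },
        {
            "Effect": "Allow",
            "Action": [
                "lambda:CreateFunction", "lambda:GetFunction",
                "lambda:UpdateFunctionCode", "lambda:InvokeFunction",
            ],
            "Resource": f"arn:aws:lambda:{REGION}:{ACCOUNT}:function:gsm8k-reward-function",
        },
        {
            "Effect": "Allow",
            "Action": ["bedrock:CreateModelCustomizationJob", "bedrock:GetModelCustomizationJob"],
            "Resource": "*",
        },
        {"Effect": "Allow", "Action": "sts:GetCallerIdentity", "Resource": "*"},
    ],
}

print(json.dumps(policy, indent=2))
```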

**Once you've updated your permissions, restart your notebook kernel so the changes take effect.**

---
## 0. Install Dependencies

We need `datasets` to pull GSM8K from HuggingFace, and up-to-date AWS SDK packages.

In [None]:
%pip install -qU datasets boto3 botocore

---
## 1. Configuration & Data Prep

First, set your AWS region, S3 bucket, and profile. Then we'll pull GSM8K from HuggingFace, format it for Bedrock RFT, and upload to S3.

The key formatting requirement: each training example needs a `messages` list (a system prompt plus the math question as the user turn) and a `reference_answer` object holding the `final_answer` that our reward function will check against.
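
As a rough illustration, the sketch below builds one record in the shape the preprocessing cell writes (a `messages` list plus a `reference_answer` object) and checks that it survives a JSONL round trip. The question and answer are made up, and `validate_record` is a hypothetical helper, not part of the notebook's `helpers` module or of any Bedrock API.

```python
import json

def validate_record(record):
    """Hypothetical check: return a list of problems found in one training record."""
    problems = []
    roles = [m.get("role") for m in record.get("messages", [])]
    if "user" not in roles:
        problems.append("no user message")
    if "assistant" in roles:
        problems.append("prompt should not contain an assistant turn")
    if not record.get("reference_answer", {}).get("final_answer"):
        problems.append("missing reference_answer.final_answer")
    return problems

record = {
    "messages": [
        {"role": "system", "content": "You are a helpful math tutor who solves word problems step by step."},
        {"role": "user", "content": "Tom has 3 apples and buys 4 more. How many apples does he have? "
                                    "Let's think step by step and output the final answer after \"####\"."},
    ],
    "reference_answer": {"final_answer": "7", "steps": ["3 + 4 = 7"]},
    "task_id": "gsm8k_train_0",
}

line = json.dumps(record)                 # one JSONL line
print(validate_record(json.loads(line)))  # []
```

A check like this is mainly useful for catching rows where the `####` answer could not be extracted, since the preprocessing code stores an empty string in that case.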

In [None]:
import sys
sys.path.insert(0, "../..")

import boto3
import json
import time
import os
import re
from datasets import load_dataset

from helpers import (
    create_lambda_deployment_package,
    cleanup_lambda_deployment_package
)

# ============== UPDATE THESE VALUES ==============
AWS_REGION = "us-east-1"
S3_BUCKET = "your-bucket-name"
AWS_PROFILE = None  # Set to your profile name, or None for default credentials
# =================================================

# Create session
session = boto3.Session(profile_name=AWS_PROFILE, region_name=AWS_REGION) if AWS_PROFILE else boto3.Session(region_name=AWS_REGION)
AWS_ACCOUNT_ID = session.client('sts').get_caller_identity()['Account']

# Dataset configuration
DATASET_NAME = "gsm8k"
HF_DATASET = "openai/gsm8k"
TOTAL_SAMPLES = None  # Set to None to use all available data, or an integer to limit
LOCAL_DATA_DIR = "../../tmp-data"

assert S3_BUCKET != "your-bucket-name", "Please update S3_BUCKET with your own bucket name"
S3_OUTPUT_PATH = f"s3://{S3_BUCKET}/rft-output/"

# Resource names
LAMBDA_FUNCTION_NAME = f"{DATASET_NAME}-reward-function"
LAMBDA_ROLE_NAME = f"{DATASET_NAME.upper()}-Lambda-Role"
BEDROCK_ROLE_NAME = f"BedrockRFT-{DATASET_NAME}-Role"
REWARD_FUNCTION_FILE = f"../../reward-functions/{DATASET_NAME}_rew_func.py"
REWARD_FUNCTION_MODULE = f"{DATASET_NAME}_rew_func"

# Model configuration
BASE_MODEL_ID = f"arn:aws:bedrock:{AWS_REGION}::foundation-model/amazon.nova-2-lite-v1:0:256k"

# Initialize AWS clients
s3_client = session.client('s3')
bedrock_client = session.client('bedrock')
lambda_client = session.client('lambda')
iam_client = session.client('iam')


In [None]:
def format_size(n):
    """Format sample count as human-readable string (e.g., 7k, 1.2k)."""
    if n >= 1000:
        return f"{n/1000:.0f}k" if n % 1000 == 0 else f"{n/1000:.1f}k"
    return str(n)

# --- Preprocess GSM8K ---
def preprocess_gsm8k(hf_path, total_samples, output_dir, train_ratio=0.8, val_ratio=0.1):
    os.makedirs(output_dir, exist_ok=True)
    ds = load_dataset(hf_path, "main")

    # Use all data if total_samples is None
    available = len(ds["train"])
    total = min(total_samples, available) if total_samples else available

    train_size = int(total * train_ratio)
    val_size = int(total * val_ratio)
    test_size = total - train_size - val_size

    def extract_answer(answer_text):
        match = re.search(r'####\s*(-?\d+(?:,\d+)*)', answer_text)
        return match.group(1).replace(',', '') if match else ""

    def format_row(row, idx, split):
        final_answer = extract_answer(row['answer'])

        # Extract reasoning steps from the answer
        steps = []
        if '####' in row['answer']:
            reasoning_part = row['answer'].split('####')[0].strip()
            steps = [s.strip() for s in reasoning_part.split('\n') if s.strip()]

        return {
            "messages": [
                {"role": "system", "content": "You are a helpful math tutor who solves word problems step by step."},
                {"role": "user", "content": f"{row['question']} Let's think step by step and output the final answer after \"####\"."}
            ],
            "reference_answer": {
                "final_answer": final_answer,
                "steps": steps if steps else None
            },
            "task_id": f"gsm8k_{split}_{idx}",
            "domain": "math",
            "difficulty_level": "grade_school",
            "data_source": hf_path,
            "original_question": row['question'],
            "original_answer": row['answer']
        }

    def write_split(data, start_idx, size, filename, split_name):
        with open(f"{output_dir}/{filename}", "w") as f:
            for i, row in enumerate(data.select(range(start_idx, start_idx + size))):
                f.write(json.dumps(format_row(row, i, split_name)) + "\n")
        print(f"✓ Created {output_dir}/{filename} ({size} samples)")

    hf_train = ds["train"].shuffle(seed=42)
    write_split(hf_train, 0, train_size, "train.jsonl", "train")
    write_split(hf_train, train_size, val_size, "val.jsonl", "val")
    write_split(hf_train, train_size + val_size, test_size, "test.jsonl", "test")

    return train_size, val_size, test_size

print("Preprocessing GSM8K dataset...")
train_size, val_size, test_size = preprocess_gsm8k(HF_DATASET, TOTAL_SAMPLES, LOCAL_DATA_DIR)

# S3 paths with sample counts in filenames
S3_TRAINING_DATA = f"s3://{S3_BUCKET}/rft-data/datasets/{DATASET_NAME}/train-{format_size(train_size)}.jsonl"
S3_VALIDATION_DATA = f"s3://{S3_BUCKET}/rft-data/datasets/{DATASET_NAME}/val-{format_size(val_size)}.jsonl"
S3_TEST_DATA = f"s3://{S3_BUCKET}/rft-data/datasets/{DATASET_NAME}/test-{format_size(test_size)}.jsonl"

print("\nUploading to S3...")
for local_file, s3_uri in [
    ("train.jsonl", S3_TRAINING_DATA),
    ("val.jsonl", S3_VALIDATION_DATA),
    ("test.jsonl", S3_TEST_DATA)
]:
    s3_key = '/'.join(s3_uri.split('/')[3:])
    s3_client.upload_file(f"{LOCAL_DATA_DIR}/{local_file}", S3_BUCKET, s3_key)
    print(f"✓ Uploaded {s3_uri.split('/')[-1]}")

print(f"\n✓ Ready | {train_size} train / {val_size} val / {test_size} test")

In [None]:
# Clean up temporary local data
import shutil

print("\nCleaning up temporary files...")
if os.path.exists(LOCAL_DATA_DIR):
    shutil.rmtree(LOCAL_DATA_DIR)
    print(f"✓ Removed {LOCAL_DATA_DIR}")
else:
    print("✓ No temporary files to clean")

---
## 2. Deploy the Reward Function

The reward function is the "coach" in our RFT setup. During training, Bedrock generates candidate responses and sends them to this Lambda. The Lambda extracts the model's answer, compares it to the ground truth, and returns a score (1.0 for correct, 0.0 for wrong).
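
The actual reward code lives in `gsm8k_rew_func.py`, which this notebook does not show. As a minimal sketch of the idea, a handler along the following lines would implement the 1.0/0.0 scoring described above. The input and output shapes (`id`, `messages`, `metadata.reference_answer.final_answer`, `aggregate_reward_score`) are inferred from the test cell in section 3, so treat them as assumptions rather than the definitive contract.

```python
import re

def extract_final_answer(text):
    """Return the number after the last '####' marker, without commas, or None."""
    matches = re.findall(r"####\s*(-?\d+(?:,\d+)*)", text)
    return matches[-1].replace(",", "") if matches else None

def lambda_handler(event, context):
    """Score each sample: 1.0 if the extracted answer matches the reference, else 0.0."""
    results = []
    for sample in event:
        # Use the last assistant message as the model's response
        reply = next(
            (m["content"] for m in reversed(sample["messages"]) if m["role"] == "assistant"),
            "",
        )
        expected = str(sample["metadata"]["reference_answer"]["final_answer"])
        score = 1.0 if extract_final_answer(reply) == expected else 0.0
        results.append({"id": sample["id"], "aggregate_reward_score": score})
    return results

# Local call with a payload shaped like the notebook's smoke test
event = [{
    "id": "demo_001",
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "2 + 2 = 4\n\n#### 4"},
    ],
    "metadata": {"reference_answer": {"final_answer": "4"}},
}]
print(lambda_handler(event, None))  # [{'id': 'demo_001', 'aggregate_reward_score': 1.0}]
```

An exact-match binary reward keeps the signal unambiguous; a response that reasons correctly but omits the `####` marker scores 0.0, which also nudges the model toward the requested output format.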

We also create the IAM roles that Bedrock and Lambda need to do their jobs.

In [None]:
# Create Lambda execution role
print("Creating Lambda execution role...")

lambda_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Principal": {"Service": "lambda.amazonaws.com"}, "Action": "sts:AssumeRole"}]
}

try:
    response = iam_client.create_role(
        RoleName=LAMBDA_ROLE_NAME,
        AssumeRolePolicyDocument=json.dumps(lambda_trust_policy),
        Description=f"Execution role for {DATASET_NAME} reward function"
    )
    lambda_role_arn = response['Role']['Arn']
    iam_client.attach_role_policy(RoleName=LAMBDA_ROLE_NAME, PolicyArn='arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole')
    print(f"✓ Created role: {LAMBDA_ROLE_NAME}")
    print("Waiting 10s for role propagation...")
    time.sleep(10)
except iam_client.exceptions.EntityAlreadyExistsException:
    lambda_role_arn = iam_client.get_role(RoleName=LAMBDA_ROLE_NAME)['Role']['Arn']
    print(f"✓ Using existing role: {LAMBDA_ROLE_NAME}")

# Package and deploy Lambda
lambda_zip_content = create_lambda_deployment_package(
    source_file=REWARD_FUNCTION_FILE,
    zip_filename="lambda_deployment.zip",
    archive_name=f"{REWARD_FUNCTION_MODULE}.py"
)

print(f"\nDeploying Lambda: {LAMBDA_FUNCTION_NAME}...")
try:
    lambda_client.get_function(FunctionName=LAMBDA_FUNCTION_NAME)
    lambda_client.update_function_code(FunctionName=LAMBDA_FUNCTION_NAME, ZipFile=lambda_zip_content)
    waiter = lambda_client.get_waiter('function_updated_v2')
    waiter.wait(FunctionName=LAMBDA_FUNCTION_NAME)
    print("✓ Updated existing function")
except lambda_client.exceptions.ResourceNotFoundException:
    lambda_client.create_function(
        FunctionName=LAMBDA_FUNCTION_NAME,
        Runtime='python3.11',
        Role=lambda_role_arn,
        Handler=f"{REWARD_FUNCTION_MODULE}.lambda_handler",
        Code={'ZipFile': lambda_zip_content},
        Timeout=300,
        MemorySize=512
    )
    print("✓ Created new function")

waiter = lambda_client.get_waiter('function_active_v2')
waiter.wait(FunctionName=LAMBDA_FUNCTION_NAME)
lambda_arn = lambda_client.get_function(FunctionName=LAMBDA_FUNCTION_NAME)['Configuration']['FunctionArn']
print(f"✓ Lambda ready: {lambda_arn}")

# Create Bedrock role
print(f"\nCreating Bedrock role: {BEDROCK_ROLE_NAME}...")

bedrock_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Principal": {"Service": "bedrock.amazonaws.com"}, "Action": "sts:AssumeRole"}]
}

bedrock_permissions = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:GetObject", "s3:ListBucket"], "Resource": [f"arn:aws:s3:::{S3_BUCKET}/*", f"arn:aws:s3:::{S3_BUCKET}"]},
        {"Effect": "Allow", "Action": "s3:PutObject", "Resource": f"arn:aws:s3:::{S3_BUCKET}/rft-output/*"},
        {"Effect": "Allow", "Action": "lambda:InvokeFunction", "Resource": lambda_arn}
    ]
}

try:
    response = iam_client.create_role(
        RoleName=BEDROCK_ROLE_NAME,
        AssumeRolePolicyDocument=json.dumps(bedrock_trust_policy),
        Description="Execution role for Bedrock RFT"
    )
    bedrock_role_arn = response['Role']['Arn']
    print(f"✓ Created role: {BEDROCK_ROLE_NAME}")
except iam_client.exceptions.EntityAlreadyExistsException:
    bedrock_role_arn = iam_client.get_role(RoleName=BEDROCK_ROLE_NAME)['Role']['Arn']
    print(f"✓ Using existing role: {BEDROCK_ROLE_NAME}")

iam_client.put_role_policy(RoleName=BEDROCK_ROLE_NAME, PolicyName='BedrockRFTPermissions', PolicyDocument=json.dumps(bedrock_permissions))
print(f"✓ Bedrock role ready: {bedrock_role_arn}")

cleanup_lambda_deployment_package()

---
## 3. Test the Reward Function

Before kicking off a multi-hour training job, let's make sure our reward function actually works. We'll send it a sample response and verify it returns the expected score.

In [None]:
print("Testing reward function...")

test_payload = [{
    "id": "test_001",
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "Let me solve this step by step.\n\n2 + 2 = 4\n\n#### 4"}
    ],
    "metadata": {"reference_answer": {"final_answer": "4"}}
}]

response = lambda_client.invoke(
    FunctionName=LAMBDA_FUNCTION_NAME,
    InvocationType='RequestResponse',
    Payload=json.dumps(test_payload)
)

result = json.loads(response['Payload'].read())
print(json.dumps(result, indent=2))

if 'errorMessage' in result:
    print(f"\n✗ Error: {result['errorMessage']}")
elif isinstance(result, list) and result[0].get('aggregate_reward_score') == 1.0:
    print("\n✓ Reward function working correctly!")
else:
    print("\n⚠ Unexpected result - check the output above")

### Test with Real Training & Validation Data

Let's verify the reward function works correctly with actual samples from our dataset by simulating correct model responses.

In [None]:
import random

def load_samples_from_s3(s3_uri, n=5):
    bucket = s3_uri.split('/')[2]
    key = '/'.join(s3_uri.split('/')[3:])
    obj = s3_client.get_object(Bucket=bucket, Key=key)
    lines = obj['Body'].read().decode('utf-8').strip().split('\n')
    return [json.loads(line) for line in random.sample(lines, min(n, len(lines)))]

def simulate_correct_response(sample):
    """Add an assistant message with the correct answer."""
    answer = sample['reference_answer']['final_answer']
    sample_copy = sample.copy()
    sample_copy['messages'] = sample['messages'] + [
        {'role': 'assistant', 'content': f'Working through this step by step...\n\n#### {answer}'}
    ]
    return sample_copy

print('Loading samples from S3...')
train_samples = load_samples_from_s3(S3_TRAINING_DATA, n=5)
val_samples = load_samples_from_s3(S3_VALIDATION_DATA, n=5)

# Simulate correct responses
test_payloads = [simulate_correct_response(s) for s in train_samples + val_samples]

print(f'Testing {len(test_payloads)} samples ({len(train_samples)} train + {len(val_samples)} val)...')
response = lambda_client.invoke(
    FunctionName=LAMBDA_FUNCTION_NAME,
    InvocationType='RequestResponse',
    Payload=json.dumps(test_payloads)
)
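results = json.loads(response['Payload'].read())

# A Lambda failure comes back as a dict with 'errorMessage' rather than a list of scores
if isinstance(results, dict) and 'errorMessage' in results:
    raise RuntimeError(f"Reward function error: {results['errorMessage']}")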

print('\nResults:')
for r in results:
    score = r.get('aggregate_reward_score', 0)
    status = '✓' if score == 1.0 else '✗'
    print(f"  {status} {r['id']}: {score}")

total_score = sum(r.get('aggregate_reward_score', 0) for r in results)
print(f'\nTotal: {total_score}/{len(results)} correct')

if total_score == len(results):
    print('✓ All samples scored correctly!')
else:
    print('⚠ Some samples failed - check reward function logic')

### Analyze Dataset for Hyperparameter Selection

Before training, let's analyze our dataset to set appropriate values for `maxPromptLength` and `inferenceMaxTokens`.

**Key hyperparameters to consider:**

| Parameter | What it controls | Trade-off |
|-----------|-----------------|-----------|
| `maxPromptLength` | Max tokens for input prompts | Higher = more context, but more memory & slower training |
| `inferenceMaxTokens` | Max tokens the model can generate per response | Higher = longer reasoning chains, but slower & more expensive |
| `trainingSamplePerPrompt` | Number of response samples per prompt | More samples = better reward estimation, but slower |
| `batchSize` | Samples per training batch | Larger = more stable gradients, but more memory |
| `epochCount` | Full passes through the dataset | More epochs = more learning, but risk of overfitting |
| `reasoningEffort` | How much "thinking" the model does | Higher = better quality, but slower inference |

**For GSM8K specifically:**
- Prompts are relatively short (math word problems)
- Responses need room for step-by-step reasoning + final answer
- We want `inferenceMaxTokens` large enough for multi-step solutions, but not wastefully large

In [None]:
%pip install -q tiktoken
import tiktoken

# cl100k_base is only a rough proxy here: it is not Nova's tokenizer,
# so treat the token counts below as approximate.
enc = tiktoken.get_encoding('cl100k_base')

def count_tokens(text):
    return len(enc.encode(text))

# Load all training samples
print('Analyzing training data...')
obj = s3_client.get_object(Bucket=S3_BUCKET, Key='/'.join(S3_TRAINING_DATA.split('/')[3:]))
samples = [json.loads(line) for line in obj['Body'].read().decode('utf-8').strip().split('\n')]

# Calculate token counts
prompt_tokens = []
answer_tokens = []

for s in samples:
    # Prompt = system + user messages
    prompt_text = ' '.join(m['content'] for m in s['messages'])
    prompt_tokens.append(count_tokens(prompt_text))

    # Reference answer (what we expect the model to produce)
    answer_tokens.append(count_tokens(s['original_answer']))

# Statistics
import statistics

print(f'\nDataset Statistics ({len(samples)} samples)')
print(f'\nPrompt tokens (input):')
print(f'  Min: {min(prompt_tokens)}, Max: {max(prompt_tokens)}, Mean: {statistics.mean(prompt_tokens):.0f}')
print(f'  P95: {sorted(prompt_tokens)[int(len(prompt_tokens)*0.95)]}, P99: {sorted(prompt_tokens)[int(len(prompt_tokens)*0.99)]}')

print(f'\nAnswer tokens (expected output):')
print(f'  Min: {min(answer_tokens)}, Max: {max(answer_tokens)}, Mean: {statistics.mean(answer_tokens):.0f}')
print(f'  P95: {sorted(answer_tokens)[int(len(answer_tokens)*0.95)]}, P99: {sorted(answer_tokens)[int(len(answer_tokens)*0.99)]}')

# Recommendations
recommended_prompt_len = sorted(prompt_tokens)[int(len(prompt_tokens)*0.99)] * 2  # 2x P99 for safety
recommended_max_tokens = sorted(answer_tokens)[int(len(answer_tokens)*0.99)] * 3  # 3x P99 for reasoning

print(f'\nRecommended hyperparameters:')
print(f'  maxPromptLength: {recommended_prompt_len} (2x P99 prompt length)')
print(f'  inferenceMaxTokens: {recommended_max_tokens} (3x P99 answer length, room for reasoning)')
print(f'\nNote: inferenceMaxTokens should be higher than reference answers since the model')
print(f'      may generate longer reasoning chains during exploration.')

---
## 4. Start the RFT Training Job

Now for the main event. We'll create a model customization job that:
- Takes our base Nova model
- Trains it on GSM8K using reinforcement learning
- Uses our Lambda to score responses

Training typically takes several hours depending on dataset size and hyperparameters.
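
For a sense of scale, the back-of-the-envelope calculation below estimates steps and generated responses per epoch for the default 80% training split. It assumes each training step consumes `batchSize` prompts, which is how the comments in the next cell reason about it; the service's internal batching is not documented in this notebook, so read the numbers as estimates.

```python
import math

GSM8K_TRAIN_ROWS = 7473      # size of the GSM8K "main" train split
train_ratio = 0.8            # default used by preprocess_gsm8k
batch_size = 32
samples_per_prompt = 4

train_prompts = int(GSM8K_TRAIN_ROWS * train_ratio)
steps_per_epoch = math.ceil(train_prompts / batch_size)
responses_per_epoch = train_prompts * samples_per_prompt

print(train_prompts, steps_per_epoch, responses_per_epoch)  # 5978 187 23912
```

With an `evalInterval` of 10, that works out to roughly 18 validation passes over one epoch.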

In [None]:
print("Creating RFT training job...")

# Generate unique model/job names with date and key hyperparams
from datetime import datetime
date_str = datetime.now().strftime('%Y%m%d')
hp_suffix = f"e{1}_bs{32}_lr{5e-5}".replace('.', '').replace('-', '')  # e1_bs32_lr5e05
CUSTOM_MODEL_NAME = f"{DATASET_NAME}-nova-rft-{date_str}-{hp_suffix}"
JOB_NAME = f"{DATASET_NAME}-rft-{date_str}-{int(time.time())}"

print(f"  Job: {JOB_NAME}")
print(f"  Model: {CUSTOM_MODEL_NAME}")
print(f"  Base: {BASE_MODEL_ID}")

response = bedrock_client.create_model_customization_job(
    jobName=JOB_NAME,
    customModelName=CUSTOM_MODEL_NAME,
    roleArn=bedrock_role_arn,
    baseModelIdentifier=BASE_MODEL_ID,
    customizationType='REINFORCEMENT_FINE_TUNING',
    trainingDataConfig={'s3Uri': S3_TRAINING_DATA},
    validationDataConfig={'validators': [{'s3Uri': S3_VALIDATION_DATA}]},
    outputDataConfig={'s3Uri': S3_OUTPUT_PATH},
    customizationConfig={
        'rftConfig': {
            'graderConfig': {'lambdaGrader': {'lambdaArn': lambda_arn}},
            'hyperParameters': {
                'batchSize': 32, # Balances stability with enough updates. If training seems unstable (reward oscillating wildly), try increasing to 64. If you want faster iteration during experimentation, you could try 16.
                'epochCount': 1, # Start with 1; increase only if validation rewards are still rising at the end of training. GSM8K is small, so extra epochs risk overfitting.
                'evalInterval': 10, # Measured in training steps (~6k training samples / batch size 32 ≈ 188 steps per epoch)
                'inferenceMaxTokens': 750,
                'learningRate': 0.00005,
                'maxPromptLength': 310,
                'reasoningEffort': 'high',
                'trainingSamplePerPrompt': 4 # With ~6k training prompts, the model generates ~24k responses per epoch.
            }
        }
    }
)

print(f"\n✓ Job created: {response['jobArn']}")

---
## 5. Monitor Training Progress

Run this cell periodically to check on your training job. Status will progress through: `InProgress` → `Completed` (or `Failed`).
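
If you would rather not re-run the cell by hand, a small polling loop can do the waiting. The sketch below is one way to write it: `wait_for_job` is a hypothetical helper that takes any zero-argument function returning the current status, so it can be tried without AWS access. The commented lines show how it might be wired to `get_model_customization_job`. The terminal states listed are an assumption based on the statuses mentioned above plus `Stopped` for manually cancelled jobs.

```python
import time

TERMINAL_STATES = {"Completed", "Failed", "Stopped"}

def wait_for_job(get_status, poll_seconds=300, max_polls=200, sleep=time.sleep):
    """Poll get_status() until it returns a terminal state or max_polls is reached."""
    status = None
    for _ in range(max_polls):
        status = get_status()
        print(f"Status: {status}")
        if status in TERMINAL_STATES:
            break
        sleep(poll_seconds)
    return status

# In this notebook it could be wired up roughly as:
# final_status = wait_for_job(
#     lambda: bedrock_client.get_model_customization_job(jobIdentifier=JOB_NAME)["status"]
# )

# Dry run with canned statuses and no real sleeping
canned = iter(["InProgress", "InProgress", "Completed"])
print(wait_for_job(lambda: next(canned), poll_seconds=0, sleep=lambda s: None))
```

Keeping the poll interval at several minutes is reasonable here, since the job runs for hours and frequent polling adds nothing.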

In [None]:
response = bedrock_client.get_model_customization_job(jobIdentifier=JOB_NAME)
print(f"Job: {JOB_NAME}")
print(f"Status: {response['status']}")

if response['status'] == 'Completed' and 'outputModelArn' in response:
    print(f"\n✓ Training complete!")
    print(f"  Model ARN: {response['outputModelArn']}")
elif response['status'] == 'Failed':
    print(f"\n✗ Training failed: {response.get('failureMessage', 'Unknown error')}")
elif response['status'] == 'InProgress':
    print("\nStill training... run this cell again to check progress")


## Conclusion

Congratulations, you've successfully launched a Reinforcement Fine-Tuning job for Amazon Nova on the GSM8K math dataset.

### What You've Built

- **Preprocessed GSM8K dataset** into Bedrock RFT format  
- **Deployed a Lambda reward function** that scores model responses  
- **Created IAM roles** for Lambda and Bedrock execution  
- **Started an RFT training job** with customized hyperparameters  

### Next Steps

Once your training job completes (check status in cell above):

1. **Test your fine-tuned model** via the Bedrock API using the model ARN
2. **Evaluate performance** on the held-out test set (uploaded to S3 at `S3_TEST_DATA`; the local `test.jsonl` is removed by the cleanup cell)
3. **Compare results** against the base Nova model
4. **Experiment with hyperparameters** (learning rate, batch size, epochs) for better performance
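
For the evaluation and comparison steps, one simple metric is exact-match accuracy on the final answer, mirroring the reward used in training. The sketch below assumes you have already collected model responses as plain strings; how you obtain them (for example, which Bedrock inference call and deployment option you use) is outside this notebook. The sample responses are made up, and `accuracy` is a hypothetical helper.

```python
import re

def extract_final_answer(text):
    """Return the number after the last '####' marker, without commas, or None."""
    matches = re.findall(r"####\s*(-?\d+(?:,\d+)*)", text)
    return matches[-1].replace(",", "") if matches else None

def accuracy(responses, references):
    """Fraction of responses whose extracted final answer equals the reference."""
    if not references:
        return 0.0
    correct = sum(
        extract_final_answer(resp) == str(ref)
        for resp, ref in zip(responses, references)
    )
    return correct / len(references)

# Made-up responses: the second one has no "####" marker, so it counts as wrong
responses = ["16 - 3 - 4 = 9, 9 * 2 = 18\n#### 18", "I think it's 5", "#### 1,200"]
references = ["18", "5", "1200"]

print(f"{accuracy(responses, references):.2f}")  # 0.67
```

Running the same function over responses from the base model and the fine-tuned model on identical test prompts gives a like-for-like comparison.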


### Learn More

- [Amazon Bedrock RFT Documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/reinforcement-fine-tuning.html)
- [Amazon Nova 2 Lite](https://docs.aws.amazon.com/ai/responsible-ai/nova-2-lite/overview.html)
- [GSM8K Dataset on HuggingFace](https://huggingface.co/datasets/openai/gsm8k)