# Reinforcement Fine-Tuning Amazon Nova 2.0 Lite with FinQA

This notebook walks through training an Amazon Nova model using Reinforcement Fine-Tuning (RFT) on the [FinQA](https://huggingface.co/datasets/ibm-research/finqa) dataset.

## What's RFT?

Traditional fine-tuning shows a model examples and says "produce outputs like this." RFT takes a different approach: it lets the model generate its own responses, then uses a reward signal to reinforce good outputs and discourage bad ones. Think of it like training with a coach who gives feedback rather than just copying from a textbook.

For math problems, this works particularly well because we can automatically verify if an answer is correct—no human labeling needed.
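
To make the "coach" idea concrete, here is a toy sketch of how per-response rewards can become a training signal. It assumes a group-relative baseline, where each response is compared with the mean reward of the other responses sampled for the same prompt. That is one common approach and not necessarily what Bedrock does internally; the helper name `group_advantages` is made up for illustration.

```python
from statistics import mean

def group_advantages(rewards):
    """Center each reward on the group mean (hypothetical helper).

    Positive values mark responses to reinforce, negative values mark
    responses to discourage. Illustrative baseline only.
    """
    baseline = mean(rewards)
    return [r - baseline for r in rewards]

# Four sampled responses to one prompt: two correct (1.0), two wrong (0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_advantages(rewards))  # [0.5, -0.5, -0.5, 0.5]
```

If every response in a group earns the same reward, all advantages are zero, so that prompt contributes no learning signal under this scheme.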

## What's FinQA?

[FinQA](https://github.com/czyssrs/FinQA) is a dataset of 8,281 question-answer pairs derived from 2,789 earnings reports of S&P 500 companies. Each example contains:

- **Textual context**: Paragraphs from financial reports describing business performance
- **Structured data**: Tables with financial metrics (revenue, expenses, ratios, etc.)
- **Multi-step reasoning questions**: Questions requiring numerical calculations across both text and tables

Unlike simple QA datasets, FinQA requires models to:
1. Locate relevant numbers across text and tables
2. Determine the correct mathematical operations (addition, subtraction, division, percentage change)
3. Execute multi-step calculations to arrive at the final answer

This makes it an ideal benchmark for testing financial reasoning capabilities—and a great candidate for RFT, since answers can be automatically verified.



> *Example problem:*

> **Context:** The following table shows the breakdown of net revenues by segment.
>
> | $ in millions | 2018 | 2017 | 2016 |
> |---------------|------|------|------|
> | Investment Banking | 7862 | 7371 | 6273 |
> | Institutional Client Services | 13482 | 11802 | 14342 |
>
> *What is the percentage change in investment banking revenue from 2017 to 2018?*
>
> **Answer:** 6.66%
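
The example above follows the three steps listed earlier (locate, choose operations, execute). A minimal sketch of that arithmetic, using the two Investment Banking figures from the table:

```python
# Step 1: locate the two values in the table ($ in millions).
revenue_2017 = 7371
revenue_2018 = 7862

# Step 2: choose the operations (subtract, then divide by the base year).
change = revenue_2018 - revenue_2017   # 491
pct_change = change / revenue_2017 * 100

# Step 3: report the result to two decimal places.
print(f"{pct_change:.2f}%")  # 6.66%
```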

## What we'll build

1. Prepare FinQA data in the format Bedrock RFT expects
2. Deploy a Lambda function that scores model responses (correct answer = reward)
3. Kick off an RFT training job on Amazon Bedrock
4. Monitor the job until completion

By the end, you'll have a Nova model that's better at step-by-step math reasoning.

## Prerequisites: SageMaker Role Permissions

**NOTE:** If you are running this notebook with an AWS profile that has Admin access, you can skip this section.

Otherwise, the SageMaker execution role running this notebook needs these IAM permissions:

| Service | Actions | Resources | Why |
|---------|---------|-----------|-----|
| **S3** | `PutObject`, `GetObject`, `ListBucket`, `DeleteObject` | `arn:aws:s3:::YOUR-BUCKET/*` and `arn:aws:s3:::YOUR-BUCKET` | Upload/download training data |
| **IAM** | `CreateRole`, `GetRole`, `AttachRolePolicy`, `PutRolePolicy`, **`PassRole`** | `arn:aws:iam::ACCOUNT:role/FINQA-Lambda-Role`, `arn:aws:iam::ACCOUNT:role/BedrockRFTRole` | Create Lambda & Bedrock roles |
| **Lambda** | `CreateFunction`, `GetFunction`, `UpdateFunctionCode`, `InvokeFunction` | `arn:aws:lambda:REGION:ACCOUNT:function:finqa-reward-function` | Deploy reward function |
| **Bedrock** | `CreateModelCustomizationJob`, `GetModelCustomizationJob` | `*` | Start/monitor training |
| **STS** | `GetCallerIdentity` | `*` | Get account info |

### To Add These Permissions:

1. Go to [IAM Console](https://console.aws.amazon.com/iam) (with Admin access) → Roles → Your SageMaker role
2. **Add permissions** → **Create inline policy** → **JSON** tab
3. Paste a policy that covers the actions and resources in the table above, then save it

**Critical**: Ensure `iam:PassRole` is included. It lets this notebook hand the execution roles to Lambda and Bedrock; without it, creating the function or the training job fails.

If you get `AccessDenied` errors while running the notebook, you're missing one of these permissions.
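
As a starting point, the permissions table could translate into an inline policy like the sketch below. The account ID, region, and bucket name are placeholders, and the statement layout is an assumption rather than an official template; tighten the resources to suit your environment before pasting the printed JSON into the console.

```python
import json

# Placeholders: replace with your own values before use.
ACCOUNT = "123456789012"
REGION = "us-east-1"
BUCKET = "YOUR-BUCKET"

role_arns = [
    f"arn:aws:iam::{ACCOUNT}:role/FINQA-Lambda-Role",
    f"arn:aws:iam::{ACCOUNT}:role/BedrockRFTRole",
]

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket", "s3:DeleteObject"],
         "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"]},
        {"Effect": "Allow",
         "Action": ["iam:CreateRole", "iam:GetRole", "iam:AttachRolePolicy",
                    "iam:PutRolePolicy", "iam:PassRole"],
         "Resource": role_arns},
        {"Effect": "Allow",
         "Action": ["lambda:CreateFunction", "lambda:GetFunction",
                    "lambda:UpdateFunctionCode", "lambda:InvokeFunction"],
         "Resource": f"arn:aws:lambda:{REGION}:{ACCOUNT}:function:finqa-reward-function"},
        {"Effect": "Allow",
         "Action": ["bedrock:CreateModelCustomizationJob", "bedrock:GetModelCustomizationJob"],
         "Resource": "*"},
        {"Effect": "Allow", "Action": "sts:GetCallerIdentity", "Resource": "*"},
    ],
}

# Print the JSON document to paste into the IAM console.
print(json.dumps(policy, indent=2))
```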

**Once you've updated your permissions, restart your notebook kernel so the changes take effect.**

---
## 0. Install Dependencies

In [None]:
%pip install -qU boto3 botocore

---
## 1. Configuration & Data Prep

First, set your AWS region, S3 bucket, and profile. Then we'll pull FinQA from GitHub, format it for Bedrock RFT, and upload to S3.

The key formatting requirement: each training example needs a `messages` list (the prompt) and a `reference_answer` object holding the ground-truth answer that our reward function will check against.
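
For orientation, one line of `train.jsonl` looks roughly like the record built below. The field names mirror what the preprocessing cell in this notebook writes; the system prompt, table values, and question are invented for illustration.

```python
import json

# Invented example values; only the field names come from this notebook.
record = {
    "messages": [
        {"role": "system", "content": "You are a financial analyst."},
        {"role": "user", "content": (
            "Table:\nyear | revenue\n2017 | 100\n2018 | 125\n\n"
            "Question: what is the percentage increase in revenue?"
        )},
    ],
    "reference_answer": {"answer": "25"},
}

# JSONL means one compact JSON object per line.
line = json.dumps(record)
print(line)

# The ground truth can be read back out of the same field.
parsed = json.loads(line)
print(parsed["reference_answer"]["answer"])  # 25
```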

In [None]:
import sys
sys.path.insert(0, "../..")

import boto3
import json
import time
import os
import random
import urllib.request

from helpers import (
    create_lambda_deployment_package,
    cleanup_lambda_deployment_package
)

# ============== UPDATE THESE VALUES ==============
AWS_REGION = "us-east-1"
S3_BUCKET = "your-bucket-name"
AWS_PROFILE = None  # Set to your profile name, or None for default credentials
# =================================================

# Create session
session = boto3.Session(profile_name=AWS_PROFILE, region_name=AWS_REGION) if AWS_PROFILE else boto3.Session(region_name=AWS_REGION)
AWS_ACCOUNT_ID = session.client('sts').get_caller_identity()['Account']

# Dataset configuration
DATASET_NAME = "finqa"
FINQA_BASE_URL = "https://raw.githubusercontent.com/czyssrs/FinQA/main/dataset"
TRAIN_SAMPLES = 6251  # Number of samples from train.json (the full training split)
VAL_SAMPLES = 883     # Number of samples from dev.json
TEST_SAMPLES = 1147    # Number of samples from test.json
LOCAL_DATA_DIR = "../../tmp-data"

assert S3_BUCKET != "your-bucket-name", "Please update S3_BUCKET with your own bucket name"

# S3 paths
S3_TRAINING_DATA = f"s3://{S3_BUCKET}/rft-data/datasets/{DATASET_NAME}/train.jsonl"
S3_VALIDATION_DATA = f"s3://{S3_BUCKET}/rft-data/datasets/{DATASET_NAME}/val.jsonl"
S3_OUTPUT_PATH = f"s3://{S3_BUCKET}/rft-output/"

# Resource names
LAMBDA_FUNCTION_NAME = f"{DATASET_NAME}-reward-function"
LAMBDA_ROLE_NAME = f"{DATASET_NAME.upper()}-Lambda-Role"
BEDROCK_ROLE_NAME = "BedrockRFTRole"
REWARD_FUNCTION_FILE = f"../../reward-functions/{DATASET_NAME}_rew_func.py"
REWARD_FUNCTION_MODULE = f"{DATASET_NAME}_rew_func"

# Model configuration
BASE_MODEL_ID = f"arn:aws:bedrock:{AWS_REGION}::foundation-model/amazon.nova-2-lite-v1:0:256k"
CUSTOM_MODEL_NAME = f"{DATASET_NAME}-nova-lite-rft-{int(time.time())}"
JOB_NAME = f"{DATASET_NAME}-rft-job-{int(time.time())}"

# Initialize AWS clients
s3_client = session.client('s3')
bedrock_client = session.client('bedrock')
lambda_client = session.client('lambda')
iam_client = session.client('iam')

In [None]:
# --- Preprocess FinQA ---
def preprocess_finqa(base_url, output_dir, train_samples=256, val_samples=32, test_samples=32):
    os.makedirs(output_dir, exist_ok=True)

    def download_json(filename):
        url = f"{base_url}/{filename}"
        print(f"Downloading {url}...")
        with urllib.request.urlopen(url) as response:
            return json.loads(response.read().decode())

    def format_table(table):
        if not table:
            return ""
        return "\n".join(" | ".join(str(cell) for cell in row) for row in table)

    def format_row(item, idx, split):
        qa = item.get("qa", {})
        question = qa.get("question", "")
        answer = str(qa.get("exe_ans", qa.get("answer", "")))

        pre_text = " ".join(item.get("pre_text", []))
        post_text = " ".join(item.get("post_text", []))
        table_str = format_table(item.get("table", []))

        context_parts = []
        if pre_text:
            context_parts.append(f"Context:\n{pre_text}")
        if table_str:
            context_parts.append(f"\nTable:\n{table_str}")
        if post_text:
            context_parts.append(f"\n{post_text}")
        context = "\n".join(context_parts)

        user_content = f"""{context}

Question: {question}

Solve this step by step. Show your reasoning, then provide your final answer in the format: ANSWER: <your answer>"""

        return {
            "messages": [
                {"role": "system", "content": "You are a financial analyst who answers questions about financial data and tables. Provide step-by-step reasoning and calculations."},
                {"role": "user", "content": user_content}
            ],
            "reference_answer": {"answer": answer},
            "task_id": f"finqa_{split}_{idx}",
            "domain": "finance",
            "data_source": "finqa",
            "original_question": question,
            "original_program": qa.get("program", "")
        }

    def write_split(data, num_samples, filename, split_name):
        random.seed(42)
        samples = random.sample(data, min(num_samples, len(data)))
        with open(f"{output_dir}/{filename}", "w") as f:
            for i, item in enumerate(samples):
                f.write(json.dumps(format_row(item, i, split_name)) + "\n")
        print(f"✓ Created {output_dir}/{filename} ({len(samples)} samples)")
        return len(samples)

    train_data = download_json("train.json")
    dev_data = download_json("dev.json")
    test_data = download_json("test.json")

    train_count = write_split(train_data, train_samples, "train.jsonl", "train")
    val_count = write_split(dev_data, val_samples, "val.jsonl", "val")
    test_count = write_split(test_data, test_samples, "test.jsonl", "test")

    return train_count, val_count, test_count

print("Preprocessing FinQA dataset...")
train_size, val_size, test_size = preprocess_finqa(FINQA_BASE_URL, LOCAL_DATA_DIR, TRAIN_SAMPLES, VAL_SAMPLES, TEST_SAMPLES)

print("\nUploading to S3...")
for local_file, s3_key in [
    ("train.jsonl", f"rft-data/datasets/{DATASET_NAME}/train.jsonl"),
    ("val.jsonl", f"rft-data/datasets/{DATASET_NAME}/val.jsonl"),
    ("test.jsonl", f"rft-data/datasets/{DATASET_NAME}/test.jsonl")
]:
    s3_client.upload_file(f"{LOCAL_DATA_DIR}/{local_file}", S3_BUCKET, s3_key)
    print(f"✓ Uploaded {local_file}")

print(f"\n✓ Ready | {train_size} train / {val_size} val / {test_size} test")

In [None]:
# Clean up temporary local data
import shutil

print("\nCleaning up temporary files...")
if os.path.exists(LOCAL_DATA_DIR):
    shutil.rmtree(LOCAL_DATA_DIR)
    print(f"✓ Removed {LOCAL_DATA_DIR}")
else:
    print("✓ No temporary files to clean")

---
## 2. Deploy the Reward Function

The reward function is the "coach" in our RFT setup. During training, Bedrock generates candidate responses and sends them to this Lambda. The Lambda extracts the model's answer, compares it to the ground truth, and returns a score (1.0 for correct, 0.0 for wrong).

We also create the IAM roles that Bedrock and Lambda need to do their jobs.
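
The reward function deployed here lives in `../../reward-functions/finqa_rew_func.py` and is not reproduced in this notebook. The sketch below is not that file. It is a hypothetical stand-in showing the extract, compare, and score flow, assuming the input and output shapes used by the test cell in section 3 (a list of samples in, a list of objects with `aggregate_reward_score` out). The helper names and the 1% relative tolerance are illustrative assumptions.

```python
import re

ANSWER_RE = re.compile(r"ANSWER:\s*(.+)", re.IGNORECASE)

def to_number(text):
    """Parse '6.66%', '$1,234.5', or '25' into a float; return None on failure."""
    cleaned = re.sub(r"[,$%\s]", "", str(text))
    try:
        return float(cleaned)
    except ValueError:
        return None

def score(response_text, ground_truth, rel_tol=0.01):
    """Return 1.0 if the last ANSWER: value matches the ground truth, else 0.0."""
    matches = ANSWER_RE.findall(response_text)
    if not matches:
        return 0.0
    predicted, expected = to_number(matches[-1]), to_number(ground_truth)
    if predicted is None or expected is None:
        # Fall back to a case-insensitive string comparison (e.g. "yes"/"no").
        return float(matches[-1].strip().lower() == str(ground_truth).strip().lower())
    return float(abs(predicted - expected) <= rel_tol * max(abs(expected), 1e-9))

def lambda_handler(event, context):
    """Score each sample in the batch (hypothetical sketch, not the real handler)."""
    results = []
    for sample in event:
        # The model's reply is taken to be the last assistant message.
        reply = next(
            (m["content"] for m in reversed(sample["messages"]) if m["role"] == "assistant"),
            "",
        )
        truth = sample["reference_answer"]["answer"]
        results.append({"id": sample["id"], "aggregate_reward_score": score(reply, truth)})
    return results
```

Taking the last `ANSWER:` match rather than the first is a design choice: a model that revises its answer mid-response is scored on its final statement.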

In [None]:
# Create Lambda execution role
print("Creating Lambda execution role...")

lambda_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Principal": {"Service": "lambda.amazonaws.com"}, "Action": "sts:AssumeRole"}]
}

try:
    response = iam_client.create_role(
        RoleName=LAMBDA_ROLE_NAME,
        AssumeRolePolicyDocument=json.dumps(lambda_trust_policy),
        Description=f"Execution role for {DATASET_NAME} reward function"
    )
    lambda_role_arn = response['Role']['Arn']
    iam_client.attach_role_policy(RoleName=LAMBDA_ROLE_NAME, PolicyArn='arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole')
    print(f"✓ Created role: {LAMBDA_ROLE_NAME}")
    print("Waiting 10s for role propagation...")
    time.sleep(10)
except iam_client.exceptions.EntityAlreadyExistsException:
    lambda_role_arn = iam_client.get_role(RoleName=LAMBDA_ROLE_NAME)['Role']['Arn']
    print(f"✓ Using existing role: {LAMBDA_ROLE_NAME}")

# Package and deploy Lambda
lambda_zip_content = create_lambda_deployment_package(
    source_file=REWARD_FUNCTION_FILE,
    zip_filename="lambda_deployment.zip",
    archive_name=f"{REWARD_FUNCTION_MODULE}.py"
)

print(f"\nDeploying Lambda: {LAMBDA_FUNCTION_NAME}...")
try:
    lambda_client.get_function(FunctionName=LAMBDA_FUNCTION_NAME)
    lambda_client.update_function_code(FunctionName=LAMBDA_FUNCTION_NAME, ZipFile=lambda_zip_content)
    waiter = lambda_client.get_waiter('function_updated_v2')
    waiter.wait(FunctionName=LAMBDA_FUNCTION_NAME)
    print("✓ Updated existing function")
except lambda_client.exceptions.ResourceNotFoundException:
    lambda_client.create_function(
        FunctionName=LAMBDA_FUNCTION_NAME,
        Runtime='python3.11',
        Role=lambda_role_arn,
        Handler=f"{REWARD_FUNCTION_MODULE}.lambda_handler",
        Code={'ZipFile': lambda_zip_content},
        Timeout=300,
        MemorySize=512
    )
    print("✓ Created new function")

waiter = lambda_client.get_waiter('function_active_v2')
waiter.wait(FunctionName=LAMBDA_FUNCTION_NAME)
lambda_arn = lambda_client.get_function(FunctionName=LAMBDA_FUNCTION_NAME)['Configuration']['FunctionArn']
print(f"✓ Lambda ready: {lambda_arn}")

# Create Bedrock role
print(f"\nCreating Bedrock role: {BEDROCK_ROLE_NAME}...")

bedrock_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Principal": {"Service": "bedrock.amazonaws.com"}, "Action": "sts:AssumeRole"}]
}

bedrock_permissions = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:GetObject", "s3:ListBucket"], "Resource": [f"arn:aws:s3:::{S3_BUCKET}/*", f"arn:aws:s3:::{S3_BUCKET}"]},
        {"Effect": "Allow", "Action": "s3:PutObject", "Resource": f"arn:aws:s3:::{S3_BUCKET}/rft-output/*"},
        {"Effect": "Allow", "Action": "lambda:InvokeFunction", "Resource": lambda_arn}
    ]
}

try:
    response = iam_client.create_role(
        RoleName=BEDROCK_ROLE_NAME,
        AssumeRolePolicyDocument=json.dumps(bedrock_trust_policy),
        Description="Execution role for Bedrock RFT"
    )
    bedrock_role_arn = response['Role']['Arn']
    print(f"✓ Created role: {BEDROCK_ROLE_NAME}")
except iam_client.exceptions.EntityAlreadyExistsException:
    bedrock_role_arn = iam_client.get_role(RoleName=BEDROCK_ROLE_NAME)['Role']['Arn']
    print(f"✓ Using existing role: {BEDROCK_ROLE_NAME}")

iam_client.put_role_policy(RoleName=BEDROCK_ROLE_NAME, PolicyName='BedrockRFTPermissions', PolicyDocument=json.dumps(bedrock_permissions))
print(f"✓ Bedrock role ready: {bedrock_role_arn}")

cleanup_lambda_deployment_package()

---
## 3. Test the Reward Function

Before kicking off a multi-hour training job, let's make sure our reward function actually works. We'll send it a sample response and verify it returns the expected score.

In [None]:
print("Testing reward function...")

test_payload = [{
    "id": "test_001",
    "messages": [
        {"role": "user", "content": "What is the percentage increase from 100 to 125?"},
        {"role": "assistant", "content": "Let me calculate this step by step.\n\nPercentage increase = (New - Old) / Old × 100\n= (125 - 100) / 100 × 100\n= 25%\n\nANSWER: 25"}
    ],
    "reference_answer": {"answer": "25"}
}]

response = lambda_client.invoke(
    FunctionName=LAMBDA_FUNCTION_NAME,
    InvocationType='RequestResponse',
    Payload=json.dumps(test_payload)
)

result = json.loads(response['Payload'].read())
print(json.dumps(result, indent=2))

if 'errorMessage' in result:
    print(f"\n✗ Error: {result['errorMessage']}")
elif isinstance(result, list) and result and result[0].get('aggregate_reward_score') == 1.0:
    print("\n✓ Reward function working correctly!")
else:
    print("\n⚠ Unexpected result - check the output above")


---
## 4. Start the RFT Training Job

Now for the main event. We'll create a model customization job that:
- Takes our base Nova model
- Trains it on FinQA using reinforcement learning
- Uses our Lambda to score responses

Training typically takes several hours depending on dataset size and hyperparameters.
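
For a rough sense of scale, the back-of-the-envelope sketch below counts scored responses and update steps for the hyperparameters used in this notebook. Both formulas are assumptions about how `batchSize`, `epochCount`, and `trainingSamplePerPrompt` interact, not documented Bedrock behavior, so treat the numbers as an order-of-magnitude guide.

```python
import math

train_prompts = 6251        # TRAIN_SAMPLES from the configuration cell
batch_size = 32             # 'batchSize'
epochs = 1                  # 'epochCount'
samples_per_prompt = 4      # 'trainingSamplePerPrompt'

# Assumption: every prompt is sampled samples_per_prompt times per epoch,
# and each sampled response is scored by the reward Lambda.
scored_responses = train_prompts * samples_per_prompt * epochs

# Assumption: one update step consumes batch_size prompts.
update_steps = math.ceil(train_prompts / batch_size) * epochs

print(scored_responses)  # 25004
print(update_steps)      # 196
```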

In [None]:
print("Creating RFT training job...")
print(f"  Job: {JOB_NAME}")
print(f"  Model: {CUSTOM_MODEL_NAME}")
print(f"  Base: {BASE_MODEL_ID}")

response = bedrock_client.create_model_customization_job(
    jobName=JOB_NAME,
    customModelName=CUSTOM_MODEL_NAME,
    roleArn=bedrock_role_arn,
    baseModelIdentifier=BASE_MODEL_ID,
    customizationType='REINFORCEMENT_FINE_TUNING',
    trainingDataConfig={'s3Uri': S3_TRAINING_DATA},
    validationDataConfig={'validators': [{'s3Uri': S3_VALIDATION_DATA}]},
    outputDataConfig={'s3Uri': S3_OUTPUT_PATH},
    customizationConfig={
        'rftConfig': {
            'graderConfig': {'lambdaGrader': {'lambdaArn': lambda_arn}},
            'hyperParameters': {
                'batchSize': 32,
                'epochCount': 1,
                'evalInterval': 5,
                'inferenceMaxTokens': 1000,
                'learningRate': 0.00005,
                'maxPromptLength': 4000,
                'reasoningEffort': 'high',
                'trainingSamplePerPrompt': 4
            }
        }
    }
)

print(f"\n✓ Job created: {response['jobArn']}")

---
## 5. Monitor Training Progress

Run this cell periodically to check on your training job. Status will progress from `InProgress` to `Completed` (or `Failed`). A job that you cancel reports `Stopping`, then `Stopped`.

In [None]:
response = bedrock_client.get_model_customization_job(jobIdentifier=JOB_NAME)
print(f"Job: {JOB_NAME}")
print(f"Status: {response['status']}")

if response['status'] == 'Completed' and 'outputModelArn' in response:
    print(f"\n✓ Training complete!")
    print(f"  Model ARN: {response['outputModelArn']}")
elif response['status'] == 'Failed':
    print(f"\n✗ Training failed: {response.get('failureMessage', 'Unknown error')}")
elif response['status'] == 'InProgress':
    print("\n⏳ Still training... run this cell again to check progress")
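
If you would rather not re-run the status check by hand, a polling loop is one alternative. The sketch below is a hypothetical helper (`wait_for_job` is not part of boto3); it only relies on `get_model_customization_job`, the same call used in the status check. The five-minute interval and the poll limit are arbitrary defaults.

```python
import time

TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}

def wait_for_job(client, job_identifier, poll_seconds=300, max_polls=200, sleep=time.sleep):
    """Poll until the job reaches a terminal status or max_polls is exhausted.

    `client` is anything with a get_model_customization_job(jobIdentifier=...)
    method, such as the notebook's bedrock_client. Returns the last status seen.
    """
    status = None
    for attempt in range(max_polls):
        status = client.get_model_customization_job(jobIdentifier=job_identifier)["status"]
        print(f"[poll {attempt + 1}] {status}")
        if status in TERMINAL_STATUSES:
            break
        sleep(poll_seconds)
    return status

# Usage in this notebook (requires the clients from section 1):
# final_status = wait_for_job(bedrock_client, JOB_NAME)
```

The `sleep` parameter is injectable so the loop can be exercised quickly with a stub client before pointing it at a real job.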


## Conclusion

Congratulations, you've successfully launched a Reinforcement Fine-Tuning job for Amazon Nova on the FinQA dataset.

### What You've Built

- **Preprocessed FinQA dataset** into Bedrock RFT format  
- **Deployed a Lambda reward function** that scores model responses  
- **Created IAM roles** for Lambda and Bedrock execution  
- **Started an RFT training job** with customized hyperparameters  

### Next Steps

Once your training job completes (check status in cell above):

1. **Test your fine-tuned model** via the Bedrock API using the model ARN
2. **Evaluate performance** on the held-out test set (`test.jsonl`)
3. **Compare results** against the base Nova model
4. **Experiment with hyperparameters** (learning rate, batch size, epochs) for better performance
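
For steps 2 and 3, a small evaluation harness might look like the sketch below. It assumes a `generate(messages)` callable that you supply, wrapping whatever inference call you use for the base or fine-tuned model; that callable and the helper names are hypothetical. Scoring here is exact string match on the text after `ANSWER:`, which is stricter than a numeric comparison with tolerance.

```python
import json
import re

def extract_answer(text):
    """Return the text after the last 'ANSWER:' marker, or None if absent."""
    matches = re.findall(r"ANSWER:\s*(.+)", text)
    return matches[-1].strip() if matches else None

def evaluate(jsonl_lines, generate):
    """Exact-match accuracy of generate(messages) over JSONL records."""
    correct = total = 0
    for line in jsonl_lines:
        record = json.loads(line)
        prediction = extract_answer(generate(record["messages"]))
        correct += prediction == record["reference_answer"]["answer"]
        total += 1
    return correct / total if total else 0.0

# Tiny demo with an invented record and a stub model.
demo = [json.dumps({
    "messages": [{"role": "user", "content": "2 + 2?"}],
    "reference_answer": {"answer": "4"},
})]
print(evaluate(demo, lambda messages: "2 + 2 = 4\nANSWER: 4"))  # 1.0
```

Running `evaluate` twice over the lines of `test.jsonl`, once per model, gives two accuracy figures that can be compared directly.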


### Learn More

- [Amazon Bedrock RFT Documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/reinforcement-fine-tuning.html)
- [Amazon Nova 2 Lite](https://docs.aws.amazon.com/ai/responsible-ai/nova-2-lite/overview.html)
- [FinQA Dataset on GitHub](https://github.com/czyssrs/FinQA)