# Transaction Categorization Fine-Tuning

This notebook demonstrates how to fine-tune a language model to categorize financial transactions using a multiple-choice approach.

## Overview

Financial transaction categorization is a common task in personal finance apps. This notebook improves upon traditional approaches by:

1. Presenting the model with 5 potential categories (including the correct one)
2. Using a structured format for responses with reasoning and answers
3. Providing a clear system prompt with instructions

## Setup

First, let's install the required dependencies:

In [None]:
# Install required packages
!pip install torch datasets transformers trl accelerate sentencepiece

Let's import the necessary libraries:

In [None]:
import random
import torch
import os
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM, 
    AutoTokenizer, 
    TrainingArguments
)
from trl import SFTTrainer

## Define System Prompt and Transaction Categories

Let's define the system prompt and the list of valid transaction categories:

In [None]:
# Define system prompt
SYSTEM_PROMPT = """
You are a financial assistant that categorizes transactions.
For each transaction, you will be given the description, amount, and five possible categories.
Choose the most appropriate category from the given options.

Respond in the following format:
<reasoning>
Think step by step about the transaction details and determine the appropriate category from the provided options.
</reasoning>
<answer>
[SELECTED CATEGORY]
</answer>
"""

# Define all valid categories
ALL_CATEGORIES = [
    "Food & Dining", "Shopping", "Transportation", "Entertainment", 
    "Health & Medical", "Groceries", "Insurance", "Bills & Utilities",
    "Auto & Transport", "Travel", "Income", "Transfer", "Education",
    "Personal Care", "Gifts & Donations", "Fees & Charges"
]

print(f"Total categories: {len(ALL_CATEGORIES)}")
print("Sample categories: ", ", ".join(ALL_CATEGORIES[:5]))

## Create Dataset

Next, we'll prepare our dataset of transactions and format it for fine-tuning:

In [None]:
# Define function to get random category options
def get_category_options(correct_category):
    # Remove the correct category from options
    other_categories = [cat for cat in ALL_CATEGORIES if cat != correct_category]
    # Select 4 random categories
    random_categories = random.sample(other_categories, 4)
    # Add back the correct category and shuffle
    all_options = random_categories + [correct_category]
    random.shuffle(all_options)
    return all_options

# Sample transaction data
def get_transaction_dataset() -> Dataset:
    transactions = [
        {"description": "STARBUCKS COFFEE #123", "amount": 5.75, "category": "Food & Dining"},
        {"description": "AMAZON.COM AMZN.COM/BI", "amount": 29.99, "category": "Shopping"},
        {"description": "UBER TRIP 12345", "amount": 18.50, "category": "Transportation"},
        {"description": "NETFLIX.COM", "amount": 13.99, "category": "Entertainment"},
        {"description": "CVS PHARMACY #1234", "amount": 32.47, "category": "Health & Medical"},
        {"description": "WALMART GROCERY", "amount": 87.65, "category": "Groceries"},
        {"description": "GEICO AUTO INSURANCE", "amount": 112.00, "category": "Insurance"},
        {"description": "AT&T WIRELESS", "amount": 85.99, "category": "Bills & Utilities"},
        {"description": "SHELL OIL 12345", "amount": 45.23, "category": "Auto & Transport"},
        {"description": "MARRIOTT HOTELS", "amount": 189.99, "category": "Travel"},
    ]
    
    # Add more varied examples
    more_transactions = [
        {"description": "PAYPAL *TRANSFER", "amount": 100.00, "category": "Transfer"},
        {"description": "DEPOSIT - THANK YOU", "amount": 1250.00, "category": "Income"},
        {"description": "UNIVERSITY BOOKSTORE", "amount": 75.50, "category": "Education"},
        {"description": "SUPERCUTS", "amount": 25.00, "category": "Personal Care"},
        {"description": "AMERICAN RED CROSS", "amount": 50.00, "category": "Gifts & Donations"},
        {"description": "LATE PAYMENT FEE", "amount": 35.00, "category": "Fees & Charges"},
    ]
    
    transactions.extend(more_transactions)
    
    # Create a Dataset object
    data = Dataset.from_dict({
        "description": [t["description"] for t in transactions],
        "amount": [t["amount"] for t in transactions],
        "category": [t["category"] for t in transactions]
    })
    
    # Map function that correctly handles category options
    def create_example(example):
        # Generate options just once per example
        options = get_category_options(example['category'])
        
        # Format the options as a bulleted list
        options_text = "- " + "\n- ".join(options)
        
        # Create the input prompt
        input_text = f"{SYSTEM_PROMPT}\n\nUser: Categorize this transaction: Description: {example['description']}, Amount: ${example['amount']}\n\nPossible categories:\n{options_text}\n\nAssistant:"
        
        # Create the expected response
        response_text = f" <reasoning>\nThis transaction is from {example['description']} for ${example['amount']}. Based on the merchant name and amount, this is clearly a {example['category']} transaction.\n</reasoning>\n<answer>\n{example['category']}\n</answer>"
        
        return {
            "input_text": input_text,
            "response_text": response_text,
            "text": input_text + response_text,
            "category_options": options,
            "correct_category": example['category']
        }
    
    # Apply the mapping function
    data = data.map(create_example)
    return data

# Create the dataset
dataset = get_transaction_dataset()
print(f"Dataset size: {len(dataset)} examples")
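
Because the options are drawn with the global `random` module, each run of the cell above produces different distractors. The sketch below shows one way to make the sampling repeatable by passing in a seeded `random.Random` instance. The helper name `sample_options`, the `rng` parameter, and the short category list are assumptions for illustration and are not part of the notebook's code.

```python
import random

def sample_options(correct_category, all_categories, rng, n_distractors=4):
    """Return the correct category plus n_distractors others, shuffled with rng."""
    others = [cat for cat in all_categories if cat != correct_category]
    options = rng.sample(others, n_distractors) + [correct_category]
    rng.shuffle(options)
    return options

# Stand-in category list (a subset of the notebook's categories)
categories = [
    "Food & Dining", "Shopping", "Transportation",
    "Entertainment", "Groceries", "Travel", "Income",
]

# Two generators created with the same seed yield the same option list
a = sample_options("Shopping", categories, random.Random(42))
b = sample_options("Shopping", categories, random.Random(42))
print(a == b)
```

Using a dedicated generator rather than calling `random.seed` globally keeps the sampling independent of any other code that draws random numbers.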

Let's examine a sample from our dataset to confirm it's formatted correctly:

In [None]:
# Display a sample example
sample_idx = 0  # Choose any index from 0 to len(dataset)-1
sample = dataset[sample_idx]

print(f"Transaction: {sample['description']}, Amount: ${sample['amount']}")
print(f"Correct category: {sample['correct_category']}")
print("Provided category options:")
for option in sample['category_options']:
    print(f"- {option}")
    
print("\nPrompt:")
print(sample['input_text'])

print("\nExpected completion:")
print(sample['response_text'])
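
Inspecting one sample by eye is a start, but the same checks can be applied to every example. The following is a minimal sketch, assuming each example exposes its option list and correct category as in the dataset above; the function name `check_options` and the stand-in data are hypothetical.

```python
def check_options(options, correct_category, all_categories, n_options=5):
    """Return a list of human-readable problems found in one option list."""
    problems = []
    if len(options) != n_options:
        problems.append(f"expected {n_options} options, got {len(options)}")
    if len(set(options)) != len(options):
        problems.append("duplicate options")
    if correct_category not in options:
        problems.append("correct category missing from options")
    unknown = [opt for opt in options if opt not in all_categories]
    if unknown:
        problems.append(f"unknown categories: {unknown}")
    return problems

# Stand-in data for illustration
categories = ["Food & Dining", "Shopping", "Transportation",
              "Entertainment", "Groceries", "Travel"]
good = ["Shopping", "Food & Dining", "Travel", "Groceries", "Entertainment"]
bad = ["Shopping", "Shopping", "Travel", "Groceries", "Entertainment"]

print(check_options(good, "Food & Dining", categories))
print(check_options(bad, "Food & Dining", categories))
```

In the notebook, the same function could be called in a loop over `dataset` with `example['category_options']`, `example['correct_category']`, and `ALL_CATEGORIES`.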

## Fine-tune a Model

Now we'll select a model to fine-tune. We'll use a relatively small model (`microsoft/phi-2`, roughly 2.7B parameters) for demonstration purposes; a larger model will likely give better results at a higher compute cost.

In [None]:
# Select the model you want to fine-tune
# For smaller compute requirements, use a smaller model like "microsoft/phi-2"
# For better results, use a larger model like "meta-llama/Llama-2-7b-hf"
model_name = "microsoft/phi-2"  # Replace with your preferred model

# Create a directory for saving model outputs
output_dir = "categorization-model"
os.makedirs(output_dir, exist_ok=True)

try:
    # Load tokenizer
    print(f"Loading tokenizer for {model_name}...")
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    
    # Load model
    print(f"Loading model {model_name}...")
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",  # Automatically distribute across available GPUs
        trust_remote_code=True,
    )
    
    print(f"Model loaded successfully with {sum(p.numel() for p in model.parameters())/1e6:.1f}M parameters")
except Exception as e:
    print(f"Error loading model: {e}")
    print("\nNote: If you're running this notebook in an environment without GPU,")
    print("you may want to modify the device_map parameter or use a smaller model.")
    # Re-raise so later cells do not fail with a confusing NameError on `model`
    raise

In [None]:
# Set up training arguments
training_args = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    save_steps=10,
    logging_steps=10,
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_steps=5,
    optim="adamw_torch",
    bf16=False,  # Set to True if your GPU supports it
    save_total_limit=3,
)
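
It helps to work out how many optimizer steps these settings produce on a 16-example dataset, since `save_steps`, `logging_steps`, and `warmup_steps` are all expressed in steps. The sketch below is a back-of-the-envelope calculation that assumes a single device and approximates how the trainer counts steps; the function name `count_optimizer_steps` is hypothetical.

```python
import math

def count_optimizer_steps(num_examples, per_device_batch_size,
                          grad_accum_steps, epochs, num_devices=1):
    """Approximate the number of optimizer updates over the whole run."""
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
    steps_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return steps_per_epoch * epochs

# Values from the training arguments above, with the 16 sample transactions
effective_batch = 1 * 4 * 1  # per-device batch size * accumulation steps * devices
total_steps = count_optimizer_steps(
    num_examples=16, per_device_batch_size=1, grad_accum_steps=4, epochs=3
)
print(effective_batch, total_steps)  # 4 12
```

With only about 12 steps in total, `logging_steps=10` and `save_steps=10` trigger once, and `warmup_steps=5` covers a large share of training. On a dataset this small, lower values for all three may be more informative.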

Now we'll set up the SFT trainer from the TRL library. The training call itself is left commented out in the cell below because it may take a long time depending on your hardware.

In [None]:
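# Set up the SFT trainer
# Note: these keyword arguments follow older TRL releases. Newer releases
# rename `tokenizer` to `processing_class` and expect the sequence-length and
# text-field settings on an `SFTConfig` instead; check your installed version.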
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    max_seq_length=1024,
    dataset_text_field="text",
)

# To actually run training, uncomment the next line
# This may take a long time depending on your hardware
# trainer.train()

After training, we would save the model:

In [None]:
# Save the model (uncomment after training)
# model.save_pretrained(output_dir)
# tokenizer.save_pretrained(output_dir)

## Test the Fine-tuned Model

After fine-tuning, we can test how our model performs on new transactions:

In [None]:
def test_transaction(description, amount, model, tokenizer):
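    # Pick a "likely" category with simple keyword rules so it appears among the options.
    # For descriptions that match no rule, a random category is used instead, so the
    # true category is not guaranteed to be offered.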
    if "COFFEE" in description or "STARBUCKS" in description:
        likely_category = "Food & Dining"
    elif "AMAZON" in description:
        likely_category = "Shopping"
    elif "UBER" in description or "LYFT" in description:
        likely_category = "Transportation"
    else:
        likely_category = random.choice(ALL_CATEGORIES)
        
    options = get_category_options(likely_category)
    options_text = "- " + "\n- ".join(options)
    
    # Create the prompt
    prompt = f"{SYSTEM_PROMPT}\n\nUser: Categorize this transaction: Description: {description}, Amount: ${amount}\n\nPossible categories:\n{options_text}\n\nAssistant:"
    
    print("\nTest prompt:")
    print(prompt)
    
    # Generate a response
    try:
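        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        outputs = model.generate(
            **inputs,  # passes input_ids together with the attention mask
            max_new_tokens=200,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,  # pad token was set to EOS at load time
        )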
        response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
        
        print("\nModel response:")
        print(response)
        
        # Extract the answer
        if "<answer>" in response and "</answer>" in response:
            answer = response.split("<answer>")[1].split("</answer>")[0].strip()
            print(f"\nSelected category: {answer}")
        else:
            print("\nCould not extract category from response.")
    except Exception as e:
        print(f"Error generating response: {e}")
        print("Note: If you haven't trained the model yet, you'll need to do that first.")
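
The `split`-based extraction above accepts any text between the tags, including a category that was never offered. A slightly stricter parser can be sketched with a regular expression that also checks the answer against the provided options. The names `ANSWER_RE` and `extract_category` are assumptions for illustration.

```python
import re

# Capture whatever sits between <answer> and </answer>, ignoring surrounding whitespace
ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL | re.IGNORECASE)

def extract_category(response, options):
    """Return the matching option, or None if the answer is missing or not offered."""
    match = ANSWER_RE.search(response)
    if match is None:
        return None
    answer = match.group(1).strip().lower()
    # Compare case-insensitively, but return the option's canonical spelling
    by_lower = {opt.lower(): opt for opt in options}
    return by_lower.get(answer)

options = ["Shopping", "Food & Dining", "Travel", "Groceries", "Income"]
response = "<reasoning>\nCoffee shop purchase.\n</reasoning>\n<answer>\nfood & dining\n</answer>"

print(extract_category(response, options))
print(extract_category("<answer>Pets</answer>", options))
```

Returning `None` for an answer outside the offered options makes it easy to count such responses separately from ordinary misclassifications.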

In [None]:
# Test with a few examples
test_examples = [
    {"description": "CHIPOTLE MEXICAN GRILL", "amount": 12.49},
    {"description": "TESLA SUPERCHARGER", "amount": 18.75},
    {"description": "SPOTIFY PREMIUM", "amount": 9.99}
]

# Uncomment to test after training
# for example in test_examples:
#     test_transaction(example["description"], example["amount"], model, tokenizer)
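
The test examples above carry no labels, so they only support a visual check. If labels are available, accuracy can be computed with a few lines. The sketch below assumes hypothetical labels for the three test transactions and uses a keyword rule as a stand-in for the model; `accuracy` and `keyword_baseline` are illustrative names, not part of the notebook.

```python
def accuracy(labeled_examples, predict):
    """Fraction of examples where predict(description, amount) equals the label."""
    if not labeled_examples:
        return 0.0
    correct = sum(
        1 for ex in labeled_examples
        if predict(ex["description"], ex["amount"]) == ex["category"]
    )
    return correct / len(labeled_examples)

# Hypothetical labels for the test transactions (assumed, not from the notebook)
labeled = [
    {"description": "CHIPOTLE MEXICAN GRILL", "amount": 12.49, "category": "Food & Dining"},
    {"description": "TESLA SUPERCHARGER", "amount": 18.75, "category": "Auto & Transport"},
    {"description": "SPOTIFY PREMIUM", "amount": 9.99, "category": "Entertainment"},
]

def keyword_baseline(description, amount):
    """Stand-in predictor; a real run would call the fine-tuned model instead."""
    if "CHIPOTLE" in description:
        return "Food & Dining"
    if "SPOTIFY" in description:
        return "Entertainment"
    return None

print(f"{accuracy(labeled, keyword_baseline):.2f}")  # 0.67
```

Comparing the fine-tuned model against a simple baseline like this one gives a reference point for whether fine-tuning helped.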

## Using the Alternative Approach with unsloth/vllm (Optional)

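For faster fine-tuning with larger models, you can use the `unsloth` and `vllm` libraries. Note that the code in this section configures GRPO (a reinforcement learning method from TRL) rather than supervised fine-tuning, and it stops at the configuration: a `GRPOTrainer` would additionally need reward functions and a prompt dataset. This section is optional and requires additional dependencies.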

In [None]:
# Uncomment to install unsloth and vllm
# !pip install unsloth vllm

In [None]:
# Code for unsloth approach (requires GPU with sufficient memory)
"""
from unsloth import FastLanguageModel, is_bfloat16_supported
import torch

# Configuration
max_seq_length = 512
lora_rank = 16

# Load model with unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Phi-4",  # You can change this to your preferred model
    max_seq_length = max_seq_length,
    load_in_4bit = True,
    fast_inference = True,
    max_lora_rank = lora_rank,
    gpu_memory_utilization = 0.7,
)

model = FastLanguageModel.get_peft_model(
    model,
    r = lora_rank,
    target_modules = ["gate_proj", "up_proj", "down_proj"],
    lora_alpha = lora_rank,
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
)

# Training configuration for GRPOTrainer (the trainer itself is not constructed here)
from trl import GRPOConfig, GRPOTrainer

training_args = GRPOConfig(
    use_vllm = True,
    learning_rate = 5e-6,
    adam_beta1 = 0.9,
    adam_beta2 = 0.99,
    weight_decay = 0.1,
    warmup_ratio = 0.1,
    lr_scheduler_type = "cosine",
    optim = "paged_adamw_8bit",
    logging_steps = 1,
    bf16 = is_bfloat16_supported(),
    fp16 = not is_bfloat16_supported(),
    per_device_train_batch_size = 1,
    gradient_accumulation_steps = 1,
    num_generations = 6,
    max_prompt_length = 256,
    max_completion_length = 200,
    max_steps = 100,
    save_steps = 250,
    max_grad_norm = 0.1,
    report_to = "none",
    output_dir = "outputs",
)
"""

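GRPO scores sampled completions with reward functions, which the configuration above does not define. The following is a minimal sketch of two such functions for the `<reasoning>`/`<answer>` format used in this notebook. It assumes plain-string completions and a dataset column named `correct_category` passed through as a keyword argument; the function names and reward values are hypothetical, and the exact calling convention should be checked against the installed TRL version.

```python
import re

# Whole completion must be a reasoning block followed by an answer block
FORMAT_RE = re.compile(
    r"^\s*<reasoning>.*?</reasoning>\s*<answer>.*?</answer>\s*$", re.DOTALL
)
ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL)

def format_reward(completions, **kwargs):
    """Small reward for following the expected tag structure."""
    return [0.5 if FORMAT_RE.match(c) else 0.0 for c in completions]

def correctness_reward(completions, correct_category, **kwargs):
    """Full reward when the extracted answer equals the labeled category."""
    rewards = []
    for completion, expected in zip(completions, correct_category):
        match = ANSWER_RE.search(completion)
        rewards.append(1.0 if match and match.group(1) == expected else 0.0)
    return rewards

completions = [
    " <reasoning>\nOnline retailer.\n</reasoning>\n<answer>\nShopping\n</answer>",
    "Shopping",
]
print(format_reward(completions))
print(correctness_reward(completions, correct_category=["Shopping", "Shopping"]))
```

Separating the format reward from the correctness reward lets a completion earn partial credit for structure even when the chosen category is wrong.
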
## Conclusion

This notebook demonstrated how to fine-tune a model for transaction categorization using a multiple-choice approach. The key improvements over traditional approaches include:

1. Providing a limited set of category options for each transaction
2. Using a structured format for model responses
3. Guiding the model to provide reasoning before giving an answer

These choices are intended to make fine-tuning more focused for real-world transaction categorization. The notebook does not measure accuracy, so confirm the benefit on held-out, labeled transactions.