
AgentTune: Teaching Small LLMs Multi-Step Agent Reasoning

QLoRA fine-tuning for ReAct-style agent reasoning in small language models (1.5B–7B)

Python 3.10+ · License: MIT · Colab

Motivation

In our previous work, we showed that QLoRA fine-tuning enables small models to achieve 86–89% exact match on single-turn function calling. This project takes the next step: teaching small models to think and act like agents — planning multi-step solutions, calling tools, processing results, and adapting their approach.

Key Results

Model Comparison (500 training samples, QLoRA, 3 epochs)

| Model | Metric | Zero-Shot | Fine-Tuned | Δ |
|---|---|---|---|---|
| Qwen2.5-3B-Instruct | Task Success Rate | 93.3% | 100% | +6.7% |
| | Tool Selection Acc | 30.0% | 100% | +70.0% |
| | Exact Tool Match | 30.0% | 100% | +70.0% |
| Qwen2.5-7B-Instruct | Task Success Rate | 83.3% | 100% | +16.7% |
| | Tool Selection Acc | 53.3% | 100% | +46.7% |
| | Exact Tool Match | 53.3% | 100% | +46.7% |

Data Scaling (Qwen2.5-3B-Instruct)

| Training Samples | Tool Selection Acc | Exact Tool Match | Training Loss | Time |
|---|---|---|---|---|
| 50 | 73.3% | 60.0% | 1.880 | 65s |
| 100 | 100% | 93.3% | 1.452 | 127s |
| 250 | 96.7% | 93.3% | 0.785 | 317s |
| 500 | 96.7% | 96.7% | 0.419 | 634s |
| 1000 | 96.7% | 93.3% | 0.229 | 1266s |

Key finding: Just 100 training samples are enough to reach near-perfect tool selection accuracy. The 3B model matches the 7B model after fine-tuning, demonstrating that QLoRA can close the capability gap between model sizes for agent reasoning tasks.

Scaling Experiments

How It Works

User: "What's the weather in Tokyo? If it's cold, recommend a ramen place."

Agent (fine-tuned Qwen2.5-3B):
  Thought: I need to check the weather in Tokyo first.
  Action: {"name": "get_weather", "arguments": {"city": "Tokyo"}}
  Observation: {"temperature": 8, "condition": "cloudy"}
  Thought: 8°C is cold. I should recommend a ramen restaurant.
  Action: {"name": "search_restaurant", "arguments": {"location": "Tokyo", "cuisine": "ramen"}}
  Observation: {"name": "Ichiran Ramen", "rating": 4.6}
  Thought: I have all the information.
  Answer: Tokyo is 8°C and cloudy. I recommend Ichiran Ramen (rated 4.6)!

Architecture

                         ┌─────────────┐
                         │  User Query │
                         └──────┬──────┘
                                │
                    ┌───────────▼───────────┐
                    │   Fine-tuned LLM      │
                    │  (Qwen2.5-3B + LoRA)  │
                    └───────────┬───────────┘
                                │
                    ┌───────────▼───────────┐
                    │      ReAct Loop       │
                    │  Thought → Action →   │
                    │  Observation → ...    │◄──── Tool Registry
                    │  → Answer             │      (10 tools)
                    └───────────────────────┘
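The ReAct loop above can be sketched in a few lines. This is an illustrative toy, not the project's actual runtime (which lives in src/agent/runtime.py): the single hypothetical tool and the regex-based parsing are assumptions made for the example.

```python
import json
import re

# Hypothetical one-entry tool registry; the real project registers 10 tools
# in src/agent/tools.py behind a simulated executor.
TOOLS = {
    "get_weather": lambda city: {"temperature": 8, "condition": "cloudy"},
}

def react_step(model_output, history):
    """Process one model turn: execute an Action, or return the final Answer.

    Returns (result, done) where done=True means the loop should stop.
    """
    answer = re.search(r"Answer:\s*(.+)", model_output)
    if answer:
        return answer.group(1), True  # final answer reached, loop ends
    action = re.search(r"Action:\s*(\{.*\})", model_output)
    call = json.loads(action.group(1))                 # {"name": ..., "arguments": ...}
    result = TOOLS[call["name"]](**call["arguments"])  # dispatch to the tool
    history.append(f"Observation: {json.dumps(result)}")
    return result, False

history = []
out, done = react_step(
    'Thought: check weather.\n'
    'Action: {"name": "get_weather", "arguments": {"city": "Tokyo"}}',
    history,
)
```

In the real runtime the appended `Observation:` line is fed back to the model, which produces the next Thought/Action turn until it emits an `Answer:`.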

Quick Start

Run in Colab

Open notebooks/01_Agent_FineTune.ipynb in Google Colab with an L4 GPU. The notebook is self-contained — all code is inline.

Local Setup

pip install -r requirements.txt

# Train
python -m src.training.train \
    --model_id Qwen/Qwen2.5-3B-Instruct \
    --train_samples 500 \
    --output_dir ./output/agent_qwen25_3b

# Evaluate
python -m src.evaluation.evaluate \
    --model_id Qwen/Qwen2.5-3B-Instruct \
    --adapter_path ./output/agent_qwen25_3b

# Demo
python demo/app.py \
    --model_id Qwen/Qwen2.5-3B-Instruct \
    --adapter_path ./output/agent_qwen25_3b

Project Structure

agenttune/
├── README.md
├── LICENSE
├── requirements.txt
├── .gitignore
├── src/
│   ├── agent/
│   │   ├── tools.py           # 10 tool definitions + simulated executor
│   │   └── runtime.py         # ReAct execution engine
│   ├── data/
│   │   ├── react_formatter.py # Format trajectories → training text
│   │   └── build_dataset.py   # Seed trajectories + data generation
│   ├── training/
│   │   └── train.py           # QLoRA fine-tuning script
│   └── evaluation/
│       └── evaluate.py        # Agent task evaluation
├── notebooks/
│   ├── 01_Agent_FineTune.ipynb    # Main Colab notebook (training + eval)
│   └── 02_Scaling_Experiments.ipynb  # Model comparison + data scaling
├── demo/
│   └── app.py                 # Gradio web demo
└── results/

Method

Training Data: ReAct Trajectories

Each training sample is a complete agent trajectory in ReAct format:

Thought → Action → Observation → Thought → ... → Answer

Data sources:

  1. Seed trajectories: 10 hand-crafted multi-step examples (1–3 tool calls)
  2. Augmented data: Seed tasks re-executed with varied simulated tool responses
  3. Synthetic generation: (Planned) GPT-4/Claude-generated diverse trajectories
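A trajectory becomes training text by flattening its steps into one prompt-style string. The sketch below shows the general idea only; the function name, the `(role, text)` pair representation, and the `System:`/`User:` prefixes are assumptions, and the project's actual formatter is src/data/react_formatter.py.

```python
def format_trajectory(user_query, steps, system_prompt="You are a helpful agent."):
    """Flatten a ReAct trajectory into a single training string.

    `steps` is a list of (role, text) pairs with role in
    {"thought", "action", "observation", "answer"}.
    Illustrative sketch only; see src/data/react_formatter.py for the real one.
    """
    lines = [f"System: {system_prompt}", f"User: {user_query}"]
    for role, text in steps:
        lines.append(f"{role.capitalize()}: {text}")
    return "\n".join(lines)

sample = format_trajectory(
    "What's the weather in Tokyo?",
    [
        ("thought", "I need to check the weather in Tokyo."),
        ("action", '{"name": "get_weather", "arguments": {"city": "Tokyo"}}'),
        ("observation", '{"temperature": 8, "condition": "cloudy"}'),
        ("answer", "Tokyo is 8°C and cloudy."),
    ],
)
```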

Fine-Tuning Configuration

| Component | Setting |
|---|---|
| Quantization | QLoRA 4-bit (NF4, double quantization) |
| LoRA rank / alpha | 16 / 32 |
| LoRA targets | All attention + MLP projections |
| Learning rate | 2e-4 (cosine schedule) |
| Max sequence length | 2048 tokens |
| Training epochs | 3 |
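The settings above map onto `transformers`/`peft` config objects roughly as follows. This is a hedged sketch, not the project's train.py: the exact `target_modules` list is an assumption based on Qwen2.5's projection-layer names, and it presumes `transformers`, `peft`, `bitsandbytes`, and `torch` are installed.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# QLoRA 4-bit quantization: NF4 with double quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA rank 16 / alpha 32 on all attention + MLP projections
# (module names assumed from Qwen2.5's architecture)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```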

Evaluation Metrics

| Metric | Description |
|---|---|
| Task Success Rate | Agent reaches a final answer |
| Tool Selection Accuracy | Correct tools called |
| Exact Tool Match | Only the expected tools called |
| Step Efficiency | Completed within expected step range |
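The two tool metrics can be computed from the sets of tool names called versus expected, roughly as below. This is an illustrative sketch under that set-based assumption; the project's actual scorer is src/evaluation/evaluate.py and may count duplicates or ordering differently.

```python
def tool_metrics(predicted, expected):
    """Per-task tool metrics from lists of tool names (illustrative sketch).

    tool_selection_acc: fraction of expected tools the agent actually called.
    exact_tool_match:   strict equality — exactly the expected tools, no extras.
    """
    pred, exp = set(predicted), set(expected)
    return {
        "tool_selection_acc": len(pred & exp) / len(exp) if exp else 1.0,
        "exact_tool_match": pred == exp,
    }

m = tool_metrics(["get_weather", "search_restaurant"],
                 ["get_weather", "search_restaurant"])
```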

Relationship to Phase 1

| | Phase 1: Tool Use | Phase 2: Agent Reasoning |
|---|---|---|
| Task | Single-turn function calling | Multi-step planning & execution |
| Format | User → JSON tool call | ReAct: Thought → Action → Observation → Answer |
| Sequence length | 512 tokens | 2048 tokens |
| Key question | Can small models call tools? | Can small models think like agents? |

Citation

@misc{agenttune-2026,
  title={AgentTune: Teaching Small LLMs Multi-Step Agent Reasoning via QLoRA},
  author={ChengXie},
  year={2026},
  url={https://github.com/XIECHENG6/agenttune}
}

License

MIT
