BudgetMem is a runtime agent memory framework that enables explicit performance–cost control for on-demand memory extraction. Instead of building a fixed memory once and using it for all future queries, BudgetMem triggers memory computation at runtime and makes it budget-aware through module-level budget tiers and learned routing.
At a high level, BudgetMem organizes memory extraction as a modular pipeline. Each module exposes three budget tiers (Low / Mid / High), which can be instantiated along three complementary axes:
- Implementation tiering: vary the module implementation (e.g., lightweight heuristics → task-specific models → LLM-based processing)
- Reasoning tiering: vary inference behavior (e.g., direct → CoT → multi-step/reflection)
- Capacity tiering: vary model capacity (e.g., small → medium → large LLM backbones)
A lightweight budget-tier router selects a tier for each module based on the query and intermediate states, and is trained with reinforcement learning under a cost-aware objective to provide controllable performance–cost behavior, as sketched below.
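To make the design concrete, here is a minimal, hypothetical Python sketch of a tiered pipeline with module-wise tier selection. The `Tier`, `Module`, and `run_pipeline` names are illustrative, not BudgetMem's actual API, and the stub router stands in for the learned RL policy:

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    LOW = 0   # e.g., lightweight heuristics / direct inference / small backbone
    MID = 1   # e.g., task-specific model / CoT / medium backbone
    HIGH = 2  # e.g., LLM-based processing / multi-step reflection / large backbone


@dataclass
class Module:
    name: str
    impls: dict  # Tier -> callable(query, state) -> new state
    costs: dict  # Tier -> relative compute cost of running that tier


def run_pipeline(query, modules, router):
    """Run on-demand memory extraction, letting the router pick one budget
    tier per module from the query and the current intermediate state."""
    state, total_cost = {"query": query}, 0.0
    for module in modules:
        tier = router(query, state, module)       # module-wise tier selection
        state = module.impls[tier](query, state)  # execute the chosen tier
        total_cost += module.costs[tier]
    return state, total_cost


def stub(tag):
    # Toy implementation standing in for heuristics / small models / LLM calls.
    return lambda query, state: {**state, tag: f"{tag} output for {query!r}"}


modules = [
    Module("extract", {t: stub(f"extract@{t.name}") for t in Tier},
           {Tier.LOW: 1.0, Tier.MID: 3.0, Tier.HIGH: 10.0}),
    Module("summarize", {t: stub(f"summarize@{t.name}") for t in Tier},
           {Tier.LOW: 0.5, Tier.MID: 2.0, Tier.HIGH: 8.0}),
]

# A trivial always-LOW policy; the learned router conditions on the query
# and intermediate states instead.
state, cost = run_pipeline("When did Alice move to Paris?", modules,
                           router=lambda query, state, module: Tier.LOW)
print(cost)  # 1.5 under the all-LOW policy
```

In BudgetMem, the stub policy is replaced by the learned router, trained so that the episode reward trades task quality against the accumulated cost.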
BudgetMem is designed as a unified testbed to study how different tiering strategies translate compute into downstream gains. We evaluate BudgetMem on LoCoMo, LongMemEval, and HotpotQA, demonstrating strong performance in performance-first settings and clear performance–cost frontiers under tighter budgets.
- 🚀 [2026-02]: BudgetMem is officially released: a runtime agent memory framework that enables explicit performance–cost control via module-level budget tiers and learned budget-tier routing, supporting controllable on-demand memory extraction across diverse benchmarks ✨. Stay tuned! More detailed instructions are coming soon.
- Overview
- News
- Get Started
- Installation
- Preparing Training Data
- Experiments
- Training
- Evaluation
- Acknowledgments
- Citation
Using uv:

```bash
# Clone the repository
git clone https://github.com/ViktorAxelsen/BudgetMem
cd BudgetMem

# Install uv if not already installed
# See: https://docs.astral.sh/uv/getting-started/installation/

# Create virtual environment and install dependencies
uv sync

# Activate the virtual environment
source .venv/bin/activate

# Run scripts with uv
uv run python train/train_locomo.py --help
```

Using conda:

```bash
# Clone the repository
git clone https://github.com/ViktorAxelsen/BudgetMem
cd BudgetMem

# Create and activate virtual environment
conda create -n budgetmem python=3.10
conda activate budgetmem

# Install dependencies
pip install -r requirements.txt
```

BudgetMem builds training and evaluation data from the datasets below. Please download data from the official sources and place them under `data/`. Unless otherwise noted, splits are handled by our codebase.
- Download LoCoMo from the official repo: LoCoMo
  - Put the downloaded file under: `data/locomo10.json`
- Download LongMemEval from the official repo: LongMemEval
  - Put the processed file under: `data/longmemeval_s_cleaned.json`
  - Use our split file: `data/longmemeval_s_splits.json` (train/val/test)
- Download HotpotQA from: HotpotQA-Modified (Source: HotpotQA)
  - For evaluation, we use the test file: `data/eval_200.json`
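Once the files are in place, a quick standalone sanity check (ours, not a repo utility) can catch missing or corrupt downloads:

```python
import json
from pathlib import Path

# Verify the expected data layout described above.
DATA_DIR = Path("data")
for name in ("locomo10.json", "longmemeval_s_cleaned.json",
             "longmemeval_s_splits.json", "eval_200.json"):
    path = DATA_DIR / name
    assert path.exists(), f"Missing {path}; download it as described above."
    json.loads(path.read_text())  # fails fast on corrupt/incomplete files
print("All dataset files found and parse as JSON.")
```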
- Configure API keys and credentials: Set up your API keys and credentials in the training scripts (`scripts/train_*.sh`) or via environment variables:
  - API keys: Configure API keys in `src/config.py`
  - Hugging Face token: Set the `HF_TOKEN` or `HUGGINGFACE_TOKEN` environment variable
  - Wandb (optional): Set `WANDB_API_KEY` for experiment tracking, or set `WANDB_DISABLE=true` to disable it
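  As a quick pre-flight check, this standalone snippet (ours, not a repo utility) confirms the variables are visible before launching a run:

  ```python
  import os

  # Check the environment variables listed above; adapt names as needed.
  hf_ok = bool(os.environ.get("HF_TOKEN") or os.environ.get("HUGGINGFACE_TOKEN"))
  wandb_ok = bool(os.environ.get("WANDB_API_KEY")) or \
      os.environ.get("WANDB_DISABLE") == "true"
  print(f"Hugging Face token: {'ok' if hf_ok else 'missing'}")
  print(f"Wandb: {'ok' if wandb_ok else 'missing (or set WANDB_DISABLE=true)'}")
  ```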
- Configure GPU device (if using GPU): Set `CUDA_VISIBLE_DEVICES` in the training scripts to specify which GPU to use:

  ```bash
  export CUDA_VISIBLE_DEVICES=0  # Use GPU 0
  ```
- Set data paths: Update the data file paths in the training scripts:
  - `scripts/train_locomo.sh` for the LoCoMo dataset
  - `scripts/train_longmemeval.sh` for the LongMemEval dataset
  - `scripts/train_hotpotqa.sh` for the HotpotQA dataset
- Run training: Execute the corresponding script for your dataset:

  ```bash
  bash scripts/train_locomo.sh       # For LoCoMo
  bash scripts/train_longmemeval.sh  # For LongMemEval
  bash scripts/train_hotpotqa.sh     # For HotpotQA
  ```

  You can customize training parameters (model, cost strategy, reward/cost weights, etc.) directly in the script files.
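  The reward/cost weights control the performance–cost trade-off. For intuition only (the actual objective and flag names live in the scripts):

  ```python
  # Illustrative cost-aware objective: a larger cost_weight pushes the
  # router toward cheaper tiers. Names are ours, not the script's flags.
  def reward(quality: float, cost: float, cost_weight: float = 0.1) -> float:
      return quality - cost_weight * cost
  ```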
- Training outputs: After training completes, the script automatically evaluates both the best model and the last-epoch model on the test set. Model checkpoints are saved as:
  - Best model: `./test_model/best_model_{cost_strategy}.pt`
  - Epoch checkpoints: `./test_model/checkpoint_epoch_{epoch}_{cost_strategy}.pt`
  - Evaluation results and API statistics are saved in the same directory

  Note: For the `rule_llm` cost strategy, preprocessed memory pools are cached to disk for faster subsequent runs.
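To inspect a saved checkpoint, a minimal sketch, assuming standard `torch.save()` artifacts (the exact checkpoint contents depend on the training script):

```python
import torch

# Load on CPU for inspection. weights_only=False permits pickled non-tensor
# objects (PyTorch >= 1.13); drop the argument on older versions.
ckpt = torch.load("./test_model/best_model_rule_llm.pt",
                  map_location="cpu", weights_only=False)
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # e.g., model state, epoch, metrics
```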
Automatic evaluation: Training scripts automatically evaluate models on the test set after training completes.

Manual evaluation: To evaluate a trained model separately, use the test utilities in the `tests/` directory. Each dataset has a corresponding test module:

- LoCoMo: `tests/test_utils.py` → `test_on_test_set()`
- LongMemEval: `tests/test_utils_longmemeval.py` → `test_on_test_set()`
- HotpotQA: `tests/test_utils_hotpotqa.py` → `test_on_test_set()`
The test functions require the same arguments as training (tokenizers, encoders, modules, etc.) and accept a `model_path` parameter that specifies the checkpoint. You can also set the model path via the `--model-path` argument or the `MODEL_PATH` environment variable when calling the test functions.

Example: To test a specific checkpoint, modify the test script to load your model and call `test_on_test_set()` with `model_path` pointing to your checkpoint file (e.g., `./test_model/best_model_rule_llm.pt`).
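Equivalently, the checkpoint can be selected through the environment before invoking a test module (a minimal sketch):

```python
import os

# The test functions read MODEL_PATH (per the notes above); set it before
# calling test_on_test_set() from the corresponding tests/ module.
os.environ["MODEL_PATH"] = "./test_model/best_model_rule_llm.pt"
```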
We thank the authors and maintainers of LoCoMo, LongMemEval, and HotpotQA-Modified (source: HotpotQA) for releasing their datasets, evaluation protocols, and supporting code. Their efforts in building and open-sourcing high-quality benchmarks make it possible to develop, evaluate, and reproduce research on agent memory.
We also thank LightMem for pioneering performance–efficiency considerations in memory systems, which helped motivate our focus on explicit budget control, and GAM for advancing runtime agent memory frameworks.
```bibtex
@article{BudgetMem,
  title={Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory},
  author={Haozhen Zhang and Haodong Yue and Tao Feng and Quanyu Long and Jianzhu Bao and Bowen Jin and Weizhi Zhang and Xiao Li and Jiaxuan You and Chengwei Qin and Wenya Wang},
  journal={arXiv preprint arXiv:xxxx.xxxxx},
  year={2026}
}
```
