Official reproduction code for HiGMem: A Hierarchical and LLM-Guided Memory System for Long-Term Conversational Agents.
The paper uses the name HiGMem. Some source files keep the internal development name fphm; this is a naming difference only.
- Hierarchical memory: organizes long conversations into Turn, Event, and optional Profile layers.
- LLM-guided memory construction: incrementally builds structured event memories while reading conversations.
- Reasoning-aware retrieval: retrieves candidate events and turns, then filters evidence with LLM judgments.
- LoCoMo reproduction: includes the LoCoMo-10 split used by our experiments for convenient reproduction.
- OpenAI-compatible backend: supports OpenAI APIs and local OpenAI-compatible services such as vLLM.
- Architecture Overview
- Repository Structure
- Installation
- Quick Start
- Reproducing the Paper Setting
- Baselines
- Analysis
- Configuration Notes
- Dataset
- Citation
- License
HiGMem processes each conversation turn incrementally:
- Turn construction: each raw dialogue turn is converted into a `TurnNote` with metadata.
- Event affiliation: the new turn is assigned to an existing event or starts a new event.
- Event update: event metadata and summaries are updated as the conversation grows.
- Retrieval for QA: the question is rewritten into retrieval keywords, candidate events and turns are retrieved, and an LLM filter selects final evidence turns.
- Final answer generation: the final QA prompt uses the retrieved turns and follows the LoCoMo prompt family aligned with A-Mem.
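The incremental loop above can be sketched as follows. This is a minimal illustration with hypothetical class and function names, not the repo's actual API; in the paper's setting the affiliation decision and summary update are LLM calls, so a plain function stands in here to keep the control flow visible:

```python
from dataclasses import dataclass, field

@dataclass
class TurnNote:
    """One processed dialogue turn (hypothetical structure)."""
    turn_id: int
    speaker: str
    text: str

@dataclass
class Event:
    """A group of related turns with an evolving summary (hypothetical)."""
    event_id: int
    summary: str
    turns: list = field(default_factory=list)

class HierarchicalMemory:
    def __init__(self, affiliate_fn):
        # affiliate_fn stands in for the LLM call that decides whether a
        # turn belongs to an existing event (returns an event_id or None).
        self.affiliate_fn = affiliate_fn
        self.events = []

    def add_turn(self, note: TurnNote) -> Event:
        event_id = self.affiliate_fn(note, self.events)
        if event_id is None:
            # No matching event: start a new one seeded by this turn.
            event = Event(event_id=len(self.events), summary=note.text)
            self.events.append(event)
        else:
            # Affiliate with an existing event and refresh its summary.
            event = self.events[event_id]
            event.summary = f"{event.summary} | {note.text}"
        event.turns.append(note)
        return event
```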
```text
.
├── data/
│   ├── locomo10.json          # LoCoMo-10 data used for reproduction
│   └── README.md              # Dataset note
├── run_fphm_evaluation.py     # Main HiGMem evaluation script
├── fphm_core.py               # HiGMem memory construction and retrieval core
├── memory_layer.py            # LLM and embedding backend utilities
├── prompts.py                 # Prompt templates
├── oracle_test.py             # Oracle evidence baseline
├── full_context_test.py       # Full-context baseline
├── analyze_recall.py          # Single-run retrieval analysis
├── analyze_recall_full.py     # Full-run retrieval analysis
├── requirements.txt
├── CITATION.bib
└── LICENSE
```
This codebase is tested with Python 3.9.
Create and activate a virtual environment:
```bash
python3.9 -m venv .venv
source .venv/bin/activate
```

On Windows PowerShell:
```powershell
py -3.9 -m venv .venv
.\.venv\Scripts\Activate.ps1
```

Install PyTorch first. Choose the wheel matching your CUDA version. Example for CUDA 12.4:
```bash
pip install torch==2.4.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
```

Then install the remaining dependencies:
```bash
pip install -r requirements.txt
```

Configure API credentials with environment variables or a local `.env` file:
```bash
OPENAI_API_KEY=sk-...
# Optional for OpenAI-compatible servers such as vLLM
OPENAI_API_BASE=http://127.0.0.1:8000/v1
```

Run a single LoCoMo sample:
```bash
python run_fphm_evaluation.py --model gpt-4o-mini --backend openai --ablation-no-profile --ablation-event-metadata-only --ablation-no-link --k_event 10 --sample_index 0
```

The first run may download the sentence-transformer embedding model used for vector retrieval.
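Pointing the scripts at a different backend only requires that an OpenAI-compatible base URL and key be resolvable from the environment. A minimal sketch of that resolution logic (helper name and defaults are illustrative, not the repo's actual code):

```python
import os

def resolve_backend(env=None):
    """Read backend settings from environment variables, defaulting to the
    official OpenAI endpoint when OPENAI_API_BASE is unset.
    Hypothetical helper for illustration only."""
    env = os.environ if env is None else env
    return {
        "api_key": env.get("OPENAI_API_KEY", "EMPTY"),
        "base_url": env.get("OPENAI_API_BASE", "https://api.openai.com/v1"),
    }
```

With vLLM, setting `OPENAI_API_BASE=http://127.0.0.1:8000/v1` plus any placeholder key is enough, since local servers typically ignore the key.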
Use the following command for the main HiGMem LoCoMo setting:
```bash
python run_fphm_evaluation.py --model gpt-4o-mini --backend openai --ablation-no-profile --ablation-event-metadata-only --ablation-no-link --k_event 10 --num-workers 5
```

This configuration corresponds to:
- query rewriting enabled by default;
- character profiles disabled;
- event metadata mode enabled;
- immediate turn-turn links disabled;
- `k_event = 10`.
For an OpenAI-compatible local server such as vLLM:

```bash
python run_fphm_evaluation.py --model Qwen2.5-7B-Instruct --backend openai --api_base http://127.0.0.1:8010/v1 --api_key EMPTY --ablation-no-profile --ablation-event-metadata-only --ablation-no-link --k_event 10 --num-workers 5
```

The code does not set an explicit `max_tokens` limit for OpenAI-compatible endpoints; context limits are controlled by the model server.
Oracle evidence baseline:

```bash
python oracle_test.py --model gpt-4o-mini --backend openai --num-workers 5
```

Full-context baseline:

```bash
python full_context_test.py --model gpt-4o-mini --backend openai --num-workers 5
```

Analyze the latest single-sample run:

```bash
python analyze_recall.py
```

Analyze the latest full-dataset run:

```bash
python analyze_recall_full.py
```

Generated outputs are written to:
- `fphm_runs/`: full-run logs, checkpoints, and aggregated results;
- `fphm_logs/`: single-sample logs;
- `checkpoints/`: single-sample memory checkpoints;
- `results/`: single-sample metrics;
- `analysis_results/`: recall, precision, and cost analysis tables.
These generated directories are ignored by Git.
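At its core, the retrieval analysis reduces to comparing the retrieved turn identifiers against the gold evidence turns for each question. A minimal sketch of those metrics (function name hypothetical; the repo's analysis scripts also aggregate cost):

```python
def evidence_metrics(retrieved, gold):
    """Recall and precision of retrieved turn IDs against gold evidence.
    Illustrative only, not the repo's analysis code."""
    retrieved, gold = set(retrieved), set(gold)
    hits = len(retrieved & gold)
    return {
        # Fraction of gold evidence turns that were retrieved.
        "recall": hits / len(gold) if gold else 0.0,
        # Fraction of retrieved turns that are gold evidence.
        "precision": hits / len(retrieved) if retrieved else 0.0,
    }
```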
The main evaluation script exposes ablation flags used in our experiments:
- `--ablation-no-profile`
- `--ablation-event-title-only`
- `--ablation-event-metadata-only`
- `--ablation-attribute-profile`
- `--ablation-no-fact-judgment`
- `--ablation-no-filter`
- `--ablation-no-link`
- `--ablation-no-event`
- `--ablation-mpnet-retrieval`
- `--disable_query_rewriting_llm`
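These switches behave as boolean `store_true` flags. A minimal sketch of how a subset maps onto an argument parser (parser structure assumed for illustration, not the script's actual code):

```python
import argparse

def build_parser():
    """Sketch: ablation switches as boolean flags plus the k_event knob."""
    p = argparse.ArgumentParser()
    p.add_argument("--ablation-no-profile", action="store_true")
    p.add_argument("--ablation-event-metadata-only", action="store_true")
    p.add_argument("--ablation-no-link", action="store_true")
    p.add_argument("--k_event", type=int, default=10)
    return p
```

Note that argparse converts dashes to underscores, so `--ablation-no-profile` is read as `args.ablation_no_profile`.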
For the main paper setting, use:
```bash
--ablation-no-profile --ablation-event-metadata-only --ablation-no-link --k_event 10
```

For convenience, this repository includes `data/locomo10.json`, the LoCoMo-10 data split used by our reproduction scripts. LoCoMo is a public long-term conversation benchmark; please refer to the original LoCoMo release at snap-research/locomo for dataset provenance and usage terms.
If you use this code, please cite:
```bibtex
@inproceedings{cao2026higmem,
  title     = {HiGMem: A Hierarchical and LLM-Guided Memory System for Long-Term Conversational Agents},
  author    = {Cao, Shuqi and He, Jingyi and Tan, Fei},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2026},
  year      = {2026},
  publisher = {Association for Computational Linguistics},
  url       = {https://github.com/ZeroLoss-Lab/HiGMem}
}
```

This project is released under the MIT License. See LICENSE for details.