A fast-convergence, plug-and-play, expandable FFN-style memory mechanism for LLMs.
Locas Memory is a lightweight parametric memory mechanism that augments LLM Feed-Forward Network (FFN) layers with external, trainable memory entries. Unlike KV-cache or retrieval-augmented approaches, Locas Memory directly injects structured knowledge into the MLP computation path, enabling:
- Fast convergence – Memory tensors are extracted from activation statistics in a single forward pass (`memorize()`), then refined through gradient-based optimization.
- Plug-and-play – Works with standard HuggingFace Qwen3 models out of the box; no architectural redesign required.
- Dense export – Memory can be fused back into standard MLP weights via `to_dense()`, producing a vanilla Qwen3 model that is compatible with any inference engine (e.g., vLLM).
- Flexible training – Supports Next-Token Prediction (NTP), Self-Distillation (SD), and Reinforcement Learning (GRPO) training paradigms.
- LoRA baseline – Includes a LoRA adapter baseline for fair comparison.
```
                   ┌──────────────────────────┐
                   │    Qwen3 Decoder Layer   │
                   │                          │
hidden_states ──►  │      Self-Attention      │
                   │            │             │
                   │ Post-Attention LayerNorm │
                   │            │             │
                   │       ┌────┴────┐        │
                   │       │Original │ ┌────────┐
                   │       │   MLP   │ │ Memory │
                   │       │(frozen) │ │  MLP   │
                   │       └────┬────┘ └───┬────┘
                   │            └─────+────┘
                   │                  │
                   │               output
                   └──────────────────────────┘
```
Memory tensor shape: (num_layers, batch_size, memory_size, hidden_size, 3) — storing key, gate, and value components per layer.
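To make the tensor layout concrete, here is a minimal numpy sketch of how a per-layer slice of that memory tensor could act as a small parallel FFN. The SwiGLU-style combination of key, gate, and value (and the SiLU activation) are assumptions for illustration; the released implementation may differ.

```python
import numpy as np

def silu(x):
    # SiLU activation, as used by Qwen3's MLP (an assumption for this sketch)
    return x / (1.0 + np.exp(-x))

def memory_mlp(hidden, layer_memory):
    """hidden: (batch, seq, hidden_size); layer_memory: (batch, memory_size, hidden_size, 3).

    Component 0 = key, 1 = gate, 2 = value, per the memory tensor layout above.
    """
    key, gate, value = (layer_memory[..., i] for i in range(3))
    # Score each token against each memory entry, SwiGLU-style
    k_act = silu(np.einsum("bsh,bmh->bsm", hidden, key))
    g_lin = np.einsum("bsh,bmh->bsm", hidden, gate)
    # Read out a weighted sum of value vectors, added to the frozen MLP's output
    return np.einsum("bsm,bmh->bsh", k_act * g_lin, value)

rng = np.random.default_rng(0)
h = rng.standard_normal((1, 4, 8))            # (batch=1, seq=4, hidden=8)
mem = rng.standard_normal((1, 16, 8, 3)) * 0.05  # 16 memory entries
out = memory_mlp(h, mem)
print(out.shape)  # (1, 4, 8) — same shape as the hidden states it augments
```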
| Method | Description |
|---|---|
| `model.memorize(input_ids, keep_top=k)` | Extract top-k memory entries from activation importance scores |
| `model.to_dense(memory)` | Fuse memory into MLP weights → standard `Qwen3ForCausalLM` |
| `model.compute_nll(input_ids, memory)` | Compute per-token negative log-likelihood with memory |
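The per-token NLLs from `compute_nll` drive the NLL-based evaluations below; as a reminder, sequence perplexity is the exponentiated mean NLL. The list of NLL values here is made up for illustration, and the exact return type of `compute_nll` is an assumption:

```python
import math

# Hypothetical per-token NLLs, as might be returned by model.compute_nll
nlls = [2.1, 1.8, 2.4, 2.0]

# Perplexity = exp(mean negative log-likelihood)
ppl = math.exp(sum(nlls) / len(nlls))
print(round(ppl, 3))
```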
```
Locas-Memory/
├── models/
│   └── modeling_qwen3_locas.py     # Core: Qwen3ForCausalLMWithMemory model
├── utils/
│   └── evaluate_mmlu.py            # MMLU benchmark evaluation (NLL-based)
├── launch_pg19_experiment.py       # PG-19 long-document perplexity evaluation
├── launch_locomo_experiments.py    # LoCoMo conversational QA evaluation
├── requirements.txt                # Python dependencies
└── data/
    ├── pg-19-docs/                 # PG-19 long documents
    └── locomo.json                 # LoCoMo dataset
```
```bash
pip install -r requirements.txt
```

| Package | Purpose |
|---|---|
| `torch` | Core deep learning framework |
| `transformers` | Qwen3 model backbone |
| `peft` | LoRA adapter baseline |
| `datasets` | HuggingFace dataset loading |
| `flash_attn` | Efficient attention computation |
```python
from transformers import AutoTokenizer

from models.modeling_qwen3_locas import Qwen3ForCausalLMWithMemory

# Load base model and tokenizer
model = Qwen3ForCausalLMWithMemory.from_pretrained("Qwen/Qwen3-0.6B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

# Extract memory from sample input
input_ids = tokenizer.encode("Your knowledge text here", return_tensors="pt")
memory = model.memorize(input_ids, keep_top=32, memory_init="highest")
# memory shape: (num_layers, 1, 32, hidden_size, 3)

# Optionally continue to refine the memory via backpropagation

# Fuse memory into a standard dense model (compatible with vLLM, etc.)
dense_model = model.to_dense(memory)
dense_model.save_pretrained("./output/dense_model")
```

Evaluate long-document language modeling with online memory adaptation:
```bash
python launch_pg19_experiment.py \
    --model Qwen/Qwen3-1.7B-Base \
    --memory_width 64 \
    --memory_init highest \
    --loss_function NTP \
    --lr 1e-3 \
    --window_size 1024 \
    --num_gpus 8
```

Supported loss functions:
- `NTP` – Next-Token Prediction
- `ST` – Self-Distillation (teacher = frozen base model)
- `MIX` – Mix-NTP Distillation
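As a concrete reference for the self-distillation objective: under the common formulation, the frozen base model is the teacher, the memory-augmented model is the student, and the loss is the KL divergence between their next-token distributions. The loss direction and absence of a temperature here are assumptions, not confirmed details of the repo:

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax over the vocabulary axis
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def sd_loss(teacher_logits, student_logits):
    """Forward KL(teacher || student), averaged over batch and positions."""
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    return float((np.exp(t_logp) * (t_logp - s_logp)).sum(-1).mean())

rng = np.random.default_rng(0)
t = rng.standard_normal((2, 5, 32))          # teacher logits (batch, seq, vocab)
s = t + 0.1 * rng.standard_normal((2, 5, 32))  # slightly perturbed student logits
print(sd_loss(t, t))  # prints 0.0 — KL of a distribution with itself is zero
```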
Evaluate long-context conversational question answering on the LoCoMo benchmark:
```bash
python launch_locomo_experiments.py \
    --model Qwen/Qwen3-1.7B-Base \
    --memory_type locas \
    --ttt_loss SD \
    --context_mode date_split
```

Memory types: `locas` (Locas Memory) or `lora` (LoRA adapter baseline)
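For context on the `lora` baseline: LoRA adds a low-rank update `W + (alpha/r) * B @ A` to frozen weights instead of external memory entries. The rank, scaling, and zero-initialization of `B` shown here follow the standard LoRA recipe and are illustrative, not the repo's defaults:

```python
import numpy as np

rank, alpha = 8, 16
hidden, inter = 64, 256
rng = np.random.default_rng(0)

W = rng.standard_normal((inter, hidden)) * 0.02  # frozen base weight
A = rng.standard_normal((rank, hidden)) * 0.01   # trainable down-projection
B = np.zeros((inter, rank))                      # trainable up-projection (zero init)

# Effective weight after merging the low-rank adapter
W_eff = W + (alpha / rank) * (B @ A)
print(np.allclose(W_eff, W))  # True — zero-initialized B means no change at start
```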
| Strategy | Description |
|---|---|
| `highest` | Select neurons with highest activation magnitude (default) |
| `lowest` | Select neurons with lowest activation magnitude |
| `random` | Random Gaussian initialization |
| `random_index` | Randomly permute extracted neuron indices |
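The `highest`/`lowest` strategies can be sketched as ranking FFN neurons by an activation statistic and keeping the top-k (or bottom-k) indices. The mean-absolute-activation importance score used here is an assumption; the repo computes its own importance scores inside `memorize()`:

```python
import numpy as np

def select_neurons(activations, keep_top, strategy="highest"):
    """activations: (num_tokens, intermediate_size) FFN activations from a forward pass."""
    importance = np.abs(activations).mean(axis=0)  # per-neuron importance score
    order = np.argsort(importance)                 # ascending
    if strategy == "highest":
        return order[-keep_top:][::-1]             # top-k, most important first
    if strategy == "lowest":
        return order[:keep_top]
    raise ValueError(f"unknown strategy: {strategy}")

rng = np.random.default_rng(0)
acts = rng.standard_normal((128, 64))
acts[:, 7] *= 10.0                                 # make neuron 7 clearly dominant
idx = select_neurons(acts, keep_top=8)
print(idx[0])  # 7 — the most active neuron ranks first
```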
Currently released part: MIT License.
The complete project: Copyright 2026 Tencent.