LMM: Large Memory Model

LLMs have no memory. The context window is a register file, not RAM — we need a real memory layer.

LMM is a research prototype that explores a radical idea: what if a small language model could serve as the memory substrate for a larger working model? Instead of stuffing everything into the context window or maintaining flat markdown files, LMM compresses interaction history into model weights and generates contextual memory on demand.

Read the full essay: LMM: When Large Language Models Learn to Remember

The Idea

Current LLM systems have a fundamental architectural gap. In computer architecture terms:

  • Context window = CPU registers (fast, finite, temporary)
  • CLAUDE.md / .cursorrules = crib notes taped to the monitor
  • Re-scanning codebases = reading the entire hard drive into registers every time

What's missing is the memory hierarchy — L1/L2 cache, RAM, virtual memory, and the OS that manages data flow between layers.

LMM explores one piece of this: a small model (1.7B parameters) that ingests your interaction history and generates relevant context on demand. Your memory lives in the model's weights, runs locally, and belongs to you — not to any model provider.

What This Repository Contains

This is the first implementation: fine-tuning Qwen3-1.7B on Claude Code session data to generate contextual memory. It includes:

  • Data pipeline: Parse Claude Code sessions → extract cognitions → build training pairs
  • Two-stage training: Continual Pre-Training (CPT) on raw documents + Supervised Fine-Tuning (SFT) on context→memory pairs
  • Evaluation framework: Keyword coverage metrics, diversity analysis, per-category breakdown
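The evaluation metrics above can be sketched in a few lines. This is a minimal illustration of keyword coverage and pairwise-Jaccard diversity, not the repo's actual API; function names and signatures are hypothetical.

```python
from itertools import combinations

def keyword_coverage(generated: str, expected_keywords: list[str]) -> float:
    """Fraction of expected keywords that appear in the generated memory."""
    text = generated.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords) if expected_keywords else 0.0

def mean_pairwise_jaccard(outputs: list[str]) -> float:
    """Average token-level Jaccard similarity over all output pairs.
    Lower means more diverse (Run 4 reported 0.048)."""
    token_sets = [set(o.lower().split()) for o in outputs]
    sims = [len(a & b) / len(a | b)
            for a, b in combinations(token_sets, 2) if a | b]
    return sum(sims) / len(sims) if sims else 0.0
```

A high mean Jaccard across generated memories is exactly the "parrot problem" signature discussed below: many outputs collapsing onto the same few strings.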

What We Learned

Four rounds of experiments revealed both the promise and limits of pure generation:

| Run | Approach             | Val Loss | Key Finding                                               |
|-----|----------------------|----------|-----------------------------------------------------------|
| 1   | Baseline SFT         | 0.4917   | Model learns the format, but outputs are generic          |
| 2   | Salience-weighted    | 0.4476   | Weighting helps, but coverage still low                   |
| 3   | + Codebase knowledge | 0.6978   | "Parrot problem": 49% of pairs shared 4 memories          |
| 4   | CPT + deduped SFT    | 1.6497   | Diversity fixed (Jaccard 0.048), but specificity limited  |

Key insights:

  1. Pure generation without retrieval hits fundamental limits at 1.7B scale. The model learns patterns ("this project uses Python", "the user prefers X") but can't memorize specifics (exact function names, API details).
  2. Data quality dominates. The "parrot problem" in Run 3 (model repeating the same 4 memories) was a data issue, not a model issue. Deduplication + chunk-based pairs fixed it.
  3. Memory should be retrieval + generation, not generation alone. The next architecture should: store raw data → retrieve relevant chunks → generate/synthesize contextual memory.
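The deduplication fix from insight 2 can be sketched as a greedy near-duplicate filter: keep a training pair only if its target memory is not too similar to one already kept. This is an illustrative sketch, assuming pairs are dicts with a "memory" key; the repo's actual pipeline may differ.

```python
def dedupe_pairs(pairs: list[dict], threshold: float = 0.8) -> list[dict]:
    """Drop a training pair if its target memory has token-level Jaccard
    similarity >= threshold with any memory already kept."""
    kept, kept_sets = [], []
    for pair in pairs:
        tokens = set(pair["memory"].lower().split())
        if any(len(tokens & s) / len(tokens | s) >= threshold
               for s in kept_sets if tokens | s):
            continue  # near-duplicate of an earlier memory; skip it
        kept.append(pair)
        kept_sets.append(tokens)
    return kept
```

A greedy first-come filter like this is order-dependent but cheap, which is usually acceptable for training-data cleanup.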

Quick Start

Prerequisites

  • Python 3.11+
  • PyTorch 2.0+ with CUDA
  • 8 GB+ GPU VRAM (tested on an RTX 4090)

Installation

# Clone the repo
git clone https://github.com/Waerden001/LMM.git
cd LMM

# Install dependencies
pip install -e ".[dev]"

# Download the base model
python -c "from transformers import AutoTokenizer, AutoModelForCausalLM; \
  AutoTokenizer.from_pretrained('Qwen/Qwen3-1.7B'); \
  AutoModelForCausalLM.from_pretrained('Qwen/Qwen3-1.7B', dtype='auto')"

Prepare Your Data

LMM trains on Claude Code session data. You'll need your own sessions from ~/.claude/projects/.

# Step 1: Parse sessions into structured format
python scripts/01_parse_sessions.py

# Step 2: Extract cognitions (key knowledge) from sessions
python scripts/02_extract_cognitions.py

# Step 2a (optional): Ingest codebase knowledge
python scripts/02a_ingest_codebases.py

# Step 3: Build training pairs
python scripts/03_build_training_data.py --use-chunk-pairs --max-per-memory 3

Train

# Two-stage training: CPT then SFT
WANDB_MODE=offline python scripts/04_train.py --stage both --use-chunk-pairs --max-per-memory 3

# Or run stages separately:
WANDB_MODE=offline python scripts/04_train.py --stage cpt
WANDB_MODE=offline python scripts/04_train.py --stage sft --from-checkpoint checkpoints/run_cpt/best

Evaluate

python scripts/05_evaluate.py

Demo

python scripts/06_demo.py

Project Structure

LMM/
├── src/lmm/                # Core library
│   ├── config.py           # Training & model configuration
│   ├── data/               # Dataset classes (SFT, CPT)
│   ├── training/           # Trainer, data collation
│   ├── eval/               # Evaluation metrics
│   └── utils/              # Parsing, cognition extraction
├── scripts/                # Pipeline scripts (00-06)
├── configs/                # Project configuration (YAML)
├── data/                   # Generated data (gitignored)
├── results/                # Evaluation results
└── docs/                   # Blog posts & research plan

Privacy by Design

Your memory belongs to you, not to any model provider. Core principles:

  • Local training: The model runs on your hardware
  • No data upload: Raw sessions and trained models never leave your machine
  • Your weights, your memory: The fine-tuned model is yours — it encodes your interaction patterns

This repo does not include any pre-trained checkpoints or session data. You train your own LMM on your own data.

What's Next

The experiments point clearly toward a retrieval + generation architecture:

  1. Store raw interaction data in a structured knowledge base
  2. Retrieve relevant chunks given the current context (semantic search, not brute-force)
  3. Generate synthesized memory from retrieved chunks (the LMM's role shifts from "remember everything" to "synthesize what's retrieved")

This combines the best of both worlds: retrieval provides specificity, generation provides contextual adaptation. The 1.7B model doesn't need to memorize every function name — it just needs to be good at synthesizing relevant retrieved information into useful context.
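The store → retrieve → generate loop can be sketched end to end. For illustration this uses bag-of-words cosine similarity in place of semantic embeddings, and stops at prompt assembly rather than calling the model; all function names here are hypothetical, not the repo's API.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank stored chunks by similarity to the current context.
    A real system would use semantic embeddings, not raw token overlap."""
    q = Counter(query.lower().split())
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(c.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_memory_prompt(query: str, chunks: list[str]) -> str:
    """Assemble the prompt the 1.7B model would synthesize memory from."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return (f"Relevant history:\n{context}\n\n"
            f"Current context: {query}\nSynthesized memory:")
```

In this design the small model never has to recall exact identifiers from its weights; retrieval supplies them, and the model's job is reduced to synthesis.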

Documentation

Blog posts, the research plan, and implementation notes live in the docs/ directory.

License

Apache 2.0


Summary (translated from Chinese)

Core Vision

An LLM's context window is a register file, not RAM. We need a real memory layer.

LMM (Large Memory Model) explores a radical idea: use a small language model as the memory substrate for a larger working model. Instead of stuffing everything into the context window or maintaining flat markdown files, compress interaction history into model weights and generate contextual memory on demand.

Full essay: LMM: When Large Language Models Learn to Remember

Experimental Findings

Four rounds of experiments revealed both the promise and the limits of the pure-generation approach:

  • Runs 1-2: the model learns the memory format and general patterns, but cannot memorize specifics (function names, API parameters)
  • Run 3: the "parrot problem" appeared: 49% of training pairs shared only 4 memories, and the model simply repeated them
  • Run 4: CPT + deduplicated SFT fixed diversity (Jaccard similarity 0.048), but at 1.7B scale the coverage of pure generation remains limited

Key conclusion: memory should be retrieval + generation, not generation alone. Next architecture: store raw data → retrieve relevant chunks → generate/synthesize contextual memory.

Privacy by Design

Your memory belongs to you, not to any model provider:

  • The model is trained locally on your hardware
  • Raw session data and trained models never leave your machine
  • This repository contains no pre-trained weights or session data; you train your own LMM on your own data

Usage

See the Quick Start guide in the English section above. Main steps:

  1. Install dependencies: pip install -e ".[dev]"
  2. Download the base model Qwen3-1.7B
  3. Prepare data: run scripts/01-03 on your Claude Code sessions
  4. Train: python scripts/04_train.py --stage both
  5. Evaluate: python scripts/05_evaluate.py

The detailed research plan and implementation notes are in the docs/ directory.
