---
license: mit
library_name: arms-hat
pipeline_tag: feature-extraction
---
An experimental index structure for AI memory systems that exploits known semantic hierarchy for retrieval.
Status: Early-Stage Research / Under Active Development
HAT is a research prototype exploring whether known data hierarchy (sessions, documents, chunks) can outperform generic ANN indexes for structured AI memory. Current benchmarks are on synthetic hierarchical data where HAT's structural prior gives it a natural advantage. The HNSW comparisons below reflect this favorable test condition — not general-purpose retrieval performance. Standard retrieval benchmarks (BEIR, MS MARCO, etc.) have not been run. This is not a production-ready replacement for existing vector databases.
Rigorous evaluation is in progress. Feedback and contributions welcome.
HAT exploits the known hierarchy in AI conversations: sessions contain documents, documents contain chunks. This structural prior enables O(log n) queries with 100% recall.
| Metric | HAT | HNSW | Notes |
|---|---|---|---|
| Recall@10 | 100% | 70% | On synthetic data with known hierarchy (favorable to HAT) |
| Build Time | 30ms | 2.1s | Micro-benchmark, not at production scale |
| Query Latency | 3.1ms | - | Small corpus; not validated at large scale |
Benchmarked on synthetically-generated hierarchical data. HNSW is not designed for hierarchical structure, so this comparison highlights HAT's domain advantage rather than a general superiority claim. Performance on unstructured or real-world data may differ significantly.
On this hierarchically-structured synthetic data, HAT achieves 100% recall where HNSW achieves only ~70%.
In the same micro-benchmark, HAT builds its index roughly 70x faster than HNSW (30ms vs 2.1s), which could matter for real-time applications.
Large language models have finite context windows. A 10K context model can only "see" the most recent 10K tokens, losing access to earlier conversation history.
Current solutions fall short:
- Longer context models: Expensive to train and run
- Summarization: Lossy compression that discards detail
- RAG retrieval: Re-embeds and recomputes attention every query
HAT exploits known structure in AI workloads. Unlike general vector databases that treat data as unstructured point clouds, AI conversations have inherent hierarchy:
```
Session (conversation boundary)
└── Document (topic or turn)
    └── Chunk (individual message)
```
HAT mirrors human memory architecture - functioning as an artificial hippocampus for AI systems.
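The session/document/chunk hierarchy above can be sketched as plain nested data types. This is an illustrative model only; the names below are hypothetical and do not reflect HAT's actual Rust internals:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    """An individual message, stored with its embedding vector."""
    chunk_id: int
    embedding: list  # e.g. a 1536-dim vector from an embedding model

@dataclass
class Document:
    """A topic or conversational turn; groups related chunks."""
    chunks: list = field(default_factory=list)

@dataclass
class Session:
    """A conversation boundary; groups documents."""
    documents: list = field(default_factory=list)

# Build one session containing one topic with a single message
session = Session()
doc = Document()
doc.chunks.append(Chunk(chunk_id=0, embedding=[0.1, 0.2]))
session.documents.append(doc)
```

Because the containment relations are explicit, no structure has to be learned from the embeddings themselves; the index simply navigates it.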
HAT uses beam search through the hierarchy:
1. Start at root
2. At each level, score children by cosine similarity to query
3. Keep top-b candidates (beam width)
4. Return top-k from leaf level
Complexity: O(b · d · c), where b is the beam width, d the tree depth, and c the fan-out per node. With a balanced tree, d = O(log n), so for fixed b and c the query cost is O(log n).
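The four steps above can be sketched in a few lines of Python. This is a minimal, self-contained illustration of the beam-search procedure, not HAT's actual implementation; the `Node` structure and toy tree are hypothetical:

```python
import math

def cosine(a, b):
    """Cosine similarity between two plain-list vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-12)

class Node:
    def __init__(self, embedding, children=None, item_id=None):
        self.embedding = embedding
        self.children = children or []  # empty for leaf chunks
        self.item_id = item_id

def beam_search(root, query, k=10, beam_width=4):
    """Descend the hierarchy level by level, keeping the top-b nodes at each step."""
    frontier = [root]
    leaves = []
    while frontier:
        next_level = []
        for node in frontier:
            if node.children:
                next_level.extend(node.children)
            else:
                leaves.append(node)  # reached the chunk level
        if not next_level:
            break
        # Score candidates by cosine similarity to the query, keep top-b
        next_level.sort(key=lambda n: cosine(n.embedding, query), reverse=True)
        frontier = next_level[:beam_width]
    leaves.sort(key=lambda n: cosine(n.embedding, query), reverse=True)
    return [(n.item_id, cosine(n.embedding, query)) for n in leaves[:k]]

# Toy tree: one root, two documents, three chunks (2-dim embeddings)
chunk_a = Node([1.0, 0.0], item_id="a")
chunk_b = Node([0.9, 0.1], item_id="b")
chunk_c = Node([0.0, 1.0], item_id="c")
doc1 = Node([0.95, 0.05], children=[chunk_a, chunk_b])
doc2 = Node([0.0, 1.0], children=[chunk_c])
root = Node([0.5, 0.5], children=[doc1, doc2])

results = beam_search(root, [1.0, 0.0], k=2, beam_width=2)
```

Note that the search only ever scores b · c nodes per level, which is where the O(b · d · c) bound comes from.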
Inspired by sleep-staged memory consolidation, HAT maintains index quality through incremental consolidation.
On synthetic hierarchical data, HAT maintains high recall while HNSW (which lacks a hierarchy prior) degrades.
| Scale | HAT Build | HNSW Build | HAT R@10 | HNSW R@10 |
|---|---|---|---|---|
| 500 | 16ms | 1.0s | 100% | 55% |
| 1000 | 25ms | 2.0s | 100% | 44.5% |
| 2000 | 50ms | 4.3s | 100% | 67.5% |
| 5000 | 127ms | 11.9s | 100% | 55% |
Note: These results are on synthetic data where chunks are pre-organized into known hierarchies. Real-world data may not conform as cleanly to this structure. Testing on real conversation datasets is planned.
HAT may enable small-context models to achieve high recall beyond their native context window via hierarchical retrieval.
| Messages | Tokens | Context % | Recall | Latency | Memory |
|---|---|---|---|---|---|
| 1000 | 30K | 33% | 100% | 1.7ms | 1.6MB |
| 2000 | 60K | 17% | 100% | 3.1ms | 3.3MB |
These are retrieval recall figures on synthetic data — not end-to-end LLM task accuracy. A model retrieving the right chunks does not guarantee correct downstream answers. End-to-end evaluation (needle-in-a-haystack, downstream QA accuracy) has not been conducted yet.
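To make the "Context %" idea concrete: once chunks are retrieved, they still have to fit inside the model's native window. One simple way to do that, shown here as an illustrative sketch and not part of the HAT API, is greedy packing of the highest-scoring chunks under a token budget:

```python
def pack_context(retrieved, token_budget):
    """Greedily pack the highest-scoring chunks into a fixed token budget.

    `retrieved` is a list of (score, token_count, text) tuples, assumed to be
    sorted by descending score, as a retrieval index would return them.
    """
    packed, used = [], 0
    for score, tokens, text in retrieved:
        if used + tokens <= token_budget:
            packed.append(text)
            used += tokens
    return packed, used

# Example: 10K-token window, with 8K reserved for retrieved history
chunks = [(0.95, 3000, "chunk A"), (0.90, 4000, "chunk B"), (0.85, 2500, "chunk C")]
context, used = pack_context(chunks, token_budget=8000)
```

Here "chunk C" is dropped because it would overflow the budget; the model sees only the top-scoring chunks that fit, which is why retrieval recall is a necessary but not sufficient condition for end-to-end accuracy.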
```python
from arms_hat import HatIndex

# Create index (1536 dimensions for OpenAI embeddings)
index = HatIndex.cosine(1536)

# Add messages with automatic hierarchy
index.add(embedding)  # Returns ID

# Session/document management
index.new_session()   # Start new conversation
index.new_document()  # Start new topic

# Query
results = index.near(query_embedding, k=10)
for result in results:
    print(f"ID: {result.id}, Score: {result.score:.4f}")

# Persistence
index.save("memory.hat")
loaded = HatIndex.load("memory.hat")
```

```rust
use hat::{HatIndex, HatConfig};

// Create index
let config = HatConfig::default();
let mut index = HatIndex::new(config, 1536);

// Add points
let id = index.add(&embedding);

// Query
let results = index.search(&query, 10);
```

```bash
pip install arms-hat
```

```bash
git clone https://github.com/automate-capture/hat.git
cd hat
cargo build --release
```

```bash
cd python
pip install maturin
maturin develop
```

```
hat/
├── src/                 # Rust implementation
│   ├── lib.rs           # Library entry point
│   ├── index.rs         # HatIndex implementation
│   ├── container.rs     # Tree node types
│   ├── consolidation.rs # Background maintenance
│   └── persistence.rs   # Save/load functionality
├── python/              # Python bindings (PyO3)
│   └── arms_hat/        # Python package
├── benchmarks/          # Performance comparisons
├── examples/            # Usage examples
├── paper/               # Research paper (PDF)
├── images/              # Figures and diagrams
└── tests/               # Test suite
```

```bash
# Run HAT vs HNSW benchmark
cargo test --test phase31_hat_vs_hnsw -- --nocapture

# Run real embedding dimension tests
cargo test --test phase32_real_embeddings -- --nocapture

# Run persistence tests
cargo test --test phase33_persistence -- --nocapture

# Run end-to-end LLM demo
python examples/demo_hat_memory.py
```

HAT may work well for:
- AI conversation memory (chatbots, agents) — its primary design target
- Session-based retrieval with clear hierarchy
- Hierarchically-structured vector data
- Cold-start scenarios (no training needed)
Use established solutions (HNSW, etc.) for:
- Unstructured point clouds (random embeddings)
- Static knowledge bases (handbooks, catalogs)
- Production systems requiring battle-tested reliability
- Any workload where approximate recall is acceptable
HAT explores an approach to indexing that exploits known structure rather than learning it. Whether this generalizes beyond AI conversation data is an open research question.
| Database Type | Structure | Semantics |
|---|---|---|
| Relational | Explicit (foreign keys) | None |
| Document | Implicit (nesting) | None |
| Vector (HNSW) | Learned from data | Yes |
| HAT | Explicit + exploited | Yes |
Traditional vector databases treat embeddings as unstructured point clouds, spending compute to discover topology. HAT inverts this: known hierarchy is free information - use it.
Any domain combining hierarchical structure with semantic similarity could, in principle, benefit from this approach:
- Legal/Medical Documents: Case → Filing → Paragraph → Sentence
- Code Search: Repository → Module → Function → Line
- IoT/Sensor Networks: Facility → Zone → Device → Reading
- E-commerce: Catalog → Category → Product → Variant
- Research Corpora: Journal → Paper → Section → Citation
"Position IS relationship. No foreign keys needed - proximity defines connection."
HAT combines the structural guarantees of document databases with the semantic power of vector search, without the computational overhead of learning topology from scratch.
```bibtex
@article{hat2026,
  title={Hierarchical Attention Tree: Extending LLM Context Through Structural Memory},
  author={Young, Lucas and Automate Capture Research},
  year={2026},
  url={https://research.automate-capture.com/hat}
}
```

MIT License - see LICENSE for details.
- Research Site: research.automate-capture.com/hat
- Main Site: automate-capture.com