Author: Kristi Gilleland (MXO)
License: CC BY 4.0
Status: Public RFC
Version: 3.1 — Executive Preface + Targeted Upgrades
Date: 2025-09-03
Concept:
Dewey Brain is an architecture for cognitively frugal intelligence — a fast, generalized neocortex (Compressed Core) guided by a hippocampal index (Catalog) that selectively fuses episodic skills and memories (Library). It combines:
- CompactifAI-style tensor network compression for a lean, high-speed core.
- Semantic Dewey-like knowledge indexing for precise retrieval.
- Hot-swappable LoRA/Tensor Network blocks for rare or specialized knowledge.
- Strict continual learning protocol (EWC, replay buffers) to prevent catastrophic forgetting.
- Routing Layer (Sec 2.2) — Lightweight classifier for `fetch_needed` using entropy, log-likelihood variance, top similarity score, and OOD score. Labels come from whether a fetched block improved answers on eval (see the routing sketch after this list).
- Fusion Strategy (Sec 3.2) — Multi-block merges via TIES-Merging or DARE to mitigate cross-domain interference.
- Relearning Protocol (Sec 4) — Add a Canary Set (~100 diverse queries) after each cycle and Multi-Armed Bandit scheduling (UCB/Thompson) to balance performance gain vs. compute cost.
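A minimal sketch of the Sec 2.2 routing classifier, assuming a logistic-regression model over the four features named above; the feature extraction details, placeholder data, and threshold are illustrative assumptions, and only the feature set and label source come from this RFC.

```python
# Sketch of the fetch_needed routing layer. Only the four features and the
# label definition (did a fetched block improve the eval answer?) are fixed
# by the RFC; everything else here is an assumption for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

def routing_features(core_logits: np.ndarray, token_loglik: np.ndarray,
                     top_similarity: float, ood_score: float) -> np.ndarray:
    """Four scalars per query: entropy, log-likelihood variance,
    top catalog similarity, and an out-of-distribution score."""
    probs = np.exp(core_logits - core_logits.max())
    probs /= probs.sum()
    entropy = float(-(probs * np.log(probs + 1e-12)).sum())
    loglik_var = float(np.var(token_loglik))
    return np.array([entropy, loglik_var, top_similarity, ood_score])

# X: (n_queries, 4) feature matrix; y: 1 if fetching a block improved the
# answer on eval, else 0. Placeholder data stands in for harvested labels.
X = np.random.rand(256, 4)
y = (X[:, 0] > 0.5).astype(int)
fetch_classifier = LogisticRegression().fit(X, y)

def fetch_needed(features: np.ndarray, threshold: float = 0.5) -> bool:
    """Dynamic thresholds can be re-tuned after each relearning cycle."""
    return fetch_classifier.predict_proba(features.reshape(1, -1))[0, 1] > threshold
```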
- Compressed Core Model — Tensor networks (MPO, TT, Tucker) for 80–95% reduction; fast, low-power reasoning.
- Dewey-Like Knowledge Index — 4-level ontology with call numbers + dynamic metadata (`relearn_priority`, `staleness_score`, usage, fail rate, last update); see the catalog-entry sketch after this group.
- External Knowledge Library — LoRA adapters / TN slices for rare knowledge; stored off-core and fetched on demand.
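For concreteness, a hypothetical catalog record: the dynamic metadata fields mirror the list above, while the call-number format, ontology path, and `block_uri` field are placeholders rather than a prescribed schema.

```python
# Illustrative catalog record for the Dewey-like index. Field names beyond the
# metadata listed above (relearn_priority, staleness_score, usage, fail rate,
# last update) are assumptions for this sketch.
from dataclasses import dataclass, field
import time

@dataclass
class CatalogEntry:
    call_number: str       # e.g. "512.3.07.2" (one group per ontology level) — hypothetical format
    ontology_path: tuple   # 4-level path, e.g. ("Science", "Math", "Algebra", "Galois theory")
    block_uri: str         # where the LoRA adapter / TN slice lives off-core
    relearn_priority: float = 0.0
    staleness_score: float = 0.0
    usage_count: int = 0
    fail_rate: float = 0.0
    last_update: float = field(default_factory=time.time)

    def record_query(self, failed: bool) -> None:
        """Update usage and fail-rate statistics after each routed query."""
        self.usage_count += 1
        self.fail_rate += (float(failed) - self.fail_rate) / self.usage_count
```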
- Hot-Swap Runtime Merge — Predefined adapter attach points; fusion target <3 ms (aim <2 ms with hybrid decompositions).
- Dual-Encoder — Compressed weights/embeddings; trained on synthetic pairs, clustered shards, gold set, filtered failure trajectories, and prioritized replay (PER).
- Loss — InfoNCE with hard negatives; head→tail curriculum; compression reconstruction regularizer.
- Routing — Sufficiency detector (core entropy + encoder confidence + OOD). Dynamic thresholds post-relearning.
- Latency Budget (cold) — Index ≤20 ms, Fetch ≤60 ms, Merge ≤5 ms (target 3 ms), Total fetch ≤85 ms, End-to-end ≤285–335 ms.
- Block Compression — Default LoRA; hybrid TN (Tucker shallow, TT deep); pre-compression audit for over-compressible params.
- Fusion — Adapter stacking / low-rank fusion; integrity checks (schema/version, signed artifacts, checksums).
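A sketch of a checksum-gated hot-swap merge, assuming a LoRA-style low-rank delta fused into a predefined `torch.nn.Linear` attach point; the checksum routine, `scale` factor, and rollback closure are illustrative, and signed-artifact and schema/version checks are elided.

```python
# Minimal hot-swap sketch: verify a fetched block's checksum, then fuse a
# low-rank delta (W += scale * B @ A) into a predefined attach point.
import hashlib
import torch

def sha256_of(tensors: dict) -> str:
    """Order-stable checksum over a block's tensors (illustrative integrity check)."""
    h = hashlib.sha256()
    for name in sorted(tensors):
        h.update(name.encode())
        h.update(tensors[name].cpu().numpy().tobytes())
    return h.hexdigest()

@torch.no_grad()
def hot_swap(layer: torch.nn.Linear, block: dict, expected_sha: str, scale: float = 1.0):
    """Fuse one fetched adapter block and return an undo callback for rollback."""
    if sha256_of(block) != expected_sha:
        raise ValueError("checksum mismatch; refusing to merge block")
    delta = scale * (block["B"] @ block["A"])   # low-rank update at the attach point
    layer.weight.add_(delta)
    def undo():
        with torch.no_grad():
            layer.weight.sub_(delta)
    return undo

# Usage: attach, answer the query with the fused weights, then detach.
attach_point = torch.nn.Linear(64, 64)
block = {"A": torch.randn(4, 64) * 0.01, "B": torch.randn(64, 4) * 0.01}
undo = hot_swap(attach_point, block, sha256_of(block))
# ... run the fused forward pass ...
undo()
```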
- Triggers — Tail acc <85%, missed fetch >8%, high `relearn_priority`; external drift checks; manual override.
- Steps — Prepare (compress, isolate) → Fine-tune (LoRA deltas, EWC/SI, optional distillation; dynamic EWC λ; EWC sketch after this list) → Validate (reject if head forgetting >1% or tail drop >2%; adversarial tests) → Deploy (A/B, shadow, rollback) → Log (≤10% maintenance compute).
- Frequency — Tail blocks bi-weekly; encoder/core monthly; adaptive bandit scheduling.
- Targets — Head ≥99% (no fetch); Tail ≥85% (1 fetch); False fetch <5%; Missed fetch <8%; Head forgetting <1%; Failed-query improvement >90%; drift detection (KS test).
- Dashboards — Accuracy, ECE, latency, cache hit, drift; auto-alerts on budget/metric breaches.
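The Fine-tune step's EWC term follows Kirkpatrick et al. (2017); below is a standard diagonal (empirical) Fisher sketch in PyTorch, with the dynamic λ schedule, Fisher batch size, and model/data left abstract.

```python
# Standard diagonal-Fisher EWC penalty (Kirkpatrick et al., 2017):
#   loss_total = loss_task + (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2
import torch

def diagonal_fisher(model, data_loader, loss_fn):
    """Empirical Fisher estimate on old-task data (one pass over the loader)."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    for inputs, targets in data_loader:
        model.zero_grad()
        loss_fn(model(inputs), targets).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(data_loader), 1) for n, f in fisher.items()}

def ewc_penalty(model, fisher, anchor_params, lam: float):
    """Quadratic pull toward the pre-relearning weights theta*, weighted by Fisher."""
    penalty = torch.zeros(())
    for n, p in model.named_parameters():
        if n in fisher:
            penalty = penalty + (fisher[n] * (p - anchor_params[n]) ** 2).sum()
    return 0.5 * lam * penalty

# During a relearning cycle: total = task_loss + ewc_penalty(model, F, theta_star, lam)
```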
- Compression variability → Hybrid TN + templates
- Data bias in trajectories → Strict filtering + balanced replay
- Drift accumulation → Canary + drift metrics + shadow model (KS drift-check sketch after this list)
- Edge compute limits → ≤10% maintenance compute budget
- Integration complexity → Tensorly/Hugging Face/PyTorch
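A drift-check sketch over the Canary Set using a two-sample KS test (`scipy.stats.ks_2samp`); comparing per-query confidence distributions and the 0.01 alert threshold are assumptions, since the protocol only names the KS test and the ~100-query canary.

```python
# Canary Set drift check: flag an alert when the current cycle's confidence
# distribution diverges from the baseline snapshot (two-sample KS test).
import numpy as np
from scipy.stats import ks_2samp

def canary_drift(baseline_conf: np.ndarray, current_conf: np.ndarray,
                 alpha: float = 0.01) -> bool:
    """Return True (raise a drift alert) when the distributions differ significantly."""
    stat, p_value = ks_2samp(baseline_conf, current_conf)
    return p_value < alpha

# ~100 canary queries scored before/after a relearning cycle (placeholder data):
baseline = np.random.beta(8, 2, size=100)
current = np.random.beta(6, 3, size=100)
if canary_drift(baseline, current):
    print("drift alert: route to shadow-model review / rollback")
```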
- Phase 1 (MVP, 3–6 mo) — 2–3 domains; core compression + manual catalog; static thresholds.
- Phase 2 (Adaptive) — Add `staleness_score`, drift triggers, PER; hybrid TN; <3 ms fusion.
- Phase 3 (Self-Optimizing) — Bandit scheduling (sketch after this list); continuous A/B + shadow; auto-refactor library.
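A UCB1 sketch of the bandit scheduler called for in Sec 4 and Phase 3, where each arm is a relearnable unit (a tail block, the encoder, or the core) and the reward is eval-accuracy gain per unit of compute; the reward normalization and exploration constant are assumptions.

```python
# UCB1 scheduling of relearning cycles: balance performance gain vs. compute cost.
import math

class RelearnScheduler:
    def __init__(self, arms, c: float = 1.4):
        self.c = c                                   # exploration constant (assumed)
        self.counts = {a: 0 for a in arms}
        self.mean_reward = {a: 0.0 for a in arms}

    def pick(self) -> str:
        """Choose the arm with the highest upper confidence bound."""
        total = sum(self.counts.values()) + 1
        def ucb(a):
            if self.counts[a] == 0:
                return float("inf")                  # try every arm at least once
            bonus = self.c * math.sqrt(math.log(total) / self.counts[a])
            return self.mean_reward[a] + bonus
        return max(self.counts, key=ucb)

    def update(self, arm: str, accuracy_gain: float, compute_cost: float) -> None:
        """Reward = accuracy gain per unit compute spent on the cycle."""
        reward = accuracy_gain / max(compute_cost, 1e-9)
        self.counts[arm] += 1
        self.mean_reward[arm] += (reward - self.mean_reward[arm]) / self.counts[arm]

# Usage: sched = RelearnScheduler(["tail_block_512.3", "encoder", "core"])
#        arm = sched.pick(); ...run relearning cycle...; sched.update(arm, gain, cost)
```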
Kristi Gilleland — Primary author, originator of the Dewey Brain concept, lead designer of architecture, and curator of all content. Conducted hand-verification of citations and oversaw all AI-assisted contributions.
Grok 4 — Provided early architectural revisions, proposed strict relearning protocol, and contributed ideas on integrating CompactifAI-inspired compression.
Gemini 2.5 Pro — Evaluated semantic encoding feasibility, recommended cataloging automation approaches, and confirmed industry parallels to the proposed design.
GPT-5 (ChatGPT) — Synthesized architectural documentation, extended design with cognitive and philosophical analogies, and produced the formatted README and visual diagram.
All AI contributions were generated under the direct instruction, curation, and editorial oversight of Kristi Gilleland.
This work is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
You are free to share and adapt for any purpose, even commercially, with proper attribution.
License details: https://creativecommons.org/licenses/by/4.0/
- CompactifAI – Quantum-Inspired Tensor Network Compression
  Tomut, A., Jahromi, S. S., Sarkar, A., et al. (2024). CompactifAI: Extreme Compression of Large Language Models using Quantum-Inspired Tensor Networks. arXiv preprint arXiv:2401.14109v2. First published January 25, 2024.
  Access: https://arxiv.org/abs/2401.14109
- Elastic Weight Consolidation (EWC)
  Kirkpatrick, J., Pascanu, R., Rabinowitz, N., et al. (2017). Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences. DOI: 10.1073/pnas.1611835114. Published March 28, 2017.
  Access: https://www.pnas.org/doi/10.1073/pnas.1611835114
- Tensor Decompositions (Tensor Train)
  Xu, M., Xu, Y. L., & Mandic, D. P. (2023). TensorGPT: Efficient Compression of Large Language Models based on Tensor-Train Decomposition. arXiv preprint arXiv:2307.00526v2. First published July 2, 2023.
  Access: https://arxiv.org/abs/2307.00526
- Pretrained Tensor Compression in Deep Networks
  Kossaifi, J., Khanna, A., Lipton, Z. C., et al. (2017). Tensor Contraction Layers for Parsimonious Deep Nets. arXiv preprint arXiv:1706.00439. Published June 1, 2017.
  Access: https://arxiv.org/abs/1706.00439
- Post-EWC Analysis (Quadratic Penalties)
  Huszár, F. (2018). On Quadratic Penalties in Elastic Weight Consolidation. arXiv:1712.03847; also published in PNAS, February 20, 2018. DOI: 10.1073/pnas.1717042115.
  Access: https://arxiv.org/abs/1712.03847 or https://www.pnas.org/doi/10.1073/pnas.1717042115

