The Best Open-Source Atomic Knowledge Base Builder for RAG
Make AI truly learn a book, rather than just read it.
Make AI truly "learn" a book rather than just "read" it: build RAG knowledge bases with an atomic method, so that AI can understand the material, synthesize it, and apply it flexibly to new problems.
| Pain Point | Traditional Method | Our Atomic Solution |
|---|---|---|
| Hard Cut | Split by 600-800 chars, breaking knowledge | Split by knowledge integrity |
| U-shaped Attention | LLMs only remember 10% of 1M context | Load only relevant atoms |
| Knowledge≠Ability | AI can recite but can't solve problems | Extract executable methodologies |
| STEM Blind Spots | Formulas, charts, derivations ignored | Specialized parsers for STEM |
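The "Hard Cut" row can be illustrated with a minimal sketch. This is not the project's implementation: `hard_cut` and `split_by_paragraph` are hypothetical helper names, and treating each blank-line-separated paragraph as one indivisible knowledge unit is a simplifying assumption.

```python
import re


def hard_cut(text: str, size: int = 700) -> list[str]:
    """Fixed-width split that ignores sentence and paragraph boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def split_by_paragraph(text: str, max_chars: int = 700) -> list[str]:
    """Greedy split that never breaks inside a paragraph.

    Paragraphs (separated by blank lines) are packed into a chunk until
    adding the next one would exceed max_chars. A single paragraph longer
    than max_chars is kept whole rather than cut.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # The next paragraph does not fit: close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

With the fixed-width cut, a definition and the theorem that depends on it can land in different chunks; the paragraph-aware version trades uniform chunk size for intact units.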
- 🔬 Atomic Chunking - Split by knowledge integrity, not by a hard character count
- 📚 Methodology Extraction - Extract executable methods from descriptive text
- 🧮 STEM Specialized - Dedicated handling for LaTeX math formulas, physics models, chemical equations, and medical diagnosis logic
- 🔍 Multi-Recall - Vector search + keyword search + knowledge graph, three complementary retrieval paths
- 🔄 HERMES Integration - Supports a dynamic learning loop, so results get more accurate with use
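The README does not say how the three recall paths are merged. One common option is reciprocal rank fusion (RRF), sketched below under the assumption that each retriever returns a ranked list of atom IDs. The function name `rrf_fuse`, the sample IDs, and the constant `k=60` (a conventional default for RRF) are illustrative and not part of `atomic_rag`.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60, top_n: int = 5) -> list[str]:
    """Merge several ranked ID lists with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) to an ID's score, so items
    ranked highly by several retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, atom_id in enumerate(ranking, start=1):
            scores[atom_id] += 1.0 / (k + rank)
    # Sort by score (descending); break ties by ID for deterministic output.
    ordered = sorted(scores, key=lambda atom_id: (-scores[atom_id], atom_id))
    return ordered[:top_n]


# Hypothetical results from the three recall paths.
vector_hits = ["atom_7", "atom_2", "atom_9"]
keyword_hits = ["atom_2", "atom_7", "atom_4"]
graph_hits = ["atom_2", "atom_5"]
print(rrf_fuse([vector_hits, keyword_hits, graph_hits]))
```

RRF only needs ranks, not comparable scores, which is convenient when vector similarity, keyword scores, and graph hits are on different scales.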
```python
from atomic_rag import AtomicRAGBuilder

# Initialize builder (Math domain)
builder = AtomicRAGBuilder(domain="math")

# Process PDF
atoms = builder.process_pdf("calculus.pdf")

# Store to vector DB
builder.store_to_vector_db(atoms, collection_name="math_kb")
print(f"✅ Built {len(atoms)} knowledge atoms")
```

```python
from atomic_rag import MultiRecallRAG

rag = MultiRecallRAG()
answer = rag.ask("How to solve quadratic equations?")
print(answer)
```

Step 1: Format Conversion (Eliminate visual blind spots)
Step 2: Semantic Chunking (By knowledge integrity)
Step 3: Methodology Extraction (Keep methods, remove stories)
Step 4: Metadata Extraction (Multi-dimensional tags)
Step 5: Vector Storage (Ready for retrieval)
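The five steps above can be read as a chain of small functions. The sketch below shows that shape only; every stage body is a toy placeholder, and the names `KnowledgeAtom`, `convert_format`, `chunk_semantically`, `extract_methods`, `extract_metadata`, and `build_atoms` are hypothetical rather than the `atomic_rag` API. Step 5 is omitted: the returned list is what would be handed to a vector store.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeAtom:
    text: str
    methods: list[str] = field(default_factory=list)
    tags: dict[str, str] = field(default_factory=dict)


def convert_format(raw: str) -> str:
    # Step 1 placeholder: normalise line endings. A real converter would
    # also turn figures and formulas into text.
    return raw.replace("\r\n", "\n").strip()


def chunk_semantically(text: str) -> list[KnowledgeAtom]:
    # Step 2 placeholder: one atom per blank-line-separated paragraph.
    return [KnowledgeAtom(p.strip()) for p in text.split("\n\n") if p.strip()]


def extract_methods(atom: KnowledgeAtom) -> KnowledgeAtom:
    # Step 3 placeholder: keep lines that read like instructions.
    atom.methods = [
        line.strip()
        for line in atom.text.splitlines()
        if line.strip().lower().startswith(("step", "to "))
    ]
    return atom


def extract_metadata(atom: KnowledgeAtom, domain: str) -> KnowledgeAtom:
    # Step 4 placeholder: attach simple tags for later filtering.
    atom.tags = {"domain": domain, "has_method": str(bool(atom.methods))}
    return atom


def build_atoms(raw: str, domain: str) -> list[KnowledgeAtom]:
    """Run steps 1-4 in order and return the tagged atoms."""
    text = convert_format(raw)
    atoms = chunk_semantically(text)
    return [extract_metadata(extract_methods(atom), domain) for atom in atoms]
```

Keeping each step as a separate function makes it possible to swap in a domain-specific parser for one stage without touching the others.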
Mathematics:
- ✅ LaTeX formula extraction
- ✅ Proof step recognition
- ✅ Theorem/definition annotation

Physics:
- ✅ Physics model extraction
- ✅ Formula derivation preservation
- ✅ Applicable conditions annotation

Chemistry:
- ✅ Chemical equation recognition
- ✅ Reaction mechanism analysis
- ✅ Condition parameter recording

Medicine:
- ✅ Diagnosis logic extraction
- ✅ Treatment plan structuring
- ✅ Differential diagnosis annotation
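As an illustration of the LaTeX formula extraction item, here is a minimal regex-based sketch. It assumes the text has already been converted so that formulas are delimited by `$...$` (inline) or `$$...$$` (display). The function `extract_formulas` is a hypothetical name, and the pattern does not handle escaped dollar signs such as `\$5`.

```python
import re

# Try display math ($$...$$) first so it is not misread as two inline spans.
_FORMULA = re.compile(r"\$\$(.+?)\$\$|\$(.+?)\$", re.DOTALL)


def extract_formulas(text: str) -> list[dict[str, str]]:
    """Return each formula with its kind ('display' or 'inline')."""
    formulas = []
    for match in _FORMULA.finditer(text):
        display, inline = match.group(1), match.group(2)
        if display is not None:
            formulas.append({"kind": "display", "latex": display.strip()})
        else:
            formulas.append({"kind": "inline", "latex": inline.strip()})
    return formulas


sample = r"Euler: $e^{i\pi}+1=0$ and $$\int_0^1 x\,dx = \tfrac12$$"
for formula in extract_formulas(sample):
    print(formula["kind"], formula["latex"])
```

Storing the formula source alongside the atom text keeps the math searchable and renderable instead of losing it during chunking.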
| Scenario | Application |
|---|---|
| 📖 Personal Knowledge Base | Convert books to searchable knowledge |
| 🏢 Enterprise KM | Atomic management of SOPs and manuals |
| 🎓 Education | Personalized learning with textbooks |
| 🔬 Research | Extract core methods from papers |
| 🤖 AI Agents | Inject professional knowledge |
| Metric | Target | Description |
|---|---|---|
| Atom Extraction | >95% | Near-complete knowledge capture |
| Methodology Accuracy | >90% | Correct executable methods |
| Retrieval Recall | >85% | Relevant knowledge found |
| Processing Speed | 50 pages/min | PDF processing efficiency |
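The retrieval recall target can be checked with a small evaluation helper. The README does not define the metric, so this sketch assumes "recall" means the fraction of known-relevant atoms that appear in the top-k results; `recall_at_k` and the sample IDs are hypothetical.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant atom IDs found among the first k retrieved IDs."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


# Hypothetical query: three atoms are relevant, two are retrieved in the top 4.
retrieved_ids = ["atom_1", "atom_8", "atom_3", "atom_5"]
relevant_ids = {"atom_1", "atom_5", "atom_9"}
print(f"recall@4 = {recall_at_k(retrieved_ids, relevant_ids, k=4):.2f}")
```

Averaging this value over a labeled set of queries gives a number that can be compared against the >85% target.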
- 🌐 ClawHub Skill: Search "atomic-rag" on ClawHub
- 📚 Documentation: SKILL.md
- 🎓 Education Platform: XueLaiXueQu
XueLaiXueQu Learning Community - Making AI truly learn
MIT License - Free to use, contributions welcome!
Made with ❤️ by XueLaiXueQu AI Team