Youtu-RAG: Next-Generation Agentic Intelligent Retrieval-Augmented Generation System

✨ Key Features • 📖 Usage Examples • 🚀 Quick Start • 📊 Benchmarks

Youtu-RAG is a next-generation agentic retrieval-augmented generation system built on the "Local Deployment · Autonomous Decision · Memory-Driven" paradigm. With autonomous decision-making and memory learning capabilities, it is designed for personal local knowledge base management and Q&A.

Core Concepts:

Local Deployment: All components can be deployed locally, so data stays within your own domain. MinIO object storage is integrated for local management of large file collections.
Autonomous Decision: Agents autonomously determine whether to retrieve, how to retrieve, and when to call memory, selecting optimal strategies based on question types and historical experience.
Memory-Driven: Dual-layer memory mechanism (short-term conversational memory + long-term knowledge accumulation) enables continuous learning and self-evolution of Q&A experience.

Traditional RAG systems follow a fixed pipeline of "offline chunking, vector retrieval, concatenated generation" and face core bottlenecks such as privacy risks, memory loss, and rigid retrieval. Youtu-RAG aims to upgrade the system from a passive retrieval tool to an intelligent retrieval-augmented generation system with autonomous decision and memory learning capabilities.
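For orientation, the fixed pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not Youtu-RAG code: the function names (`chunk`, `embed`, `retrieve`, `build_prompt`) are hypothetical, and the bag-of-words "embedding" is a stand-in for a real embedding model.

```python
import math
from collections import Counter


def chunk(text: str, size: int = 8) -> list[str]:
    """Offline chunking: split text into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: stands in for an embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Vector retrieval: return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, contexts: list[str]) -> str:
    """Concatenation step: paste retrieved chunks in front of the question."""
    joined = "\n".join(f"- {c}" for c in contexts)
    return f"Context:\n{joined}\n\nQuestion: {query}"
```

Every query goes through the same three steps regardless of question type, which is the rigidity the paragraph above refers to.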

✨ Key Features

📁 File-Centric Architecture

File-based knowledge organization, supporting multi-source heterogeneous data including PDF, Excel, Images, and Databases

Supported Formats: PDF/Word/MD, Excel, Image, Database, +12 formats

🎯 Adaptive Retrieval Engine

Autonomous decision-making on the optimal retrieval strategy, supporting a variety of tool calls such as web search, vector retrieval, metadata filtering, database queries, and code execution

Retrieval Modes: Autonomous Decision, Tool Call, Diversified Data Sources

🧠 Dual-Layer Memory

Short-term conversational memory + long-term cross-session knowledge accumulation, achieving Q&A experience learning

Memory Types: Short-Term Memory, Long-Term Memory, Q&A Learning

🤖 Ready-to-Use Agents

From simple conversations to complex orchestrations, covering a wide range of application-level scenarios. Supports over 8 AI agents including Web Search, KB Search, Meta Retrieval, Excel Agent, and Text2SQL

Application Scenarios: Ready-to-Use, Diverse Scenarios, Complex Task Collaboration

🎨 Lightweight WebUI

Pure native HTML + CSS + JavaScript implementation, framework-free. Supports file upload, knowledge base management, AI dialogue, and document preview

Technical Features: Zero Dependencies, Streaming Response, Easy Operation

🔐 Security & Control

All related components can be deployed locally, so data stays within your own domain. MinIO object storage is integrated for local management of large file collections

Security Guarantees: Local Deployment, Data Isolation, MinIO Storage

📖 Usage Examples

1️⃣ File Management

File Upload and Preview

Access frontend interface http://localhost:8000
Click "File Management" in the left sidebar
Click "Upload File"
Depending on the file type and the file management configuration, files are processed through different paths to generate previewable content

File Upload Example: automatic metadata extraction and summary generation	PDF File Post-Processing Preview: requires OCR configuration support
01_upload_file.mp4	02_pdf_file_example.mp4
PNG File Post-Processing Preview: requires OCR configuration support	HiChunk Parsing Preview: requires HiChunk configuration support
03_png_file_example.mp4	04_hichunk_example.mp4

File Batch Management

When OCR and HiChunk are enabled, parsing during document upload takes additional time. Single-file import is recommended for such files, since batch import results in longer waiting times.

File Batch Deletion and Upload: batch importing files of the same type together is recommended	File Metadata Batch Editing: supports batch export, editing, and import	File Search: supports filename, metadata, summary, etc.
06_batch_delete_upload.mp4	05_metadata_export_import.mp4	07_file_search.mp4

2️⃣ Knowledge Base Management

Knowledge Base Creation and Deletion

Access frontend interface http://localhost:8000
Click "Knowledge Base" in the left sidebar
Click the "Create Knowledge Base" button
Fill in the knowledge base name (e.g., Technical Documentation)
Click confirm to create

Knowledge Base Creation and Deletion: only one knowledge base can be operated on at a time	Knowledge Base Search: supports search by knowledge base name and description
08_kb_create_delete.mp4	09_kb_search.mp4

Knowledge Base Content Association and Vectorization Construction

File Association: Associate uploaded files to knowledge base
Database Association: Associate local database to knowledge base
Example Association: Associate example Q&A pairs to knowledge base (as experience information)

💡 Tips: After completing each association configuration, click the Save Association button to save it; otherwise your previous selections will be lost

File Association: multiple files can be selected for association at once	Database Association: supports SQLite and MySQL	Example Association: supports association of example Q&A pairs
10_kb_file_association.mp4	11_kb_db_association.mp4	12_kb_qa_example.mp4
Knowledge Base Configuration View: view association configuration and construction configuration	Knowledge Base Vectorization Construction: unified construction of different types of associated content	Knowledge Base Association Editing: supports editing and updating of associated content
13_kb_config_show.mp4	14_kb_build.mp4	15_kb_modify.mp4

3️⃣ Intelligent Dialogue

You can select configured Agents for different tasks to conduct dialogues or Q&A:
- Some agents can only be used after selecting a knowledge base or file
- A temporary file upload button is provided for Q&A on ad hoc files; such a file is only associated automatically with the current knowledge base and does not undergo vector construction

In the frontend dialogue interface, turn on the "Memory" switch in the lower right corner to enable the dual-layer memory mechanism. With memory enabled, the Agent has:
- Short-Term Memory: Remember conversation context to avoid repeated questioning
- Long-Term Memory: Accumulate successful experiences, prioritizing reuse when encountering similar questions next time

💬 Chat Agent: enabling "Memory" is recommended to support multi-turn conversations	🔍 Web Search Agent: supports web search; can follow links to explore detailed content before answering
16_chat_agent.mp4	17_websearch_agent.mp4
📚 KB Search Agent: requires a selected knowledge base; supports vector retrieval and reranking	📚 Meta Retrieval Agent: requires a selected knowledge base; supports vector retrieval and reranking, plus question intent parsing and metadata filtering
18_kbsearch_agent.mp4	19_meta_retrieval_agent.mp4
📄 File QA Agent: requires a selected knowledge base and file; supports reading and processing file content with Python, plus vector retrieval and reranking	📊 Excel Agent: requires a selected knowledge base and file; decomposes questions into data processing steps; executes Python code with reflection
20_file_qa_agent.mp4	22_excel_agent.mp4
💻 Text2SQL Agent: requires a knowledge base with an associated database; decomposes questions, then generates and executes SQL; displays SQL query results with reflection	🧠 Short and Long-Term Memory: short-term memory takes effect within a session to support multi-turn conversations; long-term memory persists to accumulate successful experiences
21_text2sql_agent.mp4	23_memory_chat_2.mp4
🧐 Text2SQL Agent with Memory: short-term memory takes effect within a session; long-term memory can avoid additional token consumption for similar questions	🎯 QA Learning: records QA examples and automatically learns Agent routing strategies
25_text2sql_memory.mp4	26_qa_learning.mp4

🚀 Quick Start

Environment Requirements

Python: 3.12+
Package Manager: uv (recommended)
Operating System: Linux Desktop / macOS / Windows

📦 Object Storage (MinIO) Configuration

MinIO is a high-performance object storage service used to store uploaded document files (still locally managed).

For installation instructions, please refer to the official MinIO repository. Two installation methods are supported:

Install from Source: Build and install MinIO from source code
Build Docker Image: Deploy MinIO using Docker containers

⚙️ Model Deployment

Model	HuggingFace	Deployment Method	Required
Youtu-Embedding	HuggingFace	Deployment Docs	✅ Required, or other Embedding API services
Youtu-Parsing	HuggingFace	Deployment Docs	⭕ Optional
Youtu-HiChunk	HuggingFace	Deployment Docs	⭕ Optional

One-Click Installation of Youtu-RAG System

git clone https://github.com/TencentCloudADP/youtu-rag.git
cd youtu-rag
uv sync
source .venv/bin/activate
cp .env.example .env

Configure Necessary Environment Variables

Edit the .env file and fill in the following core configurations:

# =============================================
# LLM Configuration (Required)
# =============================================
UTU_LLM_TYPE=chat.completions
UTU_LLM_MODEL=deepseek-chat
UTU_LLM_BASE_URL=https://api.deepseek.com/v1
UTU_LLM_API_KEY=your_deepseek_api_key  # Replace with your API Key

# =============================================
# Embedding Configuration (Required)
# =============================================
# Option 1: Local Service (Youtu-Embedding-2B)
UTU_EMBEDDING_URL=http://localhost:8081
UTU_EMBEDDING_MODEL=youtu-embedding-2B

# Option 2: Other Embedding API Services
# UTU_EMBEDDING_URL=https://api.your-embedding-service.com
# UTU_EMBEDDING_API_KEY=your_api_key
# UTU_EMBEDDING_MODEL=model_name

# =============================================
# Reranker Configuration (Optional, improves retrieval accuracy)
# =============================================
UTU_RERANKER_MODEL=jina-reranker-v3
UTU_RERANKER_URL=https://api.jina.ai/v1/rerank
UTU_RERANKER_API_KEY=your_jina_api_key 

# =============================================
# OCR Configuration (Optional, locally deployable Youtu-Parsing)
# =============================================
UTU_OCR_BASE_URL=https://api.ocr.com/ocr
UTU_OCR_MODEL=youtu-ocr

# =============================================
# Chunk Configuration (Optional, locally deployable Youtu-HiChunk)
# =============================================
UTU_CHUNK_BASE_URL=https://api.hichunk.com/chunk
UTU_CHUNK_MODEL=hichunk

# =============================================
# Memory Function (Optional)
# =============================================
memoryEnabled=false  # Set to true to enable dual-layer memory mechanism

Note: If you don't need OCR and Chunk features, you can disable them by setting ocr enabled: false and chunk enabled: false in configs/rag/file_management.yaml.
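Before starting the service, it can help to confirm that the settings marked as required above are actually filled in. The following is a minimal sketch, not part of Youtu-RAG: `parse_env` and `missing_keys` are hypothetical helpers, the `REQUIRED` list assumes Embedding Option 1 from the sample file, and the parser is deliberately simplified (it does not replicate every rule of a full dotenv parser).

```python
REQUIRED = [
    "UTU_LLM_TYPE",
    "UTU_LLM_MODEL",
    "UTU_LLM_BASE_URL",
    "UTU_LLM_API_KEY",
    "UTU_EMBEDDING_URL",
    "UTU_EMBEDDING_MODEL",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and full-line comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop trailing inline comments such as "value  # note".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_keys(values: dict[str, str], required: list[str] = REQUIRED) -> list[str]:
    """Return required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_keys(parse_env(open(".env").read()))` returns an empty list when all of the assumed required keys are set.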

Start Service

# Method 1: Using startup script (Recommended)
bash start.sh

# Method 2: Directly using uvicorn
uv run uvicorn utu.rag.api.main:app --reload --host 0.0.0.0 --port 8000

After successful startup, access the following addresses:

📱 Frontend Interface: http://localhost:8000
📊 Monitoring Dashboard: http://localhost:8000/monitor
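To confirm from a script that both addresses respond, a small probe using only the standard library is enough. This is a sketch under assumptions: `is_up` is a hypothetical helper, and it treats any HTTP response (including error statuses) as evidence that the server is running.

```python
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at url, False if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so it is running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return False


if __name__ == "__main__":
    for address in ("http://localhost:8000", "http://localhost:8000/monitor"):
        print(address, "up" if is_up(address) else "down")
```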

📊 Benchmarks

Youtu-RAG provides a complete evaluation system, supporting multi-dimensional capability verification.

🗄️ Structured Retrieval (Text2SQL)

Capability: Natural language to SQL, Schema understanding, SQL execution
Dataset: Self-built Text2SQL dataset (Multi-table, Complex Excel, Domain Table)
Metric: Accuracy (LLM Judge)

Dataset Overview	Dataset	Multi-table-mini	Complex Excel	Multi-table	Domain Table
	Data Volume	245	931	1,390	100
	Type	Multi-table	Complex Questions	Multi-table Full	Domain Knowledge
Baseline	Vanna	45.71%	38.64%	35.11%	9.00%
🎯 Youtu-RAG	Text2SQL Agent	69.39% ↑	57.36% ↑	67.27% ↑	27.00% ↑

📊 Semi-Structured Retrieval (Excel)

Capability: Table understanding, data analysis, non-standard table parsing
Dataset: Self-built Excel Q&A dataset (500 test questions)
Metrics: LLM Judge
- Accuracy: Factual correctness of answers
- Analysis Depth: Analysis quality and insight of answers
- Feasibility: Whether generated code/solutions are executable
- Aesthetics: Visual quality of visualization charts

Category	Methods	Accuracy	Analysis Depth	Feasibility	Aesthetics
Baselines	TableGPT2-7B	8.4	5.1	4.3	6.2
	StructGPT	6.22	3.84	3.12	4.5
	TableLLM-7B	4.1	2.1	1.8	2.3
	ST-Raptor	22.4	6.0	7.4	12.4
	TreeThinker	31.0	22.8	21.4	36.8
	Code Loop	27.5	9.5	14.9	20.4
🎯 Youtu-RAG	Excel Agent	37.5 ↑	30.2 ↑	27.6 ↑	42.6 ↑

📖 Reading Comprehension (Long Documents)

FactGuard: Long document single-point fact checking, information extraction, reasoning verification
Sequential-NIAH: Long document multi-point information extraction, sequential information extraction

Dataset Overview	Dataset	FactGuard	Sequential-NIAH
	Data Volume	700	2,000
	Type	Long-text Q&A (Single-point)	Long-text Q&A (Multi-point)
Baselines	Naive Retrieval Top3	79.86%	14.20%
	Naive Retrieval Top5	80.71%	29.75%
	Naive Retrieval Top10	82.71%	57.25%
	Naive Retrieval Top15	83.00%	70.15%
🎯 Youtu-RAG	KB Search Agent	88.27% ↑	85.05% ↑
🎯 Youtu-RAG	File QA Agent	88.29% ↑	60.80% *

Note: *Reading full documents in long context is a known weakness of LLMs, which aligns with the experimental findings in Sequential-NIAH.

🏷️ Metadata Retrieval

Capability: Question preference understanding, metadata filtering and reranking, vector retrieval
Dataset: Self-built metadata retrieval dataset
Metrics:
- Weighted NDCG@5 (NDCG_w@5): Measures whether truly relevant documents are recalled in the correct order within the top 5 retrieval results
- Recall@all: The fraction of all truly relevant documents that are recalled
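The two metrics can be sketched as follows. The exact weighting scheme behind "Weighted NDCG@5" is not specified here, so this example uses standard NDCG with graded relevance as a stand-in; the function names are hypothetical.

```python
import math


def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_ids: list[str], relevance: dict[str, float], k: int = 5) -> float:
    """NDCG@k: DCG of the ranking divided by the DCG of the ideal ordering."""
    gains = [relevance.get(doc_id, 0.0) for doc_id in ranked_ids]
    ideal = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg else 0.0


def recall_at_all(retrieved_ids: list[str], relevance: dict[str, float]) -> float:
    """Fraction of all relevant documents that appear anywhere in the results."""
    relevant = {doc_id for doc_id, rel in relevance.items() if rel > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved_ids)) / len(relevant)
```

A ranking that returns every relevant document in ideal order scores 1.0 on both metrics; placing an irrelevant document first lowers NDCG even when recall is unchanged.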

Dataset	Data Volume	Metric	Baseline (Naive Retrieval)	Youtu-RAG (Meta Retrieval Agent)	Improvement (percentage points)
Timeliness Preference	183	Recall@all	34.52%	41.92%	+7.40% ↑
Timeliness Preference	183	NDCG_w@5	29.91%	43.57%	+13.66% ↑
Popularity Preference	301	Recall@all	26.19%	47.20%	+21.01% ↑
Popularity Preference	301	NDCG_w@5	29.86%	54.31%	+24.45% ↑
Average	483	Recall@all	29.34%	45.21%	+15.87% ↑
Average	483	NDCG_w@5	29.88%	50.25%	+20.37% ↑

Memoria-Bench (Under Review, To Be Released)

Memoria-Bench is the industry's first agent memory evaluation benchmark that distinguishes between semantic memory, episodic memory, and procedural memory, and is adapted to high information density scenarios such as in-depth research, table Q&A, and complex code analysis and completion.

Core Features:

📚 Semantic Memory Evaluation: Knowledge understanding and application
📖 Episodic Memory Evaluation: Historical dialogue retrospection
🔧 Procedural Memory Evaluation: Learning and reuse of procedures
🎯 Scenario Coverage: Research report generation, data analysis, code completion

💡 Tips: The Memoria-Bench evaluation benchmark is under review, stay tuned!

🤝 Contributing Guidelines

We welcome contributions of any kind, including but not limited to:

🐛 Report Bugs and Issues
💡 Propose New Feature Suggestions
📝 Improve Documentation
🔧 Submit Code Improvements

For detailed development process and specifications, please refer to CONTRIBUTING.md.

📄 License

This project is licensed under the MIT License.

🙏 Acknowledgments

Youtu-RAG builds upon the excellent work of several open-source projects:

Youtu-Agent: Agent framework
Youtu-LLM: LLM model
Youtu-Embedding: Chinese vector encoder model
Youtu-Parsing: Document parsing model
Youtu-HiChunk: Hierarchical document chunking model
FactGuard: Benchmark of long document single-point fact checking, information extraction, reasoning verification
Sequential-NIAH: Benchmark of long document multi-point information extraction, sequential information extraction

Special thanks to all developers who contributed code and suggestions or reported issues to this project!

📚 Citation

If this project is helpful to your research or work, please cite:

@software{Youtu-RAG,
  author = {Tencent Youtu Lab},
  title = {Youtu-RAG: Next-Generation Agentic Intelligent Retrieval-Augmented Generation System},
  year = {2026},
  url = {https://github.com/TencentCloudADP/youtu-rag}
}

⭐ If this project is helpful to you, please give us a Star!
