An LLM-powered documentation automation tool for StarRocks that automatically extracts metadata from source code and generates multi-language technical documentation.
- Automated Extraction: Automatically extract metadata for configs, variables, and functions from StarRocks source code
- Intelligent Generation: LLM-powered generation of descriptions, parameter explanations, and usage examples
- Multi-language Support: Support for Chinese, English, and Japanese with intelligent translation routing
- Consistent Styling: Aligned with official StarRocks documentation style
- Extensible Architecture: Generic Protocol-based Pipeline design for easy extension
- Tool-Enhanced: Integrated code search tools for more accurate context
- Version Tracking: Automatically track when configs/variables/functions were first introduced across branches
| Type | Description | Status |
|---|---|---|
| FE Config | Frontend configuration documentation | ✅ |
| BE Config | Backend configuration documentation | ✅ |
| System Variables | Session/Global variables documentation | ✅ |
| SQL Functions | Scalar/Aggregate/Window functions documentation | ✅ |
DocsAgent adopts a Protocol-based Pipeline architecture that emphasizes:
- Duck Typing: Using Python Protocols instead of inheritance for flexibility
- Generic Pipeline: Type-safe pipeline that works with any `DocumentableItem`
- Domain Separation: Each document type (config/variable/function) is a separate domain
- 3-Stage Flow: Extractor → Generator → Persister pattern for all domains
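The principles above can be sketched in a few lines of Python. This is a minimal illustration, not DocsAgent's actual code: apart from `DocumentableItem`, every class and method name here (`Extractor`, `Generator`, `Persister`, `Pipeline`, `ConfigItem`, and the three toy stage implementations) is a hypothetical stand-in chosen for the example.

```python
from dataclasses import dataclass
from typing import Generic, Iterable, Protocol, TypeVar


class DocumentableItem(Protocol):
    """Structural type: anything with a `name` attribute qualifies (assumed shape)."""
    name: str


T = TypeVar("T", bound=DocumentableItem)
T_co = TypeVar("T_co", bound=DocumentableItem, covariant=True)
T_contra = TypeVar("T_contra", bound=DocumentableItem, contravariant=True)


class Extractor(Protocol[T_co]):
    def extract(self) -> Iterable[T_co]: ...


class Generator(Protocol[T_contra]):
    def generate(self, item: T_contra) -> str: ...


class Persister(Protocol[T_contra]):
    def persist(self, item: T_contra, doc: str) -> None: ...


class Pipeline(Generic[T]):
    """Runs the three stages in order for every extracted item."""

    def __init__(self, extractor: Extractor[T], generator: Generator[T],
                 persister: Persister[T]) -> None:
        self.extractor = extractor
        self.generator = generator
        self.persister = persister

    def run(self) -> int:
        count = 0
        for item in self.extractor.extract():
            self.persister.persist(item, self.generator.generate(item))
            count += 1
        return count


# A toy "config" domain. None of these classes inherit from the protocols;
# they only need to provide matching attributes and methods (duck typing).
@dataclass
class ConfigItem:
    name: str
    default: str


class StaticExtractor:
    def extract(self) -> list[ConfigItem]:
        return [ConfigItem("enable_materialized_view", "true")]


class TemplateGenerator:
    def generate(self, item: ConfigItem) -> str:
        return f"## {item.name}\n- **Default**: {item.default}\n"


class MemoryPersister:
    def __init__(self) -> None:
        self.docs: dict[str, str] = {}

    def persist(self, item: ConfigItem, doc: str) -> None:
        self.docs[item.name] = doc


persister = MemoryPersister()
processed = Pipeline(StaticExtractor(), TemplateGenerator(), persister).run()
```

Under this reading, adding a new document type would mean writing three small classes for that domain and reusing `Pipeline` unchanged.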
graph TB
A[Source Code] --> B[Extractor]
B --> |Meta JSON| C[Meta Files]
C --> D[Version Tracker]
D --> |Version Info| E[Generator]
E --> |LLM| F[English Docs]
F --> G[Translation Agent]
G --> |LLM| H[Multi-language Docs]
H --> I[Persister]
I --> J[Git Commit]
J --> K[Create PR]
subgraph "Stage 1: Extraction"
A
B
C
D
end
subgraph "Stage 2: Generation"
E
F
G
H
end
subgraph "Stage 3: Persistence"
I
J
K
end
- Python 3.10+
- Poetry (package manager)
- StarRocks source code (for metadata extraction)
- LLM API key (OpenAI/Anthropic/Google)
# Clone the repository
git clone https://github.com/StarRocks/DocsAgent.git
cd DocsAgent
# Install dependencies
# Install Poetry (on macOS, `brew install poetry` also works; other systems have similar packages)
pip install poetry
# Activate the virtual environment, in one of two ways:
# 1. Source the Poetry environment manually
# 2. Install the Poetry shell plugin and use `poetry shell`
eval $(poetry env activate)
# poetry shell
# Install DocsAgent
poetry install
Create configuration file from template:
cp conf/example.conf conf/agent.conf
Key configuration options:
# StarRocks source code path (required)
STARROCKS_HOME=/path/to/starrocks
# LLM configuration
# e.g:
# openai:gpt-4
# anthropic:claude-3-sonnet-20240229
# google:gemini-pro
LLM_MODEL=openai:gpt-4o-mini
LLM_API_KEY=your_api_key
# Required only if the LLM provider is not OpenAI/Gemini/Claude
# LLM_URL=https://api.openai.com/v1
# LLM_PROVIDER=openai
LLM_TEMPERATURE=0.1
LLM_MAX_TOKENS=5000
# Output configuration
DOCS_OUTPUT_DIR=./output
META_DIR=./meta
TARGET_LANGS=["en", "zh", "ja"]
# StarRocks client (for SQL validation)
SR_HOST=localhost
SR_PORT=9030
SR_USER=root
SR_PASSWORD=
# Logging
LOG_DIR=./logs
LOG_LEVEL=INFO
# Git and GitHub configuration
GITHUB_TOKEN= # GitHub personal access token for creating PRs
GITHUB_REPO=StarRocks/starrocks # GitHub repository in format 'owner/repo' (e.g., 'StarRocks/starrocks')
Note: Configuration priority is: Environment variables > Config file > Defaults
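The priority rule in the note can be illustrated with a short sketch. It assumes a plain `KEY=VALUE` file like the one above; the helper names `parse_conf` and `resolve` and the `DEFAULTS` subset are hypothetical and are not taken from DocsAgent.

```python
import os
from typing import Mapping

# Illustrative subset of defaults (assumed values, taken from the sample config).
DEFAULTS = {"LOG_LEVEL": "INFO", "DOCS_OUTPUT_DIR": "./output"}


def parse_conf(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and full-line comments.

    Inline comments after a value are not stripped in this sketch.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve(file_values: Mapping[str, str],
            env: Mapping[str, str] = os.environ,
            defaults: Mapping[str, str] = DEFAULTS) -> dict[str, str]:
    """Merge settings so that env > config file > defaults."""
    merged = dict(defaults)
    merged.update(file_values)  # the config file overrides defaults
    for key in list(merged):
        if key in env:          # the environment overrides both
            merged[key] = env[key]
    return merged


conf_text = "LOG_LEVEL=DEBUG\nSR_PORT=9030\n"
settings = resolve(parse_conf(conf_text), env={"SR_PORT": "9031"})
```

Here `LOG_LEVEL` comes from the file, `SR_PORT` from the (simulated) environment, and `DOCS_OUTPUT_DIR` falls back to its default. Only keys already known from the defaults or the file are looked up in the environment, which is a simplification made for this example.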
| Argument | Description |
|---|---|
| `-e, --extract` | Extract metadata from source code |
| `-g, --generate` | Generate documentation |
| `-m, --meta` | Generate metadata without generating docs |
| `-t, --type` | Document type (fe_config/be_config/variables/functions) |
| `--config` | Configuration file path |
| `-f, --force_search_code` | Force code re-search and update the item's usage |
| `-i, --ignore_miss_usage` | Ignore variable/config when missing usage in code |
| `-wl, --without-llm` | Run without LLM (use existing docs) |
| `-l, --limit` | Limit number of items to process |
| `--ci` | Enable Git commit |
| `--pr` | Enable Pull Request creation |
| `-tv, --track-version` | Track versions for items (first-time use) |
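For readers who want to see how the flags in this table fit together, here is a sketch of an equivalent `argparse` parser. It mirrors the table only: whether each flag is a boolean switch or takes a value is inferred from the descriptions and the usage examples, and the real parser in `docsagent.main` may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the argument table (assumed flag types)."""
    parser = argparse.ArgumentParser(prog="docsagent")
    parser.add_argument("-e", "--extract", action="store_true",
                        help="Extract metadata from source code")
    parser.add_argument("-g", "--generate", action="store_true",
                        help="Generate documentation")
    parser.add_argument("-m", "--meta", action="store_true",
                        help="Generate metadata without generating docs")
    parser.add_argument("-t", "--type",
                        choices=["fe_config", "be_config", "variables", "functions"],
                        help="Document type")
    parser.add_argument("--config", help="Configuration file path")
    parser.add_argument("-f", "--force_search_code", action="store_true",
                        help="Force code re-search and update the item's usage")
    parser.add_argument("-i", "--ignore_miss_usage", action="store_true",
                        help="Ignore variable/config when missing usage in code")
    parser.add_argument("-wl", "--without-llm", action="store_true",
                        help="Run without LLM (use existing docs)")
    parser.add_argument("-l", "--limit", type=int,
                        help="Limit number of items to process")
    parser.add_argument("--ci", action="store_true", help="Enable Git commit")
    parser.add_argument("--pr", action="store_true",
                        help="Enable Pull Request creation")
    parser.add_argument("-tv", "--track-version", action="store_true",
                        help="Track versions for items")
    return parser


args = build_parser().parse_args(
    ["-g", "-t", "fe_config", "-l", "10", "-tv", "--pr"]
)
```

With these arguments, `args.generate`, `args.track_version`, and `args.pr` are set, `args.type` is `"fe_config"`, and `args.limit` is the integer `10`.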
# Incremental mode:
# 1. Extract meta from the existing documents first, so that increments can be computed (existing docs are kept)
# 2. Generate documents
# Full mode:
# 1. Generate documents without extracting meta from the existing documents
# Example
# FE/BE configs increments
# 1. Extract FE config meta from documentation
python -m docsagent.main -e -t fe_config
# 2. Generate FE config documentation and create git pr
python -m docsagent.main -g -t fe_config --track-version --pr
# FE/BE configs full
# 1. Generate FE config documentation with limit and create git pr
python -m docsagent.main -g -t fe_config -l 10 --track-version --pr
# Variables
# 1. Extract Variables meta from documentation
python -m docsagent.main -e -t variables
# 2. Generate Variables documentation
python -m docsagent.main -g -t variables -tv --ci
# Functions
# 1. Extract Functions meta from documentation
python -m docsagent.main -e -t functions
# 2. Generate Functions documentation without LLM generation
python -m docsagent.main -g -t functions -tv -wl
For detailed usage, see dev-guide.md
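One plausible reading of the two modes is sketched below: incremental mode generates only items that have no existing documentation, while full mode regenerates everything. This is an assumption about the behavior, not a description of DocsAgent's implementation; `plan_generation` and the `example_config_*` names are hypothetical.

```python
from typing import Iterable, Optional


def plan_generation(source_names: Iterable[str],
                    documented_names: Iterable[str],
                    incremental: bool = True,
                    limit: Optional[int] = None) -> list[str]:
    """Return the item names to generate, in sorted order.

    incremental=True skips items that already have documentation;
    limit plays the role of the -l/--limit flag.
    """
    documented = set(documented_names)
    names = sorted(source_names)
    if incremental:
        names = [name for name in names if name not in documented]
    return names if limit is None else names[:limit]


# Placeholder item names, apart from the one used elsewhere in this README.
source = ["enable_materialized_view", "example_config_b", "example_config_c"]
documented = ["example_config_b"]

incremental_plan = plan_generation(source, documented)
full_plan = plan_generation(source, documented, incremental=False)
limited_plan = plan_generation(source, documented, incremental=False, limit=1)
```

In this example the incremental plan skips `example_config_b` because it is already documented, whereas the full plan includes all three items.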
Documentation is generated in Markdown format with proper formatting:
## enable_materialized_view
- **Type**: Boolean
- **Default**: true
- **Introduced in**: v3.2.0
- **Description**: Whether to enable materialized view feature...
Metadata stored in meta/ directory:
// meta/fe_config.meta
{
  "items": [
    {
      "name": "enable_materialized_view",
      "type": "boolean",
      "default_value": "true",
      "version": ["v3.2.0"],
      "catalog": "query-engine",
      "documents": {
        "en": "...",
        "zh": "...",
        "ja": "..."
      }
    }
  ]
}
Version tracking results cached in meta/*.version:
{
  "metadata": {
    "git_version": "a3f5b2c",
    "maintained_branches": ["3.2", "3.3", "3.4", "3.5", "4.0"]
  },
  "versions": {
    "enable_materialized_view": {
      "3.2": "3.2.0",
      "3.3": "3.3.0",
      "3.4": "3.4.0"
    }
  }
}
output/
├── en/ # English documentation
│   ├── FE_configuration.md # FE config consolidated
│   ├── BE_configuration.md # BE config consolidated
│   ├── System_variable.md # Variables consolidated
│   └── functions/ # Function docs
│       ├── array-functions/
│       │   ├── array_append.md
│       │   └── array_concat.md
│       ├── string-functions/
│       │   ├── concat.md
│       │   └── substring.md
│       └── mathematical-functions/
│           ├── abs.md
│           └── sqrt.md
├── zh/ # Chinese documentation (same structure)
│   ├── FE_configuration.md
│   ├── functions/
│   └── ...
└── ja/ # Japanese documentation (same structure)
    ├── FE_configuration.md
    ├── functions/
    └── ...
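To connect the pieces shown in this section, the sketch below turns one meta item into the Markdown entry format and computes the consolidated file path for a language. It is an illustration under assumptions: the helper names (`version_key`, `first_introduced`, `render_entry`, `output_path`) are hypothetical, the second version in the sample item is added only to show the "earliest version" selection, and the real generator relies on an LLM rather than a fixed template.

```python
from pathlib import PurePosixPath

# Sample item modeled on the meta example; "v3.3.0" is added for illustration.
ITEM = {
    "name": "enable_materialized_view",
    "type": "boolean",
    "default_value": "true",
    "version": ["v3.3.0", "v3.2.0"],
    "documents": {"en": "Whether to enable materialized view feature..."},
}


def version_key(version: str) -> tuple[int, ...]:
    """Turn 'v3.2.0' into (3, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def first_introduced(versions: list[str]) -> str:
    """Pick the earliest version from the list."""
    return min(versions, key=version_key)


def render_entry(item: dict, lang: str = "en") -> str:
    """Render one item in the Markdown entry format shown earlier."""
    return "\n".join([
        f"## {item['name']}",
        f"- **Type**: {item['type'].capitalize()}",
        f"- **Default**: {item['default_value']}",
        f"- **Introduced in**: {first_introduced(item['version'])}",
        f"- **Description**: {item['documents'][lang]}",
    ])


def output_path(lang: str, filename: str = "FE_configuration.md") -> PurePosixPath:
    """Location of a consolidated file under output/<lang>/ (assumed layout)."""
    return PurePosixPath("output") / lang / filename


entry = render_entry(ITEM)
zh_path = output_path("zh")
```

Comparing versions as integer tuples rather than strings avoids ordering mistakes such as `"v3.10.0"` sorting before `"v3.2.0"`.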
This project is licensed under the Apache 2.0 License - see the LICENSE file for details