DocsAgent

An LLM-powered documentation automation tool for StarRocks that automatically extracts metadata from source code and generates multi-language technical documentation.

✨ Features

  • 🚀 Automated Extraction: Automatically extract metadata for configs, variables, and functions from StarRocks source code
  • 🤖 Intelligent Generation: LLM-powered generation of descriptions, parameter explanations, and usage examples
  • 🌍 Multi-language Support: Support for Chinese, English, and Japanese with intelligent translation routing
  • 📝 Consistent Styling: Output aligned with the official StarRocks documentation style
  • 🔧 Extensible Architecture: Generic Protocol-based Pipeline design for easy extension
  • 🛠️ Tool-Enhanced: Integrated code search tools for more accurate context
  • 📦 Version Tracking: Automatically track when configs/variables/functions were first introduced across branches

📋 Supported Document Types

| Type | Description | Status |
|------|-------------|--------|
| FE Config | Frontend configuration documentation | ✅ |
| BE Config | Backend configuration documentation | ✅ |
| System Variables | Session/Global variables documentation | ✅ |
| SQL Functions | Scalar/Aggregate/Window functions documentation | ✅ |

πŸ—οΈ Architecture

Design Philosophy

DocsAgent adopts a Protocol-based Pipeline architecture that emphasizes:

  • Duck Typing: Using Python Protocols instead of inheritance for flexibility
  • Generic Pipeline: Type-safe pipeline that works with any DocumentableItem
  • Domain Separation: Each document type (config/variable/function) is a separate domain
  • 3-Stage Flow: Extractor → Generator → Persister pattern for all domains
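
To make the Protocol-based design concrete, here is a minimal Python sketch of a generic three-stage pipeline. `DocumentableItem` comes from the text above. The Protocol names `Extractor`, `Generator`, and `Persister` follow the 3-stage flow. Everything else is a hypothetical illustration, not DocsAgent's actual API: the method names (`extract`, `generate`, `persist`, `run`), the `ConfigItem` class, and the stand-in stage classes.

```python
from __future__ import annotations

from collections.abc import Sequence
from dataclasses import dataclass, field
from typing import Generic, Protocol, TypeVar


class DocumentableItem(Protocol):
    """Structural type: anything with a name and per-language documents qualifies."""

    name: str
    documents: dict[str, str]


T = TypeVar("T", bound=DocumentableItem)
T_co = TypeVar("T_co", bound=DocumentableItem, covariant=True)
T_contra = TypeVar("T_contra", bound=DocumentableItem, contravariant=True)


class Extractor(Protocol[T_co]):
    def extract(self) -> Sequence[T_co]: ...


class Generator(Protocol[T]):
    def generate(self, item: T) -> T: ...


class Persister(Protocol[T_contra]):
    def persist(self, items: Sequence[T_contra]) -> None: ...


class Pipeline(Generic[T]):
    """Runs Extractor -> Generator -> Persister for one document domain."""

    def __init__(self, extractor: Extractor[T], generator: Generator[T], persister: Persister[T]) -> None:
        self.extractor = extractor
        self.generator = generator
        self.persister = persister

    def run(self) -> list[T]:
        items = self.extractor.extract()
        generated = [self.generator.generate(item) for item in items]
        self.persister.persist(generated)
        return generated


# Hypothetical stand-ins for one domain (configs); none of them inherit from the Protocols.
@dataclass
class ConfigItem:
    name: str
    default_value: str
    documents: dict[str, str] = field(default_factory=dict)


class StaticExtractor:
    """Stands in for a source-code scanner by returning a fixed list."""

    def __init__(self, items: list[ConfigItem]) -> None:
        self._items = items

    def extract(self) -> list[ConfigItem]:
        return list(self._items)


class TemplateGenerator:
    """Stands in for the LLM call by filling the English doc from a template."""

    def generate(self, item: ConfigItem) -> ConfigItem:
        item.documents["en"] = f"{item.name} (default: {item.default_value})"
        return item


class InMemoryPersister:
    """Stands in for writing files and committing; it just keeps the items."""

    def __init__(self) -> None:
        self.saved: list[ConfigItem] = []

    def persist(self, items: Sequence[ConfigItem]) -> None:
        self.saved.extend(items)
```

The stand-in classes satisfy the Protocols purely by having matching methods. This is the duck-typing point above: a new domain only needs objects with the right shape, and no shared base class. Something like `Pipeline(StaticExtractor([...]), TemplateGenerator(), InMemoryPersister()).run()` wires them together.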

Workflow

graph TB
    A[Source Code] --> B[Extractor]
    B --> |Meta JSON| C[Meta Files]
    C --> D[Version Tracker]
    D --> |Version Info| E[Generator]
    E --> |LLM| F[English Docs]
    F --> G[Translation Agent]
    G --> |LLM| H[Multi-language Docs]
    H --> I[Persister]
    I --> J[Git Commit]
    J --> K[Create PR]
    
    subgraph "Stage 1: Extraction"
        A
        B
        C
        D
    end
    
    subgraph "Stage 2: Generation"
        E
        F
        G
        H
    end
    
    subgraph "Stage 3: Persistence"
        I
        J
        K
    end

🚀 Quick Start

Requirements

  • Python 3.10+
  • Poetry (package manager)
  • StarRocks source code (for metadata extraction)
  • LLM API key (OpenAI/Anthropic/Google)

Installation

# Clone the repository
git clone https://github.com/StarRocks/DocsAgent.git
cd DocsAgent

# Install Poetry (on macOS, `brew install poetry` also works; use your package manager on other OSes)
pip install poetry

# Activate the virtual environment, in one of two ways:
# 1. Evaluate the activation command that Poetry prints
# 2. Install the Poetry shell plugin and run `poetry shell`
$(poetry env activate)
# poetry shell

# Install DocsAgent
poetry install

Configuration

Create configuration file from template:

cp conf/example.conf conf/agent.conf

Key configuration options:

# StarRocks source code path (required)
STARROCKS_HOME=/path/to/starrocks

# LLM configuration
# e.g.:
#  openai:gpt-4
#  anthropic:claude-3-sonnet-20240229
#  google:gemini-pro
LLM_MODEL=openai:gpt-4o-mini
LLM_API_KEY=your_api_key

# Required only if the LLM is not OpenAI, Gemini, or Claude
# LLM_URL=https://api.openai.com/v1 
# LLM_PROVIDER=openai
LLM_TEMPERATURE=0.1
LLM_MAX_TOKENS=5000

# Output configuration
DOCS_OUTPUT_DIR=./output
META_DIR=./meta
TARGET_LANGS=["en", "zh", "ja"]

# StarRocks client (for SQL validation)
SR_HOST=localhost
SR_PORT=9030
SR_USER=root
SR_PASSWORD=

# Logging
LOG_DIR=./logs
LOG_LEVEL=INFO

# Git and GitHub configuration
GITHUB_TOKEN=  # GitHub personal access token for creating PRs
GITHUB_REPO=StarRocks/starrocks  # GitHub repository in format 'owner/repo' (e.g., 'StarRocks/starrocks')

Note: Configuration precedence, from highest to lowest, is: environment variables > config file > defaults.

Basic Usage

Command Line Arguments

| Argument | Description |
|----------|-------------|
| -e, --extract | Extract metadata from source code |
| -g, --generate | Generate documentation |
| -m, --meta | Generate metadata without generating docs |
| -t, --type | Document type (fe_config/be_config/variables/functions) |
| --config | Configuration file path |
| -f, --force_search_code | Force a fresh code search and update the item's usage |
| -i, --ignore_miss_usage | Ignore variables/configs whose usage is missing from the code |
| -wl, --without-llm | Run without the LLM (use existing docs) |
| -l, --limit | Limit the number of items to process |
| --ci | Enable Git commit |
| --pr | Enable Pull Request creation |
| -tv, --track-version | Track versions for items (first-time use) |

Usage Examples

# Incremental mode:
# 1. Extract metadata from the existing documentation first, so the increment can be computed (existing docs are kept)
# 2. Generate documentation

# Full mode:
# 1. Generate documentation without first extracting metadata from the existing documentation

# Examples
# FE/BE configs, incremental
# 1. Extract FE config metadata from the documentation
python -m docsagent.main -e -t fe_config

# 2. Generate FE config documentation and create a pull request
python -m docsagent.main -g -t fe_config --track-version --pr

# FE/BE configs, full
# 1. Generate FE config documentation for at most 10 items and create a pull request
python -m docsagent.main -g -t fe_config -l 10 --track-version --pr

# Variables
# 1. Extract variables metadata from the documentation
python -m docsagent.main -e -t variables

# 2. Generate variables documentation, tracking versions and committing to Git
python -m docsagent.main -g -t variables -tv --ci

# Functions
# 1. Extract functions metadata from the documentation
python -m docsagent.main -e -t functions

# 2. Generate functions documentation without LLM generation (use existing docs)
python -m docsagent.main -g -t functions -tv -wl

🔧 Development Guide

For detailed usage, see dev-guide.md

📊 Output Examples

Generated Documentation

Documentation is generated as Markdown, for example:

## enable_materialized_view

- **Type**: Boolean
- **Default**: true
- **Introduced in**: v3.2.0
- **Description**: Whether to enable materialized view feature...

Metadata Files

Metadata is stored in the meta/ directory, for example meta/fe_config.meta:

{
  "items": [
    {
      "name": "enable_materialized_view",
      "type": "boolean",
      "default_value": "true",
      "version": ["v3.2.0"],
      "catalog": "query-engine",
      "documents": {
        "en": "...",
        "zh": "...",
        "ja": "..."
      }
    }
  ]
}

Version File Structure

Version tracking results are cached in meta/*.version files:

{
  "metadata": {
    "git_version": "a3f5b2c",
    "maintained_branches": ["3.2", "3.3", "3.4", "3.5", "4.0"]
  },
  "versions": {
    "enable_materialized_view": {
      "3.2": "3.2.0",
      "3.3": "3.3.0",
      "3.4": "3.4.0"
    }
  }
}

Directory Structure of Output

output/
├── en/                                 # English documentation
│   ├── FE_configuration.md             # FE config consolidated
│   ├── BE_configuration.md             # BE config consolidated
│   ├── System_variable.md              # Variables consolidated
│   └── functions/                      # Function docs
│       ├── array-functions/
│       │   ├── array_append.md
│       │   └── array_concat.md
│       ├── string-functions/
│       │   ├── concat.md
│       │   └── substring.md
│       └── mathematical-functions/
│           ├── abs.md
│           └── sqrt.md
├── zh/                                 # Chinese documentation (same structure)
│   ├── FE_configuration.md
│   └── functions/
│       └── ...
└── ja/                                 # Japanese documentation (same structure)
    ├── FE_configuration.md
    └── functions/
        └── ...

📄 License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
