ML4SE is a RAG-based Multi-Agent System designed to automatically generate comprehensive README.md files for local GitHub repositories. It utilizes a graph-based workflow to orchestrate specialized agents that analyze code, plan documentation structure, write content, and review the output.
- Repository Profiling: Analyzes the codebase structure and extracts key information.
- Intelligent Planning: Creates a tailored outline for the README based on the repository profile.
- Multi-Agent Writing: Uses specialized writers for different sections.
- Automated Review: Reviews generated content to ensure quality.
- Graph-Based Workflow: Orchestrated using LangGraph for robust state management.
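
The profile, plan, write, review sequence listed above can be pictured as a chain of functions that pass a shared state dictionary along. The following is a minimal plain-Python sketch of that idea only; it does not use LangGraph, and every function name and state key in it is hypothetical rather than taken from the ML4SE source.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], State]


def profile(state: State) -> State:
    # Hypothetical profiler: just records the sorted file list.
    return {**state, "profile": sorted(state["files"])}


def plan(state: State) -> State:
    # Hypothetical planner: fixed outline, plus "Testing" if tests exist.
    outline = ["Overview", "Installation", "Usage"]
    if any(name.startswith("tests/") for name in state["profile"]):
        outline.append("Testing")
    return {**state, "outline": outline}


def write(state: State) -> State:
    # Hypothetical writer: one placeholder body per planned section.
    sections = {title: f"## {title}\n\nTODO" for title in state["outline"]}
    return {**state, "sections": sections}


def review(state: State) -> State:
    # Hypothetical reviewer: approves when no section body is empty.
    approved = all(body.strip() for body in state["sections"].values())
    return {**state, "approved": approved}


def run_pipeline(state: State, nodes: List[Node]) -> State:
    # Each node receives the current state and returns an updated copy.
    for node in nodes:
        state = node(state)
    return state


result = run_pipeline(
    {"files": ["src/app.py", "tests/test_app.py"]},
    [profile, plan, write, review],
)
```

A graph framework adds branching, retries, and persisted state on top of this linear hand-off, which is presumably why the project relies on LangGraph instead of a plain loop.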
- Clone the repository

      git clone https://github.com/Saleh7127/ML4SE.git
      cd ML4SE

- Install dependencies

      pip install -r requirements.txt

- Configure environment variables

  Create a `.env` file in the root directory with your API keys:

      OPENAI_API_KEY=your_openai_api_key_here
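
To check that the key is readable before starting a long run, a small script can parse the `.env` contents and fail early. This is a standard-library sketch under stated assumptions: `parse_env` and `require_key` are hypothetical helpers, and how ML4SE itself loads the file is not stated here (projects commonly use a package such as python-dotenv).

```python
import os
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_key(values: Dict[str, str], name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the parsed file or the process environment."""
    value = values.get(name) or os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env")
    return value


# Sample contents matching the placeholder shown above.
sample = "# API keys\nOPENAI_API_KEY=your_openai_api_key_here\n"
env = parse_env(sample)
```

In practice the text would come from something like `Path(".env").read_text()` in the repository root.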
Before generating README files, you need to ingest your repositories into the vector store. The ingestion script supports two modes:
Process all repositories in a directory:

    python src/ingestion/ingest_repos.py \
      --repos-dir /path/to/repos-directory

Or use the default directory (`./data/repositories`):

    python src/ingestion/ingest_repos.py

Process a specific repository:

    python src/ingestion/ingest_repos.py \
      --repos-dir /path/to/single-repo --single-repo

Once repositories are ingested, generate README files using the main workflow:
Provide your own README structure plan:

    python src/workflows/main.py \
      --repo_name sample_repository \
      --plan my_plan.json

Let the system automatically create the structure:

    python src/workflows/main.py \
      --repo_name sample_repository

| Command | Description |
|---|---|
| `ingest_repos.py` | Process repositories for ingestion |
| `--repos-dir <path>` | Path to repository or directory (default: `./data/repositories`) |
| `--single-repo` | Treat path as a single repository instead of a directory |
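
The two ingestion options interact as follows: without `--single-repo`, each subdirectory of the path is treated as a repository; with it, the path itself is the repository. The sketch below illustrates that behavior with `argparse`. It is an assumption about how such a CLI could be written, not the actual contents of `ingest_repos.py`; `build_parser` and `select_repos` are hypothetical names.

```python
import argparse
from pathlib import Path
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # Flags and default mirror the option table; the rest is illustrative.
    parser = argparse.ArgumentParser(description="Sketch of the ingestion CLI")
    parser.add_argument(
        "--repos-dir",
        default="./data/repositories",
        help="Path to repository or directory",
    )
    parser.add_argument(
        "--single-repo",
        action="store_true",
        help="Treat path as a single repository instead of a directory",
    )
    return parser


def select_repos(repos_dir: Path, single_repo: bool) -> List[Path]:
    # Single mode: the path is the repository.
    if single_repo:
        return [repos_dir]
    # Directory mode: every immediate subdirectory is a repository.
    return sorted(p for p in repos_dir.iterdir() if p.is_dir())


args = build_parser().parse_args(
    ["--repos-dir", "/path/to/single-repo", "--single-repo"]
)
repos = select_repos(Path(args.repos_dir), args.single_repo)
```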

| Command | Description |
|---|---|
| `main.py` | Generate README for a repository |
| `--repo_name <name>` | Name of the repository to process |
| `--plan <file>` | Optional custom plan JSON file |
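
When generating READMEs for several repositories, it can help to assemble the `main.py` command line programmatically. The helper below is a hypothetical convenience wrapper that uses only the two flags documented above; it makes no assumption about the plan file's JSON schema, which is not described here.

```python
import sys
from typing import List, Optional


def build_main_command(repo_name: str, plan: Optional[str] = None) -> List[str]:
    """Build the argv list for src/workflows/main.py (hypothetical helper)."""
    command = [sys.executable, "src/workflows/main.py", "--repo_name", repo_name]
    if plan is not None:
        # --plan is optional; omit it to let the system create the structure.
        command += ["--plan", plan]
    return command


auto_cmd = build_main_command("sample_repository")
custom_cmd = build_main_command("sample_repository", plan="my_plan.json")
```

Either list can be passed to `subprocess.run(command, check=True)` from the repository root.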

    ML4SE/
    ├── data/                            # Default location for repositories
    ├── generated_readmes/               # Output directory for generated READMEs
    ├── generated_readmes_token_stats/   # Token usage statistics
    ├── scripts/                         # Utility scripts
    ├── src/
    │   ├── agents/                      # Agent implementations
    │   ├── evaluation/                  # Evaluation metrics and tools
    │   ├── ingestion/                   # Repository ingestion and processing
    │   ├── models/                      # Data models and schemas
    │   ├── prompts/                     # Prompt templates
    │   ├── vector_store/                # Vector database management
    │   └── workflows/                   # Main workflow orchestration
    └── requirements.txt                 # Python dependencies