ML4SE

Overview

ML4SE is a retrieval-augmented generation (RAG) multi-agent system that automatically generates comprehensive README.md files for locally cloned GitHub repositories. It uses a graph-based workflow to orchestrate specialized agents that analyze the code, plan the documentation structure, write the content, and review the output.

Features

Repository Profiling: Analyzes the codebase structure and extracts key information.
Intelligent Planning: Creates a tailored outline for the README based on the repository profile.
Multi-Agent Writing: Uses specialized writers for different sections.
Automated Review: Reviews generated content to ensure quality.
Graph-Based Workflow: Orchestrated using LangGraph for robust state management.
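
To make the profile, plan, write, and review stages more concrete, here is a minimal dependency-free sketch of a graph-style pipeline. It is not the project's LangGraph code: the node functions, the `State` dictionary keys, and the `run` helper are all hypothetical stand-ins that only illustrate how a shared state can pass through a fixed sequence of nodes.

```python
from typing import Any, Callable, Dict, Optional

State = Dict[str, Any]


def profile(state: State) -> State:
    # Hypothetical profiler: summarize the file list found in the repository.
    files = state["files"]
    info = {"n_files": len(files), "has_tests": any("test" in f for f in files)}
    return {**state, "profile": info}


def plan(state: State) -> State:
    # Hypothetical planner: choose README sections from the profile.
    sections = ["Overview", "Installation", "Usage"]
    if state["profile"]["has_tests"]:
        sections.append("Testing")
    return {**state, "plan": sections}


def write(state: State) -> State:
    # Hypothetical writer: produce one placeholder draft per planned section.
    drafts = {s: f"{s}\n\nTODO: describe {s.lower()}." for s in state["plan"]}
    return {**state, "drafts": drafts}


def review(state: State) -> State:
    # Hypothetical reviewer: approve only if every draft is non-empty.
    approved = all(text.strip() for text in state["drafts"].values())
    return {**state, "approved": approved}


# Nodes and edges of the illustrative graph (a simple linear chain).
NODES: Dict[str, Callable[[State], State]] = {
    "profile": profile,
    "plan": plan,
    "write": write,
    "review": review,
}
EDGES: Dict[str, Optional[str]] = {
    "profile": "plan",
    "plan": "write",
    "write": "review",
    "review": None,
}


def run(state: State, start: str = "profile") -> State:
    # Walk the graph from the start node until a node has no successor.
    node: Optional[str] = start
    while node is not None:
        state = NODES[node](state)
        node = EDGES[node]
    return state
```

Calling `run({"files": ["src/app.py", "tests/test_app.py"]})` returns a state that carries the profile, the planned sections, the drafts, and the review verdict. A real LangGraph workflow would typically add conditional edges, for example looping back to the writer when the review fails.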

Setup

Installation

Clone the repository

git clone https://github.com/Saleh7127/ML4SE.git
cd ML4SE

Install dependencies

```
pip install -r requirements.txt
```

Configure environment variables

Create a .env file in the root directory with your API keys:
```
OPENAI_API_KEY=your_openai_api_key_here
```
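
Before running the scripts, it can help to confirm that the `.env` file exists and that the key is no longer the placeholder. The following is a small standard-library sketch, not part of ML4SE: `parse_env_text` and `missing_keys` are hypothetical helper names, and the parser handles only simple `KEY=value` lines.

```python
from pathlib import Path
from typing import Dict, List, Sequence

# Placeholder value shown in the example .env above.
PLACEHOLDER = "your_openai_api_key_here"


def parse_env_text(text: str) -> Dict[str, str]:
    # Parse simple KEY=value lines; skip blanks, comments, and malformed lines.
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env_path: str, required: Sequence[str] = ("OPENAI_API_KEY",)) -> List[str]:
    # Return the required keys that are absent, empty, or still the placeholder.
    path = Path(env_path)
    if not path.is_file():
        return list(required)
    values = parse_env_text(path.read_text(encoding="utf-8"))
    return [k for k in required if values.get(k, "") in ("", PLACEHOLDER)]
```

For example, `missing_keys(".env")` returns an empty list once a real key has been filled in.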

Usage

Step 1: Ingest Repositories

Before generating README files, you need to ingest your repositories into the vector store. The ingestion script supports two modes:

Ingest Multiple Repositories (Default)

Process all repositories in a directory:

python src/ingestion/ingest_repos.py \
--repos-dir /path/to/repos-directory

Or use the default directory (./data/repositories):

python src/ingestion/ingest_repos.py

Ingest a Single Repository

Process a specific repository:

python src/ingestion/ingest_repos.py \
--repos-dir /path/to/single-repo --single-repo
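
The difference between the two modes can be pictured with a short sketch. This is an assumption about how `--repos-dir` and `--single-repo` might be interpreted, not the actual logic in `ingest_repos.py`; `resolve_repositories` is a hypothetical helper, and skipping hidden directories is an illustrative choice.

```python
from pathlib import Path
from typing import List, Union

# Default directory named in the usage instructions above.
DEFAULT_REPOS_DIR = Path("./data/repositories")


def resolve_repositories(
    repos_dir: Union[str, Path] = DEFAULT_REPOS_DIR,
    single_repo: bool = False,
) -> List[Path]:
    # Return the repository paths an ingestion run would process.
    root = Path(repos_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    if single_repo:
        # Single-repository mode: the path itself is the repository.
        return [root]
    # Multi-repository mode: each visible subdirectory is one repository.
    return sorted(p for p in root.iterdir() if p.is_dir() and not p.name.startswith("."))
```

With a directory containing `repo_a/` and `repo_b/`, the default mode yields both subdirectories, while `single_repo=True` yields only the directory that was passed in.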

Step 2: Generate README

Once repositories are ingested, generate README files using the main workflow:

With a Custom Plan

Provide your own README structure plan:

python src/workflows/main.py \
--repo_name sample_repository \
--plan my_plan.json

Without a Custom Plan

Let the system automatically create the structure:

python src/workflows/main.py \
--repo_name sample_repository
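
The two steps can also be chained from Python. The sketch below only builds and runs the command lines documented above with `subprocess`; the helper names are hypothetical, and treating `--repo_name` as a value supplied separately from the repository path is an assumption, since the README does not say how the name is derived. The plan file is checked only for being valid JSON, because its schema is not described here.

```python
import json
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def ingest_command(repos_dir: Optional[str] = None, single_repo: bool = False) -> List[str]:
    # Build the Step 1 command; omitting repos_dir falls back to the script default.
    cmd = [sys.executable, "src/ingestion/ingest_repos.py"]
    if repos_dir is not None:
        cmd += ["--repos-dir", repos_dir]
    if single_repo:
        cmd.append("--single-repo")
    return cmd


def generate_command(repo_name: str, plan: Optional[str] = None) -> List[str]:
    # Build the Step 2 command; the plan file is optional.
    cmd = [sys.executable, "src/workflows/main.py", "--repo_name", repo_name]
    if plan is not None:
        # Fail early if the plan file is missing or not valid JSON.
        json.loads(Path(plan).read_text(encoding="utf-8"))
        cmd += ["--plan", plan]
    return cmd


def run_pipeline(repo_path: str, repo_name: str, plan: Optional[str] = None) -> None:
    # Ingest one repository, then generate its README; raises if either step fails.
    subprocess.run(ingest_command(repo_path, single_repo=True), check=True)
    subprocess.run(generate_command(repo_name, plan), check=True)
```

Run from the repository root, `run_pipeline("/path/to/single-repo", "sample_repository")` would execute both steps in order and stop at the first failing command.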

Command Reference

Ingestion Commands

| Command / Flag | Description |
| --- | --- |
| `ingest_repos.py` | Process repositories for ingestion |
| `--repos-dir <path>` | Path to repository or directory (default: `./data/repositories`) |
| `--single-repo` | Treat path as a single repository instead of a directory |

Workflow Commands

| Command / Flag | Description |
| --- | --- |
| `main.py` | Generate README for a repository |
| `--repo_name <name>` | Name of the repository to process |
| `--plan <file>` | Optional custom plan JSON file |
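
As an illustration of how the workflow flags fit together, here is a hypothetical `argparse` definition that mirrors the workflow command reference. It is a sketch, not the parser used in `main.py`; in particular, marking `--repo_name` as required is an assumption based on only `--plan` being described as optional.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented workflow flags.
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Generate README for a repository",
    )
    parser.add_argument(
        "--repo_name",
        required=True,  # assumption: the README does not state this explicitly
        help="Name of the repository to process",
    )
    parser.add_argument(
        "--plan",
        default=None,
        help="Optional custom plan JSON file",
    )
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--repo_name", "sample_repository"])
```

Here `args.plan` is `None`, which corresponds to the "Without a Custom Plan" mode.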

Project Structure

ML4SE/
├── data/                           # Default location for repositories
├── generated_readmes/              # Output directory for generated READMEs
├── generated_readmes_token_stats/  # Token usage statistics
├── scripts/                        # Utility scripts
├── src/
│   ├── agents/                     # Agent implementations
│   ├── evaluation/                 # Evaluation metrics and tools
│   ├── ingestion/                  # Repository ingestion and processing
│   ├── models/                     # Data models and schemas
│   ├── prompts/                    # Prompt templates
│   ├── vector_store/               # Vector database management
│   └── workflows/                  # Main workflow orchestration
└── requirements.txt                # Python dependencies
