A robust tool to configure and run repositories with automated dependency management.
- Clone repositories from GitHub or use local repositories
- Extract dependencies from various sources (requirements.txt, setup.py, pyproject.toml, etc.)
- Unify requirements from multiple sources
- Install dependencies using either pip/venv (default) or UV (optional)
- Find and run tests automatically
- Generate detailed reports
- Preserve the original repository structure
- Support for parallel processing of multiple repositories
- Unified pipeline mode for efficient batch processing
- Test extraction and execution modes
- Skip already processed repositories
- Configurable timeouts and resource management
# Clone the repository
git clone https://github.com/yourusername/repo2run.git
cd repo2run
# Install the package
pip install -e .
# Single repository mode
repo2run --repo username/repo commit-sha [OPTIONS]
repo2run --local /path/to/local/repo [OPTIONS]
# Batch processing mode
repo2run --repo-list repos.txt [OPTIONS]
repo2run --local-list dirs.txt [OPTIONS]
# Test extraction mode
repo2run --repo username/repo commit-sha --extract-tests [OPTIONS]
repo2run --local /path/to/local/repo --extract-tests [OPTIONS]
# Run tests from extracted tests
repo2run --output-dir output_path --run-tests [OPTIONS]
# Global unified pipeline mode
repo2run --global --repo-list repos.txt [OPTIONS]
repo2run --global --local-list dirs.txt [OPTIONS]
| Argument | Description | Default | Example |
|---|---|---|---|
| --repo FULL_NAME SHA | Process a specific GitHub repository | None | --repo octocat/Hello-World abc123 |
| --local PATH | Process a local repository | None | --local /home/user/projects/myrepo |
| --repo-list FILE | Process multiple repositories from a list file | None | --repo-list repos.txt |
| --local-list FILE | Process multiple local repositories from a list file | None | --local-list local_repos.txt |
| --global | Use the unified global pipeline | Disabled | --global |
| --output-dir DIR | Directory to store output files | output | --output-dir ./results |
| --workspace-dir DIR | Directory to use as workspace | Temporary directory | --workspace-dir ./workspace |
| --timeout SECONDS | Maximum execution time | 1800 (0.5 hours) | --timeout 3600 |
| --verbose | Enable detailed logging | Disabled | --verbose |
| --overwrite | Overwrite existing output directory | Disabled | --overwrite |
| --use-uv | Use UV for dependency management | Disabled (uses pip/venv) | --use-uv |
| --num-workers N | Number of parallel processing workers | Number of CPU cores | --num-workers 4 |
| --max-workers N | Number of worker threads for global mode | 4 | --max-workers 8 |
| --repo-range START END | Process only a range of repositories | None (all repos) | --repo-range 0 100 |
| --collect-only | Only collect test cases without running them | Disabled | --collect-only |
| --skip-processed | Skip already processed repositories | Disabled | --skip-processed |
| --extract-tests | Extract test files without running them | Disabled | --extract-tests |
| --run-tests | Run tests from the extracted test.jsonl | Disabled | --run-tests |
The Unified Pipeline is a new workflow that:
- Analyzes dependencies across all repositories (ignoring versions) and creates a union set
- Installs all dependencies in a single virtual environment
- Runs tests for each repository and identifies those that pass all tests or have no tests
This approach is more efficient when processing multiple repositories with overlapping dependencies.
# Process repositories from a list file
repo2run --global --repo-list repos.txt --output-dir output_path [options]
# Process local directories from a list file
repo2run --global --local-list dirs.txt --output-dir output_path [options]
# Run pipeline in separate stages
repo2run --global --repo-list repos.txt --output-dir output_path --extract-dep
repo2run --global --repo-list repos.txt --output-dir output_path --config-venv
repo2run --global --repo-list repos.txt --output-dir output_path --run-test
The Unified Pipeline generates the following output files:
- requirements.txt: Union of all dependencies across repositories (without version specifiers)
- repo_req.json: Mapping of repositories to their required dependencies
- install_status.json: Status of dependency installation (success/failure)
- records.jsonl: Detailed logs and execution status for each repository
- successful_repos.json: List of repositories that pass all tests or have no tests
- test.jsonl: Extracted test files and metadata (when using --extract-tests)
- test_results.jsonl: Results of running tests (when using --run-tests)
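To inspect these outputs programmatically, something like the sketch below can work. It assumes successful_repos.json holds a JSON list and that records.jsonl contains one JSON object per line; the actual schemas may differ, so treat the field handling as illustrative.

# Illustrative sketch for reading the unified-pipeline outputs.
# Assumes successful_repos.json is a JSON list and records.jsonl has one JSON object per line.
import json
from pathlib import Path

output_dir = Path("output_path")  # same directory passed via --output-dir

successful = json.loads((output_dir / "successful_repos.json").read_text())
print(f"{len(successful)} repositories passed all tests or had no tests")

with open(output_dir / "records.jsonl") as f:
    for line in f:
        record = json.loads(line)  # per-repository execution record
        print(record)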
For --repo-list and --local-list, use the following format:
# repos.txt or local_repos.txt
# Format: repository_identifier commit_sha
octocat/Hello-World abc123
another/repo def456
# Lines starting with # are comments
# In a CI pipeline, you might want to use verbose logging and collect test cases
repo2run --repo username/repo $CI_COMMIT_SHA --output-dir ./ci_results --verbose --collect-only
# Process multiple repositories with UV and parallel workers
repo2run --repo-list performance_repos.txt --use-uv --num-workers 8 --output-dir ./perf_results
# Process repositories in batches across multiple machines
# Machine 1: Process repos 0-99
repo2run --global --repo-list repos.txt --output-dir ./batch1 --repo-range 0 100
# Machine 2: Process repos 100-199
repo2run --global --repo-list repos.txt --output-dir ./batch2 --repo-range 100 200
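If you prefer to drive these batches from a single wrapper instead of typing the commands on each machine, a small script can launch the same invocations. The sketch below uses only the flags documented above; the range sizes and output directory names are arbitrary choices.

# Hypothetical batch driver: shells out to the repo2run CLI with the flags shown above.
import subprocess

batches = [(0, 100), (100, 200)]  # arbitrary split of the repository list

for i, (start, end) in enumerate(batches, start=1):
    subprocess.run(
        [
            "repo2run", "--global",
            "--repo-list", "repos.txt",
            "--output-dir", f"./batch{i}",
            "--repo-range", str(start), str(end),
        ],
        check=True,  # abort if a batch exits with an error
    )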
Repo2Run can also be used programmatically from Python:
from repo2run.utils.repo_manager import RepoManager
from repo2run.utils.dependency_extractor import DependencyExtractor
from repo2run.utils.dependency_installer import DependencyInstaller
from repo2run.utils.test_runner import TestRunner
# Initialize repository
repo_manager = RepoManager(workspace_dir="./workspace")
repo_path = repo_manager.clone_repository("username/repo", "commit-sha")
# Extract dependencies
extractor = DependencyExtractor(repo_path)
requirements = extractor.extract_all_requirements()
unified_requirements = extractor.unify_requirements(requirements)
# Install dependencies using pip/venv (default)
installer = DependencyInstaller(repo_path, use_uv=False)
venv_path = installer.create_virtual_environment()
installation_results = installer.install_requirements(unified_requirements, venv_path)
# Run tests with pip/venv
test_runner = TestRunner(repo_path, venv_path, use_uv=False)
test_results = test_runner.run_tests()
# Or use UV for dependency management
# installer = DependencyInstaller(repo_path, use_uv=True)
# test_runner = TestRunner(repo_path, venv_path, use_uv=True)
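If you want a record of such a programmatic run, the results can be dumped to disk. This continuation assumes the objects returned above are JSON-serializable (e.g. plain dicts and lists), which the API is not guaranteed to provide, so adapt as needed.

# Continues the example above: persist installation and test results to a report file.
# Assumes installation_results and test_results are JSON-serializable.
import json

with open("run_report.json", "w") as f:
    json.dump({"installation": installation_results, "tests": test_results}, f, indent=2)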
Repo2Run supports two dependency management systems:
- pip/venv (Default): Uses the standard Python venv module to create virtual environments and pip for package installation.
  - More compatible with a wide range of repositories
  - No additional dependencies required
- UV (Optional): A fast Python package installer and resolver.
  - Significantly faster installation
  - Better dependency resolution in complex cases
  - Can be enabled with the --use-uv flag
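In practice the difference comes down to which commands create the environment and install packages. The sketch below illustrates the two approaches with plain subprocess calls; it is not Repo2Run's internal implementation, and the UV variant assumes the uv binary is installed and on PATH.

# Illustration only (not Repo2Run internals): creating an environment and installing
# requirements with pip/venv versus UV.
import subprocess
import sys
from pathlib import Path

def install_with_pip_venv(repo_path: Path, requirements: list[str]) -> Path:
    venv_path = repo_path / ".venv"
    subprocess.run([sys.executable, "-m", "venv", str(venv_path)], check=True)
    pip = venv_path / "bin" / "pip"  # on Windows: .venv\Scripts\pip.exe
    subprocess.run([str(pip), "install", *requirements], check=True)
    return venv_path

def install_with_uv(repo_path: Path, requirements: list[str]) -> Path:
    # uv creates .venv in the working directory and targets it for installs.
    subprocess.run(["uv", "venv"], cwd=repo_path, check=True)
    subprocess.run(["uv", "pip", "install", *requirements], cwd=repo_path, check=True)
    return repo_path / ".venv"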
When processing repositories, Repo2Run creates the following directory structure:
workspace_dir/
├── github/
│ └── username/
│ └── repo_name/
│ ├── (repository files)
│ └── sha.txt
└── local/
└── repo_name/
├── (repository files)
└── sha.txt
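Given this layout, downstream scripts can discover which repositories were processed and which commit each was pinned to. The sketch below relies only on the directories and sha.txt files shown in the tree above.

# List processed repositories and their pinned commits from the workspace layout above.
from pathlib import Path

workspace_dir = Path("./workspace")

for sha_file in workspace_dir.rglob("sha.txt"):
    repo_dir = sha_file.parent
    commit_sha = sha_file.read_text().strip()
    print(f"{repo_dir.relative_to(workspace_dir)} -> {commit_sha}")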
Repo2Run extracts dependencies from the following sources (a minimal extraction sketch follows the list):
- requirements.txt
- setup.py
- pyproject.toml (Poetry and PEP 621)
- Pipfile
- environment.yml
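To give a feel for what extraction from these files involves, here is a standalone sketch covering requirements.txt and a PEP 621 pyproject.toml. It is not the DependencyExtractor implementation and deliberately skips setup.py, Poetry, Pipfile, and environment.yml.

# Standalone sketch (not DependencyExtractor): gather requirement strings from
# requirements.txt and the PEP 621 [project] table of pyproject.toml.
import tomllib  # Python 3.11+; use the tomli backport on older interpreters
from pathlib import Path

def extract_requirements(repo_path: Path) -> list[str]:
    requirements: list[str] = []

    req_file = repo_path / "requirements.txt"
    if req_file.exists():
        for line in req_file.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#"):  # skip blanks and comments
                requirements.append(line)

    pyproject = repo_path / "pyproject.toml"
    if pyproject.exists():
        data = tomllib.loads(pyproject.read_text())
        requirements.extend(data.get("project", {}).get("dependencies", []))

    return sorted(set(requirements))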
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.