A comprehensive toolkit for Software Bill of Materials (SBOM) generation, analysis, and vulnerability scanning across multiple programming languages and package managers. The vulnerability report can be seen here.
This toolkit provides a complete suite of tools for:
- 🕷️ Repository Crawling: GitHub repository discovery and cloning
- 🔒 Lock File Generation: Automated dependency lock file creation
- 📊 SBOM Analysis: Comparative analysis between different SBOM tools
- 🔍 Vulnerability Scanning: Security analysis of software dependencies
- 📈 Statistics & Reporting: Comprehensive project statistics and insights
# Clone the repository
git clone <this-repo-url>
cd validation
# Install dependencies
pip install -r requirements.txt
Downloads repositories from awesome lists and extracts specific files.
Usage:
# Download all repositories from all languages
python crawler.py --language all
# Download only Python repositories
python crawler.py --language python --save_dir ./my_repos
# Download project files only (not full repos)
python crawler.py --language rust --file_mode
Key Features:
- Supports 6 languages: Python, Rust, JavaScript, Ruby, PHP, Go
- Async downloading for better performance
- Selective file extraction mode
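The async downloading listed above can be illustrated with a bounded-concurrency pattern. This is a minimal sketch, not the crawler's actual code: `download_all`, `fake_fetch`, and the `max_concurrent` limit are hypothetical names, and the stand-in coroutine would be replaced by a real HTTP request or clone step.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Iterable


async def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrent: int = 8,
) -> dict[str, str]:
    """Run `fetch` for every URL with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(url: str) -> tuple[str, str]:
        async with semaphore:  # wait for a free slot before fetching
            return url, await fetch(url)

    pairs = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(pairs)


async def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for a network call (e.g. an aiohttp request)."""
    await asyncio.sleep(0)  # yield control, as real I/O would
    return f"contents of {url}"


if __name__ == "__main__":
    print(asyncio.run(download_all(["repo-a", "repo-b"], fake_fetch)))
```

The semaphore caps how many downloads run at once, which helps keep a large crawl within API and bandwidth limits.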
Core analysis engine that orchestrates SBOM generation and comparison.
Usage:
# Run complete SBOM analysis for Python projects
python main.py --language python --standard cyclonedx --mode lock
# Analyze project files instead of lock files
python main.py --language javascript --mode project
Capabilities:
- Jaccard similarity analysis
- Accuracy computation
- Multi-tool SBOM comparison
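For reference, Jaccard similarity between two SBOMs can be computed over their package sets. The sketch below assumes each SBOM has already been reduced to a set of `(name, version)` pairs; the sample packages are made up for illustration and the toolkit's own comparison may normalise names differently.

```python
def jaccard_similarity(a: set, b: set) -> float:
    """Return |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical package sets, as (name, version) pairs from two SBOM tools.
trivy_packages = {("requests", "2.31.0"), ("numpy", "1.26.4"), ("rich", "13.7.1")}
syft_packages = {("requests", "2.31.0"), ("numpy", "1.26.4"), ("tqdm", "4.66.2")}

score = jaccard_similarity(trivy_packages, syft_packages)
print(f"Jaccard similarity: {score:.2f}")  # 2 shared / 4 distinct = 0.50
```

A score of 1.0 means both tools reported exactly the same packages; 0.0 means they share none.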
Provides interfaces for Syft and Trivy SBOM generation tools.
Usage (programmatic):
from sbom import SBOMComparer, Trivy, Syft
from utils import SBOMStandard
# Initialize tools
trivy = Trivy(standard=SBOMStandard.cyclonedx)
syft = Syft(standard=SBOMStandard.spdx)
# Compare SBOMs
comparer = SBOMComparer(trivy=trivy, syft=syft, output_dir="./output")
comparer.run_comparison("path/to/project")
Generates dependency lock files for various package managers.
Usage:
# Generate lock files for Python projects using typer CLI
python -m typer lock_generate.py run --language python --directory /path/to/projects
# Or run the script directly
python lock_generate.py
Supported Package Managers:
- 🐍 Python: Poetry, pip-tools, pipenv
- 🦀 Rust: Cargo
- 📦 JavaScript: npm, yarn
- 💎 Ruby: Bundler
- 🐘 PHP: Composer
- 🐹 Go: Go modules
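One way to drive these package managers from Python is a command table keyed by language. The mapping below is an assumption for illustration: it picks one common lock command per ecosystem, and `LOCK_COMMANDS` and `generate_lock` are hypothetical names rather than the contents of `lock_generate.py`.

```python
import subprocess
from pathlib import Path

# Assumed commands, one per ecosystem; the toolkit's real invocations may differ.
LOCK_COMMANDS = {
    "python": ["poetry", "lock"],
    "rust": ["cargo", "generate-lockfile"],
    "javascript": ["npm", "install", "--package-lock-only"],
    "ruby": ["bundle", "lock"],
    "php": ["composer", "update", "--no-install"],
    "go": ["go", "mod", "tidy"],  # refreshes go.mod and go.sum
}


def generate_lock(project_dir: Path, language: str, dry_run: bool = False) -> list:
    """Run the lock command for a project, or only return it when dry_run is set."""
    command = LOCK_COMMANDS[language]
    if not dry_run:
        # check=True raises CalledProcessError if the package manager fails.
        subprocess.run(command, cwd=project_dir, check=True)
    return command
```

Passing `dry_run=True` makes it possible to inspect which command would run without having the package manager installed.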
Performs vulnerability scanning on SBOM files.
Usage:
# Scan with Grype
python vulnerability_scan.py scan --scanner grype --input /path/to/sbom --output /path/to/results
# Scan with Trivy
python vulnerability_scan.py scan --scanner trivy --input /path/to/sbom --output /path/to/results
Supported Scanners:
- 🔍 Grype: Anchore's vulnerability scanner
- 🛡️ Trivy: Aqua Security's scanner
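Both scanners can read an existing SBOM file from the command line. The following sketch shows how such calls might be wrapped; `build_scan_command` and `scan` are hypothetical helpers, and the report naming scheme is an assumption rather than what `vulnerability_scan.py` does.

```python
import subprocess
from pathlib import Path


def build_scan_command(scanner: str, sbom_path: Path) -> list:
    """Return a CLI invocation that scans one SBOM and prints JSON to stdout."""
    if scanner == "grype":
        # Grype reads an SBOM through the `sbom:` source prefix.
        return ["grype", f"sbom:{sbom_path}", "-o", "json"]
    if scanner == "trivy":
        # Trivy has a dedicated `sbom` subcommand.
        return ["trivy", "sbom", "--format", "json", str(sbom_path)]
    raise ValueError(f"unsupported scanner: {scanner}")


def scan(scanner: str, sbom_path: Path, output_dir: Path) -> Path:
    """Run the scanner and store its JSON report under output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    report = output_dir / f"{sbom_path.stem}.{scanner}.json"  # assumed naming
    result = subprocess.run(
        build_scan_command(scanner, sbom_path),
        capture_output=True, text=True, check=True,
    )
    report.write_text(result.stdout)
    return report
```

Keeping command construction separate from execution makes the wrapper easy to check without either scanner installed.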
Analyzes repository characteristics and package manager usage.
Usage:
# Analyze Python build systems
python repo_stat.py python-build-system-stat --root-dir /path/to/repos
# General repository statistics
python repo_stat.py general-stats --root-dir /path/to/repos
Analysis Types:
- Build system distribution
- Lock file vs project file statistics
- Package manager usage patterns
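As an illustration of the lock file vs project file statistic, the sketch below counts marker files under a directory tree. The file name sets cover Python only and are assumptions; `count_dependency_files` is a hypothetical helper, not a function from `repo_stat.py`.

```python
import tempfile
from collections import Counter
from pathlib import Path

# Assumed marker files for Python projects; extend per language as needed.
PROJECT_FILES = {"pyproject.toml", "requirements.txt", "Pipfile"}
LOCK_FILES = {"poetry.lock", "Pipfile.lock"}


def count_dependency_files(root_dir: Path) -> Counter:
    """Count project and lock files found anywhere under root_dir."""
    counts = Counter()
    for path in root_dir.rglob("*"):
        if not path.is_file():
            continue
        if path.name in LOCK_FILES:
            counts["lock"] += 1
        elif path.name in PROJECT_FILES:
            counts["project"] += 1
    return counts


if __name__ == "__main__":
    # Small demo on a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "pyproject.toml").write_text("")
        (Path(tmp) / "poetry.lock").write_text("")
        print(count_dependency_files(Path(tmp)))
```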
Locates Poetry-based Python projects in directory structures.
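A simple heuristic for this kind of search is to look for a `[tool.poetry]` table in each `pyproject.toml`. The sketch below assumes that convention; `find_poetry_projects` is a hypothetical name, and projects that configure Poetry only through the `[project]` table would be missed.

```python
import re
import tempfile
from pathlib import Path

# Matches a [tool.poetry] table or any sub-table such as [tool.poetry.dependencies].
POETRY_TABLE = re.compile(r"^\s*\[tool\.poetry[.\]]", re.MULTILINE)


def find_poetry_projects(root_dir: Path) -> list:
    """Return directories whose pyproject.toml declares a Poetry table."""
    projects = []
    for pyproject in sorted(root_dir.rglob("pyproject.toml")):
        text = pyproject.read_text(encoding="utf-8", errors="ignore")
        if POETRY_TABLE.search(text):
            projects.append(pyproject.parent)
    return projects


if __name__ == "__main__":
    # Small demo on a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "pyproject.toml").write_text('[tool.poetry]\nname = "demo"\n')
        print(find_poetry_projects(Path(tmp)))
```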
Usage:
# Find all Poetry projects
python find_poetry.py find /path/to/scan
# With custom root directory
python find_poetry.py find --root-dir /data/projects
Extracts GitHub repository URLs from markdown files (like awesome lists).
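The extraction step can be approximated with a regular expression over the markdown text. This is a sketch under assumptions: `extract_github_urls` is a hypothetical function, not the `MarkdownParser` implementation, and it keeps only the `owner/repo` part of each link.

```python
import re

# Matches https://github.com/<owner>/<repo>; deeper paths are cut off after the repo.
GITHUB_REPO = re.compile(r"https://github\.com/([\w.-]+)/([\w.-]+)")


def extract_github_urls(markdown_text: str) -> list:
    """Return unique repository URLs in order of first appearance."""
    seen = {}
    for owner, repo in GITHUB_REPO.findall(markdown_text):
        repo = repo.rstrip(".")  # drop sentence punctuation after a bare URL
        if repo.endswith(".git"):
            repo = repo[: -len(".git")]
        seen.setdefault(f"https://github.com/{owner}/{repo}", None)
    return list(seen)


sample = """
- [requests](https://github.com/psf/requests) - HTTP for Humans.
- [rich](https://github.com/Textualize/rich/tree/master/docs) - Terminal formatting.
- [requests again](https://github.com/psf/requests)
"""
print(extract_github_urls(sample))
```

Using a dict as an ordered set removes duplicate links while preserving the order in which repositories appear in the list.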
Usage (programmatic):
from markdown import MarkdownParser
parser = MarkdownParser("https://raw.githubusercontent.com/vinta/awesome-python/master/README.md")
github_urls = parser.url_list
Core utilities and language specifications.
Key Components:
- LanguageSpec: Enum defining supported languages and file patterns
- SBOMStandard: SBOM format specifications
- Helper functions for project detection
Replaces symbolic links with actual file copies.
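Replacing a symlink with a copy amounts to resolving the link, removing it, and copying the target into its place. The sketch below assumes only file symlinks are handled and that broken links are left untouched; `replace_symlinks` is a hypothetical name, not the function in `link_removal.py`.

```python
import os
import shutil
import tempfile
from pathlib import Path


def replace_symlinks(root_dir: Path) -> int:
    """Replace each file symlink under root_dir with a copy of its target."""
    replaced = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            link = Path(dirpath) / name
            if not link.is_symlink():
                continue
            target = link.resolve()
            if not target.is_file():
                continue  # broken link or non-file target: leave untouched
            link.unlink()
            shutil.copy2(target, link)  # copy contents and metadata
            replaced += 1
    return replaced


if __name__ == "__main__":
    # Small demo on a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "real.txt").write_text("data")
        (Path(tmp) / "link.txt").symlink_to(Path(tmp) / "real.txt")
        print(replace_symlinks(Path(tmp)))
```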
Usage:
# Replace symlinks in a directory
python link_removal.py /path/to/directory
Monitors GitHub API rate limits.
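GitHub exposes the current quota at its `/rate_limit` REST endpoint. The sketch below shows one way to read it with the standard library; `summarize_rate_limit` and `fetch_rate_limit` are hypothetical helpers, and `query_limit.py` may well use `requests` and report different fields.

```python
import json
import urllib.request
from datetime import datetime, timezone

RATE_LIMIT_URL = "https://api.github.com/rate_limit"


def summarize_rate_limit(payload: dict) -> dict:
    """Extract the core REST quota from a /rate_limit response body."""
    core = payload["resources"]["core"]
    return {
        "limit": core["limit"],
        "remaining": core["remaining"],
        # `reset` is a Unix timestamp in seconds.
        "resets_at": datetime.fromtimestamp(core["reset"], tz=timezone.utc),
    }


def fetch_rate_limit(token=None) -> dict:
    """Query GitHub; an optional token raises the unauthenticated quota."""
    request = urllib.request.Request(
        RATE_LIMIT_URL, headers={"Accept": "application/vnd.github+json"}
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return summarize_rate_limit(json.load(response))
```

Checking `remaining` before a large crawl helps avoid failures partway through a download run.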
Usage:
# Check current rate limit status
python query_limit.py
Utility script for copying specific files (like package-lock.json).
Usage:
# Copy package-lock.json files
python test.py
# 1. Download repositories
python crawler.py --language python --save_dir ./repos
# 2. Generate lock files
python lock_generate.py --directory ./repos/python
# 3. Run SBOM analysis
python main.py --language python --sbom_dir ./sboms --input_dir ./repos
# 4. Scan for vulnerabilities
python vulnerability_scan.py scan --scanner grype --input ./sboms --output ./scan_results
# 5. Generate statistics
python repo_stat.py python-build-system-stat --root-dir ./repos
# Check GitHub API limits
python query_limit.py
# Find Poetry projects
python find_poetry.py find ./repos
# Analyze specific language
python main.py --language rust --mode lock
- rich: Enhanced terminal formatting and progress bars
- prettytable: Table formatting for statistics
- requests: HTTP client for API calls
- pandas: Data analysis and manipulation
- tqdm: Progress bars
- aiohttp: Async HTTP client
- gitpython: Git repository operations
- numpy: Numerical computations
- typer: Modern CLI framework
validation/
├── 🕷️ crawler.py # Repository discovery & download
├── 🏗️ main.py # Core analysis orchestration
├── 📦 sbom.py # SBOM generation tools
├── 🔒 lock_generate.py # Lock file generation
├── 🛡️ vulnerability_scan.py # Security scanning
├── 📊 repo_stat.py # Statistics generation
├── 🔍 find_poetry.py # Poetry project finder
├── 📝 markdown.py # Markdown parsing
├── 🔧 utils.py # Core utilities
├── 🔗 link_removal.py # Symlink management
├── ⚡ query_limit.py # API monitoring
└── 🧪 test.py # Utility scripts
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request