Create sarif-splitter plugin to split SARIF files by categories #20
This PR implements a new sarif-splitter plugin that addresses the need to split large SARIF files into smaller, categorized files for better organization and to overcome GitHub Advanced Security upload restrictions.
Problem Solved
Large SARIF files can exceed GitHub's upload size limits and make it difficult to organize security alerts effectively. The new splitter plugin enables teams to:
Key Features
Path-Based Splitting
Split alerts based on file path patterns using glob matching:
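As a minimal sketch of this kind of matching (not the plugin's actual code), the function below tests a result's file path against each category's glob patterns with Python's `fnmatch` and returns the first match. The `categorize_path` name, the first-match-wins order, and the leading-slash normalization are assumptions; the rule shape mirrors the `path_rules` JSON shown under Configurable Rules.

```python
from fnmatch import fnmatchcase

# Category names and patterns are the defaults listed in this description.
DEFAULT_PATH_RULES = [
    {"name": "Tests", "patterns": ["**/test/**", "**/tests/**", "**/*test*"]},
    {"name": "App", "patterns": ["**/web/**", "**/api/**", "**/src/**", "**/app/**"]},
]


def categorize_path(uri, rules=DEFAULT_PATH_RULES):
    """Return the name of the first rule matching `uri`, or None (hypothetical helper)."""
    # Assumption: normalize separators and add a leading slash so that
    # "**/web/**" also matches a top-level "web/..." path.
    path = "/" + uri.replace("\\", "/").lstrip("/")
    for rule in rules:
        # fnmatch's "*" also crosses "/" boundaries, so "**" behaves like "*" here.
        if any(fnmatchcase(path, pattern) for pattern in rule["patterns"]):
            return rule["name"]
    return None  # the caller would route this alert to a fallback category
```

Because `fnmatch` lets `*` span directory separators, a pattern such as `**/*test*` also matches paths like `src/contest.py`; a stricter matcher may be preferable if that is a concern.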
Default path categories:
Tests: **/test/**, **/tests/**, **/*test*
App: **/web/**, **/api/**, **/src/**, **/app/**
Severity-Based Splitting
Split alerts by security severity levels automatically extracted from SARIF rule properties:
Severity mapping:
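The score-to-level step might look like the sketch below. The thresholds are an assumption modeled on GitHub's documented `security-severity` ranges, not values confirmed by this PR, and `severity_category` is a hypothetical name. Missing or unparseable scores fall into the Others category mentioned under No Alert Loss Guarantee.

```python
def severity_category(score):
    """Map a security-severity score (string or number) to a level name."""
    try:
        value = float(score)
    except (TypeError, ValueError):
        return "Others"  # missing or non-numeric score
    # Assumed thresholds; adjust to match the plugin's actual mapping.
    if value >= 9.0:
        return "Critical"
    if value >= 7.0:
        return "High"
    if value >= 4.0:
        return "Medium"
    if value > 0.0:
        return "Low"
    return "Others"  # zero, negative, or NaN
```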
Single Splitting Method Restriction
The plugin enforces that only one splitting method can be used at a time. Users must choose either --split-by-path or --split-by-severity, not both, to ensure focused and predictable splitting behavior.
GitHub Advanced Security Integration
Each split SARIF file includes proper runAutomationDetails.id categories following GitHub's conventions:
/language:python/category:Tests, /language:python/filter:none
/language:python/severity:Critical, /language:python/severity:High, /language:python/severity:Medium, /language:python/severity:Low
Summary Output Table
The plugin provides a comprehensive summary table showing before/after views:
Configurable Rules
Custom splitting rules via JSON configuration files:
{
  "path_rules": [
    {
      "name": "Frontend",
      "patterns": ["**/web/**", "**/*.js", "**/*.jsx"]
    },
    {
      "name": "Backend",
      "patterns": ["**/api/**", "**/*.py", "**/*.java"]
    }
  ]
}
Technical Implementation
SARIF Model Enhancement
AutomationDetailsModel to support GitHub Advanced Security categories
RunsModel to include automationDetails field
Robust Property Access
The plugin handles various SARIF property formats for security-severity extraction:
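One way such tolerant extraction could be written is sketched below. It works on plain dictionaries loaded from SARIF JSON rather than the toolkit's own models, and the lookup order (the result's own properties first, then the rule's) and the `extract_security_severity` name are assumptions, not the plugin's confirmed behavior.

```python
def extract_security_severity(result, rules):
    """Return the security-severity score as a float, or None if unavailable.

    `result` is one SARIF result object and `rules` is the list found at
    run["tool"]["driver"]["rules"], both as plain dicts.
    """
    # Resolve the rule by index when present, otherwise by id.
    index = result.get("ruleIndex")
    if isinstance(index, int) and 0 <= index < len(rules):
        rule = rules[index]
    else:
        rule_id = result.get("ruleId")
        rule = next((r for r in rules if r.get("id") == rule_id), None)

    for holder in (result, rule or {}):
        props = holder.get("properties") or {}
        raw = props.get("security-severity")
        if raw is None:
            continue
        try:
            return float(raw)  # scores are usually strings such as "8.8"
        except (TypeError, ValueError):
            continue  # malformed value: try the next location
    return None
```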
No Alert Loss Guarantee
All alerts are preserved through fallback categories:
/language:<lang>/filter:none
/language:<lang>/severity:Others
Usage Examples
Basic splitting (single method only):
# Split by severity levels only
python -m sariftoolkit --enable-splitter \
  --split-by-severity \
  --language javascript --sarif scan-results.sarif \
  --output ./categorized-results
Custom configuration:
Bug Fixes
This PR also fixes an existing dataclass configuration bug that was preventing the toolkit from running:
Testing
The implementation has been thoroughly tested with:
runAutomationDetails.id formatting for GitHub Advanced Security
All generated SARIF files maintain complete metadata while properly categorizing alerts for improved dashboard organization with zero alert loss.
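A reviewer who wants to spot-check these two properties could use something like the sketch below. It is not part of the PR's test suite: `check_split` and the ID pattern are hypothetical, the pattern is inferred from the category examples in this description, and the count comparison assumes each alert lands in exactly one split file.

```python
import re

# Assumed ID shape, inferred from the examples in this description.
CATEGORY_ID = re.compile(r"^/language:[^/]+/(category|filter|severity):[^/]+$")


def count_results(sarif):
    """Count results across all runs of one SARIF document (a plain dict)."""
    return sum(len(run.get("results", [])) for run in sarif.get("runs", []))


def check_split(original, splits):
    """Return a list of problems; an empty list means the split looks consistent."""
    problems = []
    before = count_results(original)
    after = sum(count_results(split) for split in splits)
    if before != after:
        problems.append(f"alert count changed: {before} -> {after}")
    for split in splits:
        for run in split.get("runs", []):
            category = (run.get("automationDetails") or {}).get("id", "")
            if not CATEGORY_ID.match(category):
                problems.append(f"unexpected category id: {category!r}")
    return problems
```

The inputs would be the dictionaries produced by `json.load` on the original file and on each generated file.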