Optimize Gitleaks to Scan Staged Diff Instead of Entire Staged Files #19

@ductrl

Description

CommitGate currently scans each staged file individually using Gitleaks. This means the entire contents of a staged file are scanned, even when only a few lines were modified.

While this approach is simple and reliable, its cost grows with the size of each file rather than with the size of the change, which makes it inefficient for large files. It also does not fully align with CommitGate's "Least Privilege" and "Fast Feedback" design principles.

Goal

Investigate scanning only staged content while preserving accurate secret detection and finding locations.

Potential approaches include:

  • Scanning the output of git diff --cached through Gitleaks stdin mode
  • Scanning the staged version of files directly from the Git index
  • Hybrid approaches that preserve context for multi-line secrets
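A minimal sketch of the diff-based approach, assuming the staged diff is produced with `git diff --cached --unified=0` (no context lines) and that paths use Git's default `b/` prefix. The helper name `staged_added_lines` is hypothetical. It keeps only added lines, strips the leading `+` so columns line up with the real file, and records each line's number in the staged file so it can later be used for location reporting. Quoted paths (which Git emits for unusual characters) are not handled.

```python
import re

# Hunk header, e.g. "@@ -10,2 +12,3 @@"; group 1 is the first line number in the staged file.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def staged_added_lines(diff_text: str) -> dict[str, list[tuple[int, str]]]:
    """Map each file in a `git diff --cached` output to (staged line number, text) pairs."""
    result: dict[str, list[tuple[int, str]]] = {}
    path = None          # file currently being parsed (None means skip)
    in_header = False    # True between "diff --git" and the first "@@"
    new_line = 0         # next line number in the staged version of the file
    for raw in diff_text.splitlines():
        if raw.startswith("diff --git "):
            path, in_header = None, True
            continue
        if in_header:
            if raw.startswith("+++ "):
                target = raw[4:]
                # Deleted files show "+++ /dev/null": nothing staged to scan.
                path = None if target == "/dev/null" else target.removeprefix("b/")
                continue
            if not raw.startswith("@@"):
                continue  # "index", "---", mode lines, etc.
        match = HUNK_RE.match(raw)
        if match:
            in_header = False
            new_line = int(match.group(1))
            continue
        if path is None:
            continue
        if raw.startswith("+"):
            # Strip the '+' marker so columns match the real file.
            result.setdefault(path, []).append((new_line, raw[1:]))
            new_line += 1
        elif raw.startswith(" "):
            new_line += 1  # context line (only present when -U0 is not used)
        # '-' lines and "\ No newline at end of file" do not advance new_line.
    return result
```

In a hook, the input could come from `subprocess.run(["git", "diff", "--cached", "--unified=0"], capture_output=True, text=True, check=True).stdout`. Tracking the header state explicitly keeps an added line whose content starts with `++ ` from being mistaken for a `+++` file header.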

Challenges

Accurate Location Reporting

The current implementation provides:

  • File path
  • Start line
  • End line
  • Start column
  • End column

When a diff is scanned via stdin, findings may be reported relative to the diff rather than to the source file (for example, a line number inside the diff output instead of in the file).

Need a strategy to map findings back to:

  • Original file
  • Real line numbers
  • Real columns (if possible)
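One possible mapping strategy, sketched under the assumption that the added lines have already been collected per file as (staged line number, text) pairs with the `+` marker removed, for example from a parsed `git diff --cached --unified=0`. It also assumes Gitleaks reports 1-based line numbers relative to the stdin buffer. `build_scan_buffer`, `map_position` and `Location` are hypothetical names. Each buffer line gets an entry in a line map, so a finding's buffer line can be translated back to the real file and line, while columns carry over unchanged because lines are copied verbatim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    path: str
    line: int
    column: int


def build_scan_buffer(
    added: dict[str, list[tuple[int, str]]],
) -> tuple[str, list[tuple[str, int]]]:
    """Join added lines into one stdin buffer.

    line_map[i] holds (path, real line number) for buffer line i + 1.
    """
    lines: list[str] = []
    line_map: list[tuple[str, int]] = []
    for path, entries in added.items():
        for real_line, text in entries:
            lines.append(text)
            line_map.append((path, real_line))
    buffer = "\n".join(lines) + "\n" if lines else ""
    return buffer, line_map


def map_position(line_map: list[tuple[str, int]], buffer_line: int, column: int) -> Location:
    """Translate a 1-based buffer line (and its column) back to the staged file."""
    path, real_line = line_map[buffer_line - 1]
    # Lines were copied verbatim, so the column within the line is unchanged.
    return Location(path, real_line, column)
```

The same call works for both the start and end of a finding. Because the buffer joins lines from unrelated hunks and files, a multi-line rule could in principle match across a boundary, which ties into the detection-accuracy concerns below.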

Detection Accuracy

Scanning only the changed lines may reduce the context available to Gitleaks and affect detection of:

  • Multi-line secrets
  • Split credentials
  • Context-dependent findings

Need to verify detection quality remains acceptable.
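A minimal sketch of the hybrid approach: widen each changed line by a few lines of context taken from the staged file (the staged version is readable with `git show :<path>`), merging windows that overlap or touch so no line is scanned twice. The function name `context_windows` and the default of 3 context lines are assumptions, not tuned values.

```python
def context_windows(changed: list[int], total_lines: int, context: int = 3) -> list[tuple[int, int]]:
    """Return merged, inclusive 1-based line ranges covering each changed line
    plus `context` lines on either side, clipped to the file length."""
    windows: list[tuple[int, int]] = []
    for line in sorted(set(changed)):
        start = max(1, line - context)
        end = min(total_lines, line + context)
        if windows and start <= windows[-1][1] + 1:
            # Overlapping or adjacent: extend the previous window.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows
```

Each window would still need its own line map so findings can be reported against real line numbers. A secret split across lines farther apart than the window would still be missed, so the context size is itself a performance vs accuracy tradeoff to benchmark.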

Acceptance Criteria

  • Benchmark current file-based scanning against staged-content scanning
  • Preserve or improve scan performance
  • Preserve accurate file and line number reporting
  • Validate that common secret types are still detected correctly
  • Document performance vs accuracy tradeoffs
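A sketch of how the benchmark and detection-parity criteria could be checked, assuming both scan modes produce Gitleaks v8 JSON reports loaded as lists of dicts with `File`, `StartLine` and `RuleID` keys. The helper names are hypothetical. Parity should be measured on fixture commits where every planted secret sits in the staged change: on real history, the file-based baseline will also report pre-existing secrets in unchanged lines, which a staged-content scan skips by design.

```python
import statistics
import time
from typing import Callable


def median_runtime(scan: Callable[[], object], repeats: int = 5) -> float:
    """Run `scan` several times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        scan()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare_findings(baseline: list[dict], candidate: list[dict]) -> tuple[set, set]:
    """Return (missed, extra): finding keys only in the baseline (file-based scan)
    and only in the candidate (staged-content scan)."""
    def key(finding: dict) -> tuple:
        return (finding["File"], finding["StartLine"], finding["RuleID"])

    base = {key(f) for f in baseline}
    cand = {key(f) for f in candidate}
    return base - cand, cand - base
```

An empty `missed` set on the fixtures, together with a lower median runtime, would cover the first four criteria. Any non-empty result is a concrete entry for the tradeoff documentation.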

Priority

Low

Notes

The current implementation intentionally prioritizes detection accuracy over performance.

This issue is an optimization and should be evaluated after the MVP and hackathon deliverables are complete.

Metadata

Labels

enhancement (New feature or request)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests