perf(detectors): quick-reject pre-screen on auth detectors (-31% detector CPU) by aksOps · Pull Request #111 · RandomCodeSpace/codeiq

aksOps · 2026-04-29T13:34:48Z

Summary

Three cross-cutting auth detectors (CertificateAuthDetector,
SessionHeaderAuthDetector, LdapAuthDetector) burn 55% of all detector CPU
on real-world polyglot scans because they run a lines × patterns double loop
on every supported-language file — even files with zero auth keywords.

This PR adds a per-detector PRE_SCREEN Pattern: one regex pass over file
content; if no distinctive literal substring of any underlying pattern is
present, the file cannot match — short-circuit before the line loop.
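As a rough illustration of the shape described above, here is a minimal sketch. It is not the code from this PR: the class name `PreScreenSketch`, the three LDAP-style patterns, and the line-number return type are assumptions made for the example. Only the `PRE_SCREEN` / `ALL_PATTERNS` names and the "one pass, then short-circuit" structure come from the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in for an auth detector; not the codeiq class. */
final class PreScreenSketch {

    // Illustrative line-level patterns (assumed, not taken from the PR diff).
    static final List<Pattern> ALL_PATTERNS = List.of(
            Pattern.compile("ldap://\\S+"),
            Pattern.compile("LdapContext\\s*\\("),
            Pattern.compile("ldap_bind\\s*\\("));

    // One alternation holding a distinctive literal from each pattern above.
    static final Pattern PRE_SCREEN = Pattern.compile("ldap://|LdapContext|ldap_bind");

    /** Returns the 1-based line numbers on which any pattern matches. */
    static List<Integer> detect(String content) {
        // Quick reject: a single regex pass over the whole file content.
        if (!PRE_SCREEN.matcher(content).find()) {
            return List.of();
        }
        // The lines x patterns double loop, reached only by keyword-bearing files.
        List<Integer> hits = new ArrayList<>();
        String[] lines = content.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            for (Pattern p : ALL_PATTERNS) {
                if (p.matcher(lines[i]).find()) {
                    hits.add(i + 1);
                    break; // one hit per line is enough for this sketch
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(detect("int x = 1;\nreturn x;\n"));              // prints []
        System.out.println(detect("a\nurl = \"ldap://dc.example.com\"\n")); // prints [2]
    }
}
```

The first call never enters the line loop, which is the path most files in a polyglot scan would take.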

Measured impact

JFR ExecutionSample profile, JDK 25 Temurin, on a kept 30K-file polyglot
fixture (12 repos under ~/projects/polyglot-bench/: spring-petclinic-ms,
airflow, istio, eShop, angular/components, nuxt, actix/examples, ktor-samples,
nlohmann/json, play-samples, PSScriptAnalyzer, terraform-aws-eks; 14 distinct
languages active, including Python, TS, Java, Go, C#, Rust, Kotlin, and Scala):

| Detector | Before | After | Δ samples | Δ ~CPU |
|---|---|---|---|---|
| `CertificateAuthDetector` | 244 | 147 | -39.8% | -0.97s |
| `SessionHeaderAuthDetector` | 206 | 43 | -79.1% | -1.63s |
| `LdapAuthDetector` | 47 | 25 | -46.8% | -0.22s |
| Auth subtotal | 497 | 215 | -56.7% | -2.82s |
| All detectors | 902 | 624 | -30.8% | -2.78s |

(Each sample ≈ 10ms at JFR's profile setting.)
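For readers who want to re-derive the two delta columns, the arithmetic can be sketched as follows. The class and method names are hypothetical, and the only input beyond the sample counts is the roughly 10 ms-per-sample figure quoted above, so the CPU numbers are approximations in the same sense as the table's.

```java
import java.util.Locale;

/** Hypothetical helper that recomputes the delta columns from sample counts. */
final class SampleMath {

    // Assumption taken from the note above: one JFR sample is about 10 ms of CPU.
    static final double SECONDS_PER_SAMPLE = 0.010;

    /** Relative change in sample count, in percent (negative means fewer samples). */
    static double percentChange(int before, int after) {
        return 100.0 * (after - before) / before;
    }

    /** Approximate change in CPU time, in seconds. */
    static double cpuSecondsDelta(int before, int after) {
        return (after - before) * SECONDS_PER_SAMPLE;
    }

    public static void main(String[] args) {
        // {before, after} pairs in the same order as the table rows.
        int[][] rows = { {244, 147}, {206, 43}, {47, 25}, {497, 215}, {902, 624} };
        for (int[] r : rows) {
            System.out.println(String.format(Locale.ROOT, "%d -> %d: %.1f%%, %.2fs",
                    r[0], r[1], percentChange(r[0], r[1]), cpuSecondsDelta(r[0], r[1])));
        }
    }
}
```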

Why this is safe

PRE_SCREEN is constructed as a regex alternation of every distinctive literal
substring drawn from the existing patterns in ALL_PATTERNS / LANGUAGE_PATTERNS.
Files that don't contain any of those substrings cannot match any underlying
pattern by construction — so the early return DetectorResult.empty() is
identical in observable behavior to running the existing line loop and emitting
zero nodes.

Detection semantics are unchanged for files that DO contain at least one keyword:
the pre-screen passes, the existing lines × patterns logic runs unmodified, and the
same nodes are emitted with the same IDs/labels/properties/line numbers.
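The safety argument can be spot-checked with a small equivalence sketch. Everything here is illustrative: the `Rule` record, the three patterns, and their literals are assumptions, not the contents of the real `ALL_PATTERNS` / `LANGUAGE_PATTERNS`. The sketch relies on each literal being a substring that its pattern must contain in every match; if a real pattern used a case-insensitive flag or an optional prefix, the pre-screen would have to mirror that for the argument to hold.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical check that a literal pre-screen never changes the outcome. */
final class PreScreenEquivalence {

    /** A pattern paired with a literal that every match of it must contain. */
    record Rule(String literal, Pattern pattern) {}

    // Illustrative rules (assumed for this sketch).
    static final List<Rule> RULES = List.of(
            new Rule("X-Session-Token", Pattern.compile("X-Session-Token\\s*[:=]")),
            new Rule("getSession(", Pattern.compile("getSession\\(\\s*\\)")),
            new Rule("client_cert", Pattern.compile("client_cert\\w*\\s*=")));

    // Alternation of quoted literals; Pattern.quote keeps "(" from being read as a group.
    static final Pattern PRE_SCREEN = Pattern.compile(
            RULES.stream().map(r -> Pattern.quote(r.literal())).collect(Collectors.joining("|")));

    /** Baseline: the lines x patterns loop with no pre-screen. */
    static boolean anyLineMatches(String content) {
        for (String line : content.split("\n", -1)) {
            for (Rule r : RULES) {
                if (r.pattern().matcher(line).find()) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Same loop, guarded by the pre-screen. */
    static boolean screened(String content) {
        return PRE_SCREEN.matcher(content).find() && anyLineMatches(content);
    }

    public static void main(String[] args) {
        List<String> samples = List.of(
                "int x = 1;",                      // no keyword: rejected by the pre-screen
                "req.getSession( )",               // keyword present and a pattern matches
                "// mentions client_cert only",    // keyword present but no pattern matches
                "X-Session-Token: abc");
        for (String s : samples) {
            if (screened(s) != anyLineMatches(s)) {
                throw new IllegalStateException("pre-screen changed the result for: " + s);
            }
        }
        System.out.println("pre-screen agrees with the baseline on all samples");
    }
}
```

The third sample shows the other half of the claim: a file that passes the pre-screen still goes through the unmodified loop, so a keyword mention alone does not produce a match.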

Tests

3689 tests run / 0 failures / 0 errors / 32 skipped, same as baseline. All 65 auth
detector tests pass without modification (they all use keyword-bearing
fixtures, which the pre-screen lets through). The "no match on plain code"
negative tests still pass: the pre-screen rejects those files on the faster
path, and the result is the same empty DetectorResult.

What's NOT in this PR

- No changes to non-auth detectors. The other top-15 are either AST-based
  (where the bottleneck is the tree walk, not regex) or already use
  single-pass Matcher.find(). Pre-screen's gain is small on those and
  the regression risk on AST code paths isn't justified.
- No abstract base class refactor. Per-detector PRE_SCREEN keeps blast
  radius minimal and the optimization explicit at each call site. If a
  pattern emerges across many regex detectors, a follow-up PR can hoist
  to AbstractRegexDetector.

Test plan

- mvn test -Dtest='*Auth*Test,AuthDetectorsCoverageTest': 65/65 pass
- mvn test (full suite): 3689 run / 0 failures / 0 errors / 32 skipped
- JFR re-profile on polyglot-bench: confirms the -31% detector CPU reduction
- CI green
- Auto-merge on green

🤖 Generated with Claude Code

…ctor CPU)

Profiling on a 30K-file polyglot fixture (kept at ~/projects/polyglot-bench: spring-petclinic-microservices, airflow, istio, eShop, angular/components, nuxt, actix/examples, ktor-samples, nlohmann/json, play-samples, PSScriptAnalyzer, terraform-aws-eks; 14 distinct languages) showed the three cross-cutting auth detectors burning 55% of all detector CPU because they ran the lines × patterns double loop on every supported-language file, even files with zero auth keywords.

Fix: per-detector PRE_SCREEN Pattern with all distinctive literal substrings of the underlying patterns. One regex pass over file content; if no keyword is present, the file cannot match, so short-circuit before the line loop.

Measured impact (JFR ExecutionSample, JDK 25, polyglot fixture):

CertificateAuthDetector: 244 → 147 samples (-39.8%, -0.97s CPU)
SessionHeaderAuthDetector: 206 → 43 samples (-79.1%, -1.63s CPU)
LdapAuthDetector: 47 → 25 samples (-46.8%, -0.22s CPU)
Auth subtotal: 497 → 215 samples (-56.7%, -2.82s)
All detectors total: 902 → 624 samples (-30.8%, -2.78s)

Detection semantics unchanged: pre-screen rejects only files where no underlying pattern can match (keyword absent). Tests covering keyword-bearing fixtures pass through pre-screen and run the existing logic byte-for-byte.

Tests: 3689 / 0 failures / 0 errors / 32 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

aksOps enabled auto-merge (squash) April 29, 2026 13:34

aksOps merged commit 3bc3ebf into main Apr 29, 2026
13 checks passed

aksOps deleted the perf/auth-detector-pre-screen branch April 29, 2026 13:38
