[common] Fix O(2^n) complexity in FileIndexPredicate.getRequiredNames#7332
Open
dubin555 wants to merge 1 commit into apache:master from
Remove the redundant child.visit(this) call in getRequiredNames() that caused exponential time complexity for deeply nested OR predicates (e.g. IN clauses). The visitor called child.visit(this) twice per child, once discarding the result and then again using it, which doubled the work at each tree level. For IN clauses with at most 20 values, which produce right-nested OR trees of depth N, this caused O(2^N) leaf visits instead of O(N), hanging queries and tying up CPUs in production. Closes apache#7230
Purpose
Linked issue: close #7230
FileIndexPredicate.getRequiredNames() calls child.visit(this) twice per child in its CompoundPredicate visitor: once discarding the result, then again to collect it. Since PredicateBuilder.or() produces right-nested binary trees via reduce(), this doubles the work at each tree level, resulting in O(2^n) time complexity.

For an IN clause with 20 values (which produces a nested OR tree of depth 19), this means ~1,048,576 leaf visits instead of 20. In production, queries with moderately sized IN clauses hang indefinitely.
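To make the doubling concrete, here is a minimal, self-contained Java sketch of the failure mode. Predicate, Leaf, Compound, NameCollector and orChain are hypothetical stand-ins written for illustration, not Paimon's actual classes; orChain assumes the right-nested binary OR shape described above. NameCollector takes a buggy flag that reproduces the extra, discarded child.visit(this) call, and counts leaf visits with an AtomicInteger.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

public class RequiredNamesDemo {
    // Hypothetical, minimal stand-ins for a predicate tree and its visitor (not Paimon's API).
    interface Predicate {
        <R> R visit(Visitor<R> visitor);
    }

    interface Visitor<R> {
        R visitLeaf(Leaf leaf);
        R visitCompound(Compound compound);
    }

    static final class Leaf implements Predicate {
        final String field;
        Leaf(String field) { this.field = field; }
        public <R> R visit(Visitor<R> visitor) { return visitor.visitLeaf(this); }
    }

    static final class Compound implements Predicate {
        final List<Predicate> children;
        Compound(List<Predicate> children) { this.children = children; }
        public <R> R visit(Visitor<R> visitor) { return visitor.visitCompound(this); }
    }

    // Collects field names; when buggy is true it repeats the duplicated child.visit(this) call.
    static final class NameCollector implements Visitor<Set<String>> {
        final boolean buggy;
        final AtomicInteger leafVisits = new AtomicInteger();
        NameCollector(boolean buggy) { this.buggy = buggy; }

        public Set<String> visitLeaf(Leaf leaf) {
            leafVisits.incrementAndGet();
            Set<String> names = new HashSet<>();
            names.add(leaf.field);
            return names;
        }

        public Set<String> visitCompound(Compound compound) {
            Set<String> names = new HashSet<>();
            for (Predicate child : compound.children) {
                if (buggy) {
                    child.visit(this); // result discarded: the whole subtree is traversed a second time
                }
                names.addAll(child.visit(this)); // the only call the fixed version keeps
            }
            return names;
        }
    }

    // Builds OR(f0, OR(f1, ... OR(f[n-2], f[n-1]))) by wrapping outward from the last leaf (n >= 1).
    static Predicate orChain(int n) {
        Predicate acc = new Leaf("f" + (n - 1));
        for (int i = n - 2; i >= 0; i--) {
            List<Predicate> pair = Arrays.asList(new Leaf("f" + i), acc);
            acc = new Compound(pair);
        }
        return acc;
    }

    // Runs one collector over the tree and returns how many leaf visits it made.
    static int countLeafVisits(Predicate predicate, boolean buggy) {
        NameCollector collector = new NameCollector(buggy);
        predicate.visit(collector);
        return collector.leafVisits.get();
    }

    public static void main(String[] args) {
        Predicate chain = orChain(20);
        System.out.println("fixed: " + countLeafVisits(chain, false)); // fixed: 20
        System.out.println("buggy: " + countLeafVisits(chain, true));  // buggy: 1572862
    }
}
```

In this toy model each compound node visits its leaf child twice and its nested subtree twice, so the buggy count follows V(n) = 2V(n-1) + 2, i.e. 3·2^(n-1) - 2 leaf visits for n leaves, while the fixed collector makes exactly n. The exact before-fix figure depends on the tree shape and on what is counted, so it need not match the value measured by this PR's test; both grow as O(2^n).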
The fix removes the redundant child.visit(this) call (line 130), matching the correct pattern already used in PredicateVisitor.FieldNameCollector.

The bug was introduced in ebdfa02bd ("[hotfix] Correct visitors for TransformPredicate"), which refactored the visitor to handle TransformPredicate and accidentally left the duplicate call in place.

Tests
- FileIndexPredicateTest.testGetRequiredNamesLinearComplexity(): builds a 20-element OR chain and counts leaf visits via an AtomicInteger. Asserts exactly 20 visits (linear); before the fix, the count was 1,048,575 (exponential).
- FileIndexPredicateTest.testGetRequiredNamesPerformance(): builds a 20-element OR chain and asserts that getRequiredNames() completes within 100 ms.
- FileIndexPredicateTest.testGetRequiredNamesBasic(): verifies correctness by checking that all field names are collected from a compound predicate.
- FileIndexPredicateTest.testGetRequiredNamesSinglePredicate(): verifies that a single leaf predicate returns the correct field name.

API and Format
No.
Documentation
No.
Generative AI tooling
Generated-by: Claude Code 1.0.33