perf(in-filter): Update in filter to use any operator for performance improvements #165
https://app.devrev.ai/devrev/works/ISS-225995
🚀 Optimize IN and NOT IN Filter Query Generation with String Split Approach
Summary
This PR introduces a significant performance optimization for `IN` and `NOT IN` filter operations on `string` and `number` types by replacing the traditional AST node-per-value approach with a `string_split` + `unnest` + `ANY` subquery pattern. This dramatically reduces AST size and query generation time, especially for large filter value sets.

🎯 Problem Statement
Previously, when filtering with `IN` or `NOT IN` operators containing many values (e.g., 1000+ items), the query generation created one AST node per value, so AST size and query generation time grew with the number of filter values.

✨ Solution
Implemented an optimized approach for `string` and `number` types that:
- Joins all filter values into a single string with `§§` as delimiter
- Uses `string_split` to split the string back into an array
- Uses `unnest` to convert the array into rows
- Uses an `ANY` subquery for comparison

This reduces the AST from O(n) nodes to O(1) nodes, where n is the number of filter values.
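To make the O(n) versus O(1) claim concrete, here is a minimal sketch that counts nodes in a toy tree for both approaches. The `SketchNode` shape, the node kind names, and the helpers `perValueIn`, `stringSplitIn`, and `countNodes` are hypothetical and much simpler than the DuckDB AST that meerkat actually builds.

```typescript
// Hypothetical, simplified node shape (not the real DuckDB AST types).
interface SketchNode {
  kind: string;
  value?: string;
  children: SketchNode[];
}

const DELIMITER = "§§";

// Per-value approach: one constant node per filter value under one IN node.
function perValueIn(column: string, values: string[]): SketchNode {
  const columnRef: SketchNode = { kind: "COLUMN_REF", value: column, children: [] };
  const constants: SketchNode[] = values.map(function (v): SketchNode {
    return { kind: "CONSTANT", value: v, children: [] };
  });
  return { kind: "COMPARE_IN", children: [columnRef].concat(constants) };
}

// String-split approach: every value travels inside a single constant node.
function stringSplitIn(column: string, values: string[]): SketchNode {
  const columnRef: SketchNode = { kind: "COLUMN_REF", value: column, children: [] };
  const joined: SketchNode = { kind: "CONSTANT", value: values.join(DELIMITER), children: [] };
  const delimiter: SketchNode = { kind: "CONSTANT", value: DELIMITER, children: [] };
  const split: SketchNode = { kind: "FUNCTION:string_split", children: [joined, delimiter] };
  const unnest: SketchNode = { kind: "FUNCTION:unnest", children: [split] };
  return { kind: "SUBQUERY:ANY", children: [columnRef, unnest] };
}

// Counts a node plus all of its descendants.
function countNodes(node: SketchNode): number {
  return node.children.reduce(function (sum, child) {
    return sum + countNodes(child);
  }, 1);
}

const manyValues: string[] = [];
for (let i = 1; i <= 1000; i++) {
  manyValues.push("value" + i);
}

console.log(countNodes(perValueIn("customer_id", manyValues)));    // 1002
console.log(countNodes(stringSplitIn("customer_id", manyValues))); // 6
```

In this toy model the per-value tree grows by one node for every added value, while the string-split tree stays at six nodes. Only the length of the joined string changes.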
📊 Performance Improvements
Query Creation Benchmark Results
Key Highlights
🔧 Changes Made
Modified Files
- `meerkat-core/src/cube-filter-transformer/in/in.ts`
  - Optimized path for `string` and `number` types
  - Array overlap operator (`&&`) for `string_array` and `number_array` types
  - `ANY` subquery pattern with `unnest(string_split())`
- `meerkat-core/src/cube-filter-transformer/not-in/not-in.ts`
  - Same pattern, wrapped in `OPERATOR_NOT`

Test Files
- `meerkat-core/src/cube-filter-transformer/in/in.spec.ts`
- `meerkat-core/src/cube-filter-transformer/not-in/not-in.spec.ts`
- `meerkat-node/src/__tests__/test-data.ts`

Package Versions
- `meerkat-browser`, `meerkat-core`, and `meerkat-node`

🎨 Technical Details
Example Transformation
Before: IN filter with 1000 values creates 1000+ AST nodes
`column IN (value1, value2, ..., value1000)`
After: Single string split operation
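As a rough illustration of the string split form, the sketch below renders the pattern as SQL text. It assumes the generated query is equivalent to `column = ANY(SELECT unnest(string_split(...)))`. The PR itself works on AST nodes rather than SQL strings, and `inFilterSql` and `sqlLiteral` are hypothetical helper names, not functions from meerkat.

```typescript
const SPLIT_DELIMITER = "§§";

// Wraps text in single quotes and doubles embedded quotes (standard SQL escaping).
function sqlLiteral(text: string): string {
  return "'" + text.replace(/'/g, "''") + "'";
}

// Hypothetical helper: renders an IN filter as one string_split expression.
function inFilterSql(column: string, values: string[]): string {
  const joined = values.join(SPLIT_DELIMITER);
  return (
    column +
    " = ANY(SELECT unnest(string_split(" +
    sqlLiteral(joined) +
    ", " +
    sqlLiteral(SPLIT_DELIMITER) +
    ")))"
  );
}

console.log(inFilterSql("status", ["open", "closed", "won't fix"]));
// status = ANY(SELECT unnest(string_split('open§§closed§§won''t fix', '§§')))
```

However many values are passed, the expression keeps the same shape: one column reference, one joined literal, and one delimiter literal.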
Type Handling
- `string` and `number`: Uses optimized string_split approach
- `string_array` and `number_array`: Uses array overlap operator (`&&`)
- Other types: Use the existing `COMPARE_IN` approach

Implementation Details
The optimization is applied in the `inDuckDbCondition` and `notInDuckDbCondition` functions:
- Uses `§§` (section sign) as delimiter, which is uncommon in normal data
- The `string_split` function splits the string back into an array
- Number values are converted with `CAST(... AS DOUBLE)`
- Uses an `ANY` subquery with `COMPARE_EQUAL` for efficient comparison
- `NOT IN` wraps the condition in `OPERATOR_NOT`

✅ Testing
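The following is a minimal sketch, not taken from the PR's spec files, of a sanity check that the delimiter approach suggests: joining and then splitting should return the original values, and a value that itself contains `§§` would break that. The helpers `roundTrip`, `hasDelimiterCollision`, and `roundTripNumbers` are hypothetical. They mimic the join/split behaviour in plain TypeScript and do not call DuckDB.

```typescript
const VALUE_DELIMITER = "§§";

// Mimics the join on the client side followed by a split on the delimiter.
function roundTrip(values: string[]): string[] {
  return values.join(VALUE_DELIMITER).split(VALUE_DELIMITER);
}

// Hypothetical guard: true when some value would be broken apart by the split.
function hasDelimiterCollision(values: string[]): boolean {
  return values.some(function (v) {
    return v.indexOf(VALUE_DELIMITER) !== -1;
  });
}

// Numbers travel as text and are converted back, similar in spirit to a cast to DOUBLE.
function roundTripNumbers(values: number[]): number[] {
  return roundTrip(values.map(String)).map(Number);
}

console.log(roundTrip(["a", "b"]).length);      // 2: values survive unchanged
console.log(roundTrip(["a§§b", "c"]).length);   // 3: the first value was split in two
console.log(hasDelimiterCollision(["a§§b"]));   // true
console.log(roundTripNumbers([1, 2.5, -3]));    // the same three numbers
```

The collision case shows why the choice of an uncommon delimiter matters: any filter value containing `§§` would be matched as two separate values.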
📈 Impact
This optimization is particularly beneficial for `IN` and `NOT IN` filters with large value sets (e.g., 1000+ items).
🔍 Performance Analysis
AST Size Reduction
The optimization achieves dramatic AST size reductions:
Query Generation Time
Query generation time remains consistently low: