-
Notifications
You must be signed in to change notification settings - Fork 103
Predicate Comparison and Merging Utilities for Predicate Push-Down #668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🦋 Changeset detectedLatest commit: 4f8154e The changes in this PR will be included in the next version bump. This PR includes changesets to release 12 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
More templates
@tanstack/angular-db
@tanstack/db
@tanstack/db-ivm
@tanstack/electric-db-collection
@tanstack/query-db-collection
@tanstack/react-db
@tanstack/rxdb-db-collection
@tanstack/solid-db
@tanstack/svelte-db
@tanstack/trailbase-db-collection
@tanstack/vue-db
commit: |
Size Change: +4.97 kB (+5.95%) 🔍 Total Size: 88.6 kB
ℹ️ View Unchanged
|
Size Change: 0 B Total Size: 2.36 kB ℹ️ View Unchanged
|
Something I considered while implementing this was to normalise the predicates into a DNF form, but it would potentially explode the size of the predicates, and in a real world situation the predicates are simple and repetitive in structure so I think this implementation makes the right tradeoff. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a detailed read through the code in predicate-utils.ts
. My main concern is about the semantics that we get from the way how we merge the predicates.
// For (A AND B) ⊆ (C AND D), we need every conjunct in superset to be implied by subset | ||
// For each conjunct in superset, at least one conjunct in subset must be a subset of it | ||
// OR the entire subset implies it | ||
return superset.args.every((superArg) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to handle the case where both the subset and superset are AND clauses explicitly?
I would expect to handle the case where subset is an AND by doing:
return subset.args.some((subArg) => isWhereSubsetInternal(subArg, superset)
And then there would be a separate (distinct) case that checks if superset
is and AND clause like you already have on L80. That should work just fine because when you have an AND clause in both subset and superset, it will first be handled by the case for the subset which splits the AND clause into N calls to isWhereSubsetInternal (1 call per conjunct of the subset). And then in each call the subset is no longer an AND clause but the superset is so that is handled by the case on L80.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in this case we only need L72-74
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note how you don't special case the cause where both subset and superset are OR clauses so that confirms that for ANDs you should not have to do that either.
} | ||
|
||
// Handle eq vs in | ||
if (subsetFunc.name === `eq` && supersetFunc.name === `in`) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't handle the case where subset is IN (with no elements or a single element) and superset is EQ. Those are corner cases but could also occur e.g. age IN [ 18 ]
is a subset of age = 18
because they are actually the same.
Also, we don't handle subset that is IN/EQ vs superset that is a range (i.e. comparison operators, e.g. >= 18). Should be fairly straightforward to determine that age = 18
is a subset of age >= 18
and age IN [ 18, 19, 20]
is a subset of age >= 18
. For IN you just need to recursively check each element of the array against the superset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, i don't think we should explicitly handle cases where subset is EQ and superset is IN, and cases where both subset and superset use IN. Those should be treated individually. So, if subset is an IN
then we should just recursively call isWhereSubsetInternal
over the elements of the array:
inElements.every(elem => isWhereSubsetInternal(elem, superset))
Again, that will lead to 1 call per element that is part of the IN clause. In fact, a subset where a IN [ x, y , z]
is equivalent to a = x OR a = y OR a = z
so perhaps we should not handle IN specially at all and just have 1 case that handles both OR and IN (in the subset). Similarly, for IN
in the superset (treat it like OR in superset).
superset: number | undefined | ||
): boolean { | ||
// No limit requirement is always satisfied | ||
if (subset === undefined) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the subset has no limit but the superset has a limit, then it should return false
iiuc but this returns true
. I think this function should be:
return superset === undefined || subset <= superset
? undefined // All unlimited = result unlimited | ||
: Math.min(...limits) // Take most restrictive | ||
|
||
return { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about these semantics, whether this is really what we expect. Let me explain with an example. Imagine 2 queries:
-- Query 1
WHERE age >= 18 LIMIT 1
-- Query 2
WHERE age >= 20 LIMIT 3
Imagine we have 3 users: Alice aged 18, Bob aged 19, Charlie aged 20. Based on this data query 1 should return 1 user: Alice. And query 2 should return also 1 user: Charlie.
So:
Query 1: { Alice }
Query 2: { Charlie }
Intersection of Query 1 and Query 2: { } (empty)
But if we look at intersectPredicates
it will generate this predicate:
WHERE age >= 20 LIMIT 1
It takes age >= 20 because that's the most restrictive clause and it takes limit 1 because that's also the most restrictive. Now based on the data from our example, this intersected query will return 1 user: Charlie. So, the result of this intersected query is different from the intersection of the results... Don't think that we want those semantics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed #668 (comment) with @samwillis. It's a problem for both intersected predicates and unioned predicates. The current approach of merging the predicates by combining the where clauses and picking a limit doesn't work as explained in that comment. We will need to find another way to determine whether or not we have previously loaded all the data that is needed to fulfill the query at hand.
We need to track the disjoint loaded sets by the upper and lower bound in columns (exclusive of upper bound when loaded with a limit), then use the indexes to check that within those bounds we have enough rows to answer the query without going to the server. Marking this as draft as it needs rework. The where expression utils are good, anything using a limit needs changing. |
d24585d
to
cbc5baf
Compare
cbc5baf
to
73525a6
Compare
f7ca08d
to
4f8154e
Compare
* subset logic to predicates. | ||
* | ||
* @example | ||
* const dedupe = new DeduplicatedLoadSubset(myLoadSubset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does feel a bit inconsistent that we have to instantiate a class whereas loadSubset would just be a function.
If we want we can create a function that hides this:
function dedup(loadSubset) {
return new DeduplicatedLoadSubset(loadSubset)
}
stacked on #669
Summary
Implements utilities for comparing and merging predicates (
where
clauses,orderBy
, andlimit
) to support predicate push-down in collection sync operations. Provides a complete solution for tracking loaded data and preventing redundant server requests.Key Features:
where
clauses (AND, OR, comparisons, IN)A AND NOT(B)
with simplificationDate
object support (equality, ranges, IN clauses)false
literal for impossible predicates)Motivation
The
onLoadMore
callback needs to:Without these utilities, the sync layer cannot efficiently track loaded data ranges or prevent duplicate network requests.
What This Implements
Core Predicate Functions
Where Clause Operations:
isWhereSubset(subset, superset)
- Checks if one where clause logically implies anotherintersectWherePredicates(predicates)
- Combines predicates with AND logic (most restrictive)unionWherePredicates(predicates)
- Combines predicates with OR logic (least restrictive)minusWherePredicates(from, subtract)
- Computesfrom AND NOT(subtract)
with simplificationOrderBy & Limit Operations:
isOrderBySubset(subset, superset)
- Validates ordering requirements via prefix matchingisLimitSubset(subset, superset)
- Compares limit constraintsComplete Predicate Operations:
isPredicateSubset(subset, superset)
- Checks all components (where + orderBy + limit)intersectPredicates(predicates)
- Merges predicates with intersection semanticsunionPredicates(predicates)
- Merges predicates with union semanticsDeduplicatedLoadSubset Class
A production-ready wrapper that automatically deduplicates
loadSubset
calls:Features:
How it works:
where
predicate via unionorderBy
/limit
) separately for exact matchingExamples
How It Works
Logical Subset Checking
Uses recursive descent to check logical implications:
Range Simplification
Intersection: Takes most restrictive constraints
age > 10 AND age > 20
→age > 20
age = 5 AND age = 6
→false
literalage IN [1,2] AND age IN [2,3]
→age IN [2]
Union: Takes least restrictive constraints
age > 10 OR age > 20
→age > 10
age = 5 OR age = 10
→age IN [5, 10]
Difference: Simplifies same-field predicates
age > 10 MINUS age > 20
→age > 10 AND age <= 20
age IN [1,2,3] MINUS age IN [2,4]
→age IN [1,3]
Value Type Support
✅ Supported:
❌ Not Supported:
Performance Optimizations
For large primitive IN predicates (>10 elements):
Set.has()
instead of array scansareAllPrimitives
andprimitiveSet
on extractioneq = X
vsIN [1000 items]
IN [100]
⊆IN [10000]
IN [5000]
clausesDeduplication Architecture
State Tracking:
unlimitedWhere
- Combined OR of all unlimited predicateslimitedCalls[]
- Array of all limited queries for exact matchinginflightCalls[]
- Active requests with their predicatesgeneration
- Counter to invalidate stale in-flight handlers after resetRequest Flow:
isPredicateSubset
)What This Covers
✅ All operators supported by collection index system:
eq
,gt
,gte
,lt
,lte
,in
,and
,or
✅ Date object support (equality, ranges, IN clauses)
✅ Conflict detection (contradictory equalities, empty IN intersections)
✅ Predicate difference for incremental loading
✅ Production-ready deduplication wrapper
✅ Concurrent request handling with subset matching
✅ State reset with generation counter safety
✅ 149 tests covering edge cases, Date handling, performance optimizations, and deduplication
What This Does NOT Cover
❌ Range contradiction detection -
age > 20 AND age < 10
is preserved as-is (could detect and returnfalse
)❌ Property-to-property comparisons - Assumes pattern:
field op value
❌ Advanced OR simplification - Complex nested OR/AND kept as-is for safety
❌ NOT operator - Not supported by collection index system
❌ State persistence - DeduplicatedLoadSubset is in-memory only (persistence hooks planned for future)
Why Conservative? Correctness over optimization—false negatives (missed optimizations) are better than false positives (incorrect results).
Files Changed
New Files:
packages/db/src/query/predicate-utils.ts
(1,544 lines) - 10 exported functions with JSDocpackages/db/src/query/subset-dedupe.ts
(244 lines) - DeduplicatedLoadSubset classpackages/db/tests/predicate-utils.test.ts
(1,342 lines) - 130 testspackages/db/tests/subset-dedupe.test.ts
(326 lines) - 19 testsModified Files:
packages/db/src/query/index.ts
- Export new utilities and DeduplicatedLoadSubset classUsage Example
Basic Predicate Operations
Automatic Deduplication
Computing Incremental Loads
Testing
149 tests passing:
predicate-utils.test.ts (130 tests):
subset-dedupe.test.ts (19 tests):
Type Safety
null
returns whereBasicExpression<boolean>
is expected{type: 'val', value: false}
undefined
vs constrained predicatesthis
binding issuesBreaking Changes
None - purely additive functionality.