samwillis
Collaborator

@samwillis samwillis commented Oct 11, 2025

stacked on #669

Summary

Implements utilities for comparing and merging predicates (where clauses, orderBy, and limit) to support predicate push-down in collection sync operations. Provides a complete solution for tracking loaded data and preventing redundant server requests.

Key Features:

  • ✅ Logical subset checking for where clauses (AND, OR, comparisons, IN)
  • ✅ Smart predicate merging with intersection (AND) and union (OR) semantics
  • Predicate difference - Compute A AND NOT(B) with simplification
  • Automatic deduplication wrapper - Eliminates redundant data fetches
  • ✅ Complete Date object support (equality, ranges, IN clauses)
  • ✅ Contradiction detection (returns false literal for impossible predicates)
  • Performance optimized for large primitive IN predicates (100-1250x speedup via Set-based lookups)
  • 149 comprehensive tests with extensive coverage

Motivation

The onLoadMore callback needs to:

  1. Check if data is already loaded - Determine if a new predicate is covered by previously loaded predicates
  2. Track total coverage - Merge multiple load operations to understand complete data coverage
  3. Prevent redundant fetches - Automatically deduplicate concurrent and sequential requests

Without these utilities, the sync layer cannot efficiently track loaded data ranges or prevent duplicate network requests.

What This Implements

Core Predicate Functions

Where Clause Operations:

  • isWhereSubset(subset, superset) - Checks if one where clause logically implies another
  • intersectWherePredicates(predicates) - Combines predicates with AND logic (most restrictive)
  • unionWherePredicates(predicates) - Combines predicates with OR logic (least restrictive)
  • minusWherePredicates(from, subtract) - Computes from AND NOT(subtract) with simplification

OrderBy & Limit Operations:

  • isOrderBySubset(subset, superset) - Validates ordering requirements via prefix matching
  • isLimitSubset(subset, superset) - Compares limit constraints
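
A minimal sketch of what prefix matching for orderBy could look like, assuming the subset's ordering must be a leading prefix of the superset's. The `OrderTerm` type and `isOrderByPrefix` name are hypothetical and far simpler than the PR's real orderBy clauses.

```typescript
// Hypothetical, simplified order term; the real IR carries expressions, not field names.
type OrderTerm = { field: string; direction: 'asc' | 'desc' }

// True when every term of `subset` matches the term at the same position in `superset`.
function isOrderByPrefix(subset: OrderTerm[], superset: OrderTerm[]): boolean {
  if (subset.length > superset.length) return false
  return subset.every((term, i) => {
    const other = superset[i]
    return (
      other !== undefined &&
      other.field === term.field &&
      other.direction === term.direction
    )
  })
}
```

Under this assumption an empty ordering requirement is satisfied by any superset ordering.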

Complete Predicate Operations:

  • isPredicateSubset(subset, superset) - Checks all components (where + orderBy + limit)
  • intersectPredicates(predicates) - Merges predicates with intersection semantics
  • unionPredicates(predicates) - Merges predicates with union semantics

DeduplicatedLoadSubset Class

A production-ready wrapper that automatically deduplicates loadSubset calls:

Features:

  • Smart subset detection - Uses predicate logic to avoid redundant fetches
  • In-flight request sharing - Concurrent identical/subset requests share the same promise
  • Separate tracking - Handles unlimited vs limited queries differently
  • Reset support - Clear state with generation counter to prevent repopulation
  • Auto-bound methods - Safe to pass as callbacks without binding issues

How it works:

  • Tracks all unlimited queries in a combined where predicate via union
  • Tracks limited queries (with orderBy/limit) separately for exact matching
  • Checks incoming requests against tracked state using subset logic
  • Shares in-flight requests when new requests are subsets of pending ones

Examples

// Subset checking
isWhereSubset(
  gt(ref('age'), val(20)),     // age > 20
  gt(ref('age'), val(10))      // age > 10
) // → true (20 > 10, so more restrictive)

// Intersection (AND logic)
intersectWherePredicates([
  gt(ref('age'), val(10)),
  lt(ref('age'), val(50))
]) // → age > 10 AND age < 50

// Union (OR logic)
unionWherePredicates([
  eq(ref('age'), val(5)),
  eq(ref('age'), val(10))
]) // → age IN [5, 10]

// Predicate difference
minusWherePredicates(
  gt(ref('age'), val(10)),     // Requested: age > 10
  gt(ref('age'), val(20))      // Already loaded: age > 20
) // → age > 10 AND age <= 20 (simplified)

// Contradiction detection
intersectWherePredicates([
  eq(ref('age'), val(5)),
  eq(ref('age'), val(6))
]) // → {type: 'val', value: false}

// Automatic deduplication
const dedupe = new DeduplicatedLoadSubset(myLoadSubset)

// First call - fetches data
await dedupe.loadSubset({ where: gt(ref('age'), val(10)) })

// Second call - returns true immediately (subset of first)
await dedupe.loadSubset({ where: gt(ref('age'), val(20)) })

// Reset state when data store is cleared
dedupe.reset()

How It Works

Logical Subset Checking

Uses recursive descent to check logical implications:

// AND handling
(A AND B) ⊆ C  if  (A ⊆ C) OR (B ⊆ C)

// OR handling
(A OR B) ⊆ C  if  (A ⊆ C) AND (B ⊆ C)
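
The two rules can be sketched as a recursive check over a toy expression tree. `ToyExpr` and `implies` are illustrative assumptions, not the PR's `isWhereSubsetInternal`; only `gt` leaves are handled, and the check is conservative (it may return false for a true implication, never the reverse).

```typescript
// Toy expression tree; hypothetical and much simpler than the PR's expression IR.
type ToyExpr =
  | { kind: 'gt'; field: string; value: number }
  | { kind: 'and'; args: ToyExpr[] }
  | { kind: 'or'; args: ToyExpr[] }

// Returns true only when `sub` provably implies `sup`.
function implies(sub: ToyExpr, sup: ToyExpr): boolean {
  // (A OR B) ⊆ C if every disjunct implies C
  if (sub.kind === 'or') return sub.args.every((a) => implies(a, sup))
  // A ⊆ (C AND D) if A implies every conjunct
  if (sup.kind === 'and') return sup.args.every((c) => implies(sub, c))
  // (A AND B) ⊆ C if at least one conjunct implies C
  if (sub.kind === 'and') return sub.args.some((a) => implies(a, sup))
  // A ⊆ (C OR D) if A implies at least one disjunct
  if (sup.kind === 'or') return sup.args.some((c) => implies(sub, c))
  // Leaf: field > x implies field > y when x >= y
  return sub.field === sup.field && sub.value >= sup.value
}
```

Handling each side independently means no special case is needed when both sides are AND (or both OR) clauses.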

Range Simplification

Intersection: Takes most restrictive constraints

  • age > 10 AND age > 20 → age > 20
  • age = 5 AND age = 6 → false literal
  • age IN [1,2] AND age IN [2,3] → age IN [2]

Union: Takes least restrictive constraints

  • age > 10 OR age > 20 → age > 10
  • age = 5 OR age = 10 → age IN [5, 10]

Difference: Simplifies same-field predicates

  • age > 10 MINUS age > 20 → age > 10 AND age <= 20
  • age IN [1,2,3] MINUS age IN [2,4] → age IN [1,3]
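
For lower bounds on a single field, the simplifications above reduce to small numeric operations. The helpers below are a sketch with hypothetical names that work on bare numbers instead of expression nodes.

```typescript
// Hypothetical helpers over predicates of the form `field > bound`.
const intersectGt = (a: number, b: number): number => Math.max(a, b) // most restrictive bound
const unionGt = (a: number, b: number): number => Math.min(a, b) // least restrictive bound

// (field > from) MINUS (field > subtract) is the interval (from, subtract],
// or null when the subtracted range already covers everything requested.
function minusGt(from: number, subtract: number): { gt: number; lte: number } | null {
  return subtract > from ? { gt: from, lte: subtract } : null
}

intersectGt(10, 20) // 20, i.e. age > 20
unionGt(10, 20) // 10, i.e. age > 10
minusGt(10, 20) // { gt: 10, lte: 20 }, i.e. age > 10 AND age <= 20
```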

Value Type Support

✅ Supported:

  • Primitives: strings, numbers, booleans, null, undefined
  • Date objects: equality, ranges, IN clauses (compared by timestamp)

❌ Not Supported:

  • Arrays/objects
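
Comparing Date objects by timestamp matters because `===` compares object identity. A minimal sketch of such an equality check, with a hypothetical helper name:

```typescript
// Hypothetical helper: equality that treats two Dates with the same timestamp as equal.
function valuesEqual(a: unknown, b: unknown): boolean {
  if (a instanceof Date && b instanceof Date) {
    return a.getTime() === b.getTime()
  }
  // Primitives (and null/undefined) fall back to strict equality.
  return a === b
}
```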

Performance Optimizations

For large primitive IN predicates (>10 elements):

  • Smart Set Construction - Builds Sets once and reuses them
  • O(1) Lookups - Uses Set.has() instead of array scans
  • Cached Metadata - Stores areAllPrimitives and primitiveSet on extraction
  • Pre-simplified IN values - Removes duplicates when building primitive sets

| Operation | Without | With | Speedup |
| --- | --- | --- | --- |
| eq = X vs IN [1000 items] | O(1000) scan | O(1) lookup | ~1000x |
| IN [100] ⊆ IN [10000] | O(1M) comparisons | O(10,100) ops | ~100x |
| Intersect 3 IN [5000] clauses | O(75M) comparisons | O(60K) ops | ~1250x |
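
The Set-based approach can be sketched as follows. `isInSubset` is a hypothetical name, and the sketch assumes primitive values only: building the Set costs one pass over the superset and each probe is a constant-time lookup, so 100 values against 10,000 cost about 10,100 operations instead of up to 1M comparisons.

```typescript
// Hypothetical helper: is every value of `subset` contained in `superset`?
function isInSubset<T extends string | number | boolean>(
  subset: T[],
  superset: T[]
): boolean {
  const lookup = new Set<T>(superset) // built once, O(m)
  return subset.every((value) => lookup.has(value)) // O(n) probes
}
```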

Deduplication Architecture

State Tracking:

  • unlimitedWhere - Combined OR of all unlimited predicates
  • limitedCalls[] - Array of all limited queries for exact matching
  • inflightCalls[] - Active requests with their predicates
  • generation - Counter to invalidate stale in-flight handlers after reset

Request Flow:

  1. Check if data already loaded (via isPredicateSubset)
  2. Check if in-flight request covers this (via subset logic)
  3. If not covered, make request and track it
  4. On completion, update tracking state (unless reset was called)
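
The request flow above can be sketched with a toy predicate. Everything here is an assumption made for illustration: `AgeRequest` stands in for a where clause of the form `age > minAge`, and `ToyDedupe` is not the PR's `DeduplicatedLoadSubset`.

```typescript
// Toy predicate meaning "age > minAge"; hypothetical stand-in for a where expression.
type AgeRequest = { minAge: number }
type Inflight = { req: AgeRequest; promise: Promise<void> }

class ToyDedupe {
  private loaded: AgeRequest[] = []
  private inflight: Inflight[] = []
  private generation = 0

  constructor(private readonly load: (req: AgeRequest) => Promise<void>) {}

  // Arrow property, so the method can be passed as a callback without binding.
  loadSubset = (req: AgeRequest): true | Promise<void> => {
    const covers = (other: AgeRequest) => other.minAge <= req.minAge
    // 1. Already loaded?
    if (this.loaded.some(covers)) return true
    // 2. Covered by an in-flight request? Share its promise.
    const pending = this.inflight.find((call) => covers(call.req))
    if (pending) return pending.promise
    // 3. Not covered: make the request and track it.
    const startedIn = this.generation
    const settle = () => {
      this.inflight = this.inflight.filter((call) => call.req !== req)
    }
    const promise = this.load(req).then(
      () => {
        settle()
        // 4. Record coverage unless reset() ran while the request was pending.
        if (startedIn === this.generation) this.loaded.push(req)
      },
      (err) => {
        settle() // a failed request is not recorded, so it can be retried
        throw err
      }
    )
    this.inflight.push({ req, promise })
    return promise
  }

  reset(): void {
    this.loaded = []
    this.inflight = []
    this.generation++
  }
}
```

The generation check is what keeps a request that was started before `reset()` from repopulating the tracking state when it completes afterwards.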

What This Covers

✅ All operators supported by collection index system: eq, gt, gte, lt, lte, in, and, or
✅ Date object support (equality, ranges, IN clauses)
✅ Conflict detection (contradictory equalities, empty IN intersections)
✅ Predicate difference for incremental loading
✅ Production-ready deduplication wrapper
✅ Concurrent request handling with subset matching
✅ State reset with generation counter safety
149 tests covering edge cases, Date handling, performance optimizations, and deduplication

What This Does NOT Cover

  • Range contradiction detection - age > 20 AND age < 10 is preserved as-is (could detect and return false)
  • Property-to-property comparisons - Assumes pattern: field op value
  • Advanced OR simplification - Complex nested OR/AND kept as-is for safety
  • NOT operator - Not supported by collection index system
  • State persistence - DeduplicatedLoadSubset is in-memory only (persistence hooks planned for future)

Why Conservative? Correctness over optimization—false negatives (missed optimizations) are better than false positives (incorrect results).

Files Changed

New Files:

  • packages/db/src/query/predicate-utils.ts (1,544 lines) - 10 exported functions with JSDoc
  • packages/db/src/query/subset-dedupe.ts (244 lines) - DeduplicatedLoadSubset class
  • packages/db/tests/predicate-utils.test.ts (1,342 lines) - 130 tests
  • packages/db/tests/subset-dedupe.test.ts (326 lines) - 19 tests

Modified Files:

  • packages/db/src/query/index.ts - Export new utilities and DeduplicatedLoadSubset class

Usage Example

Basic Predicate Operations

import { isPredicateSubset, intersectPredicates } from '@tanstack/db'

const loadedPredicates: LoadSubsetOptions[] = []

async function onLoadMore(requested: LoadSubsetOptions) {
  // Check if already loaded
  const alreadyLoaded = loadedPredicates.some(loaded =>
    isPredicateSubset(requested, loaded)
  )
  
  if (alreadyLoaded) {
    console.log('Data already loaded, using cache')
    return true
  }
  
  // Fetch from server and track
  await fetchFromServer(requested)
  loadedPredicates.push(requested)
  
  // Compute total coverage
  const totalCoverage = intersectPredicates(loadedPredicates)
  
  // Check for contradictions
  if (totalCoverage.where?.type === 'val' && 
      (totalCoverage.where as any).value === false) {
    console.warn('Contradictory predicates detected!')
  }
}

Automatic Deduplication

import { DeduplicatedLoadSubset } from '@tanstack/db'

// Wrap your loadSubset function
const dedupe = new DeduplicatedLoadSubset(
  async (options: LoadSubsetOptions) => {
    const data = await fetchFromServer(options)
    updateLocalCache(data)
  }
)

// Use in sync config - sync is a function that returns an object
export const myCollection = db.collection({
  name: 'users',
  sync: () => ({
    // Pass the auto-bound method directly
    loadSubset: dedupe.loadSubset,
  })
})

// Concurrent requests automatically deduplicated:
await Promise.all([
  dedupe.loadSubset({ where: gt(ref('age'), val(10)) }), // Fetches
  dedupe.loadSubset({ where: gt(ref('age'), val(20)) }), // Waits for first
  dedupe.loadSubset({ where: gt(ref('age'), val(30)) }), // Waits for first
]) // Only one network request made!

// Clear state when data store is reset
function clearAllData() {
  clearLocalCache()
  dedupe.reset() // Clear deduplication state
}

Computing Incremental Loads

import { minusWherePredicates } from '@tanstack/db'

const alreadyLoaded = gt(ref('age'), val(20)) // age > 20
const requested = gt(ref('age'), val(10))      // age > 10

const stillNeeded = minusWherePredicates(requested, alreadyLoaded)
// Result: age > 10 AND age <= 20

if (stillNeeded.type === 'val' && stillNeeded.value === false) {
  console.log('All requested data already loaded!')
} else {
  await fetchFromServer({ where: stillNeeded })
}
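
The usage examples test for the false literal inline. A small helper could centralize that check; the sketch below assumes only the `{type: 'val', value: false}` shape shown earlier, and the `ref` and `func` node shapes are hypothetical placeholders.

```typescript
// Hypothetical, simplified expression nodes; only the 'val' shape comes from the PR text.
type SketchExpr =
  | { type: 'val'; value: unknown }
  | { type: 'ref'; path: string[] }
  | { type: 'func'; name: string; args: SketchExpr[] }

// True when the expression is the concrete false literal, i.e. an empty result set.
function isFalseLiteral(expr: SketchExpr | undefined): boolean {
  return expr !== undefined && expr.type === 'val' && expr.value === false
}
```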

Testing

149 tests passing:

predicate-utils.test.ts (130 tests):

  • Basic subset comparisons (5 tests)
  • Comparison operators (12 tests)
  • IN operator edge cases (4 tests)
  • AND/OR combinations (9 tests)
  • Date support (12 tests)
  • Conflict detection (6 tests)
  • Range simplifications (14 tests)
  • OrderBy/Limit (11 tests)
  • Complete predicate operations (28 tests)
  • Predicate difference operations (29 tests)

subset-dedupe.test.ts (19 tests):

  • Basic deduplication (10 tests)
  • Concurrent request handling (3 tests)
  • Options mutation protection (2 tests)
  • Failed request retry (1 test)
  • Reset behavior (2 tests)
  • Unbound callback safety (1 test)

Type Safety

  • No null returns where BasicExpression<boolean> is expected
  • Empty sets represented as concrete false literals: {type: 'val', value: false}
  • Proper handling of undefined vs constrained predicates
  • Auto-bound methods for safe callback usage without this binding issues
  • All type assertions validated by TypeScript strict mode

Breaking Changes

None - purely additive functionality.

changeset-bot bot commented Oct 11, 2025

🦋 Changeset detected

Latest commit: 4f8154e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 12 packages
Name Type
@tanstack/db Patch
@tanstack/angular-db Patch
@tanstack/electric-db-collection Patch
@tanstack/query-db-collection Patch
@tanstack/react-db Patch
@tanstack/rxdb-db-collection Patch
@tanstack/solid-db Patch
@tanstack/svelte-db Patch
@tanstack/trailbase-db-collection Patch
@tanstack/vue-db Patch
todos Patch
@tanstack/db-example-react-todo Patch

pkg-pr-new bot commented Oct 11, 2025

@tanstack/angular-db

npm i https://pkg.pr.new/@tanstack/angular-db@668

@tanstack/db

npm i https://pkg.pr.new/@tanstack/db@668

@tanstack/db-ivm

npm i https://pkg.pr.new/@tanstack/db-ivm@668

@tanstack/electric-db-collection

npm i https://pkg.pr.new/@tanstack/electric-db-collection@668

@tanstack/query-db-collection

npm i https://pkg.pr.new/@tanstack/query-db-collection@668

@tanstack/react-db

npm i https://pkg.pr.new/@tanstack/react-db@668

@tanstack/rxdb-db-collection

npm i https://pkg.pr.new/@tanstack/rxdb-db-collection@668

@tanstack/solid-db

npm i https://pkg.pr.new/@tanstack/solid-db@668

@tanstack/svelte-db

npm i https://pkg.pr.new/@tanstack/svelte-db@668

@tanstack/trailbase-db-collection

npm i https://pkg.pr.new/@tanstack/trailbase-db-collection@668

@tanstack/vue-db

npm i https://pkg.pr.new/@tanstack/vue-db@668

commit: 4f8154e

Contributor

github-actions bot commented Oct 11, 2025

Size Change: +4.97 kB (+5.95%) 🔍

Total Size: 88.6 kB

Filename Size Change
./packages/db/dist/esm/index.js 1.74 kB +92 B (+5.59%) 🔍
./packages/db/dist/esm/query/predicate-utils.js 3.83 kB +3.83 kB (new file) 🆕
./packages/db/dist/esm/query/subset-dedupe.js 1.06 kB +1.06 kB (new file) 🆕
ℹ️ View Unchanged
Filename Size
./packages/db/dist/esm/collection/change-events.js 963 B
./packages/db/dist/esm/collection/changes.js 1.01 kB
./packages/db/dist/esm/collection/events.js 413 B
./packages/db/dist/esm/collection/index.js 3.23 kB
./packages/db/dist/esm/collection/indexes.js 1.16 kB
./packages/db/dist/esm/collection/lifecycle.js 1.8 kB
./packages/db/dist/esm/collection/mutations.js 2.52 kB
./packages/db/dist/esm/collection/state.js 3.79 kB
./packages/db/dist/esm/collection/subscription.js 2.2 kB
./packages/db/dist/esm/collection/sync.js 2.2 kB
./packages/db/dist/esm/deferred.js 230 B
./packages/db/dist/esm/errors.js 3.57 kB
./packages/db/dist/esm/event-emitter.js 798 B
./packages/db/dist/esm/indexes/auto-index.js 794 B
./packages/db/dist/esm/indexes/base-index.js 835 B
./packages/db/dist/esm/indexes/btree-index.js 2 kB
./packages/db/dist/esm/indexes/lazy-index.js 1.21 kB
./packages/db/dist/esm/indexes/reverse-index.js 577 B
./packages/db/dist/esm/local-only.js 967 B
./packages/db/dist/esm/local-storage.js 2.33 kB
./packages/db/dist/esm/optimistic-action.js 294 B
./packages/db/dist/esm/proxy.js 3.86 kB
./packages/db/dist/esm/query/builder/functions.js 615 B
./packages/db/dist/esm/query/builder/index.js 4.04 kB
./packages/db/dist/esm/query/builder/ref-proxy.js 938 B
./packages/db/dist/esm/query/compiler/evaluators.js 1.55 kB
./packages/db/dist/esm/query/compiler/expressions.js 760 B
./packages/db/dist/esm/query/compiler/group-by.js 2.04 kB
./packages/db/dist/esm/query/compiler/index.js 2.21 kB
./packages/db/dist/esm/query/compiler/joins.js 2.65 kB
./packages/db/dist/esm/query/compiler/order-by.js 1.43 kB
./packages/db/dist/esm/query/compiler/select.js 1.28 kB
./packages/db/dist/esm/query/ir.js 785 B
./packages/db/dist/esm/query/live-query-collection.js 404 B
./packages/db/dist/esm/query/live/collection-config-builder.js 5.49 kB
./packages/db/dist/esm/query/live/collection-registry.js 233 B
./packages/db/dist/esm/query/live/collection-subscriber.js 2.11 kB
./packages/db/dist/esm/query/optimizer.js 3.26 kB
./packages/db/dist/esm/scheduler.js 1.29 kB
./packages/db/dist/esm/SortedMap.js 1.24 kB
./packages/db/dist/esm/transactions.js 3.05 kB
./packages/db/dist/esm/utils.js 1.01 kB
./packages/db/dist/esm/utils/browser-polyfills.js 365 B
./packages/db/dist/esm/utils/btree.js 6.01 kB
./packages/db/dist/esm/utils/comparison.js 754 B
./packages/db/dist/esm/utils/index-optimization.js 1.73 kB

Contributor

github-actions bot commented Oct 11, 2025

Size Change: 0 B

Total Size: 2.36 kB

ℹ️ View Unchanged
Filename Size
./packages/react-db/dist/esm/index.js 168 B
./packages/react-db/dist/esm/useLiveInfiniteQuery.js 885 B
./packages/react-db/dist/esm/useLiveQuery.js 1.31 kB

Collaborator Author

samwillis commented Oct 13, 2025

Something I considered while implementing this was to normalise the predicates into a DNF form, but it would potentially explode the size of the predicates, and in a real world situation the predicates are simple and repetitive in structure so I think this implementation makes the right tradeoff.

Contributor

@kevin-dp kevin-dp left a comment

I had a detailed read through the code in predicate-utils.ts. My main concern is about the semantics that we get from the way we merge the predicates.

// For (A AND B) ⊆ (C AND D), we need every conjunct in superset to be implied by subset
// For each conjunct in superset, at least one conjunct in subset must be a subset of it
// OR the entire subset implies it
return superset.args.every((superArg) => {
Contributor

Why do we need to handle the case where both the subset and superset are AND clauses explicitly?
I would expect to handle the case where subset is an AND by doing:

return subset.args.some((subArg) => isWhereSubsetInternal(subArg, superset))

And then there would be a separate (distinct) case that checks if superset is an AND clause, like you already have on L80. That should work just fine because when you have an AND clause in both subset and superset, it will first be handled by the case for the subset, which splits the AND clause into N calls to isWhereSubsetInternal (1 call per conjunct of the subset). Then in each call the subset is no longer an AND clause but the superset is, so that is handled by the case on L80.

Contributor

So in this case we only need L72-74

Contributor

Note how you don't special-case the case where both subset and superset are OR clauses, which confirms that you should not have to do that for ANDs either.

}

// Handle eq vs in
if (subsetFunc.name === `eq` && supersetFunc.name === `in`) {
Contributor

We don't handle the case where subset is IN (with no elements or a single element) and superset is EQ. Those are corner cases but could also occur e.g. age IN [ 18 ] is a subset of age = 18 because they are actually the same.

Also, we don't handle subset that is IN/EQ vs superset that is a range (i.e. comparison operators, e.g. >= 18). Should be fairly straightforward to determine that age = 18 is a subset of age >= 18 and age IN [ 18, 19, 20] is a subset of age >= 18. For IN you just need to recursively check each element of the array against the superset.

Contributor

Again, I don't think we should explicitly handle cases where subset is EQ and superset is IN, and cases where both subset and superset use IN. Those should be treated individually. So, if subset is an IN then we should just recursively call isWhereSubsetInternal over the elements of the array:

inElements.every(elem => isWhereSubsetInternal(elem, superset))

Again, that will lead to 1 call per element that is part of the IN clause. In fact, a subset where a IN [ x, y , z] is equivalent to a = x OR a = y OR a = z so perhaps we should not handle IN specially at all and just have 1 case that handles both OR and IN (in the subset). Similarly, for IN in the superset (treat it like OR in superset).
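
The equivalence between IN and a chain of OR'd equalities can be sketched like this. The node types and function names are hypothetical, and the superset is fixed to a `field >= bound` range to keep the example short.

```typescript
// Hypothetical, simplified nodes for illustration only.
type EqNode = { kind: 'eq'; field: string; value: number }
type InNode = { kind: 'in'; field: string; values: number[] }
type OrNode = { kind: 'or'; args: EqNode[] }

// a IN [x, y, z]  is equivalent to  a = x OR a = y OR a = z
function expandIn(expr: InNode): OrNode {
  return {
    kind: 'or',
    args: expr.values.map((value): EqNode => ({ kind: 'eq', field: expr.field, value })),
  }
}

// An IN clause implies `field >= bound` when every expanded equality does.
function inImpliesGte(expr: InNode, field: string, bound: number): boolean {
  return expandIn(expr).args.every((eq) => eq.field === field && eq.value >= bound)
}
```

With this, age IN [18, 19, 20] is recognized as a subset of age >= 18 without a dedicated IN-vs-range case.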

superset: number | undefined
): boolean {
// No limit requirement is always satisfied
if (subset === undefined) {
Contributor

If the subset has no limit but the superset has a limit, then it should return false iiuc but this returns true. I think this function should be:

return superset === undefined || subset <= superset
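
Because both arguments are typed `number | undefined`, the one-line comparison above would not type-check under strict mode as written. A sketch of the same rule with the undefined cases spelled out (the function name is hypothetical):

```typescript
// Sketch of the rule proposed in this comment; not necessarily the code that was merged.
function isLimitSubsetSketch(
  subset: number | undefined,
  superset: number | undefined
): boolean {
  // An unlimited superset covers any subset limit.
  if (superset === undefined) return true
  // An unlimited subset cannot be covered by a limited superset.
  if (subset === undefined) return false
  return subset <= superset
}
```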

? undefined // All unlimited = result unlimited
: Math.min(...limits) // Take most restrictive

return {
Contributor

I'm not sure about these semantics, whether this is really what we expect. Let me explain with an example. Imagine 2 queries:

-- Query 1
WHERE age >= 18 LIMIT 1
-- Query 2
WHERE age >= 20 LIMIT 3

Imagine we have 3 users: Alice aged 18, Bob aged 19, Charlie aged 20. Based on this data, query 1 should return 1 user: Alice. Query 2 should also return 1 user: Charlie.
So:

Query 1: { Alice }
Query 2: { Charlie }
Intersection of Query 1 and Query 2: { } (empty)

But if we look at intersectPredicates it will generate this predicate:

WHERE age >= 20 LIMIT 1

It takes age >= 20 because that's the most restrictive clause and it takes limit 1 because that's also the most restrictive. Based on the data from our example, this intersected query will return 1 user: Charlie. So the result of the intersected query is different from the intersection of the results. I don't think we want those semantics.
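
The mismatch can be reproduced with a few lines. The sketch assumes rows are ordered by age ascending before the limit is applied (the queries above do not state an ORDER BY), and `runQuery` is a hypothetical in-memory stand-in for the server.

```typescript
type User = { name: string; age: number }

const users: User[] = [
  { name: 'Alice', age: 18 },
  { name: 'Bob', age: 19 },
  { name: 'Charlie', age: 20 },
]

// Hypothetical query runner: WHERE age >= minAge ORDER BY age ASC LIMIT limit.
function runQuery(minAge: number, limit: number): string[] {
  return users
    .filter((u) => u.age >= minAge)
    .sort((a, b) => a.age - b.age)
    .slice(0, limit)
    .map((u) => u.name)
}

const query1 = runQuery(18, 1) // ['Alice']
const query2 = runQuery(20, 3) // ['Charlie']

// Intersection of the two result sets: empty.
const intersectionOfResults = query1.filter((n) => query2.indexOf(n) !== -1)

// Merged predicate: most restrictive where clause and most restrictive limit.
const intersectedQuery = runQuery(Math.max(18, 20), Math.min(1, 3)) // ['Charlie']
```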

Contributor

@kevin-dp kevin-dp left a comment

Discussed #668 (comment) with @samwillis. It's a problem for both intersected predicates and unioned predicates. The current approach of merging the predicates by combining the where clauses and picking a limit doesn't work as explained in that comment. We will need to find another way to determine whether or not we have previously loaded all the data that is needed to fulfill the query at hand.

@samwillis samwillis marked this pull request as draft October 13, 2025 14:02
@samwillis
Collaborator Author

We need to track the disjoint loaded sets by the upper and lower bound in columns (exclusive of upper bound when loaded with a limit), then use the indexes to check that within those bounds we have enough rows to answer the query without going to the server.

Marking this as draft as it needs rework. The where expression utils are good, anything using a limit needs changing.

@samwillis samwillis force-pushed the samwillis/predicate-utils branch from d24585d to cbc5baf Compare October 15, 2025 17:24
@samwillis samwillis changed the base branch from main to samwillis/load-more-tracking October 15, 2025 17:30
@samwillis samwillis marked this pull request as ready for review October 15, 2025 17:30
Base automatically changed from samwillis/load-more-tracking to main October 15, 2025 17:49
@samwillis samwillis force-pushed the samwillis/predicate-utils branch from cbc5baf to 73525a6 Compare October 15, 2025 18:10
@kevin-dp kevin-dp self-requested a review October 16, 2025 07:37
@kevin-dp kevin-dp force-pushed the samwillis/predicate-utils branch from f7ca08d to 4f8154e Compare October 16, 2025 13:51
* subset logic to predicates.
*
* @example
* const dedupe = new DeduplicatedLoadSubset(myLoadSubset)
Contributor

It does feel a bit inconsistent that we have to instantiate a class whereas loadSubset would just be a function.
If we want we can create a function that hides this:

function dedup(loadSubset) {
  return new DeduplicatedLoadSubset(loadSubset)
}
