Predicate Comparison and Merging Utilities for Predicate Push-Down #668

samwillis · 2025-10-11T19:18:44Z

stacked on #669

Summary

Implements utilities for comparing and merging predicates (where clauses, orderBy, and limit) to support predicate push-down in collection sync operations. Provides a complete solution for tracking loaded data and preventing redundant server requests.

Key Features:

✅ Logical subset checking for where clauses (AND, OR, comparisons, IN)
✅ Smart predicate merging with intersection (AND) and union (OR) semantics
✅ Predicate difference - Compute A AND NOT(B) with simplification
✅ Automatic deduplication wrapper - Eliminates redundant data fetches
✅ Complete Date object support (equality, ranges, IN clauses)
✅ Contradiction detection (returns false literal for impossible predicates)
✅ Performance optimized for large primitive IN predicates (100-1250x speedup via Set-based lookups)
✅ 149 comprehensive tests with extensive coverage

Motivation

The onLoadMore callback needs to:

Check if data is already loaded - Determine if a new predicate is covered by previously loaded predicates
Track total coverage - Merge multiple load operations to understand complete data coverage
Prevent redundant fetches - Automatically deduplicate concurrent and sequential requests

Without these utilities, the sync layer cannot efficiently track loaded data ranges or prevent duplicate network requests.

What This Implements

Core Predicate Functions

Where Clause Operations:

isWhereSubset(subset, superset) - Checks if one where clause logically implies another
intersectWherePredicates(predicates) - Combines predicates with AND logic (most restrictive)
unionWherePredicates(predicates) - Combines predicates with OR logic (least restrictive)
minusWherePredicates(from, subtract) - Computes from AND NOT(subtract) with simplification

OrderBy & Limit Operations:

isOrderBySubset(subset, superset) - Validates ordering requirements via prefix matching
isLimitSubset(subset, superset) - Compares limit constraints
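
To make the prefix-matching idea concrete, here is a minimal sketch of how the orderBy and limit checks could look. The type `OrderByClause` and the functions `isOrderByPrefix` and `isLimitCovered` are hypothetical names for illustration, not the exported API, and the rule that an unlimited superset covers any limit is an assumption.

```typescript
// Hypothetical clause shape; the real orderBy IR in the library may differ.
type OrderByClause = { field: string; direction: "asc" | "desc" }

// The requested ordering is satisfied when it is a prefix of the loaded ordering.
function isOrderByPrefix(
  subset: OrderByClause[] | undefined,
  superset: OrderByClause[] | undefined,
): boolean {
  if (!subset || subset.length === 0) return true // no ordering requirement
  if (!superset || subset.length > superset.length) return false
  return subset.every((clause, i) => {
    const other = superset[i]
    return (
      other !== undefined &&
      clause.field === other.field &&
      clause.direction === other.direction
    )
  })
}

// Assumption: an unlimited superset covers any limit; a limited one only covers smaller limits.
function isLimitCovered(
  subset: number | undefined,
  superset: number | undefined,
): boolean {
  if (superset === undefined) return true
  if (subset === undefined) return false
  return subset <= superset
}
```

Under these assumptions, a request ordered by age is covered by data loaded with an (age, name) ordering, but not the other way round.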

Complete Predicate Operations:

isPredicateSubset(subset, superset) - Checks all components (where + orderBy + limit)
intersectPredicates(predicates) - Merges predicates with intersection semantics
unionPredicates(predicates) - Merges predicates with union semantics

DeduplicatedLoadSubset Class

A production-ready wrapper that automatically deduplicates loadSubset calls:

Features:

Smart subset detection - Uses predicate logic to avoid redundant fetches
In-flight request sharing - Concurrent identical/subset requests share the same promise
Separate tracking - Handles unlimited vs limited queries differently
Reset support - Clears state and bumps a generation counter so requests still in flight at reset time cannot repopulate it
Auto-bound methods - Safe to pass as callbacks without binding issues

How it works:

Tracks all unlimited queries in a combined where predicate via union
Tracks limited queries (with orderBy/limit) separately for exact matching
Checks incoming requests against tracked state using subset logic
Shares in-flight requests when new requests are subsets of pending ones

Examples

// Subset checking
isWhereSubset(
  gt(ref('age'), val(20)),     // age > 20
  gt(ref('age'), val(10))      // age > 10
) // → true (20 > 10, so more restrictive)

// Intersection (AND logic)
intersectWherePredicates([
  gt(ref('age'), val(10)),
  lt(ref('age'), val(50))
]) // → age > 10 AND age < 50

// Union (OR logic)
unionWherePredicates([
  eq(ref('age'), val(5)),
  eq(ref('age'), val(10))
]) // → age IN [5, 10]

// Predicate difference
minusWherePredicates(
  gt(ref('age'), val(10)),     // Requested: age > 10
  gt(ref('age'), val(20))      // Already loaded: age > 20
) // → age > 10 AND age <= 20 (simplified)

// Contradiction detection
intersectWherePredicates([
  eq(ref('age'), val(5)),
  eq(ref('age'), val(6))
]) // → {type: 'val', value: false}

// Automatic deduplication
const dedupe = new DeduplicatedLoadSubset(myLoadSubset)

// First call - fetches data
await dedupe.loadSubset({ where: gt(ref('age'), val(10)) })

// Second call - returns true immediately (subset of first)
await dedupe.loadSubset({ where: gt(ref('age'), val(20)) })

// Reset state when data store is cleared
dedupe.reset()

How It Works

Logical Subset Checking

Uses recursive descent to check logical implications:

// AND handling
(A AND B) ⊆ C  if  (A ⊆ C) OR (B ⊆ C)

// OR handling
(A OR B) ⊆ C  if  (A ⊆ C) AND (B ⊆ C)
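
The two rules above can be sketched as a small recursive function. This is an illustration on a hypothetical mini expression tree (`Expr`, `isSubset` and `ageGt` are made-up names), not the code in predicate-utils.ts. Both rules are sufficient conditions only, so a false result means "could not prove it", which matches the conservative stance of this PR.

```typescript
// Hypothetical mini expression tree; the real IR in @tanstack/db differs.
type Expr =
  | { kind: "gt"; field: string; value: number }
  | { kind: "and"; args: Expr[] }
  | { kind: "or"; args: Expr[] }

// Returns true only when `sub` provably implies `sup`.
function isSubset(sub: Expr, sup: Expr): boolean {
  // (A OR B) ⊆ C needs every branch of the OR to be inside C.
  if (sub.kind === "or") return sub.args.every((a) => isSubset(a, sup))
  // A ⊆ (B AND C) needs A to be inside every conjunct.
  if (sup.kind === "and") return sup.args.every((s) => isSubset(sub, s))
  // (A AND B) ⊆ C holds if any single conjunct is already inside C.
  if (sub.kind === "and") return sub.args.some((a) => isSubset(a, sup))
  // A ⊆ (B OR C) holds if A is inside any one branch.
  if (sup.kind === "or") return sup.args.some((s) => isSubset(sub, s))
  // Leaf case: x > a implies x > b when a >= b on the same field.
  return sub.field === sup.field && sub.value >= sup.value
}

// Small helper for building "age > v" leaves in examples.
const ageGt = (value: number): Expr => ({ kind: "gt", field: "age", value })
```

For example, `isSubset(ageGt(20), ageGt(10))` holds, while an OR of `ageGt(20)` and `ageGt(5)` is not a subset of `ageGt(10)` because the second branch escapes it.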

Range Simplification

Intersection: Takes most restrictive constraints

age > 10 AND age > 20 → age > 20
age = 5 AND age = 6 → false literal
age IN [1,2] AND age IN [2,3] → age IN [2]

Union: Takes least restrictive constraints

age > 10 OR age > 20 → age > 10
age = 5 OR age = 10 → age IN [5, 10]

Difference: Simplifies same-field predicates

age > 10 MINUS age > 20 → age > 10 AND age <= 20
age IN [1,2,3] MINUS age IN [2,4] → age IN [1,3]
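
The simplification rules listed above can be reproduced on a toy model of a single numeric field. The sketch below is hypothetical (the names `intersectGt`, `unionGt`, `intersectIn`, `minusIn`, `minusGt` and the `NEVER` marker are invented for illustration) and only mirrors the examples in this section, not the actual implementation.

```typescript
// Stand-in for the false literal returned on contradictions (hypothetical shape).
const NEVER = { kind: "false" } as const
type InList = { kind: "in"; values: number[] }
type Result = InList | typeof NEVER

// An empty IN list can never match, so it collapses to NEVER.
function inList(values: number[]): Result {
  return values.length > 0 ? { kind: "in", values } : NEVER
}

// age > a AND age > b keeps the larger (more restrictive) bound.
function intersectGt(a: number, b: number): number {
  return Math.max(a, b)
}

// age > a OR age > b keeps the smaller (less restrictive) bound.
function unionGt(a: number, b: number): number {
  return Math.min(a, b)
}

// age IN a AND age IN b keeps the values present in both lists.
function intersectIn(a: number[], b: number[]): Result {
  const other = new Set(b)
  return inList(a.filter((v) => other.has(v)))
}

// age IN from MINUS age IN subtract keeps the values not yet covered.
function minusIn(from: number[], subtract: number[]): Result {
  const removed = new Set(subtract)
  return inList(from.filter((v) => !removed.has(v)))
}

// (age > from) MINUS (age > loaded) leaves from < age <= loaded, or nothing at all.
function minusGt(from: number, loaded: number): { gt: number; lte: number } | typeof NEVER {
  return from < loaded ? { gt: from, lte: loaded } : NEVER
}
```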

Value Type Support

✅ Supported:

Primitives: strings, numbers, booleans, null, undefined
Date objects: equality, ranges, IN clauses (compared by timestamp)

❌ Not Supported:

Arrays/objects
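
As a rough illustration of why Date values need special handling: two Date objects for the same instant are not equal under `===`, so comparing by timestamp is needed. The helper names below (`valuesEqual`, `compareValues`) are hypothetical and not taken from the PR.

```typescript
type Comparable = string | number | boolean | null | undefined | Date

// Dates are compared by their millisecond timestamp, everything else by identity.
function valuesEqual(a: Comparable, b: Comparable): boolean {
  if (a instanceof Date && b instanceof Date) {
    return a.getTime() === b.getTime()
  }
  return a === b
}

// Ordering for range checks: negative, zero, or positive like a sort comparator.
function compareValues(a: number | Date, b: number | Date): number {
  const left = a instanceof Date ? a.getTime() : a
  const right = b instanceof Date ? b.getTime() : b
  return left - right
}
```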

Performance Optimizations

For large primitive IN predicates (>10 elements):

Smart Set Construction - Builds Sets once and reuses them
O(1) Lookups - Uses Set.has() instead of array scans
Cached Metadata - Stores areAllPrimitives and primitiveSet on extraction
Pre-simplified IN values - Removes duplicates when building primitive sets

Operation	Without optimization	With optimization	Speedup
`eq = X` vs `IN [1000 items]`	O(1000) scan	O(1) lookup	~1000x
`IN [100]` ⊆ `IN [10000]`	O(1M) comparisons	O(10,100) ops	~100x
Intersect 3 `IN [5000]` clauses	O(75M) comparisons	O(60K) ops	~1250x
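
A minimal sketch of the Set-based lookup idea, for illustration only. `isInSubset` and `SET_THRESHOLD` are hypothetical names. The cutoff of 10 is taken from the ">10 elements" note above and is applied here to the superset, which is an assumption; the actual implementation caches the Set on extraction rather than rebuilding it per call.

```typescript
type Primitive = string | number | boolean | null | undefined

// Assumed cutoff below which a plain array scan is used instead of a Set.
const SET_THRESHOLD = 10

// Checks whether every value of `subset` also appears in `superset`.
function isInSubset(
  subset: readonly Primitive[],
  superset: readonly Primitive[],
): boolean {
  if (superset.length <= SET_THRESHOLD) {
    // Small lists: a linear scan per value is cheap enough.
    return subset.every((v) => superset.includes(v))
  }
  // Large lists: build the Set once, then each membership test is O(1).
  const lookup = new Set(superset)
  return subset.every((v) => lookup.has(v))
}
```

With the Set path the cost is roughly one pass to build the Set plus one lookup per subset value, which is where the O(10,100) figure in the table comes from for `IN [100]` against `IN [10000]`.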

Deduplication Architecture

State Tracking:

unlimitedWhere - Combined OR of all unlimited predicates
limitedCalls[] - Array of all limited queries for exact matching
inflightCalls[] - Active requests with their predicates
generation - Counter to invalidate stale in-flight handlers after reset

Request Flow:

Check if data already loaded (via isPredicateSubset)
Check if in-flight request covers this (via subset logic)
If not covered, make request and track it
On completion, update tracking state (unless reset was called)
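
The request flow can be sketched with a heavily simplified model in which every query is "age > min", so a single number stands in for a whole predicate. `DedupeSketch` and its members are hypothetical and much smaller than the real DeduplicatedLoadSubset; the sketch only shows how the loaded-state check, in-flight sharing and the generation counter fit together.

```typescript
// Hypothetical simplification: a query "age > min" is represented by `min` alone.
type InFlight = { min: number; promise: Promise<void> }

class DedupeSketch {
  private readonly load: (min: number) => Promise<void>
  private loadedMin: number | undefined = undefined // widest range loaded so far
  private inflight: InFlight[] = []
  private generation = 0

  constructor(load: (min: number) => Promise<void>) {
    this.load = load
  }

  // Arrow-function property, so it can be passed as a callback without losing `this`.
  loadSubset = (min: number): true | Promise<void> => {
    // 1. Already loaded: "age > min" is a subset of what we have.
    if (this.loadedMin !== undefined && min >= this.loadedMin) return true
    // 2. Covered by a pending request: share its promise.
    const covering = this.inflight.find((call) => min >= call.min)
    if (covering) return covering.promise
    // 3. Not covered: start a request and track it.
    const startedIn = this.generation
    const call: InFlight = { min, promise: Promise.resolve() }
    const settle = (ok: boolean): void => {
      this.inflight = this.inflight.filter((c) => c !== call)
      // 4. Record coverage only on success and only if no reset happened meanwhile.
      if (ok && startedIn === this.generation) {
        this.loadedMin =
          this.loadedMin === undefined ? min : Math.min(this.loadedMin, min)
      }
    }
    call.promise = this.load(min).then(
      () => settle(true),
      (err) => {
        settle(false)
        throw err
      },
    )
    this.inflight.push(call)
    return call.promise
  }

  reset = (): void => {
    this.generation++ // invalidates completion handlers of older requests
    this.loadedMin = undefined
    this.inflight = []
  }
}
```

A failed request is removed from the in-flight list without being recorded as loaded, so a later call for the same range triggers a fresh fetch.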

What This Covers

✅ All operators supported by collection index system: eq, gt, gte, lt, lte, in, and, or
✅ Date object support (equality, ranges, IN clauses)
✅ Conflict detection (contradictory equalities, empty IN intersections)
✅ Predicate difference for incremental loading
✅ Production-ready deduplication wrapper
✅ Concurrent request handling with subset matching
✅ State reset with generation counter safety
✅ 149 tests covering edge cases, Date handling, performance optimizations, and deduplication

What This Does NOT Cover

❌ Range contradiction detection - age > 20 AND age < 10 is preserved as-is (could detect and return false)
❌ Property-to-property comparisons - Assumes pattern: field op value
❌ Advanced OR simplification - Complex nested OR/AND kept as-is for safety
❌ NOT operator - Not supported by collection index system
❌ State persistence - DeduplicatedLoadSubset is in-memory only (persistence hooks planned for future)

Why Conservative? Correctness over optimization—false negatives (missed optimizations) are better than false positives (incorrect results).

Files Changed

New Files:

packages/db/src/query/predicate-utils.ts (1,544 lines) - 10 exported functions with JSDoc
packages/db/src/query/subset-dedupe.ts (244 lines) - DeduplicatedLoadSubset class
packages/db/tests/predicate-utils.test.ts (1,342 lines) - 130 tests
packages/db/tests/subset-dedupe.test.ts (326 lines) - 19 tests

Modified Files:

packages/db/src/query/index.ts - Export new utilities and DeduplicatedLoadSubset class

Usage Example

Basic Predicate Operations

import { isPredicateSubset, intersectPredicates } from '@tanstack/db'

const loadedPredicates: LoadSubsetOptions[] = []

async function onLoadMore(requested: LoadSubsetOptions) {
  // Check if already loaded
  const alreadyLoaded = loadedPredicates.some(loaded =>
    isPredicateSubset(requested, loaded)
  )
  
  if (alreadyLoaded) {
    console.log('Data already loaded, using cache')
    return true
  }
  
  // Fetch from server and track
  await fetchFromServer(requested)
  loadedPredicates.push(requested)
  
  // Intersect the loaded predicates (the constraints common to every load)
  const totalCoverage = intersectPredicates(loadedPredicates)
  
  // Check for contradictions
  if (totalCoverage.where?.type === 'val' && 
      (totalCoverage.where as any).value === false) {
    console.warn('Contradictory predicates detected!')
  }
}

Automatic Deduplication

import { DeduplicatedLoadSubset } from '@tanstack/db'

// Wrap your loadSubset function
const dedupe = new DeduplicatedLoadSubset(
  async (options: LoadSubsetOptions) => {
    const data = await fetchFromServer(options)
    updateLocalCache(data)
  }
)

// Use in sync config - sync is a function that returns an object
export const myCollection = db.collection({
  name: 'users',
  sync: () => ({
    // Pass the auto-bound method directly
    loadSubset: dedupe.loadSubset,
  })
})

// Concurrent requests automatically deduplicated:
await Promise.all([
  dedupe.loadSubset({ where: gt(ref('age'), val(10)) }), // Fetches
  dedupe.loadSubset({ where: gt(ref('age'), val(20)) }), // Waits for first
  dedupe.loadSubset({ where: gt(ref('age'), val(30)) }), // Waits for first
]) // Only one network request made!

// Clear state when data store is reset
function clearAllData() {
  clearLocalCache()
  dedupe.reset() // Clear deduplication state
}

Computing Incremental Loads

import { minusWherePredicates } from '@tanstack/db'

const alreadyLoaded = gt(ref('age'), val(20)) // age > 20
const requested = gt(ref('age'), val(10))      // age > 10

const stillNeeded = minusWherePredicates(requested, alreadyLoaded)
// Result: age > 10 AND age <= 20

if (stillNeeded.type === 'val' && stillNeeded.value === false) {
  console.log('All requested data already loaded!')
} else {
  await fetchFromServer({ where: stillNeeded })
}

Testing

149 tests passing:

predicate-utils.test.ts (130 tests):

Basic subset comparisons (5 tests)
Comparison operators (12 tests)
IN operator edge cases (4 tests)
AND/OR combinations (9 tests)
Date support (12 tests)
Conflict detection (6 tests)
Range simplifications (14 tests)
OrderBy/Limit (11 tests)
Complete predicate operations (28 tests)
Predicate difference operations (29 tests)

subset-dedupe.test.ts (19 tests):

Basic deduplication (10 tests)
Concurrent request handling (3 tests)
Options mutation protection (2 tests)
Failed request retry (1 test)
Reset behavior (2 tests)
Unbound callback safety (1 test)

Type Safety

No null returns where BasicExpression<boolean> is expected
Empty sets represented as concrete false literals: {type: 'val', value: false}
Proper handling of undefined vs constrained predicates
Auto-bound methods for safe callback usage without this binding issues
All type assertions validated by TypeScript strict mode
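
Since empty results are always a concrete false literal, callers can test for it with a small guard instead of casting to any. The sketch below uses trimmed-down, hypothetical expression shapes (`ValExpr`, `FuncExpr`, `RefExpr`) and a made-up helper `isFalseLiteral`; the library's BasicExpression type may have different or additional variants.

```typescript
// Hypothetical, trimmed-down expression shapes for illustration.
type ValExpr = { type: "val"; value: unknown }
type FuncExpr = { type: "func"; name: string; args: Expr[] }
type RefExpr = { type: "ref"; path: string[] }
type Expr = ValExpr | FuncExpr | RefExpr

// The literal used to represent "matches nothing".
const FALSE_LITERAL: ValExpr = { type: "val", value: false }

// True only for a value node holding exactly `false`; undefined means "no constraint".
function isFalseLiteral(expr: Expr | undefined): boolean {
  return expr !== undefined && expr.type === "val" && expr.value === false
}
```

With a guard like this, a check such as the one in the usage examples can be written as `isFalseLiteral(stillNeeded)` without a type assertion.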

Breaking Changes

None - purely additive functionality.

changeset-bot · 2025-10-11T19:18:48Z

🦋 Changeset detected

Latest commit: 4f8154e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 12 packages

Name	Type
@tanstack/db	Patch
@tanstack/angular-db	Patch
@tanstack/electric-db-collection	Patch
@tanstack/query-db-collection	Patch
@tanstack/react-db	Patch
@tanstack/rxdb-db-collection	Patch
@tanstack/solid-db	Patch
@tanstack/svelte-db	Patch
@tanstack/trailbase-db-collection	Patch
@tanstack/vue-db	Patch
todos	Patch
@tanstack/db-example-react-todo	Patch

pkg-pr-new · 2025-10-11T19:20:16Z

@tanstack/angular-db

npm i https://pkg.pr.new/@tanstack/angular-db@668

@tanstack/db

npm i https://pkg.pr.new/@tanstack/db@668

@tanstack/db-ivm

npm i https://pkg.pr.new/@tanstack/db-ivm@668

@tanstack/electric-db-collection

npm i https://pkg.pr.new/@tanstack/electric-db-collection@668

@tanstack/query-db-collection

npm i https://pkg.pr.new/@tanstack/query-db-collection@668

@tanstack/react-db

npm i https://pkg.pr.new/@tanstack/react-db@668

@tanstack/rxdb-db-collection

npm i https://pkg.pr.new/@tanstack/rxdb-db-collection@668

@tanstack/solid-db

npm i https://pkg.pr.new/@tanstack/solid-db@668

@tanstack/svelte-db

npm i https://pkg.pr.new/@tanstack/svelte-db@668

@tanstack/trailbase-db-collection

npm i https://pkg.pr.new/@tanstack/trailbase-db-collection@668

@tanstack/vue-db

npm i https://pkg.pr.new/@tanstack/vue-db@668

commit: 4f8154e

github-actions · 2025-10-11T19:21:45Z

Size Change: +4.97 kB (+5.95%) 🔍

Total Size: 88.6 kB

Filename	Size	Change
`./packages/db/dist/esm/index.js`	1.74 kB	+92 B (+5.59%)	🔍
`./packages/db/dist/esm/query/predicate-utils.js`	3.83 kB	+3.83 kB (new file)	🆕
`./packages/db/dist/esm/query/subset-dedupe.js`	1.06 kB	+1.06 kB (new file)	🆕

ℹ️ View Unchanged

Filename	Size
`./packages/db/dist/esm/collection/change-events.js`	963 B
`./packages/db/dist/esm/collection/changes.js`	1.01 kB
`./packages/db/dist/esm/collection/events.js`	413 B
`./packages/db/dist/esm/collection/index.js`	3.23 kB
`./packages/db/dist/esm/collection/indexes.js`	1.16 kB
`./packages/db/dist/esm/collection/lifecycle.js`	1.8 kB
`./packages/db/dist/esm/collection/mutations.js`	2.52 kB
`./packages/db/dist/esm/collection/state.js`	3.79 kB
`./packages/db/dist/esm/collection/subscription.js`	2.2 kB
`./packages/db/dist/esm/collection/sync.js`	2.2 kB
`./packages/db/dist/esm/deferred.js`	230 B
`./packages/db/dist/esm/errors.js`	3.57 kB
`./packages/db/dist/esm/event-emitter.js`	798 B
`./packages/db/dist/esm/indexes/auto-index.js`	794 B
`./packages/db/dist/esm/indexes/base-index.js`	835 B
`./packages/db/dist/esm/indexes/btree-index.js`	2 kB
`./packages/db/dist/esm/indexes/lazy-index.js`	1.21 kB
`./packages/db/dist/esm/indexes/reverse-index.js`	577 B
`./packages/db/dist/esm/local-only.js`	967 B
`./packages/db/dist/esm/local-storage.js`	2.33 kB
`./packages/db/dist/esm/optimistic-action.js`	294 B
`./packages/db/dist/esm/proxy.js`	3.86 kB
`./packages/db/dist/esm/query/builder/functions.js`	615 B
`./packages/db/dist/esm/query/builder/index.js`	4.04 kB
`./packages/db/dist/esm/query/builder/ref-proxy.js`	938 B
`./packages/db/dist/esm/query/compiler/evaluators.js`	1.55 kB
`./packages/db/dist/esm/query/compiler/expressions.js`	760 B
`./packages/db/dist/esm/query/compiler/group-by.js`	2.04 kB
`./packages/db/dist/esm/query/compiler/index.js`	2.21 kB
`./packages/db/dist/esm/query/compiler/joins.js`	2.65 kB
`./packages/db/dist/esm/query/compiler/order-by.js`	1.43 kB
`./packages/db/dist/esm/query/compiler/select.js`	1.28 kB
`./packages/db/dist/esm/query/ir.js`	785 B
`./packages/db/dist/esm/query/live-query-collection.js`	404 B
`./packages/db/dist/esm/query/live/collection-config-builder.js`	5.49 kB
`./packages/db/dist/esm/query/live/collection-registry.js`	233 B
`./packages/db/dist/esm/query/live/collection-subscriber.js`	2.11 kB
`./packages/db/dist/esm/query/optimizer.js`	3.26 kB
`./packages/db/dist/esm/scheduler.js`	1.29 kB
`./packages/db/dist/esm/SortedMap.js`	1.24 kB
`./packages/db/dist/esm/transactions.js`	3.05 kB
`./packages/db/dist/esm/utils.js`	1.01 kB
`./packages/db/dist/esm/utils/browser-polyfills.js`	365 B
`./packages/db/dist/esm/utils/btree.js`	6.01 kB
`./packages/db/dist/esm/utils/comparison.js`	754 B
`./packages/db/dist/esm/utils/index-optimization.js`	1.73 kB

github-actions · 2025-10-11T19:23:12Z

Size Change: 0 B

Total Size: 2.36 kB

ℹ️ View Unchanged

Filename	Size
`./packages/react-db/dist/esm/index.js`	168 B
`./packages/react-db/dist/esm/useLiveInfiniteQuery.js`	885 B
`./packages/react-db/dist/esm/useLiveQuery.js`	1.31 kB

samwillis · 2025-10-13T11:51:23Z

Something I considered while implementing this was normalising the predicates into disjunctive normal form (DNF), but that could explode the size of the predicates. In real-world use the predicates are simple and repetitive in structure, so I think this implementation makes the right tradeoff.

kevin-dp

I had a detailed read through the code in predicate-utils.ts. My main concern is about the semantics we get from the way we merge the predicates.

.changeset/light-phones-flash.md

kevin-dp · 2025-10-13T12:12:28Z