RFC: On-Demand Collection Loading via loadSubset #676
KyleAMathews started this conversation in Ideas
Stoked for this, sounds really great. One thing that I may be missing: will the current “eager” mode with the built-in collections (e.g. the Electric one) still be supported?
If collections can load data in subsets, could this unlock a sort of SSR with TanStack DB?
Summary
TanStack DB collections currently perform full dataset synchronization before becoming ready, limiting their ability to scale to large datasets. This RFC introduces on-demand subset loading through an optional `loadSubset` function that collections can provide, enabling efficient pagination, filtered loading, and progressive sync patterns. Collections without `loadSubset` remain in eager mode with full sync behavior, while collections implementing `loadSubset` can load specific data subsets based on live query predicates, allowing immediate rendering with synchronous data access when subsets are already loaded.
Background
TanStack DB collections materialize subsets of database data in memory on the client. Collections manage data synchronization through a `sync` function provided in `CollectionConfig`, which is called once when the collection is created.
The current sync function signature accepts parameters for managing collection state:
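A rough sketch of what such a signature might look like. The parameter names `begin`, `write`, `commit`, and `markReady`, the `ChangeMessage` shape, and the `SyncParams` type are assumptions for illustration (this RFC itself only names `write()`, `commit()`, and `markReady()`); it is not the library's actual type definition. The `runSync` harness is a made-up stand-in for the collection.

```typescript
// Hypothetical shapes, written only to illustrate the eager sync flow.
type ChangeMessage<T> = { type: 'insert' | 'update' | 'delete'; value: T }
type CleanupFn = () => void

interface SyncParams<T> {
  begin: () => void                          // open a batch of writes (assumed)
  write: (message: ChangeMessage<T>) => void // stage one change
  commit: () => void                         // apply the staged batch
  markReady: () => void                      // flip the collection to 'ready'
}

type SyncFn<T> = (params: SyncParams<T>) => void | CleanupFn

// An eager sync: every row is written before markReady() is called.
function makeEagerSync<T>(rows: T[]): SyncFn<T> {
  return ({ begin, write, commit, markReady }) => {
    begin()
    for (const value of rows) write({ type: 'insert', value })
    commit()
    markReady()
    return () => {
      // cleanup would close sockets, abort requests, and so on
    }
  }
}

// A tiny harness standing in for the collection, to show the call order.
function runSync<T>(sync: SyncFn<T>) {
  const state = { rows: [] as T[], status: 'loading' as 'loading' | 'ready' }
  let pending: T[] = []
  sync({
    begin: () => { pending = [] },
    write: (m) => { if (m.type === 'insert') pending.push(m.value) },
    commit: () => { state.rows.push(...pending); pending = [] },
    markReady: () => { state.status = 'ready' },
  })
  return state
}
```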
Collections track active subscriptions through reference counting. When the first subscription is created via `subscribeChanges()`, the collection calls `startSync()` to begin data synchronization. The collection transitions through loading states until `markReady()` is called, at which point `collection.status === 'ready'` and subscriptions receive the initial state. When the last subscription is removed and `gcTime` has elapsed, any cleanup function is called.
Live queries execute through `useLiveQuery()`, which subscribes to collections and returns data synchronously when available or enters a loading state when waiting for data. The `preload()` method allows awaiting collection readiness before rendering.
QueryCollection integrates TanStack Query by calling a `queryFn` with query parameters and managing the resulting data as a collection. In the current implementation, the `queryFn` is called once with the full query definition.
Electric collections sync data from Postgres via HTTP Shape subscriptions, materializing the shape log into collection state. Shapes support filtering via WHERE clauses, ordering, and column selection. Other collection types like Trailbase, RxDB, and similar systems follow comparable patterns of syncing data from backend sources into local collections.
Problem
The current eager-only synchronization model creates five significant limitations that prevent TanStack DB from scaling to real-world application data sizes:
1. Cannot scale to large datasets
Collections must sync their entire dataset before `collection.status === 'ready'`, making them impractical for large tables. A collection of millions of posts cannot reasonably sync everything to the client. Applications need the ability to load only the subset required for the current view, such as the first 20 posts ordered by creation date, without waiting for complete dataset synchronization.
2. Predicates don't influence initial sync
Subscription predicates cannot affect initial synchronization because `startSync()` executes before predicates are available. Even subscriptions with narrow predicates like `where(status === 'active').limit(10)` trigger full collection sync before returning any data.
3. No progressive loading patterns
Some use cases benefit from both immediate subset loading and background full syncing. An infinite scroll feed might want to immediately load the first page while syncing the full dataset in the background for offline support. The current model forces a binary choice between immediate readiness (local-only collections) or full sync (eager mode).
4. Pagination cannot load from backend
The `useLiveInfiniteQuery` prototype demonstrates this limitation clearly. The current implementation can only paginate through data already present in the collection. When a user clicks "load more," the hook slices the next page from already-synced data. True infinite loading requires fetching additional pages from the backend on demand, but collections lack the API to support this pattern.
5. Inefficient for live query collections
Live query collections subscribe to all source collections, forcing each source to perform full synchronization even when the query only needs a small subset. For example:
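As an illustration, take the query used later in this RFC, `from({ post: postsCollection }).where(({ post }) => eq(post.userId, '123')).limit(10)`. The sketch below does not use the TanStack DB API; it models an eager collection with plain arrays (the `backend` data and the `eagerSync` helper are made up) to contrast how many rows get synced with how many the query needs:

```typescript
type Post = { id: number; userId: string }

// Pretend backend table: 1,000 posts spread evenly over 10 users ('120'..'129').
const backend: Post[] = Array.from({ length: 1000 }, (_, i) => ({
  id: i,
  userId: String(120 + (i % 10)),
}))

// Eager mode: the whole table is copied before any query can run.
function eagerSync(): Post[] {
  return [...backend]
}

const synced = eagerSync()

// Plain-array equivalent of: where(userId === '123').limit(10)
const result = synced.filter((post) => post.userId === '123').slice(0, 10)

const rowsSynced = synced.length // 1000
const rowsNeeded = result.length // 10
```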
This query needs 10 posts for a specific user, but `postsCollection` syncs its entire dataset before the query can execute.
The lack of on-demand loading prevents TanStack DB from handling application data at realistic scale across all collection types, whether Electric, QueryCollection, Trailbase, RxDB, or others.
Proposal
Sync Function Return Type
The `sync` function in `CollectionConfig` will be extended to optionally return a `SyncConfigRes` object, which may carry a `loadSubset` function.
Collections can return:
- `void`: Eager mode with no cleanup
- `CleanupFn`: Eager mode with cleanup
- `SyncConfigRes`: On-demand mode if `loadSubset` is provided, otherwise eager mode
Sync Modes
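How the mode follows from the `sync` return value can be sketched as below. The `SyncConfigRes` fields shown (`cleanup`, `loadSubset`), the placeholder `LoadSubsetFn` options, and the `resolveSyncMode` helper are assumptions for illustration; this RFC only states that `SyncConfigRes` may provide a `loadSubset` function.

```typescript
type CleanupFn = () => void
type LoadSubsetFn = (options: { limit?: number }) => true | Promise<void>

// Assumed shape: the RFC only says loadSubset is optional on this object.
interface SyncConfigRes {
  cleanup?: CleanupFn
  loadSubset?: LoadSubsetFn
}

type SyncResult = void | CleanupFn | SyncConfigRes
type SyncMode = 'eager' | 'on-demand'

function resolveSyncMode(result: SyncResult): SyncMode {
  // Only an object that actually provides loadSubset is on-demand.
  if (
    typeof result === 'object' &&
    result !== null &&
    typeof result.loadSubset === 'function'
  ) {
    return 'on-demand'
  }
  // void, a bare cleanup function, or an object without loadSubset.
  return 'eager'
}
```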
Collections operate in one of two modes based on their `sync` function return value:
Eager Mode (current behavior, default):
- No `loadSubset` function
- Calls `markReady()` only after entire dataset is synced
- `collection.status === 'ready'` after full sync completes
- `preload()` waits for full initial sync to complete
On-Demand Mode:
- Provides a `loadSubset` function in `SyncConfigRes`
- Loads data only in response to `loadSubset()` calls
- `preload()` is typically a no-op since collections don't perform background syncing
Collections can implement additional loading patterns on top of on-demand mode. A common pattern is progressive mode, where the collection returns a `loadSubset` function (making it on-demand) but also initiates a background full sync:
- Once the background sync has brought the data in, `loadSubset()` calls can abort and return early since data is already present
- `collection.status === 'ready'` after full background sync completes
- `preload()` waits for full background sync to complete
Collections will only expose `syncMode` configuration if they support both eager and on-demand modes. Otherwise, they default to their single supported mode.
LoadSubset Function Contract
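As a reference point for the contract described in this section, here is a minimal sketch. It assumes a `LoadSubsetOptions` shape with `where`, `orderBy`, and `limit` fields (the three fields this RFC refers to elsewhere) with placeholder types, and a made-up in-memory tracker keyed by the serialized options. The `fetchRows` callback stands in for whatever the collection does to fetch rows and write them with `write()` and `commit()`.

```typescript
// Assumed option shape; the field types here are placeholders.
interface LoadSubsetOptions {
  where?: Record<string, string | number>
  orderBy?: string
  limit?: number
}

type LoadSubsetResult = true | Promise<void>

function createLoadSubset(
  fetchRows: (options: LoadSubsetOptions) => Promise<void>,
) {
  const loaded = new Set<string>()
  const inFlight = new Map<string, Promise<void>>()

  return function loadSubset(options: LoadSubsetOptions): LoadSubsetResult {
    // Naive key: sensitive to property order, good enough for a sketch.
    const key = JSON.stringify(options)
    // Already present: the subscription can proceed synchronously.
    if (loaded.has(key)) return true
    // Same subset already being fetched: share the pending promise.
    const pending = inFlight.get(key)
    if (pending) return pending

    const request = fetchRows(options).then(
      () => {
        loaded.add(key)
        inFlight.delete(key)
      },
      (error: unknown) => {
        inFlight.delete(key) // allow a later call to try again
        throw error
      },
    )
    inFlight.set(key, request)
    return request
  }
}
```

Matching by exact serialized options is the crudest possible tracking strategy; it never recognizes that a request for 10 rows is covered by an earlier request for 20.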
When a collection provides a `loadSubset` function, it accepts a `LoadSubsetOptions` object describing the requested subset; the fields referenced elsewhere in this RFC are `where`, `orderBy`, and `limit`.
The function returns:
- `true`: Data is already present in the collection, subscription can proceed synchronously
- `Promise<void>`: Data is being loaded, resolves when subset is available
The `loadSubset` function is responsible for:
- Writing loaded data into the collection via `write()` and `commit()`
- Returning `true` for immediate access or a Promise that resolves when loading completes
Live Query Integration
Live queries interact with `loadSubset` through the subscription lifecycle:
- When `useLiveQuery()` creates a subscription with predicates, the subscription system calls `loadSubset()` if the collection provides one
- The live query remains in a loading state until the `loadSubset()` promises resolve
- If all `loadSubset()` calls return `true`, the live query executes completely synchronously with no loading state
For pagination via `useLiveInfiniteQuery`:
- `fetchNextPage()` updates the subscription's limit predicate
- The update triggers a new `loadSubset()` call with the increased limit
For joins in live query collections:
- The join calls `loadSubset()` on the right collection with predicates like `where: inArray(ref('userId'), ['abc', 'xyz'])`
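The subscription flow in this section can be sketched roughly as follows. `subscribeWithPredicates` and the `SubscriptionState` type are hypothetical names, not part of TanStack DB, and the options type is a placeholder; the sketch only shows how the two `loadSubset()` return values map to a synchronous result versus a loading state.

```typescript
type LoadSubsetOptions = { limit?: number }
type LoadSubsetFn = (options: LoadSubsetOptions) => true | Promise<void>

interface SubscriptionState {
  status: 'loading' | 'ready' | 'error'
  error?: unknown
}

function subscribeWithPredicates(
  options: LoadSubsetOptions,
  loadSubsetFns: Array<LoadSubsetFn | undefined>,
): SubscriptionState {
  const state: SubscriptionState = { status: 'ready' }
  const pending: Promise<void>[] = []

  for (const loadSubset of loadSubsetFns) {
    if (!loadSubset) continue // eager collections provide no loadSubset
    const result = loadSubset(options)
    if (result !== true) pending.push(result)
  }

  // Every call returned true: data is available synchronously.
  if (pending.length === 0) return state

  // Otherwise stay in a loading state until all promises settle.
  state.status = 'loading'
  Promise.all(pending).then(
    () => { state.status = 'ready' },
    (error) => { state.status = 'error'; state.error = error },
  )
  return state
}
```

Under this model, a `fetchNextPage()` call would amount to running the same flow again with a larger `limit`.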
Synchronous Data Access
The `true` return value enables flicker-free rendering when navigating between components:
- One component loads the subset `status === 'active'` limited to 20
- A second component requests the same subset, and `loadSubset()` returns `true` immediately
- `useLiveQuery()` executes completely synchronously in the same render
Collection-Specific Implementations
Electric Collections:
- When `loadSubset()` is called, compare requested predicate against loaded ranges
QueryCollection:
- Pass `LoadSubsetOptions` into TanStack Query's `meta` object
- Derive `queryKey` dynamically based on predicate parameters
- Return `true` if non-stale data covers the predicate
- Call `queryFn` with predicate parameters in meta for new data fetching
Live Query Collections:
- Call the source collections' `loadSubset()` functions with translated predicates
- For `from({ post: postsCollection }).where(({ post }) => eq(post.userId, '123')).limit(10)`, call `postsCollection.loadSubset({ where: eq(ref('userId'), '123'), limit: 10 })`
- For joins, call `loadSubset()` on right-side collection
Predicate Deduplication
A predicate deduplication library will provide set operations on `PredicateObject` instances to help collections track loaded subsets.
Collections use these tools to:
- Determine whether `loadSubset()` requests are already satisfied by loaded data
The exact mechanics of predicate set operations are outside the scope of this RFC and will be documented separately. Collections are free to implement their own tracking mechanisms based on their specific requirements (e.g., staleness policies, cache eviction strategies).
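Purely as an illustration of the kind of check such a library could support, the sketch below handles the simplest possible case: equality predicates plus a limit. The `SimplePredicate` shape and the `isCoveredBy` helper are hypothetical and far narrower than a general `PredicateObject`. Ordering is ignored entirely, which a real implementation could not do.

```typescript
interface SimplePredicate {
  where: Record<string, string | number> // field === value, ANDed together
  limit?: number                         // undefined means "all matching rows"
}

// True when both predicates filter on exactly the same field/value pairs.
function sameWhere(a: SimplePredicate, b: SimplePredicate): boolean {
  const aKeys = Object.keys(a.where)
  const bKeys = Object.keys(b.where)
  return (
    aKeys.length === bKeys.length &&
    aKeys.every((key) => a.where[key] === b.where[key])
  )
}

// True when every condition of `loaded` also appears in `requested`,
// so `requested` selects a subset of the rows `loaded` selects.
function whereImplies(requested: SimplePredicate, loaded: SimplePredicate): boolean {
  return Object.keys(loaded.where).every(
    (key) => requested.where[key] === loaded.where[key],
  )
}

function isCoveredBy(requested: SimplePredicate, loaded: SimplePredicate): boolean {
  // An unlimited load covers any request that is at least as narrow.
  if (loaded.limit === undefined) return whereImplies(requested, loaded)
  // A limited load only covers the same filter with an equal or smaller limit.
  return (
    requested.limit !== undefined &&
    requested.limit <= loaded.limit &&
    sameWhere(requested, loaded)
  )
}
```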
Error Handling
When `loadSubset()` encounters an error:
- The collection calls `setError()` to record the error in collection state
- The `loadSubset()` promise rejects with the error
- The error is exposed to components through `useLiveQuery().error`
- Errors thrown at the `loadSubset()` call site are also caught and recorded
Retry logic is collection-dependent. Collections may implement automatic retries, exponential backoff, or leave retry decisions to the application layer.
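A sketch of how these steps could fit together, with one possible retry policy layered on top. `setError` is modeled here as a plain callback, and `withErrorRecording` and `withRetry` are hypothetical helpers; the delays, the attempt count, and the injected `sleep` function are arbitrary choices for illustration.

```typescript
type LoadSubsetFn<O> = (options: O) => true | Promise<void>

// Wrap a loadSubset so failures are recorded and still reach the caller.
function withErrorRecording<O>(
  loadSubset: LoadSubsetFn<O>,
  setError: (error: unknown) => void,
): LoadSubsetFn<O> {
  return (options) => {
    try {
      const result = loadSubset(options)
      if (result === true) return true
      return result.catch((error: unknown) => {
        setError(error)
        throw error // the promise still rejects with the original error
      })
    } catch (error) {
      // Errors thrown at the call site are recorded too.
      setError(error)
      return Promise.reject(error)
    }
  }
}

// One possible retry policy: exponential backoff with a fixed attempt cap.
async function withRetry(
  attempt: () => Promise<void>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<void> {
  for (let i = 0; ; i++) {
    try {
      return await attempt()
    } catch (error) {
      if (i + 1 >= maxAttempts) throw error
      await sleep(baseDelayMs * 2 ** i) // 100ms, then 200ms, then 400ms, ...
    }
  }
}
```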
Preload Behavior
The `preload()` method behavior varies by sync mode:
Eager Mode:
- `preload()` calls `startSync()` and waits for `collection.status === 'ready'`
On-Demand Mode:
- `preload()` is typically a no-op since collections don't perform background syncing
- Use `await liveQuery.preload()` instead to ensure specific query data is loaded
Progressive Mode (collection-implemented):
- `preload()` waits for background full sync to complete
Areas Requiring Prototyping
Pagination State Tracking: How collections track pagination state (cursors, offsets, keyset values) across multiple subscriptions with different pagination requirements will be determined through implementation. Collections must match new `loadSubset()` calls against previous calls to determine what data to fetch, but the exact mechanism for tracking this state needs prototyping.
Definition of Success
This proposal succeeds if it enables the following outcomes:
1. Large Dataset Support
Collections can handle tables with millions of rows without requiring full synchronization. A live query displaying the first 20 posts from a 10-million-row table loads only those 20 rows and transitions to ready state in under 2 seconds (network-dependent).
2. Predicate-Driven Loading
Subscription predicates directly control initial data loading. A live query with `where(({ post }) => eq(post.status, 'active')).limit(10)` loads exactly 10 active records, not the entire collection. The collection's `loadSubset()` function is called with the subscription's exact predicate parameters.
3. Synchronous Data Access
When navigating between components that request identical or already-loaded subsets, `useLiveQuery()` returns data synchronously with zero loading state flicker. `loadSubset()` returns `true` for 100% of navigation cases where data is already present.
4. True Infinite Scroll
`useLiveInfiniteQuery` can fetch additional pages from the backend by calling `fetchNextPage()`. Each call increases the limit predicate, triggers `loadSubset()`, and fetches only the incremental data not already loaded. A 1000-row feed loads in chunks of 20 rows on demand rather than syncing all 1000 rows upfront.
5. Efficient Query Collections
Live queries over multiple source collections only trigger subset loading on those sources. A query `from({ post: postsCollection }).where(({ post }) => eq(post.userId, '123')).limit(10)` loads 10 posts and only the related users, not the full posts and users collections. Query execution time is proportional to result set size, not source collection size.
6. Progressive Mode Viability
Collections can implement progressive loading patterns where subset requests resolve immediately while background full sync continues. An infinite scroll feed displays the first page in under 1 second while the full dataset syncs for offline support over the next 30 seconds.
7. Backward Compatibility
Existing collections continue working without changes. Collections without `loadSubset()` functions maintain current eager-mode behavior. No breaking changes to collection API or live query usage patterns.
8. QueryCollection Integration
QueryCollection successfully integrates `LoadSubsetOptions` into TanStack Query's `queryFn` calls. Developers can access `meta.where`, `meta.orderBy`, and `meta.limit` to construct backend API requests. Pagination continuations correctly pass previous cursors to subsequent `queryFn` calls.
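As a hedged illustration of the developer-facing side, the sketch below shows a `queryFn` reading predicate fields from `meta` to build a request URL. The `SubsetMeta` shape, the `/api/posts` endpoint, the query-string format, and the `cursor` field are all assumptions; this RFC does not specify how predicates are serialized or how cursors are passed. The `fetchJson` parameter is injected only to keep the sketch self-contained; a real `queryFn` receives just the context.

```typescript
// Assumed, simplified view of what QueryCollection might place on `meta`.
interface SubsetMeta {
  where?: Record<string, string | number>
  orderBy?: { field: string; direction: 'asc' | 'desc' }
  limit?: number
  cursor?: string // hypothetical pagination continuation
}

function buildPostsUrl(meta: SubsetMeta): string {
  const params: string[] = []
  const where = meta.where ?? {}
  for (const field of Object.keys(where)) {
    const value = String(where[field])
    params.push(`${encodeURIComponent(field)}=${encodeURIComponent(value)}`)
  }
  if (meta.orderBy) {
    params.push(`sort=${meta.orderBy.field}:${meta.orderBy.direction}`)
  }
  if (meta.limit !== undefined) params.push(`limit=${meta.limit}`)
  if (meta.cursor) params.push(`cursor=${encodeURIComponent(meta.cursor)}`)
  return params.length > 0 ? `/api/posts?${params.join('&')}` : '/api/posts'
}

// Shaped like a TanStack Query queryFn: it receives a context carrying `meta`.
async function queryFn(
  context: { meta?: SubsetMeta },
  fetchJson: (url: string) => Promise<unknown[]>,
): Promise<unknown[]> {
  return fetchJson(buildPostsUrl(context.meta ?? {}))
}
```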