# E2E Tests for Query-Driven Sync with Predicate Push-Down #772

## Overview

Implement comprehensive end-to-end tests for the query-driven sync feature with on-demand collection loading (PR #763, RFC #676). These tests will verify that predicate push-down, deduplication, pagination, joins, and other distributed system behaviors work correctly across different collection types and syncModes.

**Related Links:**
- PR: https://github.com/TanStack/db/pull/763
- RFC: https://github.com/TanStack/db/discussions/676
- Electric e2e setup reference: `~/programs/electric/packages/typescript-client` (see generated docs in this repo)

These new e2e tests will branch off PR #763 and be merged into it.

## Goals

1. Create a shared e2e test suite that can be reused across different collection implementations
2. Test critical distributed systems scenarios: concurrent loadSubset calls, race conditions, deduplication
3. Verify predicate push-down works correctly with various query patterns
4. Test pagination, ordering, and multi-collection joins
5. Catch integration bugs like those found during early testing (see Known Bugs section)
6. Keep execution time under 5 minutes for CI/CD

## Architecture

### Package Structure

Create a new package: **`@tanstack/db-collection-e2e`**

This package will export:
- Individual test scenario groups organized by feature
- Standard seed data schema and fixtures
- Utility functions for common assertions
- Configuration types/interfaces

### Test Organization

Tests will be organized into **feature-based scenario groups**:

1. **Predicates Suite** - Basic where clause functionality
2. **Pagination Suite** - Order by, limit, offset, setWindow
3. **Joins Suite** - Single and multi-collection joins
4. **Deduplication Suite** - Concurrent loadSubset scenarios
5. **Collation Suite** - String comparison configuration
6. **Mutations Suite** - Data mutations with on-demand mode
7. **Live Updates Suite** (optional) - Reactive updates for sync-enabled collections
8. **Regression Suite** - Explicit tests for known bugs

### Collection Integration

Each collection package (e.g., `electric-db-collection`, `query-db-collection`) will:
- Import the shared test suite
- Provide collection instances configured for both eager and on-demand syncModes
- Implement collection-specific setup/teardown hooks
- Choose which optional test suites to run

Example structure:
```
packages/
  db-collection-e2e/          # Shared test suite package
    src/
      suites/
        predicates.test.ts
        pagination.test.ts
        joins.test.ts
        deduplication.test.ts
        collation.test.ts
        mutations.test.ts
        live-updates.test.ts (optional)
        regressions.test.ts
      fixtures/
        seed-data.ts
        test-schema.ts
      utils/
        assertions.ts
      types.ts
      index.ts
  electric-db-collection/
    e2e/
      setup.ts              # Docker, Postgres, Electric setup
      electric.e2e.test.ts  # Imports and runs shared suites
  query-db-collection/
    e2e/
      setup.ts              # Mock backend setup
      query.e2e.test.ts     # Imports and runs shared suites
```

## Standard Test Data Schema

### Entities and Relationships

Design a schema that exercises edge cases across various data types:

```typescript
// Users table
interface User {
  id: string;              // UUID
  name: string;            // For collation testing
  email: string | null;    // Nullable field
  age: number;             // Numeric comparisons
  isActive: boolean;       // Boolean predicates
  createdAt: Date;         // Date comparisons
  metadata: object | null; // JSON field (if supported)
  deletedAt: Date | null;  // Soft delete pattern
}

// Posts table
interface Post {
  id: string;
  userId: string;          // Foreign key to User
  title: string;
  content: string | null;
  viewCount: number;
  publishedAt: Date | null;
  deletedAt: Date | null;
}

// Comments table
interface Comment {
  id: string;
  postId: string;          // Foreign key to Post
  userId: string;          // Foreign key to User
  text: string;
  createdAt: Date;
  deletedAt: Date | null;
}
```

### Seed Data Volume

- **Users**: ~100 records
- **Posts**: ~100 records (distributed across users)
- **Comments**: ~100 records (distributed across posts)

This provides enough data to test pagination effectively while keeping tests fast.

### Data Distribution

Ensure data includes:
- Mix of null and non-null values
- Various string cases for collation testing (uppercase, lowercase, special chars)
- Date ranges (past, present, future)
- Boolean distributions (true/false/null if applicable)
- Numeric ranges (negative, zero, positive, large numbers)
- Some soft-deleted records (deletedAt not null)
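
A minimal sketch of how such seed data could be generated deterministically, so that expected counts can be hard-coded in assertions. The function name `generateUsers`, the `SeedUser` type, and the specific modulo rules are illustrative assumptions, not part of the spec; the fields mirror the `User` interface above.

```typescript
// Hypothetical seed generator; fields mirror the User interface above.
interface SeedUser {
  id: string
  name: string
  email: string | null
  age: number
  isActive: boolean
  createdAt: Date
  metadata: object | null
  deletedAt: Date | null
}

function generateUsers(count = 100): SeedUser[] {
  // Mixed casing, an accented character and an apostrophe for collation tests.
  const names = ['alice', 'Bob', 'CHARLIE', 'émile', "O'Brien"]
  const referenceDate = Date.UTC(2024, 0, 1)
  const dayMs = 24 * 60 * 60 * 1000

  return Array.from({ length: count }, (_, i) => ({
    // Deterministic UUID-shaped id: the index is padded into the last group.
    id: `00000000-0000-4000-8000-${i.toString(16).padStart(12, '0')}`,
    name: `${names[i % names.length]} ${i}`,
    // Every 10th user has no email (null vs non-null mix).
    email: i % 10 === 0 ? null : `user${i}@example.com`,
    // Zero plus a spread of positive values.
    age: i % 25 === 0 ? 0 : 18 + (i % 50),
    isActive: i % 3 !== 0,
    // Dates spread on both sides of a fixed reference date.
    createdAt: new Date(referenceDate + (i - count / 2) * dayMs),
    metadata: i % 4 === 0 ? null : { index: i },
    // Every 20th user is soft-deleted.
    deletedAt: i % 20 === 0 ? new Date(referenceDate) : null,
  }))
}

const users = generateUsers()
```

Because every field is derived from the row index, the same rows come back on every run, which keeps assertions such as "10 users have a null email" stable across collection implementations.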

## Test Configuration Interface

```typescript
interface E2ETestConfig {
  // Collection instances configured for testing
  collections: {
    eager: {
      users: Collection<User>;
      posts: Collection<Post>;
      comments: Collection<Comment>;
    };
    onDemand: {
      users: Collection<User>;
      posts: Collection<Post>;
      comments: Collection<Comment>;
    };
  };

  // Lifecycle hooks
  setup: () => Promise<void>;
  teardown: () => Promise<void>;

  // Per-test hooks (optional)
  beforeEach?: () => Promise<void>;
  afterEach?: () => Promise<void>;
}
```

**Note**: Collections handle their own mutation logic internally, so no separate mutator is needed in the config.

## Test Scenario Groups

### 1. Predicates Suite

Test basic predicate functionality across all data types:

**Test cases:**
- `eq()` with various types (string, number, boolean, date, UUID, null)
- `ne()` with various types
- `gt()`, `gte()`, `lt()`, `lte()` with numbers and dates
- `in()` with arrays
- `isNull()` and `isNotNull()`
- Complex boolean logic (AND, OR combinations)
- Nested predicates

**Assertions:**
- Query returns correct data matching predicates
- Only necessary data is loaded (check collection state)
- No errors thrown
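
To assert that a query "returns correct data", the suite needs an independent way to compute the expected rows from the seed data. One possible approach is a small in-memory reference evaluator, sketched below. The `Predicate` shape and the `matches` function are assumptions made for illustration; they are not TanStack DB's expression format. Note that `eq` against `null` is treated here as a null check, matching the behavior the Regression Suite expects of `eq(deletedAt, null)`.

```typescript
type Row = Record<string, unknown>

// Hypothetical predicate shape used only by the test oracle.
type Predicate =
  | { op: 'eq' | 'ne' | 'gt' | 'gte' | 'lt' | 'lte'; field: string; value: unknown }
  | { op: 'in'; field: string; values: unknown[] }
  | { op: 'isNull' | 'isNotNull'; field: string }
  | { op: 'and' | 'or'; args: Predicate[] }

// Dates compare by timestamp; undefined is treated like null.
const norm = (v: unknown): unknown =>
  v instanceof Date ? v.getTime() : v === undefined ? null : v

// Returns a sign-like number, or null when the values are not orderable.
function compare(a: unknown, b: unknown): number | null {
  if (typeof a === 'number' && typeof b === 'number') return a - b
  if (typeof a === 'string' && typeof b === 'string') return a < b ? -1 : a > b ? 1 : 0
  return null
}

function matches(row: Row, p: Predicate): boolean {
  switch (p.op) {
    case 'and':
      return p.args.every((arg) => matches(row, arg))
    case 'or':
      return p.args.some((arg) => matches(row, arg))
    case 'isNull':
      return norm(row[p.field]) === null
    case 'isNotNull':
      return norm(row[p.field]) !== null
    case 'in':
      return p.values.some((v) => norm(v) === norm(row[p.field]))
    case 'eq':
      return norm(row[p.field]) === norm(p.value)
    case 'ne':
      return norm(row[p.field]) !== norm(p.value)
    default: {
      // Range operators never match when either side is null or not orderable.
      const c = compare(norm(row[p.field]), norm(p.value))
      if (c === null) return false
      if (p.op === 'gt') return c > 0
      if (p.op === 'gte') return c >= 0
      if (p.op === 'lt') return c < 0
      return c <= 0
    }
  }
}

// Usage: expected rows for "age >= 18 AND email IS NULL".
const sample: Row[] = [
  { id: 'a', age: 25, email: null },
  { id: 'b', age: 17, email: null },
  { id: 'c', age: 40, email: 'c@example.com' },
]
const expected = sample.filter((row) =>
  matches(row, {
    op: 'and',
    args: [
      { op: 'gte', field: 'age', value: 18 },
      { op: 'isNull', field: 'email' },
    ],
  })
)
```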

### 2. Pagination Suite

Test ordering, limits, offsets, and window management:

**Test cases:**
- Basic orderBy (ascending/descending)
- Multiple orderBy fields
- Limit without offset
- Limit with offset
- `liveQuery.utils.setWindow()` - changing windows
- `setWindow()` while data is loading
- Overlapping windows (page 2 before page 1 completes)
- Edge cases: limit=0, offset beyond dataset, negative values

**Assertions:**
- Correct page of data returned
- Proper ordering maintained
- Only requested data loaded (not entire dataset)
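
The "proper ordering maintained" assertion can be checked without re-sorting the result, by verifying that each adjacent pair of rows respects the `orderBy` keys in priority order. The helper below is a sketch; `isOrdered` and `OrderBy` are hypothetical names, and treating null as the largest value is an assumption chosen to match Postgres's default (`NULLS LAST` for ascending order).

```typescript
interface OrderBy<T> {
  field: keyof T
  direction: 'asc' | 'desc'
}

function compareValues(a: unknown, b: unknown): number {
  if (a == null && b == null) return 0
  if (a == null) return 1 // null sorts as the largest value (assumption)
  if (b == null) return -1
  const x = a instanceof Date ? a.getTime() : a
  const y = b instanceof Date ? b.getTime() : b
  if (typeof x === 'number' && typeof y === 'number') return x - y
  return String(x) < String(y) ? -1 : String(x) > String(y) ? 1 : 0
}

// True when every adjacent pair of rows respects the orderBy keys.
function isOrdered<T>(rows: T[], orderBy: OrderBy<T>[]): boolean {
  for (let i = 1; i < rows.length; i++) {
    for (const { field, direction } of orderBy) {
      const c = compareValues(rows[i - 1][field], rows[i][field])
      const signed = direction === 'asc' ? c : -c
      if (signed > 0) return false // out of order on this key
      if (signed < 0) break // strictly ordered here; later keys are irrelevant
    }
  }
  return true
}

// Usage: rows tie on age, so the second key decides.
const page = [
  { age: 20, name: 'b' },
  { age: 20, name: 'a' },
  { age: 30, name: 'z' },
]
const byAge = isOrdered(page, [{ field: 'age', direction: 'asc' }])
const byAgeThenNameAsc = isOrdered(page, [
  { field: 'age', direction: 'asc' },
  { field: 'name', direction: 'asc' },
])
const byAgeThenNameDesc = isOrdered(page, [
  { field: 'age', direction: 'asc' },
  { field: 'name', direction: 'desc' },
])
```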

### 3. Joins Suite

Test multi-collection joins with various syncMode combinations:

**Test cases:**
- Two-collection join (Users + Posts)
- Three-collection join (Users + Posts + Comments)
- Mixed syncModes: one on-demand, one eager
- Both collections on-demand
- Predicates on joined collections (verify pushdown)
- Ordering across joined collections
- Pagination on joined results
- `setWindow()` on joins requiring loads from multiple collections

**Assertions:**
- Correct joined data returned
- Each collection only loads required subset (predicate pushdown working)
- Query result matches expected join
- No extra data loaded

**Example test:**
```typescript
// Join users and posts where userId = 123
// Verify only user 123's posts loaded, not all posts
```
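
The expected side of that example can be computed directly from the seed data. The sketch below is an in-memory oracle only; `expectedUserPosts` and the trimmed row types are hypothetical. It returns both the joined rows and the set of post ids that an on-demand posts collection should contain if push-down worked.

```typescript
interface UserRow {
  id: string
  name: string
}

interface PostRow {
  id: string
  userId: string
  title: string
}

interface JoinExpectation {
  rows: { user: UserRow; post: PostRow }[]
  expectedPostIds: string[]
}

// Inner join of users and posts, restricted to a single user.
function expectedUserPosts(users: UserRow[], posts: PostRow[], userId: string): JoinExpectation {
  const user = users.find((u) => u.id === userId)
  if (!user) return { rows: [], expectedPostIds: [] }

  const rows = posts
    .filter((post) => post.userId === userId)
    .map((post) => ({ user, post }))

  // With push-down working, only these posts should be present in the
  // on-demand posts collection after the query has loaded.
  return { rows, expectedPostIds: rows.map((r) => r.post.id).sort() }
}

const seedUsers: UserRow[] = [
  { id: 'u1', name: 'Alice' },
  { id: 'u2', name: 'Bob' },
]
const seedPosts: PostRow[] = [
  { id: 'p1', userId: 'u1', title: 'First' },
  { id: 'p2', userId: 'u2', title: 'Second' },
  { id: 'p3', userId: 'u1', title: 'Third' },
]
const expectation = expectedUserPosts(seedUsers, seedPosts, 'u1')
```

A real test would compare `expectation.rows` against the live query result and `expectation.expectedPostIds` against the keys actually held by the posts collection.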

### 4. Deduplication Suite

Test concurrent loadSubset calls and deduplication behavior:

**Test cases:**
- Two queries with identical predicates calling loadSubset simultaneously
- Overlapping predicates (one is subset of another)
- Queries arriving while data is still loading
- Multiple concurrent queries with different predicates
- Deduplication with limit/offset variations

**Assertions:**
- Use deduplication callback to count actual vs deduplicated loads
- Verify expected number of backend requests
- All queries receive correct data
- No race conditions or data corruption
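
The shape of the "expected number of backend requests" assertion can be illustrated with a toy model of in-flight deduplication. This is not how TanStack DB implements deduplication, and `dedupeInFlight`, `dedupedLoad`, and the string keys are invented for the sketch; it only shows what a test would count.

```typescript
type Loader<T> = (key: string) => Promise<T>

// Wraps a loader so concurrent calls with the same key share one promise.
function dedupeInFlight<T>(load: Loader<T>): Loader<T> {
  const inFlight = new Map<string, Promise<T>>()
  return (key) => {
    const existing = inFlight.get(key)
    if (existing) return existing
    // Forget the entry once the load settles so later calls hit the backend again.
    const pending = load(key).finally(() => inFlight.delete(key))
    inFlight.set(key, pending)
    return pending
  }
}

// Counting stand-in for the backend request.
let backendCalls = 0
const backend: Loader<string[]> = async (key) => {
  backendCalls++
  return [`row-for-${key}`]
}

const dedupedLoad = dedupeInFlight(backend)

// Two identical requests and one different request, all issued concurrently.
const first = dedupedLoad('age=25')
const second = dedupedLoad('age=25')
const third = dedupedLoad('age=30')
```

In this model the backend counter reads 2 after the three calls, and `first` and `second` are the same promise. The real suite would make the equivalent observation through the deduplication callback or a spy on the backend.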

### 5. Collation Suite

Test string collation configuration:

**Test cases:**
- Default collation behavior
- Custom `defaultStringCollation` at collection level
- Custom collation at query level
- Collation inheritance in nested queries
- String comparisons with different collations (case-sensitive vs case-insensitive)

**Assertions:**
- String predicates respect collation settings
- Correct data returned based on collation
- Query-level collation overrides collection-level
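
For choosing seed values and expected results, the difference between case-sensitive and case-insensitive comparison can be reproduced with the standard `Intl.Collator` API. This sketch only illustrates the comparison semantics; how the `defaultStringCollation` option maps onto a collator is an assumption that should be checked against the implementation in PR #763.

```typescript
// 'variant' distinguishes case and accents; 'accent' ignores case only.
const caseSensitive = new Intl.Collator('en', { sensitivity: 'variant' })
const caseInsensitive = new Intl.Collator('en', { sensitivity: 'accent' })

function equalUnder(collator: Intl.Collator, a: string, b: string): boolean {
  return collator.compare(a, b) === 0
}

const names = ['alice', 'Alice', 'ALICE', 'bob']

// Rows an eq(name, 'alice') predicate should match under each collation.
const sensitiveMatches = names.filter((n) => equalUnder(caseSensitive, n, 'alice'))
// → ['alice']
const insensitiveMatches = names.filter((n) => equalUnder(caseInsensitive, n, 'alice'))
// → ['alice', 'Alice', 'ALICE']
```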

### 6. Mutations Suite

Test mutations with on-demand syncMode:

**Test cases:**
- Mutate loaded data (verify sync back)
- Create new record (verify appears in matching queries)
- Update record to match/unmatch query predicates
- Delete record
- Concurrent mutations

**Assertions:**
- Mutations to loaded records sync correctly
- Query results update reactively after mutations
- Cannot mutate records that aren't loaded (verify error/behavior)

**Note**: Mutation logic is collection-specific, handled internally.

### 7. Live Updates Suite (Optional)

For collections that support sync, test reactive updates:

**Test cases:**
- Load subset via query
- Mutate data on backend (outside client)
- Verify query reactively updates
- Updates during active loadSubset
- Multiple queries watching same data

**Assertions:**
- Query data updates when backend changes
- Updates don't trigger unnecessary reloads
- Correct data maintained throughout

### 8. Regression Suite

Explicit tests for known bugs found during development:

**Critical bugs to test:**

1. **Missing subset_ params bug**
   - Query with no `where` or `limit` should include proper params
   - Server shouldn't treat as normal shape request
   - Test: Create query without predicates, verify URL params

2. **eq(deletedAt, null) SQL syntax error**
   - Test: `eq(deletedAt, null)` should work without SQL errors
   - Verify correct records returned

3. **eq(id, uuid) syntax error**
   - Test: `eq(id, '{uuid-value}')` should work
   - Verify UUID fields work in predicates

4. **Unnecessary offset in URL**
   - Verify offset only added when needed
   - Test URL construction

5. **JSON parse inefficiency**
   - Ensure snapshot responses aren't parsed multiple times
   - (May need internal inspection or performance monitoring)
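
As background for bug 2: in SQL, `column = NULL` never evaluates to true, so an equality predicate against `null` has to be emitted as `IS NULL`. The sketch below illustrates that class of issue only. `compileEq` and `SqlFragment` are hypothetical names, and this is not the predicate compiler or the fix used in PR #763.

```typescript
interface SqlFragment {
  text: string
  params: unknown[]
}

// Quote an identifier for Postgres, doubling any embedded double quotes.
const quoteIdent = (name: string): string => `"${name.replace(/"/g, '""')}"`

function compileEq(column: string, value: unknown, paramIndex = 1): SqlFragment {
  if (value === null) {
    // Null checks need IS NULL rather than "= NULL".
    return { text: `${quoteIdent(column)} IS NULL`, params: [] }
  }
  // Everything else, including UUID strings, is passed as a bound parameter.
  return { text: `${quoteIdent(column)} = $${paramIndex}`, params: [value] }
}

const nullCheck = compileEq('deletedAt', null)
const uuidCheck = compileEq('id', '00000000-0000-4000-8000-000000000001')
```

A regression test along these lines would assert on the generated fragment (or on the request sent to the backend) as well as on the rows returned.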

### 9. Progressive Mode (Deferred)

**Not included in initial implementation**, but placeholder for future:
- Test mode transition from on-demand → eager when full sync completes
- Verify subset loads work while background sync proceeds

## Infrastructure Setup

### Docker Orchestration

Follow Electric's e2e pattern (see generated docs):

**Services needed:**
- PostgreSQL (port 54321)
- Electric server (port 3000)
- tmpfs for performance
- Health checks with proper timeouts

**docker-compose.yml example:**
```yaml
services:
  postgres:
    image: postgres:14-alpine
    environment:
      POSTGRES_DB: electric
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: password
    ports:
      - "54321:5432"
    tmpfs: /var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 2s
      timeout: 10s
      retries: 5

  electric:
    image: electricsql/electric:latest
    environment:
      DATABASE_URL: postgresql://postgres:password@postgres:5432/electric
      ELECTRIC_WRITE_TO_PG_MODE: direct_writes
      PG_PROXY_PORT: "65432"
    ports:
      - "3000:3000"
    depends_on:
      postgres:
        condition: service_healthy
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"]
      interval: 2s
      timeout: 10s
      retries: 5
```

### Database Isolation

Use **schema-based isolation** with unique table names per test:

```typescript
// Shared database: electric_test schema
// Unique table names: `table_for_{taskId}_{randomSuffix}`
```

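This prevents collisions between tests that share one database. Test files themselves run serially (`fileParallelism: false`, see Vitest Configuration), so the unique names matter mainly for tests that overlap within a file.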
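
A possible helper for building names in the `table_for_{taskId}_{randomSuffix}` pattern is sketched below. The function name and the sanitization rules are assumptions. The 63-byte cap reflects Postgres's default identifier length limit; longer names are silently truncated by the server, which could make two "unique" names collide.

```typescript
const PG_MAX_IDENTIFIER_BYTES = 63

function uniqueTableName(
  taskId: string,
  suffix: string = Math.random().toString(36).slice(2, 10)
): string {
  // Keep only characters that are safe in an unquoted identifier.
  const safeTask = taskId.toLowerCase().replace(/[^a-z0-9_]/g, '_')
  const tail = `_${suffix}`
  // Truncate the task part, never the suffix, so uniqueness survives the cap.
  const head = `table_for_${safeTask}`.slice(0, PG_MAX_IDENTIFIER_BYTES - tail.length)
  return head + tail
}

const shortName = uniqueTableName('Task-1', 'abc')
const longName = uniqueTableName('x'.repeat(200), 'abc')
```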

### Vitest Configuration

**Critical settings:**
```typescript
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    fileParallelism: false, // Run test files serially because they share one DB
    globalSetup: './e2e/global-setup.ts',
    testTimeout: 30000, // Extended for Docker operations
  },
})
```

### Global Setup

Implement health check for Docker services:

```typescript
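// global-setup.ts
// waitForHealthCheck and waitForPostgres are helpers this package still needs
// to provide; each should poll until the service responds or a deadline passes.
export async function setup() {
  // Wait for Electric and Postgres to be healthy
  await waitForHealthCheck('http://localhost:3000/health')
  await waitForPostgres('postgresql://postgres:password@localhost:54321/electric')
}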
```
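
One way the HTTP health-check helper could be written is sketched below, as a generic polling loop plus a thin `fetch` wrapper. The names `waitUntil` and `WaitOptions` and the default timings are assumptions; the global `fetch` requires Node 18 or later.

```typescript
interface WaitOptions {
  timeoutMs?: number
  intervalMs?: number
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Polls `check` until it resolves to true, or throws once the deadline passes.
async function waitUntil(
  check: () => Promise<boolean>,
  { timeoutMs = 10_000, intervalMs = 200 }: WaitOptions = {}
): Promise<void> {
  const deadline = Date.now() + timeoutMs
  for (;;) {
    // A thrown error (for example, connection refused) counts as "not ready yet".
    const ready = await check().catch(() => false)
    if (ready) return
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Service not healthy after ${timeoutMs}ms`)
    }
    await sleep(intervalMs)
  }
}

// Hypothetical HTTP check: healthy means the endpoint answers with a 2xx status.
function waitForHealthCheck(url: string, options?: WaitOptions): Promise<void> {
  return waitUntil(async () => (await fetch(url)).ok, options)
}
```

Keeping the polling loop separate from the transport means the same `waitUntil` could back a Postgres readiness check (for instance, by attempting a trivial query) without duplicating the timeout logic.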

### Test Fixtures

Use Vitest's `test.extend()` for composable fixtures:

```typescript
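import { test } from 'vitest'

// setupDatabase, cleanupDatabase and seedAndCreateCollections are placeholder
// helpers to be provided by each collection package's e2e setup.
const testWithDb = test.extend({
  // Vitest requires the first argument to be an object destructuring pattern,
  // even when the fixture depends on no other fixture.
  db: async ({}, use) => {
    const db = await setupDatabase()
    await use(db)
    // Everything after use() runs as teardown once the test has finished
    await cleanupDatabase(db)
  }
})

// Builds on testWithDb, so `db` is available as a dependency
const testWithCollections = testWithDb.extend({
  collections: async ({ db }, use) => {
    const collections = await seedAndCreateCollections(db)
    await use(collections)
  }
})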
```

## Test Flow Pattern

Typical test structure:

```typescript
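import { createLiveQueryCollection, eq } from '@tanstack/db'
import { expect } from 'vitest'

// `test` is the fixture-extended test from Test Fixtures (e.g. testWithCollections).
// expectedCount, expectedUserIds and getLoadedUserIds are placeholders derived from seed data.
test('should load correct data with predicates', async ({ collections }) => {
  // 1. Seed data happens in fixture setup

  // 2. Create a live query with predicates over the on-demand collection
  const query = createLiveQueryCollection((q) =>
    q
      .from({ user: collections.onDemand.users })
      .where(({ user }) => eq(user.age, 25))
  )

  // 3. Await preload
  await query.preload()

  // 4. Assert correct data
  const result = query.toArray
  expect(result).toHaveLength(expectedCount)
  expect(result.every((u) => u.age === 25)).toBe(true)

  // 5. Verify only necessary data loaded
  const loadedIds = getLoadedUserIds(collections.onDemand.users)
  expect(loadedIds).toEqual(expectedUserIds)
})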
```

## Mock Backend (Query Collection)

For `query-db-collection`, mock the backend fetch:

```typescript
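import { vi } from 'vitest'

// Mock TanStack Query backend
// seedData is the shared fixture; filterData is a helper (to be implemented)
// that applies where/orderBy/limit/offset to it in memory.
const mockBackend = {
  // vi.fn records every call, so tests can assert how often and with which
  // arguments the backend was hit
  fetchUsers: vi.fn(async ({ where, orderBy, limit, offset }) => {
    // Return filtered/paginated seed data
    return filterData(seedData.users, { where, orderBy, limit, offset })
  })
}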
```
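
A minimal sketch of what `filterData` could look like is shown below. It assumes `where` has already been turned into a plain predicate function and `orderBy` into a list of field/direction pairs; how the real query options are translated into that form is left open. Null ordering is not handled specially here.

```typescript
interface QueryOptions<T> {
  where?: (row: T) => boolean
  orderBy?: { field: keyof T; direction: 'asc' | 'desc' }[]
  limit?: number
  offset?: number
}

function compareValues(a: unknown, b: unknown): number {
  const x = a instanceof Date ? a.getTime() : a
  const y = b instanceof Date ? b.getTime() : b
  if (typeof x === 'number' && typeof y === 'number') return x - y
  return String(x) < String(y) ? -1 : String(x) > String(y) ? 1 : 0
}

function filterData<T>(
  rows: readonly T[],
  { where, orderBy = [], limit, offset = 0 }: QueryOptions<T> = {}
): T[] {
  const filtered = where ? rows.filter(where) : [...rows]

  // Sort a copy so the shared seed data is never mutated between tests.
  const sorted = [...filtered].sort((a, b) => {
    for (const { field, direction } of orderBy) {
      const c = compareValues(a[field], b[field])
      if (c !== 0) return direction === 'asc' ? c : -c
    }
    return 0
  })

  // slice(offset, undefined) returns everything from offset onwards.
  return sorted.slice(offset, limit === undefined ? undefined : offset + limit)
}

// Usage: second page of size 2 among users younger than 40.
const seed = [
  { id: 1, age: 30 },
  { id: 2, age: 25 },
  { id: 3, age: 25 },
  { id: 4, age: 40 },
]
const secondPage = filterData(seed, {
  where: (row) => row.age < 40,
  orderBy: [
    { field: 'age', direction: 'asc' },
    { field: 'id', direction: 'desc' },
  ],
  limit: 2,
  offset: 1,
})
```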

## Implementation Checklist

### Phase 1: Infrastructure
- [ ] Create `@tanstack/db-collection-e2e` package
- [ ] Set up Docker Compose for Postgres + Electric
- [ ] Implement global setup with health checks
- [ ] Create test fixtures for DB and collections
- [ ] Define standard schema and seed data
- [ ] Create config interface and types

### Phase 2: Core Test Suites
- [ ] Implement Predicates Suite
- [ ] Implement Pagination Suite
- [ ] Implement Joins Suite
- [ ] Implement Deduplication Suite

### Phase 3: Additional Suites
- [ ] Implement Collation Suite
- [ ] Implement Mutations Suite
- [ ] Implement Regression Suite (known bugs)
- [ ] Implement Live Updates Suite (optional)

### Phase 4: Collection Integration
- [ ] Set up e2e tests for `electric-db-collection`
  - [ ] Electric-specific setup/teardown
  - [ ] Run all applicable suites
- [ ] Set up e2e tests for `query-db-collection`
  - [ ] Mock backend setup
  - [ ] Run all applicable suites (skip Electric-specific)

### Phase 5: CI/CD
- [ ] Verify execution time < 5 minutes
- [ ] Add to CI pipeline
- [ ] Document how to run locally

## Success Criteria

- [ ] All test suites pass for both `electric-db-collection` and `query-db-collection`
- [ ] Known bugs from early testing are caught by regression tests
- [ ] Deduplication verified via callback assertions
- [ ] Predicate pushdown verified (collections don't load extra data)
- [ ] Joins work correctly with mixed syncModes
- [ ] Pagination and ordering work correctly
- [ ] String collation respected
- [ ] Total execution time < 5 minutes
- [ ] Tests are reliable (no flakes)
- [ ] New collections can easily adopt the test suite

## Future Enhancements

- Progressive mode transition testing
- Performance benchmarks
- Subscription lifecycle edge cases (unsubscribe during load)
- Network failure scenarios
- More complex join patterns (self-joins, multiple paths)

## References

- **Electric e2e documentation**: See `README_E2E_TESTS.md`, `ELECTRIC_E2E_PATTERNS.md`, etc. in this repo
- **RFC #676**: https://github.com/TanStack/db/discussions/676
- **PR #763**: https://github.com/TanStack/db/pull/763
- **Electric TypeScript Client**: `~/programs/electric/packages/typescript-client`

---

## Notes for Implementation

1. **Start small**: Implement infrastructure + Predicates Suite first to validate approach
2. **Copy Electric patterns**: Leverage proven patterns from Electric's e2e setup
3. **Keep tests strict**: No retries, fast timeouts - rely on setup hooks for consistency
4. **Test through public API**: Don't expose internal state unless necessary
5. **Document as you go**: Add examples for future collections to reference

