Skip to content

Pagination.applyFilterAndPage surplus growth is too slow for low-match-rate filters #82

@jtnelson

Description

@jtnelson

Problem

Pagination.applyFilterAndPage uses a surplusFactor that grows linearly (++surplusFactor) when a database page yields zero filter matches. This means the page size grows as limit * 1, limit * 2, limit * 3, etc. When the filter match rate is very low (e.g., 1-5%), this linear ramp-up requires many round trips to the database before the fetch window is large enough to capture any matches. Each empty round trip is wasted I/O.

The inverse problem also exists: when a page does contain matches, the factor decreases by exactly 1 (--surplusFactor), ignoring how many matches were actually found. A page that matched 1 out of 100 records and a page that matched 99 out of 100 receive the same adjustment.

Concrete scenario

  • 10,000 records in the database
  • Filter match rate: 2% (200 records pass)
  • Requested page size (limit): 20

To fill one page of 20 matching records, the algorithm needs ~1,000 database records. With surplusFactor incrementing by 1 each empty page and page size = limit * surplusFactor:

  • Iteration 1: fetches 20 records, ~0 matches, factor becomes 2
  • Iteration 2: fetches 40 records, ~1 match, factor stays 2
  • Iteration 3: fetches 40 records, ~1 match, factor stays 2
  • ...many more iterations needed

This produces excessive database round trips that could be avoided.

Where it matters

This affects every code path that composes a client-side filter with pagination:

  • Audience framework: $checkIfVisible() is composed via .and() into the filter predicate for every find/load call, making every paginated Audience query subject to this behavior.
  • Runway.filter() / Runway.filterAny(): used when Criteria cannot be resolved server-side.
  • DatabaseInterface default methods: any find/load overload accepting both a Predicate filter and a Page.

Proposed Fix

Replace the linear surplusFactor increment with a match-rate-based calculation that estimates the page size needed to fill the remaining slots in one fetch.

Algorithm

After processing each database page, compute:

matchRate = matchesThisPage / unfilteredPageSize
remaining = limit - count
nextPageSize = ceil(remaining / matchRate)

This jumps directly to the right ballpark instead of crawling toward it.

Edge cases

Condition Handling
Zero matches on a page (matchRate = 0) Cannot compute a rate. Double the current page size as a geometric fallback. This is still significantly faster than +1.
Overshoot (calculated size is very large) Cap at a configurable maximum (e.g., limit * MAX_SURPLUS_FACTOR where MAX_SURPLUS_FACTOR could be 50 or similar) to prevent pathological memory/network usage.
Very small unfiltered page (e.g., 1 record) The rate estimate from a tiny sample is unreliable. The cap handles the worst case; subsequent pages will refine the estimate.

Pseudocode

Set<T> records = new LinkedHashSet<>();
int offset = page.offset();
int limit = page.limit();
int count = 0;
int skipped = 0;
int nextSize = limit;
page = Page.of(0, nextSize);

outer: while (count < limit) {
    Set<T> unfiltered = function.apply(page);
    if(unfiltered.isEmpty()) {
        break;
    }
    int matched = 0;
    for (T record : unfiltered) {
        if(filter.test(record)) {
            if(skipped < offset) {
                ++skipped;
            }
            else {
                records.add(record);
                ++count;
                ++matched;
                if(count == limit) {
                    break outer;
                }
            }
        }
    }
    int remaining = limit - count;
    if(matched == 0) {
        // No signal -- geometric growth as fallback
        nextSize = Math.min(nextSize * 2, limit * MAX_SURPLUS_FACTOR);
    }
    else {
        // Estimate based on observed match rate
        double matchRate = (double) matched / unfiltered.size();
        nextSize = (int) Math.ceil(remaining / matchRate);
        nextSize = Math.min(nextSize, limit * MAX_SURPLUS_FACTOR);
        nextSize = Math.max(nextSize, limit);
    }
    page = page.next().size(nextSize);
}
return records;

Semantic change

surplusFactor (a multiplier on limit) is replaced by nextSize (the actual page size to request). This is cleaner since the factor was only ever used in page.size(limit * surplusFactor).

Unit Test Plan

The existing PaginationTest.testApplyFilterAndPage covers the basic correctness case (50% match rate, iterating through all pages). The following additional test cases should be added:

1. testApplyFilterAndPageWithLowMatchRate

  • Setup: 1,000 items, filter passes ~2% (e.g., item % 50 == 0).
  • Verify: Page of 10 returns exactly the correct 10 items for each page. All pages are filled correctly across the full dataset.
  • Purpose: Exercises the match-rate estimation and geometric fallback.

2. testApplyFilterAndPageWithNoMatches

  • Setup: 100 items, filter rejects everything.
  • Verify: Returns an empty set (does not loop infinitely).
  • Purpose: Ensures the unfiltered.isEmpty() termination condition works correctly when the filter eliminates all records.

3. testApplyFilterAndPageWithAllMatches

  • Setup: 100 items, filter passes everything.
  • Verify: Pagination returns the same results as un-filtered pagination.
  • Purpose: Confirms no off-by-one errors when the filter is a no-op.

4. testApplyFilterAndPageMatchRateDropsMidStream

  • Setup: 200 items where the first 100 have a 50% match rate and the last 100 have a 2% match rate (e.g., items 1-100 match if even, items 101-200 match if divisible by 50).
  • Verify: Pages that span the boundary are still correctly filled.
  • Purpose: Validates that the rate-based estimate adapts when the match rate changes across pages.

5. testApplyFilterAndPageWithOffsetBeyondMatches

  • Setup: 50 items, filter passes 10. Page with offset > 10.
  • Verify: Returns an empty set.
  • Purpose: Ensures offset skipping terminates correctly when there are not enough matches.

6. testApplyFilterAndPageSingleItemPages

  • Setup: 20 items, limit = 1, filter passes ~50%.
  • Verify: Each single-item page returns the correct next matching item.
  • Purpose: Edge case where limit is 1 and the algorithm must advance correctly one match at a time.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions