
Inconsistent Page Size Handling with Large Documents #447

Open
Skylertodd opened this issue May 31, 2024 · 2 comments
Labels
bug Something isn't working help wanted Extra attention is needed up-for-grabs 🙏🏽 Happy to consider a pull review to address this issue

Comments

@Skylertodd

Apparent Behavior:
The continuation token skips records when fetching page 2.

When a large page size is combined with large documents, Cosmos DB can return fewer items than requested because of its response size limit (believed to be 4 MB). The iterator then fetches another page to make up the shortfall, so the total number of fetched items exceeds the requested page size. The continuation token advances by this higher number, but the extra records are discarded so that exactly the requested count is returned. As a result, subsequent pages skip the records that were discarded.

Steps to Reproduce / Example Scenario:

  1. Set a large page size (e.g., 50 items).
  2. Fetch documents where the combined size might exceed Cosmos DB's response size limit (4MB).
  3. Observe that Cosmos DB may return fewer items than requested (e.g., 43 instead of 50).
  4. The iterator fetches additional items (e.g., another 50), resulting in a total that exceeds the desired count.
  5. The extra items are discarded to match the requested count (e.g., 34 items are discarded so exactly 50 are returned).
  6. The continuation token reflects the total fetched items (e.g., 84), causing subsequent fetches to skip the discarded items.
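The steps above can be illustrated with a small, self-contained simulation (in Python, not the actual C# SDK; `backend_fetch`, `RESPONSE_CAP`, and the document IDs are all hypothetical stand-ins):

```python
# Hypothetical simulation of the paging bug described above: the backend caps
# each response by a size budget, the client tops up with extra fetches, trims
# the result to the requested count, but advances the continuation token by
# the total number of items fetched.

DOCS = list(range(120))   # 120 stand-in documents, identified by 0..119
RESPONSE_CAP = 43         # stand-in for the ~4 MB response size limit

def backend_fetch(offset, max_item_count):
    """Return at most max_item_count items, never more than the size cap allows."""
    return DOCS[offset:offset + min(max_item_count, RESPONSE_CAP)]

def buggy_get_page(offset, page_size):
    fetched = []
    while len(fetched) < page_size:
        batch = backend_fetch(offset + len(fetched), page_size)
        if not batch:
            break
        fetched.extend(batch)
    items = fetched[:page_size]           # extras are silently discarded...
    continuation = offset + len(fetched)  # ...but the token counts all of them
    return items, continuation

page1, token = buggy_get_page(0, 50)   # two capped fetches: 43 + 43 = 86 items
page2, _ = buggy_get_page(token, 50)   # resumes at 86, not at 50
missing = set(DOCS) - set(page1) - set(page2)  # documents 50..85 never returned
```

The exact counts here (a 43-item cap, a token of 86) are arbitrary; the point is that any gap between items fetched and items returned becomes a hole in the next page.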

Expected Behavior / Recommendation:

  1. Drain the entire iterator, potentially returning more items than requested; or
  2. Dynamically set MaxItemCount to the number of records still needed when continuing.
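A minimal sketch of recommendation 2, reusing the same hypothetical simulation rather than the actual SDK: shrink each requested batch to the number of records still needed, so nothing is fetched that would later have to be discarded.

```python
# Same hypothetical backend as in the reproduction sketch; not the real SDK.
DOCS = list(range(120))   # 120 stand-in documents, identified by 0..119
RESPONSE_CAP = 43         # stand-in for the ~4 MB response size limit

def backend_fetch(offset, max_item_count):
    return DOCS[offset:offset + min(max_item_count, RESPONSE_CAP)]

def fixed_get_page(offset, page_size):
    fetched = []
    while len(fetched) < page_size:
        remaining = page_size - len(fetched)  # ask only for what is still needed
        batch = backend_fetch(offset + len(fetched), remaining)
        if not batch:
            break
        fetched.extend(batch)
    # Nothing is discarded, so the continuation token matches what was returned.
    return fetched, offset + len(fetched)

page1, token1 = fixed_get_page(0, 50)       # 43 + 7 = exactly 50 items
page2, token2 = fixed_get_page(token1, 50)  # resumes at 50; no records skipped
```

Because the returned items and the token always agree, pages are contiguous even when the size cap splits a fetch.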

Reference Code:
DefaultRepository.cs#L50

@IEvangelist
Owner

That's a really interesting edge case, thank you very much for reporting here! Is this something you'd be interested in attempting to fix? I'd be more than happy to review a PR. Just let me know, thanks again @Skylertodd for such a thorough bug report.

@IEvangelist IEvangelist added bug Something isn't working help wanted Extra attention is needed up-for-grabs 🙏🏽 Happy to consider a pull review to address this issue labels Jun 1, 2024
@Skylertodd
Author

When I was initially investigating, I thought it would be easy to dynamically set the page size based on the remaining records; however, it appears that the FeedIterator is immutable, and it would take a bit of a refactor to separate out the continuation token, query, and paging information. If I find some time, I might toss something together. For now, I've worked around it by lowering the number of items requested per page.
