
Inconsistent Page Size Handling with Large Documents #447

Open
Skylertodd opened this issue May 31, 2024 · 2 comments
Labels
bug Something isn't working help wanted Extra attention is needed up-for-grabs 🙏🏽 Happy to consider a pull review to address this issue

Comments

@Skylertodd

Apparent Behavior:
The continuation token skips records when fetching page 2.

When a large page size is combined with large documents, Cosmos DB can return fewer items than requested because of its response size limit (believed to be 4 MB). The iterator then fetches another page to make up the shortfall, so the total number of fetched items exceeds the requested page size. The continuation token advances by this higher number, but the extra records are discarded so that exactly the requested count is returned. As a result, subsequent pages skip the records that were discarded.

Steps to Reproduce / Example Scenario:

  1. Set a large page size (e.g., 50 items).
  2. Fetch documents where the combined size might exceed Cosmos DB's response size limit (4MB).
  3. Observe that Cosmos DB may return fewer items than requested (e.g., 43 instead of 50).
  4. The iterator fetches additional items (e.g., another 50), resulting in a total that exceeds the desired count.
  5. The extra items are discarded to match the requested count (e.g., 34 items are discarded so exactly 50 are returned).
  6. The continuation token reflects the total fetched items (e.g., 84), causing subsequent fetches to skip the discarded items.
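The steps above can be illustrated with a small, self-contained simulation (in Python, not the actual C# SDK; `backend_fetch`, `RESPONSE_CAP`, and the document IDs are all hypothetical stand-ins):

```python
# Hypothetical simulation of the paging bug described above: the backend caps
# each response by a size budget, the client tops up with extra fetches, trims
# the result to the requested count, but advances the continuation token by
# the total number of items fetched.

DOCS = list(range(120))   # 120 stand-in documents, identified by 0..119
RESPONSE_CAP = 43         # stand-in for the ~4 MB response size limit

def backend_fetch(offset, max_item_count):
    """Return at most max_item_count items, never more than the size cap allows."""
    return DOCS[offset:offset + min(max_item_count, RESPONSE_CAP)]

def buggy_get_page(offset, page_size):
    fetched = []
    while len(fetched) < page_size:
        batch = backend_fetch(offset + len(fetched), page_size)
        if not batch:
            break
        fetched.extend(batch)
    items = fetched[:page_size]           # extras are silently discarded...
    continuation = offset + len(fetched)  # ...but the token counts all of them
    return items, continuation

page1, token = buggy_get_page(0, 50)   # two capped fetches: 43 + 43 = 86 items
page2, _ = buggy_get_page(token, 50)   # resumes at 86, not at 50
missing = set(DOCS) - set(page1) - set(page2)  # documents 50..85 never returned
```

The exact counts here (a 43-item cap, a token of 86) are arbitrary; the point is that any gap between items fetched and items returned becomes a hole in the next page.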

Expected Behavior / Recommendation:

  1. Drain the entire iterator, potentially returning more items than requested; or
  2. Dynamically set MaxItemCount to the number of records still needed when continuing.
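A minimal sketch of recommendation 2, reusing the same hypothetical simulation rather than the actual SDK: shrink each requested batch to the number of records still needed, so nothing is fetched that would later have to be discarded.

```python
# Same hypothetical backend as in the reproduction sketch; not the real SDK.
DOCS = list(range(120))   # 120 stand-in documents, identified by 0..119
RESPONSE_CAP = 43         # stand-in for the ~4 MB response size limit

def backend_fetch(offset, max_item_count):
    return DOCS[offset:offset + min(max_item_count, RESPONSE_CAP)]

def fixed_get_page(offset, page_size):
    fetched = []
    while len(fetched) < page_size:
        remaining = page_size - len(fetched)  # ask only for what is still needed
        batch = backend_fetch(offset + len(fetched), remaining)
        if not batch:
            break
        fetched.extend(batch)
    # Nothing is discarded, so the continuation token matches what was returned.
    return fetched, offset + len(fetched)

page1, token1 = fixed_get_page(0, 50)       # 43 + 7 = exactly 50 items
page2, token2 = fixed_get_page(token1, 50)  # resumes at 50; no records skipped
```

Because the returned items and the token always agree, pages are contiguous even when the size cap splits a fetch.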

Reference Code:
DefaultRepository.cs#L50

@IEvangelist
Owner

That's a really interesting edge case, thank you very much for reporting here! Is this something you'd be interested in attempting to fix? I'd be more than happy to review a PR. Just let me know, thanks again @Skylertodd for such a thorough bug report.

@IEvangelist IEvangelist added bug Something isn't working help wanted Extra attention is needed up-for-grabs 🙏🏽 Happy to consider a pull review to address this issue labels Jun 1, 2024
@Skylertodd
Author

When I was initially investigating, I thought it would be easy to dynamically set the page size based on the remaining records; however, it appears that the FeedIterator is immutable, and it would take a bit of a refactor to separate out the continuation token, query, and paging information. If I find some time, I might toss something together. For now, I've worked around it by lowering the number of items requested per page.
