
get_table_by_scope's pagination scheme can lead to endless request loop when scope contains over limit tables #615

Open
spoonincode opened this issue Aug 22, 2024 · 3 comments
Labels
discussion enhancement New feature or request

Comments

@spoonincode
Member

The get_table_by_scope interface has a deficiency in how it handles its `more` pagination scheme. Without a table filter, the returned `more` causes the next request to start at the first table in that scope again. This means that if a scope contains more tables than the limit (which is capped at 1000), the `more` returned will always equal the `lower_bound` requested. Repeat forever.

To illustrate this, consider scopes and tables of:

| Scope  | Table   |
| ------ | ------- |
| apple  | balance |
| banana | balance |
| banana | foo     |
| banana | foobar  |
| carrot | balance |

Now consider requests made with limit=2. The first request (with no lower_bound) will return apple:balance and banana:balance with more=banana. The next request will return banana:balance¹ and banana:foo with more=banana. And then it repeats. It's impossible to make further progress.

Using a table filter as part of the request doesn't entirely resolve the problem. Even in the above example, limit=2, lower_bound=banana, table=balance will cause an endless pagination loop, returning more=banana each time. This is because the limit (appropriately) applies to the number of tables walked, not the number of tables matched.
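
To make the stuck loop concrete, here is a minimal client-side sketch in Python. The /v1/chain/get_table_by_scope endpoint with its code/table/lower_bound/limit parameters and rows/more response fields is the existing API; the NODE_URL value, the use of the requests library, and the duplicate-cursor guard are illustrative assumptions.

```python
import requests  # any HTTP client works; requests is just an assumption

NODE_URL = "http://127.0.0.1:8888"  # hypothetical local nodeos endpoint


def walk_scopes(code: str, table: str = "", limit: int = 2):
    """Naively page through get_table_by_scope until `more` is empty.

    With the current scheme this loop can spin forever: when one scope
    holds more tables than `limit`, `more` only names that scope and the
    next request restarts at the scope's first table.
    """
    lower_bound = ""
    seen = set()
    while True:
        resp = requests.post(
            f"{NODE_URL}/v1/chain/get_table_by_scope",
            json={"code": code, "table": table,
                  "lower_bound": lower_bound, "limit": limit},
        ).json()
        yield from resp.get("rows", [])
        more = resp.get("more", "")
        if not more:
            return  # walked every scope:table pair
        # Illustrative guard, not part of the API: with the scheme
        # described above, `more` can repeat forever, so bail out
        # instead of looping endlessly.
        if more in seen:
            raise RuntimeError(f"pagination stuck at scope {more!r}")
        seen.add(more)
        lower_bound = more
```

In the apple/banana/carrot example above, the second request returns more=banana again and the guard trips; without it the client never terminates.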

This might be fixable within the currently exposed API by having `more` return a longer (possibly opaque) value representing scope+table, and then having `lower_bound` interpret that value as a scope+table instead of just a scope. `lower_bound` and `more` are strings, so this is allowed from that standpoint; but it's possible that clients out there are treating these as names instead.
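
As a rough sketch of that direction (the colon separator and the helper names are assumptions for illustration, not a committed design), the cursor string could carry both parts while a bare scope keeps its old meaning for existing clients:

```python
# Hypothetical composite cursor: pack scope and the next table into the
# string returned as `more`, e.g. "banana:foo". Scope and table are
# name-typed values that cannot contain ':', so a colon is a safe
# separator for this sketch.

def encode_cursor(scope: str, table: str) -> str:
    """Build a cursor naming the exact scope:table resume point."""
    return f"{scope}:{table}" if table else scope


def decode_cursor(cursor: str) -> tuple[str, str]:
    """Split a cursor into (scope, table).

    A bare scope (no colon) means "start at the first table in that
    scope", which preserves today's behavior for clients that still
    pass plain scope names as lower_bound.
    """
    scope, _, table = cursor.partition(":")
    return scope, table
```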

Footnotes

  1. Notice how banana:balance was returned twice: once in the first request and once in the second. That's undesirable behavior of the pagination scheme as well, but a client needs to be prepared for changes between pagination requests anyway, due to possible state changes between requests.

@spoonincode spoonincode changed the title get_table_by_scope's pagination scheme can lead to endless request loop when scope contains over limit rows get_table_by_scope's pagination scheme can lead to endless request loop when scope contains over limit tables Aug 22, 2024
@bhazzard bhazzard added this to the Spring v1.1.0 milestone Aug 22, 2024
@bhazzard bhazzard added discussion enhancement New feature or request and removed triage labels Aug 22, 2024
@bhazzard

bhazzard commented Sep 5, 2024

We think that prior to Leap v5.0.0 it was likely capped by time, so it is possible this was introduced in v5.0.0.

Kevin mentioned that an alternative approach, and perhaps the easiest one, would be to always finish out a scope once it is started. This could present potential memory issues, though.

Also, it is worth discussing whether this can/should be superseded by a solution based on read-only transactions.

The chosen solution targeting 1.1.x should be a non-breaking change, or, if a breaking change is preferred, we should target 2.0.0.

@heifner
Member

heifner commented Oct 29, 2024

get_table_by_scope returns 4 names (uint64_t) and 1 uint32_t per row. Since this is bounded, is there really any harm in always including the complete scope? Current RAM cost is 0.009547. Overhead per code,scope,table billed is 108 bytes, so 100K rows would cost over 100K. Returning 100K rows does not seem like a lot. The conversion to JSON is done in the http thread pool. This would only tie up the main thread while creating the 100K entries in a vector<tuple<uint64_t, uint64_t, uint64_t, uint64_t, uint32_t>>.

@spoonincode
Member Author

I think 256MiB of RAM is around 1 million tables though. Do we know the CPU time and memory overhead of dealing with something on that scale?
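
As a back-of-envelope check on that scale (the per-entry layout is taken from the comment above; the JSON expansion factor is an assumed ballpark, not a measurement):

```python
# Rough sizing for returning every table of a contract in one response.
ENTRY_BYTES = 4 * 8 + 4          # 4 names (uint64_t) + 1 uint32_t = 36 bytes
tables = 1_000_000               # ~1M tables in 256MiB of RAM, per above

raw_mib = tables * ENTRY_BYTES / (1024 ** 2)
# JSON output is much larger: every name becomes a quoted string plus
# repeated key names; 5-10x expansion is an assumption, not a measurement.
json_low, json_high = raw_mib * 5, raw_mib * 10

print(f"raw vector payload: ~{raw_mib:.0f} MiB")          # ~34 MiB
print(f"JSON response (assumed 5-10x): ~{json_low:.0f}-{json_high:.0f} MiB")
```

That gives roughly 34 MiB of raw entries and, under the assumed expansion factor, a few hundred MiB of JSON at the 1M-table scale, which is the kind of overhead the question above is asking about.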
