Use database instead of direct ACS for beginning of WS query, when enabled #6312

S11001001 · 2020-06-11T15:50:45Z

When Postgres is enabled, for standard HTTP query we:

1. update the database for that template ID and party (contractsFromOffsetIo) by either
   a. filling with (ACS ++ txStream-to-ledger-end) acsFollowingAndBoundary if no offset exists in database, or
   b. filling with txStream-to-ledger-end transactionsFollowingBoundary if offset exists;
2. save the ending offset to DB for future queries;
3. run the query against the database.

We ensure that the tx stream terminates by reading only to ledger end.
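To make the flow above concrete, here is a minimal in-memory sketch. It is not the JSON API implementation: `Ledger`, `Cache`, `Tx`, `catch_up` and `http_query` are hypothetical names, offsets are plain integers, and a single `Cache` stands in for the per-template-ID, per-party state held in Postgres.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Tx:
    """Hypothetical transaction: contracts created and archived at one offset."""
    offset: int
    created: List[Tuple[str, dict]] = field(default_factory=list)  # (contract id, payload)
    archived: List[str] = field(default_factory=list)


def apply_tx(contracts: Dict[str, dict], tx: Tx) -> None:
    for cid, payload in tx.created:
        contracts[cid] = payload
    for cid in tx.archived:
        contracts.pop(cid, None)


@dataclass
class Ledger:
    """Stand-in for the ledger API."""
    txs: List[Tx]

    def ledger_end(self) -> int:
        return self.txs[-1].offset if self.txs else 0

    def acs_and_boundary(self) -> Tuple[Dict[str, dict], int]:
        # Active contract set plus the offset at which the snapshot was taken.
        active: Dict[str, dict] = {}
        for tx in self.txs:
            apply_tx(active, tx)
        return active, self.ledger_end()

    def txs_between(self, after: int, up_to: int) -> List[Tx]:
        # Bounded read: stops at `up_to`, so the stream always terminates.
        return [t for t in self.txs if after < t.offset <= up_to]


@dataclass
class Cache:
    """Stand-in for the database state of one template ID and party."""
    contracts: Dict[str, dict] = field(default_factory=dict)
    offset: Optional[int] = None


def catch_up(ledger: Ledger, cache: Cache) -> int:
    end = ledger.ledger_end()
    if cache.offset is None:
        # Case (a): no saved offset, so start from the ACS and its boundary.
        snapshot, start = ledger.acs_and_boundary()
        cache.contracts = dict(snapshot)
    else:
        # Case (b): resume from the saved offset.
        start = cache.offset
    for tx in ledger.txs_between(start, end):
        apply_tx(cache.contracts, tx)
    cache.offset = max(start, end)  # step 2: save the ending offset
    return cache.offset


def http_query(ledger: Ledger, cache: Cache,
               predicate: Callable[[dict], bool]) -> List[str]:
    catch_up(ledger, cache)
    # Step 3: run the query against the (now current) cache.
    return sorted(cid for cid, p in cache.contracts.items() if predicate(p))
```

A second call to `http_query` only replays transactions after the saved offset, which is where the saving over refetching the ACS comes from.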

For WS query, we currently ignore whether Postgres is configured, and always use acsFollowingAndBoundary, which conveniently lets us stream from ledger API, through JSON API, to the user, with no intermediate reification.

We could do something similar for WS query:

1. Catch up the database to the initial ledger-end just as in (1), (2) above.
2. Initially return results by querying against DB, treating it as ACS for purposes of the JSON API semantics.
3. Start a tx stream from the offset that was just saved to DB, proceeding normally.

The client gets a uniform stream of (ACS-from-DB ++ (begin live data) ++ tx-from-ledger-API); there are no semantic changes from its perspective. This issue is merely a performance change.
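A rough sketch of that uniform stream, assuming the database has already been caught up. Everything here is hypothetical: `ws_query`, the `(kind, value)` event tuples and the `"live"` marker are illustrative stand-ins, not the JSON API's message format.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Set, Tuple

# (offset, created [(contract id, payload)], archived [contract id])
LiveTx = Tuple[int, List[Tuple[str, dict]], List[str]]
Event = Tuple[str, object]


def ws_query(db_contracts: Dict[str, dict],
             db_offset: int,
             live_txs: Iterable[LiveTx],
             predicate: Callable[[dict], bool]) -> Iterator[Event]:
    matched: Set[str] = set()

    # Phase 1: answer from the DB, presented to the client as the ACS.
    for cid, payload in sorted(db_contracts.items()):
        if predicate(payload):
            matched.add(cid)
            yield ("created", cid)

    # Phase 2: marker separating the ACS part from live data.
    yield ("live", db_offset)

    # Phase 3: follow the ledger tx stream from the offset saved to DB,
    # filtering in memory as in the non-DB path.
    for offset, created, archived in live_txs:
        if offset <= db_offset:
            continue  # already reflected in the DB results
        for cid, payload in created:
            if predicate(payload):
                matched.add(cid)
                yield ("created", cid)
        for cid in archived:
            if cid in matched:
                matched.discard(cid)
                yield ("archived", cid)
```

The client cannot tell whether phase 1 came from the database or from the ledger's ACS endpoint, which is the "no semantic changes" property described above.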

Several potential variations arise with different implications. For example,

- You could never write to DB, only using it if data were present already; this would avoid the catch-up delay, but a JSON API deployment where only WS queries were used would never use DB.
- You could write the unfiltered stream to DB while filtering it in-memory and returning the results directly, "prefiring" subsequent queries on the same template/party, so to speak. But you'd still have to choose reasonable commit points arbitrarily while permitting retry. (We currently deal with constraint violations, which we expect, by just running the update again, which happens to always be the best way to satisfy them; that isn't appropriate here.)
- &c.
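For reference, the current "just run the update again" handling of constraint violations could look roughly like the sketch below. This is an assumption-laden illustration, not the actual code: `run_with_retry` is a hypothetical helper, and SQLite (`sqlite3.IntegrityError`) is swapped in for Postgres so the sketch runs without a server.

```python
import sqlite3
from typing import Callable


def run_with_retry(conn: sqlite3.Connection,
                   update: Callable[[sqlite3.Connection], None],
                   max_attempts: int = 3) -> int:
    """Run `update` in a transaction, rerunning it on constraint violations.

    Returns the number of attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # The connection context manager commits on success and
            # rolls back if the body raises.
            with conn:
                update(conn)
            return attempt
        except sqlite3.IntegrityError:
            # Expected when a concurrent update already wrote some rows;
            # a rerun re-reads the saved state and skips them.
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")
```

This works for the catch-up because a rerun is idempotent. With writes interleaved into a live stream, results would already have been sent to the client, so replaying from the last commit point is not a drop-in fix.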

These alter one informal invariant: in any given HTTP or WS request, we use either the in-memory contract query interpreter or the SQL predicate contract query interpreter, never both. This would use the latter for the ACS and the former for the transaction stream to follow.
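To illustrate what "two interpreters" means, here is a toy version in which one query is interpreted both ways. It is only a sketch under heavy assumptions: it handles nothing but equality on top-level string fields, `to_predicate` and `to_sql` are hypothetical names, and the SQL fragment assumes a `payload` JSON column queried with Postgres's `->>` operator rather than reflecting the JSON API's real schema.

```python
from typing import Callable, Dict, List, Tuple

Query = Dict[str, str]  # field name -> required string value


def to_predicate(query: Query) -> Callable[[dict], bool]:
    """In-memory interpreter: used on contracts arriving from the tx stream."""
    return lambda payload: all(payload.get(k) == v for k, v in query.items())


def to_sql(query: Query) -> Tuple[str, List[str]]:
    """SQL interpreter: a WHERE fragment plus bind parameters, for the DB."""
    clauses: List[str] = []
    params: List[str] = []
    for key, value in sorted(query.items()):
        clauses.append("payload ->> %s = %s")
        params.extend([key, value])
    where = " AND ".join(clauses) if clauses else "TRUE"
    return where, params
```

In the hybrid WS query, the `to_sql` side would answer the initial ACS portion and the `to_predicate` side would filter the live transactions, so the two must agree on every query for the stream to stay consistent.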

Also to be considered: whether this is worth doing.

leo-da · 2020-07-23T15:31:07Z

We will get back to this issue after we review the results of #6675

cocreature · 2020-10-30T11:19:43Z

We have the benchmarks now to show that the PostgreSQL backend is indeed faster. There are two further factors, only hinted at in the benchmarks, that make this even more important:

- We have seen a few users with very large (> 100k) ACS and queries that only target a very small subset of that. Postgres with the right indices will do a much better job here than loading the entire ACS in memory and querying it there.
- Our benchmarks focus on single queries. For setups with more clients, you can easily put a lot of load on the ledger and slow everything down. Using the DB as a cache to avoid that significantly reduces load on the ledger.

As for the 2 questions above, I think we should do the catch-up at the beginning. Most use cases I've seen so far have relatively few changes but large ACS sizes, so this isn't too expensive and it avoids falling too far behind. As for writing while streaming, I don't have a great answer. I would probably opt to do this in a separate step, if at all. I think you can make a reasonable argument that in web applications you will commonly have relatively short-lived connections, so even if you don't persist while streaming you don't fall far behind. But in principle, persisting while streaming would be nice, so if we can figure out how to make it work without turning it into a mess, I'm happy to add it.

S11001001 added the discussion and component/json-api labels Jun 11, 2020

S11001001 added this to the HTTP JSON API Maintenance milestone Jun 11, 2020

S11001001 self-assigned this Nov 30, 2020

S11001001 added the wip-issue label Nov 30, 2020

S11001001 mentioned this issue Dec 9, 2020 in hybrid database-then-gRPC websocket query #8226 (merged)

S11001001 closed this as completed in #8226 Jan 14, 2021

stefanobaghino-da added the team/ledger-clients label Aug 31, 2021

S11001001 removed the wip-issue label Nov 2, 2021
