Use database instead of direct ACS for beginning of WS query, when enabled #6312

Closed
S11001001 opened this issue Jun 11, 2020 · 2 comments · Fixed by #8226
Labels: component/json-api (HTTP JSON API), discussion (Things to be discussed and decided), team/ledger-clients (Related to the Ledger Clients team's components)

Comments

S11001001 commented Jun 11, 2020

When Postgres is enabled, for a standard HTTP query we:

  1. update the database for that template ID and party (contractsFromOffsetIo) by either
    a. filling with (ACS ++ txStream-to-ledger-end) acsFollowingAndBoundary if no offset exists in database, or
    b. filling with txStream-to-ledger-end transactionsFollowingBoundary if offset exists;
  2. save the ending offset to DB for future queries;
  3. run the query against the database.

We ensure that the tx stream terminates by reading only to ledger end.
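The three steps above can be sketched as follows. This is a minimal in-memory illustration, not the JSON API implementation: `FakeLedger`, `FakeDb`, `catch_up`, and `http_query` are hypothetical names, offsets are plain integers, and the per-template-ID and per-party bookkeeping is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeLedger:
    """Hypothetical stand-in for the ledger API: an append-only event log."""
    events: list = field(default_factory=list)  # (kind, contract_id, payload)

    def ledger_end(self):
        return len(self.events)

    def acs(self):
        """Return (active contracts, offset at which the snapshot was taken)."""
        active = {}
        apply_events(active, self.events)
        return active, len(self.events)

    def tx_stream(self, begin, end):
        """Bounded stream: it terminates because `end` is fixed up front."""
        return self.events[begin:end]


@dataclass
class FakeDb:
    """Hypothetical stand-in for the contract table plus the saved offset."""
    contracts: dict = field(default_factory=dict)
    offset: Optional[int] = None


def apply_events(contracts, events):
    for kind, cid, payload in events:
        if kind == "create":
            contracts[cid] = payload
        else:  # "archive"
            contracts.pop(cid, None)


def catch_up(db, ledger):
    if db.offset is None:
        # (1a) no offset in the DB: start from the ACS and its offset
        db.contracts, begin = ledger.acs()
    else:
        # (1b) offset exists: only the transactions after it are needed
        begin = db.offset
    end = ledger.ledger_end()  # read once, so the stream below is finite
    apply_events(db.contracts, ledger.tx_stream(begin, end))
    db.offset = end  # (2) save the ending offset for future queries
    return end


def http_query(db, ledger, predicate):
    catch_up(db, ledger)
    # (3) answer the query from the database only
    return {cid: p for cid, p in db.contracts.items() if predicate(p)}
```

A second call to `http_query` on the same `FakeDb` takes branch (1b) and only replays the events added since the saved offset.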

For a WS query, we currently ignore whether Postgres is configured and always use acsFollowingAndBoundary, which conveniently lets us stream from the ledger API, through the JSON API, to the user, with no intermediate reification.

We could do something similar for a WS query:

  1. Catch up the database to the initial ledger end, just as in (1) and (2) above.
  2. Initially return results by querying against DB, treating it as ACS for purposes of the JSON API semantics.
  3. Start a tx stream from the offset that was just saved to DB, proceeding normally.

The client gets a uniform stream of (ACS-from-DB ++ (begin live data) ++ tx-from-ledger-API); there are no semantic changes from its perspective. This issue is merely a performance change.
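A rough sketch of the proposed WS flow, under the same kind of simplifying assumptions: the DB is a plain dict, the ledger is a list of events, a missing offset is handled by replaying from offset 0 rather than by fetching the ACS, and the event tags `"created"`, `"archived"`, and `"live"` are illustrative names, not the JSON API wire format.

```python
def apply_events(contracts, events):
    for kind, cid, payload in events:
        if kind == "create":
            contracts[cid] = payload
        else:  # "archive"
            contracts.pop(cid, None)


def ws_query(db, ledger, predicate):
    """Yield (ACS-from-DB ++ live marker ++ tx-from-ledger) for one query."""
    # 1. Catch up the DB to the ledger end observed right now.
    begin = db["offset"] if db["offset"] is not None else 0
    end = len(ledger)
    apply_events(db["contracts"], ledger[begin:end])
    db["offset"] = end

    # 2. Serve the initial results from the DB, treating it as the ACS.
    matched = set()
    for cid, payload in list(db["contracts"].items()):
        if predicate(payload):
            matched.add(cid)
            yield ("created", cid, payload)
    yield ("live", end, None)  # boundary: everything after this is live data

    # 3. Continue with the transaction stream from the offset just saved,
    #    filtering in memory. A finite slice stands in for the open stream.
    for kind, cid, payload in ledger[end:]:
        if kind == "create" and predicate(payload):
            matched.add(cid)
            yield ("created", cid, payload)
        elif kind == "archive" and cid in matched:
            matched.discard(cid)
            yield ("archived", cid, None)
```

Nothing is written to the DB during step 3 in this sketch, so the saved offset stays at the initial ledger end for the lifetime of the connection.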

Several potential variations arise with different implications. For example,

  • You could never write to the DB, only using it if data were already present; this would avoid the catch-up delay, but a JSON API deployment where only WS queries were used would never use the DB.
  • You could write the unfiltered stream to the DB while filtering it in memory and returning the results directly, "prefiring" subsequent queries on the same template/party, so to speak. However, you would have to choose reasonable commit points arbitrarily while still permitting retry. (We currently deal with constraint violations, which we expect, by just running the update again, which happens to always be the best way to satisfy them; that isn't appropriate here.)
  • &c.
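For reference, the "just run the update again" handling of expected constraint violations mentioned in the second bullet could look roughly like this. It is a sketch using SQLite from the Python standard library in place of Postgres; the table layout, `update_once`, `update_with_retry`, and the `before_write` hook (which only exists to simulate a competing updater) are all assumptions made for illustration.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contract (contract_id TEXT PRIMARY KEY, payload TEXT)")
    conn.execute(
        "CREATE TABLE ledger_offset (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.execute("INSERT INTO ledger_offset VALUES (1, 0)")
    conn.commit()
    return conn


def update_once(conn, ledger, before_write=None):
    """Insert contracts from the saved offset to the ledger end, then save it."""
    (offset,) = conn.execute(
        "SELECT value FROM ledger_offset WHERE id = 1").fetchone()
    end = len(ledger)
    if before_write is not None:
        before_write()  # illustration hook: a competing updater runs here
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany("INSERT INTO contract VALUES (?, ?)", ledger[offset:end])
        conn.execute("UPDATE ledger_offset SET value = ? WHERE id = 1", (end,))
    return end


def update_with_retry(conn, ledger, attempts=3, before_write=None):
    for attempt in range(attempts):
        try:
            hook = before_write if attempt == 0 else None
            return update_once(conn, ledger, hook)
        except sqlite3.IntegrityError:
            # Expected: someone else already wrote these rows. Rerunning
            # rereads the newer offset, so nothing is inserted twice.
            continue
    raise RuntimeError("update kept conflicting")
```

Rerunning is safe here because nothing has been returned to the client before the update commits. Once results are streamed out while being written, a rerun would no longer be invisible to the client, which is why the bullet calls it inappropriate for that variation.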

These alter one informal invariant: in any given HTTP or WS request, we use either the in-memory contract query interpreter or the SQL predicate contract query interpreter, never both. This would use the latter for the ACS and the former for the transaction stream to follow.
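To make the two interpreters concrete, here is a small sketch in which one query is interpreted both ways. It assumes a flat, hypothetical payload with `owner` and `currency` fields, equality-only queries, and SQLite in place of Postgres; `to_predicate`, `to_sql`, and `acs_from_db` are illustrative names, not JSON API functions.

```python
import sqlite3

FIELDS = ("owner", "currency")  # hypothetical payload fields


def to_predicate(query):
    """In-memory interpreter: query -> predicate over a payload dict."""
    return lambda payload: all(payload.get(k) == v for k, v in query.items())


def to_sql(query):
    """SQL interpreter: query -> (WHERE clause, bound parameters)."""
    for key in query:
        if key not in FIELDS:  # only known column names reach the SQL text
            raise ValueError(f"unknown field: {key}")
    if not query:
        return "1 = 1", []
    clause = " AND ".join(f"{key} = ?" for key in query)
    return clause, list(query.values())


def acs_from_db(conn, query):
    """Initial results via the SQL interpreter, as the proposal suggests."""
    where, params = to_sql(query)
    sql = f"SELECT contract_id FROM contract WHERE {where} ORDER BY contract_id"
    return [row[0] for row in conn.execute(sql, params)]
```

Under the proposal, a single WS request would call something like `acs_from_db` for the initial results and then apply `to_predicate(query)` to each live transaction, so the two interpreters must agree on every query they both accept.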

Also to be considered: whether this is worth doing.

@S11001001 S11001001 added discussion Things to be discussed and decided component/json-api HTTP JSON API labels Jun 11, 2020
@S11001001 S11001001 added this to the HTTP JSON API Maintenance milestone Jun 11, 2020
leo-da commented Jul 23, 2020

We will get back to this issue after we review the results of #6675

cocreature commented

We have the benchmarks now to show that the PostgreSQL backend is indeed faster. There are two further factors, only hinted at in the benchmarks, that make this even more important:

  1. We have seen a few users with very large (> 100k) ACS and queries that target only a very small subset of it. Postgres with the right indices will do a much better job here than loading the entire ACS into memory and querying it there.

  2. Our benchmarks focus on single queries. For setups with more clients, you can easily put a lot of load on the ledger and slow everything down. Using the DB as a cache to avoid that significantly reduces load on the ledger.

As for the 2 questions above, I think we should do the catch-up at the beginning. Most use cases I’ve seen so far have relatively few changes but large ACS sizes, so this isn’t too expensive and it avoids falling too far behind. As for writing while streaming, I don’t have a great answer. I would probably opt to do this in a separate step, if at all. I think you can make a reasonable argument that web applications commonly have relatively short-lived connections, so even if you don’t persist while streaming you don’t fall far behind. But in principle, persisting while streaming would be nice, so if we can figure out how to make it work without turning it into a mess, I’m happy to add it.

@S11001001 S11001001 self-assigned this Nov 30, 2020
@S11001001 S11001001 added the wip-issue This issue is being worked on. Use draft PRs for work in progress PRs. label Nov 30, 2020
@stefanobaghino-da stefanobaghino-da added the team/ledger-clients Related to the Ledger Clients team's components. label Aug 31, 2021
@S11001001 S11001001 removed the wip-issue This issue is being worked on. Use draft PRs for work in progress PRs. label Nov 2, 2021