
[RFC] Storing Client Data #2901

Open · wants to merge 35 commits into base: develop

Conversation

@wu-s-john
Copy link
Contributor

wu-s-john commented Jul 18, 2019

Checklist:

  • Tests were added for the new behavior
  • All tests pass (CI will check this if you didn't)
  • Serialized types are in stable-versioned modules
  • Does this close issues? List them:

Closes #0000
Closes #0000

@enolan

This comment has been minimized.

Copy link
Contributor

enolan commented Jul 18, 2019

looking

@psteckler

This comment has been minimized.

Copy link
Contributor

psteckler commented Jul 18, 2019

I wonder if this should live outside the client, because it adds complexity that not all users will want.

@bkase

This comment has been minimized.

Copy link
Contributor

bkase commented Jul 21, 2019

This is a good idea @psteckler ! It would force us to expose a powerful enough API to build this thing as well

@psteckler

This comment has been minimized.

Copy link
Contributor

psteckler commented Jul 22, 2019

> This is a good idea @psteckler ! It would force us to expose a powerful enough API to build this thing as well

In fact, I wonder if the current executable should be split into codad and coda-cli. There could be an API in the daemon for loading plugins, like one to store client data. That way, the daemon proper could be narrowly focused on enforcing the protocol.

@wu-s-john

This comment has been minimized.

Copy link
Contributor Author

wu-s-john commented Jul 22, 2019

> This is a good idea @psteckler ! It would force us to expose a powerful enough API to build this thing as well
>
> In fact, I wonder if the current executable should be split into codad and coda-cli. There could be an API in the daemon for loading plugins, like one to store client data. That way, the daemon proper could be narrowly focused on enforcing the protocol.

I agree. We should have a separate process for doing client queries and storing data. However, it makes the code more complicated: it's convenient to run these queries in the same process because they hit the in-memory transition frontier and transaction pool. I am thinking about how to overcome these issues and the other tradeoffs.

@psteckler

This comment has been minimized.

Copy link
Contributor

psteckler commented Jul 22, 2019

> it's nice to do these queries on the same process

Right, that's a reason a plugin API might be helpful here.

@nholland94

This comment has been minimized.

Copy link
Contributor

nholland94 commented Jul 22, 2019

+1 for this living outside

| Unknown (* Could not compute status of transaction i.e. We don't have a record of it *)
```

Notice that `consensus_status` is a block's position in Nakamoto consensus. This position has nothing to do with a block being snarked or not, so `Snarked` is not a constructor of this variant.

This comment has been minimized.

Copy link
@cmr

cmr Jul 22, 2019

Contributor

Nakamoto consensus? What?

This comment has been minimized.

Copy link
@wu-s-john

wu-s-john Jul 22, 2019

Author Contributor

Lol. Nakamoto consensus basically means your consensus state is represented as a tree, like in Bitcoin and in our protocol. Other distributed-systems consensus algorithms use some flavor of Byzantine consensus, which keeps a pointer to the current state of the system and a pointer to the next candidate state.

So in the context of this RFC, the Nakamoto consensus status represents a block's position in the consensus tree.

@bkase bkase force-pushed the master branch from f0cc344 to 056d020 Jul 30, 2019
@bkase bkase changed the base branch from master to develop Jul 30, 2019
@wu-s-john wu-s-john requested review from figitaki and Schmavery as code owners Aug 1, 2019
wu-s-john added 5 commits Aug 1, 2019
Copy link
Contributor

nholland94 left a comment

You have two different pngs for the client_process diagram. Can you delete client_process.png so that we only have the one that is generated through the makefile command?

Copy link
Contributor

nholland94 left a comment

I have a few issues with the approach outlined in the RFC, as well as with the structure of this RFC for tackling the problem at hand. Firstly, this RFC is too large in scope. There are multiple decisions being outlined and detailed in this RFC, and I think it will be easier to discuss if we break it up into two parts. Tentatively, for this comment, I am going to break up my feedback and criticism into two sections: Architecture and API (I believe this would also be the best division for splitting the RFC into two discussions).

Architecture

Premature Optimization/Overengineering

The implementation that is laid out for solving this problem is extremely complex. Caching, in particular, is very hard to get correct, even harder to debug, and in the scope of this problem, very complex. And, at this point in time, we don't actually know if we need caching for this. Who is to say that reading from a db is too slow and that we need caching here? Many APIs and webservers exist and scale without adding in-process caching, and there are many more modular solutions for adding caching in front of database requests that are typically used before considering adding caching to the server itself. For us in particular, this API isn't meant to scale nearly as much as a web API since, under the current model, each node will only have one wallet client communicating with it.

I'm not saying the model you have outlined is wrong or won't work, but rather that it's very complex and likely overkill for what you are trying to achieve. It's better to do this kind of thing incrementally: start with a system that just persists to a relational database, and add caching only when you need it and only where you need it. Who knows, maybe none of the stuff you have implemented caching for right now actually needs it, but later, as use cases change, we will find something new that really needs caching. It will be easier to add that later if we keep it simple now.

Modularity/Generalization of Architecture

The design you have specced out is very specific to surfacing a GraphQL API for live queries against the state of a node. However, I would suggest generalizing this architecture somewhat and making it more modular. For instance, it can be broken up into two parts: an archiver process (which subscribes to information from the daemon and persists it to disk) and an API process (which serves an API format from the on-disk database). There are a number of advantages to this way of architecting it, which I will outline below.

Abstraction Reuse

Right now, our goal is to get an API for our wallet. However, we have other long-term goals related to archiving information from a node and providing interfaces to query that information. We want to provide services for long-term storage of network activity (for monitoring, analysis, block exploring, etc.), and we want to build something for providing receipts as well. The archiver process can be thought of generically across all these use cases, and can easily be made configurable through eviction rules and data filtering. For instance, for a long-term archiver, there would be no eviction rules and no data would be filtered. For the wallet API, the eviction rules would evict information that is invalidated and not important to the wallet user, and the data would be filtered to only information relevant to the wallet keys it knows about.

Community Control

Under this model, where the API server is separate and only serves information from a persistent database, anyone can make an API in any format surfacing the information in our wallet. They can even directly access the underlying DB if they wish. This is great for our community members because, even though GraphQL is popular and I think we should focus on providing a GraphQL API early on, different developers may have different needs or wants. This also creates a great avenue for easy work that first-time devs can pick up, so that's a plus as well.

Incremental Implementability

With this separation, we can focus on implementing and testing the archiver first, on its own. If we want, we can even write applications against this early on to test this lower-level layer of our data architecture. If the underlying DB we use is SQL based, then we can just use a tool like https://github.com/rexxars/sql-to-graphql to generate a GraphQL server for us from the SQL schema we have, meaning we get a GraphQL API for free. If we want more custom queries, we can always implement something ourselves, but using a generated server as a basis will probably make this a lot easier. We can just make a dead simple node app that extends a pregenerated GraphQL-to-SQL server with some extra functionality, making the code for the custom GraphQL API very small.

Integration with Existing Abstractions

This RFC does not discuss much detail around how the "client process" receives information from the daemon process. In particular, I think the "client process" should receive information from the daemon through a transition frontier extension, similar to how transition frontier persistence works. This is hinted at in the document, but I would appreciate a more explicit design. Should it share an extension with persistence? Should it live behind a diff buffer, similar to persistence? What are the flushing rules for that buffer? Do they match the persistence rules, or do we need two diff buffers in memory? etc.

API

Subscriptions

A use case for our GraphQL API that has been discussed in the past is subscriptions. Do you intend to support that under your model? That decision could have ramifications for my suggestion to split the archiver from the API server.

Database Considerations

This is mentioned in one of my comments (we should have that discussion there instead of on this comment thread), but I think you should make a stronger argument for the SQLite choice. I personally think we should choose a more robust and performant SQL DB, such as PostgreSQL or MariaDB/MySQL.

Other criticism

Most of my other questions and criticism for the API are covered in individual comments on the RFC document.

Etc Nitpicks

  • "client process" is a rather generic name that I find confusing (client is very overloaded); consider other names such as "wallet API process"
  • you usually use the word "client" where what I think you mean is "server clients talk to"
  • it should be made clear that all of this is optional to run and should not be required as part of running a coda daemon since this is not necessarily following the O(1) data growth rule
state_hash: string;
creator: Public_key.Compressed.t
protocol_state: Protocol_state.t [not null]
is_snarked bool [not null]

This comment has been minimized.

Copy link
@nholland94

nholland94 Aug 1, 2019

Contributor

This field does not make sense to me. If you are referring to the protocol state being snarked, all of them have already been snarked since you need to snark it before you submit the external transition for it. If you are referring to the transactions in an external transition being included in a ledger proof that makes it into the blockchain proof, then this field still doesn't make sense because one set of transactions from a block is not necessarily included in the same proof (some txns may be proven from the external transition while others are not).

This comment has been minimized.

Copy link
@nholland94

nholland94 Aug 2, 2019

Contributor

FWIW, you could attempt to quantify transactions into "proof batches" based on how they are added to the staged ledger diff, and keep track of the current height of "completed proof batches" based on proofs that have been included into the blockchain proof via an emitted ledger proof. That would help you to easily determine whether a transaction has been proven or not in a simple O(1) queryable way.
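For illustration, here is a minimal SQL sketch of the proof-batch idea described above; every table and column name is an assumption for the sketch, not something taken from the RFC:

```sql
-- Sketch only: tag each user command with the proof batch it was staged under,
-- and keep a single counter of the highest batch whose proof has landed in the
-- blockchain proof. "Is this transaction snarked?" then becomes one comparison.
CREATE TABLE proof_batches (
  id           INTEGER PRIMARY KEY,
  batch_height INTEGER NOT NULL          -- position of this batch in staging order
);

-- Assumes user_commands carries a proof_batch_id column referencing proof_batches.
SELECT uc.id,
       pb.batch_height <= :completed_batch_height AS is_snarked
FROM user_commands uc
JOIN proof_batches pb ON pb.id = uc.proof_batch_id;
```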

This comment has been minimized.

Copy link
@wu-s-john

wu-s-john Aug 2, 2019

Author Contributor

Yes, I meant snarked transactions (i.e. the transactions are snarked when a ledger proof is emitted). Sorry for the confusion.

This comment has been minimized.

Copy link
@nholland94

nholland94 Aug 2, 2019

Contributor

In that case, you will need to change this structure to something like what I described above since the transactions in a transition are not necessarily snarked in the same proof.

}
```

Additionally, clients should be able to know which `external_transition`s contain a transaction. These transitions can give us useful information about a transaction, like the number of block confirmations on top of a certain transaction and the "status" of a transaction. Therefore, a client should be able to store a record of the external_transitions it is interested in and look them up easily in a container. Here is the type of an external_transition:

This comment has been minimized.

Copy link
@nholland94

nholland94 Aug 1, 2019

Contributor

What is your plan for tracking/querying block confirmations in O(1)?

This comment has been minimized.

Copy link
@nholland94

nholland94 Aug 2, 2019

Contributor

To be more specific here, I see that you are tracking consensus status on each transition, but how do you plan on updating that consensus status without paying at least O(k) on every transition you add to the database?

This comment has been minimized.

Copy link
@wu-s-john

wu-s-john Aug 2, 2019

Author Contributor

I forgot to mention that. Thanks for pointing it out. Every time a block is added onto the transition_frontier, each ancestor block will have its block confirmation number incremented. This block confirmation number will be added as a mutable field to the client breadcrumb.

This comment has been minimized.

Copy link
@nholland94

nholland94 Aug 2, 2019

Contributor

Does that mean that you are paying O(n) for the entire chain every time you add a transition? That won't scale.

| Failed (* frontier removed transition and never reached finality *)
| Confirmed (* Transition reached finality by passing the root *)
| Pending of int (* Transition is somewhere in the frontier and has a block confirmation number *)
| Unknown (* Could not compute status of transaction i.e. We don't have a record of it *)

This comment has been minimized.

Copy link
@nholland94

nholland94 Aug 1, 2019

Contributor

I don't understand this unknown state. Also, I think you mean external transition here, not transaction.

This comment has been minimized.

Copy link
@wu-s-john

wu-s-john Aug 2, 2019

Author Contributor

Yes, I mean transition. Unknown means that we do not know the state of the transition. We can be offline for a long time and then bootstrap. When we bootstrap, we do not know the states of the transitions in our transition_frontier, so their status is set to Unknown.

This comment has been minimized.

Copy link
@nholland94

nholland94 Aug 2, 2019

Contributor

I see, so it's for transitions that you can't determine whether they are connected to the current chain or not?

protocol_state: Protocol_state.t [not null]
is_snarked bool [not null]
status consensus_status [not null]
time time [not null]

This comment has been minimized.

Copy link
@nholland94

nholland94 Aug 1, 2019

Contributor

Is this time received? We already have time created on the blockchain state object.

module External_transition = struct
state_hash: string;
creator: Public_key.Compressed.t

This comment has been minimized.

Copy link
@nholland94

nholland94 Aug 1, 2019

Contributor

This is not currently accessible without changes to the protocol, afaik. (doesn't mean we shouldn't add it, just that we should specify it as a dependency of this work)

| Unknown
```

We also have the `receipt_chain_hash` object, which is used to prove that a `user_command` made it into a block. It does this by forming a Merkle list of `user_command`s from some `proving_receipt` to a `resulting_receipt`. The `resulting_receipt` is typically the `receipt_chain_hash` of a user's account on the best tip ledger. More info about `receipt_chain_hash` can be found in this [RFC](rfcs/0006-receipt-chain-proving.md) and this [OCaml file](../src/lib/receipt_chain_database_lib/intf.ml).

This comment has been minimized.

Copy link
@nholland94

nholland94 Aug 1, 2019

Contributor

This is just a minor context nit, but it's a little confusing to describe the resulting receipt as "typically the receipt_chain_hash of a user's account on the best tip ledger" because it may exist in different ledgers that were all the best tip ledger at one point of a local proposer's node.

This comment has been minimized.

Copy link
@wu-s-john

wu-s-john Aug 2, 2019

Author Contributor

Yes, that was a mistake I accidentally wrote. I meant to say it is computed from some ledger.

end
```

These containers should also have performant pagination queries, which essentially return a single contiguous range of sorted elements. `external_transitions` and `transactions` should be ordered by block length. Previously, we sorted these objects by time. Block length is arguably a better ordering for transitions because clients can receive transitions at different times, but what remains constant is the block length at which the transition appears in the blockchain. Additionally, we can sort transactions by the minimum block length of the `external_transition`s that they appear in. Typically, a client would expect to receive a transaction at the soonest block, which is the block with the smallest block length. If they do not appear in an `external_transition`, they will not appear in the sorting. A drawback with sorting `external_transitions` and transactions based on date is that if the client wants to add an arbitrary transaction and `external_transition` and persist them, then they would have the extra burden of tagging these objects with a timestamp for pagination. Block length alleviates this issue.

This comment has been minimized.

Copy link
@nholland94

nholland94 Aug 1, 2019

Contributor

I'm not sure I understand how you want to order transactions for the pagination. Transactions can be included at multiple block heights since they can be included in different forks. Would you analyze the current best chain and show transactions in order of their included block height only in relation to members of that chain? (That could be tricky to compute in a non-expensive way.)

This comment has been minimized.

Copy link
@nholland94

nholland94 Aug 1, 2019

Contributor

Should we consider supporting multiple orderings for external transitions and transactions rather than optimizing for both? We should be able to achieve good performance on multiple types of pagination queries by just using some creative indices in our database schema. I just wouldn't want to presume too much about how developers want to surface this information in their applications. Ideally, it would be nice to support arbitrary ordering, but I'm pretty sure that's impossible to do performantly, so we still need to choose which queries we want to support.
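As a concrete, purely illustrative example of the "creative indices" idea, assuming a `blocks` table with `block_length` and `time_received` columns:

```sql
-- One index per ordering we decide to support.
CREATE INDEX idx_blocks_by_length   ON blocks (block_length, id);
CREATE INDEX idx_blocks_by_received ON blocks (time_received, id);

-- Keyset pagination over block length: fetch the page after the last row seen.
-- The row-value comparison works in PostgreSQL and recent SQLite.
SELECT id, state_hash, block_length
FROM blocks
WHERE (block_length, id) > (:last_block_length, :last_id)
ORDER BY block_length, id
LIMIT 20;
```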


This section will discuss a proposal to make the querying done on a different process, which will be called the client process, and it will take advantage of in-memory data structures as well as persistent databases.

A persistent database that we would like to introduce into our codebase is `SQLite`. `SQLite` would be the best relational database for our use case because it has a permissive (public-domain) license and it's a very small binary (3.7MB on my Macbook). `SQLite` has a considerably good amount of OCaml support compared to the other databases that will be discussed, and it would be relatively easy to get `SQLite` working. There is even a nice OCaml library called [ocaml-sqlexpr](https://github.com/mfp/ocaml-sqlexpr) that enables us to embed type-safe `SQLite` queries into our OCaml code.

This comment has been minimized.

Copy link
@nholland94

nholland94 Aug 1, 2019

Contributor

Why are other databases not an option? I don't think SQLite is a terrible choice, but I don't find the argument for why we should use it over something like PostgreSQL that convincing. SQLite is more commonly chosen in circumstances where you want to embed a DB into an application that you want to distribute widely (for instance, including SQLite in browsers or iOS, where you can't run a heavyweight DB locally). But there is a very significant performance cost to choosing SQLite, and in our case it seems like performance is a priority. The small size of SQLite does not seem like a major advantage for us, and the licensing for DBs we don't embed doesn't actually matter since we won't be compiling or distributing them, so I don't see the benefit it gives us. I would like to see some more details arguing for why SQLite is worth it.

This comment has been minimized.

Copy link
@bkase

bkase Aug 2, 2019

Contributor

I think the original idea around SQLite was that we would have this in-process, or at least part of the distributable we ship for coda.

Given that this will live out of process and out of that distributable, you're right that we should consider other databases.

wu-s-john added 3 commits Aug 22, 2019
@bkase
bkase approved these changes Aug 26, 2019
@@ -0,0 +1,4470 @@
<!doctype html>

This comment has been minimized.

Copy link
@bkase

bkase Aug 26, 2019

Contributor

nit: added_time should be addedTime

input AddPaymentReceiptInput {
# Time that a payment gets added to another clients transaction database
# (stringified Unix time - number of milliseconds since January 1, 1970)
added_time: String!

This comment has been minimized.

Copy link
@bkase

bkase Aug 26, 2019

Contributor

here


The system should be agnostic about which relational database is used. There are different use cases for a light database, such as SQLite, and a heavier database, such as Postgres. Postgres should be used when we have an archive node or a block explorer trying to make efficient writes to store everything that it hears from the network. We would use SQLite when a user is downloading a wallet for the first time and only stores information about their wallets and their friends. SQLite would also be good to use if we are running a light version of Coda and only want to store a small amount of data. If a client wants to upgrade the database from SQLite to Postgres, then a migration tool would support this.

This comment has been minimized.

Copy link
@bkase

bkase Sep 5, 2019

Contributor

Don't think this is necessary to commit to building -- though maybe would be a nice grant


The yellow node represents the entire component that the node is in.
We would use SQLite when a user is downloading a wallet for the first time and only stores information about their wallets and their friends. SQLite would also be good to use if we are running a light version of Coda and only want to store a small amount of data. The downside of using SQLite is that subscriptions are not offered. To circumvent this issue, we can use a polling system that fetches data at a short interval (about 5 seconds). We can also create a wrapper process that enables clients to write subscriptions on mutations in SQLite.

This comment has been minimized.

Copy link
@bkase

bkase Sep 5, 2019

Contributor

It's probably not worth committing to the choice of sqlite over postgres for the wallet gui here either -- it won't affect the design choices we make for this system and I think we still need further discussion there


Whenever the daemon produces a new `Transition_frontier_diff`, it will batch them into the buffer, `transition_frontier_diff_pipe`. The buffer will have a max size. Once the size of the buffer exceeds the max size, it will flush the diffs to the archive process. To make sure that the archive process applies the diffs with a low amount of latency, the buffer will also flush the diffs at a short interval (like every 5 seconds). There will be a similar buffering process for the `transaction_pool`.
The archive process will then filter transactions in the `Transaction_pool_diff` and `Transition_frontier.Breadcrumb_added` based on what a client is interested in following. It will also compute the ids of each of these transactions, which are hashes.

Below is pseudocode for persisting a transaction when the archive process receives a `Transaction_pool_diff`:

This comment has been minimized.

Copy link
@bkase

bkase Sep 5, 2019

Contributor

I don't think you need to update the pseudocode, but the following is probably worth mentioning -- since the sql server we'll be interacting with is out-of-process all of the calls will be through Deferreds right? And for consistency's sake, we'll want to surround the sql actions in sql transactions to avoid bad db states (like around 339-343)

This comment has been minimized.

Copy link
@wu-s-john

wu-s-john Sep 5, 2019

Author Contributor

Right, all of the reads and writes will be deferreds. All the SQL actions will be wrapped in SQL transactions. After performing all the commands for the transaction, the entire transaction will be committed to the database.

This comment has been minimized.

Copy link
@wu-s-john

wu-s-john Sep 5, 2019

Author Contributor

Around 339-343, the transactions are committed at the end to avoid consistency issues.


#### Client Queries
The API process will also make subscriptions to changes in the daemon and changes in the database. The various subscriptions will be discussed in a later part of this RFC.

This comment has been minimized.

Copy link
@bkase

bkase Sep 5, 2019

Contributor

I think we wanted to avoid subscriptions directly to the daemon process if possible

This comment has been minimized.

Copy link
@wu-s-john

wu-s-john Sep 5, 2019

Author Contributor

The only thing that we need to subscribe to from the daemon is its sync status. The daemon can send updates of its sync status to the archive node, and the archive node can write them into a table.

This comment has been minimized.

Copy link
@nholland94

nholland94 Sep 5, 2019

Contributor

Daemon subscriptions are fine as long as we are careful, when we scale the API, about how many subscriptions will be open to the daemon process. @Schmavery had a good point about doing GraphQL subscription proxying to API servers to keep this cheap for the daemon. We should still be careful about what we put subscriptions on, i.e. only put subscriptions on things that update semi-infrequently. Sync status is a good example of something that meets this criterion.

As mentioned in the [Running Bootstrap section](#bootstrap), we would not be able to fully determine the `consensus_status` of the blocks that were in the transition_frontier. Worse, we would not be able to determine which blocks became snarked.

We can have a third-party archive node tell a client the consensus_state of the blocks. Another option is to create a P2P RPC call where peers can give a client a Merkle list proof of a state_hash and all the state_hashes of the blocks on top of that state_hash. The Merkle list should be of length `K`, the number of block confirmations that guarantees finality with a very high probability. If the Merkle list proof has size `K`, then the block is fully confirmed. Otherwise, the block failed.

This comment has been minimized.

Copy link
@bkase

bkase Sep 5, 2019

Contributor

One thing I just realized we didn't think about is what happens when there is a large reorg or your node goes offline for awhile and comes back later. What would happen to the database? How do we handle this?

This comment has been minimized.

Copy link
@wu-s-john

wu-s-john Sep 5, 2019

Author Contributor

By large reorgs, do you mean reorgs larger than K?

This comment has been minimized.

Copy link
@nholland94

nholland94 Sep 5, 2019

Contributor

The database schema should be defined in an incremental fashion such that bootstraps (which I think is what @bkase means by large reorgs) can just be handled as new diffs, without reasoning about the old tables and trying to update them. This is to say that all tables should be designed such that, should the information required to further update a table be lost, the information in that table is still consistent from the view of the queries we plan on supporting. This is tricky for some things like block finalization, so some thought needs to be put in to make that work in this context. I think the ability to recognize "UNKNOWN" or "LOST" states without rewriting old historical table data is important. Just something to think about wrt our SQL schema.

transactions
WHERE status = ADDED AND to = $RECEIVER
```
https://www.notion.so/codaprotocol/Proposed-Graphql-Commands-57a434c59bf24b649d8df7b10befa2a0

This comment has been minimized.

Copy link
@bkase

bkase Sep 6, 2019

Contributor

notion is private, we want this discussion to be public -- can you print this as a pdf and attach the pdf to this PR if you want to avoid the markdown

This comment has been minimized.

Copy link
@bkase

bkase Sep 6, 2019

Contributor

Comments on the notion:

  1. Doesn't blocks query need to be on the database not on the wallet?
  2. followUpdate doesn't this have to be where the filter lives? Doesn't the filter live on the daemon?

This comment has been minimized.

Copy link
@wu-s-john

wu-s-john Sep 6, 2019

Author Contributor

> notion is private, we want this discussion to be public -- can you print this as a pdf and attach the pdf to this PR if you want to avoid the markdown

Planning to do this. I just wanted to get my rough draft looked at before I share it to the public.

This comment has been minimized.

Copy link
@Schmavery

Schmavery Sep 6, 2019

Contributor

We should probably discuss at some point whether we want to be using a bunch of unions in simple status queries and how this will affect the queries on the consumer's end. For now, it seems like none of these changes are necessary in order for the scope of this rfc (storing client data) so we don't need to discuss it now necessarily.

Copy link
Contributor

Schmavery left a comment

As a takeaway of our meeting today, this seems mostly OK to me, though I think it should concern itself more with the design of the archive node than with the requirements of scaling the API, since it's unclear what the plans should be there long term anyway (whereas we seem to clearly understand the requirements of an archive node). This might mean we can also put off implementing the filter logic for now, though the approach seems fine to me if some day we find we need such functionality.


## Requirements

__NOTE: The types presented in OCaml code are disk representations of the types__

This comment has been minimized.

Copy link
@Schmavery

Schmavery Sep 6, 2019

Contributor

I'm probably missing something, but I don't know why this section of ocaml types exists in the context of this rfc. What are these data structures supposed to be used for?

This comment has been minimized.

Copy link
@wu-s-john

wu-s-john Sep 10, 2019

Author Contributor

The types are used to give a better illustration of the fields of each type of data that we are storing.

}
// Consensus_status is an ADT, but with its small state space, it can be coerced into an integer to make it easy to store in a database. Here are the states:
// - pending : The block is still in the transition frontier. Value: [0, (k-1)]

This comment has been minimized.

Copy link
@Schmavery

Schmavery Sep 6, 2019

Contributor

I guess this means that every node in the transition frontier gets updated whenever we add to the best tip / get a new root? Is Value=0 the root or the leaf? What do you picture this update query looking like?

This comment has been minimized.

Copy link
@wu-s-john

wu-s-john Sep 10, 2019

Author Contributor

Yes, exactly. The value represents the number of block confirmations. What do you mean, what does this query look like?
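To make Schmavery's question concrete, here is one possible shape for such an update, assuming a `blocks` table with `id`, `parent_id`, and an integer `status` column holding the pending confirmation count. It is purely illustrative, and it still touches every ancestor of the new block, which is the O(n)-per-block concern raised earlier:

```sql
-- Illustrative only: walk from the newly added block up to the root and bump
-- each ancestor's confirmation count by one.
WITH RECURSIVE ancestors AS (
  SELECT id, parent_id FROM blocks WHERE id = :new_block_id
  UNION ALL
  SELECT b.id, b.parent_id
  FROM blocks b
  JOIN ancestors a ON b.id = a.parent_id
)
UPDATE blocks
SET status = status + 1
WHERE id IN (SELECT id FROM ancestors)
  AND id <> :new_block_id;
```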


### Deleting data

For this design, we are assuming that objects will never get deleted because the client will only be following transactions involving a small number of public keys. These transactions cannot be deleted because they are components used to prove the existence of transaction. We could delete blocks that failed to reach finality and the transactions involving them. However, that is outside the scope of this RFC.

This comment has been minimized.

Copy link
@Schmavery

Schmavery Sep 6, 2019

Contributor

It doesn't sound like this assumption that the client will only be following transactions involving a small number of public keys makes sense, since this is for an archive node. Is some form of "garbage collection" of "failed" blocks in scope in that case or do we expect the size of that data to be comparatively small?

I don't understand "These transactions cannot be deleted because they are components used to prove the existence of transaction."


A drawback with sorting blocks and transactions based on the time that they are first seen is that if the client wants to store an arbitrary transaction and block, then they would have the extra burden of tagging these objects with a timestamp for pagination. `block_compare` alleviates this issue.

## SQL Database

This comment has been minimized.

Copy link
@Schmavery

Schmavery Sep 6, 2019

Contributor

Is the general idea here that the current transition frontier data lives alongside historical data? Are we worried about any performance implications there? (since the access/update patterns would presumably be very different between the two)

This comment has been minimized.

Copy link
@wu-s-john

wu-s-john Sep 10, 2019

Author Contributor

No. I am advocating for sorting transactions based on block_compare because the time a transaction is received is ambiguous: different nodes receive it at different times. So if we order transactions by first_received, different people can get inconsistent results even if they have the same set of transactions.

- The current implementation takes a lot of work and is quite limited to a certain use case.
- Other alternatives are discussed in the previous commits of this [RFC](https://github.com/CodaProtocol/coda/pull/2901)
- In an ideal world where we have an infinite amount of time, we would implement our own native graph database leveraging the `transition_frontier` as an in-memory cache. Namely, we would have all the blocks in the `transition_frontier` in the in-memory cache, along with nodes that are within one or two degrees of separation

This comment has been minimized.

Copy link
@Schmavery

Schmavery Sep 6, 2019

Contributor

I feel like another alternative is integrating the persistence system with the archival system in some way, or at least describing how they're related since they both involve saving transition frontier information to disk.

I think one or two reasons have been mentioned in person why this shouldn't be the solution we take but I always forget so it could be nice to call it out.


```ocaml
module Transition_frontier_diff = struct
  type t =
    | Breadcrumb_added of {block: (block, State_hash.t) With_hash.t; sender_receipt_chains_from_parent_ledger: Receipt_chain_hash.t Public_key.Compressed.Map.t}
    | Root_transitioned of {new_: State_hash.t; garbage: State_hash.t list}
    | Bootstrap of {lost_blocks: State_hash.t list}

This comment has been minimized.

Copy link
@bkase

bkase Sep 7, 2019

Contributor

Do you know which blocks are lost in a bootstrap? What if you Ctrl-Ced coda and then restarted it later? What is ensuring the view of the transition frontier is consistent between the SQL database and the in memory daemon?

This comment has been minimized.

Copy link
@wu-s-john

wu-s-john Sep 10, 2019

Author Contributor

Yes, we would know which blocks are lost during bootstrapping. Responded around line #333

end
```

`Breadcrumb_added` has the field `sender_receipt_chains_from_parent_ledger` because we will use the previous `receipt_chain_hashes` of the senders involved in sending a `user_command` in the newly added breadcrumb to compute the `receipt_chain_hashes` of the transactions that appear in the breadcrumb. More of this will be discussed later.

The `transaction_pool` diffs will essentially list the transactions that got added to and removed from the `transaction_pool` after an operation has been performed on it.
Whenever a node is bootstrapping, it will produce a diff containing the `state_hash` of all the blocks it has. These blocks would be lost and their consensus state in the database would be labeled as UNKNOWN.

This comment has been minimized.

Copy link
@bkase

bkase Sep 7, 2019

Contributor

see above
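For illustration, the `Bootstrap` diff above could translate into a single status update on the archive side; the schema and the status encoding here are assumptions:

```sql
-- Mark every block whose state hash appears in the Bootstrap diff's lost_blocks
-- list as UNKNOWN; the list placeholder would be expanded by the client library.
UPDATE blocks
SET status = :unknown_status
WHERE state_hash IN (:lost_state_hashes);
```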

Copy link
Contributor

mrmr1993 left a comment

Nice. Left a few nits as comments.


# Summary

The Coda protocol contains many different types of objects (blocks, transactions, accounts) and these objects have relationships with each other. A client communicating with the protocol will often make queries involving the relationships of these objects (e.g. find the 10 most recent transactions that Alice sent after sending transaction TXN). This RFC will discuss how we can optionally take advantage of these relationships by using relational databases to answer these queries. Note that the proposed architecture does not limit a node to one particular database. On the contrary, it takes the existing database storage that we have and moves it to an architecutre that allows a user to optionally use a database and to change the database.

This comment has been minimized.

Copy link
@mrmr1993

mrmr1993 Sep 9, 2019

Contributor

s/architecutre/architecture



This will accelerate our process in writing concise and maintainable queries that are extremely performant. We will furthermore discuss the data flow that connects the daemon to these optional storage components and makes efficient queries. This architecture entails ripping the API and archive-related work out into separate processes. By keeping the API separate, it can be scaled in load balanced, as can the database, without having to scale the number of nodes that are speaking to the archiver (unless you want higher network consistency). We will also discuss the pros and cons of using this new design.

This comment has been minimized.

Copy link
@mrmr1993

mrmr1993 Sep 9, 2019

Contributor

I think that scaled in load balanced should probably be scaled and load balanced


# Motivation

The primary database that we use to store transactions and blocks is RocksDB. When we load the RocksDB databases, we read all of the key-value pairs and load them into an in-memory cache that is designed for making fast in-memory reads and pagination queries. The downside is that this design limits us from making complex queries. Namely, we can make a pagination query for transactions that are sent from or received by a public key, but not strictly one or the other. Additionally, the indexes for the current design are limited to the keys of the database. This enables us to answer queries about which transactions are in a block but not the other way around (i.e. for a given transaction, which blocks contain the transaction). We can put this auxiliary information that tells us which blocks contain a certain transaction in the value of the transaction database. However, this results in verbose code and extra invariants to take care, leading to potential bugs. Relational databases are good at taking care of these types of details. Relational databases can give us more indexes, and they can grant us more flexible and fast queries with less code.
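To make the motivating query concrete, the reverse lookup described above ("for a given transaction, which blocks contain it?") is a routine join in a relational database. A minimal sketch with assumed table and column names:

```sql
-- Assumed schema: blocks, user_commands, and a blocks_user_commands join table.
SELECT b.state_hash
FROM blocks b
JOIN blocks_user_commands j ON j.block_id = b.id
JOIN user_commands uc       ON uc.id = j.user_command_id
WHERE uc.hash = :transaction_hash;
```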

This comment has been minimized.

Copy link
@mrmr1993

mrmr1993 Sep 9, 2019

Contributor

take care -> take care of

| Unknown (* Could not compute status of block i.e. This can happen after we bootstrap and we do not have a `transition_frontier` anymore *)
```

Notice that `PENDING` in `consensus_status` is the number of block confirmations for a block. This position has nothing to do with a block being snarked or not, so `Snarked` is not a constructor of this variant.

This comment has been minimized.

Copy link
@mrmr1993

mrmr1993 Sep 9, 2019

Contributor

s/PENDING/Pending


This comment has been minimized.

Copy link
@mrmr1993

mrmr1993 Sep 9, 2019

Contributor

I think this needs to be tweaked to say that Pending carries the number of confirmations, instead of is that number.


For adding arbitrary transactions, the client could simply add them to the `transactions` database. We can also include the previous receipt chain hashes involving that transaction and add these components to the `receipt_chain` database. The user can add arbitrary blocks along with their transactions into the `blocks_to_transactions` table and the `block` and `transactions` databases. When we add a block whose block length is greater than the root length, the block will be added through the `transition_frontier_controller`. Namely, the block will be added to the `network_transition_writer` pipe in `Coda_networking.ml`.

<a href="bootstrap"></a>

This comment has been minimized.

Copy link
@mrmr1993

mrmr1993 Sep 9, 2019

Contributor

I think this should be <a id="bootstrap"></a> for an internal link to work.


Whenever the daemon produces a new `Transition_frontier_diff`, it will batch the transactions into the buffer, `transition_frontier_diff_pipe`. The buffer will have a max size. Once the size of the buffer exceeds the max size, it will flush the diffs to the archive process. To make sure that the archive process applies the diffs with a low amount of latency, the buffer will also flush the diffs at a short interval (like every 5 seconds). There will be a similar buffering process for the `transaction_pool`. The `transaction_pool` diffs can be seen as a monoid, and it would be trivial to concatenate them into one diff. This would send less data to the archive process and ultimately reduce the number of writes that have to hit the databases.

It is worthy to note that whenever the archive process makes a write to the databases using deffereds. The archive process will write the diffs it receives in a sequential manner and will not context switch to another task. This will prevent any consistency issues.

This comment has been minimized.

Copy link
@mrmr1993

mrmr1993 Sep 9, 2019

Contributor

The first sentence is a little confusing. Should it be read as

> It is worth noting that the archive process writes to the databases using `Deferred.t`s.

or some other way?

Sql.commit sql
```

We can also use `RPC_parallel` to ensure that the daemon will die if the archive process will die and vice versa using RPC parallel.

This comment has been minimized.

Copy link
@mrmr1993

mrmr1993 Sep 9, 2019

Contributor

Redundant using RPC parallel

# Rationale and Alternatives

- Makes it easy to perform difficult queries for our protocol
- Makes the client code more module

This comment has been minimized.

Copy link
@mrmr1993

mrmr1993 Sep 9, 2019

Contributor

s/module/modular

Copy link
Contributor

nholland94 left a comment

I did a quick review of the SQL schema. I think there is some more thought that needs to be put into the relational model of these tables and to the optimization of the queries we want to perform. The current schema is very wasteful of space and duplicates a lot of information. It also doesn't use SQL ids for referencing tables, which destroys your relational query performance.

staged_ledger_hash string [not null]
ledger_hash string [not null]
epoch int [not null]
slot int [not null]

This comment has been minimized.

Copy link
@nholland94

nholland94 Sep 13, 2019

Contributor

In the interest of making queries easier, we should probably turn these two fields into a global slot. It's trivial to convert a global slot back into a slot and epoch if you know what k is.
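For reference, the conversion is a single multiply-and-add either way; a sketch assuming the per-epoch slot count is available as a query parameter:

```sql
-- global_slot = epoch * slots_per_epoch + slot, and back again:
SELECT global_slot / :slots_per_epoch AS epoch,
       global_slot % :slots_per_epoch AS slot
FROM blocks;
```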

ledger_proof_nonce int [not null]
status int [not null]
block_length int [not null]
block_time date [not null]

This comment has been minimized.

Copy link
@nholland94

nholland94 Sep 13, 2019

Contributor

This should be called time received. The actual block time can be computed from the epoch:slot of the block.

Also, you might want to consider using a Unix timestamp instead of a date. Dates are rather messy in SQL, and not all SQL databases treat them the same. Oftentimes comparisons on dates in SQL databases are much less efficient, as dates are stored with compactness and generality in mind. Just storing a Unix epoch timestamp in UTC will make queries easy and consistent across databases.
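A minimal sketch of what that looks like in practice (the table and column names are assumptions):

```sql
-- Store times as UTC Unix epoch milliseconds in a 64-bit integer column.
CREATE TABLE block_times (
  block_id      INTEGER PRIMARY KEY,
  time_received BIGINT NOT NULL           -- e.g. 1567987200000
);

-- Range comparisons stay cheap and behave identically across databases.
SELECT block_id
FROM block_times
WHERE time_received BETWEEN :start_ms AND :end_ms;
```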

This comment has been minimized.

Copy link
@wu-s-john

wu-s-john Sep 13, 2019

Author Contributor

Okay, sure, we can just standardize all dates as unsigned 64-bit integers. In most databases, the type will be a bit string with a length of 64.

is_delegation bool [not null]
nonce int64 [not null]
sender string [not null]
receiver string [not null]

This comment has been minimized.

Copy link
@nholland94

nholland94 Sep 13, 2019

Contributor

Sender and receiver should probably be their own table with ids rather than a string with an index. Representing it this way is rather wasteful data wise. Public keys are not small.
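A sketch of that normalization, with assumed names:

```sql
-- Store each public key once and reference it by integer id.
CREATE TABLE public_keys (
  id    INTEGER PRIMARY KEY,
  value TEXT NOT NULL UNIQUE               -- compressed public key
);

CREATE TABLE user_commands (
  id          INTEGER PRIMARY KEY,
  sender_id   INTEGER NOT NULL REFERENCES public_keys (id),
  receiver_id INTEGER NOT NULL REFERENCES public_keys (id)
  -- ...remaining fields elided
);
```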

fee int64 [not null]
memo string [not null]
first_seen date
transaction_pool_membership bool [not null]

This comment has been minimized.

Copy link
@nholland94

nholland94 Sep 13, 2019

Contributor

If you want to track whether or not it's in the transaction pool, you need to handle updating old members of the transaction pool whenever bootstrap or restart occurs. I would recommend either choosing not to track this and inferring it from the finalization state of the blocks it is included in, or coming up with a more complex way to track this so it can be easily updated in the cases I mentioned before.

This comment has been minimized.

Copy link
@wu-s-john

wu-s-john Sep 13, 2019

Author Contributor

We will handle the tracking of whenever a node bootstraps or restarts. We need to have the transaction_pool_membership because we need a way to track whether a transaction has a scheduled status or not. I don't think there is an easy way to do that...

memo string [not null]
first_seen date
transaction_pool_membership bool [not null]
is_added_manually bool [not null]

This comment has been minimized.

Copy link
@nholland94

nholland94 Sep 13, 2019

Contributor

What does this mean?

fee int
work_id int64 // the list of work_ids are coerced to a single int64 work_id because the list will hold at most two jobs. There will be two booleans representing the number of jobs are in the `snark_job`
has_one_job bool
has_two_jobs bool

This comment has been minimized.

Copy link
@nholland94

nholland94 Sep 13, 2019

Contributor

Why not just num_jobs? I know it can only be two or one, but having two xor fields is messy and can be screwed up. If you want to keep the possible states compact, just use one bool to determine which of the two states you are in.

Table snark_job {
prover int
fee int
work_id int64 // the list of work_ids are coerced to a single int64 work_id because the list will hold at most two jobs. There will be two booleans representing the number of jobs are in the `snark_job`

This comment has been minimized.

Copy link
@nholland94

nholland94 Sep 13, 2019

Contributor

ids are guaranteed to only be 32 bits?

state_hash string [ref: > block.state_hash]
work_id int64 [ref: > snark_job.work_id]
has_one_job bool
has_two_jobs bool

This comment has been minimized.

Copy link
@nholland94

nholland94 Sep 13, 2019

Contributor

Once again, shouldn't duplicate fields on join tables.

Table receipt_chain_hashes {
hash string [pk]
previous_hash string
user_command_id string [not null, ref: > user_commands.id]

This comment has been minimized.

Copy link
@nholland94

nholland94 Sep 13, 2019

Contributor

You should connect the receipt_chain_hashes together relationally to make querying easier. At minimum, you need to index these two string fields to keep lookups fast.
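One way that could look, sketched with assumed names:

```sql
-- Link receipt chain hashes to each other relationally and index the hash
-- strings so hash lookups stay fast.
CREATE TABLE receipt_chain_hashes (
  id              INTEGER PRIMARY KEY,
  hash            TEXT NOT NULL,
  parent_id       INTEGER REFERENCES receipt_chain_hashes (id),
  user_command_id INTEGER NOT NULL REFERENCES user_commands (id)
);

CREATE INDEX idx_receipt_chain_hash ON receipt_chain_hashes (hash);
```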

wu-s-john added 6 commits Sep 13, 2019
Copy link
Contributor

nholland94 left a comment

SQL looks better now, still have a few minor suggestions, but nothing too major. Would like to have a slightly extended conversation wrt the fee transfers table though.

}
}
Table fee_transfers {

This comment has been minimized.

Copy link
@nholland94

nholland94 Sep 19, 2019

Contributor

In general, I think it's more important to optimize for your relational model, not the fields you store. Having an extra NULL field on a table is not a great reason to make an extra relationship when the tables are otherwise related in the same fashion. I would still suggest consolidating the fee transfers and user commands. I buy the argument for coinbases though since there is only ever 1 coinbase per block and it's associated with the creator of that block.

Table block_to_user_commands {
block_id int [not null, ref: > blocks.id]
transaction_id int [not null, ref: > user_commands.id]
receipt_chain_id int [ref: > receipt_chain_hashes.id]

This comment has been minimized.

Copy link
@nholland94

nholland94 Sep 19, 2019

Contributor

This model seems to imply that you will always have a block for an associated receipt chain, but I don't think that's actually true. We are planning on implementing the ability to go out and fetch receipt chains for blocks you missed, and we wouldn't want the requirement to be that you have to download the whole block. I think a trinary join table is not the right way to model this. Instead, make one binary join table between blocks and transactions, and another binary join table between transactions and receipt chains. This has an added advantage that queries don't need to be aware of the NULLability of the receipt chain field in the join table, which generally makes writing complex joins painful.
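A sketch of the two binary join tables described here (all names assumed):

```sql
CREATE TABLE blocks_user_commands (
  block_id        INTEGER NOT NULL REFERENCES blocks (id),
  user_command_id INTEGER NOT NULL REFERENCES user_commands (id)
);

CREATE TABLE user_commands_receipt_chains (
  user_command_id  INTEGER NOT NULL REFERENCES user_commands (id),
  receipt_chain_id INTEGER NOT NULL REFERENCES receipt_chain_hashes (id)
);
```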

This comment has been minimized.

Copy link
@wu-s-john

wu-s-john Sep 25, 2019

Author Contributor

Sorry if this is not clear, but the model is trying to state that each receipt chain is associated with a block and a transaction. A transaction can technically appear in multiple blocks, but each of these blocks can produce a different receipt chain hash. This is namely because the previous receipt chain hash would be different for each block. I think this tri-association is necessary for that reason.

I would prefer keeping this tri-association, but we can also split this table into two tables. One table is a join between blocks and transactions where each association has an id. Then, we can have a join table between the blocks-transactions join table and the receipt chain table. The latter seems more complicated and I would prefer not to implement it.

Indexes {
block_id [name:"block_to_user_command.block"]
transaction_id [name: "block_to_user_command.transaction"]
}

This comment has been minimized.

Copy link
@nholland94

nholland94 Sep 19, 2019

Contributor

IMO, we should not specify indices for our join tables yet. Indices on join tables are not free (as all indices are not), and the way you want to set them up on join tables is very dependent on the queries you are planning on doing on that table. Indices are always easy to add later. I generally err on the side of defining very few indices initially and adding them to optimize later. It's easier to identify which indices you need later than it is to identify which ones you don't.

This comment has been minimized.

Copy link
@wu-s-john

wu-s-john Sep 25, 2019

Author Contributor

I can omit the indices, but I am highly confident that we will make a lot of queries that use these join tables for lookups, like the blocks query to get the transactions in a block. If we do not have these indices for the join tables, these lookups are going to be slow.

Usually, indices impact write performance. However, our system does not require us to make a lot of writes. The write transaction with the most latency is writing a block. When we launch mainnet, we would probably receive at most 30 blocks per slot, and the slot time would be at minimum 30 seconds (given that we optimize the heck out of the protocol). Writing a block transaction to the database would only take milliseconds with this database. I don't think these indices on the join tables would kill performance given our system requirements.

@jkrauska jkrauska changed the title RFC for Storing Client Data [RFC] Storing Client Data Nov 4, 2019
@jkrauska jkrauska added the RFC label Nov 4, 2019