
[RFC] Storing Client Data #2901

Closed
wants to merge 35 commits into from

Conversation

wu-s-john
Contributor

Checklist:

  • Tests were added for the new behavior
  • All tests pass (CI will check this if you didn't)
  • Serialized types are in stable-versioned modules
  • Does this close issues? List them:

Closes #0000
Closes #0000

@enolan
Contributor

enolan commented Jul 18, 2019

looking

@psteckler
Member

I wonder if this should live outside the client, because it adds complexity that not all users will want.

@bkase
Member

bkase commented Jul 21, 2019

This is a good idea @psteckler ! It would force us to expose a powerful enough API to build this thing as well

@psteckler
Member

This is a good idea @psteckler ! It would force us to expose a powerful enough API to build this thing as well

In fact, I wonder if the current executable should be split into codad and coda-cli. There could be an API in the daemon for loading plugins, like one to store client data. That way, the daemon proper could be narrowly focused on enforcing the protocol.

@wu-s-john
Contributor Author

wu-s-john commented Jul 22, 2019

This is a good idea @psteckler ! It would force us to expose a powerful enough API to build this thing as well

In fact, I wonder if the current executable should be split into codad and coda-cli. There could be an API in the daemon for loading plugins, like one to store client data. That way, the daemon proper could be narrowly focused on enforcing the protocol.

I agree. We should have a separate process for doing client queries and storing data. However, it makes the code more complicated. Namely, it's nice to do these queries in the same process because we do queries on the in-memory transition frontier and transaction pool. I am thinking about how to overcome these issues and other tradeoffs.

@psteckler
Member

it's nice to do these queries on the same process

Right, that's a reason a plugin API might be helpful here.

@nholland94
Member

+1 for this living outside

| Unknown (* Could not compute status of transaction i.e. We don't have a record of it *)
```

Notice that `consensus_status` is a block's position in Nakamoto consensus. This position has nothing to do with a block being snarked or not, so `Snarked` is not a constructor of this variant.
Contributor

Nakamoto consensus? What?

Contributor Author

Lol. Nakamoto consensus is basically having your consensus state represented as a tree, like in Bitcoin and our protocol. Other distributed-systems consensus algorithms use some flavor of Byzantine consensus, which keeps a pointer to the current state of the system and a pointer to the next candidate state.

So in the context of this RFC, Nakamoto consensus represents the node in the consensus tree.

@bkase bkase changed the base branch from master to develop July 30, 2019 19:33
Member

@nholland94 nholland94 left a comment

You have two different pngs for the client_process diagram. Can you delete client_process.png so that we only have the one that is generated through the makefile command?

Member

@nholland94 nholland94 left a comment

I have a few issues with the approach outlined in the RFC, as well as the structure of this RFC for tackling the problem at hand. Firstly, this RFC is too large in scope. There are multiple decisions being outlined and detailed in this RFC, and I think that it will be easier to discuss this if we break this up into two parts. Tentatively, for this comment, I am going to break up my feedback and criticism into two sections: Architecture and API (I believe this would also be the best division for splitting up this RFC into two discussions as well).

Architecture

Premature Optimization/Overengineering

The implementation that is laid out for solving this problem is extremely complex. Caching, in particular, is very hard to get correct, even harder to debug, and in the scope of this problem, is very complex. And, at this point in time, we don't actually know if we need caching for this. Who is to say that reading from a db is too slow and that we need caching here? Many APIs and webservers exist and scale without adding in-process caching, and there are many more modular solutions for adding caching in front of database requests that are typically used before considering adding caching to the server itself. For us in particular, this API isn't meant to scale nearly as much as a web API since, under the current model, each node will only have 1 wallet client communicating with it.

I'm not saying the model you have outlined is wrong or won't work, but rather that it's very complex and likely overkill for what you are trying to achieve. It's better to do this kind of thing incrementally: start with a system that just persists to a relational database, and add caching only when you need it and only where you need it. Who knows, maybe none of the stuff you have implemented caching for right now actually needs it, but later, as use cases change, we will find something new that really needs caching. It will be easier to add that later if we keep it simple now.

Modularity/Generalization of Architecture

The design you have specced out is very specific to surfacing a GraphQL API for live queries to the state of a node. However, I would suggest generalizing this architecture some and making it more modular. For instance, this can be broken up into two parts: an archiver process (which subscribes to information from the daemon and persists it to disk) and an API process (which serves an API format from the on-disk database). There are a number of advantages to this method of architecting this, which I will outline below.

Abstraction Reuse

Right now, our goal is to get an API for our wallet. However, we have other long term goals related to archiving information from a node and providing interfaces to query that information. We want to provide services for long term storage of network activity (for monitoring, analysis, block exploring, etc...), and we want to build something for providing receipts as well. The archiver process can be thought of generically among all these use cases, and can easily be made configurable through eviction rules and data filtering. For instance, for a long term archiver, there would be no eviction rules, and no data would be filtered. For the wallet API, the eviction rules would be such that it evicts information that is invalidated and not important to the wallet user, and would filter only information relevant to any wallet keys it knows about.

Community Control

Under this model, where the API server is separate and only serving information from a persistent database, anyone can make an API in any format surfacing the information in our wallet. They can even directly access the underlying DB if they wish to. This is great for our community members because, even though GraphQL is popular and I think we should focus on providing a GraphQL API early on, different developers may have different needs or wants. This also creates a great avenue for easy work that new first-time devs can do, so that's a plus as well.

Incremental Implementability

With this separation, we can focus on implementing and testing the archiver first and individually. If we want, we can even write applications against this early on to test this lower level layer of our data architecture. If the underlying DB we use is SQL based, then we can even just use a tool like https://github.com/rexxars/sql-to-graphql to generate a GraphQL server for us from the SQL schema we have, meaning we get a GraphQL API for free. If we want more custom queries, we can always implement something ourselves, but using a generated server as a basis will probably make this a lot easier. We can just make a dead simple node app that extends a pregenerated GraphQL-to-SQL server with some extra functionality, making the code for the custom GraphQL API very small.

Integration with Existing Abstractions

This RFC does not discuss much detail around how the "client process" receives information from the daemon process. In particular, I think the "client process" should receive information from the daemon through a transition frontier extension, similar to how the transition frontier persistence works. This is hinted at in the document, but I would appreciate a more explicit design. Should it share an extension with persistence? Should it live behind a diff buffer, similar to persistence? What are the flushing rules for that buffer? Do they match the persistence rules, or do we need two diff buffers in memory? etc...

API

Subscriptions

A use case for our GraphQL API that has been discussed in the past has been subscriptions. Do you intend to support that under your model? That decision could have ramifications on my suggestion to split the archiver from the API server.

Database Considerations

This is mentioned in one of my comments (we should have a discussion on this there instead of on this comment thread), but I think you should make a stronger argument for the SQLite choice. I personally think we should choose a more robust and performant SQL DB, such as PostgreSQL or MariaDB/MySQL.

Other criticism

Most of my other questions and criticism for the API are covered in individual comments on the RFC document.

Etc Nitpicks

  • "client process" is a rather generic name that I find confusing (client is very overloaded); consider other names such as "wallet API process"
  • you usually use the word "client" where what I think you mean is "server clients talk to"
  • it should be made clear that all of this is optional to run and should not be required as part of running a coda daemon since this is not necessarily following the O(1) data growth rule

state_hash: string;
creator: Public_key.Compressed.t
protocol_state: Protocol_state.t [not null]
is_snarked bool [not null]
Member

@nholland94 nholland94 Aug 1, 2019

This field does not make sense to me. If you are referring to the protocol state being snarked, all of them have already been snarked since you need to snark it before you submit the external transition for it. If you are referring to the transactions in an external transition being included in a ledger proof that makes it into the blockchain proof, then this field still doesn't make sense because one set of transactions from a block is not necessarily included in the same proof (some txns may be proven from the external transition while others are not).

Member

FWIW, you could attempt to quantify transactions into "proof batches" based on how they are added to the staged ledger diff, and keep track of the current height of "completed proof batches" based on proofs that have been included into the blockchain proof via an emitted ledger proof. That would help you to easily determine whether a transaction has been proven or not in a simple O(1) queryable way.
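
A minimal sketch of what this batch bookkeeping could look like, with hypothetical names: each transaction would be tagged with the batch it entered via the staged ledger diff, and a single counter would be bumped whenever a ledger proof covering a batch is emitted.

```ocaml
(* Hypothetical sketch of the "proof batch" bookkeeping described above. *)
type proof_batches =
  { mutable completed_up_to : int
    (* highest batch covered by a ledger proof included in the blockchain proof *) }

let record_emitted_ledger_proof t ~batch =
  t.completed_up_to <- max t.completed_up_to batch

(* O(1) check: a transaction is snarked iff its batch has been completed. *)
let is_snarked t ~transaction_batch = transaction_batch <= t.completed_up_to
```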

Contributor Author

Yes, I meant snarked transactions (i.e. the transactions are snarked when a ledger proof is emitted). Sorry for the confusion.

Member

In that case, you will need to change this structure to something like what I described above since the transactions in a transition are not necessarily snarked in the same proof.

}
```

Additionally, clients should be able to know which `external_transition`s contain a transaction. These transitions can give us useful information about a transaction, like the number of block confirmations that are on top of a certain transaction and the "status" of a transaction. Therefore, a client should be able to store a record of the external_transitions they are interested in and look them up easily in a container. Here is the type of an external_transition:
Member

What is your plan for tracking/querying block confirmations in O(1)?

Member

To be more specific here, I see that you are tracking consensus status on each transition, but how do you plan on updating that consensus status without paying at least O(k) on every transition you add to the database?

Contributor Author

I forgot to mention that. Thanks for pointing that out. Every time a block is added onto the transition_frontier, each ancestor block will have its block confirmation number incremented. This block confirmation number will be added as a mutable field to the client breadcrumb.
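
For illustration, a minimal sketch (with hypothetical field names) of the ancestor walk this describes; note that, uncapped, the walk visits every ancestor of the new breadcrumb, which is what the next comment is getting at.

```ocaml
type breadcrumb =
  { parent : breadcrumb option
  ; mutable confirmations : int }

(* Bump the confirmation count of every ancestor of a newly added breadcrumb. *)
let rec bump_ancestor_confirmations (b : breadcrumb) =
  match b.parent with
  | None -> ()
  | Some parent ->
      parent.confirmations <- parent.confirmations + 1;
      bump_ancestor_confirmations parent
```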

Member

Does that mean that you are paying O(n) for the entire chain every time you add a transition? That won't scale.

| Failed (* frontier removed transition and never reached finality *)
| Confirmed (* Transition reached finality by passing the root *)
| Pending of int (* Transition is somewhere in the frontier and has a block confirmation number *)
| Unknown (* Could not compute status of transaction i.e. We don't have a record of it *)
Member

I don't understand this unknown state. Also, I think you mean external transition here, not transaction.

Contributor Author

Yes, I mean transition. Unknown means that we do not know the state of the transition. We can be offline for a long time and then bootstrap. When we bootstrap, we do not know the states of the transitions in our transition_frontier. So, their status is set as Unknown.

Member

I see, so it's for transitions that you can't determine whether they are connected to the current chain or not?

protocol_state: Protocol_state.t [not null]
is_snarked bool [not null]
status consensus_status [not null]
time time [not null]
Member

Is this time received? We already have time created on the blockchain state object.


module External_transition = struct
state_hash: string;
creator: Public_key.Compressed.t
Member

@nholland94 nholland94 Aug 1, 2019

This is not currently accessible without changes to the protocol, afaik. (doesn't mean we shouldn't add it, just that we should specify it as a dependency of this work)

| Unknown
```

We also have the `receipt_chain_hash` object, which is used to prove a `user_command` made it into a block. It does this by forming a Merkle list of `user_command`s from some `proving_receipt` to a `resulting_receipt`. The `resulting_receipt` is typically the `receipt_chain_hash` of a user's account on the best tip ledger. More info about `receipt_chain_hash` can be found on this [RFC](rfcs/0006-receipt-chain-proving.md) and this [OCaml file](../src/lib/receipt_chain_database_lib/intf.ml).
Member

This is just a minor context nit, but it's a little confusing to describe the resulting receipt as "typically the receipt_chain_hash of a user's account on the best tip ledger" because it may exist in different ledgers that were all the best tip ledger at some point on a local proposer's node.

Contributor Author

Yes, that was a mistake I accidentally wrote. I meant to say it is computed from some ledger.

end
```

These containers should also have performant pagination queries, which essentially return a single contiguous range of sorted elements. `external_transitions` and `transactions` should be ordered by block length. Previously, we sorted these objects by time. Block length can arguably be a better ordering for transitions because clients can receive transitions at different times, but what remains constant is the block length at which the transition appears in the blockchain. Additionally, we can sort transactions by the minimum block length of the `external_transition`s that they appear in. Typically, a client would expect to receive a transaction in the soonest block, which is the block with the smallest block length. If they do not appear in an `external_transition`, they will not appear in the sorting. A drawback with sorting `external_transitions` and transactions based on date is that if the client wants to add an arbitrary transaction and `external_transition` and persist them, then they would have the extra burden of tagging these objects with a timestamp for pagination. Block length alleviates this issue.
Member

I'm not sure if I understand how you want to order transactions for the pagination. Transactions can be included in multiple block heights since they can be included in different forks. Would you analyze the current best chain and show transactions in order of their included block height only in relation to members of that chain? (could be tricky to compute in a non-expensive way).

Member

Should we consider supporting multiple orderings for external transitions and transactions rather than optimizing for both? We should be able to achieve good performance on multiple types of pagination queries by just using some creative indices in our database schema. I just wouldn't want to presume too much about how developers want to surface this information in their applications. Ideally, it would be nice to support arbitrary ordering, but I'm pretty sure that's impossible to do performantly, so we still need to choose what queries we want to support.


This section will discuss a proposal to move querying to a different process, which will be called the client process, and it will take advantage of in-memory data structures as well as persistent databases.

A persistent database that we would like to introduce into our codebase is `SQLite`. `SQLite` would be the best relational database for our use case because it has an Apache 2 license and it's a very small binary (3.7MB on my Macbook). `SQLite` has a considerably good amount of support for OCaml compared to the other databases that will be discussed, and it would be relatively easy to get `SQLite` working. There is even a nice OCaml library called [ocaml-sqlexpr](https://github.com/mfp/ocaml-sqlexpr) that enables us to embed type-safe `SQLite` queries in our OCaml code.
Member

@nholland94 nholland94 Aug 1, 2019

Why are other databases not an option? I don't think SQLite is a terrible choice, but I don't find the argument for why we should do it over something like PostgreSQL that convincing. SQLite is more commonly chosen in circumstances where you want to embed a DB into an application that you want to distribute widely (for instance, including SQLite in browsers or iOS, where you can't get a heavyweight DB locally). But there is a very significant performance cost to choosing SQLite, and in our case it seems like performance is a priority. The small size of SQLite does not seem like a major advantage for us, and the licensing for DBs we don't embed doesn't actually matter since we won't be compiling or distributing it, so I don't see the benefit it gives us. I would like to see some more details arguing for why SQLite is worth it.

Member

I think the original idea around SQLite was that we would have this in-process, or at least part of the coda distributable we ship.

Given that this will live out of process and out of that distributable, you’re right that we should consider other databases.

@@ -0,0 +1,4470 @@
<!doctype html>
Member

nit: added_time should be addedTime

input AddPaymentReceiptInput {
# Time that a payment gets added to another clients transaction database
# (stringified Unix time - number of milliseconds since January 1, 1970)
added_time: String!
Member

here


Our design should be agnostic to the choice of relational database. There are different use cases for using a light database, such as SQLite, and a heavier database, such as Postgres. Postgres should be used when we have an archive node or a block explorer trying to make efficient writes to store everything that it hears from the network. We would use SQLite when a user is downloading a wallet for the first time and only stores information about their wallets and their friends. SQLite would also be good to use if we are running a light version of Coda and only want to store a small amount of data.
If a client wants to upgrade the database from SQLite to Postgres, then a migration tool would support this.
Member

Don't think this is necessary to commit to building -- though maybe would be a nice grant


The yellow node represents the entire component that the node is in.
We would use SQLite when a user is downloading a wallet for the first time and only stores information about their wallets and their friends. SQLite would also be good to use if we are running a light version of Coda and only want to store a small amount of data. The downside of using SQLite is that subscriptions are not offered. To circumvent this issue, we can use a polling system that fetches data at a short interval (about 5 seconds). We can also create a wrapper process that enables clients to write subscriptions on mutations in SQLite.
Member

It's probably not worth committing to the choice of sqlite over postgres for the wallet gui here either -- it won't affect the design choices we make for this system and I think we still need further discussion there


Whenever the daemon produces a new `Transition_frontier_diff`, it will batch the diffs into the buffer, `transition_frontier_diff_pipe`. The buffer will have a max size. Once the size of the buffer exceeds the max size, it will flush the diffs to the archive process. To make sure that the archive process applies the diffs with a low amount of latency, the buffer will also flush the diffs after a short period of time (like 5 seconds). There will be a similar buffering process for the `transaction_pool`.
The archive process will then filter transactions in the `Transaction_pool_diff` and `Transition_frontier.Breadcrumb_added` based on what a client is interested in following. It will also compute the ids of each of these transactions, which are hashes.

Below is pseudocode for persisting a transaction when the archive process receives a `Transaction_pool_diff`:
Member

I don't think you need to update the pseudocode, but the following is probably worth mentioning -- since the sql server we'll be interacting with is out-of-process all of the calls will be through Deferreds right? And for consistency's sake, we'll want to surround the sql actions in sql transactions to avoid bad db states (like around 339-343)

Contributor Author

@wu-s-john wu-s-john Sep 5, 2019

Right, all of the reads and writes will be deferreds. All the SQL actions will be wrapped in SQL transactions. After performing all the commands for the transaction, the entire transaction will be committed to the database.

Contributor Author

Around 339-343, the transactions are committed at the end to avoid consistency issues.
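
A rough sketch of the write path being discussed, assuming Async and hypothetical `Sql.*`, `is_relevant`, and `hash_user_command` helpers: filter the diff down to commands involving tracked keys, then apply all writes inside one SQL transaction so a half-applied diff is never visible to readers.

```ocaml
open Core
open Async

let persist_transaction_pool_diff ~sql ~tracked_keys diff =
  (* keep only commands that touch a public key the archiver is following *)
  let relevant = List.filter diff ~f:(is_relevant ~tracked_keys) in
  let%bind () = Sql.begin_transaction sql in
  let%bind () =
    Deferred.List.iter relevant ~f:(fun cmd ->
        Sql.insert_user_command sql ~id:(hash_user_command cmd) cmd)
  in
  (* nothing is visible to other readers until this commit *)
  Sql.commit sql
```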


#### Client Queries
The API process will also subscribe to changes in the daemon and changes in the database. The various subscriptions will be discussed in a later part of this RFC.
Member

I think we wanted to avoid subscriptions directly to the daemon process if possible

Contributor Author

The only thing that we need to subscribe to from the daemon is its sync status. The daemon can send updates of its sync status to the archive node, and the archive node can write them into a table.

Member

Daemon subscriptions are fine as long as we are careful, when we scale the API, about how many subscriptions will be open to the daemon process. @Schmavery had a good point about doing GraphQL subscription proxying to API servers to keep this cheap from the daemon's perspective. We should still be careful about what we put subscriptions on, i.e. only put subscriptions on things that update semi-infrequently. Sync status is a good example of something that matches this criterion.

As mentioned in the [Running Bootstrap section](#bootstrap), we would not be able to fully determine the `consensus_status` of the blocks that were in the transition_frontier. Worse, we would not be able to determine which blocks became snarked.

We can have a third-party archive node tell a client the consensus_state of the blocks. Another option is to create a P2P RPC call where peers can give a client a Merkle list proof of a state_hash and all the state_hashes of the blocks on top of that state_hash. The Merkle list should be of length `K`, the number of block confirmations that guarantees finality with a very high probability. If the Merkle list proof is of size `K`, then the block is fully confirmed. Otherwise, the block failed.
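A small sketch of the finality check this paragraph describes, with `verify_merkle_list` passed in as a hypothetical verifier:

```ocaml
(* [proof] is the list of state hashes built on top of [state_hash];
   [verify_merkle_list] is a hypothetical check that each hash in the list
   extends the previous one, starting from the queried state hash. *)
let finality_of_proof ~k ~verify_merkle_list ~state_hash (proof : string list) =
  if not (verify_merkle_list ~from:state_hash proof) then `Invalid_proof
  else if List.length proof >= k then `Confirmed
  else `Failed
```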

Member

One thing I just realized we didn't think about is what happens when there is a large reorg or your node goes offline for a while and comes back later. What would happen to the database? How do we handle this?

Contributor Author

When you say large reorgs, you mean reorgs larger than K, right?

Member

The database schema should be defined in an incremental fashion such that bootstraps (which I think is what @bkase means by large reorgs) can just be handled as new diffs without reasoning about the old tables and trying to update them. This is to say that all tables should be designed so that, should the information required to further update a table be lost, the information in that table is still consistent from the view of the queries we plan on supporting. This is tricky though for some things like block finalization, so some thought needs to be put in to make that work in this context. I think the ability to recognize "UNKNOWN" or "LOST" states without rewriting old historical table data is important. Just something to think about wrt our SQL schema.

transactions
WHERE status = ADDED AND to = $RECEIVER
```
https://www.notion.so/codaprotocol/Proposed-Graphql-Commands-57a434c59bf24b649d8df7b10befa2a0
Member

notion is private, we want this discussion to be public -- can you print this as a pdf and attach the pdf to this PR if you want to avoid the markdown

Member

Comments on the notion:

  1. Doesn't blocks query need to be on the database not on the wallet?
  2. followUpdate doesn't this have to be where the filter lives? Doesn't the filter live on the daemon?

Contributor Author

@wu-s-john wu-s-john Sep 6, 2019

notion is private, we want this discussion to be public -- can you print this as a pdf and attach the pdf to this PR if you want to avoid the markdown

Planning to do this. I just wanted to get my rough draft looked at before I share it with the public.

Contributor

We should probably discuss at some point whether we want to be using a bunch of unions in simple status queries and how this will affect the queries on the consumer's end. For now, it seems like none of these changes are necessary for the scope of this RFC (storing client data), so we don't need to discuss it now necessarily.

Contributor

@Schmavery Schmavery left a comment

As a takeaway from our meeting today, this seems mostly OK to me, though I think it should concern itself more with the design of the archive node than the requirements of scaling the API, since it's unclear what the plans should be there long term anyway (whereas we seem to clearly understand the requirements of an archive node). This might mean we can also put off implementing the filter logic for now, though the approach seems fine to me if some day we find we need such functionality.


## Requirements

__NOTE: The types presented in OCaml code are disk representations of the types__
Contributor

I'm probably missing something, but I don't know why this section of ocaml types exists in the context of this rfc. What are these data structures supposed to be used for?

Contributor Author

The types are used to give a better illustration of the fields of each type of data that we are storing.

}

// Consensus_status is an ADT, but with its small state space, it can be coerced into an integer to make it easy to store in a database. Here are the following states:
// - pending : The block is still in the transition frontier. Value[0, (k-1)]
Contributor

I guess this means that every node in the transition frontier gets updated whenever we add to the best tip / get a new root? Is Value=0 the root or the leaf? What do you picture this update query looking like?

Contributor Author

Yes, exactly. The value represents the number of block confirmations. What do you mean by "what does this query look like"?
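
One possible encoding of that coercion, assuming `Pending` values occupy `[0, k-1]` and negative sentinels are reserved for the terminal states (the exact scheme in the RFC may differ):

```ocaml
type consensus_status = Pending of int | Confirmed | Failed | Unknown

let to_int ~k = function
  | Pending confirmations ->
      assert (confirmations >= 0 && confirmations < k);
      confirmations
  | Confirmed -> -1
  | Failed -> -2
  | Unknown -> -3

let of_int ~k = function
  | n when n >= 0 && n < k -> Pending n
  | -1 -> Confirmed
  | -2 -> Failed
  | _ -> Unknown
```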


### Deleting data

For this design, we are assuming that objects will never get deleted because the client will only be following transactions involving a small number of public keys. These transactions cannot be deleted because they are components used to prove the existence of transaction. We could delete blocks that failed to reach finality and the transaction involving them. However, the concept is outside of the scope of this RFC.
Contributor

It doesn't sound like this assumption that the client will only be following transactions involving a small number of public keys makes sense, since this is for an archive node. Is some form of "garbage collection" of "failed" blocks in scope in that case or do we expect the size of that data to be comparatively small?

I don't understand "These transactions cannot be deleted because they are components used to prove the existence of transaction."



A drawback with sorting blocks and transactions based on the time that they are first seen is that if the client wants to store an arbitrary transaction and block, then they would have the extra burden of tagging these objects with a timestamp for pagination. `block_compare` alleviates this issue.

## SQL Database
Contributor

Is the general idea here that the current transition frontier data lives alongside historical data? Are we worried about any performance implications there? (since the access/update patterns would presumably be very different between the two)

Contributor Author

No. I am advocating for sorting transactions based on block_compare because a transaction can be received at different, ambiguous times. So if we order transactions by first_received, different people can see inconsistent results even if they have the same set of transactions.
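
A sketch of what such a `block_compare`-style ordering could look like, with hypothetical record fields and helpers: transitions compare by block length, and transactions compare by the smallest block length of any block that includes them.

```ocaml
type transition = { block_length : int }

let compare_transitions (t1 : transition) (t2 : transition) =
  compare t1.block_length t2.block_length

(* [block_lengths_of] returns the block lengths of every block a transaction
   appears in; a transaction not yet in any block sorts last. *)
let compare_transactions ~block_lengths_of tx1 tx2 =
  let min_length tx =
    match block_lengths_of tx with
    | [] -> max_int
    | lengths -> List.fold_left min max_int lengths
  in
  compare (min_length tx1) (min_length tx2)
```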

- The current implementation takes a lot of work and it is quite limited for a certain use case.
- Other alternatives are discussed in the previous commits of this [RFC](https://github.com/CodaProtocol/coda/pull/2901)
- In an ideal world where we have an infinite amount of time, we would implement our own native graph database leveraging the `transition_frontier` as an in-memory cache. Namely, we would have all the blocks in the `transition_frontier` in the in-memory cache along with nodes that have a 1 to 2 degree of separation

Contributor

I feel like another alternative is integrating the persistence system with the archival system in some way, or at least describing how they're related since they both involve saving transition frontier information to disk.

I think one or two reasons have been mentioned in person why this shouldn't be the solution we take but I always forget so it could be nice to call it out.


```ocaml
module Transition_frontier_diff = struct
type t =
| Breadcrumb_added of {block: (block, State_hash.t) With_hash.t; sender_receipt_chains_from_parent_ledger: Receipt_chain_hash.t Public_key.Compressed.Map.t}
| Root_transitioned of {new_: State_hash.t; garbage: State_hash.t list}
| Bootstrap of {lost_blocks : State_hash.t list}
Member

Do you know which blocks are lost in a bootstrap? What if you Ctrl-Ced coda and then restarted it later? What is ensuring the view of the transition frontier is consistent between the SQL database and the in memory daemon?

Contributor Author

Yes, we would know which blocks are lost during bootstrapping. Responded around line #333

end
```

`Breadcrumb_added` has the field `sender_receipt_chains_from_parent_ledger` because we will use the previous `receipt_chain_hashes` from senders involved in sending a `user_command` in the newly added breadcrumb to compute the `receipt_chain_hashes` of the transactions that appear in the breadcrumb. More of this will be discussed later.

The `transaction_pool` diffs will essentially list the transactions that got added to and removed from the `transaction_pool` after an operation has been performed on it.
Whenever a node is bootstrapping, it will produce a diff containing the `state_hash` of all the blocks it has. These blocks would be lost and their consensus state in the database would be labeled as UNKNOWN.
Member

see above

Member

@mrmr1993 mrmr1993 left a comment

Nice. Left a few nits as comments.


# Summary

The Coda protocol contains many different types of objects (blocks, transactions, accounts) and these objects have relationships with each other. A client communicating with the protocol would often make queries involving the relationship of these objects (i.e. Find the 10 most recent transactions that Alice sent after sending transactions, TXN). This RFC will discuss how we can take optionally advantage of these relationships by using relational databases to answer these queries. Note that the proposed architecture does not limit a node to one particular database. On the contrary, it moves the current existing database storages that we have and move them to an architecutre that allows a user to optionally use a database and change the database.
Member

s/architecutre/architecture


The Coda protocol contains many different types of objects (blocks, transactions, accounts) and these objects have relationships with each other. A client communicating with the protocol would often make queries involving the relationship of these objects (i.e. Find the 10 most recent transactions that Alice sent after sending transactions, TXN). This RFC will discuss how we can take optionally advantage of these relationships by using relational databases to answer these queries. Note that the proposed architecture does not limit a node to one particular database. On the contrary, it moves the current existing database storages that we have and move them to an architecutre that allows a user to optionally use a database and change the database.

This will accelerate our process in writing concise and maintainable queries that are extremely performant. We will further more discuss the data flow that connects the daemon to these optional storage components and make efficient queries. This architecture entails ripping out the API and archive related work into separate processes. By keeping the API separate, it can be scaled in load balanced, as can the database, without having to scale the number of nodes that are speaking to the archiver (unless you want higher network consistency). We will also discuss the pros and cons of using this new design.
Member

I think that scaled in load balanced should probably be scaled and load balanced


# Motivation

The primary database that we use to store transactions and blocks is Rocksdb. When we load the Rocksdb databases, we read all of the key-value pairs and load them into an in-memory cache that is designed for making fast in-memory reads and pagination queries. The downside of this is that this design limits us from making complex queries. Namely, we make a pagination query that is sent from or received by a public key, but not strictly one or the other. Additionally, the indexes for the current design is limited to the keys of the database. This enables us to answer queries about which transactions are in an block but not the other way around (i.e. for a given transaction, which blocks have the transaction). We can put this auxiliary information that tells us which blocks contain a certain transaction in the value of the transaction database. However, this results in verbose code and extra invariants to take care, leading to potential bugs. Relational databases are good at taking care of these type of details. Relational databases can give us more indexes and they can grant us more flexible and fast queries with less code.
Member

take care -> take care of

| Unknown (* Could not compute status of block i.e. This can happen after we bootstrap and we do not have a `transition_frontier` anymore *)
```

Notice that `PENDING` in `consensus_status` is the number of block confirmations for a block. This position has nothing to do with a block being snarked or not, so `Snarked` is not a constructor of this variant.
Member

s/PENDING/Pending

| Unknown (* Could not compute status of block i.e. This can happen after we bootstrap and we do not have a `transition_frontier` anymore *)
```

Notice that `PENDING` in `consensus_status` is the number of block confirmations for a block. This position has nothing to do with a block being snarked or not, so `Snarked` is not a constructor of this variant.
Member

I think this needs to be tweaked to say that Pending carries the number of confirmations, instead of is that number.


For adding arbitrary transactions, the client could simply add them to the `transactions` database. We can also include the previous receipt chain hashes involving that transaction and add these components to the `receipt_chain` database. The user can add arbitrary blocks along with their transactions into the `blocks_to_transactions` table and the `block` and `transactions` databases. When we add a block whose block length is greater than the root length, the block will be added through the `transition_frontier_controller`. Namely, the block will be added to the `network_transition_writer` pipe in `Coda_networking.ml`.

<a href="bootstrap"></a>
Member

I think this should be <a id="bootstrap"></a> for an internal link to work.


Whenever the daemon produces a new `Transition_frontier_diff`, it will batch the transactions into the buffer, `transition_frontier_diff_pipe`. The buffer will have a max size. Once the size of the buffer exceeds the max size, it will flush the diffs to the archive process. To make sure that the archive process applies the diff with a low amount of latency, the buffer will flush the diffs in a short interval (like every 5 seconds). There will be a similar buffering process for the `transaction_pool`. The `transaction_pool` diffs can be seen as a monoid and it would be trivial to concate them into one diff. This would send less memory to the archive process and ultimately reduce the amount of writes that have to hit to the databases.

It is worthy to note that whenever the archive process makes a write to the databases using deffereds. The archive process will write the diffs it receives in a sequential manner and will not context switch to another task. This will prevent any consistency issues.
Member

The first sentence is a little confusing. Should it be read as

It is worth noting that the archive process writes to the databases using `Deferred.t`s.

or some other way?

Sql.commit sql
```

We can also use `RPC_parallel` to ensure that the daemon will die if the archive process will die and vice versa using RPC parallel.
Member

Redundant using RPC parallel

# Rationale and Alternatives

- Makes it easy to perform difficult queries for our protocol
- Makes the client code more module
Member

s/module/modular

Member

@nholland94 nholland94 left a comment

I did a quick review of the SQL schema. I think there is some more thought that needs to be put into the relational model of these tables and to the optimization of the queries we want to perform. The current schema is very wasteful of space and duplicates a lot of information. It also doesn't use SQL ids for referencing tables, which destroys your relational query performance.

staged_ledger_hash string [not null]
ledger_hash string [not null]
epoch int [not null]
slot int [not null]
Member

In the interest of making queries easier, we should probably turn these two fields into a global slot. It's trivial to convert a global slot back into a slot and epoch if you know what k is.
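
For reference, the conversion is just arithmetic once the epoch length is known (sketch with a hypothetical `slots_per_epoch` parameter):

```ocaml
let to_global_slot ~slots_per_epoch ~epoch ~slot = (epoch * slots_per_epoch) + slot

let of_global_slot ~slots_per_epoch global_slot =
  (global_slot / slots_per_epoch, global_slot mod slots_per_epoch)
```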

ledger_proof_nonce int [not null]
status int [not null]
block_length int [not null]
block_time date [not null]
Member

This should be called time received. The actual block time can be computed from the epoch:slot of the block.

Also, might want to consider using a Unix timestamp instead of a date. Dates are rather messy in SQL, and not all SQL databases treat them the same. Oftentimes comparisons on dates in SQL databases are much less efficient, as dates are stored with compactness and generality in mind. Just storing a Unix epoch timestamp in UTC will make queries easy and consistent across databases.

Contributor Author

Okay sure, we can just standardize all dates as unsigned 64-bit integers. In most databases, the type will be a bit string with a length of 64.

is_delegation bool [not null]
nonce int64 [not null]
sender string [not null]
receiver string [not null]
Member

Sender and receiver should probably be their own table with ids rather than a string with an index. Representing it this way is rather wasteful data wise. Public keys are not small.

fee int64 [not null]
memo string [not null]
first_seen date
transaction_pool_membership bool [not null]
Member

If you want to track whether or not it's in the transaction pool, you need to handle updating old members of the transaction pool whenever bootstrap or restart occurs. I would recommend either choosing to not track this and infer it from the state of the blocks it is included in wrt finalization, or coming up with a more complex way to track this so it can be easily updated in the cases I mentioned before.

Contributor Author

We will handle the tracking whenever a node bootstraps or restarts. We need to have the transaction_pool_membership because we need a way to track whether a transaction has a scheduled status or not. I don't think there is an easy way to do that...
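
For what it's worth, a sketch of how a "scheduled" status could be inferred rather than stored, assuming we can look up the consensus statuses of the blocks that include a transaction (hypothetical inputs):

```ocaml
type transaction_status = Scheduled | Included | Finalized | Unknown

let status_of ~in_pool ~including_block_statuses =
  if List.exists (fun s -> s = `Confirmed) including_block_statuses then Finalized
  else if including_block_statuses <> [] then Included
  else if in_pool then Scheduled
  else Unknown
```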

memo string [not null]
first_seen date
transaction_pool_membership bool [not null]
is_added_manually bool [not null]
Member

What does this mean?

fee int
work_id int64 // the list of work_ids are coerced to a single int64 work_id because the list will hold at most two jobs. There will be two booleans representing the number of jobs are in the `snark_job`
has_one_job bool
has_two_jobs bool
Member

Why not just num_jobs? I know it can only be two or one, but having two xor fields is messy and can be screwed up. If you want to keep the possible states compact, just use one bool to determine which of the two states you are in.

Table snark_job {
prover int
fee int
work_id int64 // the list of work_ids are coerced to a single int64 work_id because the list will hold at most two jobs. There will be two booleans representing the number of jobs are in the `snark_job`
Member

ids are guaranteed to only be 32 bits?

state_hash string [ref: > block.state_hash]
work_id int64 [ref: > snark_job.work_id]
has_one_job bool
has_two_jobs bool
Member

Once again, shouldn't duplicate fields on join tables.

Table receipt_chain_hashes {
hash string [pk]
previous_hash string
user_command_id string [not null, ref: > user_commands.id]
Member

You should connect the receipt_chain_hashes together relationally to make querying easier. At minimum, you need to index these two string fields to keep lookups fast.

Member

@nholland94 nholland94 left a comment

SQL looks better now, still have a few minor suggestions, but nothing too major. Would like to have a slightly extended conversation wrt the fee transfers table though.

}
}

Table fee_transfers {
Member

In general, I think it's more important to optimize for your relational model, not the fields you store. Having an extra NULL field on a table is not a great reason to make an extra relationship when the tables are otherwise related in the same fashion. I would still suggest consolidating the fee transfers and user commands. I buy the argument for coinbases though since there is only ever 1 coinbase per block and it's associated with the creator of that block.

Table block_to_user_commands {
block_id int [not null, ref: > blocks.id]
transaction_id int [not null, ref: > user_commands.id]
receipt_chain_id int [ref: > receipt_chain_hashes.id]
Member

This model seems to imply that you will always have a block for an associated receipt chain, but I don't think that's actually true. We are planning on implementing the ability to go out and fetch receipt chains for blocks you missed, and we wouldn't want the requirement to be that you have to download the whole block. I think a ternary join table is not the right way to model this. Instead, make one binary join table between blocks and transactions, and another binary join table between transactions and receipt chains. This has an added advantage that queries don't need to be aware of the NULLability of the receipt chain field in the join table, which generally makes writing complex joins painful.

Contributor Author

Sorry if this is not clear, but the model is trying to state that each receipt chain is associated with a block and a transaction. A transaction can technically appear in multiple blocks, but each of these blocks can produce a different receipt chain hash. This is namely because the previous receipt chain hash would be different for each block. I think this tri-association is necessary for this reason.

I would prefer having this tri-association but we can also split this table into two tables. One table is a join between blocks and transactions where each association has an id. Then, we can have a join table between the blocks-transactions join table and receipt chain table. The latter seems more complicated and I would prefer not to implement that.

Indexes {
block_id [name:"block_to_user_command.block"]
transaction_id [name: "block_to_user_command.transaction"]
}
Member

IMO, we should not specify indices for our join tables yet. Indices on join tables are not free (as all indices are not), and the way you want to set them up on join tables is very dependent on the queries you are planning on doing on that table. Indices are always easy to add later. I generally err on the side of defining very few indices initially and adding them to optimize later. It's easier to identify which indices you need later than it is to identify which ones you don't.

Contributor Author

I can omit the indices, but I am highly confident that we would make a lot of queries that use these join tables as lookups, like the blocks query to get the transactions in a block. If we do not have these indices for the join tables, these lookups are going to be slow.

Usually indices will impact write performance. However, our system does not require us to make a lot of writes. The write transaction with the most latency is writing a block. When we launch mainnet, we would probably receive at most 30 blocks per slot, and the slot time would be at minimum 30 seconds (given that we optimize the heck out of the protocol). Writing a block transaction to the database would take only milliseconds. I don't think these indices on the join tables would kill performance given our system requirements.

@jkrauska jkrauska changed the title RFC for Storing Client Data [RFC] Storing Client Data Nov 4, 2019
@jkrauska jkrauska added the RFC A label for RFCs (request for comments) label Nov 4, 2019
@nholland94
Member

We are going to reorganize this discussion and transfer all of the still-relevant information from here to a new RFC at the beginning of next year. As such, I'm going to go ahead and close this specific PR for now to clean things up.

@nholland94 nholland94 closed this Dec 20, 2019