adr: add API for retrieving data for a particular namespace only #302
# ADR 003: Retrieving Application messages

## Changelog

- 2021-04-25: initial draft

## Context

This ADR builds on top of [ADR 002](adr-002-ipld-da-sampling.md) and will use the implemented APIs described there.
The reader should familiarize themselves at least with the high-level concepts described there as well as with the [specs](https://github.com/lazyledger/lazyledger-specs/blob/master/specs/data_structures.md#2d-reed-solomon-encoding-scheme).

The academic Lazyledger [paper](https://arxiv.org/abs/1905.09274) describes the motivation and context for this API.
The main motivation can be quoted from section 3.3 of that paper:

> (Property1) **Application message retrieval partitioning.** Client nodes must be able to download all of the messages relevant to the applications they use [...], without needing to download any messages for other applications.

> (Property2) **Application message retrieval completeness.** When client nodes download messages relevant to the applications they use [...], they must be able to verify that the messages they received are the complete set of messages relevant to their applications, for specific blocks, and that there are no omitted messages.

The main data structure that enables the above properties is called a Namespaced Merkle Tree (NMT), an ordered binary Merkle tree where:

1. each node in the tree includes the range of namespaces of the messages in all of its descendants
2. leaves in the tree are ordered by the namespace identifiers of the leaf messages

A more formal description can be found in the [specification](https://github.com/lazyledger/lazyledger-specs/blob/de5f4f74f56922e9fa735ef79d9e6e6492a2bad1/specs/data_structures.md#namespace-merkle-tree).
An implementation can be found in [this repository](https://github.com/lazyledger/nmt).
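
To make point 1 concrete, here is a minimal, self-contained Go sketch of how an inner node's namespace range is derived from its children; the `nmtNode` type and `parent` function are illustrative stand-ins for this ADR only, not the real `lazyledger/nmt` API:

```go
package main

import "fmt"

// nmtNode is a simplified stand-in for an NMT node (illustrative only).
// Each node carries the min/max namespace of all leaves below it.
type nmtNode struct {
	minNs, maxNs uint64
}

// parent computes the namespace range of an inner node from its two
// children. Because leaves are ordered by namespace, the left child
// always holds the minimum and the right child the maximum.
func parent(left, right nmtNode) nmtNode {
	return nmtNode{minNs: left.minNs, maxNs: right.maxNs}
}

func main() {
	// Four leaves, ordered by namespace ID: 1, 1, 2, 3.
	l := parent(nmtNode{1, 1}, nmtNode{1, 1}) // inner node with range [1, 1]
	r := parent(nmtNode{2, 2}, nmtNode{3, 3}) // inner node with range [2, 3]
	root := parent(l, r)
	fmt.Printf("root namespace range: [%d, %d]\n", root.minNs, root.maxNs) // [1, 3]
}
```

Since every node advertises the namespace range of its subtree, a verifier can later check from the root alone whether a namespace could possibly appear below it.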

This ADR describes a version of the [`GetWithProof`](https://github.com/lazyledger/nmt/blob/ddcc72040149c115f83b2199eafabf3127ae12ac/nmt.go#L193-L196) API of the NMT that leverages the fact that IPFS uses content addressing and that we have implemented an [IPLD plugin](https://github.com/lazyledger/lazyledger-core/tree/37502aac69d755c189df37642b87327772f4ac2a/p2p/ipld) for an NMT.

**Note**: The APIs defined here will be particularly relevant for Optimistic Rollup (full) nodes that want to download their Rollup's data (see [lazyledger/optimint#48](https://github.com/lazyledger/optimint/issues/48)).
Another potential use-case of this API could be for so-called [light validator nodes](https://github.com/lazyledger/lazyledger-specs/blob/master/specs/node_types.md#node-type-definitions) that want to download and replay the state-relevant portion of the block data, i.e. transactions with [reserved namespace IDs](https://github.com/lazyledger/lazyledger-specs/blob/master/specs/consensus.md#reserved-namespace-ids).

## Alternative Approaches

The approach described below relies on IPFS' block exchange protocol (bitswap) and DHT; IPFS' implementation will be used as a black box to find peers that can serve the requested data.
This will likely be much slower than it potentially could be, and for a first implementation we intentionally do not incorporate the optimizations that we could.

We briefly mention potential optimizations for the future here:

- use of [graphsync](https://github.com/ipld/specs/blob/5d3a3485c5fe2863d613cd9d6e18f96e5e568d16/block-layer/graphsync/graphsync.md) instead of [bitswap](https://docs.ipfs.io/concepts/bitswap/) and use of [IPLD selectors](https://github.com/ipld/specs/blob/5d3a3485c5fe2863d613cd9d6e18f96e5e568d16/design/history/exploration-reports/2018.10-selectors-design-goals.md)
- expose an API to be able to download application-specific data by namespace (including proofs) with the minimal number of round-trips (e.g. finding nodes that expose an RPC endpoint like [`GetWithProof`](https://github.com/lazyledger/nmt/blob/ddcc72040149c115f83b2199eafabf3127ae12ac/nmt.go#L193-L196))

## Decision

Most discussions on this particular API happened either on calls or in other non-documented ways.
We only describe the decision in this section.

We decide to implement the simplest approach first.
We first describe the protocol informally here and explain why it fulfils (Property1) and (Property2) from the [Context](#context) section above.

In the case that leaves with the requested namespace exist, this basically boils down to the following: traverse the tree starting from the root until finding the first leaf (start) with the namespace in question, then directly request and download all leaves coming after the start until the namespace changes to one greater than the requested one.
In the case that no leaves with the requested namespace exist in the tree, we traverse the tree to find the leaf in the position where the namespace would have been and download the neighbouring leaves.

This is pretty much what the [`ProveNamespace`](https://github.com/lazyledger/nmt/blob/ddcc72040149c115f83b2199eafabf3127ae12ac/nmt.go#L132-L146) method does, but using IPFS we can simply locate and then request the leaves, and the corresponding inner proof nodes will automatically be downloaded on the way, too.
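
The traversal for the case where leaves with the requested namespace exist can be sketched in self-contained Go. The `node` type and `leavesInNamespace` function below are hypothetical simplifications for illustration: in the real setting each visited node is fetched from IPFS by its CID, so every pruned subtree is a download saved, and the absence case additionally downloads the neighbouring leaves as proof.

```go
package main

import "fmt"

// node is a simplified NMT node for illustration (not the real
// lazyledger/nmt implementation). Leaves carry data.
type node struct {
	minNs, maxNs uint64
	left, right  *node
	data         string // non-empty only for leaves
}

// leavesInNamespace walks the tree in order and collects all leaves whose
// namespace equals nID, skipping any subtree whose advertised namespace
// range cannot contain nID.
func leavesInNamespace(n *node, nID uint64, out *[]string) {
	if n == nil || nID < n.minNs || nID > n.maxNs {
		return // prune: the requested namespace cannot be below this node
	}
	if n.left == nil && n.right == nil {
		*out = append(*out, n.data)
		return
	}
	leavesInNamespace(n.left, nID, out)
	leavesInNamespace(n.right, nID, out)
}

func main() {
	// Four leaves ordered by namespace: 1, 2, 2, 3.
	tree := &node{1, 3,
		&node{1, 2, &node{minNs: 1, maxNs: 1, data: "a"}, &node{minNs: 2, maxNs: 2, data: "b"}, ""},
		&node{2, 3, &node{minNs: 2, maxNs: 2, data: "c"}, &node{minNs: 3, maxNs: 3, data: "d"}, ""},
		"",
	}
	var got []string
	leavesInNamespace(tree, 2, &got)
	fmt.Println(got) // [b c]
}
```

Because leaves are ordered by namespace, the matching leaves always form one contiguous run, which is why the informal description above can simply download "from start until the namespace grows".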

## Detailed Design

We define one function that returns all shares of a block belonging to a requested namespace and block (via the block's data availability header).
See [`ComputeShares`](https://github.com/lazyledger/lazyledger-core/blob/1a08b430a8885654b6e020ac588b1080e999170c/types/block.go#L1371) for reference on how the block data is encoded into namespaced shares.

```go
// RetrieveShares returns all raw data (raw shares) belonging to the passed-in
// namespace ID nID and included in the block with the DataAvailabilityHeader dah.
func RetrieveShares(
	ctx context.Context,
	nID namespace.ID,
	dah *types.DataAvailabilityHeader,
	api coreiface.CoreAPI,
) ([][]byte, error) {
	// 1. Find the row root(s) that contain the namespace ID nID.
	// 2. Traverse the corresponding tree(s) according to the
	//    above informally described algorithm and get the corresponding
	//    leaves (if any).
	// 3. Return all (raw) shares corresponding to the nID.
}
```
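
Step 1 above can be illustrated with a small, self-contained sketch; `rowRoot` and `rowsWithNamespace` are hypothetical names, and in the real code the min/max namespaces would come from the NMT row roots in the `DataAvailabilityHeader`:

```go
package main

import "fmt"

// rowRoot is a simplified stand-in for an NMT row root from the
// DataAvailabilityHeader (illustrative only).
type rowRoot struct {
	minNs, maxNs uint64
}

// rowsWithNamespace returns the indices of all row roots whose namespace
// range could contain nID. Only the trees behind these rows need to be
// traversed at all; every other row can be skipped without any download.
func rowsWithNamespace(rows []rowRoot, nID uint64) []int {
	var idxs []int
	for i, r := range rows {
		if nID >= r.minNs && nID <= r.maxNs {
			idxs = append(idxs, i)
		}
	}
	return idxs
}

func main() {
	rows := []rowRoot{{1, 2}, {2, 4}, {5, 7}, {8, 8}}
	fmt.Println(rowsWithNamespace(rows, 2)) // [0 1]
}
```

Note that a namespace can span several rows, so the result is a slice of row indices rather than a single row.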

Additionally, we define two functions that use the first one above to:

1. return all the parsed (non-padding) data with [reserved namespace IDs](https://github.com/lazyledger/lazyledger-specs/blob/de5f4f74f56922e9fa735ef79d9e6e6492a2bad1/specs/consensus.md#reserved-namespace-ids): transactions, intermediate state roots, evidence.
2. return all application-specific blobs (shares) belonging to one namespace ID, parsed as a slice of Messages ([specification](https://github.com/lazyledger/lazyledger-specs/blob/de5f4f74f56922e9fa735ef79d9e6e6492a2bad1/specs/data_structures.md#message) and [code](https://github.com/lazyledger/lazyledger-core/blob/1a08b430a8885654b6e020ac588b1080e999170c/types/block.go#L1336)).

The latter two methods might require moving or exporting a few currently unexported functions that live in [share_merging.go](https://github.com/lazyledger/lazyledger-core/blob/1a08b430a8885654b6e020ac588b1080e999170c/types/share_merging.go#L57-L76) and could be implemented in a separate pull request.

```go
// RetrieveStateRelevantMessages returns all state-relevant messages
// (transactions, intermediate state roots, and evidence) included in a block
// with the DataAvailabilityHeader dah.
func RetrieveStateRelevantMessages(
	ctx context.Context,
	dah *types.DataAvailabilityHeader,
	api coreiface.CoreAPI,
) (Txs, IntermediateStateRoots, EvidenceData, error) {
	// Like RetrieveShares but for all reserved namespaces;
	// additionally, the shares are parsed (merged) into the
	// corresponding types in the return arguments.
}
```

```go
// RetrieveMessages returns all Messages belonging to the passed-in
// namespace ID nID and included in the block with the DataAvailabilityHeader dah.
func RetrieveMessages(
	ctx context.Context,
	nID namespace.ID,
	dah *types.DataAvailabilityHeader,
	api coreiface.CoreAPI,
) (Messages, error) {
	// Like RetrieveShares but additionally parses the shares
	// into the Messages type.
}
```
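
The share-to-message merging both functions rely on can be sketched as follows. This is a deliberately simplified illustration: the 8-byte `nsSize` and the framing are assumptions for this sketch, and the real rules in `share_merging.go` also handle length prefixes and padding shares.

```go
package main

import (
	"bytes"
	"fmt"
)

const nsSize = 8 // namespace ID size in bytes (assumption for this sketch)

// parseMessage merges the raw shares of one message back into its payload:
// in this simplified framing, every share is prefixed with the namespace ID,
// which is stripped before concatenating the remaining bytes.
func parseMessage(shares [][]byte) (nID []byte, data []byte) {
	nID = shares[0][:nsSize]
	var buf bytes.Buffer
	for _, s := range shares {
		buf.Write(s[nsSize:])
	}
	return nID, buf.Bytes()
}

func main() {
	ns := []byte{0, 0, 0, 0, 0, 0, 0, 1}
	shares := [][]byte{
		append(append([]byte{}, ns...), []byte("hello ")...),
		append(append([]byte{}, ns...), []byte("world")...),
	}
	nID, data := parseMessage(shares)
	fmt.Printf("nID=%x data=%q\n", nID, data) // nID=0000000000000001 data="hello world"
}
```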

## Status

Proposed

## Consequences

This API will most likely be used by Rollups too.
We should document it properly and move it, together with relevant parts from ADR 002, into a separate go-package.

### Positive

- easy to implement with the existing code (see [ADR 002](https://github.com/lazyledger/lazyledger-core/blob/47d6c965704e102ae877b2f4e10aeab782d9c648/docs/lazy-adr/adr-002-ipld-da-sampling.md#detailed-design))
- resilient data retrieval via a p2p network
- dependence on a mature and well-tested code-base with a large and welcoming community

### Negative

> **Review thread:**
> I am not sure if this is necessary to be added, but I also consider the current state of searches for non-existing namespaces as negative and expensive. A zero-result request actually means a node downloading and traversing a non-trivial amount of nodes. While selectors can make things better, it's still the case.
>
> That is right! In the worst case, this could mean going through all row roots (e.g. say the last tree has a min/max NID range s.t. the namespace in question could be included) and then traversing the tree once from the root to the leaf that proves that this namespace is not included. I guess this is still OK as it is

- with IPFS, we inherit the fact that potentially a lot of round-trips are done until the data is fully downloaded; in other words, this could end up way slower than potentially possible
- anyone interacting with this API needs to run an IPFS node

### Neutral

- optimizations can happen incrementally once we have an initial working version

## References

We've linked to all references throughout the ADR.

**Review thread:**

> Maybe we can add that the ADR would also be helpful for DAS consensus nodes that want to verify internal state messages only.

> Good point, will add that! Now that you've mentioned it, I guess we could also use a modified version of Tendermint's internal gossiping mechanism for that: currently, the gossip routine pushes out random chunks of the whole (regular) block data (aka Parts), but instead it could send out only the state-relevant portions (as namespaced shares).
> I feel like this could have several benefits: this gossiping is potentially faster than requesting the data from IPFS; the code is already there (but would need to be modified); we would remove some pressure from the proposer more quickly, as we decrease the time where it is the only node on the network having all block data; and it would buy some time for doing the DA sampling in parallel, or pipelined shortly after receiving a few shares from the gossip routine.

> Added use-case: 916b289
> I think the idea of using the gossiping mechanism should be discussed in an ADR / issue. Will leave the comment unresolved s.t. I can quickly find it when going through the open comments here.

> Hmm, that's an interesting idea and, given its benefits, it can be better than what I was doing. I was recently thinking about / partially implementing that, but instead of Parts, I am sending DAHeaders for nodes to catch up.
> But I am concerned this is not a long-term solution and may require changes that will soon be removed, as IPFS is our main data source/sharing mechanism. Also, this is quite a sharp turn and should be documented as an issue somewhere first.

> Yes, but previously we were not optimizing for the use-case of light validator nodes that need to download all state-relevant Txs. I guess with the deferred execution model that doesn't matter much, as validators can start downloading these while the previous block is reaching consensus (it's the previous block's Txs the app hash is computed for), but with immediate execution this will become a time-sensitive (and kinda consensus-critical) issue. I think a simple hybrid gossiping approach that pushes out some data every validator will need anyway, paired with IPFS to get all other data on demand (e.g. for DAS or for downloading the whole block), could be a better approach. I think we have yet to decide on a specific approach.

> Just noting that this is not a decision either. I'm proposing to investigate this further: e.g. by looking into how easy it would be to change the current gossiping routine to only gossip reserved Txs. If it turns out not to be easy, we could decide to go down the other route.