Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Umbrella] [Sync] Engineering plan for follower (A), transaction dissemination (B) and check pointing (C) #1099

Closed
7 of 9 tasks
gdanezis opened this issue Mar 28, 2022 · 5 comments

Comments

@gdanezis
Copy link
Collaborator

gdanezis commented Mar 28, 2022

This issue gathers the tasks road map following the extensive design discussions in #194 . We we develop particular components we will make individual issues to discuss, but here we present the big picture.

Priority A: A client / authority / replica can request historic and real time local transaction sequence from another authority.

Priority B: Authorities proactively share certificates (and optionally transactions) with each other, to ensure honest authorities are close to having processed the same set of certificates. ( @laura-makdah & @gdanezis )

  • Leverage facilities in Priority A to implement a variant of the Murmur protocol (https://arxiv.org/pdf/1908.01738.pdf) between authorities. In that protocol each authority initiates following O(LogN) others (bidirectonally) at random, and exchanges their list of local updates.
  • Allow authorities that follow each other to maintain a diff of their databases.
  • Extend the protocol to allow an authority to request a non-real time sequence from another authority, and to create a diff between its local state and the state of the remote authority.

Priority C: Protocols to leverage the information available on authorities to derive periodic checkpoints where a super-majority of authorities agree to the set of certificates processed up to these points.

  • Create a DB to store information about the checkpoints, index transactions by checkpoint, and maintain sets of local transactions that are not yet included in the checkpoint, as well as sets of transactions in checkpoints not locally processed. (@gdanezis )
  • Create logic to leverage the local transaction diffs in Priority B, and a consensus mechanism to define global checkpoints, following the spanning tree checkpointing concepts (see https://docs.google.com/presentation/d/1TSs1FPC9INZMpipP1vP2dK7XMmOXtHk5CLWFKO6s7p8/edit?usp=sharing) initially. (@gdanezis )
  • Connect the information about the global sequence to the local sequence to facilitate local sequence sync in Priority B.
  • Connect to a consensus mechanism to actually construct the checkpoint (@asonnino ), and process incoming checkpoints from agreement.

cc: @LefKok who can provide reviews on design & feedback on interactions with reconfig. @kchalkias who will want to review crypto.

@velvia
Copy link
Contributor

velvia commented Mar 29, 2022

Tagging myself @velvia here.

A couple thoughts:

  • Do we have details about the precise exchange mechanism in B, what will be exchanged?

  • When we say authorities will "follow" each other are you saying they would follow each others' streams?

  • @gdanezis we should sync on "Create a DB to store info about checkpoints". My view is that the store that stores transactions and certs should natively understand each "block" ie the batch(es) between checkpoints and that should be used as a universal way of doing range reads across authorities. Even if the TX order is not the same across authorities the "checkpoint number" or "block height" etc should be universally agreed upon. We can leverage that

@gdanezis
Copy link
Collaborator Author

@gdanezis we should sync on "Create a DB to store info about checkpoints".

If it helps you can have a look here, this is what I was thinking of:
https://github.com/MystenLabs/sui/blob/waypoint/sui_core/src/authority/authority_checkpoints.rs

As you can see in this draft indeed checkpoints have a sequence number, and you can get the set of transaction digests (potentially maybe effects digests down the line) at each checkpoint. Is this what you were suggesting?

@gdanezis
Copy link
Collaborator Author

gdanezis commented Mar 29, 2022

Do we have details about the precise exchange mechanism in B, what will be exchanged?
When we say authorities will "follow" each other are you saying they would follow each others' streams?

To start with I would use the follower interfaces for authorities to follow a subset of other authorities to propagate the transactions digests that are processed. If a transaction digest seen at another authority is not processed locally, the authority can try to download the cert to execute it.

All this is TBD. How does that affect your work on storage?

@velvia
Copy link
Contributor

velvia commented Mar 29, 2022

All this is TBD. How does that affect your work on storage?

It probably doesn't, I was just curious about if we were using IBLTs, bitmaps, etc. The only thing that would affect is basically an efficient way of extracting whatever info is exchanged, if it is not the subscription.
For example, to build IBLTs or bitmaps, need to iterate through all TX digests in an efficient way in some range.

As you can see in this draft indeed checkpoints have a sequence number, and you can get the set of transaction digests (potentially maybe effects digests down the line) at each checkpoint. Is this what you were suggesting?

I looked into the WIP / authority_checkpoint that you have above. I guess I'm thinking of a different approach, one designed for easy state shedding. Here we will have a table going from checkpoint to TX digest (well, checkpoint and TX digest) and we also have a separate sequencing table for the batch sequences, and these are all growing rapidly with time. Instead I'm thinking of a TX store which organizes all TXs sequenced within each checkpoint or "block":

  • Checkpoint or block number -->
    • Sequence number --> TX/Certificate/Effects
    • Map from TX digest -> sequence number

So the cert/effects store would have built in a notion of checkpoints or blocks, and allow older checkpoints to easily be "dropped" or shipped off to cold storage.

Furthermore and the reason why I ask about the checkpoint number. Imagine if you have a CertificateSequenceNumber which is basically a u64 or a { checkpoint_block_no: u32, seq_no: u32 }.... assuming that each block or checkpoint has << 2^32 TX's, and there are no more than 2^32 checkpoints (checkpoint numbers could be recycled after a while). Now you can range scan all the transactions and certificates, in block order, and refer to them easily, and we have the property that every authority which has synced up should have the same set of TX's/certs within a given checkpoint block number, meaning that the CertificateSequenceNumber in two authorities should have an identical checkpoint_block_no.

@LefKok LefKok self-assigned this Apr 8, 2022
@gdanezis gdanezis added Type: Major Feature Major new functionality or integration, which has a significant impact on the network Triaged Issue is associated with a milestone and has at least 1 label of each: status, type, & priority Priority: Critical This task must be completed or something bad will happen e.g. missing a major milestone, major bug Status: In Progress Issue is actively being worked on labels Apr 13, 2022
@gdanezis gdanezis added this to the Pre Testnet milestone Apr 13, 2022
@gdanezis
Copy link
Collaborator Author

The latest diffs implement the gossip protocol:

@lxfind lxfind removed this from the Pre Testnet milestone May 9, 2022
@lxfind lxfind added the sui-node label May 9, 2022
@gdanezis gdanezis removed Type: Major Feature Major new functionality or integration, which has a significant impact on the network Priority: Critical This task must be completed or something bad will happen e.g. missing a major milestone, major bug Status: In Progress Issue is actively being worked on Triaged Issue is associated with a milestone and has at least 1 label of each: status, type, & priority labels May 18, 2022
@gdanezis gdanezis added this to the Pre Testnet milestone May 26, 2022
@lxfind lxfind changed the title [Sync] Engineering plan for follower (A), transaction dissemination (B) and check pointing (C) [Umbrella] [Sync] Engineering plan for follower (A), transaction dissemination (B) and check pointing (C) Jun 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants