Redesign layer status to match new consensus mechanisms #144

Open
lrettig opened this issue May 7, 2021 · 12 comments

@lrettig
Member

lrettig commented May 7, 2021

Currently a layer can have one of three statuses:

enum LayerStatus {
    LAYER_STATUS_UNSPECIFIED = 0; // not yet approved or confirmed
    LAYER_STATUS_APPROVED = 1;    // approved by hare
    LAYER_STATUS_CONFIRMED = 2;   // confirmed by tortoise
}

These statuses no longer map to how our consensus mechanisms actually work. Here's a better, more accurate design:

  • unspecified
  • pending: layer is still syncing, or new, or otherwise waiting to be processed/validated
  • analyzing: hare is currently running for this layer
  • invalidated: hare failed to run for this layer, and the node has decided it's empty (all blocks are invalid); or, hare succeeded and confirmed an empty layer
  • tentative: hare succeeded running for the layer, and a node thus has weak confidence that all other nodes agree on its contents
  • stuck: tortoise tried to verify this layer, but couldn't (because hare hasn't finished yet, or because the global opinion on the layer is abstain and the layer isn't old enough yet to try healing)
  • skipped: tortoise tried and failed to verify this layer, but moved on and verified a later layer (see "Allow tortoise to verify layer n+1 before verifying layer n", go-spacemesh#2403)
  • confirmed: tortoise succeeded in verifying the layer
  • applied: layer state transitions have been applied, receipts generated, etc.
  • healing: even after a layer is confirmed and its state has been applied, in rare cases, a node may need to re-apply the layer as part of a self-healing process
  • final: at some point we may want to go one step further and say that the layer is totally final and its state can no longer be updated, even by self-healing (I'm not sure when or if we can say this)

We may not want to surface all of these possible statuses via the API, and this list is not precisely MECE as there is some overlap, but it's reasonably comprehensive.
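
As a rough illustration of the list above, here is a minimal Go sketch of the proposed statuses as an enumerated type. The type name, constant names, and numeric values are assumptions made for this example; they are not taken from the Spacemesh API.

```go
package main

import "fmt"

// LayerStatus is a hypothetical Go mirror of the statuses proposed above.
// Names and numeric values are illustrative only.
type LayerStatus int

const (
	Unspecified LayerStatus = iota // zero value: status was never set
	Pending                        // syncing, new, or waiting to be processed
	Analyzing                      // hare is running for this layer
	Invalidated                    // hare failed, or hare confirmed an empty layer
	Tentative                      // hare succeeded
	Stuck                          // tortoise could not verify the layer yet
	Skipped                        // tortoise verified a later layer instead
	Confirmed                      // tortoise verified the layer
	Applied                        // state transitions have been applied
	Healing                        // layer is being re-applied by self-healing
	Final                          // state can no longer change (if ever reachable)
)

// statusNames holds the lower-case names used in the list above.
var statusNames = [...]string{
	"unspecified", "pending", "analyzing", "invalidated", "tentative",
	"stuck", "skipped", "confirmed", "applied", "healing", "final",
}

// String returns the status name, or a numeric fallback for unknown values.
func (s LayerStatus) String() string {
	if s < 0 || int(s) >= len(statusNames) {
		return fmt.Sprintf("LayerStatus(%d)", int(s))
	}
	return statusNames[s]
}

func main() {
	for s := Unspecified; s <= Final; s++ {
		fmt.Printf("%d %s\n", int(s), s)
	}
}
```

Because the list is not strictly MECE, a real design might split these values across more than one field; a single enum is used here only to keep the sketch short.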

Related: spacemeshos/go-spacemesh#2403

@avive
Contributor

avive commented May 11, 2021

We also need to think about transaction statuses. It is my understanding that while in self-healing, no other data is canonical until the self healing is complete. So a transaction in a block which is in a layer that is healing will also need to have a tentative state: perhaps it is healing, or perhaps it is tentative. We need to carefully consider the minimum new set of possible states that gives users a clue about the state of the network, without having too many states, as these are very confusing even for technical people. And the states need to cover all mesh entities, not just layers.

@lrettig
Member Author

lrettig commented May 11, 2021

To be clear, transactions obviously do not have an independent status - they derive their status from the status of their block and layer.

while in self-healing, no other data is canonical until the self healing is complete

What makes self-healing complex, in this context, is that it can invalidate a previously valid block (or vice-versa). So we could have blocks (and transactions) that are "approved" and applied to state, then reverted later. That's why I suggested introducing a "final" status, but we'll have to discuss with @tal-m the threshold beyond which we could apply this.
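
A minimal Go sketch of the "derived status" idea: the transaction's status is computed from its block and layer on each call, so a later change made by self-healing shows up automatically. The `Layer`, `Block`, and `Tx` types, the `TxStatus` function, and the `Invalid` value are all hypothetical names introduced for this example.

```go
package main

import "fmt"

// Status mirrors the current LayerStatus enum, plus a hypothetical Invalid
// value (an assumption, not part of the enum quoted in this issue).
type Status int

const (
	Unspecified Status = iota // not yet approved or confirmed
	Approved                  // approved by hare
	Confirmed                 // confirmed by tortoise
	Invalid                   // hypothetical: the containing block is invalid
)

// Layer, Block and Tx are hypothetical, minimal stand-ins for mesh entities.
type Layer struct{ Status Status }

type Block struct {
	Layer *Layer
	Valid bool // contextual validity; self-healing may flip this
}

type Tx struct{ Block *Block }

// TxStatus computes a transaction's status from its block and layer on every
// call, so there is no stored per-transaction status to fall out of date.
func TxStatus(tx Tx) Status {
	if tx.Block == nil || tx.Block.Layer == nil {
		return Unspecified
	}
	if !tx.Block.Valid {
		return Invalid
	}
	return tx.Block.Layer.Status
}

func main() {
	layer := &Layer{Status: Confirmed}
	block := &Block{Layer: layer, Valid: true}
	tx := Tx{Block: block}
	fmt.Println(TxStatus(tx) == Confirmed) // true

	// Self-healing invalidates the block; the derived status follows.
	block.Valid = false
	fmt.Println(TxStatus(tx) == Invalid) // true
}
```

The sketch does not address the harder problem raised here, which is that state already applied for such a transaction would have to be reverted.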

@avive
Contributor

avive commented May 13, 2021

We need to refine this and find a minimal MECE set. For example, why do we need unspecified if we have pending? Obviously we need to find a balance between being descriptive and informative and not confusing users with too many states. I think 7 is the magic number here: above it, most people will find the states overwhelming and overly complex. For example, if stuck is a temporary state, then it can also be treated as pending.
One thing to consider is to treat all of the states proposed above, up until verification by tortoise, as pending, and maybe provide more detailed hare-related statuses in the debugging API service.

Here's a minimalistic proposal for 3 high-level states for layer, block and tx (same states for all 3 entities):

  • Pending - Including when the node has determined that it needs to self-heal in order to verify the layer, and including unspecified.

  • Verified - Tortoise verified.

  • Confirmed - Verified and state applied (txs executed).

  • Hare related statuses: in debugging api service for tests.
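
One way to read this proposal is as a function that collapses the detailed statuses from the first comment into the three high-level states. The Go sketch below assumes the detailed statuses are passed as their lower-case names; `Collapse` and `HighLevel` are hypothetical names. Note the naming shift: the detailed "confirmed" (tortoise verified) maps to Verified, while the detailed "applied" maps to Confirmed. Mapping "invalidated" to Pending is an assumption, since the thread does not settle it.

```go
package main

import "fmt"

// HighLevel is the three-state proposal above.
type HighLevel int

const (
	Pending   HighLevel = iota // anything not yet verified by tortoise
	Verified                   // tortoise verified
	Confirmed                  // verified and state applied
)

// String returns the name used in the proposal above.
func (h HighLevel) String() string {
	switch h {
	case Verified:
		return "Verified"
	case Confirmed:
		return "Confirmed"
	default:
		return "Pending"
	}
}

// Collapse maps a detailed status name to a high-level state. Everything that
// precedes tortoise verification, including unspecified and healing, is
// reported as Pending.
func Collapse(detailed string) HighLevel {
	switch detailed {
	case "confirmed": // detailed meaning: tortoise verified the layer
		return Verified
	case "applied": // detailed meaning: state transitions applied
		return Confirmed
	default:
		return Pending
	}
}

func main() {
	for _, d := range []string{"analyzing", "stuck", "healing", "confirmed", "applied"} {
		fmt.Printf("%-10s -> %s\n", d, Collapse(d))
	}
}
```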

@lrettig
Member Author

lrettig commented May 14, 2021

why do we need unspecified if we have pending?

This is a quirk of how gRPC (and Go) works: there needs to be a default value other than pending so that we know whether or not the value has been initialized correctly. It doesn't need to be exposed to the user (if it is, that's a bug).
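
A small Go sketch of the quirk: a field that is never assigned takes the zero value, so reserving 0 for "unspecified" makes a forgotten assignment distinguishable from a real status. The `Layer` type and `Initialized` method are hypothetical names used only for illustration.

```go
package main

import "fmt"

// LayerStatus reserves 0 for "unspecified", matching the enum's zero value.
type LayerStatus int32

const (
	LayerStatusUnspecified LayerStatus = 0 // zero value; never set explicitly
	LayerStatusPending     LayerStatus = 1
)

// Layer is a hypothetical message type carrying a status field.
type Layer struct {
	Number uint32
	Status LayerStatus
}

// Initialized reports whether the status field was ever assigned a real value.
func (l Layer) Initialized() bool {
	return l.Status != LayerStatusUnspecified
}

func main() {
	var forgotten Layer // Status was never assigned, so it is 0
	set := Layer{Number: 7, Status: LayerStatusPending}
	fmt.Println(forgotten.Initialized(), set.Initialized()) // prints: false true
}
```

If pending were the zero value instead, `forgotten` would be indistinguishable from a layer that is genuinely pending.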

@avive
Contributor

avive commented May 14, 2021

So how about:

enum LayerStatus {
    LAYER_STATUS_UNSPECIFIED = 0; // unknown
    LAYER_STATUS_PENDING = 1;     // not yet approved or confirmed
    LAYER_STATUS_APPROVED = 2;    // approved by hare
    LAYER_STATUS_VERIFIED = 3;    // approved by tortoise
    LAYER_STATUS_CONFIRMED = 4;   // confirmed by tortoise and state applied
}

So each state adds confidence in confirmation compared to the one before it, and the last one is the maximum level of confirmation we have in our system. We still have the open question of whether a verified layer can move back to pending due to self-healing.
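
Because the values are ordered by confidence, a client could compare them numerically. The Go sketch below assumes that ordering holds; `AtLeast` and `Regressed` are hypothetical helpers, and `Regressed` only detects the verified-to-pending case raised above without deciding whether it should be allowed.

```go
package main

import "fmt"

// LayerStatus uses the numeric values from the enum proposed above.
type LayerStatus int32

const (
	Unspecified LayerStatus = 0 // unknown
	Pending     LayerStatus = 1 // not yet approved or confirmed
	Approved    LayerStatus = 2 // approved by hare
	Verified    LayerStatus = 3 // approved by tortoise
	Confirmed   LayerStatus = 4 // confirmed by tortoise and state applied
)

// AtLeast reports whether s carries at least as much confidence as want.
// Unspecified is treated as "unknown", not as the lowest confidence level.
func (s LayerStatus) AtLeast(want LayerStatus) bool {
	return s != Unspecified && s >= want
}

// Regressed reports whether a status update lowered confidence, for example
// Verified -> Pending, which is the self-healing case left open above.
func Regressed(prev, next LayerStatus) bool {
	return prev != Unspecified && next < prev
}

func main() {
	fmt.Println(Confirmed.AtLeast(Verified)) // true
	fmt.Println(Approved.AtLeast(Verified))  // false
	fmt.Println(Regressed(Verified, Pending)) // true
}
```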

@lrettig
Member Author

lrettig commented May 14, 2021

  • we definitely need an "invalid" status, for blocks that were marked contextually invalid (by hare OR by tortoise)
  • don't we want a "final" status as well?

@avive
Contributor

avive commented May 14, 2021

  • Regarding invalid - I thought we were talking about layer statuses. Yes, I guess some blocks known to a node can be invalid if they are not in any valid layer.
  • Regarding final - this depends on whether self-healing can change any block in the past without limitations. If yes, then there are no final blocks; if no, then I guess there are.

@lrettig
Member Author

lrettig commented May 15, 2021

Discussed this with @tal-m today: regarding "final", we have no explicit finality. Finality will be implicit, subjective, and probabilistic, as in Bitcoin. So I think we can drop this status.

@avive
Contributor

avive commented May 16, 2021

So after thinking more about this, maybe we go with these high-level layer (and transaction) statuses:

enum LayerStatus {
    LAYER_STATUS_UNSPECIFIED = 0; // unknown
    LAYER_STATUS_PENDING = 1;     // not yet approved or confirmed
    LAYER_STATUS_APPROVED = 2;    // approved by hare
    LAYER_STATUS_VERIFIED = 3;    // approved by tortoise
    LAYER_STATUS_CONFIRMED = 4;   // confirmed by tortoise and state applied for txs in the layer
}

and have additional sub-statuses regarding hare in lower-level api such as debuggingServices if needed for tests.

@lrettig
Member Author

lrettig commented May 16, 2021

Add invalid to the list and I will agree with you :)

@avive
Contributor

avive commented May 26, 2021

Add invalid to the list and I will agree with you :)

How is it different from LAYER_STATUS_UNSPECIFIED?

@avive avive modified the milestones: v.12, v1.2 May 30, 2021
@lrettig lrettig mentioned this issue Jun 10, 2021
@lrettig
Member Author

lrettig commented Jun 10, 2021

Add invalid to the list and I will agree with you :)

How is it different from LAYER_STATUS_UNSPECIFIED?

I explained here. Individual blocks can be invalidated by hare or by tortoise. An entire layer can also be invalidated, e.g., if hare fails completely for that layer, which means that all of the blocks in the layer are marked invalid. Technically we can "verify" or "confirm" an empty layer, so I guess maybe we don't need a separate INVALID status. Do we need an EMPTY status? It can be implied by the nonexistence of any block data in the layer, as long as downstream clients know how to interpret and display empty layers.
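
A sketch of how a downstream client might infer "empty" without a dedicated status, assuming a layer exposes its status and its blocks with a validity flag. The `Layer` and `Block` types and the `IsEmpty` method are hypothetical, as is the choice to require at least tortoise verification before calling a layer empty.

```go
package main

import "fmt"

// LayerStatus uses the numeric values from the enum proposed earlier.
type LayerStatus int32

const (
	Unspecified LayerStatus = 0
	Pending     LayerStatus = 1
	Approved    LayerStatus = 2
	Verified    LayerStatus = 3
	Confirmed   LayerStatus = 4
)

// Block and Layer are hypothetical, minimal stand-ins for mesh entities.
type Block struct{ Valid bool }

type Layer struct {
	Status LayerStatus
	Blocks []Block
}

// IsEmpty reports whether a layer that tortoise has at least verified
// contains no valid blocks. Before verification it is too early to tell.
func (l Layer) IsEmpty() bool {
	if l.Status < Verified {
		return false
	}
	for _, b := range l.Blocks {
		if b.Valid {
			return false
		}
	}
	return true
}

func main() {
	hareFailed := Layer{Status: Verified, Blocks: []Block{{Valid: false}}}
	normal := Layer{Status: Confirmed, Blocks: []Block{{Valid: true}}}
	fmt.Println(hareFailed.IsEmpty(), normal.IsEmpty()) // prints: true false
}
```

Under this reading, a layer whose blocks were all invalidated is displayed the same way as a layer with no block data at all.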
