Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - ATX v2 (for ATX merge) #5785

Closed
wants to merge 12 commits into from
Closed

[Merged by Bors] - ATX v2 (for ATX merge) #5785

wants to merge 12 commits into from

Conversation

poszu
Copy link
Contributor

@poszu poszu commented Mar 27, 2024

Motivation

Proposal of the new ATX version to support the atx merging.

Description 🚧

New ATX V2 allows identities to marry (join an equivocation set) and publish ATX together (merge).

πŸ’’ Marriage

Marriage happens when an identity selected by the miner publishes an ATX with the Marriages slice filled with marriage certificates.

A certificate proves the will of the identity that signed it to be married to the identity that publishes the certificate. For example, if identity Alice wants to marry Bob (and vice versa), the owner of these identities first picks one that will publish an ATX with a marriage certificate. Say it's Bob. Alice signs Bob's ID and creates a MarriageCertificate, which Bob includes in his next ATX.

The marriage is instant ⚑ and, from the moment Bob published the ATX, Alice and Bob are forever bound by forming a so-called equivocation set.

πŸ‘ͺ equivocation set

An equivocation set bounds all identities it contains. If one of them acts maliciously, all IDs in the set become malicious.

Merging

A merge is an act of publishing a single ATX on behalf of many identities. A merged ATX contains non-interactive proofs of space-time (NIPoST) for multiple identities. It is signed by a single ID that publishes it and must include a VRF nonce (that belongs to the ID signing the ATX) that satisfies the combined storage of all included IDs. It becomes active for the target epoch and will participate in the network protocols (i.e. Hare) and receive accumulated rewards on the selected coinbase account according to the weight of all IDs present in the ATX.

πŸ’‘ A merged ATX contains a reference to the ATX in which marriage happened for easy syntactical verification if the included IDs are married.

Restrictions

1️⃣ One merged ATX per epoch

There can be only one merged ATX per equivocation set. Identities that choose not to merge (for example because of software or hardware malfunction, being late with PoST proving, etc.) must publish separate ATXs just for themself. They cannot form a second merged ATX in the same epoch.

For example, given an equivocation set of IDs A,B,C,D,E. If C and E malfunction, the node:

  • can publish ATXs [A,B,D] + [C] + [E]
  • cannot publish ATXs [A,B,D] + [C,E]

2️⃣ There is an upper limit of equivocation set size

Only up to N identities can marry int a single equivocation set. Marrying more is malfeasance and cancels all IDs already present in the set as well as the ones that tried to join it.

3️⃣ The equivocation set cannot grow

Identities that are already married cannot marry again.

@poszu poszu changed the base branch from develop to remove-optionals-from-ActivationTx March 29, 2024 13:33
Copy link

codecov bot commented Mar 30, 2024

Codecov Report

Attention: Patch coverage is 0% with 15 lines in your changes are missing coverage. Please review.

Project coverage is 80.6%. Comparing base (357c1e4) to head (851ee6a).
Report is 6 commits behind head on develop.

Files Patch % Lines
activation/wire/challenge_v2.go 0.0% 13 Missing ⚠️
activation/wire/wire_v2.go 0.0% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           develop   #5785      +/-   ##
==========================================
+ Coverage         0   80.6%   +80.6%     
==========================================
  Files            0     289     +289     
  Lines            0   29868   +29868     
==========================================
+ Hits             0   24103   +24103     
- Misses           0    4169    +4169     
- Partials         0    1596    +1596     

β˜” View full report in Codecov by Sentry.
πŸ“’ Have feedback on the report? Share it here.

}

type InnerActivationTxV2 struct {
PublishEpoch uint32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we're changing ATX structure, it would be nice if we could include an (untrusted) LayerID here for sync purposes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain what the use case is and what value the LayerID would have? Would it need to be part of InnerActivationTx (for signing)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed for syncv2 so that better clustering of ATX IDs during sync can be achieved, reducing sync traffic. The LayerID must be signed so it can't be changed by other nodes that process the ATX, confusing the sync algorithm by assigning the same ATX to different layers. It is assumed that some nodes may lie in that LayerID field in ATXs they publish, but with majority of nodes being honest, that should not cause substantial performance degradation of the set reconciliation algorithm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would LayerID field contain the layer in which the ATX was created or something else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think PublishEpoch needs to be the first field in the type, because some decoding logic depends on it and we cannot change older ATXs retroactively to now use the LayerID instead.

I'm not fully understanding how LayerID can help improve sync? An ATX isn't published in any layer - protocol wise it should make no difference if an ATX is published on the first day or the last day of an epoch?

Also PublishEpoch is part of the PoET challenge. So if it would be a LayerID an honest node would need to know in advance with 5 minutes accuracy when it will publish its ATX 2 weeks later, which seems very difficult to predict (especially in a multi-smesher setup) as a) there is some delay when the PoEt proof will be available, it could take multiple passes to generate the PoST proof, the node might not be in sync before publishing the ATX further delaying the publication, etc.

Any sensible node will then just set the LayerID to the last layer before the next PoET round starts to give itself the maximum time to generate the PoST and publish the ATX, causing all ATXs of the same PoET phase to be published in the same layer...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, we need to keep the publish epoch the first field in the encoding for backwards compatibility and so we know which version of the ATX the bytes after that encode.

We would then need to additionally encode the LayerID which the publishing node has no incentive to be honest about and the receiving node has no way to verify if it is correct or not. What happens if someone with a lot of identities just claims they were all published in the same layer if they weren't?

If it's not an issue for a successful sync then why is helpful to have it? If it does cause problems - how can we prevent a malicious user from disrupting the network by claiming incorrect publication layers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, we need to keep the publish epoch the first field in the encoding for backwards compatibility and so we know which version of the ATX the bytes after that encode.

πŸ‘

@ivan4th Could you motivate and explain adding the LayerID field?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@poszu The range-based set reconciliation algorithm is based on subdividing the ranges of possible key values for which the fingerprints (hashes of keys XORed together) don't match. If keys are just hash values, and say a peer has 100 hashes that the other one doesn't have, then syncing these 100 keys which are evenly spread across the key space may cause 100 subdivisions (roughly speaking), narrowing down to each of the keys individually.

But if the keys include LayerID and thus can be sorted by time (layer), then the most likely situation is that these 100 keys will be close to one another, and with the algorithm we use it may only take just a few subdivisions to sync all of the keys. Basically, just narrow down to the 100 new keys with just small "already synced" gaps between quickly and then transfer all of the IDs in the time-based range, downloading the missing blobs afterwards.

With such an optimization, the new sync algorithm will be very efficient at handling "send me all of the recent data" case, which is expected to be most common for the synced / mostly synced nodes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. I don't see any downside to using a publish layer instead of a publish epoch. The ATX handler could translate it to the epoch easily.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do understand your goal but I still fail to see the benefit.

Any honest node will optimize their post proofing times to consume most (if not all) of their CG, so most of the ATXs will be published towards the end of a CG, causing all ATXs to be concentrated in very few layers.

Even if 100% of all nodes are honest, the great majority of ATXs will be within those few layers (the last layers of every CG and the last few layers of the epoch). Which makes this optimization much less efficient as it would be if ATXs would be spread evenly across an epoch.

@poszu poszu changed the base branch from remove-optionals-from-ActivationTx to develop April 24, 2024 12:43
PublishEpoch types.EpochID
PrevATXID types.ATXID
PositioningATXID types.ATXID
CommitmentATXID *types.ATXID
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the CommitmentATXID doesn't need to be included in the Nipost Challenge. The InitialPost is already valid only for that ATX.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commitment ATX is needed to be able to verify a PoST

in ATX v1 it used to be part of the NiPostChallenge, maybe so that PoET can refuse registrations with an invalid InitialPost (which would require the commitment ATX for verification). If we don't think that would be useful in the future we can move the commitment ATX to a different section of the ATX.

I generally would also prefer not having struct embedding on wire types. Instead the ATX could just be one big object with multiple fields and NIPostChallengeV2 could be a method returning the struct above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIPostChallengeV2 is a structure aggregating things a miner must commit to before registering in the poet. The InitialPost already implicitly contains the commitment to a specific commitment ATX as it's one of the input parameters to PoST initialization so I think it doesn't need to be repeated (but it still needs to be present in the initial ATX to verify PoSTs and the VRF Nonce.

Not that the NIPostChallengeV2 is not part of the ATX V2 at all. During ATX handling, it can be reconstructed from the fields present in the ATX.

PublishEpoch types.EpochID
PrevATXID types.ATXID
PositioningATXID types.ATXID
CommitmentATXID *types.ATXID
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commitment ATX is needed to be able to verify a PoST

in ATX v1 it used to be part of the NiPostChallenge, maybe so that PoET can refuse registrations with an invalid InitialPost (which would require the commitment ATX for verification). If we don't think that would be useful in the future we can move the commitment ATX to a different section of the ATX.

I generally would also prefer not having struct embedding on wire types. Instead the ATX could just be one big object with multiple fields and NIPostChallengeV2 could be a method returning the struct above.

InnerActivationTxV2

SmesherID types.NodeID
// Signature over InnerActivationTxV2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should add as information that it is the root of the merkle tree formed from all fields of InnerActiationTxV2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's worth documenting it when we decide on the signing schema.

Comment on lines +22 to +28
// Hash serializes the NIPostChallenge and returns its hash.
// The serialized challenge is first prepended with a byte 0x00, and then hashed
// for second preimage resistance of poet membership merkle tree.
func (c *NIPostChallengeV2) Hash() types.Hash32 {
ncBytes := codec.MustEncode(c)
return hash.Sum([]byte{0x00}, ncBytes)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably also be changed to be the root of a merkle tree formed by the NiPostChallengeV2, to ensure we can proof any field to be a specific value in a malfeasance proof without having to attach the full ATX. We just need to make sure that the NiPostChallengeV2 is attached via its root to the tree to be able to extract each field individually.

Regarding pre-image resistance: we need to check if this is still needed with the new signature scheme.

activation/wire/wire_v2.go Outdated Show resolved Hide resolved
activation/wire/wire_v2.go Outdated Show resolved Hide resolved
activation/wire/wire_v2.go Outdated Show resolved Hide resolved
@poszu poszu changed the title [WIP] ATX v2 (for ATX merge) ATX v2 (for ATX merge) May 7, 2024
@poszu poszu marked this pull request as ready for review May 7, 2024 08:19
@poszu poszu requested a review from dshulyak as a code owner May 7, 2024 08:19
@poszu
Copy link
Contributor Author

poszu commented May 9, 2024

bors merge

spacemesh-bors bot pushed a commit that referenced this pull request May 9, 2024
## Motivation

Proposal of the new ATX version to support the atx merging.
@spacemesh-bors
Copy link

spacemesh-bors bot commented May 9, 2024

Pull request successfully merged into develop.

Build succeeded:

@spacemesh-bors spacemesh-bors bot changed the title ATX v2 (for ATX merge) [Merged by Bors] - ATX v2 (for ATX merge) May 9, 2024
@spacemesh-bors spacemesh-bors bot closed this May 9, 2024
@spacemesh-bors spacemesh-bors bot deleted the atx-v2 branch May 9, 2024 11:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants