Skip to content

Implement approval store#384

Merged
yacovm merged 2 commits into
mainfrom
reconfig-6
Jun 2, 2026
Merged

Implement approval store#384
yacovm merged 2 commits into
mainfrom
reconfig-6

Conversation

@yacovm
Copy link
Copy Markdown
Collaborator

@yacovm yacovm commented May 18, 2026

Also self-sign during aggregation

  • Introduces ApprovalStore (msm/approvals.go) — an in-memory store of validator approvals for epoch transitions, keyed by (NodeID, PChainHeight). It verifies BLS signatures on
    ingest, deduplicates by timestamp (newer wins, older is dropped), and prunes per-node entries to at most len(validators) by evicting the oldest timestamp.
  • Adds Timestamp to ValidatorSetApproval (msm/encoding.go) so the store can order/evict approvals deterministically.
  • computeNewApprovals (msm/msm.go) now optimistically self-signs the next epoch's P-chain reference height each round and appends its own ValidatorSetApproval to the peer set since the store deduplicates it later.

@yacovm yacovm force-pushed the reconfig-6 branch 3 times, most recently from bbd66e3 to 2c54008 Compare May 19, 2026 21:27
Comment thread msm/msm.go
Comment thread msm/msm.go
Comment thread msm/util_test.go
Comment thread msm/msm_test.go Outdated
Comment thread msm/approvals.go
Comment thread msm/approvals.go
}
}

func (as *ApprovalStore) checkApprovalSignature(approval *ValidatorSetApproval, pk []byte) error {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm is it not a problem that we verify the signature is valid over just the pChainHeight here, but in the MSM code we verify it with the context + asn1 encoding? shouldn't there be one canonical way of verifying/signing that is shared for both of them?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, because when I implemented this, I still haven't done that domain separation in the MSM.

Will fix 👍

Copy link
Copy Markdown
Collaborator Author

@yacovm yacovm Jun 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it still not a problem though that we don't check the approval signature against the timestamp? we also don't pass in who sent the approval so someone could send an approval with a timestamp = MaxUint64 and say it was from another node. This would remove all other approvals from that node in maybePruneOldApprovals.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have answered your own question.

This only works if the dissemination of the approvals is point to point, over authenticated channels, and not via gossip.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also recall that:

type NextEpochApprovals struct {
	NodeIDs   []byte `canoto:"bytes,1"`
	Signature []byte `canoto:"bytes,2"`

	canotoData canotoData_NextEpochApprovals
}

This doesn't include timestamps, and we naturally cannot expect all signatures to match a single timestamp.

The timestamp is just there to give us liveness when a node crashes and restarts or picks a different P-chain height to sign over. We need some way of disposing of "old" in-memory values.

We could have the timestamp be even computed locally inside of HandleApproval and not sent over the wire at all.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have the timestamp be even computed locally inside of HandleApproval and not sent over the wire at all

might be a nice change, why send it if we don't need to.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread msm/approvals.go Outdated
Comment thread msm/approvals.go Outdated
Comment thread msm/approvals_test.go
Comment thread msm/approvals_test.go Outdated
Also self-sign during aggregation

- Introduces ApprovalStore (msm/approvals.go) — an in-memory store of validator approvals for epoch transitions, keyed by (NodeID, PChainHeight). It verifies BLS signatures on
  ingest, deduplicates by timestamp (newer wins, older is dropped), and prunes per-node entries to at most len(validators) by evicting the oldest timestamp.
- Adds Timestamp to ValidatorSetApproval (msm/encoding.go) so the store can order/evict approvals deterministically.
- computeNewApprovals (msm/msm.go) now optimistically self-signs the next epoch's P-chain reference height each round and appends its own ValidatorSetApproval to the peer set since the store deduplicates it later.

Signed-off-by: Yacov Manevich <yacov.manevich@avalabs.org>
Comment thread msm/approvals.go Outdated
}

func (as *ApprovalStore) maybePruneOldApprovals(approval *ValidatorSetApproval) {
for len(as.approvalsByNodes[approval.NodeID]) > len(as.validators) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we just do something like instead? i think the for loop is redundant(see comment)

func (as *ApprovalStore) maybePruneOldApprovals(approval *ValidatorSetApproval) {
	approvals := as.approvalsByNodes[approval.NodeID]

	// We add at most one approval per call, so the node can exceed the cap by
	// at most one entry. When it does, evict the one with the oldest timestamp.
	if len(approvals) <= len(as.validators) {
		return
	}

	var oldest *ValidatorSetApproval
	for _, a := range approvals {
		if oldest == nil || a.Timestamp < oldest.Timestamp {
			oldest = a
		}
	}

	as.logger.Debug("Deleting old approval from node",
		zap.String("nodeID", fmt.Sprintf("%x", oldest.NodeID)),
		zap.Uint64("oldestApprovalPChainHeight", oldest.PChainHeight),
		zap.Uint64("oldestApprovalTimestamp", oldest.Timestamp))
	delete(approvals, oldest.PChainHeight)
	as.storedCount--
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we could do that, that was probably too much of defensive programming

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signed-off-by: Yacov Manevich <yacov.manevich@avalabs.org>
@yacovm yacovm merged commit 1075aa7 into main Jun 2, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants