[Merged by Bors] - Support poet certificates #5221

Closed
wants to merge 71 commits into from
Changes from 4 commits
71 commits
d16cbe7
Support for poet certificates WIP
poszu Nov 1, 2023
08677b3
Support for poet certificates cont
poszu Nov 3, 2023
d68a562
Initial certification using existing ATX or initial POST
poszu Nov 6, 2023
535dfcf
Improve poet HTTP client coverage
poszu Nov 6, 2023
513c29a
Bump certifier
poszu Nov 6, 2023
2bf4807
Use the right post challenge to certify
poszu Nov 8, 2023
54f915b
Fixed POST metadata sent to certifier
poszu Nov 9, 2023
6cc3971
Bump certifier image
poszu Nov 9, 2023
f79232c
Migrate certifier store to localdb
poszu Nov 9, 2023
a2893dc
Remove CertifierInfo struct
poszu Nov 9, 2023
ffd77d0
Add systest for poet registrations with PoW and cert
poszu Nov 10, 2023
326bde0
Configurable certifier retry options
poszu Nov 13, 2023
f2ec6f7
Merge remote-tracking branch 'origin/develop' into support-poet-certi…
poszu Nov 13, 2023
e0a6f89
Increase systest CI job timeout
poszu Nov 13, 2023
0054df7
Keep POST proof in localDB for certification
poszu Nov 14, 2023
50fa30f
Take optional poet certificates in config
poszu Nov 14, 2023
2a9210e
Bump poet to v0.10.0
poszu Nov 15, 2023
111576b
Apply suggestions from code review
poszu Nov 20, 2023
678e2c6
update nipost_test.go
poszu Nov 20, 2023
29118cd
Merge remote-tracking branch 'origin/develop' into support-poet-certi…
poszu Nov 20, 2023
ca0fcf0
fixup ProofToCertifyMetadata struct name
poszu Nov 20, 2023
f80eb4a
Run poet systests in parallel
poszu Nov 20, 2023
4302d65
Add NodeID to certs passed in config
poszu Nov 20, 2023
cb33ecf
Fix certifier config in systests
poszu Nov 20, 2023
604bd51
Rename NewCertifierOption
poszu Nov 21, 2023
0a86126
Merge remote-tracking branch 'origin/develop' into support-poet-certi…
poszu Nov 21, 2023
ce012ca
Apply localdb code migrations on checkpoint recovery
poszu Nov 21, 2023
97df610
Remove importing certificates
poszu Nov 22, 2023
18e8f23
Build initial post if post cannot be found
poszu Nov 23, 2023
98defe3
Remove sourcegraph/conc dep
poszu Nov 23, 2023
4eaa867
Bump certifier-service in systests to v0.6.0
poszu Nov 23, 2023
4270563
Merge remote-tracking branch 'origin/develop' into support-poet-certi…
poszu Nov 24, 2023
c1ccfa8
Don't update post in localdb
poszu Nov 24, 2023
b37a705
Merge remote-tracking branch 'origin/develop' into support-poet-certi…
poszu Nov 27, 2023
8439433
Fix UT
poszu Nov 27, 2023
5cca828
Autoscale post verifying workers (#5354)
poszu Dec 27, 2023
00edbc3
Fix default value of challenge in initial_post table
poszu Dec 28, 2023
c61e32d
Merge remote-tracking branch 'origin/develop' into support-poet-certi…
poszu Jan 5, 2024
9b9f881
Merge remote-tracking branch 'origin/develop' into support-poet-certi…
poszu Jan 9, 2024
e81a4c1
Fix flaky TestNIPostBuilder_Close
poszu Jan 9, 2024
7e3961b
Merge remote-tracking branch 'origin/develop' into support-poet-certi…
poszu Feb 2, 2024
210c3e1
Merge remote-tracking branch 'origin/develop' into support-poet-certi…
poszu Feb 12, 2024
c053552
Fixes
poszu Feb 12, 2024
7319d84
Bump post and certifier services in systests
poszu Feb 12, 2024
314e921
Merge remote-tracking branch 'origin/develop' into support-poet-certi…
poszu Mar 21, 2024
97f0ff1
Support for poet certs with expiration
poszu Mar 21, 2024
7e35d3f
Use cert verification shared from poet
poszu Mar 25, 2024
59f45a0
Merge remote-tracking branch 'origin/develop' into support-poet-certi…
poszu Mar 25, 2024
bd3130d
Fix linting issues
poszu Mar 25, 2024
87c9284
Bump certifier service in systests
poszu Mar 25, 2024
f347e3d
Rename local sql migration
poszu Mar 25, 2024
377a5d5
Fix staticcheck
poszu Mar 25, 2024
342eee9
Merge remote-tracking branch 'origin/develop' into support-poet-certi…
poszu Apr 18, 2024
459f453
Use nipost.Post
poszu Apr 18, 2024
601df9d
More review fixes
poszu Apr 18, 2024
af1e89a
Update poet and post-rs
poszu Apr 18, 2024
b7426fa
update test
poszu Apr 19, 2024
d72d416
Merge remote-tracking branch 'origin/develop' into support-poet-certi…
poszu Apr 23, 2024
af29624
Fix UTs
poszu Apr 23, 2024
0ec2d97
Refactor certifier to lookup PoST on its own
poszu Apr 23, 2024
845c12d
Fix systest
poszu Apr 23, 2024
30e322f
Merge remote-tracking branch 'origin/develop' into support-poet-certi…
poszu Apr 26, 2024
c0b306c
Review feedback
poszu Apr 26, 2024
84c1800
Optimize searching for positioning ATX (#5952)
poszu May 22, 2024
f4c358d
Move certification to the poet client
poszu May 22, 2024
d1d597a
Extend E2E activation test to use poet certificates with expiration
poszu May 22, 2024
377c62d
Parallelize certifying initial post
poszu May 22, 2024
62ea324
Avoid redundant certifications for (nodeID, pubkey) pairs
poszu May 22, 2024
879db03
Review feedback
poszu May 23, 2024
c9f3f96
Fix remaining UTs
poszu May 23, 2024
7f83cab
Fix remaining UTs
poszu May 23, 2024
161 changes: 38 additions & 123 deletions activation/activation.go
@@ -75,8 +75,8 @@
 	localDB *localsql.Database
 	publisher pubsub.Publisher
 	nipostBuilder nipostBuilder
+	certifier certifierService
 	validator nipostValidator
-	certifierConfig CertifierConfig
 	layerClock layerClock
 	syncer syncer
 	log *zap.Logger
@@ -94,7 +94,6 @@
 	// since they (can) modify the fields below.
 	smeshingMutex sync.Mutex
 	signers map[types.NodeID]*signing.EdSigner
-	certifiers map[types.NodeID]certifierService
 	eg errgroup.Group
 	stop context.CancelFunc
 }
@@ -147,9 +146,9 @@
 	}
 }
 
-func WithCertifierConfig(c CertifierConfig) BuilderOption {
+func WithPoetCertifier(c certifierService) BuilderOption {
 	return func(b *Builder) {
-		b.certifierConfig = c
+		b.certifier = c
 	}
 }
 
@@ -173,14 +172,13 @@
 		localDB: localDB,
 		publisher: publisher,
 		nipostBuilder: nipostBuilder,
+		certifier: &disabledCertifier{},
 		layerClock: layerClock,
 		syncer: syncer,
 		log: log,
 		poetRetryInterval: defaultPoetRetryInterval,
 		postValidityDelay: 12 * time.Hour,
 		postStates: NewPostStates(log),
-		certifiers: make(map[types.NodeID]certifierService),
-		certifierConfig: DefaultCertifierConfig(),
 	}
 	for _, opt := range opts {
 		opt(b)
@@ -320,18 +318,34 @@
 	return maps.Keys(b.signers)
 }
 
-// Create the initial post and save it.
-func (b *Builder) buildInitialPost(ctx context.Context, nodeID types.NodeID) (*types.Post, *types.PostInfo, error) {
+func (b *Builder) buildInitialPost(ctx context.Context, nodeID types.NodeID) error {
+	// Generate the initial POST if we don't have an ATX...
+	if _, err := atxs.GetLastIDByNodeID(b.db, nodeID); err == nil {
+		return nil
+	}
+	// ...and if we haven't stored an initial post yet.
+	_, err := nipost.GetPost(b.localDB, nodeID)
+	switch {
+	case err == nil:
+		b.log.Info("load initial post from db")
+		return nil
+	case errors.Is(err, sql.ErrNotFound):
+		b.log.Info("creating initial post")
+	default:
+		return fmt.Errorf("get initial post: %w", err)
+	}
+
+	// Create the initial post and save it.
 	startTime := time.Now()
 	post, postInfo, err := b.nipostBuilder.Proof(ctx, nodeID, shared.ZeroChallenge)
 	if err != nil {
-		return nil, nil, fmt.Errorf("post execution: %w", err)
+		return fmt.Errorf("post execution: %w", err)
 	}
 	if postInfo.Nonce == nil {
 		b.log.Error("initial PoST is invalid: missing VRF nonce. Check your PoST data",
 			log.ZShortStringer("smesherID", nodeID),
 		)
-		return nil, nil, errors.New("nil VRF nonce")
+		return errors.New("nil VRF nonce")
 	}
 	initialPost := nipost.Post{
 		Nonce: post.Nonce,
@@ -349,127 +363,36 @@
 	}, postInfo.NumUnits)
 	if err != nil {
 		b.log.Error("initial POST is invalid", log.ZShortStringer("smesherID", nodeID), zap.Error(err))
-		return nil, nil, fmt.Errorf("initial POST is invalid: %w", err)
-	}
-
-	if err := nipost.AddPost(b.localDB, nodeID, initialPost); err != nil {
-		b.log.Error("failed to save initial post", zap.Error(err))
+		if err := nipost.RemovePost(b.localDB, nodeID); err != nil {
+			b.log.Fatal("failed to remove initial post", log.ZShortStringer("smesherID", nodeID), zap.Error(err))
+		}
+		return fmt.Errorf("initial POST is invalid: %w", err)
 	}
 
 	metrics.PostDuration.Set(float64(time.Since(startTime).Nanoseconds()))
 	public.PostSeconds.Set(float64(time.Since(startTime)))
 	b.log.Info("created the initial post")
-	return post, postInfo, nil
-}
-
-// Obtain certificates for the poets.
-// We want to certify immediately after the startup or creating the initial POST
-// to avoid all nodes spamming the certifier at the same time when
-// submitting to the poets.
-func (b *Builder) certifyPost(ctx context.Context, nodeID types.NodeID, post *nipost.Post) {
-	client := NewCertifierClient(b.log, nodeID, post, WithCertifierClientConfig(b.certifierConfig.Client))
-	certifier := NewCertifier(b.localDB, b.log, client)
-	certifier.CertifyAll(ctx, b.poets)
-
-	b.smeshingMutex.Lock()
-	b.certifiers[nodeID] = certifier
-	b.smeshingMutex.Unlock()
+	return nipost.AddPost(b.localDB, nodeID, initialPost)
 }
 
-func (b *Builder) obtainPostFromLastAtx(ctx context.Context, nodeId types.NodeID) (*nipost.Post, error) {
-	atxid, err := atxs.GetLastIDByNodeID(b.db, nodeId)
-	if err != nil {
-		return nil, fmt.Errorf("no existing ATX found: %w", err)
-	}
-	atx, err := atxs.Get(b.db, atxid)
-	if err != nil {
-		return nil, fmt.Errorf("failed to retrieve ATX: %w", err)
-	}
-	if atx.NIPost == nil {
-		return nil, errors.New("no NIPoST found in last ATX")
-	}
-	if atx.CommitmentATX == nil {
-		if commitmentAtx, err := atxs.CommitmentATX(b.db, nodeId); err != nil {
-			return nil, fmt.Errorf("failed to retrieve commitment ATX: %w", err)
-		} else {
-			atx.CommitmentATX = &commitmentAtx
-		}
-	}
-	if atx.VRFNonce == nil {
-		if nonce, err := atxs.VRFNonce(b.db, nodeId, b.layerClock.CurrentLayer().GetEpoch()); err != nil {
-			return nil, fmt.Errorf("failed to retrieve VRF nonce: %w", err)
-		} else {
-			atx.VRFNonce = &nonce
-		}
-	}
-
-	b.log.Info("found POST in an existing ATX", zap.String("atx_id", atxid.Hash32().ShortString()))
-	return &nipost.Post{
-		Nonce: atx.NIPost.Post.Nonce,
-		Indices: atx.NIPost.Post.Indices,
-		Pow: atx.NIPost.Post.Pow,
-		Challenge: atx.NIPost.PostMetadata.Challenge,
-		NumUnits: atx.NumUnits,
-		CommitmentATX: *atx.CommitmentATX,
-		VRFNonce: *atx.VRFNonce,
-	}, nil
-}
-
-func (b *Builder) obtainPost(ctx context.Context, nodeID types.NodeID) (*nipost.Post, error) {
-	b.log.Info("looking for POST for poet certification")
-	post, err := nipost.GetPost(b.localDB, nodeID)
-	switch {
-	case err == nil:
-		b.log.Info("found POST in local DB")
-		return post, nil
-	case errors.Is(err, sql.ErrNotFound):
-		// no post found
-	default:
-		return nil, fmt.Errorf("loading initial post from db: %w", err)
-	}
-
-	b.log.Info("POST not found in local DB. Trying to obtain POST from an existing ATX")
-	if post, err := b.obtainPostFromLastAtx(ctx, nodeID); err == nil {
-		b.log.Info("found POST in an existing ATX")
-		if err := nipost.AddPost(b.localDB, nodeID, *post); err != nil {
-			b.log.Error("failed to save post", zap.Error(err))
-		}
-		return post, nil
-	}
+func (b *Builder) run(ctx context.Context, sig *signing.EdSigner) {
+	defer b.log.Info("atx builder stopped")
 
-	b.log.Info("POST not found in existing ATXs. Generating the initial POST")
 	for {
-		post, postInfo, err := b.buildInitialPost(ctx, nodeID)
+		err := b.buildInitialPost(ctx, sig.NodeID())
 		if err == nil {
-			return &nipost.Post{
-				Nonce: post.Nonce,
-				Indices: post.Indices,
-				Pow: post.Pow,
-				Challenge: shared.ZeroChallenge,
-				NumUnits: postInfo.NumUnits,
-				CommitmentATX: postInfo.CommitmentATX,
-				VRFNonce: *postInfo.Nonce,
-			}, nil
+			break
 		}
 		b.log.Error("failed to generate initial proof:", zap.Error(err))
 		currentLayer := b.layerClock.CurrentLayer()
 		select {
 		case <-ctx.Done():
-			return nil, ctx.Err()
+			return
 		case <-b.layerClock.AwaitLayer(currentLayer.Add(1)):
 		}
 	}
-}
 
-func (b *Builder) run(ctx context.Context, sig *signing.EdSigner) {
-	defer b.log.Info("atx builder stopped")
-
-	post, err := b.obtainPost(ctx, sig.NodeID())
-	if err != nil {
-		b.log.Error("failed to obtain post for certification", zap.Error(err))
-		return
-	}
-	b.certifyPost(ctx, sig.NodeID(), post)
+	b.certifier.CertifyAll(ctx, sig.NodeID(), b.poets)
 
 	for {
 		err := b.PublishActivationTx(ctx, sig)
@@ -616,7 +539,7 @@
}, post.NumUnits)
if err != nil {
logger.Error("initial POST is invalid", zap.Error(err))
if err := nipost.RemovePost(b.localDB, nodeID); err != nil {
logger.Fatal("failed to remove initial post", zap.Error(err))
}
return nil, fmt.Errorf("initial POST is invalid: %w", err)
@@ -746,17 +669,9 @@
 func (b *Builder) createAtx(
 	ctx context.Context,
 	sig *signing.EdSigner,
-	challenge *types.NIPostChallenge,
-) (*types.ActivationTx, error) {
-	pubEpoch := challenge.PublishEpoch
-	// TODO: in future, encode the right NiPoST challenge version depending on the pubEpoch.
-	challengeHash := wire.NIPostChallengeToWireV1(challenge).Hash()
-
-	b.smeshingMutex.Lock()
-	certifier := b.certifiers[sig.NodeID()]
-	b.smeshingMutex.Unlock()
-
-	nipostState, err := b.nipostBuilder.BuildNIPost(ctx, sig, challenge.PublishEpoch, challengeHash, certifier)
+	challenge *wire.NIPostChallengeV1,
+) (*wire.ActivationTxV1, error) {
+	nipostState, err := b.nipostBuilder.BuildNIPost(ctx, sig, challenge.PublishEpoch, challenge.Hash())
 	if err != nil {
 		return nil, fmt.Errorf("build NIPost: %w", err)
 	}
8 changes: 5 additions & 3 deletions activation/activation_multi_test.go
@@ -250,11 +250,11 @@ func Test_Builder_Multi_InitialPost(t *testing.T) {
 			},
 			nil,
 		)
-		_, err := tab.obtainPost(context.Background(), sig.NodeID())
+		err := tab.buildInitialPost(context.Background(), sig.NodeID())
 		require.NoError(t, err)
 
 		// postClient.Proof() should not be called again
-		_, err = tab.obtainPost(context.Background(), sig.NodeID())
+		err = tab.buildInitialPost(context.Background(), sig.NodeID())
 		require.NoError(t, err)
 		return nil
 	})
@@ -397,7 +397,9 @@ func Test_Builder_Multi_HappyPath(t *testing.T) {
 			VRFNonce: types.VRFPostIndex(rand.Uint64()),
 		}
 		nipostState[sig.NodeID()] = state
-		tab.mnipost.EXPECT().BuildNIPost(gomock.Any(), sig, ref.Publish, ref.Hash(), gomock.Any()).Return(state, nil)
+		tab.mnipost.EXPECT().
+			BuildNIPost(gomock.Any(), sig, ref.PublishEpoch, ref.Hash()).
+			Return(state, nil)
 
 		// awaiting atx publication epoch log
 		tab.mclock.EXPECT().CurrentLayer().DoAndReturn(