Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

all DKG packets are now gossiped throughout the network #1230

Merged
merged 31 commits into from
Jul 6, 2023
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
e7cf3e5
DKG messages between participants are now signed and verified
CluEleSsUK May 2, 2023
c49a965
all DKG packets are now gossiped throughout the network
CluEleSsUK May 5, 2023
608a84b
fixed a refactor bug
CluEleSsUK May 5, 2023
4226f5e
linter
CluEleSsUK May 5, 2023
3f7a30c
fixed race condition
CluEleSsUK May 5, 2023
d51d8e7
import reordering D:
CluEleSsUK May 5, 2023
bfbb0ed
wait for nodes to be alive in LargeDKG test before running DKG
CluEleSsUK May 9, 2023
ec9b320
ignore leaver errors in proposal
CluEleSsUK May 9, 2023
edf0f34
move packet dedupe to avoid races
CluEleSsUK May 9, 2023
06cb6af
test lower retries for large DKG
CluEleSsUK May 9, 2023
a4b2e60
added some more schemes from env
CluEleSsUK May 9, 2023
358ebd7
fixed bug where packets were being re-signed when gossiping them!
CluEleSsUK May 9, 2023
e7cc8af
remove the unnecessary (racy) brokenbroadcaster
CluEleSsUK May 9, 2023
29773a1
updated with the rebased signatures branch
CluEleSsUK May 15, 2023
8c24046
moved comment, removed sleep, regenerated proto
CluEleSsUK Jun 6, 2023
8ff54dc
addressed PR comments
CluEleSsUK Jun 13, 2023
cc28768
packet participant arrays checked to avoid malicious nulls taking things
CluEleSsUK Jun 23, 2023
6633058
additional nil checks
CluEleSsUK Jun 23, 2023
7179d25
added DKG failed state in for DKGs that don't hit threshold
CluEleSsUK Jun 23, 2023
67d198c
fixed linting complaints
CluEleSsUK Jun 23, 2023
f9efcc1
fixed some unit tests
CluEleSsUK Jun 23, 2023
5919617
use beaconID from metadata rather than passing it around
CluEleSsUK Jul 4, 2023
daada49
removed mysterious debug log setting
CluEleSsUK Jul 4, 2023
dc92fe5
removed unnecessary grpc call options
CluEleSsUK Jul 4, 2023
0965277
Update internal/dkg/actions.go
CluEleSsUK Jul 4, 2023
b92f25e
advance clock a bit before running failed DKG in test
CluEleSsUK Jul 6, 2023
a63094a
added clock to DKG runner instead of using system time
CluEleSsUK Jul 6, 2023
84b1c9f
use mock clock for determining all DKG timings
CluEleSsUK Jul 6, 2023
c045951
fixed some references to clocks
CluEleSsUK Jul 6, 2023
8b2dec2
fixed an old test where multiple clocks were being created
CluEleSsUK Jul 6, 2023
a44a01b
added a sleep in a test for science
CluEleSsUK Jul 6, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 0 additions & 1 deletion client/grpc/client_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -95,7 +95,6 @@ func TestClientClose(t *testing.T) {
wg := sync.WaitGroup{}
wg.Add(1)
go func() {
//nolint:revive // we drain the channel
for range c.Watch(context.Background()) {
}
wg.Done()
Expand Down
3 changes: 1 addition & 2 deletions client/http/http_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -151,7 +151,7 @@ func TestHTTPWatch(t *testing.T) {
if len(first.Randomness()) == 0 {
t.Fatal("should get randomness from watching")
}
//nolint:revive // draining channel until the context expires

for range result {
}
_ = httpClient.Close()
Expand Down Expand Up @@ -186,7 +186,6 @@ func TestHTTPClientClose(t *testing.T) {
wg := sync.WaitGroup{}
wg.Add(1)
go func() {
//nolint:revive // draining channel
for range httpClient.Watch(context.Background()) {
}
wg.Done()
Expand Down
2 changes: 1 addition & 1 deletion client/lp2p/client.go
Original file line number Diff line number Diff line change
Expand Up @@ -213,7 +213,7 @@ func (c *Client) Watch(ctx context.Context) <-chan client.Result {
c.log.Debugw("client.Watch done")
end()
// drain leftover on innerCh
for range innerCh { //nolint:revive
for range innerCh {
}
c.log.Debugw("client.Watch finished draining the innerCh")
return
Expand Down
1 change: 0 additions & 1 deletion client/optimizing.go
Original file line number Diff line number Diff line change
Expand Up @@ -250,7 +250,6 @@ LOOP:
stats = append(stats, rr.stat)
res = rr.result
if rr.err != nil && !errors.Is(rr.err, common.ErrEmptyClientUnsupportedGet) {
//nolint:errorlint
err = fmt.Errorf("%v - %w", err, rr.err)
} else if rr.err == nil {
err = nil
Expand Down
1 change: 1 addition & 0 deletions crypto/schemes.go
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ import (
bls "github.com/drand/kyber-bls12381"
"github.com/drand/kyber/pairing"
"github.com/drand/kyber/sign"

// The package github.com/drand/kyber/sign/bls is deprecated because it is vulnerable to
// rogue public-key attack against BLS aggregated signature. The new version of the protocol can be used to
// make sure a signature aggregate cannot be verified by a forged key. You can find the protocol in kyber/sign/bdn.
Expand Down
60 changes: 53 additions & 7 deletions internal/core/drand_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,17 +4,23 @@ import (
"context"
"errors"
"fmt"
"github.com/drand/drand/common"
dkg3 "github.com/drand/drand/protobuf/crypto/dkg"
"google.golang.org/grpc"
"google.golang.org/protobuf/types/known/timestamppb"
"io"
"os"
"path"
"strings"
"testing"
"time"

"google.golang.org/grpc"
"google.golang.org/protobuf/types/known/timestamppb"

"github.com/drand/drand/common"
dkg3 "github.com/drand/drand/protobuf/crypto/dkg"

"github.com/jonboulle/clockwork"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"

chain2 "github.com/drand/drand/common/chain"
"github.com/drand/drand/common/key"
"github.com/drand/drand/crypto"
Expand All @@ -26,9 +32,6 @@ import (
context2 "github.com/drand/drand/internal/test/context"
"github.com/drand/drand/internal/test/testlogger"
"github.com/drand/drand/protobuf/drand"
"github.com/jonboulle/clockwork"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)

func setFDLimit(t testing.TB) {
Expand Down Expand Up @@ -1266,6 +1269,49 @@ func TestDKGPacketWithoutMetadata(t *testing.T) {
require.Error(t, err)
}

func TestDKGPacketWithNilInArray(t *testing.T) {
t.Setenv("DRAND_TEST_LOGS", "DEBUG")
CluEleSsUK marked this conversation as resolved.
Show resolved Hide resolved
beaconID := "blah"
scenario := NewDrandTestScenario(t, 2, 2, 1*time.Second, beaconID, clockwork.NewFakeClockAt(time.Now()))

// the first slot will be nil
joiners := make([]*drand.Participant, len(scenario.nodes)+1)
for i, node := range scenario.nodes {
identity := node.drand.priv.Public
pk, err := identity.Key.MarshalBinary()
if err != nil {
t.Fatal(err)
}
// + 1 here, so the first entry is nil
joiners[i+1] = &drand.Participant{
Address: identity.Addr,
Tls: identity.TLS,
PubKey: pk,
Signature: identity.Signature,
}
}
err := scenario.nodes[0].dkgRunner.StartNetwork(2, 1, crypto.DefaultSchemeID, 1*time.Minute, 1, joiners)

require.NoError(t, err)
}

func TestFailedReshareContinuesUsingOldGroupfile(t *testing.T) {
t.Setenv("DRAND_TEST_LOGS", "DEBUG")
CluEleSsUK marked this conversation as resolved.
Show resolved Hide resolved
period := 1 * time.Second
beaconID := "blah"
scenario := NewDrandTestScenario(t, 2, 2, period, beaconID, clockwork.NewFakeClockAt(time.Now()))

g, err := scenario.RunDKG(t)
require.NoError(t, err)

scenario.SetMockClock(t, g.GenesisTime)

leader := scenario.nodes[0]
err = scenario.RunFailingReshare()
require.Equal(t, test.ErrDKGFailed, err)
require.Equal(t, leader.drand.group, g)
}

// AddNodesWithOptions creates new additional nodes that can participate during the initial DKG.
// The options set will overwrite the existing ones.
func (d *DrandTestScenario) AddNodesWithOptions(t *testing.T, n int, beaconID string, opts ...ConfigOption) []*MockNode {
Expand Down
53 changes: 53 additions & 0 deletions internal/core/util_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -469,6 +469,59 @@ func (d *DrandTestScenario) RunDKG(t *testing.T) (*key.Group, error) {
return groupFile, nil
}

func (d *DrandTestScenario) RunFailingReshare() error {
AnomalRoil marked this conversation as resolved.
Show resolved Hide resolved
if len(d.nodes) == 0 {
return errors.New("cannot run a DKG with 0 nodes in the drand test scenario")
}

remainers := make([]*drand.Participant, len(d.nodes))
for i, node := range d.nodes {
identity := node.drand.priv.Public
pk, err := identity.Key.MarshalBinary()
if err != nil {
return err
}
remainers[i] = &drand.Participant{
Address: identity.Addr,
Tls: identity.TLS,
PubKey: pk,
Signature: identity.Signature,
}
}

leader := d.nodes[0]
followers := d.nodes[1:]

err := leader.dkgRunner.StartProposal(
d.thr,
d.clock.Now().Add(10*time.Second),
1,
[]*drand.Participant{},
remainers,
[]*drand.Participant{},
)
if err != nil {
return err
}
for _, follower := range followers {
err = follower.dkgRunner.Accept()
if err != nil {
return err
}

follower.daemon.Stop(context.Background())
}

err = leader.dkgRunner.StartExecution()
if err != nil {
return err
}

// advance by the grace period so all nodes kick off the DKG
d.AdvanceMockClock(d.t, d.nodes[0].daemon.opts.dkgKickoffGracePeriod)
return leader.dkgRunner.WaitForDKG(leader.daemon.log, leader.dkgRunner.BeaconID, 2, 30)
}

// WaitForDKG waits for the DKG complete and returns the group file
// it takes the gorup file from the leader node and thus assumes the leader has not been evicted!
func (d *DrandTestScenario) WaitForDKG(t *testing.T, node *MockNode, epoch uint32, numberOfSeconds int) (*key.Group, error) {
Expand Down
5 changes: 5 additions & 0 deletions internal/dkg/actions.go
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,11 @@ func (d *Process) gossip(
done := make(chan bool, 1)
errChan := make(chan error, 1)

// we first filter the recipients to avoid a malicious leader sending `nil`
CluEleSsUK marked this conversation as resolved.
Show resolved Hide resolved
recipients = util.Filter(recipients, func(p *drand.Participant) bool {
return p != nil && p.Address != ""
})

if len(recipients) == 0 {
done <- true
return done, errChan
Expand Down
3 changes: 3 additions & 0 deletions internal/dkg/actions_active.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,9 @@ import (

//nolint:gocyclo //
func (d *Process) Command(ctx context.Context, command *drand.DKGCommand) (*drand.EmptyDKGResponse, error) {
if command == nil {
return nil, errors.New("command cannot be nil")
}
beaconID := command.Metadata.BeaconID
commandName := commandType(command)
d.lock.Lock()
Expand Down
4 changes: 4 additions & 0 deletions internal/dkg/actions_passive.go
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,9 @@ func (d *Process) Packet(ctx context.Context, packet *drand.GossipPacket) (*dran
d.lock.Lock()
defer d.lock.Unlock()

AnomalRoil marked this conversation as resolved.
Show resolved Hide resolved
if packet == nil {
return nil, errors.New("packet cannot be nil")
}
// if there's no metadata on the packet, we won't be able to verify the signature or perform other state changes
if packet.Metadata == nil {
return nil, errors.New("packet missing metadata")
Expand Down Expand Up @@ -97,6 +100,7 @@ func commandType(command *drand.DKGCommand) string {
case *drand.DKGCommand_Join:
return "Joining"
case *drand.DKGCommand_Execute:
//nolint:goconst
return "Executing"
case *drand.DKGCommand_Abort:
return "Aborting"
Expand Down
15 changes: 14 additions & 1 deletion internal/dkg/execution.go
Original file line number Diff line number Diff line change
Expand Up @@ -115,7 +115,20 @@ func (d *Process) executeAndFinishDKG(ctx context.Context, beaconID string, conf
executeAndStoreDKG := func() error {
output, err := d.startDKGExecution(ctx, beaconID, current, &config)
if err != nil {
return err
// if the DKG doesn't reach threshold, we must transition to `Failed` instead of
// returning an error and rolling back
if !util.ErrorContains(err, "dkg: too many uncompliant new participants") && !util.ErrorContains(err, "dkg abort") {
return err
}

d.log.Errorw("DKG failed as too many nodes were evicted. Storing failed state")

next, err := current.Failed()
if err != nil {
return err
}

return d.store.SaveCurrent(beaconID, next)
}

finalState, err := current.Complete(output.FinalGroup, output.KeyShare)
Expand Down
Loading
Loading