Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stream management #798

Merged
merged 5 commits into from
Apr 1, 2022
Merged

Stream management #798

merged 5 commits into from
Apr 1, 2022

Conversation

nikkolasg
Copy link
Collaborator

@nikkolasg nikkolasg commented Mar 15, 2021

This PR revisits the stream mode and makes sure the go routines are exited from any errors. Fixes #795

@nikkolasg
Copy link
Collaborator Author

Failing on test broadcast, should get #797 merged first

core/drand_public.go Outdated Show resolved Hide resolved
core/drand_public.go Outdated Show resolved Hide resolved
@nikkolasg
Copy link
Collaborator Author

@hsanjuan I've ported most of the syncing logic in one module and make drand use this module only - there were basically three different places where we had syncing logic.
I think it's much more robust than before, and might fix the "node don't sync" bug we've been seeing - there is an explicit logic to restart syncing if things stall.

@nikkolasg
Copy link
Collaborator Author

TestBroadcast failing - need to merge #797 first

@nikkolasg
Copy link
Collaborator Author

I quite don't understand the race issue being raised by the test:
It happens at the following line:

/go/src/github.com/drand/drand/chain/beacon/sync_manager.go:236 +0x290

It tells there are two writes at this line. However this line is only defining a function

send := func(b *chain.Beacon) bool { ... }

That's really weird, I haven't seen something like this before.

Copy link
Member

@willscott willscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is all of this implicit in which beacon chain it's in the context of? i guess so, but was surprised to not find any reference of the beacon id throughout.

chain/beacon/node.go Show resolved Hide resolved
Client: bp.privGateway,
Clock: bp.opts.clock,
})
go syncer.Run()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason not to have Run() as a private method that's started as a goroutine when calling NewSyncManager, vs having it as a separate method that's started as a go routine by the caller immediately after creation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I don't really have any preference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least like that, it's explicit that it's running in its own go routine? If you feel I should change it, I can change it. Let me know.

chain/beacon/chain.go Show resolved Hide resolved
@AnomalRoil
Copy link
Member

@willscott Yes, it's implicit: a drand daemon can have multiple beacon processes (with different beacon IDs), each registering a beacon handler, which has a chainStore, which has a SyncManager.

func (s *SyncManager) tryNode(global context.Context, upTo uint64, n net.Peer) bool {
// we put a cancel to still keep the global context open but stop with this
// peer if things go sideway
cnode, cancel := context.WithCancel(global)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a context with a timeout here, or do we rely on the GRPC client to timeout?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we started trying to plumb life-cycles through a bit more previously, but didn't really get them complete so I don't think we need to require that as part of this PR either.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactoring Kind: Refactoring / Cleanup
Projects
None yet
Development

Successfully merging this pull request may close these issues.

potential goroutine leak from metrics
4 participants