core/leader: run lead func synchronously #497

Merged
merged 5 commits into from Feb 10, 2017

Projects

None yet

3 participants

@jbowens
Member
jbowens commented Feb 9, 2017 edited

This refactors the core/leader package and how it's used by cmd/cored:

  • the lead closure is run synchronously within the leader.Run loop
  • the global IsLeading value isn't updated until the lead closure completes
  • thecmd/cored's lead closure calls Recover instead of pushing the responsibility to the core/fetch and core/generator packages
  • the leader.Run loop responsible for invoking lead is separated from the loop responsible for detecting leadership changes

This ensures that all leadership-only routines and code paths are guaranteed to have a recovered, current blockchain state. It also ensures that if blockchain recovery takes longer than 10 seconds, a new leader won't be demoted prematurely.

In a followup PR, we'll be able to remove this goroutine from cmd/cored, move the pin creating into the lead closure and delete the protocol.Chain.Ready function.

Fixes #489.

@jbowens jbowens added the PTAL label Feb 9, 2017
- update(ctx, l)
+ var leadCtx context.Context
+ var cancel func()
+ for leader := range leadershipChanges(ctx, l) {
@kr
kr Feb 9, 2017 Member

Sending a value over this channel raises the possibility that it might not strictly alternate. Would it be better to track that in this loop? like

leader := false
for range leadershipChanges(ctx, l) {
    leader = !leader
    if leader ...
}

Not sure, what do you think?

@jbowens
jbowens Feb 9, 2017 edited Member

Like if something else sent a value over the channel? Right now, the leader value is ferried directly from the source of truth, which is nice. Moving the leader value into this loop would make it clearly strictly alternate, but introduce the possibility that it might be on the wrong state.

What do you think about something like

var leadCtx context.Context
var cancel func()
for {
    <-promotedCh
    log.Messagef(ctx, "I am the core leader")
    leadCtx, cancel = context.WithCancel(ctx)
    l.lead(leadCtx)
    isLeading.Store(true)

    <-demotedCh
    log.Messagef(ctx, "No longer core leader")
    cancel()
    isLeading.Store(false)
}
@kr
kr Feb 9, 2017 Member

Yeah I guess either way it depends on the leadershipChanges loop behaving correctly. There are two invariants it provides:

  • each value is the opposite of the previous value
  • the first value is true

My suggestion just moved us from depending on that first invariant to depending on the second one.

On second thought, I think it's ok to depend on either one of those invariants, but we should probably document them both in leadershipChanges. So IMO this code is fine as-is (with the addition of a brief comment).

The two-channel option also seems ok, but it has the possibility of deadlocking if either loop gets out of sync. An advantage of your original approach here is that it would be easy for this loop to enforce the invariant by also tracking its own bool variable, if we wanted to be that paranoid. (I don't think it's necessary, but I wouldn't object, either.)

@jbowens
Member
jbowens commented Feb 9, 2017

updated with documentation of leadershipChanges's invariants

core/leader/leader.go
@@ -69,6 +69,12 @@ func Run(db *sql.DB, addr string, lead func(context.Context)) {
// is leader periodically. Every time the process becomes leader
// or is demoted from being a leader, it sends a bool on the
// returned channel.
+//
+// It provides the invariants:
+// * the first value sent on the channel is `true` when the
@kr
kr Feb 9, 2017 Member

Sorry to nitpick, but this could be interpreted as "the first value sent on the channel is true but only under certain conditions: when the process ...".

How about "The first value sent on the channel is true. (This will happen at the time the process is first elected leader.)"?

@jbowens
Member
jbowens commented Feb 10, 2017

PTAL

@kr
kr approved these changes Feb 10, 2017 View changes
@kr
Member
kr commented Feb 10, 2017

LGTM

jbowens added some commits Feb 9, 2017
@jbowens @chainbot jbowens core/leader: wip 8eb3604
@jbowens @chainbot jbowens synchronous recovery on leader promotion cee5798
@jbowens @chainbot jbowens nil leading 1980bb6
@jbowens @chainbot jbowens document leadershipChanges invariants 89940d2
@jbowens jbowens clarify.
fb85f5c
@chainbot chainbot merged commit 1af749b into main Feb 10, 2017

3 checks passed

licence/cla Contributor License Agreement is signed.
Details
wercker/cored Wercker pipeline passed
Details
wercker/java Wercker pipeline passed
Details
@chainbot chainbot deleted the leadership branch Feb 10, 2017
@jbowens jbowens added a commit that referenced this pull request Feb 10, 2017
@jbowens jbowens protocol: remove Chain.Ready
With the changes from #497, we can now move the block processor pin
creation into the `leader.Run` closure after the blockchain is
recovered.
99a2161
@chainbot chainbot added a commit that referenced this pull request Feb 10, 2017
@jbowens @chainbot jbowens + chainbot protocol: remove Chain.Ready
With the changes from #497, we can now move the block processor pin
creation into the `leader.Run` closure after the blockchain is
recovered.
c8919db
@chainbot chainbot added a commit that referenced this pull request Feb 10, 2017
@jbowens @chainbot jbowens + chainbot protocol: remove Chain.Ready
With the changes from #497, we can now move the block processor pin
creation into the `leader.Run` closure after the blockchain is
recovered.

Closes #505
b7fc25b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment