Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

liveness: allow registering callbacks after start #107265

Merged
merged 13 commits into from
Jul 26, 2023

Commits on Jul 24, 2023

  1. kvserver: avoid implicit nils in TestReplicaLeaseCounters

    The two removed fields are nil. This made a test failure during the refactors
    in this PR more annoying. If we're going to set up a half-inited NodeLiveness,
    let's at least be honest about it.
    tbg committed Jul 24, 2023
    Configuration menu
    Copy the full SHA
    7c2b6ea View commit details
    Browse the repository at this point in the history
  2. gossip: add GetNodeID accessor

    tbg committed Jul 24, 2023
    Configuration menu
    Copy the full SHA
    708435a View commit details
    Browse the repository at this point in the history
  3. liveness: add and adopt Gossip interface

    This will help with testing.
    tbg committed Jul 24, 2023
    Configuration menu
    Copy the full SHA
    7d32c4f View commit details
    Browse the repository at this point in the history
  4. liveness: rename storage -> storageImpl

    We'll give this a proper interface soon.
    tbg committed Jul 24, 2023
    Configuration menu
    Copy the full SHA
    31b9ec4 View commit details
    Browse the repository at this point in the history
  5. liveness: export storage methods

    So that it can implement a public interface.
    tbg committed Jul 24, 2023
    Configuration menu
    Copy the full SHA
    beb2c9b View commit details
    Browse the repository at this point in the history
  6. Configuration menu
    Copy the full SHA
    ea7cc94 View commit details
    Browse the repository at this point in the history
  7. liveness: export LivenessUpdate

    It's needed to implement the liveness storage (once it exists).
    tbg committed Jul 24, 2023
    Configuration menu
    Copy the full SHA
    52792fc View commit details
    Browse the repository at this point in the history
  8. liveness: half-adopt Storage

    We still want a Storage to be passed into NewNodeLiveness as opposed to a
    `*kv.DB`, but so far, so good.
    tbg committed Jul 24, 2023
    Configuration menu
    Copy the full SHA
    ef8410e View commit details
    Browse the repository at this point in the history
  9. liveness: finish adopting Storage

    Now Liveness is constructed using a `Storage` as opposed to a `*kv.DB`.
    tbg committed Jul 24, 2023
    Configuration menu
    Copy the full SHA
    f865176 View commit details
    Browse the repository at this point in the history
  10. liveness: allow registering callbacks after start

    I discovered[^1] a deadlock scenario when multiple nodes in the cluster restart
    with additional stores that need to be bootstrapped. In that case, liveness
    must be running when the StoreIDs are allocated, but it is not.
    
    Trying to address this problem, I realized that when an auxiliary Store is bootstrapped,
    it will create a new replicateQueue, which will register a new callback into NodeLiveness.
    
    But if liveness must be started at this point to fix cockroachdb#106706, we'll run into the assertion
    that checks that we don't register callbacks on a started node liveness.
    
    Something's got to give: we will allow registering callbacks at any given point
    in time, and they'll get an initial set of notifications synchronously. I
    audited the few users of RegisterCallback and this seems OK with all of them.
    
    [^1]: cockroachdb#106706 (comment)
    
    Epic: None
    Release note: None
    tbg committed Jul 24, 2023
    Configuration menu
    Copy the full SHA
    f7e72f9 View commit details
    Browse the repository at this point in the history
  11. liveness: only call onSelfHeartbeat on self

    I think there was a bug here. This method was previously invoked in
    `updateLiveness`, but that method is the general workhorse for updating
    anyone's liveness. In particular, it is called by `IncrementEpoch`. So we were
    invoking `onSelfHeartbeat` when we would increment other nodes' epochs. This
    doesn't seem great.
    
    Additionally, the code was trying to avoid invoking this callback before
    liveness was officially "started". Heartbeating yourself before liveness is
    started is unfortunately a thing due to the tangled start-up initialization
    sequence; we may see heartbeats triggered by lease requests.
    
    Avoid both complications by invoking `onSelfCallback` from the actual main
    heartbeat loop, whose only job is to heartbeat the own liveness record.
    
    I tried to adopt `TestNodeHeartbeatCallback` to give better coverage, but
    it's a yak shave. A deterministic node liveness (i.e. a way to invoke the
    main heartbeat loop manually) would make this a lot simpler. I filed an
    issue to that effect:
    
    cockroachdb#107452
    tbg committed Jul 24, 2023
    Configuration menu
    Copy the full SHA
    3bf63a6 View commit details
    Browse the repository at this point in the history
  12. liveness: move liveness and store regex to globals

    Helps with testing.
    tbg committed Jul 24, 2023
    Configuration menu
    Copy the full SHA
    4bf5865 View commit details
    Browse the repository at this point in the history
  13. liveness: add test for IsLiveCallback invocation

    This tests that regardless of when a callback is registered, it gets called.
    tbg committed Jul 24, 2023
    Configuration menu
    Copy the full SHA
    d767731 View commit details
    Browse the repository at this point in the history