KAFKA-13966 Set curClaimEpoch after we enqueue bootstrap write #12269
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This patch does two things to tighten up races during bootstrap. One is to simply move the volatile write until after the "bootstrapMetadata" event has been enqueued. The other thing is to prepend "bootstrapMetadata" to the queue.
From the JIRA, this test was flaky when we see the "registerBroker" event precede the "boostrapMetadata" event
The test harness in QuorumControllerTest is polling the controllers to see when their current leader epoch matches the expected leader epoch
Prior to this patch, we were setting the
curClaimEpoch
volatile before scheduling the "bootstrapMetadata" event. This creates a race where the test code can see the controller as active before the bootstrap metadata has been enqueued. By moving the volatile write after the "bootstrapMetadata" event enqueueing, we can eliminate this race.In production, there is nothing stopping clients from making requests to a controller that is in the process of starting up. Once the socket server begins processing requests, we will begin seeing events on the QuorumController queue. The check for active controller is done when events are processed, so we could have events enqueued at any time. By prepending the "bootstrapMetadata" event, we can ensure any events that see the controller as active will come after the bootstrap metadata has been processed.
If we handle a "registerBroker" event before "bootstrapMetadata" is enqueued, then we are sure to see
curClaimEpoch = -1
. If a "registerBroker" is waiting in the queue while we finish bootstrapping, then we are sure to enqueue "bootstrapMetadata" before it. This means "registerBroker" (and any other event) is guaranteed to see the controller as inactive or that it is active and the bootstrap metadata has been written.