
SAMZA-2663: Handle job model expiration and new job model flows for multiple incomplete rebalances #1528

Merged
mynameborat merged 1 commit into apache:master from mynameborat:standalone-rebalance-bug
Sep 11, 2021

@mynameborat

Problem:
As part of SAMZA-2638, we introduced an optimization that skips container stops and restarts when a processor's work assignment is unchanged across rebalances. However, the active job model is updated with the proposed job model only when the container starts as part of onNewJobModel. This leads to a scenario where the processor's container is stopped, but subsequent rebalances assume it is still running. The scenario is described below.

Description:
Imagine the quorum is in steady state with job model version v1. A new rebalance occurs and the leader generates v2. Processor P1's work assignment changes in v2, so it stops its container as part of job model expiration. If that rebalance is unsuccessful (the barrier times out), another rebalance occurs and generates job model version v3. If the work assignment for P1 in v3 is the same as in v1, the state transition assumes the processor hasn't stopped the container and proceeds as a no-op, leaving the container stopped.

Changes:

  • Track job model expiration.
  • onNewJobModel triggers the new job model flow as long as the active job model has expired.
  • Handle the no-change-in-work-assignment optimization only during the checkJobModelExpired flow.
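
The changes above can be pictured with a minimal, self-contained sketch. This is not the Samza implementation: the class ProcessorSketch, its fields, and the use of a plain string to stand in for a processor's work assignment are all hypothetical, and only the method names checkJobModelExpired and onNewJobModel are borrowed from the description. It only illustrates the idea of tracking expiration explicitly, so that a later rebalance with an unchanged assignment still restarts a container that was already stopped.

```java
import java.util.Objects;

/** Hypothetical, simplified model of one processor; not the actual Samza classes. */
class ProcessorSketch {
    // Work assignment of the job model the container was last started with.
    private String activeAssignment;
    private boolean containerRunning;
    // Tracks whether the active job model has been expired (container stopped).
    private boolean jobModelExpired;
    private int containerStarts;

    ProcessorSketch(String initialAssignment) {
        this.activeAssignment = initialAssignment;
        this.containerRunning = true;
        this.jobModelExpired = false;
        this.containerStarts = 1;
    }

    /** Expiration flow: the only place where the "no change" optimization is applied. */
    void checkJobModelExpired(String proposedAssignment) {
        boolean unchanged = Objects.equals(activeAssignment, proposedAssignment);
        if (!jobModelExpired && unchanged) {
            return; // assignment unchanged and container still running: skip the stop
        }
        containerRunning = false;
        jobModelExpired = true;
    }

    /** New job model flow: start the container whenever the active job model has expired. */
    void onNewJobModel(String proposedAssignment) {
        if (!jobModelExpired) {
            return; // nothing was stopped, so nothing to restart
        }
        activeAssignment = proposedAssignment;
        containerRunning = true;
        jobModelExpired = false;
        containerStarts++;
    }

    boolean isContainerRunning() {
        return containerRunning;
    }

    int getContainerStarts() {
        return containerStarts;
    }
}
```

In this sketch, replaying the scenario from the description (v1 with assignment "A", an incomplete rebalance to v2 with "B", then v3 with "A" again) ends with the container running, because the expiration recorded during v2 is remembered until onNewJobModel is called for v3.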

Test:

  • Added unit test to cover the scenario of multiple incomplete rebalances

API Changes: None

Upgrade Instructions: None


@lhaiesp left a comment

LGTM

@mynameborat mynameborat merged commit 516a40d into apache:master Sep 11, 2021
bringhurst added a commit to bringhurst/samza that referenced this pull request Sep 27, 2021
…ws for multiple incomplete rebalances (apache#1528)"

This reverts commit 516a40d.