[backport - sort of - 1.2] Prevent multiple stream processors - fix and narrow test #8101
...src/main/java/io/camunda/zeebe/broker/system/partitions/impl/PartitionTransitionProcess.java
Since you mentioned you will not follow my request, I don't know what else I should review here (#8059 (comment)).
The purpose of this PR is to separate consensus from ongoing discussions.
The property based test has been run and in my mind validated the fix. Last week I also ran it with chains of six consecutive transitions and found no inconsistency. This gives me the confidence to merge the fix into the stable branch.
Please also indicate whether you think we need to add the property based test to stable/1.2. In my mind, it will be sufficient to introduce it in develop.
You know my opinion. I would like to have it on stable and close to dev. Since we continue working on dev, it might happen that the property tests on dev are green but the property tests on stable are not, right? But you or @deepthidevaki might have a different opinion.
IMO adding the property test to the stable branch is a good idea. But it would be OK to do this as part of a separate PR, because we are still discussing the tests on the other PR.
Looks good to me.
Do we want to toggle the transition feature flag back to using the new one?
One small thing: this PR is mostly the backport of the fix done in develop, right? If yes, I would suggest changing the title accordingly and maybe linking the original PR in the PR description.
Good question. @Zelldon @deepthidevaki @npepinpe What do you think?
With this commit, the stream processor is always set in the transition context, regardless of whether it could be started successfully or not. The main benefit is that the reference to the newly created stream processor is always stored in the context, so it can be found there and closed. Previously, if opening the stream processor failed, it was in an undefined state: potentially running and potentially registered in the ActorScheduler, which holds a permanent reference to it.
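To illustrate the idea from the commit message, here is a minimal sketch. All class and method names below (`SketchStreamProcessor`, `SketchTransitionContext`, `SketchStreamProcessorStep`) are hypothetical stand-ins, not the actual Zeebe classes, and opening is modelled as a synchronous call for simplicity. The only point it shows is the ordering: the reference goes into the context before the open attempt, so a later cleanup can always find and close it.

```java
// Hypothetical stand-in for a stream processor; not the real Zeebe class.
final class SketchStreamProcessor implements AutoCloseable {
  private final boolean failOnOpen;
  private boolean closed;

  SketchStreamProcessor(final boolean failOnOpen) {
    this.failOnOpen = failOnOpen;
  }

  // Assumption: opening is synchronous here and fails by throwing.
  void open() {
    if (failOnOpen) {
      throw new IllegalStateException("open failed");
    }
  }

  @Override
  public void close() {
    closed = true;
  }

  boolean isClosed() {
    return closed;
  }
}

// Hypothetical transition context that holds at most one processor reference.
final class SketchTransitionContext {
  private SketchStreamProcessor streamProcessor;

  SketchStreamProcessor getStreamProcessor() {
    return streamProcessor;
  }

  void setStreamProcessor(final SketchStreamProcessor streamProcessor) {
    this.streamProcessor = streamProcessor;
  }
}

final class SketchStreamProcessorStep {
  // Store the reference first, then try to open. If open() throws,
  // the processor is still reachable through the context.
  static void transitionTo(
      final SketchTransitionContext context, final SketchStreamProcessor processor) {
    context.setStreamProcessor(processor);
    processor.open();
  }

  // Safe to call repeatedly: does nothing if no processor is stored.
  static void cleanup(final SketchTransitionContext context) {
    final SketchStreamProcessor processor = context.getStreamProcessor();
    if (processor != null) {
      processor.close();
      context.setStreamProcessor(null);
    }
  }
}
```

With this ordering, a failed `transitionTo` followed by `cleanup` leaves the processor closed and the context empty, and a second `cleanup` call is a no-op.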
b544ef8 to c6cc35f
bors merge
8101: [backport - sort of - 1.2] Prevent multiple stream processors - fix and narrow test r=pihme a=pihme

## Description

Changes the transition logic as follows:

- preparation/cleanup is done for all steps (not just the steps started by the last iteration)
- preparation/cleanup is done in the context of the next transition, not in the context of the last transition.

As a consequence, preparation/cleanup will be executed more often than transitionTo. This can also be seen in the log for the new test case. Essentially, when a transition is "skipped" then only its transitionTo is skipped, but the cleanup is executed anyway. I think one could improve that by making the cleanup react to the cancel signal, but I want to be conservative here. Also, multiple cleanup calls should be fast, because if a cleanup succeeds it sets e.g. the stream processor to null in the context, and any subsequent call will do nothing if it finds no stream processor.

Previously:

- The old transition did clean up the steps that were started by it
- The cleanup assumed that the transitionTo will immediately follow, but this was not a given. The transitionTo might be cancelled, and might eventually transition to a completely different role.
- So in essence, it did prepare for a role that maybe never came.

## Related issues

closes #8044

subset of #8059

Essentially the same changes as #8062 (develop branch)

## Review Hints

This PR is a subset of #8059. It contains the fix and commits related to the first round of review comments.

It does not contain the property based test, which is still the object of further discussion.

The purpose of this PR is to separate consensus from ongoing discussions.

The property based test has been run and in my mind validated the fix. Last week I also ran it with chains of six consecutive transitions and found no inconsistency. This gives me the confidence to merge the fix into the stable branch.

Please also indicate whether you think we need to add the property based test to `stable/1.2`. In my mind, it will be sufficient to introduce it in develop.

Co-authored-by: pihme <pihme@users.noreply.github.com>
Build failed: Reason: Flaky test
We definitely want to enable it on develop. As for 1.2, how confident do we feel about it at the moment? I would defer to the judgement of those who were involved at the moment - however if you want, I can take a deeper look at the PR and the test coverage and give a more detailed opinion.
bors retry
On develop the old code has been removed for a month now, so there is no other option right now. I feel confident to switch to the new code in |
Build failed:
bors retry
Build succeeded:
Description
Changes the transition logic as follows:
- preparation/cleanup is done for all steps (not just the steps started by the last iteration)
- preparation/cleanup is done in the context of the next transition, not in the context of the last transition.
As a consequence, preparation/cleanup will be executed more often than transitionTo. This can also be seen in the log for the new test case. Essentially, when a transition is "skipped" then only its transitionTo is skipped, but the cleanup is executed anyway. I think one could improve that by making the cleanup react to the cancel signal, but I want to be conservative here. Also, multiple cleanup calls should be fast, because if a cleanup succeeds it sets e.g. the stream processor to null in the context, and any subsequent call will do nothing if it finds no stream processor.
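The control flow described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code in `PartitionTransitionProcess`: the names `SketchRole`, `SketchStep`, `CountingStep` and `SketchTransitionProcess` are made up for illustration, cancellation is reduced to a boolean flag, and the order in which steps are cleaned up is arbitrary here.

```java
import java.util.List;

// Hypothetical roles; the real set of roles may differ.
enum SketchRole {
  LEADER,
  FOLLOWER
}

// Hypothetical step interface with a cleanup phase and an install phase.
interface SketchStep {
  void prepareTransition(SketchRole targetRole);

  void transitionTo(SketchRole targetRole);
}

// Step that counts calls so the "cleanup runs more often" effect is visible.
final class CountingStep implements SketchStep {
  int prepareCalls;
  int transitionToCalls;
  SketchRole currentRole; // null means nothing is installed

  @Override
  public void prepareTransition(final SketchRole targetRole) {
    prepareCalls++;
    // Cleanup looks at the NEXT role and is a no-op if nothing is installed.
    if (currentRole != null && currentRole != targetRole) {
      currentRole = null;
    }
  }

  @Override
  public void transitionTo(final SketchRole targetRole) {
    transitionToCalls++;
    currentRole = targetRole;
  }
}

final class SketchTransitionProcess {
  // Cleanup always runs for ALL steps; only transitionTo is skipped on cancel.
  static void run(
      final List<SketchStep> steps, final SketchRole targetRole, final boolean cancelled) {
    for (final SketchStep step : steps) {
      step.prepareTransition(targetRole);
    }
    if (cancelled) {
      return;
    }
    for (final SketchStep step : steps) {
      step.transitionTo(targetRole);
    }
  }
}
```

Running a cancelled transition to `LEADER` followed by a regular transition to `FOLLOWER` on one `CountingStep` leaves `prepareCalls` at 2 and `transitionToCalls` at 1, which mirrors the statement that preparation/cleanup is executed more often than transitionTo.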
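Previously:
- The old transition did clean up the steps that were started by it
- The cleanup assumed that the transitionTo will immediately follow, but this was not a given. The transitionTo might be cancelled, and might eventually transition to a completely different role.
- So in essence, it did prepare for a role that maybe never came.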
Related issues
closes #8044
subset of #8059
Essentially the same changes as #8062 (develop branch)
Review Hints
This PR is a subset of #8059. It contains the fix and commits related to the first round of review comments.
It does not contain the property based test, which is still the object of further discussion.
The purpose of this PR is to separate consensus from ongoing discussions.
The property based test has been run and in my mind validated the fix. Last week I also ran it with chains of six consecutive transitions and found no inconsistency. This gives me the confidence to merge the fix into the stable branch.
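For readers who have not seen the property based test from #8059, the general shape of such a check can be sketched like this. This is not that test: it is a hand-rolled randomized loop over a toy model (no property testing library), and the names `ChainPropertySketch`, `Model` and `holdsForRandomChains` are made up for illustration. The invariant checked is the one from the PR title: never more than one stream processor at a time.

```java
import java.util.Random;

final class ChainPropertySketch {
  enum Role {
    LEADER,
    FOLLOWER,
    INACTIVE
  }

  // Toy model: tracks how many processors are opened but not yet closed.
  static final class Model {
    int openProcessors;
    boolean installed;

    // Idempotent cleanup, executed before every transition.
    void cleanup() {
      if (installed) {
        openProcessors--;
        installed = false;
      }
    }

    // Assumption: only non-inactive roles install a processor.
    void transitionTo(final Role role) {
      if (role != Role.INACTIVE) {
        openProcessors++;
        installed = true;
      }
    }
  }

  // Returns true if the invariant held after every step of every random chain.
  static boolean holdsForRandomChains(final long seed, final int chains, final int chainLength) {
    final Random random = new Random(seed);
    final Role[] roles = Role.values();
    for (int c = 0; c < chains; c++) {
      final Model model = new Model();
      for (int i = 0; i < chainLength; i++) {
        final Role target = roles[random.nextInt(roles.length)];
        final boolean cancelled = random.nextBoolean();
        model.cleanup(); // always runs, even for a skipped transition
        if (!cancelled) {
          model.transitionTo(target);
        }
        if (model.openProcessors < 0 || model.openProcessors > 1) {
          return false;
        }
      }
    }
    return true;
  }
}
```

A call such as `ChainPropertySketch.holdsForRandomChains(42L, 1000, 6)` exercises chains of six consecutive transitions against the toy model; it says nothing about the real broker code, only about the ordering rule that cleanup always precedes transitionTo.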
Please also indicate whether you think we need to add the property based test to `stable/1.2`. In my mind, it will be sufficient to introduce it in develop.
Definition of Done
Code changes:
backport stable/0.25) to the PR, in case that fails you need to create backports manually.
Testing:
Documentation: