Do not take snapshots after shutdown was requested #7571

pihme · 2021-08-02T14:48:47Z

Description

The change sets the db to null right away, so that any concurrent calls thereafter have no access to the database while it is closing.

It is not completely safe yet. The database could be set to `null` between checking for `null` in line 187 and using the reference in line 194.
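The remaining check-then-use window can be illustrated with a minimal sketch. Everything below is hypothetical and simplified: `Db`, `SketchStateController`, and the method names are illustration-only stand-ins, not the actual state controller code from this PR. The racy variant reads the field twice (once for the check, once for the use); the second variant reads it once into a local variable.

```java
import java.util.Optional;

/** Hypothetical stand-in for the real database handle. */
interface Db {
  String createSnapshot();
}

/** Hypothetical, simplified controller; not the actual state controller. */
final class SketchStateController {
  // volatile so that a null written by the closing thread becomes visible to other threads
  private volatile Db db;

  SketchStateController(final Db db) {
    this.db = db;
  }

  /** Racy: the field is read twice, so it can become null between check and use. */
  Optional<String> takeSnapshotRacy() {
    if (db == null) {
      return Optional.empty();
    }
    // another thread may call close() right here
    return Optional.of(db.createSnapshot()); // may throw NullPointerException
  }

  /** Reads the field once; the check and the use see the same reference. */
  Optional<String> takeSnapshot() {
    final Db current = db;
    if (current == null) {
      return Optional.empty();
    }
    return Optional.of(current.createSnapshot());
  }

  /** Clears the reference first so that later callers see no database. */
  void close() {
    db = null;
  }
}
```

Note that the local copy only removes the `NullPointerException`. A snapshot that has already passed the check can still run while the database is being closed, so this sketch narrows the window rather than closing it.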

Related issues

related to #7188

Definition of Done

Code changes:

The changes are backwards compatible with previous versions
If it fixes a bug then PRs are created to backport the fix to the last two minor versions. You can trigger a backport by assigning labels (e.g. backport stable/0.25) to the PR, in case that fails you need to create backports manually.

Testing:

There are unit/integration tests that verify all acceptance criteria of the issue
New tests are written to ensure backwards compatibility with further versions
The behavior is tested manually
The change has been verified by a QA run
The impact of the changes is verified by a benchmark

Documentation:

The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
New content is added to the release announcement

npepinpe

As you pointed out, this reduces the rate of incidence of the bug but doesn't completely fix it. Since taking a snapshot may happen on a different thread, it could even be that db is already null when the check is done on line 187, but the value read is outdated. IIRC, the state controller is called from the AsyncSnapshotDirector and from the ZeebePartition, both of which are actors, and as such executed at times on different threads. It could be that we should make the state controller an actor as well, though that would widen the scope of this PR.
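The idea of making the state controller an actor can be sketched with plain `java.util.concurrent`, assuming a single-threaded executor as a stand-in for the actor scheduler. The class `ConfinedStateController` and the use of `Supplier<String>` as the database handle are hypothetical and only serve the illustration; the real actor framework is not used here. Because every access to `db` runs on the same thread, the check and the use cannot interleave with `close()`, and no stale value can be read.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Hypothetical sketch: all access to the db is confined to one thread. */
final class ConfinedStateController {
  // Daemon thread so that the sketch does not keep the JVM alive.
  private final ExecutorService executor =
      Executors.newSingleThreadExecutor(
          runnable -> {
            final Thread thread = new Thread(runnable, "state-controller");
            thread.setDaemon(true);
            return thread;
          });

  // Read and written only by tasks on the executor thread after construction.
  private Supplier<String> db;

  ConfinedStateController(final Supplier<String> db) {
    this.db = db;
  }

  /** Check and use run inside one task, so close() cannot run in between. */
  CompletableFuture<Optional<String>> takeSnapshot() {
    return CompletableFuture.supplyAsync(
        () -> {
          if (db == null) {
            return Optional.<String>empty();
          }
          return Optional.of(db.get());
        },
        executor);
  }

  /** Queued behind any snapshot task that was submitted earlier. */
  CompletableFuture<Void> close() {
    return CompletableFuture.runAsync(() -> db = null, executor);
  }
}
```

Callers on any thread receive a future instead of touching the database directly. The trade-off is the wider scope mentioned above: every caller has to handle an asynchronous result.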

❓ Do you think that the race condition you highlighted is not likely enough/important enough to warrant spending more time, or do you see value in immediately mitigating it with the idea that we'll come back and fix it properly later? If it's the latter, then we shouldn't close the issue by merging this PR, as it's not really fixed. I personally would opt to fix it properly, but I'd like to hear your thoughts.

pihme · 2021-08-03T14:56:53Z

> ❓ Do you think that the race condition you highlighted is not likely enough/important enough to warrant spending more time, or do you see value in immediately mitigating it with the idea that we'll come back and fix it properly later?

I think it is an improvement over the status quo. I also hope that, with the other work we are doing and the work Chris is doing, this error won't be able to occur anymore, because we will have better control over the shutdown sequence.

But I am also fine with leaving the issue open and revisiting it later.

npepinpe

👍

Let's merge this for now, but remove the closes from the PR description as I wouldn't close the issue yet.

pihme · 2021-08-04T07:47:57Z

bors r+

ghost · 2021-08-04T08:32:11Z

Build succeeded:

continuous-integration/jenkins/branch

github-actions · 2021-08-04T08:32:35Z

Backport failed for stable/1.0, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally.

git fetch origin stable/1.0
git worktree add -d .worktree/backport-7571-to-stable/1.0 origin/stable/1.0
cd .worktree/backport-7571-to-stable/1.0
git checkout -b backport-7571-to-stable/1.0
ancref=$(git merge-base be2bfabcd49de3185e0faa952f4bc7a227259148 c8fe46c88a785bbc52fdb6b54cec4fa1bcafc193)
git cherry-pick -x $ancref..c8fe46c88a785bbc52fdb6b54cec4fa1bcafc193

github-actions · 2021-08-04T08:32:37Z

Backport failed for stable/1.1, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally.

git fetch origin stable/1.1
git worktree add -d .worktree/backport-7571-to-stable/1.1 origin/stable/1.1
cd .worktree/backport-7571-to-stable/1.1
git checkout -b backport-7571-to-stable/1.1
ancref=$(git merge-base be2bfabcd49de3185e0faa952f4bc7a227259148 c8fe46c88a785bbc52fdb6b54cec4fa1bcafc193)
git cherry-pick -x $ancref..c8fe46c88a785bbc52fdb6b54cec4fa1bcafc193

npepinpe · 2021-09-07T12:22:54Z

It looks like we never backported this in the end? /cc @pihme

Or at least I don't see any linked PRs 🤔

pihme · 2021-09-07T12:26:13Z

Seems like it slipped through, yes.

7782: [Backports stable/1.1] Do not take snapshots after shutdown was requested r=pihme a=npepinpe ## Description Backport of #7571 to stable/1.0. ## Related issues relates to #7188 Co-authored-by: pihme <pihme@users.noreply.github.com>

7781: [Backports stable/1.0] Do not take snapshots after shutdown was requested r=pihme a=npepinpe ## Description Backport of #7571 to stable/1.0. ## Related issues relates to #7188 Co-authored-by: pihme <pihme@users.noreply.github.com>

fix(broker): do not take snapshots after shutdown was requested

c8fe46c

pihme requested a review from npepinpe August 2, 2021 14:48

pihme added backport stable/1.0 labels Aug 2, 2021

pihme added this to the Refactor Bootstrapping milestone Aug 2, 2021

npepinpe reviewed Aug 3, 2021

npepinpe approved these changes Aug 3, 2021

ghost merged commit 9588418 into develop Aug 4, 2021

ghost deleted the 7188-no-snaphots-after-shutdown branch August 4, 2021 08:32

This was referenced Sep 7, 2021

[Backports stable/1.0] Do not take snapshots after shutdown was requested #7781

Merged

[Backports stable/1.1] Do not take snapshots after shutdown was requested #7782

Merged

npepinpe added Release: 1.1.4 labels Oct 13, 2021

This pull request was closed.
