Acquiring a write lock in the Journal blocks installing a received snapshot for ever #8369

Closed
romansmirnov opened this issue Dec 13, 2021 · 0 comments · Fixed by #8372
Labels: kind/bug, version:1.3.0

romansmirnov commented Dec 13, 2021

Describe the bug

Based on the observations made in #7992, the follower could not install a received snapshot: #7992 (comment)

According to the thread dump in #7992, the installation of the snapshot is blocked when trying to acquire a write lock:

(screenshot of the thread dump)

The write lock cannot be acquired because a read lock is already held; that read lock was acquired while trying to open a reader. The read lock is never released, because opening the reader failed with the exception "java.lang.IllegalStateException: Segment not open" before the unlock was reached. See:

https://github.com/camunda-cloud/zeebe/blob/2dee25fcb748d8a9c6457eee465d08fd5a4d5471/journal/src/main/java/io/camunda/zeebe/journal/file/SegmentedJournal.java#L211-L218

To Reproduce

  1. Acquire a read lock while opening a reader
  2. Make opening the reader fail with an exception
  3. Try to install a new snapshot, which tries to acquire a write lock on the same journal
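
The steps above can be sketched in isolation. This is a minimal illustration, not Zeebe's code: it assumes the journal guards readers with a `java.util.concurrent.locks.StampedLock`, and the class and method names (`LeakedReadLockDemo`, `openReaderLeaky`, `tryInstallSnapshot`) are hypothetical. A timed `tryWriteLock` is used instead of a blocking `writeLock()` so the demo terminates rather than hanging like the Raft thread does.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.StampedLock;

final class LeakedReadLockDemo {
  private final StampedLock rwlock = new StampedLock();

  // Step 1 and 2: acquire the read lock, then fail before unlocking.
  // Hypothetical stand-in for opening a reader on the journal.
  void openReaderLeaky() {
    final long stamp = rwlock.readLock();
    failToOpen();
    rwlock.unlockRead(stamp); // never reached when failToOpen() throws
  }

  private void failToOpen() {
    throw new IllegalStateException("Segment not open");
  }

  // Step 3: hypothetical stand-in for installing a snapshot.
  // Returns false if the write lock could not be acquired in time.
  boolean tryInstallSnapshot(final long timeoutMillis) {
    try {
      final long stamp = rwlock.tryWriteLock(timeoutMillis, TimeUnit.MILLISECONDS);
      if (stamp == 0L) {
        return false; // timed out: the leaked read lock is still held
      }
      rwlock.unlockWrite(stamp);
      return true;
    } catch (final InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  static boolean reproduce() {
    final LeakedReadLockDemo journal = new LeakedReadLockDemo();
    try {
      journal.openReaderLeaky();
    } catch (final IllegalStateException expected) {
      // opening the reader failed; the read lock was not released
    }
    return journal.tryInstallSnapshot(100);
  }

  public static void main(final String[] args) {
    System.out.println("snapshot installed: " + reproduce());
  }
}
```

With a blocking `writeLock()` in place of the timed variant, the calling thread would wait indefinitely, which matches the blocked Raft thread in the thread dump.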

Observed Behavior:

  • The Raft thread cannot acquire the write lock, because a read lock is held already.
  • This blocks the Raft thread so that the snapshot cannot be installed.
  • Additionally, an Actor Thread that executes an Actor Job to compact the log is waiting for the completion of a future which must be completed by the blocked Raft thread.

Expected behavior

  • When opening the reader fails unexpectedly, the read lock is always released.
  • The Raft thread and other threads are not blocked by it.
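
The expected behavior can be sketched by releasing the read lock in a `finally` block. As before, this is a hedged illustration under the assumption that a `StampedLock` guards the journal; `ReleasedReadLockDemo` and its methods are hypothetical names, not the actual `SegmentedJournal` implementation.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.StampedLock;

final class ReleasedReadLockDemo {
  private final StampedLock rwlock = new StampedLock();

  // Hypothetical stand-in for opening a reader: the read lock is released
  // in a finally block, so it is released even if opening fails.
  void openReader() {
    final long stamp = rwlock.readLock();
    try {
      failToOpen();
    } finally {
      rwlock.unlockRead(stamp);
    }
  }

  private void failToOpen() {
    throw new IllegalStateException("Segment not open");
  }

  // Hypothetical stand-in for installing a snapshot under the write lock.
  boolean tryInstallSnapshot(final long timeoutMillis) {
    try {
      final long stamp = rwlock.tryWriteLock(timeoutMillis, TimeUnit.MILLISECONDS);
      if (stamp == 0L) {
        return false;
      }
      rwlock.unlockWrite(stamp);
      return true;
    } catch (final InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  static boolean installAfterFailedOpen() {
    final ReleasedReadLockDemo journal = new ReleasedReadLockDemo();
    try {
      journal.openReader();
    } catch (final IllegalStateException expected) {
      // opening the reader still fails, but the read lock has been released
    }
    return journal.tryInstallSnapshot(100);
  }

  public static void main(final String[] args) {
    System.out.println("snapshot installed: " + installAfterFailedOpen());
  }
}
```

The failure to open the reader still propagates to the caller; the only change is that the lock no longer outlives the failed call.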

Environment:

  • Zeebe Version: 1.2.5

related to #7992

romansmirnov added the kind/bug and 1.2.6 labels Dec 13, 2021
romansmirnov self-assigned this Dec 13, 2021
romansmirnov added this to In progress in Zeebe Dec 13, 2021
romansmirnov moved this from In progress to Review in progress in Zeebe Dec 13, 2021
ghost pushed a commit that referenced this issue Dec 14, 2021
8363: Allow multiple parallel releases of the benchmark images r=npepinpe a=npepinpe

## Description

This PR allows multiple parallel releases by using a temporary working directory for every run. This is particularly useful when building multiple patch releases at the same time.



8372: fix(journal): always release acquired read lock r=romansmirnov a=romansmirnov

## Description

* Release read lock in a `finally` block so that the lock is also released in a failure case

## Related issues

closes #8369 



Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
Co-authored-by: Roman <roman.smirnov@camunda.com>
ghost pushed a commit that referenced this issue Dec 14, 2021
8372: fix(journal): always release acquired read lock r=romansmirnov a=romansmirnov
Co-authored-by: Roman <roman.smirnov@camunda.com>
ghost closed this as completed in df9710c Dec 14, 2021
ghost closed this as completed in #8372 Dec 14, 2021
Zeebe automation moved this from Review in progress to Done Dec 14, 2021
ghost pushed a commit that referenced this issue Dec 16, 2021
8346: [Backport stable/1.2] fix(gtw/jobs): ignore notifications if already scheduled r=romansmirnov a=github-actions[bot]

# Description
Backport of #8317 to `stable/1.2`.

relates to #8267

8387: [Backport stable/1.2] fix(journal): always release acquired read lock r=romansmirnov a=github-actions[bot]

# Description
Backport of #8372 to `stable/1.2`.

relates to #8369

Co-authored-by: Roman <roman.smirnov@camunda.com>
korthout added the version:1.3.0 label Jan 4, 2022
KerstinHebel removed this from Done in Zeebe Mar 23, 2022
This issue was closed.