Journal reset resulting in an intermediate state which is detected as corruption #12754
Comments
A potential solution might be to delete the segments in reverse order, combined with the PR that fixed #12374. In a related issue, I discussed a different approach with @npepinpe. I was planning to create a separate issue for it, but I'm dumping my thoughts here, as I think it is also relevant for this bug.

Terminology

- In-sync replicas - the followers that are in sync with the leader and participate in the commit. The in-sync replicas form the quorum.
- Out-of-sync follower - a follower which receives events potentially after they are already committed.

In a setup with replicationFactor 3, there will be 2 in-sync replicas, one of which is the leader. The third one can be an out-of-sync follower. Raft guarantees that any committed event is available on all in-sync replicas. The events are replicated to the out-of-sync follower eventually, but there is no guarantee when. Ideally they are replicated immediately, but a slow follower or a slow network can cause the out-of-sync follower to lag behind the leader by a large amount. At some point the leader sends a snapshot instead of the events, because the follower is lagging too far behind. At this point, the current behavior is as follows:

1. Follower resets the log (deletes all segments).
2. Follower persists the received snapshot.

The above process can lead to the following scenario: steps 1 and 2 are not atomic, so if the replica crashes in between, it restarts with an empty state. In a normal scenario this is not problematic, because the in-sync replicas are healthy and it is guaranteed that only one of the in-sync replicas becomes the leader. However, in a disaster scenario where all in-sync replicas are gone, it might be acceptable to continue functioning with whatever data is available on the out-of-sync replica. It might not have up-to-date state, but for some use cases it is acceptable to lose the last X amount of data.
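The non-atomic install described above can be sketched as a small simulation. This is illustrative only; the class, the step ordering, and the crash point are assumptions based on the description, not Zeebe's actual code:

```python
class Follower:
    """Illustrative model of an out-of-sync follower's persisted state."""

    def __init__(self):
        self.snapshot = {"index": 100}   # old snapshot
        self.segments = [101, 102, 103]  # log segments following it

    def install_snapshot(self, new_snapshot, crash_between_steps=False):
        # Step 1: reset the log (delete all segments) and drop the old snapshot.
        self.segments = []
        self.snapshot = None
        if crash_between_steps:
            return  # simulated crash: steps 1 and 2 are not atomic
        # Step 2: persist the newly received snapshot.
        self.snapshot = new_snapshot


follower = Follower()
follower.install_snapshot({"index": 500}, crash_between_steps=True)
# The follower restarts with an empty state: no snapshot and no log.
print(follower.snapshot, follower.segments)  # None []
```

A crash between the two steps leaves neither the old state nor the new snapshot on disk, which is exactly the disaster-recovery problem described above.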
In such cases, we want to enable users to recover the cluster from the state of the out-of-sync replica. However, in the above scenario the out-of-sync replica might end up with an empty state or an unusable snapshot. So it would be good if we could ensure that the state on the out-of-sync follower is always valid. To achieve that, we should not delete the old snapshot and reset the log immediately when the follower receives a snapshot.

Solution 1: Keep the old snapshot + logs in one folder and the new snapshot + new logs in another. On restart, attempt to use the new snapshot + log and fall back to the old state if necessary.

Solution 2: Do not reset the log when a new snapshot is received. Instead, add a marker record to the journal to indicate that there is a snapshot at that position. The readers must know how to handle these "gaps". For example, if the stream processor reader hits this marker record, it has to throw away its state and replace it with the snapshot. If the raft leader reads this record, it should replicate the snapshot instead of the events, etc. The compaction logic should take this into account and compact only if there is one valid snapshot with its follow-up event in the log, so the follower can roll back to an old valid state if necessary. This would also help with an issue we observed recently, where restarting the node while resetting the log can incorrectly lead to detecting the log as corrupted.
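A minimal sketch of Solution 2; the record kinds and payload shapes are hypothetical, not Zeebe's actual journal format. The log keeps a marker record pointing at the snapshot, and a replaying reader that hits the marker throws away its accumulated state and continues from the snapshot's state:

```python
from dataclasses import dataclass


@dataclass
class Record:
    index: int
    kind: str       # "EVENT" or "SNAPSHOT_MARKER" (hypothetical record kinds)
    payload: object


journal = [
    Record(1, "EVENT", "a"),
    Record(2, "EVENT", "b"),
    # The follower received a snapshot covering everything up to index 500;
    # instead of truncating the log, a marker record is appended.
    Record(3, "SNAPSHOT_MARKER", {"snapshot_index": 500, "state": {"x": 1}}),
    Record(501, "EVENT", "c"),
]


def replay(journal):
    """Replay the journal, honoring snapshot markers as state replacements."""
    state, applied = {}, []
    for record in journal:
        if record.kind == "SNAPSHOT_MARKER":
            # Throw away accumulated state and replace it with the snapshot.
            state = dict(record.payload["state"])
            applied = []
        else:
            applied.append(record.payload)
    return state, applied


print(replay(journal))  # ({'x': 1}, ['c'])
```

Because the old records are only removed later by compaction, an interrupted install leaves the reader able to fall back to the pre-snapshot state.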
With #12868, we implemented a quick fix to prevent the specific case we're aware of. By deleting the segments in reverse order during reset, we ensure that there are no gaps between the log and the snapshot, and thus no corruption. Data loss is acceptable here, as it was the desired outcome of the reset operation. However, we agreed not to close the issue: this quick fix only addresses a specific symptom, not the root cause, which is that the install operation on the follower is not atomic. Persisting the snapshot, clearing the log, updating the meta store, getting the first record for the snapshot, etc. are all independent operations, and the node may stop at any point in between, leaving us in an inconsistent state.
12868: fix(journal): avoid gaps in logs due to reset r=npepinpe a=npepinpe

## Description

Avoid gaps in the log when resetting it by deleting the segments in reverse order. This may cause data loss, but that's acceptable in the context of a reset operation. Deleting segments in reverse order ensures we have no gaps between the remaining snapshot and segments should the operation fail midway. Note that this unfortunately does not make persisting a new snapshot/resetting the log atomic. The tests were also not ideal and require heavy use of mocks; it's also not possible to write integration tests without major refactoring. As this is a quick fix, the hope is that the proper solution of making the Raft install operation atomic will redeem this. Not marked as closing the original issue because this is, after all, a quick fix: it only fixes a specific case we're aware of.

## Related issues

related to #12754

Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
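The effect of the reverse-order deletion can be illustrated with a small sketch; the segment bounds and function names here are made up for illustration. Interrupting an oldest-first reset leaves a hole between the start of the log and the surviving segments, while a newest-first reset never does:

```python
def reset(segments, order, crash_after=1):
    """Delete segments (list of (first_index, last_index) tuples) one by one,
    simulating a crash after `crash_after` deletions."""
    remaining = list(segments)
    victims = remaining if order == "oldest-first" else list(reversed(remaining))
    for deleted, seg in enumerate(list(victims), start=1):
        remaining.remove(seg)
        if deleted == crash_after:
            break  # simulated crash mid-reset
    return sorted(remaining)


def has_gap(snapshot_index, segments):
    """True if the surviving segments are not contiguous from the snapshot."""
    expected = snapshot_index + 1
    for first, last in segments:
        if first > expected:
            return True  # hole in the log -> detected as corruption on startup
        expected = last + 1
    return False


segments = [(1, 100), (101, 200), (201, 300)]
print(has_gap(0, reset(segments, "oldest-first")))  # True: (1, 100) is gone
print(has_gap(0, reset(segments, "newest-first")))  # False: only (201, 300) is gone
```

Newest-first deletion only ever truncates the tail of the log, so a crash mid-reset leaves a shorter but still contiguous log rather than one with a hole in it.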
12698: Replace go-bindata with go's embed FS r=npepinpe a=npepinpe

## Description

Took some slack time to complete this quick win: replaces usage of go-bindata, an external tool dependency, with the built-in capabilities of the go compiler using `go:embed` and the embed FS. The version file is embedded at build time into the binary, much like when we used go-bindata, and read back as a package field.

## Related issues

closes #6034
12877: Event subprocess attaches to inner instance of multi-instance only r=saig0 a=korthout

## Description

Event sub processes inside a multi-instance sub process would be attached to the multi-instance body in addition to the inner instance. However, there is no reason why an event sub process should be attached to the multi-instance body; event sub processes should only be attached to the inner instances.

**Example scenario**

The parallel multi-instance sub process creates an `item` variable as input element in each inner instance.

<img width="865" alt="Screenshot 2023-05-26 at 18 25 08" src="https://github.com/camunda/zeebe/assets/3511026/cf37717b-50cf-4478-84d3-d32009f54972">

Each message event sub process subscribes to the same message with a correlation key expression that uses the input element variable `item`.

<img width="922" alt="Screenshot 2023-05-26 at 18 25 39" src="https://github.com/camunda/zeebe/assets/3511026/9fc40f17-a2e7-4628-a9fd-a70cb6bffe90">

The bug surfaces as an incident on the multi-instance body stating:

> failed to evaluate expression 'item.id': no variable found for name 'item'

This was because we attempted to subscribe to the event sub process already when activating the multi-instance body, **before** activating any of the inner instances. The problem is resolved by no longer attaching the event sub process to the multi-instance body.

## Related issues

closes #11578

Co-authored-by: Nico Korthout <nico.korthout@camunda.com>
Describe the bug
Related to #12374
A customer observed another case where the startup failed with the error:
I could not verify it, but our current assumption is that this is a false positive. The following might have happened:
I think this is plausible, as the reset/deleting of segments is not an atomic operation. We might not be handling it correctly.
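A hedged sketch of the kind of startup check involved (the function name is hypothetical): the node treats a hole between the last snapshot and the first log record as corruption, so a half-finished reset can trip it even when it is a false positive:

```python
def verify_on_startup(snapshot_index, first_log_index):
    """Flag a gap between the last snapshot and the first log record."""
    if first_log_index > snapshot_index + 1:
        raise RuntimeError(
            f"gap detected: snapshot ends at {snapshot_index}, "
            f"log starts at {first_log_index}"
        )


verify_on_startup(100, 101)  # contiguous: ok
try:
    # A crash mid-reset deleted the snapshot but left a later segment behind.
    verify_on_startup(0, 101)
except RuntimeError as err:
    print(err)  # gap detected: snapshot ends at 0, log starts at 101
```

No committed data was silently lost in this scenario, but the check cannot distinguish an interrupted reset from genuine corruption, which is why the intermediate state must be avoided in the first place.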
Expected behavior
The reset or snapshot commit process should not result in an invalid intermediate state which is detected as corruption.
Environment:
Related to support