Error with: Failing installation of 'LogStoragePartitionStep' #9040
Comments
@deepthidevaki can you double check that the system did not react and that it's just a logging issue? 👍
It behaved as expected.
However, it was logged as an error in PartitionTransitionProcess: https://github.com/camunda/zeebe/blob/acd6aff1a3d29959235871a1b9c0e4a9216b2b9c/broker/src/main/java/io/camunda/zeebe/broker/system/partitions/impl/PartitionTransitionProcess.java#L87. We can fix this by logging the errors conditionally. But of course a better solution would be to revisit the transitions and, as part of that, remove the term check.
Let's reduce the log level for these recoverable errors to at least WARN. WARN is meant for issues which may recover by themselves, but which give a hint if the operator notices something is wrong or if the warning repeats consistently.
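To make the suggestion above concrete, here is a minimal sketch of conditional logging. It is not the actual `PartitionTransitionProcess` code: `TransitionLogSketch`, `TermMismatchException`, `levelFor` and `logTransitionFailure` are hypothetical names, and it uses `java.util.logging` only to stay self-contained. It assumes a term mismatch can be recognized by a dedicated exception type.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch only: all names here are hypothetical, not Zeebe's real API.
final class TransitionLogSketch {
  private static final Logger LOG = Logger.getLogger("TransitionLogSketch");

  /** Hypothetical marker type for a failure caused by a raft term change. */
  static final class TermMismatchException extends RuntimeException {
    TermMismatchException(final String message) {
      super(message);
    }
  }

  /** A term mismatch is recoverable, so it maps to WARNING; anything else stays SEVERE. */
  static Level levelFor(final Throwable error) {
    return error instanceof TermMismatchException ? Level.WARNING : Level.SEVERE;
  }

  /** Logs a failed step installation at the level chosen by levelFor. */
  static void logTransitionFailure(
      final String step, final int partitionId, final Throwable error) {
    LOG.log(
        levelFor(error),
        String.format(
            "Failing installation of '%s' on partition %d: %s",
            step, partitionId, error.getMessage()));
  }
}
```

With this, `logTransitionFailure("LogStoragePartitionStep", 20, new TransitionLogSketch.TermMismatchException("term changed"))` would be logged as a warning, while any other exception type would still be logged as an error.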
9122: fix(broker): do not log transition failure due to term mismatch as error r=deepthidevaki a=deepthidevaki
## Description
If a transition failed due to a term mismatch, a new transition will follow. There is no need to log it as an error because conceptually it is not an error. It is expected to fail to prevent inconsistencies. When this failure happens, it is ignored and ZeebePartition continues with the next transition.
## Related issues
closes #9040
Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
9125: [Backport stable/8.0] fix(broker): do not log transition failure due to term mismatch as error r=deepthidevaki a=github-actions[bot]
# Description
Backport of #9122 to `stable/8.0`.
relates to #9040
9133: [Backport stable/8.0] Prevent duplicate key insertion for DMN r=remcowesterhoud a=github-actions[bot]
# Description
Backport of #9121 to `stable/8.0`.
relates to #9115
Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
Co-authored-by: Remco Westerhoud <remco@westerhoud.nl>
I feel we did not completely resolve the issue with the recent PR. We can still see errors like `IllegalStateException: Not leader of partition 2`, which seem to be more or less expected, judging by the log.
Logs are also stored under https://drive.google.com/drive/u/0/folders/1vYJilfkLRlrF9CqTAUwRT9dLeDqTZULb
Exception:
This is a different issue, but related. It might make sense to open a new issue.
Triage: Moved to todo in order to open a new issue with the related information; afterwards this issue should be closed.
Created a new issue #9586
Describe the bug
We see an error on prod which says:
Expected that current term '150' is same as raft term '151', but was not. Failing installation of 'LogStoragePartitionStep' on partition 20.
on broker 6.
Based on the metrics we can see that we have a leader (Broker 5), Broker 6 becomes leader, Broker 5 steps down, Broker 0 becomes leader and Broker 6 steps down. Based on that I would say everything worked as expected?
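For illustration, a guard that produces this kind of message could look roughly like the sketch below. This is not the actual Zeebe code: `TermCheckSketch`, `TermMismatchException` and `ensureSameTerm` are hypothetical names, and only the message text is taken from the error above. Reading the error as "broker 6 started its transition in term 150 while raft had already moved on to term 151" is an assumption based on the leader changes described.

```java
// Sketch only: hypothetical names, not Zeebe's real classes.
final class TermCheckSketch {

  /** Hypothetical exception type signalling that the raft term changed mid-transition. */
  static final class TermMismatchException extends RuntimeException {
    TermMismatchException(final String message) {
      super(message);
    }
  }

  /**
   * Fails the step if the term the transition was started for differs from the
   * term raft currently reports; otherwise returns normally.
   */
  static void ensureSameTerm(
      final long transitionTerm, final long raftTerm, final String step, final int partitionId) {
    if (transitionTerm != raftTerm) {
      throw new TermMismatchException(
          String.format(
              "Expected that current term '%d' is same as raft term '%d', but was not. "
                  + "Failing installation of '%s' on partition %d.",
              transitionTerm, raftTerm, step, partitionId));
    }
  }
}
```

Calling `ensureSameTerm(150, 151, "LogStoragePartitionStep", 20)` throws with the message quoted above, while equal terms pass silently. Failing here is the safe choice: the outdated transition is dropped and the transition for the newer term takes over.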
Error group: https://console.cloud.google.com/errors/detail/CPf-xtb-3czwbw;service=zeebe;time=P7D?project=camunda-cloud-240911
Interestingly, this error happened multiple times in the cluster at the same time:
Occurrences
To Reproduce
I don't know.
Expected behavior
I think it is expected that this can happen, so I would expect us to log a warning instead of an error.
Since we worked on this in #8717, I also expected a different exception.
Log/Stacktrace
https://drive.google.com/drive/folders/18XzGehQ0z2ut4inT-wXiBbBGM-DD3Zpf
Full Stacktrace
Environment:
Camunda SaaS