Not possible to cancel process instance with many active element instances #11355
Comments
Another, related error occurred on PROD:
Error group, https://console.cloud.google.com/errors/detail/CJujpJmq_NqemgE;service=zeebe;time=P7D?project=camunda-cloud-240911 |
ℹ️ Currently, the |
Happened on 8.1.3 https://console.cloud.google.com/errors/detail/CKvjvtrYm_SiuwE;service=zeebe;time=P7D?project=camunda-cloud-240911 |
I would request that @camunda/zeebe-process-automation re-evaluate the priority of this. Incidents shouldn't happen twice. This seems to be an issue that people run into easily, and there is no good way to resolve it. |
Triage summary:
Let's continue working on this issue by providing this quick and dirty solution |
@korthout could you please check if I have summarized all the details in https://github.com/camunda/product-hub/issues/1067 ? |
@aleksander-dytko Thanks for creating the EPIC. I think you cover all the details. |
This happened again, except this time the number of child element instances is so great that the nodes first slow down to a crawl due to very high GC times, and are then killed due to OOM. Incident link: https://camunda.slack.com/archives/C051HA4V63D In case of investigation with this data, the key of the command is Affected version is 8.1.9, though I imagine most versions are affected. From the heap dump:
Memory metrics: In our case, the cluster was also unusable, and likely the only way to recover it is to give it ludicrous amounts of memory. |
Relevant support issue: https://jira.camunda.com/browse/SUPPORT-16499 And clusters which run into this are likely to be affected by #12239 as well (relevant support issue: https://jira.camunda.com/browse/SUPPORT-16394). Please update the support team once these issues are fixed with a patch ETA 🙏 |
I've renamed this issue, as the descriptions are not related to deep nesting. They are related to a process instance that contains many active element instances. For the deep nesting we have another issue: I've created an epic to do a proper task breakdown #12485
12604: Terminate children using the new `ProcessInstanceBatch` command r=berkaycanbc a=remcowesterhoud

## Description

This PR switches the termination of child instances to use the new `ProcessInstanceBatch` command.

## Related issues

closes #12538
closes #11355

Co-authored-by: Remco Westerhoud <remco@westerhoud.nl>
Describe the bug
We got reports of crash-looping Zeebe brokers on prod. It looks like the running process does some nesting or looping over certain activities. TODO: I will add the process model later.
The user tried to cancel the corresponding process instance but this failed because there were too many activities to terminate.
Error group: https://console.cloud.google.com/errors/detail/COWzpqvwz4Cg0wE;service=zeebe;time=P7D?project=camunda-cloud-240911
I set the severity to high since I see no workaround. BTW, because the loop causes the pods to crash-loop, the cluster was unusable in this case.
To Reproduce
Have a process instance with a lot of activities active, and terminate the corresponding process instance.
Expected behavior
Termination of instances takes the batch size into account and terminates activities batch-wise, similar to the issue with activating multi-instances.
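To illustrate what batch-wise termination could look like, here is a minimal sketch in Python. It is not Zeebe's implementation: `TerminateBatch`, `process_batch`, `cancel_instance`, and the `batch_size` parameter are all hypothetical names made up for this example. The idea is that each processing step terminates at most `batch_size` children and, if any remain, writes a follow-up command carrying the index to continue from, so that no single step has to handle every child at once.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TerminateBatch:
    """Hypothetical follow-up command: continue terminating from start_index."""
    parent_key: int
    start_index: int


def process_batch(command, children, batch_size):
    """Terminate at most batch_size children; return (terminated, follow_up)."""
    end = min(command.start_index + batch_size, len(children))
    terminated = children[command.start_index:end]
    follow_up = None
    if end < len(children):
        # Children remain: hand the rest to a later processing step.
        follow_up = TerminateBatch(command.parent_key, end)
    return terminated, follow_up


def cancel_instance(parent_key, children, batch_size):
    """Drive the batches through a simple in-memory command queue."""
    queue = deque([TerminateBatch(parent_key, 0)])
    batches = []
    while queue:
        terminated, follow_up = process_batch(queue.popleft(), children, batch_size)
        batches.append(terminated)
        if follow_up is not None:
            queue.append(follow_up)
    return batches
```

With ten children and a batch size of four, `cancel_instance(1, list(range(10)), 4)` processes three bounded steps instead of one unbounded step, which is the property the expected behavior asks for.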
Log/Stacktrace
Full Stacktrace
Environment:
relates to https://jira.camunda.com/browse/SUPPORT-16499