Operate import slows down in case of data loss #19424

Closed

sdorokhova opened this issue Jun 17, 2024 · 1 comment · Fixed by #19429 or #19431
Assignees: sdorokhova
Labels: component/operate (Related to the Operate component/team), kind/bug (Categorizes an issue or PR as a bug), Release: 8.3.13, Release: 8.4.10, Release: 8.5.4, support (Marks an issue as related to a customer support request), version:8.2.28 (Label that represents issues released on version 8.2.28)

Comments

sdorokhova (Contributor) commented Jun 17, 2024

Describe the bug

When searching for flow node instance parents, we retry twice with a 2-second delay to cover the case where the parent flow node instance was imported with the previous batch but the Elasticsearch refresh has not happened yet. This 2-second sleep becomes a problem under high data load when the piece of data containing the parents has been lost: for every imported child we then wait 2 seconds while searching for its parent, which makes the import extremely slow.

We should avoid blocking the import for 2 seconds.
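
For illustration, a minimal sketch of the blocking lookup pattern described above, assuming a retry-twice-with-sleep loop; the class, method, and field names are invented and this is not Operate's actual code. The point is that a lost parent record makes every child instance in the batch pay the full sleep budget.

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch of a blocking parent lookup that retries with a sleep. */
final class BlockingParentLookup {

  private static final int MAX_ATTEMPTS = 2;   // "we retry twice"
  private static final long SLEEP_MS = 2_000L; // "with a 2-second delay"

  // Stand-in for the Elasticsearch/OpenSearch query that resolves the parent's tree path.
  private final Function<String, Optional<String>> searchParentTreePath;

  BlockingParentLookup(Function<String, Optional<String>> searchParentTreePath) {
    this.searchParentTreePath = searchParentTreePath;
  }

  Optional<String> find(String parentFlowNodeInstanceId) throws InterruptedException {
    for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
      final Optional<String> treePath = searchParentTreePath.apply(parentFlowNodeInstanceId);
      if (treePath.isPresent()) {
        return treePath;
      }
      if (attempt < MAX_ATTEMPTS) {
        // If the parent record was lost, this sleep is paid for every single imported child.
        Thread.sleep(SLEEP_MS);
      }
    }
    return Optional.empty(); // parent is reported as missing and the import continues
  }
}
```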

To Reproduce

Steps to reproduce the behavior:

  1. Deploy a process with a subprocess or a multi-instance flow node.
  2. Start the process in Zeebe and wait until a flow node inside the subprocess or multi-instance is activated.
  3. Remove the Zeebe records for the parent flow node (e.g. the subprocess).
  4. Start Operate in debug mode and observe how the import behaves.

Current behavior

The import waits for 2 seconds before reporting the missing parent and continuing.

Expected behavior

No waiting.

Environment

  • Operate Version: 8.2.22.

Additional context

Related support case: https://jira.camunda.com/browse/SUPPORT-22204


Acceptance Criteria

Definition of Ready - Checklist

  • The bug has been reproduced by the assignee in an environment compatible with the provided one; otherwise, the issue is closed with a comment
  • The issue has a meaningful title, description, and testable acceptance criteria
  • The issue has been labeled with an appropriate Bug-area label
  • Necessary screenshots, screen recordings, or files are attached to the bug report

For UI changes required to solve the bug:

  • Design input has been collected by the assignee

Implementation

🔍 Root Cause Analysis

💭 Proposed Solution

👉 Handover Dev to QA

  • Changed components:
  • Side effects on other components:
  • Handy resources:
    BPMN/DMN models, plugins, scripts, REST API endpoints + example payload, etc.:
  • Example projects:
  • Commands/Steps needed to test; versions to validate:
  • Docker file / Helm chart: if it needs to be tested via Docker, share the version containing the fix along with the versions of the other services.
  • Release version (in which version this fix/feature will be released):

📗 Link to the test case

@sdorokhova sdorokhova added the kind/bug and component/operate labels Jun 17, 2024
@sdorokhova sdorokhova changed the title from "Operate import slow down in case of data loss" to "Operate import slows down in case of data loss" Jun 17, 2024
@github-actions github-actions bot added the support label Jun 17, 2024
sdorokhova added a commit that referenced this issue Jun 17, 2024
* introduce a new config parameter `camunda.operate.importer.retryReadingParents` which will prevent retrying with a sleep call when reading parents from Elastic/OpenSearch
* we also have a `sleep` call in the Incident post-processor. The reason: we want to wait for Elastic to refresh shards before processing the next batch. Replaced it with scheduling the next call with a delay. Also increased the backoff period to 5 sec.

Closes #19424
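
A minimal sketch of the non-blocking pattern this commit describes: instead of sleeping on the importer thread, the next attempt is scheduled with a delay. The class, the executor setup, and the method names here are assumptions for illustration, not Operate's actual implementation; only the 5-second backoff and the `camunda.operate.importer.retryReadingParents` property name come from the commit message above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: reschedule the next run with a delay instead of blocking with sleep(). */
final class NonBlockingPostProcessor {

  private static final long BACKOFF_SECONDS = 5; // "increased backoff period to 5 sec"

  private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

  void processNextBatch() {
    // Stand-in for the Elasticsearch/OpenSearch check that the previous batch is visible yet.
    if (!previousBatchVisible()) {
      // Free the thread and try again later instead of calling Thread.sleep(...).
      scheduler.schedule(this::processNextBatch, BACKOFF_SECONDS, TimeUnit.SECONDS);
      return;
    }
    // ... post-process incidents for the batch ...
  }

  private boolean previousBatchVisible() {
    return true; // placeholder for the real refresh/visibility check
  }
}
```

With `camunda.operate.importer.retryReadingParents` switched off (presumably a boolean importer setting; its exact type and default are not stated in this issue), the parent lookup would no longer retry with a sleep, so a missing parent is reported immediately instead of stalling every child by 2 seconds.
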
sdorokhova added a commit that referenced this issue Jun 17, 2024 (same commit message as above)
sdorokhova added a commit that referenced this issue Jun 17, 2024 (same commit message as above)
sdorokhova added a commit that referenced this issue Jun 17, 2024 (same commit message as above)
mihail-ca (Contributor) commented

/public

@sdorokhova sdorokhova self-assigned this Jun 17, 2024
@sdorokhova sdorokhova linked a pull request Jun 17, 2024 that will close this issue
sdorokhova added a commit that referenced this issue Jun 18, 2024 (same commit message as above)
sdorokhova added a commit that referenced this issue Jun 19, 2024 (same commit message as above)
@sdorokhova sdorokhova added the version:8.2.28 label Jul 1, 2024
sdorokhova added a commit that referenced this issue Jul 1, 2024 (same commit message as above)
sdorokhova added a commit that referenced this issue Jul 8, 2024 (same commit message as above)
renovate bot pushed a commit that referenced this issue Jul 8, 2024 (same commit message as above)
Projects
None yet
2 participants