Error on deletion of entities with many relationships #6357
Comments
hey @iFrozenPhoenix, do you happen to know for which entity type it happens, or does it seem to happen randomly? Additionally, are you trying to delete a bunch of them at once or only one by one? If it's one by one, do you delete them by entering the entity and then deleting it, or do you delete directly from the list?
The entities I have problems with are of type Malware. If I enter the entity and try to delete it, nothing happens. If I try to delete the entity via bulk deletion, I see the mentioned error. It doesn't affect all Malware entities, only certain ones.
I was able to isolate the OCTI error messages on the backend; see the following.
I could find out a bit more. The problem seems to occur with any entity if it has more than 1000 relations. If the relations are deleted in advance, the deletion of the entity also works. If I understand it correctly, the mass selection and deletion deletes each object (relation/entity) one by one. To me it looks like there is a difference in how the relations are deleted when I delete the entity, compared to when I delete the relations by mass-selecting them. With version 5.12.* it worked without problems.
Thanks for the investigation! I'm reaching out internally to seek advice and help on this one.
I tried adjusting my OpenSearch settings, especially the thread_pool.search.size parameter. I significantly increased it, and now the error message is a different one ("Execution timeout, too many concurrent call on the same entities - no reason provided") when I try to delete such an entity.
Increasing thread_pool.search.size is a good move, as the error in your logs is generated when OpenSearch is under too much load. So basically we have to improve your cluster capacity or the OpenSearch memory/thread figures. The error "Execution timeout, too many concurrent call on the same entities" is completely different: it's an error generated by OpenCTI when too many operations are attempted on the same element in a short period of time. A quicker OpenSearch can mitigate this problem, but if you have a lot of workers working on the same elements while you try to delete them, you can definitely hit this kind of error. However, after a bit of time you should be able to delete the element if no other process is working on it.
Currently I'm running OCTI with the following setup. The OpenSearch config (only modifications) I use is:

```yaml
thread_pool.search.min_queue_size: 10000
thread_pool.search.max_queue_size: 10000
thread_pool.search.queue_size: 10000
thread_pool.search.size: 32
```

Before version 6 I didn't apply the custom OpenSearch config, had Xms3G and Xmx3G, and it ran very smoothly. Now, after applying the above config, it runs again at least nearly smoothly, but I still have the problem with the pagination error. (Possibly the "Execution timeout, too many concurrent requests" error was sent at a moment when the ingestion rate was very high.) After applying the above config I see the following ingestion metrics again, and the UI is again relatively smooth, apart from some dashboard widgets with relation filtering. Could you give me a suggestion what else I could try? I have the following metrics: disk throughput 2k IOPS (of 50–100k), disk bandwidth 100 Mbps (of ca. 1000 Mbps), OpenSearch CPU usage 105–230 % (500 % left of 800 % total), 4 GB RAM left.
I have to add that I currently have a new connector attached which is ingesting 1.2 M new entities, hence the high number of bundles to be processed.
Additionally, OpenSearch's `_cat/thread_pool?v` query (columns active, queue, rejected) does not show a single rejection.
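For reference, one way to scan that `_cat/thread_pool` output for queueing or rejections. The rows below are made-up sample data for illustration, not output from this cluster; against a live cluster you would pipe `curl -s 'http://localhost:9200/_cat/thread_pool/search?v&h=node_name,name,active,queue,rejected'` into the same awk filter:

```shell
# Made-up sample of the search thread-pool columns (for illustration only)
cat <<'EOF' > thread_pool.txt
node_name name   active queue rejected
node-1    search 12     0     0
node-2    search 30     250   4
EOF

# Flag any node whose search pool shows a non-empty queue or rejections
awk 'NR > 1 && ($4 > 0 || $5 > 0) { print $1 ": queue=" $4 " rejected=" $5 }' thread_pool.txt
# -> node-2: queue=250 rejected=4
```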
cc @richard-julien do you have any suggestion?
As your platform grows, and perhaps the data you're ingesting also evolves, you start to have some contention problems on your OpenSearch. A less effective OpenSearch will introduce latencies and "too many concurrent requests" errors when a lot of workers work on the same object and OpenSearch is slower than expected to process everything. Today the solution can be to decrease the number of workers and accept that you will be a bit slower to ingest your data, or to move to a more powerful OpenSearch, e.g. a cluster approach. Even if you move to a bigger OpenSearch, you can still see some "too many concurrent" errors from time to time when the platform processes very highly connected data. It's something we are working on to limit this kind of locking.
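In a docker-compose based deployment, decreasing the number of workers as suggested above can be done by scaling the worker service down. A sketch under the assumption that the service is named `worker` (check your compose file, as the name may differ):

```shell
# Hedged sketch: reduce concurrent ingestion pressure by running fewer
# OpenCTI workers. "worker" is the assumed compose service name.
docker compose up -d --scale worker=2
```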
Description
When trying to delete an entity, I receive the error "Fail to execute engine pagination - no reason provided".
It only happens on certain entities.
Environment
Reproducible Steps
Expected Output
Deletion is successful
Actual Output
Deletion fails with the error message "Fail to execute engine pagination - no reason provided"
Additional information
Screenshots (optional)