CPU randomly spiking at 100% CPU #13312
Comments
Can you take a core dump the next time this happens and share it? That way we can investigate what's going on.
@woodytec can you create a minidump the next time this happens? On Windows the easiest way is via the Task Manager: on the "Details" page, right-click the arangod.exe process and select "Create dump file".
@mpoeter Sharing a core dump might be problematic due to PII issues, but I will find out. We are using a Docker container with tag 3.6.3. What's the best way of installing the debugging symbols for this build into the container, so we can look at the core dump stack traces and share those? Installing the Debian package for the symbols into the Docker container doesn't work out of the box. Some more information: after about 30 minutes of 100% CPU, the DB becomes unresponsive and we get errors saying the scheduler queue is full.
When using strace on the ArangoDB process, this is reported endlessly:
futex({hex number}, FUTEX_WAIT_PRIVATE, 2, {0, {number}}) = -1 ETIMEDOUT (Connection timed out)
Does this provide any clues? Have you seen this before?
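For anyone wanting to check whether the trace contains anything besides these futex timeouts, here is a minimal sketch that tallies syscalls and error codes from saved strace output. A FUTEX_WAIT that returns ETIMEDOUT only means a timed wait expired, which is what threads waiting on a lock or condition with a timeout look like, so on its own it may not show where the CPU time goes. The function name `summarize_strace` and the sample lines in the usage note are made up for illustration; they are not taken from the reporter's trace.

```python
import re
from collections import Counter

# Matches completed syscall lines such as
#   futex(0x7f00, FUTEX_WAIT_PRIVATE, 2, {0, 500}) = -1 ETIMEDOUT (Connection timed out)
# optionally prefixed with "[pid 123] " as printed by `strace -f`.
SYSCALL_LINE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+)?(?P<name>\w+)\(.*\)\s+=\s+(?P<ret>-?\d+|0x[0-9a-f]+|\?)"
    r"(?:\s+(?P<errno>E[A-Z0-9]+))?"
)


def summarize_strace(lines):
    """Count (syscall, errno) pairs; errno is None for calls without an error code."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_LINE.match(line.strip())
        if match:  # skip signals, unfinished/resumed fragments, exit lines
            counts[(match["name"], match["errno"])] += 1
    return counts
```

Usage, assuming the trace was saved to a hypothetical file trace.txt: `summarize_strace(open("trace.txt")).most_common(10)` lists the ten most frequent (syscall, errno) pairs. If nearly everything is ("futex", "ETIMEDOUT"), the busy work is probably happening in user space between the waits, which strace does not show.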
Hi,
OK thanks. We are currently planning an upgrade to 3.7.6 next week. We'll see if the issue persists after, as you suggest.
@minimind Did you upgrade successfully? Does the problem persist?
@minimind Any update from your side?
Yes. We upgraded to 3.7.6 and this completely fixed the problem. We haven't had a single occurrence.
Great, thanks for the feedback! I'm going to close the issue, but please re-open should the problem occur again.
My Environment
Component, Query & Data
Affected feature: Server
Problem:
We are encountering 100% CPU spikes on our ArangoDB database every week or so, at random times. CPU usage increases linearly for about 30 minutes until it reaches 100%. Restarting ArangoDB returns the CPU to normal. There is no obvious cause. Our main collection holds on the order of 100 million documents. There are a few slow queries, but nothing out of the ordinary. Load on the DB is low and typical; it usually hovers around 5% or so. Following advice in previous issues, we ran 'top -H' and could see that CPU usage is divided equally among the various SchedWorker threads. There are no errors in the logs, nor any out-of-the-ordinary messages at all. RAM does not appear to be a limiting resource.
Is this expected, or should we be concerned? Is this a known issue? What other information could we provide?
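In case it helps others gather the same per-thread data without an interactive 'top -H' session, here is a minimal sketch that reads cumulative CPU ticks per thread from Linux procfs and groups them by thread name. It assumes a Linux host or container and relies on the documented layout of /proc/<pid>/task/<tid>/stat, where utime and stime are fields 14 and 15. The function names are made up for illustration, and "SchedWorker" is simply the thread name reported above.

```python
from collections import defaultdict
from pathlib import Path


def parse_thread_stat(line):
    """Return (thread_name, cpu_ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The name sits in parentheses and may contain spaces, so split on the
    # first "(" and the last ")" instead of on whitespace.
    name = line[line.index("(") + 1 : line.rindex(")")]
    rest = line[line.rindex(")") + 1 :].split()
    # rest[0] is field 3 (state), so utime (field 14) and stime (field 15)
    # are at rest[11] and rest[12].
    return name, int(rest[11]) + int(rest[12])


def cpu_ticks_by_thread_name(pid):
    """Sum cumulative CPU ticks per thread name for a live process (Linux only)."""
    totals = defaultdict(int)
    for stat_file in Path(f"/proc/{pid}/task").glob("*/stat"):
        try:
            name, ticks = parse_thread_stat(stat_file.read_text())
        except (OSError, ValueError, IndexError):
            continue  # the thread exited while we were reading
        totals[name] += ticks
    return dict(totals)
```

The values are cumulative since each thread started, so calling `cpu_ticks_by_thread_name(pid)` twice a few seconds apart and subtracting the results shows which thread names are burning CPU during a spike.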