3.6.2 crash #11454

Closed
imtrobin opened this issue Apr 16, 2020 · 6 comments

Comments

@imtrobin

imtrobin commented Apr 16, 2020

My Environment

  • ArangoDB Version: 3.6.2
  • Storage Engine: RocksDB
  • Deployment Mode: Single Server
  • Deployment Strategy: Manual Start
  • Infrastructure: own
  • Operating System: Centos7

I was previously running 3.4.x (MMFiles) for months without problems. I upgraded to 3.6.2 a couple of days ago, and I have experienced 2 crashes since.

I'm running /usr/sbin/arangod directly, and it says "Killed".

/usr/sbin/arangod
2020-04-15T07:54:01Z [1945] INFO [e52b0] ArangoDB 3.6.2 [linux] 64bit, using jemalloc, build tags/v3.6.2-0-g7c2e5d3654, VPack 0.1.33, RocksDB 6.2.0, ICU 58.1, V8 7.1.302.28, OpenSSL 1.1.1d  10 Sep 2019
2020-04-15T07:54:01Z [1945] INFO [75ddc] detected operating system: Linux version 3.10.0-862.14.4.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC) ) #1 SMP Wed Sep 26 15:12:11 UTC 2018
2020-04-15T07:54:01Z [1945] INFO [43396] {authentication} Jwt secret not specified, generating...
2020-04-15T07:54:01Z [1945] INFO [144fe] using storage engine 'rocksdb'
2020-04-15T07:54:01Z [1945] INFO [3bb7d] {cluster} Starting up with role SINGLE
2020-04-15T07:54:01Z [1945] INFO [6ea38] using endpoint 'http+tcp://0.0.0.0:8529' for non-encrypted requests
2020-04-15T07:54:01Z [1945] INFO [a1c60] {syscall} file-descriptors (nofiles) hard limit is 8192, soft limit is 8192
2020-04-15T07:54:01Z [1945] INFO [3844e] {authentication} Authentication is turned on (system only), authentication for unix sockets is turned on
2020-04-15T07:54:02Z [1945] INFO [cf3f4] ArangoDB (version 3.6.2 [linux]) is ready for business. Have fun!
2020-04-15T08:03:11Z [1945] WARNING [ebe22] Failed to update Foxx store from GitHub.
..same
2020-04-15T18:29:17Z [1945] WARNING [ebe22] Failed to update Foxx store from GitHub.
2020-04-16T03:19:08Z [1945] WARNING [3ad54] {engines} slow background settings sync: 1.405565 s
2020-04-16T03:19:13Z [1945] WARNING [3ad54] {engines} slow background settings sync: 1.801387 s
Killed
2020-04-14T01:59:59Z [27920] INFO [33eae] hangup received, about to reopen logfile
2020-04-14T01:59:59Z [27920] INFO [23db2] hangup received, reopened logfile
2020-04-14T04:02:46Z [27920] WARNING [ebe22] Failed to update Foxx store from GitHub.
..Same
2020-04-14T10:07:28Z [27920] WARNING [ebe22] Failed to update Foxx store from GitHub.
2020-04-14T15:32:04Z [27920] WARNING [3ad54] {engines} slow background settings sync: 1.200316 s
2020-04-14T15:52:16Z [27920] WARNING [ebe22] Failed to update Foxx store from GitHub.
..same
2020-04-14T17:36:58Z [27920] WARNING [ebe22] Failed to update Foxx store from GitHub.
2020-04-15T07:02:25Z [1370] INFO [e52b0] ArangoDB 3.6.2 [linux] 64bit, using jemalloc, build tags/v3.6.2-0-g7c2e5d3654, VPack 0.1.33, RocksDB 6.2.0, ICU 58.1, V8 7.1.302.28, OpenSSL 1.1.1d  10 Sep 2019
2020-04-15T07:02:25Z [1370] INFO [75ddc] detected operating system: Linux version 3.10.0-862.14.4.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC) ) #1 SMP Wed Sep 26 15:12:11 UTC 2018
2020-04-15T07:02:25Z [1370] INFO [43396] {authentication} Jwt secret not specified, generating...
2020-04-15T07:02:25Z [1370] INFO [144fe] using storage engine 'rocksdb'
2020-04-15T07:02:25Z [1370] INFO [3bb7d] {cluster} Starting up with role SINGLE
2020-04-15T07:02:25Z [1370] INFO [6ea38] using endpoint 'http+tcp://0.0.0.0:8529' for non-encrypted requests
2020-04-15T07:02:25Z [1370] INFO [a1c60] {syscall} file-descriptors (nofiles) hard limit is 8192, soft limit is 8192
2020-04-15T07:02:25Z [1370] INFO [3844e] {authentication} Authentication is turned on (system only), authentication for unix sockets is turned on
2020-04-15T07:02:25Z [1370] WARNING [b387d] found existing lockfile '/var/lib/arangodb3/LOCK' of previous process with pid 27920, but that process seems to be dead already
2020-04-15T07:02:40Z [1497] INFO [e52b0] ArangoDB 3.6.2 [linux] 64bit, using jemalloc, build tags/v3.6.2-0-g7c2e5d3654, VPack 0.1.33, RocksDB 6.2.0, ICU 58.1, V8 7.1.302.28, OpenSSL 1.1.1d  10 Sep 2019
2020-04-15T07:02:40Z [1497] INFO [75ddc] detected operating system: Linux version 3.10.0-862.14.4.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC) ) #1 SMP Wed Sep 26 15:12:11 UTC 2018
2020-04-15T07:02:40Z [1497] INFO [43396] {authentication} Jwt secret not specified, generating...
2020-04-15T07:02:40Z [1497] INFO [144fe] using storage engine 'rocksdb'
2020-04-15T07:02:40Z [1497] INFO [3bb7d] {cluster} Starting up with role SINGLE
2020-04-15T07:02:40Z [1497] INFO [6ea38] using endpoint 'http+tcp://0.0.0.0:8529' for non-encrypted requests
2020-04-15T07:02:40Z [1497] INFO [a1c60] {syscall} file-descriptors (nofiles) hard limit is 8192, soft limit is 8192
2020-04-15T07:02:40Z [1497] INFO [3844e] {authentication} Authentication is turned on (system only), authentication for unix sockets is turned on
2020-04-15T07:02:40Z [1497] WARNING [b387d] found existing lockfile '/var/lib/arangodb3/LOCK' of previous process with pid 1370, but that process seems to be dead already
2020-04-15T07:02:42Z [1497] INFO [cf3f4] ArangoDB (version 3.6.2 [linux]) is ready for business. Have fun!
2020-04-15T07:11:47Z [1497] INFO [b4133] control-c received, beginning shut down sequence
2020-04-15T07:11:48Z [1497] INFO [4bcb9] ArangoDB has been shut down
@imtrobin
Author

Any assistance on how to solve this? The crash is happening every day. Even though it's only a test server, I was running 3.4.x for months without issues.

@jsteemann
Contributor

@imtrobin : IIRC, "Killed" will be displayed only if the process received a kill signal.
If you don't use any scripts of your own that kill processes, I guess the arangod process got sent a SIGKILL signal by the operating system. This signal cannot be intercepted or handled by the killed process.
The Linux kernel often employs an OOM (out of memory) killer that will terminate processes that consume many resources (i.e. RAM) in case the OS runs out of memory. In this case, you should see invocations of the OOM killer in the operating system logs. Can you check if that's the case?
Do you have any restrictions for memory usage put in place, e.g. via running arangod in a container or such? Do you know if the process is supposed to use lots of RAM?
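
For anyone checking the same thing, here is a minimal sketch of how the kernel log could be searched for OOM killer activity. The helper name `find_oom_events` and the search patterns are assumptions for illustration; the exact message wording differs between kernel versions, and reading the kernel log may require root.

```shell
# Hypothetical helper: filter kernel log text read from stdin for lines
# that typically accompany an OOM kill. The patterns are an assumption
# and may need adjusting for your kernel version.
find_oom_events() {
  grep -i -E 'out of memory|oom-killer|killed process'
}

# Possible ways to feed it (uncomment the one that fits your system):
#   dmesg -T | find_oom_events
#   journalctl -k | find_oom_events
#   find_oom_events < /var/log/messages
```

If one of the matching lines names arangod, the "Killed" message very likely came from the OOM killer rather than from a crash inside the server.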

@imtrobin
Author

I checked the OS logs, and yes, it is indeed an out-of-memory kill. This is only a test dev server, and I'm only working with 10 small documents and 10 collections. This is the amount of memory left while arango is running:

free -h
              total        used        free      shared  buff/cache   available
Mem:           487M        233M         47M        4.2M        207M        215M
Swap:          819M        520K        819M

It gets killed only when I leave it running overnight, not while I'm using it for testing, so is it a memory leak?

@jsteemann
Contributor

@imtrobin : what seems to happen here is the following:
arangod keeps running, but is idle from the user perspective. In the background however, it records statistics every few seconds and stores them in an internal collection (_statistics). These statistics can be viewed when using the web interface etc.
But the statistics gathering will lead to the database storing new data in RocksDB every few seconds, which will lead to in-memory buffers and caches filling up. This is somewhat intentional, as buffers and caches are used for all kinds of read/write database operations.
They can also be limited in size.
What seems to happen here is that the overall size of these buffers/caches at some point exceeds the available memory, so the OS decides to kill the process.

To prevent the problem from happening, you could adjust (downsize) the values of --rocksdb.total-write-buffer-size and --rocksdb.block-cache-size in your configuration.
The defaults for these options are 512MB and 256MB for small systems. This means that the RocksDB storage engine in ArangoDB will happily try to use up to 768MB of RAM for these two things alone.
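
To make the arithmetic explicit, a small sketch using only the figures quoted in this thread (the two defaults above and the 487M total from the `free -h` output); the variable names are made up for illustration:

```shell
# Figures quoted in this thread, in MB.
write_buffer_mb=512   # default --rocksdb.total-write-buffer-size on small systems
block_cache_mb=256    # default --rocksdb.block-cache-size on small systems
ram_mb=487            # "total" column of the free -h output

# Combined upper bound for these two RocksDB memory consumers.
budget_mb=$((write_buffer_mb + block_cache_mb))
echo "buffers/caches may grow to ${budget_mb} MB"
echo "that is $((budget_mb - ram_mb)) MB more than the installed RAM"
```

This ignores everything else the process and the OS need, so the real headroom is even smaller.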

It seems your system only has 487MB of RAM, so the defaults are too high and you will hit an OOM very quickly. Lowering the config values should help, but you will then end up with very small buffers/caches.
You can also turn off statistics gathering by setting --server.statistics to false.
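
As a rough sketch, the three options could be combined on the command line like this. The 64 MiB sizes are illustrative assumptions, not recommended values, and the variable names are made up for this example; the script only prints the command so it can be reviewed before running it.

```shell
# Illustrative sizes only (64 MiB each, given in bytes); choose values
# that fit the memory actually available on your machine.
write_buffer_bytes=$((64 * 1024 * 1024))
block_cache_bytes=$((64 * 1024 * 1024))

# Assemble the arangod invocation with smaller RocksDB limits and
# statistics gathering switched off.
arangod_cmd="/usr/sbin/arangod \
--rocksdb.total-write-buffer-size ${write_buffer_bytes} \
--rocksdb.block-cache-size ${block_cache_bytes} \
--server.statistics false"

# Print the command for review instead of executing it.
echo "${arangod_cmd}"
```

The same settings can also live in the server's configuration file instead of being passed on every start.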

@imtrobin
Author

Thanks for the explanation, I will give it a go and try with statistics off.

@imtrobin
Author

It survived two days with stats off, so thank you.

@dothebart dothebart added the 2 Solved Resolution label Apr 27, 2020