DB server crashes with 'FATAL out of memory in V8' #7264

betwjp · 2018-11-08T10:23:45Z

My Environment

ArangoDB Version: 3.3.19
Storage Engine: RocksDB
Deployment Mode: Cluster
Deployment Strategy: ArangoDB Starter
Configuration: 6 servers, 32 cores, 128 GB memory
Infrastructure: own
Operating System: CentOS 6.9
Used Package: CentOS rpm

Component, Query & Data

We start the DB servers in supervisor mode and have noticed that one DB server node crashes frequently: it runs for one or two days and then crashes. All DB servers have the same configuration, and it is always the same node that crashes.

My arangod.conf is:

[server]
authentication = false
endpoint = tcp://[::]:8530
storage-engine = rocksdb
threads = 32
[rocksdb]
write-buffer-size = 128108864 
max-write-buffer-number=4
min-write-buffer-number-to-merge=2
dynamic-level-bytes = true
level0-compaction-trigger = 8
level0-slowdown-trigger = 17
rocksdb.level0-stop-trigger = 24
max-bytes-for-level-base = 536870912
max-bytes-for-level-multiplier = 8
max-background-jobs = 8
num-threads-priority-high = 6
num-threads-priority-low = 6
block-cache-size=10474836480
[log]
use-local-time = true
level = INFO
[javascript]
v8-contexts = 16
v8-contexts-minimum = 8
[query]
registry-ttl = 100
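
To put the [rocksdb] values above in perspective, here is a rough sketch that converts two of them into GiB. It assumes the write buffer limits apply per column family, as in stock RocksDB; how many column families arangod creates is not stated in this issue, and V8 contexts and query memory are not covered at all, so this is only a partial lower-level estimate.

```python
# Values copied from the [rocksdb] section of the configuration above.
WRITE_BUFFER_SIZE = 128108864      # write-buffer-size, in bytes
MAX_WRITE_BUFFER_NUMBER = 4        # max-write-buffer-number
BLOCK_CACHE_SIZE = 10474836480     # block-cache-size, in bytes

GIB = 1024 ** 3


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / GIB


# Upper bound for the memtables of ONE column family (assumption: the
# limit is applied per column family, as in stock RocksDB).
memtable_bytes = WRITE_BUFFER_SIZE * MAX_WRITE_BUFFER_NUMBER

print(f"memtables per column family: {to_gib(memtable_bytes):.2f} GiB")
print(f"block cache:                 {to_gib(BLOCK_CACHE_SIZE):.2f} GiB")
```

Both figures are small compared with the 128 GB reported for each machine, which suggests looking beyond these two settings when hunting for the memory consumer.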

The log is below:

2018-11-07T18:23:45 [25743] INFO {cluster} using heartbeat interval value '1000 ms' from agency
2018-11-07T18:23:46 [25743] INFO using endpoint 'http+tcp://[::]:8530' for non-encrypted requests
2018-11-07T18:23:46 [25743] INFO bootstrapped DB server PRMR-8b3e3acb-f453-457a-964b-e69ae26a64f2
2018-11-07T18:23:46 [25743] INFO ArangoDB (version 3.3.19 [linux]) is ready for business. Have fun!
2018-11-08T17:14:05 [25743] WARNING {communication} out of memory while reading from client
2018-11-08T17:14:05 [25743] WARNING {communication} out of memory while reading from client
2018-11-08T17:14:05 [25743] WARNING {communication} out of memory while reading from client
2018-11-08T17:14:06 [25743] ERROR {threads} scheduler loop caught exception: out of memory (exception location: /var/lib/otherjenkins/workspace/RELEASE__BuildPackages/arangod/Scheduler/SocketTask.cpp:578)
2018-11-08T17:14:07 [25743] ERROR {threads} scheduler loop caught exception: out of memory (exception location: /var/lib/otherjenkins/workspace/RELEASE__BuildPackages/arangod/Scheduler/SocketTask.cpp:578)
2018-11-08T17:14:08 [25743] FATAL out of memory in V8 (Committing semi space failed.)
2018-11-08T17:14:19 [14960] INFO ArangoDB 3.3.19 [linux] 64bit, using jemalloc, build tags/v3.3.19-0-gfe9657c-dirty, VPack 0.1.30, RocksDB 5.6.0, ICU 58.1, V8 5.7.492.77, OpenSSL 1.0.1e-fips 11 Feb 2013
2018-11-08T17:14:19 [14960] INFO {authentication} Jwt secret not specified, generating...
2018-11-08T17:14:19 [14960] INFO detected operating system: Linux version 2.6.32-696.6.3.el6.x86_64 (mockbuild@c1bl.rdu2.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC) ) #1 SMP Wed Jul 12 14:17:22 UTC 2017
2018-11-08T17:14:19 [14960] WARNING {communication} /proc/sys/net/ipv4/tcp_tw_recycle is enabled (1)'. This can lead to all sorts of "random" network problems. It is advised to leave it disabled (should be kernel default)
2018-11-08T17:14:19 [14960] WARNING {communication} execute 'sudo bash -c "echo 0 > /proc/sys/net/ipv4/tcp_tw_recycle"'
2018-11-08T17:14:19 [14960] INFO using storage engine rocksdb
2018-11-08T17:14:19 [14960] INFO {cluster} Starting up with role PRIMARY
2018-11-08T17:14:19 [14960] INFO {syscall} file-descriptors (nofiles) hard limit is 655350, soft limit is 655350
2018-11-08T17:14:19 [14960] INFO {authentication} Authentication is turned off, authentication for unix sockets is turned on
2018-11-08T17:14:19 [14960] WARNING {memory} /proc/sys/vm/overcommit_ratio is set to '87'. It is recommended to set it to at least '94' (100 * (max(0, (RAM - Swap Space)) / RAM)) to utilize all available RAM. Setting it to this value will minimize swap usage, but may result in more out-of-memory errors, while setting it to 100 will allow the system to use both all available RAM and swap space.
2018-11-08T17:14:19 [14960] WARNING {memory} execute 'sudo bash -c "echo 94 > /proc/sys/vm/overcommit_ratio"'
2018-11-08T17:15:04 [14960] INFO {cluster} Cluster feature is turned on. Agency version: {"server":"arango","version":"3.3.19","license":"community"}, Agency endpoints: http+tcp://10.1.96.117:8531, http+tcp://10.1.96.116:8531, http+tcp://10.1.96.119:8531, server id: 'PRMR-8b3e3acb-f453-457a-964b-e69ae26a64f2', internal address: tcp://10.1.96.116:8530, role: PRIMARY
2018-11-08T17:15:04 [14960] INFO {cluster} using heartbeat interval value '1000 ms' from agency
2018-11-08T17:15:04 [14960] INFO using endpoint 'http+tcp://[::]:8530' for non-encrypted requests
2018-11-08T17:15:04 [14960] INFO bootstrapped DB server PRMR-8b3e3acb-f453-457a-964b-e69ae26a64f2
2018-11-08T17:15:04 [14960] INFO ArangoDB (version 3.3.19 [linux]) is ready for business. Have fun!
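
As a small illustration of how to read the timeline in the log above, the following sketch computes how long process 25743 ran before the fatal error, and how quickly the first out-of-memory warning escalated. The timestamps are copied from the log; the variable names are only illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S"

# Timestamps copied from the log above (process 25743).
ready = datetime.strptime("2018-11-07T18:23:46", FMT)          # "ready for business"
first_warning = datetime.strptime("2018-11-08T17:14:05", FMT)  # first OOM warning
fatal = datetime.strptime("2018-11-08T17:14:08", FMT)          # "FATAL out of memory in V8"

uptime = fatal - ready
warning_to_fatal = fatal - first_warning

print("uptime before crash:", uptime)
print("first OOM warning to FATAL:", warning_to_fatal)
```

In this run the server lasted a little under 23 hours, and only three seconds passed between the first warning and the fatal error.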

The arangod.log.supervisor contains:

2018-11-08T17:14:19 [14242] INFO {startup} waitpid woke up with return value 25743 and status 256 and DONE = false
2018-11-08T17:14:19 [14242] ERROR {startup} child process 25743 terminated unexpectedly, exit status 1. will now start a new child process
2018-11-08T17:14:19 [14242] INFO {startup} supervisor has forked a child process with pid 14960
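
The supervisor reports a raw waitpid status of 256 and an exit status of 1. A minimal sketch of how the two relate, assuming the conventional Linux status encoding (low 7 bits hold the terminating signal, bits 8 to 15 hold the exit code) and ignoring the stopped/continued cases; the helper name is hypothetical:

```python
def decode_wait_status(status: int) -> dict:
    """Split a raw Linux waitpid() status into its parts.

    Assumes the conventional Linux encoding: the low 7 bits hold the
    terminating signal (0 if none) and bits 8-15 hold the exit code.
    Stopped and continued states are deliberately not handled.
    """
    signal_number = status & 0x7F
    exit_code = (status >> 8) & 0xFF
    return {
        "exited_normally": signal_number == 0,
        "exit_code": exit_code if signal_number == 0 else None,
        "signal": signal_number or None,
    }


# Status value taken from the supervisor log above.
print(decode_wait_status(256))
```

Under that assumption, status 256 means the child exited on its own with code 1 rather than being killed by a signal, which would be consistent with arangod terminating itself after logging the FATAL message.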


betwjp · 2018-11-12T09:05:24Z

Can you give some suggestions? When I restart the process, it exits unexpectedly again after two days.

dothebart · 2018-11-12T09:49:11Z

Quoting some of your log output:

2018-11-08T17:14:19 [14960] WARNING {memory} /proc/sys/vm/overcommit_ratio is set to '87'. It is recommended to set it to at least '94' (100 * (max(0, (RAM - Swap Space)) / RAM)) to utilize all available RAM. Setting it to this value will minimize swap usage, but may result in more out-of-memory errors, while setting it to 100 will allow the system to use both all available RAM and swap space.
2018-11-08T17:14:19 [14960] WARNING {memory} execute 'sudo bash -c "echo 94 > /proc/sys/vm/overcommit_ratio"'

In addition, please follow the recommendations jstemann gave in this issue: #5579 (comment)
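
The formula in the overcommit warning quoted above can be sketched as follows. The function name is hypothetical, and the swap size used in the example is an assumption: the issue reports 128 GB of RAM but does not state how much swap is configured.

```python
def recommended_overcommit_ratio(ram_bytes: int, swap_bytes: int) -> float:
    """Formula from the log message: 100 * (max(0, RAM - swap) / RAM)."""
    if ram_bytes <= 0:
        raise ValueError("ram_bytes must be positive")
    return 100 * (max(0, ram_bytes - swap_bytes) / ram_bytes)


GIB = 1024 ** 3

# Hypothetical sizes: 128 GiB RAM as reported, 8 GiB swap as an ASSUMPTION
# (the real swap size is not given in this issue).
ratio = recommended_overcommit_ratio(128 * GIB, 8 * GIB)
print(ratio)
```

With these assumed sizes the formula gives a value just under 94, in line with the recommendation in the warning. The `max(0, ...)` term keeps the result at 0 when swap is larger than RAM.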

OmarAyo · 2018-11-15T10:46:47Z

Hi @betwjp

Just following up to check how things are going. Did you have a chance to read jstemann's recommendations?

Please let us know.

Best,

OmarAyo · 2018-12-03T09:34:01Z

Hi @betwjp

We have not received any feedback in a while. If needed, please feel free to comment and we will reopen this issue.

OmarAyo added 3 Cluster 3 OOM System runs out of memory / resources 3 RocksDB Storage engine related 1 Analyzing labels Nov 8, 2018

OmarAyo added Waiting User Reply 1 Question and removed 1 Analyzing labels Nov 13, 2018

OmarAyo closed this as completed Dec 3, 2018

OmarAyo added 2 User Abandoned Resolution and removed Waiting User Reply labels Dec 3, 2018
