the dbserver crash 'FATAL out of memory in V8' #7264

Closed
betwjp opened this issue Nov 8, 2018 · 4 comments
Labels
1 Question, 2 User Abandoned Resolution, 3 Cluster, 3 OOM System runs out of memory / resources, 3 RocksDB Storage engine related

Comments

betwjp commented Nov 8, 2018

My Environment

  • ArangoDB Version: 3.3.19
  • Storage Engine: RocksDB
  • Deployment Mode: Cluster
  • Deployment Strategy: ArangoDB Starter
  • Configuration: 6 servers, 32 cores, 128 GB memory
  • Infrastructure: own
  • Operating System: CentOS 6.9
  • Used Package: CentOS rpm

Component, Query & Data

We start the DB servers in supervisor mode and have noticed that one DB server node crashes frequently: it runs for one or two days and then crashes. All DB servers have the same configuration, and it is always the same node that crashes.

My arangdb.conf is:

[server]
authentication = false
endpoint = tcp://[::]:8530
storage-engine = rocksdb
threads = 32
[rocksdb]
write-buffer-size = 128108864
max-write-buffer-number = 4
min-write-buffer-number-to-merge = 2
dynamic-level-bytes = true
level0-compaction-trigger = 8
level0-slowdown-trigger = 17
rocksdb.level0-stop-trigger = 24
max-bytes-for-level-base = 536870912
max-bytes-for-level-multiplier = 8
max-background-jobs = 8
num-threads-priority-high = 6
num-threads-priority-low = 6
block-cache-size = 10474836480
[log]
use-local-time = true
level = INFO
[javascript]
v8-contexts = 16
v8-contexts-minimum = 8
[query]
registry-ttl = 100
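
As a rough sanity check on the RocksDB settings above, the following sketch adds up the configured write buffers and the block cache. It is only an approximation under stated assumptions: it counts one set of write buffers (RocksDB can keep write buffers per column family, so the real figure may be higher), and it does not cover V8 contexts, query memory, or other allocations. The function name `rocksdb_memory_estimate` is hypothetical and not part of ArangoDB.

```python
import configparser

# Subset of the [rocksdb] section quoted above; values copied verbatim.
CONF_TEXT = """
[rocksdb]
write-buffer-size = 128108864
max-write-buffer-number = 4
block-cache-size = 10474836480
"""


def rocksdb_memory_estimate(conf_text):
    """Return a rough byte estimate for write buffers plus block cache.

    Assumption: a single set of write buffers is counted. This is a
    lower bound, not a measurement of what arangod actually allocates.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    rocksdb = parser["rocksdb"]
    write_buffers = (int(rocksdb["write-buffer-size"])
                     * int(rocksdb["max-write-buffer-number"]))
    block_cache = int(rocksdb["block-cache-size"])
    return {
        "write_buffers": write_buffers,
        "block_cache": block_cache,
        "total": write_buffers + block_cache,
    }


if __name__ == "__main__":
    for name, size in rocksdb_memory_estimate(CONF_TEXT).items():
        print(f"{name}: {size / 2**30:.2f} GiB")
```

With these values the configured RocksDB buffers come to roughly 10 GiB of the 128 GB per machine, so this part of the configuration alone would not obviously explain the crash.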

The log is below:

2018-11-07T18:23:45 [25743] INFO {cluster} using heartbeat interval value '1000 ms' from agency
2018-11-07T18:23:46 [25743] INFO using endpoint 'http+tcp://[::]:8530' for non-encrypted requests
2018-11-07T18:23:46 [25743] INFO bootstrapped DB server PRMR-8b3e3acb-f453-457a-964b-e69ae26a64f2
2018-11-07T18:23:46 [25743] INFO ArangoDB (version 3.3.19 [linux]) is ready for business. Have fun!
2018-11-08T17:14:05 [25743] WARNING {communication} out of memory while reading from client
2018-11-08T17:14:05 [25743] WARNING {communication} out of memory while reading from client
2018-11-08T17:14:05 [25743] WARNING {communication} out of memory while reading from client
2018-11-08T17:14:06 [25743] ERROR {threads} scheduler loop caught exception: out of memory (exception location: /var/lib/otherjenkins/workspace/RELEASE__BuildPackages/arangod/Scheduler/SocketTask.cpp:578)
2018-11-08T17:14:07 [25743] ERROR {threads} scheduler loop caught exception: out of memory (exception location: /var/lib/otherjenkins/workspace/RELEASE__BuildPackages/arangod/Scheduler/SocketTask.cpp:578)
2018-11-08T17:14:08 [25743] FATAL out of memory in V8 (Committing semi space failed.)
2018-11-08T17:14:19 [14960] INFO ArangoDB 3.3.19 [linux] 64bit, using jemalloc, build tags/v3.3.19-0-gfe9657c-dirty, VPack 0.1.30, RocksDB 5.6.0, ICU 58.1, V8 5.7.492.77, OpenSSL 1.0.1e-fips 11 Feb 2013
2018-11-08T17:14:19 [14960] INFO {authentication} Jwt secret not specified, generating...
2018-11-08T17:14:19 [14960] INFO detected operating system: Linux version 2.6.32-696.6.3.el6.x86_64 (mockbuild@c1bl.rdu2.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC) ) #1 SMP Wed Jul 12 14:17:22 UTC 2017
2018-11-08T17:14:19 [14960] WARNING {communication} /proc/sys/net/ipv4/tcp_tw_recycle is enabled (1)'. This can lead to all sorts of "random" network problems. It is advised to leave it disabled (should be kernel default)
2018-11-08T17:14:19 [14960] WARNING {communication} execute 'sudo bash -c "echo 0 > /proc/sys/net/ipv4/tcp_tw_recycle"'
2018-11-08T17:14:19 [14960] INFO using storage engine rocksdb
2018-11-08T17:14:19 [14960] INFO {cluster} Starting up with role PRIMARY
2018-11-08T17:14:19 [14960] INFO {syscall} file-descriptors (nofiles) hard limit is 655350, soft limit is 655350
2018-11-08T17:14:19 [14960] INFO {authentication} Authentication is turned off, authentication for unix sockets is turned on
2018-11-08T17:14:19 [14960] WARNING {memory} /proc/sys/vm/overcommit_ratio is set to '87'. It is recommended to set it to at least '94' (100 * (max(0, (RAM - Swap Space)) / RAM)) to utilize all available RAM. Setting it to this value will minimize swap usage, but may result in more out-of-memory errors, while setting it to 100 will allow the system to use both all available RAM and swap space.
2018-11-08T17:14:19 [14960] WARNING {memory} execute 'sudo bash -c "echo 94 > /proc/sys/vm/overcommit_ratio"'
2018-11-08T17:15:04 [14960] INFO {cluster} Cluster feature is turned on. Agency version: {"server":"arango","version":"3.3.19","license":"community"}, Agency endpoints: http+tcp://10.1.96.117:8531, http+tcp://10.1.96.116:8531, http+tcp://10.1.96.119:8531, server id: 'PRMR-8b3e3acb-f453-457a-964b-e69ae26a64f2', internal address: tcp://10.1.96.116:8530, role: PRIMARY
2018-11-08T17:15:04 [14960] INFO {cluster} using heartbeat interval value '1000 ms' from agency
2018-11-08T17:15:04 [14960] INFO using endpoint 'http+tcp://[::]:8530' for non-encrypted requests
2018-11-08T17:15:04 [14960] INFO bootstrapped DB server PRMR-8b3e3acb-f453-457a-964b-e69ae26a64f2
2018-11-08T17:15:04 [14960] INFO ArangoDB (version 3.3.19 [linux]) is ready for business. Have fun!

The arangod.log.supervisor is:

2018-11-08T17:14:19 [14242] INFO {startup} waitpid woke up with return value 25743 and status 256 and DONE = false
2018-11-08T17:14:19 [14242] ERROR {startup} child process 25743 terminated unexpectedly, exit status 1. will now start a new child process
2018-11-08T17:14:19 [14242] INFO {startup} supervisor has forked a child process with pid 14960
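
The supervisor log reports both a raw waitpid status of 256 and an exit status of 1. A minimal sketch of how the two relate, assuming the conventional Linux wait-status layout (the helper name `decode_wait_status` is hypothetical):

```python
def decode_wait_status(status):
    """Decode a raw waitpid() status using the conventional Linux layout.

    Returns a (kind, value) tuple. This mirrors what the C macros
    WIFEXITED/WEXITSTATUS/WIFSIGNALED/WTERMSIG compute on Linux.
    """
    signal_number = status & 0x7F
    if signal_number == 0:
        # Normal exit: the exit code is stored in bits 8..15.
        return ("exited", (status >> 8) & 0xFF)
    if signal_number == 0x7F:
        # 0x7F in the low bits marks a stopped (not terminated) child;
        # the stop signal is stored in bits 8..15.
        return ("stopped", (status >> 8) & 0xFF)
    # Otherwise the low 7 bits hold the terminating signal.
    return ("signaled", signal_number)


if __name__ == "__main__":
    print(decode_wait_status(256))
```

Under that assumption, status 256 decodes to a normal exit with code 1, which matches the "exit status 1" line. It suggests the process terminated itself after the FATAL message rather than being killed by a signal; a kill by the kernel OOM killer would normally show up as SIGKILL (signal 9) instead.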

OmarAyo added the labels 3 Cluster, 3 OOM System runs out of memory / resources, 3 RocksDB Storage engine related, 1 Analyzing on Nov 8, 2018

betwjp (Author) commented Nov 12, 2018

Can you give some suggestions? When I restart the process, it exits unexpectedly again after two days.

dothebart (Contributor) commented Nov 12, 2018

Quoting some of your log output:

2018-11-08T17:14:19 [14960] WARNING {memory} /proc/sys/vm/overcommit_ratio is set to '87'. It is recommended to set it to at least '94' (100 * (max(0, (RAM - Swap Space)) / RAM)) to utilize all available RAM. Setting it to this value will minimize swap usage, but may result in more out-of-memory errors, while setting it to 100 will allow the system to use both all available RAM and swap space.
2018-11-08T17:14:19 [14960] WARNING {memory} execute 'sudo bash -c "echo 94 > /proc/sys/vm/overcommit_ratio"'

In addition, please follow the recommendations jstemann gave in this issue: #5579 (comment)
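
The formula printed in the quoted warning, 100 * (max(0, (RAM - Swap Space)) / RAM), can be sketched as a small function. The function name and the example figures are hypothetical; the issue does not state how much swap the machine has.

```python
def recommended_overcommit_ratio(ram_bytes, swap_bytes):
    """Evaluate the formula from the arangod warning.

    Both arguments must use the same unit. Returns a percentage as a
    float; how arangod rounds the value is not stated in the log.
    """
    if ram_bytes <= 0:
        raise ValueError("ram_bytes must be positive")
    return 100 * max(0, ram_bytes - swap_bytes) / ram_bytes


if __name__ == "__main__":
    # Hypothetical figures: 128 GiB RAM and 8 GiB swap.
    print(recommended_overcommit_ratio(128 * 2**30, 8 * 2**30))
```

The warning itself already prints the value to use on this machine (94), so the sketch is mainly useful for checking other nodes with different RAM or swap sizes.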

OmarAyo (Contributor) commented Nov 15, 2018

Hi @betwjp

Just following up to check how things are going. Did you have a chance to read jstemann's recommendations?

Please provide feedback.

Best,

OmarAyo (Contributor) commented Dec 3, 2018

Hi @betwjp

We have not received any feedback in a while. If needed, please feel free to comment and we will reopen this issue.
