Closed
Description
I've got the following problem when running Riak 1.3 with LevelDB on EC2 (c1.xlarge):
$ sudo cat /var/log/riak/error.log
2013-03-28 19:18:35.943 [error] <0.88.0> gen_server disksup terminated with reason: {port_died,normal}
$ sudo cat /var/log/riak/crash.log
2013-03-28 19:18:36 =ERROR REPORT====
** Generic server disksup terminating
** Last message in was {'EXIT',#Port<0.1415>,normal}
** When Server state == [{data,[{"OS",{unix,linux}},{"Timeout",1800000},{"Threshold",80},{"DiskData",[{"/",206424760,29},{"/dev/shm",3559560,0}]}]}]
** Reason for termination ==
** {port_died,normal}
2013-03-28 19:18:37 =CRASH REPORT====
crasher:
initial call: disksup:init/1
pid: <0.88.0>
registered_name: disksup
exception exit: {{port_died,normal},[{gen_server,terminate,6,[{file,"gen_server.erl"},{line,747}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
ancestors: [os_mon_sup,<0.86.0>]
messages: []
links: [<0.87.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 1597
stack_size: 24
reductions: 98287
neighbours:
This happens suddenly, even when the cluster is not under load. The cluster consists of 4 machines with 64 partitions and n_val=2. This is a very severe problem because the whole cluster often crashes. Each server stores 60GB of data on average.
Here are the logs from another machine in the same cluster that crashed as well:
- error.log: http://pastebin.com/ARNqLE3z
- crash.log: http://pastebin.com/WPUF2wjW
LevelDB is configured like this on every machine:
%% eLevelDB Config
{eleveldb, [
    {data_root, "/var/lib/riak/leveldb"},
    {max_open_files, 50}, %% Maximum number of files open at once per partition
    {cache_size, 117440512} %% 112MB
]}
The machines are EC2 High-CPU Extra Large instances (c1.xlarge), so they have:
- 7 GiB of memory
- 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
I'm using the Erlang VM bundled with the RHEL 6.0 x86_64 distribution.
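For reference, here is a rough sketch of the LevelDB block-cache budget per node implied by the numbers above. It assumes that cache_size applies per partition (as the max_open_files comment says of that setting) and that the 64 partitions are spread evenly across the 4 machines. It covers only the block cache, not open-file memory, write buffers, or the Erlang VM itself, and it is not meant as an explanation of the disksup crash.

```python
# Rough per-node LevelDB block-cache budget from the figures in this report.
# Assumptions (not confirmed by the logs): cache_size applies per partition,
# and partitions are distributed evenly across the nodes.

RING_SIZE = 64          # partitions in the cluster
NODES = 4               # machines in the cluster
CACHE_SIZE = 117440512  # bytes, from the eleveldb config (112MB)
NODE_RAM_GIB = 7        # memory of a c1.xlarge instance

MIB = 1024 ** 2


def cache_budget_mib(ring_size, nodes, cache_size_bytes):
    """Total block cache per node in MiB, assuming an even partition spread."""
    partitions_per_node = ring_size // nodes
    return partitions_per_node * cache_size_bytes // MIB


partitions_per_node = RING_SIZE // NODES
budget_mib = cache_budget_mib(RING_SIZE, NODES, CACHE_SIZE)
share = budget_mib / (NODE_RAM_GIB * 1024)

print(f"{partitions_per_node} partitions per node, "
      f"{budget_mib} MiB of block cache, {share:.0%} of RAM")
```

Under these assumptions each machine runs 16 partitions and reserves 1792 MiB for block cache, about a quarter of the 7 GiB available.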