Riak 1.3 suddenly crashes, even without a big load  #301

@subnetmarco

I've got the following problem when running Riak 1.3 with LevelDB on EC2 (c1.xlarge):

$ sudo cat /var/log/riak/error.log 
2013-03-28 19:18:35.943 [error] <0.88.0> gen_server disksup terminated with reason: {port_died,normal}

$ sudo cat /var/log/riak/crash.log
2013-03-28 19:18:36 =ERROR REPORT====
** Generic server disksup terminating 
** Last message in was {'EXIT',#Port<0.1415>,normal}
** When Server state == [{data,[{"OS",{unix,linux}},{"Timeout",1800000},{"Threshold",80},{"DiskData",[{"/",206424760,29},{"/dev/shm",3559560,0}]}]}]
** Reason for termination == 
** {port_died,normal}
2013-03-28 19:18:37 =CRASH REPORT====
  crasher:
    initial call: disksup:init/1
    pid: <0.88.0>
    registered_name: disksup
    exception exit: {{port_died,normal},[{gen_server,terminate,6,[{file,"gen_server.erl"},{line,747}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
    ancestors: [os_mon_sup,<0.86.0>]
    messages: []
    links: [<0.87.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 1597
    stack_size: 24
    reductions: 98287
  neighbours:

This happens suddenly, even when the cluster is not under load. The cluster consists of 4 machines with 64 partitions and n_val=2. This is a very severe problem because the whole cluster often crashes. Each server stores about 60GB of data on average.

These are logs from another machine in the same cluster that also crashed:

LevelDB is configured like this on every machine:

%% eLevelDB Config
{eleveldb, [
    {data_root, "/var/lib/riak/leveldb"},
    {max_open_files, 50}, %% Maximum number of files open at once per partition
    {cache_size, 117440512} %% 112MB
]}

The machines are EC2 High-CPU Extra Large instances (c1.xlarge), so each has:

  • 7 GiB of memory
  • 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
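
For context, here is a rough back-of-the-envelope estimate of how much memory the LevelDB block cache alone could claim on each node with the settings above. It is only a sketch under two assumptions that are not confirmed in this report: that the 64 partitions are spread evenly across the 4 machines, and that `cache_size` applies per partition (vnode) rather than per node. The function name `cache_estimate` is made up for illustration, and the estimate ignores memory used by open files, the Erlang VM, and the OS.

```python
# Figures taken from the report above.
RING_PARTITIONS = 64             # partitions in the ring
NODES = 4                        # machines in the cluster
CACHE_SIZE_BYTES = 117440512     # eleveldb cache_size from the config
NODE_RAM_BYTES = 7 * 1024**3     # c1.xlarge: 7 GiB of memory

MIB = 1024**2


def cache_estimate(partitions, nodes, cache_bytes):
    """Return (vnodes per node, total block cache bytes per node).

    Assumption: partitions are distributed evenly and cache_size is
    allocated once per partition. Neither is verified in this report.
    """
    vnodes_per_node = partitions // nodes
    return vnodes_per_node, vnodes_per_node * cache_bytes


vnodes, total_cache = cache_estimate(RING_PARTITIONS, NODES, CACHE_SIZE_BYTES)
share_of_ram = total_cache / NODE_RAM_BYTES

print(f"vnodes per node:      {vnodes}")
print(f"block cache per node: {total_cache / MIB:.0f} MiB")
print(f"share of node memory: {share_of_ram:.0%}")
```

If those assumptions hold, the block cache is bounded well below the 7 GiB available, so this figure by itself would not account for the crashes.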

I'm using the Erlang VM bundled with the RHEL 6.0 x86_64 distribution.
