We consider 0.3 most appropriate for someone who wants to evaluate
Cassandra without dealing with the highly variable degree of stability
that a nightly build offers. Here are the known issues you should
be most concerned about:

 1. With enough keys, or large enough keys, in a ColumnFamily, Cassandra
    will run out of memory trying to perform compactions (data file
    merges). The amount of memory required is (S + 16 + M) * N bytes,
    where S is the size of the key in bytes (usually 2 bytes per
    character), N is the number of keys, and M is the per-key map
    overhead (which can be guesstimated at around 32 bytes per key).

    So, if you have 10-character keys and 1GB of headroom in your heap
    space for compaction, you can expect to store about 16M keys
    before running into problems (the arithmetic is worked through in
    the sketch after this list).

    See https://issues.apache.org/jira/browse/CASSANDRA-208

 2. Because fixing #1 requires a data file format change, 0.4 will not
    be binary-compatible with 0.3 data files. A client-side upgrade
    can be done relatively easily with the following algorithm:

        # use get_slice_super / batch_insert_super for supercolumn
        # ColumnFamilies
        for key in old_client.get_key_range(everything):
            columns = old_client.get_slice(key, all_columns)
            new_client.batch_insert(key, columns)

    The loop body can be trivially parallelized for speed; see the
    parallelized sketch after this list.

 3. Commitlog does not fsync before reporting a write successful.
    Using blocking writes mitigates this to some degree, since all
    nodes that were part of the write quorum would have to fail
    before syncing for data to be lost.

    See https://issues.apache.org/jira/browse/CASSANDRA-182
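
To put the numbers in item 1 in concrete terms, here is the arithmetic
written out in Python. This is only a back-of-the-envelope sketch: the
2-bytes-per-character and 32-byte map overhead figures are the guesses
quoted in that item, not measured values.

    S = 10 * 2               # 10-character keys at ~2 bytes per character
    M = 32                   # guesstimated per-key map overhead
    per_key = S + 16 + M     # bytes held in memory per key, (S + 16 + M)
    headroom = 1024 ** 3     # 1GB of heap headroom for compaction
    print(headroom // per_key)   # 15790320, i.e. about 16M keys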

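Here too is one way the upgrade loop in item 2 might be parallelized in
Python. The dict-backed stores below are stand-ins for Thrift clients
connected to the 0.3 and 0.4 clusters (the exact get_key_range /
get_slice / batch_insert signatures vary by release, so consult the
cassandra.thrift interface that ships with each); with real clients,
each worker thread would also need its own connection.

    from concurrent.futures import ThreadPoolExecutor

    # Stand-ins for the old and new clusters; use the _super variants
    # of get_slice and batch_insert for supercolumn ColumnFamilies.
    old_store = {"key%d" % i: {"col": "val%d" % i} for i in range(1000)}
    new_store = {}

    def copy_row(key):
        columns = old_store[key]      # old_client.get_slice(key, ...)
        new_store[key] = columns      # new_client.batch_insert(key, columns)

    # The per-key copy is embarrassingly parallel.
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(copy_row, list(old_store)))  # keys via get_key_range

    assert new_store == old_store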

Additionally, row size (that is, all the data associated with a single
key in a given ColumnFamily) is limited by available memory, for
two reasons:

 1. get_slice offsets are not indexed. Every time you do a get_slice,
    Cassandra has to deserialize the entire ColumnFamily row into
    memory. (This is already fixed in trunk; a toy illustration
    follows this list.)

    See https://issues.apache.org/jira/browse/CASSANDRA-172

 2. Compaction deserializes each row before merging.

    See https://issues.apache.org/jira/browse/CASSANDRA-16
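
As a toy illustration of the first point (deliberately simplified, and
not Cassandra's actual code), here is why un-indexed offsets make
get_slice cost proportional to row size: the whole serialized row must
be deserialized before any slice of it can be returned.

    import pickle

    # A serialized row of 10,000 small columns, standing in for the
    # on-disk ColumnFamily row.
    row_on_disk = pickle.dumps(
        {"col%05d" % i: "x" * 100 for i in range(10000)})

    def get_slice(serialized_row, offset, count):
        columns = pickle.loads(serialized_row)  # entire row into memory,
        items = sorted(columns.items())         # however small the slice
        return items[offset:offset + count]

    print(get_slice(row_on_disk, 5000, 10))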
    