GitHub - edanuff/CassandraCompositeType: CompositeType for composite column names in Cassandra

CompositeType for Cassandra

For a more detailed description of how this might be used, please read:

http://www.slideshare.net/edanuff/indexing-in-cassandra

http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html

For an example implementation of using it for indexing, see:

https://github.com/edanuff/CassandraIndexedCollections

-------------------------------------------------------------------------------
Note:

Be aware that most of the capabilities provided by the original CompositeType
comparer have been incorporated in the patch described at
https://issues.apache.org/jira/browse/CASSANDRA-2231 and are now part of
Cassandra 0.8.1.

A version of the new CompositeType and DynamicCompositeType that has been
backported to 0.7.x is available at:

https://github.com/riptano/hector-composite

The index-building techniques described here are still applicable, though,
and should be used in place of SuperColumn-based approaches. Once support
for composite-based indexes is incorporated into the major client libraries,
they will often be preferable to the built-in secondary indexes as well.
-------------------------------------------------------------------------------

Although Cassandra provides SuperColumns which allow you to have columns
containing columns, it's often desirable to use regular columns and to
be able to combine two or more ids into a sortable column name.  This
is especially important for building inverted indexes for searches or
being able to do things like creating a table to map user names sorted
by last name then first name to user ids.

This can be accomplished by concatenating two ids together with a separator,
such as "last.first.randomuuid" or "item.version". Assuming that each id is
properly padded and compares correctly via bytewise comparison, this approach
might work.  But if the ids are numerical or time-based UUIDs, for example,
bytewise order usually does not match their natural order and the results
are less than satisfactory.
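
As a minimal illustration of the problem, the sketch below sorts a few
concatenated "item.version" names as plain strings, which is how a bytewise
comparer would order ASCII text. The class and method names are made up for
this example and are not part of this library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ConcatenatedNameOrdering {

    // Sorts names character by character, the way a bytewise comparer
    // would for ASCII text. The numeric suffix is treated as plain text.
    static List<String> sortAsStrings(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("item.2", "item.10", "item.9");
        // Prints [item.10, item.2, item.9]: version 10 sorts before version 2.
        System.out.println(sortAsStrings(names));
    }
}
```

Zero-padding the version ("item.02", "item.10") would fix this particular
case, but only if the maximum width is known in advance.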

CompositeType is a comparer that allows you to combine the existing
Cassandra types into a composite type that will then be compared correctly
for each of the component types.
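
The idea of comparing each component according to its own type can be
sketched in plain Java. This is not the CompositeType implementation; the
ItemVersion class and the comparator below are hypothetical stand-ins that
only show component-wise ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ComponentWiseOrdering {

    // Hypothetical stand-in for a two-part composite name:
    // a string component followed by a numeric component.
    static final class ItemVersion {
        final String item;
        final long version;

        ItemVersion(String item, long version) {
            this.item = item;
            this.version = version;
        }

        @Override
        public String toString() {
            return item + "." + version;
        }
    }

    // Compare the string component as text, then the version as a number.
    static final Comparator<ItemVersion> BY_COMPONENT =
            Comparator.comparing((ItemVersion n) -> n.item)
                      .thenComparingLong(n -> n.version);

    // Returns the names in component-wise order, rendered as labels.
    static List<String> sortedLabels(List<ItemVersion> names) {
        List<ItemVersion> sorted = new ArrayList<>(names);
        sorted.sort(BY_COMPONENT);
        List<String> labels = new ArrayList<>();
        for (ItemVersion n : sorted) {
            labels.add(n.toString());
        }
        return labels;
    }

    static List<String> demo() {
        return sortedLabels(Arrays.asList(
                new ItemVersion("item", 2),
                new ItemVersion("item", 10),
                new ItemVersion("item", 9)));
    }

    public static void main(String[] args) {
        // Prints [item.2, item.9, item.10]: versions are in numeric order.
        System.out.println(demo());
    }
}
```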

To use this, you must specify the comparer in your cassandra.yaml file:

  column_families:
    - name: Stuff
      compare_with: compositecomparer.CompositeType

To construct a composite name for a new column, use the following:

Composite c = new Composite();
c.addUTF8("smith").addUTF8("bob").addLong(System.currentTimeMillis());
ByteBuffer column_name = c.serializeToByteBuffer();

A convenience method is provided as well, although it makes certain assumptions
that you might want to verify are applicable.  You use it like this:

import static compositecomparer.Composite.serialize;

byte[] cname = serialize("smith", "bob", Long.valueOf(System.currentTimeMillis()));

If you wanted to find all users with the last name "smith" whose first name
starts with "b", you could use the following as the start and end of a
column slice:

byte[] slice_start = serialize("smith", "b");
byte[] slice_end = serialize("smith", "b\uFFFF");
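
To see why this pair of bounds selects the right columns, here is a small
in-memory sketch that does not use Cassandra or this library. Column names
are modeled as lists of strings in a TreeSet, and the comparator is a
simplified assumption about component-wise ordering, not the actual comparer.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class SliceRangeSketch {

    // Compares names component by component. If one name is a prefix of
    // the other, the shorter one sorts first (an assumption of this sketch).
    static final Comparator<List<String>> BY_COMPONENT = (a, b) -> {
        int shared = Math.min(a.size(), b.size());
        for (int i = 0; i < shared; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    };

    // Returns every column name between start and end, both inclusive.
    static NavigableSet<List<String>> slice(NavigableSet<List<String>> columns,
                                            List<String> start,
                                            List<String> end) {
        return columns.subSet(start, true, end, true);
    }

    static NavigableSet<List<String>> sampleColumns() {
        NavigableSet<List<String>> columns = new TreeSet<>(BY_COMPONENT);
        columns.add(Arrays.asList("jones", "bill"));
        columns.add(Arrays.asList("smith", "alice"));
        columns.add(Arrays.asList("smith", "bob"));
        columns.add(Arrays.asList("smith", "brenda"));
        columns.add(Arrays.asList("smith", "carl"));
        return columns;
    }

    public static void main(String[] args) {
        NavigableSet<List<String>> hits = slice(sampleColumns(),
                Arrays.asList("smith", "b"),
                Arrays.asList("smith", "b\uFFFF"));
        // Prints [[smith, bob], [smith, brenda]]
        System.out.println(hits);
    }
}
```

The end bound works because "\uFFFF" sorts after any character that is
likely to follow "b" in a real first name.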

This has also been updated to work with ByteBuffers for Cassandra 0.7:

import static compositecomparer.Composite.serializeToByteBuffer;

ByteBuffer cname = serializeToByteBuffer("document", version);

The composite type is encoded as a byte array consisting of a four-byte prefix
containing an identifier and a version number, followed by each component part.
Each component part starts with 1 byte that specifies the component type. For
variable-length types such as ASCII strings, the next 2 bytes give the length
of the string.
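
The layout described above can be sketched as follows for ASCII components
only. This is an illustration of the layout, not the library's encoder: the
prefix bytes, the type code, and the big-endian length are all placeholder
assumptions, since the actual values are not given here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CompositeLayoutSketch {

    // Placeholder values invented for this sketch. The real identifier,
    // version number, and type codes are defined by the library.
    static final byte[] HYPOTHETICAL_PREFIX = {'C', 'T', 0, 1};
    static final byte HYPOTHETICAL_ASCII_TYPE = 1;

    // Lays out: 4-byte prefix, then for each component
    // 1 type byte + 2 length bytes + the ASCII bytes of the string.
    static byte[] encodeAsciiComponents(String... parts) {
        byte[][] encoded = new byte[parts.length][];
        int size = HYPOTHETICAL_PREFIX.length;
        for (int i = 0; i < parts.length; i++) {
            encoded[i] = parts[i].getBytes(StandardCharsets.US_ASCII);
            size += 1 + 2 + encoded[i].length;
        }

        // ByteBuffer writes multi-byte values big-endian by default
        // (byte order is an assumption of this sketch).
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.put(HYPOTHETICAL_PREFIX);
        for (byte[] e : encoded) {
            buf.put(HYPOTHETICAL_ASCII_TYPE);
            buf.putShort((short) e.length);
            buf.put(e);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] name = encodeAsciiComponents("smith", "bob");
        // 4 (prefix) + 1 + 2 + 5 ("smith") + 1 + 2 + 3 ("bob") = 18 bytes
        System.out.println(name.length);
    }
}
```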