Reading Data

opuneet edited this page Jun 5, 2014 · 7 revisions

Query a single column

ColumnFamily<String, String> CF_STANDARD1 = new ColumnFamily<String, String>(
    "cdstandard1",            // Column family name
    StringSerializer.get(),   // Key serializer
    StringSerializer.get(),   // Column name serializer
    StringSerializer.get());  // Default value serializer
Column<String> result = keyspace.prepareQuery(CF_STANDARD1)
    .getKey(rowKey)
    .getColumn("Column1")
    .execute().getResult();
String value = result.getStringValue();

Query an entire row

ColumnList<String> result = keyspace.prepareQuery(CF_STANDARD1)
    .getKey(rowKey)
    .execute().getResult();
if (!result.isEmpty()) {
   ...
}

Paginate through all columns in a row

ColumnList<String> columns;
int pageSize = 10;
try {
	RowQuery<String, String> query = keyspace
		.prepareQuery(CF_STANDARD1)
		.getKey("A")
		.autoPaginate(true)
		.withColumnRange(new RangeBuilder().setLimit(pageSize).build());

	// Each execute() returns the next page; an empty page means the row is exhausted
	while (!(columns = query.execute().getResult()).isEmpty()) {
		for (Column<String> c : columns) {
			// Process each column here
		}
	}
} catch (ConnectionException e) {
	// Handle the failure (log, retry, or rethrow)
}
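
The paging loop above can be illustrated without a cluster. The following is a minimal sketch, assuming only that each call to execute() returns the next page and that an empty page signals the end of the row. PagedRowSketch is a hypothetical in-memory stand-in, not an Astyanax class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a paginating row query.
class PagedRowSketch {
    private final List<String> columns;  // Column names, already sorted
    private final int pageSize;
    private int next = 0;                // Index of the first column not yet returned

    PagedRowSketch(List<String> sortedColumns, int pageSize) {
        this.columns = new ArrayList<>(sortedColumns);
        this.pageSize = pageSize;
    }

    // Returns the next page, or an empty list once every column has been returned.
    List<String> execute() {
        int end = Math.min(next + pageSize, columns.size());
        List<String> page = new ArrayList<>(columns.subList(next, end));
        next = end;
        return page;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (char c = 'a'; c <= 'g'; c++) {
            names.add("Column_" + c);
        }
        PagedRowSketch query = new PagedRowSketch(names, 3);
        List<String> page;
        // Same loop shape as the query above: stop at the first empty page
        while (!(page = query.execute()).isEmpty()) {
            System.out.println(page);
        }
    }
}
```

With seven columns and a page size of 3, the loop prints pages of three, three, and one column, then stops on the empty fourth page.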

Iterate all rows in a column family

The Rows object returned by the query transparently paginates through all rows in the column family. Because the queries to the keyspace are actually issued during iteration, you must set an ExceptionCallback so that your application can handle any exceptions. Return true from the callback to retry, or false to exit the iteration loop.

Rows<String, String> rows;
try {
    rows = keyspace.prepareQuery(CF_STANDARD1)
        .getAllRows()
        .setBlockSize(10)
        .withColumnRange(new RangeBuilder().setMaxSize(10).build())
        .setExceptionCallback(new ExceptionCallback() {
             @Override
             public boolean onException(ConnectionException e) {
                 try {
                     Thread.sleep(1000);  // Back off briefly before retrying
                 } catch (InterruptedException e1) {
                     Thread.currentThread().interrupt();
                 }
                 return true;  // true retries, false exits the iteration loop
             }})
        .execute().getResult();
} catch (ConnectionException e) {
    // Without a Rows object there is nothing to iterate
    throw new RuntimeException(e);
}

// This loop will never throw; errors during iteration go to the ExceptionCallback
for (Row<String, String> row : rows) {
    LOG.info("ROW: " + row.getKey() + " " + row.getColumns().size());
}

If you only want the keys, add a column slice with size 0:

OperationResult<Rows<String, String>> result =
keyspace.prepareQuery(CF_STANDARD1)
  .getAllRows()
  .withColumnRange(new RangeBuilder().setLimit(0).build())  // RangeBuilder will be available in version 1.13
  .execute();

Query all with callback

This query breaks up the keys into token ranges and queries each range in a separate thread.

keyspace.prepareQuery(CF_STANDARD1)
    .getAllRows()
    .setRowLimit(100)  // Read in blocks of 100
    .setRepeatLastToken(false)
    .withColumnRange(new RangeBuilder().setLimit(2).build())
    .executeWithCallback(new RowCallback<String, String>() {
        @Override
        public void success(Rows<String, String> rows) {
            // Do something with the rows that were fetched.  Called once per block.
        }

        @Override
        public boolean failure(ConnectionException e) {
            return true;  // Returning true will continue, false will terminate the query
        }
    });
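
This page does not show how the token ranges are chosen. As a rough illustration of the idea only, the sketch below splits a token space into equal contiguous ranges, each of which could then be handed to its own thread. The token space size of 2^127 is an assumption modeled on Cassandra's RandomPartitioner, and TokenRangeSketch is a hypothetical name; this is not the splitting logic Astyanax itself uses.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Astyanax.
class TokenRangeSketch {

    // Splits [0, max) into `parts` contiguous ranges.
    // Each element is {startInclusive, endExclusive}.
    static List<BigInteger[]> split(BigInteger max, int parts) {
        List<BigInteger[]> ranges = new ArrayList<>();
        BigInteger n = BigInteger.valueOf(parts);
        for (int i = 0; i < parts; i++) {
            BigInteger start = max.multiply(BigInteger.valueOf(i)).divide(n);
            BigInteger end = max.multiply(BigInteger.valueOf(i + 1)).divide(n);
            ranges.add(new BigInteger[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Assumed token space size: 2^127
        BigInteger max = BigInteger.ONE.shiftLeft(127);
        for (BigInteger[] range : split(max, 4)) {
            System.out.println(range[0] + " .. " + range[1]);
        }
    }
}
```

Because every range ends exactly where the next one starts, no token is covered twice and none is skipped.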

Counting number of columns in a response

Cassandra provides an API to count the number of columns in a response without returning the query data. This is not a constant-time operation, because Cassandra still has to read the row and count the columns. This will be optimized in a future version.

int count = keyspace.prepareQuery(CF_STANDARD1)
    .getKey(rowKey)
    .getCount()
    .execute().getResult();

Column slice queries

Use a column slice to narrow down the range of columns returned in a query. A column slice can be added to any of the queries by calling withColumnSlice or withColumnRange on the query object prior to calling execute(). Column slices come in two flavors: column slice and column range. Use withColumnSlice to return a non-contiguous set of columns. Use withColumnRange to return an ordered range of columns.

This is the general format of a column slice.

ColumnList<String> result;
result = keyspace.prepareQuery(CF_STANDARD1)
   .getKey(rowKey)
   .withColumnRange(new RangeBuilder().setStart("firstColumn").setEnd("lastColumn").setMaxSize(100).build())
   .execute().getResult();
if (!result.isEmpty()) {
    ...
}

Query columns with prefix

Let's assume you have data that looks like this:

CF_STANDARD1:{
    "Prefixes":{
        "Prefix1_a":1,
        "Prefix1_b":2,
        "Prefix2_a":3,
    }
}

To get a slice of columns that start with "Prefix1", perform the following query:

OperationResult<ColumnList<String>> r = keyspace.prepareQuery(CF_STANDARD1)
    .getKey("Prefixes")
    .withColumnRange(new RangeBuilder()
        .setStart("Prefix1_\u0000")
        .setEnd("Prefix1_\uffff")
        .setLimit(Integer.MAX_VALUE).build())
    .execute();
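
To see why a start and end bound built from the prefix select exactly the prefixed columns, here is a minimal sketch that uses no Astyanax or Cassandra code. It models the row as a sorted TreeMap, which is an assumption standing in for a row sorted by UTF8Type; for the ASCII names used here, Java's string ordering gives the same order. PrefixRangeSketch and columnsWithPrefix are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for a row whose columns are sorted by name.
class PrefixRangeSketch {

    // Returns the names of all columns between the prefix followed by the
    // lowest char and the prefix followed by the highest char (both inclusive).
    static List<String> columnsWithPrefix(NavigableMap<String, Integer> row, String prefix) {
        String start = prefix + '\u0000';
        String end = prefix + '\uffff';
        return new ArrayList<>(row.subMap(start, true, end, true).keySet());
    }

    public static void main(String[] args) {
        NavigableMap<String, Integer> row = new TreeMap<>();
        row.put("Prefix1_a", 1);
        row.put("Prefix1_b", 2);
        row.put("Prefix2_a", 3);
        // Prints [Prefix1_a, Prefix1_b]
        System.out.println(columnsWithPrefix(row, "Prefix1_"));
    }
}
```

"Prefix2_a" is excluded because it already sorts after the end bound at the character following "Prefix".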

Query for first 5 columns

OperationResult<ColumnList<String>> r = keyspace.prepareQuery(CF_STANDARD1)
    .getKey(rowKey)
    .withColumnRange(new RangeBuilder().setMaxSize(5).build())
    .execute();

Query for last N columns

OperationResult<ColumnList<String>> r = keyspace.prepareQuery(CF_STANDARD1)
    .getKey(rowKey)
    .withColumnRange(new RangeBuilder().setReversed().setLimit(5).build())
    .execute();
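
A reversed range walks the row from its last column backwards, so the columns come back in descending order. The sketch below illustrates this with a plain sorted set; LastColumnsSketch is a hypothetical in-memory model, not the Astyanax API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical in-memory model of a reversed column range with a limit.
class LastColumnsSketch {

    // Returns up to `limit` column names, starting from the last one and moving backwards.
    static List<String> lastColumns(NavigableSet<String> names, int limit) {
        List<String> result = new ArrayList<>();
        Iterator<String> it = names.descendingIterator();
        while (it.hasNext() && result.size() < limit) {
            result.add(it.next());
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableSet<String> names = new TreeSet<>(Arrays.asList("a", "b", "c", "d", "e", "f", "g"));
        // Prints [g, f, e]
        System.out.println(lastColumns(names, 3));
    }
}
```

If the application needs the last N columns in ascending order, it has to reverse the returned list itself.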

Query for specific column names

Use this type of column slice when you have fixed column names.

OperationResult<ColumnList<String>> r = keyspace.prepareQuery(CF_STANDARD1)
    .getKey(rowKey)
    .withColumnSlice("First", "Last", "Age")
    .execute();

How to use an index query

To use secondary indexes, you must first configure your column family with column metadata that tells Cassandra which columns to index. Cassandra currently supports only the KEYS index type, which is essentially a hash lookup.

create column family UserInfo with
  comparator = UTF8Type and
  column_metadata =
  [
    {column_name: first, validation_class: UTF8Type},
    {column_name: last, validation_class: UTF8Type},
    {column_name: age, validation_class: UTF8Type, index_type: KEYS}
  ];

OperationResult<Rows<String, String>> result;
result = keyspace.prepareQuery(CF_STANDARD1)
    .searchWithIndex()
    .setLimit(100)   // Number of rows returned
    .addExpression()
        .whereColumn("age").equals().value(26)
    .execute();

If you want to reuse an index expression, you can create a prepared index expression from the column family and then provide it to the index query by calling addPreparedExpressions. Expressions in the list are ANDed together (sorry, there is no OR in Cassandra).

PreparedIndexExpression<String, String> clause = CF_STANDARD1.newIndexClause().whereColumn("Index1").equals().value(26);
OperationResult<Rows<String, String>> result;
result = keyspace.prepareQuery(CF_STANDARD1)
	.searchWithIndex()
	.setStartKey("")
	.addPreparedExpressions(Arrays.asList(clause))
	.execute();
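
The AND semantics can be sketched in plain Java. The example below is a hypothetical in-memory model, not the Astyanax API: each expression is a predicate over a row, and a row matches only if every expression in the list accepts it. The names AndedExpressionsSketch, columnEquals, search, and row are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical in-memory model of an index query; not the Astyanax API.
class AndedExpressionsSketch {

    // Builds an expression that matches rows whose column equals the given value.
    static Predicate<Map<String, Object>> columnEquals(String column, Object value) {
        return row -> value.equals(row.get(column));
    }

    // Returns only the rows that satisfy every expression (logical AND).
    static List<Map<String, Object>> search(List<Map<String, Object>> rows,
                                            List<Predicate<Map<String, Object>>> expressions) {
        List<Map<String, Object>> matches = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            if (expressions.stream().allMatch(e -> e.test(row))) {
                matches.add(row);
            }
        }
        return matches;
    }

    // Convenience helper that builds a row with "first" and "age" columns.
    static Map<String, Object> row(String first, int age) {
        Map<String, Object> r = new HashMap<>();
        r.put("first", first);
        r.put("age", age);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row("Ann", 26));
        rows.add(row("Bob", 26));
        rows.add(row("Ann", 40));

        List<Predicate<Map<String, Object>>> expressions = new ArrayList<>();
        expressions.add(columnEquals("age", 26));
        expressions.add(columnEquals("first", "Ann"));

        // Only the row ("Ann", 26) satisfies both expressions, so this prints 1
        System.out.println(search(rows, expressions).size());
    }
}
```

Getting OR behavior in this model would mean running one search per expression and merging the results in the application.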

When using an index query on large result sets, it is best to paginate through the results; otherwise you are likely to get timeout exceptions from Cassandra.

OperationResult<Rows<String, String>> result;
int pageCount = 0;
int rowCount = 0;

IndexQuery<String, String> query = keyspace.prepareQuery(CF_STANDARD1)
	.searchWithIndex()
	.setLimit(10)  // This is the page size
	.setIsPaginating()
	.addExpression()
		.whereColumn("Index2").equals().value(42);

// Each execute() fetches the next page; an empty page ends the loop
while (!(result = query.execute()).getResult().isEmpty()) {
	pageCount++;
	rowCount += result.getResult().size();
	for (Row<String, String> row : result.getResult()) {
		// Process each row here
	}
}