
AllRowsReader All rows query

miket-ap edited this page Mar 24, 2014 · 3 revisions

A common (and arguably bad) use case for Cassandra clients is reading all the data in a column family. Astyanax provides a recipe that performs this operation in parallel and with pagination, so that it does not put excessive heap pressure on the Cassandra nodes.

boolean result = new AllRowsReader.Builder<String, String>(keyspace, CF_STANDARD1)
        .withPageSize(100) // Read 100 rows at a time
        .withConcurrencyLevel(10) // Split entire token range into 10.  Default is by number of nodes.
        .withPartitioner(null) // this will use keyspace's partitioner
        .forEachRow(new Function<Row<String, String>, Boolean>() {
            @Override
            public Boolean apply(@Nullable Row<String, String> row) {
                // Process the row here ...
                // This will be called from multiple threads so make sure your code is thread safe
                return true;
            }
        })
        .build()
        .call();

Note: Astyanax uses the "Function" class from com.google.common.base. "@Nullable" comes from the javax.annotation package, which is located in the com.google.code.findbugs:jsr305 artifact. Maven does not include jsr305 automatically because Google defines its scope as "provided". If your IDE reports errors on "@Nullable", add the jsr305 dependency to your project explicitly.
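
The withConcurrencyLevel(10) comment in the example above describes splitting the entire token range into 10 parts. The sketch below only illustrates that idea: it divides a token space into contiguous, non-overlapping segments that separate workers could scan. It is not Astyanax's internal code. The class TokenRangeSplitSketch, its Segment type, and the assumed RandomPartitioner-style token space of [0, 2^127) are all hypothetical choices made for this illustration.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class TokenRangeSplitSketch {

    /** A half-open token segment [start, end). Hypothetical helper type. */
    public static final class Segment {
        public final BigInteger start;
        public final BigInteger end;

        public Segment(BigInteger start, BigInteger end) {
            this.start = start;
            this.end = end;
        }
    }

    // Assumption: a RandomPartitioner-style token space of [0, 2^127).
    public static final BigInteger MIN = BigInteger.ZERO;
    public static final BigInteger MAX = BigInteger.ONE.shiftLeft(127);

    /** Splits [min, max) into the given number of contiguous segments. */
    public static List<Segment> split(BigInteger min, BigInteger max, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        BigInteger width = max.subtract(min);
        BigInteger count = BigInteger.valueOf(parts);
        List<Segment> segments = new ArrayList<>(parts);
        BigInteger start = min;
        for (int i = 1; i <= parts; i++) {
            // Derive each boundary from the full width so rounding does not accumulate;
            // the last boundary is exactly max.
            BigInteger end = min.add(width.multiply(BigInteger.valueOf(i)).divide(count));
            segments.add(new Segment(start, end));
            start = end;
        }
        return segments;
    }

    public static void main(String[] args) {
        // Print the 10 segments that correspond to a concurrency level of 10.
        for (Segment s : split(MIN, MAX, 10)) {
            System.out.println(s.start + " .. " + s.end);
        }
    }
}
```

Each segment can be read independently, which is why the row callback may run on several threads at once.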

Reading only the row keys

boolean result = new AllRowsReader.Builder<String, String>(keyspace, CF_STANDARD1)
        .withColumnRange(null, null, false, 0) // No column bounds and a column limit of 0, so only row keys are returned
        .withPartitioner(null) // this will use keyspace's partitioner
        .forEachRow(new Function<Row<String, String>, Boolean>() {
            @Override
            public Boolean apply(@Nullable Row<String, String> row) {
                // Process the row here ...
                return true;
            }
        })
        .build()
        .call();
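
Because the callback is invoked from multiple threads, any state it updates has to be thread safe. The following is a minimal sketch of a callback that collects row keys safely. To stay runnable without Astyanax or Guava on the classpath, it uses java.util.function.Function<String, Boolean> as a stand-in for Guava's Function<Row<String, String>, Boolean>, and it simulates the concurrent calls with a thread pool. The names ThreadSafeRowCallbackSketch, KeyCollector, and simulate are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

public class ThreadSafeRowCallbackSketch {

    /** Collects row keys; safe to call from many threads at once. */
    public static final class KeyCollector implements Function<String, Boolean> {
        // Concurrent set and atomic counter instead of HashSet and a plain long.
        private final Set<String> keys = ConcurrentHashMap.newKeySet();
        private final AtomicLong calls = new AtomicLong();

        @Override
        public Boolean apply(String rowKey) {
            calls.incrementAndGet();
            if (rowKey != null) { // the wiki examples mark the argument @Nullable
                keys.add(rowKey);
            }
            return true; // same return value as the wiki examples
        }

        public Set<String> keys() {
            return keys;
        }

        public long calls() {
            return calls.get();
        }
    }

    /** Simulates concurrent callback invocations, one worker per token segment. */
    public static KeyCollector simulate(int threads, int keysPerThread) {
        KeyCollector collector = new KeyCollector();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int segment = t;
            pool.execute(() -> {
                for (int i = 0; i < keysPerThread; i++) {
                    collector.apply("key-" + segment + "-" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return collector;
    }

    public static void main(String[] args) {
        KeyCollector collector = simulate(10, 1000);
        System.out.println(collector.calls() + " calls, "
                + collector.keys().size() + " distinct keys");
    }
}
```

In a real reader, the same pattern would apply inside the Function passed to forEachRow, with the key taken from the Row argument.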