AllRowsReader should build tokenRange with given DC or RACK #243

cywjackson opened this Issue Mar 20, 2013 · 7 comments




Currently it just uses:

    List<TokenRange> ranges = keyspace.describeRing();

In a case where I have 2 DCs (and am replicating to both), the AllRowsReader sends requests to both DCs, which is not what I want: it should only run against 1 DC (the one local to my client).

There should be nullable DC and Rack fields in the reader, and the builder should have methods to populate them. The describeRing() call should be replaced with describeRing(String dc, String rack).

Basically, anywhere A6x uses describeRing() should be deprecated in favor of describeRing(final String dc, final String rack) anyway...
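For illustration, a sketch of the filtered call with that signature (the DC string is a placeholder; a null rack would mean no rack filter within that DC):

```java
// Restrict the ring description to one DC; null rack = any rack in that DC.
List<TokenRange> ranges = keyspace.describeRing("us-west", null);
```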

Any suggestion on a workaround for now?

elandau commented Mar 22, 2013

Should be easy to add to the builder. I'll try to get this in by middle of next week.

elandau commented Mar 22, 2013

Actually, come to think of it, I think this might be an issue with how the connection pool is set up. describeRing() only returns the unique token ranges, which should be the same for both regions. The calls going to the other DC are due to nodes from the other region being in the connection pool.

When setting up your AstyanaxContext, try calling setLocalDatacenter on the ConnectionPoolConfigurationImpl with the desired DC. That will ensure that hosts from the other DCs are never in the connection pool.

As a side note, I did notice that I'm using the default read consistency level. I'll add functionality to make that configurable via the builder.
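A minimal sketch of that configuration (the keyspace name, pool name, seed host, and "us-west" DC label are placeholders, not values from this issue):

```java
// Sketch: pin the connection pool to one DC via setLocalDatacenter.
AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
        .forKeyspace("myKeyspace")
        .withAstyanaxConfiguration(new AstyanaxConfigurationImpl())
        .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("myPool")
                .setSeeds("127.0.0.1:9160")
                // Only hosts in this DC will ever enter the pool
                .setLocalDatacenter("us-west"))
        .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
        .buildKeyspace(ThriftFamilyFactory.getInstance());
context.start();
Keyspace keyspace = context.getClient();
```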


"...ConnectionPoolConfigurationImpl try calling setLocalDatacenter with the desired DC."

Thanks for getting back! Yeah, I saw that after posting the issue. Going to give this a try now.


Setting the local datacenter via ConnectionPoolConfigurationImpl does fix the connection issue. A quick follow-up question, if you don't mind:

I have 3 us-east and 3 us-west nodes. The thread pool is sized by default from subtasks.size(), which comes from the token ranges, so I see 6 threads. But it looks like only 3 of them are actually used for executing:

    "AstyanaxAllRowsReader-0" daemon prio=10 tid=0x00007fdc34003800 nid=0x1214 waiting on condition [0x00007fdc7412d000]
    "AstyanaxAllRowsReader-1" daemon prio=10 tid=0x00007fdc34005000 nid=0x1215 runnable [0x00007fdc6cffe000]
    "AstyanaxAllRowsReader-2" daemon prio=10 tid=0x00007fdc34009000 nid=0x1216 runnable [0x00007fdc6cefd000]
    "AstyanaxAllRowsReader-3" daemon prio=10 tid=0x00007fdc3400a800 nid=0x1217 waiting on condition [0x00007fdc6cdfc000]
    "AstyanaxAllRowsReader-4" daemon prio=10 tid=0x00007fdc3400c800 nid=0x1218 runnable [0x00007fdc6ccfb000]
    "AstyanaxAllRowsReader-5" daemon prio=10 tid=0x00007fdc3400e800 nid=0x1219 waiting on condition [0x00007fdc6cbfa000]

The state of each thread stays the same as I monitor via jstack (-0, -3, -5 are always waiting for work). Is that normal? Are you mapping each thread to a node (token range) within the local DC? I see a localExecutor is created with that size, but I'm not sure how each thread translates to a token range.

In any case, should I set a higher concurrencyLevel to speed things up? Or should I use a custom executor? (I actually don't see a builder method to set a custom executor anyway...)

elandau commented Mar 27, 2013

The default is to allocate a thread per token range. It's possible that your ring is not balanced and that you have 3 large ranges and 3 small ranges. Can you send me a ring describe of your cluster using nodetool? I'll try to replicate your exact setup.


"you have 3 large ranges and 3 small ranges." Possibly that's the problem. Here is the ring output:

    us-west  2a  Up  Normal  51.72 GB  33.33%  0
    us-east  1a  Up  Normal  30.67 GB   0.00%  100
    us-west  2b  Up  Normal  61.58 GB  33.33%  56713727820156410577229101238628035242
    us-east  1b  Up  Normal  70.24 GB   0.00%  56713727820156410577229101238628035342
    us-west  2c  Up  Normal  64.73 GB  33.33%  113427455640312821154458202477256070484
    us-east  1c  Up  Normal  64.47 GB   0.00%  113427455640312821154458202477256070584
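The suspected imbalance is visible directly from these tokens: each us-east token sits exactly 100 past a us-west token, so three of the six token ranges span only 100 tokens each, and the reader threads assigned to them have almost nothing to scan. A self-contained sketch in plain Java, assuming the RandomPartitioner's 0..2^127 token space (consistent with the tokens above):

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class RingRanges {
    // RandomPartitioner token space: [0, 2^127)
    static final BigInteger RING = BigInteger.valueOf(2).pow(127);

    // Size of each token range (prevToken, token], wrapping around the ring.
    static List<BigInteger> rangeSizes(List<BigInteger> sortedTokens) {
        List<BigInteger> sizes = new ArrayList<>();
        int n = sortedTokens.size();
        for (int i = 0; i < n; i++) {
            BigInteger prev = sortedTokens.get((i + n - 1) % n);
            // mod handles the wrap-around range ending at the smallest token
            sizes.add(sortedTokens.get(i).subtract(prev).mod(RING));
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<BigInteger> tokens = List.of(
                BigInteger.ZERO,
                BigInteger.valueOf(100),
                new BigInteger("56713727820156410577229101238628035242"),
                new BigInteger("56713727820156410577229101238628035342"),
                new BigInteger("113427455640312821154458202477256070484"),
                new BigInteger("113427455640312821154458202477256070584"));
        for (BigInteger size : rangeSizes(tokens)) {
            System.out.println(size);
        }
    }
}
```

Three of the printed sizes are 100 and three are roughly 2^127/3, matching the 3 runnable / 3 waiting threads in the jstack output.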

Cassandra version 1.0.8, A6x version 1.56.26.
For the KS I am running this AllRowsReader on, RF is 3 in each DC:
Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
Durable Writes: true
Options: [us-west:3, us-east:3]

The default read consistency level is CL.ONE (which would actually mean the coordinator node can contact nodes in the other DC... will try LOCAL_QUORUM next).
AstyanaxContext is constructed as:

    return new AstyanaxContext.Builder()
            .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                    // infinite retry policy with 1 min interval
                    .setRetryPolicy(new ConstantBackoff(60000, -1)))
            .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl(host + "_" + keyspace + "_"
                    + localDatacenter)
                    .setLocalDatacenter(localDatacenter)
                    .setSeeds(host)
                    .setSocketTimeout(30000))
            .withConnectionPoolMonitor(new CountingConnectionPoolMonitor());

The AllRowsReader builder is looking for 1 column in all rows:

    new AllRowsReader.Builder<String, String>(getKeyspace(), getColumnFamily())
            .withColumnSlice(OPERATION)
            .withPageSize(1000)
            .forEachPage(new Function<Rows<String, String>, Boolean>() {...})
            .build()
            .call();
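For the CL.ONE to LOCAL_QUORUM experiment mentioned above, the read consistency level can be set on the AstyanaxConfigurationImpl in the same builder chain; a minimal sketch using Astyanax's ConsistencyLevel enum:

```java
// Sketch: keep quorum reads within the local DC.
new AstyanaxConfigurationImpl()
        .setDefaultReadConsistencyLevel(ConsistencyLevel.CL_LOCAL_QUORUM)
        // infinite retry policy with 1 min interval, as above
        .setRetryPolicy(new ConstantBackoff(60000, -1));
```

LOCAL_QUORUM requires NetworkTopologyStrategy, which this keyspace already uses.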

elandau commented Mar 27, 2013

The latest code (not released) includes a method to specify the DC in the call to the AllRowsReader. That should help you filter out the other hosts. You can then set a higher concurrency level so your query runs faster. I'll need to think of a better automatic mechanism for multi-region clusters.
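A sketch of how the call could look with a higher concurrency level: withConcurrencyLevel is an existing AllRowsReader.Builder method, while the DC-filtering method below is a HYPOTHETICAL name standing in for the unreleased change, not a confirmed API:

```java
// withConcurrencyLevel(n) spreads the token ranges across n reader threads.
new AllRowsReader.Builder<String, String>(getKeyspace(), getColumnFamily())
        .withColumnSlice(OPERATION)
        .withPageSize(1000)
        .withConcurrencyLevel(12)     // more threads than token ranges
        // .withDatacenter("us-west") // HYPOTHETICAL name for the unreleased DC filter
        .forEachPage(new Function<Rows<String, String>, Boolean>() {
            @Override
            public Boolean apply(Rows<String, String> rows) {
                return true; // keep iterating
            }
        })
        .build()
        .call();
```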

@elandau elandau closed this Mar 27, 2013