getAllRows causes NumberFormatException inside Cassandra #219

Closed
dzello opened this Issue Feb 24, 2013 · 9 comments

dzello commented Feb 24, 2013

On Cassandra 1.2 and Astyanax 1.56.26

getAllRows() results in a get_range_slices thrift call w/ a long number such as 146322535224263366150011057923360132641. Cassandra blows up with a NumberFormatException trying to parse it:

java.lang.NumberFormatException: For input string: "146322535224263366150011057923360132641"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Long.parseLong(Long.java:422)
at java.lang.Long.valueOf(Long.java:525)
at org.apache.cassandra.dht.Murmur3Partitioner$1.fromString(Murmur3Partitioner.java:186)
at org.apache.cassandra.thrift.ThriftValidation.validateKeyRange(ThriftValidation.java:489)
at org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:919)

This is against a CF w/ a UTF8Type key_validation_class. I'm somewhat new to Cassandra/Astyanax so I'm not exactly sure what's up - any thoughts? Thanks!

I am having the same issue. It doesn't seem to matter what options (if any) I add to the query. Once things get into this state, any call to prepareQuery(CF_xx).getAllRows().execute() causes the problem. I'm not sure what triggers it, but it doesn't usually take long after I reinitialize the keyspaces. Any workarounds, please? Thanks!

Member

elandau commented Feb 25, 2013

This is a result of the client using RandomPartitioner with Cassandra configured with Murmur3Partitioner. I'm working on a fix to support Murmur3Partitioner.
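
To illustrate the mismatch: RandomPartitioner tokens are integers in the range 0 to 2^127, while Murmur3Partitioner tokens are signed 64-bit longs, so the Long.valueOf call in the trace above cannot parse a token of that size. The sketch below is plain Java with no Cassandra or Astyanax dependency. The class and method names (TokenRangeDemo, fitsInLong) are made up for illustration and are not part of either project.

```java
import java.math.BigInteger;

final class TokenRangeDemo {
    // Hypothetical helper: true if the token string parses as a signed 64-bit long,
    // which is the token representation Murmur3Partitioner expects.
    static boolean fitsInLong(String token) {
        try {
            Long.parseLong(token);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Token taken from the stack trace in the original report.
        String token = "146322535224263366150011057923360132641";

        // The value needs far more than 63 bits, so it only fits in a BigInteger.
        System.out.println("bits needed: " + new BigInteger(token).bitLength());

        // Prints false: Long.parseLong throws NumberFormatException for this input.
        System.out.println("fits in long: " + fitsInLong(token));
    }
}
```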

dzello commented Feb 25, 2013

Cool, thanks!

Member

elandau commented Feb 25, 2013

If you are able to pick up the latest code, please try the AllRowsReader. You will need to set the partitioner by calling withPartitioner(new Murmur3Partitioner()) on the AllRowsReader.Builder. I'm working on getting the getAllRows() call working, but it has some other issues that need to be fixed. I'm actually thinking of deprecating getAllRows() since it has gotten too complex. Moving forward I'd prefer people use the AllRowsReader.

dzello commented Feb 25, 2013

Ok great, will try it out today.

dzello commented Feb 26, 2013

Didn't get a chance to try it out - actually using the CQL stuff for this. But thanks for responding.

I've switched to AllRowsReader and it works most of the time, but not always. I see an intermittent - though fairly frequent - failure that seems to land on the following handler:

    ...
    catch (Throwable t) {
        LOG.warn("AllRowsReader terminated", t);
        cancel();
        throw new RuntimeException("Error reading all rows", t);
    }

I've added "+ t.toString()" to the string being thrown to see more, and I see things like the following when it blows up:

com.netflix.astyanax.recipes.reader.AllRowsReader - Error process token/key range

... weird timeout stuff ...

WARN com.netflix.astyanax.recipes.reader.AllRowsReader - AllRowsReader terminated
java.util.concurrent.CancellationException
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:250)
at java.util.concurrent.FutureTask.get(FutureTask.java:111)
at com.netflix.astyanax.recipes.reader.AllRowsReader.waitForTasksToFinish(AllRowsReader.java:508)
at com.netflix.astyanax.recipes.reader.AllRowsReader.call(AllRowsReader.java:480)
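
A possible reading of this trace, offered as an assumption rather than a confirmed diagnosis: if one range task fails and the reader then cancels the remaining tasks, calling get() on any of those cancelled futures throws CancellationException, which would hide the original failure. The sketch below shows only that standard java.util.concurrent behavior. CancelledFutureDemo and getAfterCancel are hypothetical names, and nothing here uses Astyanax code.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

final class CancelledFutureDemo {
    // Hypothetical helper: reports what get() does on a task that was cancelled.
    static String getAfterCancel() {
        FutureTask<Boolean> task = new FutureTask<>(() -> true);
        task.cancel(true); // cancelled before it ever runs
        try {
            task.get();
            return "no exception";
        } catch (CancellationException e) {
            // get() on a cancelled task reports the cancellation, not any earlier error.
            return "CancellationException";
        } catch (InterruptedException | ExecutionException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(getAfterCancel()); // prints CancellationException
    }
}
```

If that is what is happening here, the earlier "Error process token/key range" log entry is the more useful place to look for the root cause.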

I'll keep trying; fortunately we use this only for diagnostic dumps and such (never for real traffic), so it's not a huge problem, but it's kind of scary ...

Thanks
PeterK

P.S. For reference, my initialization call and an example of one of the "get all rows" calls:

AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
    .forCluster(CC_name)
    .forKeyspace(ks_name)
    .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
        .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
        .setConnectionPoolType(ConnectionPoolType.ROUND_ROBIN)
    )
    .withConnectionPoolConfiguration(
        new ConnectionPoolConfigurationImpl("MyConnectionPool")
            .setSeeds(seeds)
            .setMaxConnsPerHost(100)
            .setInitConnsPerHost(10)
            .setSocketTimeout(30000)
            .setMaxTimeoutWhenExhausted(2000)
    )
    .withConnectionPoolMonitor(new Slf4jConnectionPoolMonitorImpl())
    .buildKeyspace(ThriftFamilyFactory.getInstance());

rslt = new AllRowsReader.Builder<Long, String>(kspc_xxxx, CF_xxxx)
    .forEachRow(new Function<Row<Long, String>, Boolean>() {
        @Override
        public Boolean apply(Row<Long, String> row) {
            ...
            return true;
        }
    })
    .withPartitioner(new Murmur3Partitioner())
    .build()
    .call();

BTW, in one of my Cassandra logs I see the same errors as before (these are at the tail, so I'm pretty sure they're still happening even with the new stuff):

ERROR [Thrift:938] 2013-02-26 20:38:28,234 CustomTThreadPoolServer.java (line 217) Error occurred during processing of message.
java.lang.NumberFormatException: For input string: "46769749548241204642636717861545760179"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Long.parseLong(Long.java:422)
at java.lang.Long.valueOf(Long.java:525)
at org.apache.cassandra.dht.Murmur3Partitioner$1.fromString(Murmur3Partitioner.java:186)
at org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:947)
at org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.getResult(Cassandra.java:3454)
at org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.getResult(Cassandra.java:3442)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)

I'm not claiming to be 100% sure that I have all the new code deployed everywhere but I did check and I think I have no calls to getAllRows() anymore. FWIW ...

Member

elandau commented Feb 27, 2013

Should be all fixed now. I'm also auto detecting the partitioner so you don't have to specify it in the configuration.

elandau closed this Feb 27, 2013
