NodeDiscovery and TokenAwareConnectionPools assume RandomPartitioner #96

Closed
shawnsmith opened this Issue Jul 26, 2012 · 7 comments

The NodeDiscoveryImpl class and Operation and Topology interfaces assume that tokens are represented as BigInteger, which is true only for the RandomPartitioner. Also, the RandomPartitioner is hard-coded in a few places such as ThriftAllRowsImpl, ThriftColumnFamilyQueryImpl, ThriftKeyspaceImpl.
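The lookup a token-aware pool performs only needs tokens to be ordered; it does not need them to be BigIntegers. Below is a minimal sketch under that assumption. It uses a hypothetical TokenRing class, not an Astyanax API. Each host is keyed by its token and owns the range (previous token, its token], and the ring wraps around past the largest token.

```java
import java.math.BigInteger;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not Astyanax's NodeDiscoveryImpl or token-aware pool code.
// Works for any Comparable token type (BigInteger, Long, ...).
public class TokenRing<T extends Comparable<T>> {
    private final TreeMap<T, String> ring = new TreeMap<>();

    // Registers a host under the token at the end of its range.
    public void addHost(T token, String host) {
        ring.put(token, host);
    }

    // A host with token T owns (previous token, T]; keys past the largest
    // token wrap around to the host with the smallest token.
    public String hostFor(T keyToken) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no hosts in ring");
        }
        Map.Entry<T, String> owner = ring.ceilingEntry(keyToken);
        return owner != null ? owner.getValue() : ring.firstEntry().getValue();
    }

    public static void main(String[] args) {
        // RandomPartitioner-style BigInteger tokens.
        TokenRing<BigInteger> random = new TokenRing<>();
        random.addHost(BigInteger.ZERO, "a");
        random.addHost(BigInteger.ONE.shiftLeft(126), "b");
        System.out.println(random.hostFor(BigInteger.ONE.shiftLeft(100))); // b

        // Murmur3Partitioner-style signed long tokens.
        TokenRing<Long> murmur = new TokenRing<>();
        murmur.addHost(Long.MIN_VALUE / 2, "x");
        murmur.addHost(0L, "y");
        murmur.addHost(Long.MAX_VALUE / 2, "z");
        System.out.println(murmur.hostFor(-5L));            // y
        System.out.println(murmur.hostFor(Long.MAX_VALUE)); // x (wraps around)
    }
}
```

The same routing code serves both partitioners once the token type is a parameter rather than a hard-coded BigInteger.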

Is Astyanax in general intended to work with Cassandra deployments that use order preserving partitioners such as the ByteOrderedPartitioner?

The Configuration wiki page should be updated to mention which features assume the RandomPartitioner and which features work with all partitioners.

FWIW, I have a prototype that doesn't assume RandomPartitioner or BigInteger here: https://github.com/bazaarvoice/astyanax/tree/partitioner

I'd be happy to send a pull request, but I'd like an opinion on whether this is heading in the right direction.

Notes about the changes:

  • Most references to BigInteger have been replaced with Token. This affected some public interfaces like Operation and AstyanaxContext.withHostSupplier(). Custom code that uses those interfaces may need to be updated. In many places, the only changes required are to replace new BigInteger(n) with new BigIntegerToken(n).
  • The client calls describePartitioner() the first time it needs to know the partitioner and caches the result. Usually the query happens during start() when the RingDescribeHostSupplier needs to parse the tokens returned by describeRing().
  • It might be convenient if the TokenRange interface getStartToken() and getEndToken() methods returned Token instead of String. But making that change will break client code that calls Keyspace.describeRing() and could affect a lot of Astyanax users. A smaller impact change might be to keep the TokenRange string methods but add new token versions: Token getStart() and Token getEnd(). But that would add backward-compatibility cruft to the API, so for now I've left TokenRange unchanged.
  • All the tests still use RandomPartitioner.
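Below is a minimal sketch of what a partitioner-neutral token type could look like. Only the names Token and BigIntegerToken come from the notes above. LongToken, BytesToken, Tokens.parse and the hex helper are hypothetical, and none of this is the prototype's code. The sketch makes three assumptions. First, describe_ring returns tokens as decimal strings for RandomPartitioner and Murmur3Partitioner. Second, it returns hex strings for ByteOrderedPartitioner. Third, the partitioner class name is the string returned by describe_partitioner.

```java
import java.math.BigInteger;

// Hypothetical sketch of a partitioner-neutral token type; not the prototype's code.
// A cluster uses one partitioner, so each compareTo casts to its own token class.
public class Tokens {
    public interface Token extends Comparable<Token> {}

    // RandomPartitioner: values in [0, 2^127]
    public static final class BigIntegerToken implements Token {
        private final BigInteger value;
        public BigIntegerToken(BigInteger value) { this.value = value; }
        @Override public int compareTo(Token o) { return value.compareTo(((BigIntegerToken) o).value); }
        @Override public String toString() { return value.toString(); }
    }

    // Murmur3Partitioner: signed 64-bit values
    public static final class LongToken implements Token {
        private final long value;
        public LongToken(long value) { this.value = value; }
        @Override public int compareTo(Token o) { return Long.compare(value, ((LongToken) o).value); }
        @Override public String toString() { return Long.toString(value); }
    }

    // ByteOrderedPartitioner: raw key bytes, compared as unsigned bytes
    public static final class BytesToken implements Token {
        private final byte[] value;
        public BytesToken(byte[] value) { this.value = value.clone(); }
        @Override public int compareTo(Token o) {
            byte[] other = ((BytesToken) o).value;
            int n = Math.min(value.length, other.length);
            for (int i = 0; i < n; i++) {
                int c = Integer.compare(value[i] & 0xff, other[i] & 0xff);
                if (c != 0) return c;
            }
            return Integer.compare(value.length, other.length);
        }
    }

    // Parses a token string according to the partitioner class name.
    public static Token parse(String partitioner, String token) {
        switch (partitioner) {
            case "org.apache.cassandra.dht.RandomPartitioner":
                return new BigIntegerToken(new BigInteger(token));
            case "org.apache.cassandra.dht.Murmur3Partitioner":
                return new LongToken(Long.parseLong(token));
            case "org.apache.cassandra.dht.ByteOrderedPartitioner":
                return new BytesToken(hexToBytes(token));
            default:
                throw new IllegalArgumentException("unsupported partitioner: " + partitioner);
        }
    }

    private static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex: " + hex);
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }
}
```

Caching the partitioner name after the first lookup, as the notes describe, means parse() can be called with a fixed name for the lifetime of the client.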

Note: with the new Murmur3Partitioner coming in Cassandra 1.2, tokens will be 64-bit signed long values, not 127-bit unsigned BigInteger values. So Astyanax code that depends on RandomPartitioner and BigInteger will likely need to change to support the new partitioner.
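The sketch below contrasts the two token spaces. It assumes that RandomPartitioner derives a token as the absolute value of the key's MD5 digest, read as a signed big-endian BigInteger. That would explain its [0, 2^127] range. Murmur3Partitioner tokens, by contrast, span the signed long range. The class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch; assumes RandomPartitioner token = abs(MD5 as signed BigInteger).
public class TokenSpaces {
    static final BigInteger RANDOM_MAX = BigInteger.ONE.shiftLeft(127); // 2^127

    public static BigInteger randomPartitionerToken(byte[] key) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5").digest(key);
            return new BigInteger(md5).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    // True if the value could be a Murmur3Partitioner (signed 64-bit) token.
    public static boolean fitsInLong(BigInteger token) {
        return token.bitLength() <= 63;
    }

    public static void main(String[] args) {
        BigInteger t = randomPartitionerToken("row1".getBytes(StandardCharsets.UTF_8));
        System.out.println("RandomPartitioner token: " + t);
        System.out.println("within [0, 2^127]: " + (t.signum() >= 0 && t.compareTo(RANDOM_MAX) <= 0));
        System.out.println("Murmur3 range: [" + Long.MIN_VALUE + ", " + Long.MAX_VALUE + "]");
        System.out.println("fits in a long: " + fitsInLong(t));
    }
}
```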

I am using MD5 hashes as row keys for most of my data; this is the easiest way to avoid duplicate text content from different sources.

So the ByteOrderedPartitioner is a natural requirement for me, even though it is "not recommended" (for other use cases).
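The sketch below shows the content-hash row key scheme described above. The names are hypothetical, and encoding the digest as lowercase hex is an assumption; raw 16-byte keys would work as well. Identical text from different sources maps to the same row key, so a second write overwrites the first instead of creating a duplicate. MD5 output is close to uniformly distributed, so such keys also spread fairly evenly across nodes under the ByteOrderedPartitioner.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: derive the row key from the content itself.
public class ContentKeys {
    public static String rowKey(String content) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(32);
            for (byte b : md5) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        String fromSourceA = "Same article text";
        String fromSourceB = "Same article text";
        // Same content, same key: the row is written once, not duplicated.
        System.out.println(rowKey(fromSourceA).equals(rowKey(fromSourceB))); // true
    }
}
```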

Contributor

DeltaFlight commented Feb 22, 2013

Astyanax fails in a non-obvious way with the Cassandra 1.2 Murmur3Partitioner. When I try to do a row scan, the client logs show only a TransportException with no hint about the cause, while the Cassandra server log shows this exception:
CustomTThreadPoolServer.java (line 217) Error occurred during processing of message.
java.lang.NumberFormatException: For input string: "158386711850734882849992246407603540419"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:444)
at java.lang.Long.valueOf(Long.java:540)
at org.apache.cassandra.dht.Murmur3Partitioner$1.fromString(Murmur3Partitioner.java:186)

Murmur3Partitioner is the default in Cassandra 1.2, so the hard-coded RandomPartitioner is a problem.
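The server rejects the token because it is too large for a long. The string in the stack trace has 39 decimal digits, while Long.MAX_VALUE is 9223372036854775807 (19 digits). The Murmur3Partitioner token factory therefore cannot parse it with Long.valueOf. Below is a minimal standard-library reproduction; the class name is hypothetical. The log does not show that the client generated this RandomPartitioner-sized token through its hard-coded RandomPartitioner; that is an inference from the comment.

```java
import java.math.BigInteger;

// Hypothetical reproduction: a RandomPartitioner-sized token cannot be parsed as a long.
public class TokenParseFailure {
    static final String RANDOM_TOKEN = "158386711850734882849992246407603540419";

    public static boolean parsesAsLong(String token) {
        try {
            Long.parseLong(token);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        BigInteger asBig = new BigInteger(RANDOM_TOKEN);
        System.out.println("exceeds Long.MAX_VALUE: "
                + (asBig.compareTo(BigInteger.valueOf(Long.MAX_VALUE)) > 0)); // true
        System.out.println("parses as long: " + parsesAsLong(RANDOM_TOKEN)); // false
    }
}
```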

Member

elandau commented Feb 27, 2013

With the next release of Astyanax, just call setPartitioner(new Murmur3Partitioner()) on the connection pool configuration. Make sure to use the Murmur3Partitioner from Astyanax, not the one from Cassandra.

elandau closed this Feb 27, 2013

I am using the ByteOrderedPartitioner and getting some exceptions; as a workaround I am trying this:

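keyspaceContext = new AstyanaxContext.Builder()
    .forCluster(clusterName)
    .forKeyspace(keyspaceName)
    .withAstyanaxConfiguration(
        new AstyanaxConfigurationImpl()
            // Workaround: map the cluster's ByteOrderedPartitioner to Astyanax's BigInteger127Partitioner
            .registerPartitioner(org.apache.cassandra.dht.ByteOrderedPartitioner.class.getCanonicalName(),
                BigInteger127Partitioner.get()))
    // ... connection pool configuration and monitor as usual ...
    .buildKeyspace(ThriftFamilyFactory.getInstance());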

I think one could implement the Partitioner interface instead... can anyone confirm? Thanks.

Another use case for BOP: we need to retrieve the most recent (wide) row, so we can use (Long.MAX_VALUE - System.currentTimeMillis()) as the row key with BOP, which makes the newest row sort first.
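This works because the ByteOrderedPartitioner orders rows by their raw key bytes. Below is a minimal sketch that assumes the key is the reversed timestamp encoded as 8 big-endian bytes; the names are hypothetical. For non-negative values, big-endian encoding sorts the same way as the numbers themselves under unsigned byte comparison. Newer timestamps therefore produce smaller keys that come first in a range scan.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a reverse-timestamp row key for a byte-ordered partitioner.
public class ReverseTimestampKey {
    // Assumes 0 <= timestampMillis, so the stored value is non-negative.
    public static byte[] rowKey(long timestampMillis) {
        // ByteBuffer is big-endian by default.
        return ByteBuffer.allocate(Long.BYTES).putLong(Long.MAX_VALUE - timestampMillis).array();
    }

    // Unsigned lexicographic comparison, the order a byte-ordered partitioner uses.
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        byte[] newer = rowKey(now);
        byte[] older = rowKey(now - 60_000);
        // The newer row sorts first, so a scan from the start of the ring returns it first.
        System.out.println(compareBytes(newer, older) < 0); // true
    }
}
```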

There is no way to implement this with Astyanax...
