This repository has been archived by the owner. It is now read-only.

Use same load balancing policy as fortis-services #118

Merged
merged 1 commit into from Aug 28, 2017

@c-w
Member

commented Aug 28, 2017

For some reason we're seeing ~80ms write times from fortis-services to Cassandra but multi-second write times from fortis-spark to Cassandra.

One difference between the two setups is the connector. fortis-services uses the NodeJS connector, whose default load-balancing policy differs from that of the Spark connector used in fortis-spark. The Spark connector's default assumes that Spark workers and Cassandra nodes are co-located on the same hosts, which is not the case in our deployment. This change makes the setup consistent so that both projects use the same load-balancing policy.

@c-w c-w requested a review from erikschlegel Aug 28, 2017

@c-w c-w added the in progress label Aug 28, 2017

@jcjimenez
Contributor

left a comment

LGTM with question regarding copy-pasted code.


/**
* Copy-pasted version of DefaultConnectionFactory to over-write the load balancing policy
* There is only one change called out by START/END comments below

@jcjimenez

jcjimenez Aug 28, 2017

Contributor

What if we do something like this instead of copy-pasting so much code?

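import com.datastax.driver.core.Cluster
import com.datastax.driver.core.policies.{DCAwareRoundRobinPolicy, TokenAwarePolicy}
import com.datastax.spark.connector.cql.{CassandraConnectionFactory, CassandraConnectorConf, DefaultConnectionFactory}

class FortisConnectionFactory extends CassandraConnectionFactory {
  // Reuse the default cluster builder and override only the load-balancing policy.
  override def createCluster(conf: CassandraConnectorConf): Cluster = {
    val lbp = new TokenAwarePolicy(new DCAwareRoundRobinPolicy.Builder().build())
    DefaultConnectionFactory.clusterBuilder(conf).withLoadBalancingPolicy(lbp).build()
  }
}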

@c-w

c-w Aug 28, 2017

Author Member

Nice one. Fixed in e86952b.
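
For context, the Spark Cassandra connector picks up a custom connection factory through the spark.cassandra.connection.factory setting. The following is a minimal sketch of that wiring, not the code from this PR. The fully qualified name com.example.FortisConnectionFactory is a hypothetical placeholder, since the real package is not shown in this thread.

```scala
import org.apache.spark.SparkConf

object FactoryConfSketch {
  // Hypothetical fully qualified name of the custom factory (assumption:
  // the actual package used in fortis-spark is not shown in this thread).
  val factoryName: String = "com.example.FortisConnectionFactory"

  // Returns the given SparkConf with the connector's factory setting pointed
  // at the custom factory, so that createCluster is called on it.
  def configure(conf: SparkConf): SparkConf =
    conf.set("spark.cassandra.connection.factory", factoryName)
}
```

With this approach only the load-balancing policy differs from the connector's defaults, which keeps the override small compared to copying DefaultConnectionFactory.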

@c-w

Member Author

commented Aug 28, 2017

Our tests pass but running this in Spark actually crashes:

C:\Repos\project-fortis-spark>spark-submit --class CounterDemo --master local[3] target/scala-2.11/project-fortis-spark-assembly-0.0.6.jar
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Exception in thread "main" java.lang.IllegalArgumentException: Singleton object not available: ConnectionFactory$
        at com.datastax.spark.connector.util.ReflectionUtil$.findGlobalObject(ReflectionUtil.scala:55)
        at com.datastax.spark.connector.cql.CassandraConnectionFactory$$anonfun$fromSparkConf$1.apply(CassandraConnectionFactory.scala:155)
        at com.datastax.spark.connector.cql.CassandraConnectionFactory$$anonfun$fromSparkConf$1.apply(CassandraConnectionFactory.scala:155)
        at scala.Option.map(Option.scala:146)
        at com.datastax.spark.connector.cql.CassandraConnectionFactory$.fromSparkConf(CassandraConnectionFactory.scala:155)

This happens because, for a Scala object (as opposed to a class), getClass.getName returns the name with a trailing dollar sign, so the connector's lookup of the singleton by name fails.

Fixed this in 2f382aa.
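
The trailing dollar sign can be reproduced without Spark. The following is a minimal sketch: ExampleConnectionFactory is a hypothetical stand-in for the factory object, and stripSuffix is just one possible way to obtain a name without the suffix, not necessarily what 2f382aa does.

```scala
// Hypothetical stand-in for the connection factory; the name is illustrative only.
object ExampleConnectionFactory

object DollarSuffixDemo {
  // Name of the JVM class backing the singleton; for a Scala object it ends with '$'.
  val rawName: String = ExampleConnectionFactory.getClass.getName

  // The same name with the trailing '$' removed (assumption: one possible
  // workaround before handing the name to a lookup by object name).
  val objectName: String = rawName.stripSuffix("$")

  def main(args: Array[String]): Unit = {
    println(rawName)
    println(objectName)
  }
}
```

Passing rawName as the factory name corresponds to the failing case in the stack trace above, where the connector reports "Singleton object not available: ConnectionFactory$".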

@c-w c-w force-pushed the cassandra-load-balancing-policy branch from db9c4e7 to e86952b Aug 28, 2017

Use same load balancing policy as fortis-services

@c-w c-w force-pushed the cassandra-load-balancing-policy branch from e86952b to 2f382aa Aug 28, 2017

@c-w

Member Author

commented Aug 28, 2017

This is a ~50% performance improvement over the previous implementation but still orders of magnitude slower than it should be.

@c-w c-w merged commit 2e05e3a into master Aug 28, 2017

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed

@c-w c-w deleted the cassandra-load-balancing-policy branch Aug 28, 2017

@c-w c-w removed the in progress label Aug 28, 2017
