
Cassandra Source: internal queue isn't drained fast enough under load resulting in loop #300

Closed
CodeSmell opened this issue Oct 18, 2017 · 1 comment

Comments

@CodeSmell
Contributor

CodeSmell commented Oct 18, 2017

When inserting data into Cassandra under even modest load (roughly >100 messages per minute), the Cassandra Source can enter an "infinite loop" that stops the connector from processing.

After firing off a query, the CassandraTableReader takes each row in the result set and places it onto an internal LinkedBlockingQueue. Once the result set has been processed, the CassandraSourceTask drains a batch of messages (default 100) from the internal queue at the start of the next polling cycle and returns them as a List of SourceRecord(s) to be published to Kafka.
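The drain step described above can be sketched with `LinkedBlockingQueue.drainTo`. This is an illustrative stand-in, not the connector's actual code; the queue capacity (10,000) and batch size (100) are the defaults mentioned in this issue:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

public class DrainSketch {
    public static void main(String[] args) {
        // Hypothetical stand-in for the connector's internal queue.
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);

        // Simulate the reader having enqueued 250 rows from a result set.
        for (int i = 0; i < 250; i++) {
            queue.offer("record-" + i);
        }

        // At the start of the next poll cycle, drain at most the batch
        // size (default 100) to hand off as SourceRecords.
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, 100);

        System.out.println(batch.size()); // 100 records handed to Kafka
        System.out.println(queue.size()); // 150 records still waiting
    }
}
```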

Under load the internal queue grows faster than messages are drained. Eventually it reaches its capacity (default 10,000) and no more messages can be placed on the queue. At that point the CassandraTableReader and the CassandraSourceTask get stuck in a loop where:

  • nothing can be added to the internal queue until some messages are drained
  • no messages are drained from the queue until querying ends
  • the querying can't end until the result set is fully processed
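The stall described by the bullets above can be reproduced in miniature with a bounded queue and no consumer. A tiny capacity of 3 is used here purely to make the effect visible (the connector's default is 10,000); names are illustrative:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class QueueStallSketch {
    public static void main(String[] args) throws InterruptedException {
        // Tiny capacity so the queue fills immediately.
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>(3);

        // The "reader" fills the queue while processing a result set...
        for (int i = 0; i < 3; i++) {
            queue.offer("row-" + i);
        }

        // ...but once the queue is full, the next offer cannot succeed,
        // because nothing drains the queue until the query finishes --
        // and the query cannot finish until this offer succeeds.
        boolean accepted = queue.offer("row-3", 100, TimeUnit.MILLISECONDS);
        System.out.println(accepted); // false: producer blocked, consumer never runs
    }
}
```

In the real connector the blocking `put`/`offer` never times out, so the reader thread hangs indefinitely instead of returning `false`.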

I'm willing to submit a PR for the fix.

@andrewstevenson
Contributor

@CodeSmell Can this one be closed?
