
Cassandra Source: internal queue isn't drained fast enough under load resulting in loop #300

Closed
CodeSmell opened this issue Oct 18, 2017 · 1 comment

Comments

@CodeSmell
Contributor

CodeSmell commented Oct 18, 2017

When inserting data into Cassandra under even modest load (roughly >100 messages per minute), the Cassandra Source can enter an "infinite loop" that stops the connector from processing.

After firing off a query, the CassandraTableReader takes each row in the result set and places it onto an internal LinkedBlockingQueue. Once the result set has been processed, the CassandraSourceTask drains a batch of messages (default 100) from the internal queue at the start of the next polling cycle and returns them as a List of SourceRecord(s) to be published to Kafka.
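The drain step described above can be sketched with `LinkedBlockingQueue.drainTo`. This is an illustrative stand-in, not the connector's actual code; the queue capacity (10,000) and batch size (100) are the defaults mentioned in this issue:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

public class DrainSketch {
    public static void main(String[] args) {
        // Hypothetical stand-in for the connector's internal queue.
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);

        // Simulate the reader having enqueued 250 rows from a result set.
        for (int i = 0; i < 250; i++) {
            queue.offer("record-" + i);
        }

        // At the start of the next poll cycle, drain at most the batch
        // size (default 100) to hand off as SourceRecords.
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, 100);

        System.out.println(batch.size()); // 100 records handed to Kafka
        System.out.println(queue.size()); // 150 records still waiting
    }
}
```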

Under load the internal queue grows faster than messages are drained. Eventually it reaches its capacity (default 10,000) and no more messages can be placed on the queue. At that point the CassandraTableReader and the CassandraSourceTask get stuck in a loop where:

  • nothing can be added to the internal queue until some messages are drained
  • no messages are drained from the queue until querying ends
  • the querying can't end until the result set is fully processed
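The stall described by the bullets above can be reproduced in miniature with a bounded queue and no consumer. A tiny capacity of 3 is used here purely to make the effect visible (the connector's default is 10,000); names are illustrative:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class QueueStallSketch {
    public static void main(String[] args) throws InterruptedException {
        // Tiny capacity so the queue fills immediately.
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>(3);

        // The "reader" fills the queue while processing a result set...
        for (int i = 0; i < 3; i++) {
            queue.offer("row-" + i);
        }

        // ...but once the queue is full, the next offer cannot succeed,
        // because nothing drains the queue until the query finishes --
        // and the query cannot finish until this offer succeeds.
        boolean accepted = queue.offer("row-3", 100, TimeUnit.MILLISECONDS);
        System.out.println(accepted); // false: producer blocked, consumer never runs
    }
}
```

In the real connector the blocking `put`/`offer` never times out, so the reader thread hangs indefinitely instead of returning `false`.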

I'm willing to submit a PR for the fix.

@andrewstevenson
Contributor

@CodeSmell Can this one be closed?
