@frankgh (Contributor) commented Jul 25, 2023

When setting the `BUFFERED` `RowBufferMode` as part of the `WriterOptions`, `org.apache.cassandra.spark.bulkwriter.RecordWriter` ignores that configuration and instead uses the batch size to determine when to finalize an SSTable and start writing a new one, if more rows are available.

In this commit, we fix `org.apache.cassandra.spark.bulkwriter.RecordWriter#checkBatchSize` to take the configured `RowBufferMode` into account. Specifically, for the `UNBUFFERED` `RowBufferMode` we check the batch size of the SSTable during writes, while for `BUFFERED` the check has no effect.

Co-authored-by: Doug Rohrer <doug@therohrers.org>
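
For reference, a minimal sketch of the shape of this fix. The method signature, the `JobInfo` accessor names, and the bookkeeping fields are assumptions for illustration; only `checkBatchSize`, `finalizeSSTable`, and `RowBufferMode` come from the change itself:

```java
// Sketch only: field and parameter names are illustrative, not the PR's code.
private void checkBatchSize(StreamSession streamSession, int partitionId, JobInfo job)
{
    // BUFFERED mode flushes by buffer size (withBufferSizeInMB), so the
    // per-batch row-count check must not finalize the SSTable early.
    if (job.getRowBufferMode() == RowBufferMode.BUFFERED)
    {
        return;
    }

    // UNBUFFERED mode: once the configured batch size is reached, finalize
    // the current SSTable and start a new one for any remaining rows.
    if (batchSize >= job.getSstableBatchSize())
    {
        finalizeSSTable(streamSession, partitionId, sstableWriter, batchNumber, batchSize);
        sstableWriter = null; // a new writer is created for the next batch
        batchNumber++;
        batchSize = 0;
    }
}
```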
@dineshjoshi (Member) commented

+1 LGTM.

@yifan-c (Contributor) left a comment

some nits. Looks good to me.

```java
if (sstableWriter != null)
{
    finalizeSSTable(streamSession, partitionId, sstableWriter, batchNumber, batchSize);
}
```

nit: reset `sstableWriter` to null after `finalizeSSTable`


Given this is essentially the last step before we're done with the `sstableWriter`, we don't really need to set it to null here. This `RecordWriter` instance should be collected as soon as the upload/commits finish, and we close the `sstableWriter` on finalize, so we should be good without nulling it out.

```java
}
else if (rowBufferMode == RowBufferMode.BUFFERED)
{
    builder.withBufferSizeInMB(bufferSizeMB);
}
```

nit: maybe decide on a valid value range for `bufferSizeMB` and validate it; `CQLSSTableWriter` accepts whatever int value it is given.
For the upper bound, I think 1/2 of `spark.executor.memory` is a good limit.
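
A sketch of what that validation could look like. Everything here (the helper class, using `SparkConf.getSizeAsMb` to read the executor memory, and the exact bound) is illustrative rather than code from this PR:

```java
import org.apache.spark.SparkConf;

// Illustrative only: validate bufferSizeMB against half of spark.executor.memory,
// as suggested above. Neither the helper nor the bound exists in the PR.
final class BufferSizeValidation
{
    static void validate(int bufferSizeMB, SparkConf conf)
    {
        if (bufferSizeMB <= 0)
        {
            throw new IllegalArgumentException("bufferSizeMB must be positive: " + bufferSizeMB);
        }
        // getSizeAsMb parses size strings like "4g" into mebibytes
        long executorMemoryMB = conf.getSizeAsMb("spark.executor.memory", "1g");
        if (bufferSizeMB > executorMemoryMB / 2)
        {
            throw new IllegalArgumentException(String.format(
                "bufferSizeMB (%d) exceeds half of spark.executor.memory (%d MB)",
                bufferSizeMB, executorMemoryMB / 2));
        }
    }
}
```

As the reply below notes, a real bound would likely need to account for more than `spark.executor.memory` alone.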


I think, for now, we leave this as just a configuration option... Validating it against the Spark environment and picking a reasonable upper bound would take some experimentation (and would likely involve more than just the `executor.memory` setting, as there are other config settings that deal with memory, like memory overhead).

@yifan-c closed this Aug 9, 2023
@frankgh deleted the CASSANDRA-18692 branch February 14, 2024 19:50