KAFKA-5340: Batch splitting should preserve magic and transactional flag #3162
Conversation
@becketqin Maybe you can take a look at this?
@@ -238,7 +238,7 @@ public RecordsInfo info() {
         }
     }

-    public void setProducerState(long producerId, short producerEpoch, int baseSequence) {
+    public void setProducerState(long producerId, short producerEpoch, int baseSequence, boolean isTransactional) {
To clarify, the reason I had to change this is that we need to close the record builder in order to split it. At that point, we don't have a producerId yet, so if `isTransactional` is set to true, then `MemoryRecordsBuilder.close()` will raise an exception. To get around that, this change ensures that we always have a producerId when we set `isTransactional`.
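A minimal sketch of the invariant under discussion (hypothetical names, not the actual Kafka classes): if the transactional flag can only be set together with the producer state, `close()` can never see a transactional builder without a producerId.

```java
// Hypothetical sketch, not the real MemoryRecordsBuilder: the transactional
// flag travels with the producer id and epoch instead of being fixed at
// construction time, so the close-time check below cannot fire spuriously.
public class BuilderSketch {
    static final long NO_PRODUCER_ID = -1L;

    long producerId = NO_PRODUCER_ID;
    short producerEpoch = -1;
    boolean isTransactional = false;

    // Mirrors the new signature in this PR: isTransactional is passed at the
    // same time as the producer id and epoch.
    void setProducerState(long producerId, short producerEpoch,
                          int baseSequence, boolean isTransactional) {
        this.producerId = producerId;
        this.producerEpoch = producerEpoch;
        this.isTransactional = isTransactional;
    }

    // Stand-in for the sanity check close() performs: a transactional builder
    // without a producer id would raise an exception.
    boolean canClose() {
        return !(isTransactional && producerId == NO_PRODUCER_ID);
    }
}
```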
Is it worth adding a comment about this?
Yes, I think so.
I think this is a good general improvement, to set all this data right at the very end.
Thunk thunk = thunkIter.next();
if (batch == null) {
    batch = createBatchOffAccumulatorForRecord(record, splitBatchSize);
while (recordBatchIter.hasNext()) {
Not sure I understand this logic: we expect `memoryRecords.batches()` to only have one batch, but here we are expecting it to have many?
If the message format is v0 or v1, then we could have multiple batches (each record is a batch of size 1). Obviously the batch splitting is intended for batches with multiple records, but it felt a little awkward and unnecessary to restrict this function to only magic >= 2 or compression != NONE.
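As a rough illustration of the point above (an assumed helper, not Kafka code): with v0/v1 and no compression, each record iterates as its own batch, so `batches()` can legitimately yield many entries.

```java
// Illustrative sketch, not Kafka code: how many batches a set of records
// iterates as, depending on magic version and compression.
public class BatchCountSketch {
    static final byte MAGIC_VALUE_V2 = 2;

    static int batchCount(int numRecords, byte magic, boolean compressed) {
        // v2 always wraps records into a single batch, and compression wraps
        // older formats too; v0/v1 uncompressed records each stand alone.
        if (magic >= MAGIC_VALUE_V2 || compressed)
            return 1;
        return numRecords;
    }
}
```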
…ion before sending records
@guozhangwang Comments addressed. Please take another look.
… splitting batches
RecordBatch recordBatch = recordBatchIter.next();
if (recordBatch.magic() < MAGIC_VALUE_V2 && !recordBatch.isCompressed())
Should we add the check in `Sender.completeBatch()` as well, to not call split in this case? Otherwise, if the producer was sending uncompressed messages and one of the messages in a batch is too large, it seems the producer will not fire the callback with the correct exception.

This would probably be a rare case, because a big message will typically get sent in a dedicated batch if compression is none. But it is theoretically possible if the user configured the producer batch size to be larger than max.message.size.
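A hedged sketch of the guard being proposed (the names here are assumptions, not the actual Sender code): only attempt a split when the batch format supports it and there is more than one record to redistribute.

```java
// Hypothetical sketch of the proposed check in Sender.completeBatch():
// splitting only helps when the batch can actually be decomposed.
public class SplitGuardSketch {
    static final byte MAGIC_VALUE_V2 = 2;

    static boolean canSplit(byte magic, boolean compressed, int recordCount) {
        // v0/v1 uncompressed "batches" hold a single record by construction,
        // and a single-record batch cannot be made any smaller by splitting.
        return (magic >= MAGIC_VALUE_V2 || compressed) && recordCount > 1;
    }
}
```

When `canSplit` is false, the producer would fall back to failing the batch with the appropriate exception, as it did prior to KIP-126.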
Yes, I think that makes sense. I will update the patch.
@becketqin Actually, in my original patch, I modified this code to handle all cases. Perhaps it would be a little simpler to revert to that behavior?
OK, I think I'm just going to add the check in `Sender.completeBatch()` and follow the behavior prior to KIP-126 if it is the old message format without compression.
@@ -218,7 +226,7 @@ private ProducerBatch createBatchOffAccumulatorForRecord(Record record, int batc
             record.key(), record.value(), record.headers()), batchSize);
         ByteBuffer buffer = ByteBuffer.allocate(initialSize);
         MemoryRecordsBuilder builder = MemoryRecords.builder(buffer, magic(), recordsBuilder.compressionType(),
-                TimestampType.CREATE_TIME, 0L, recordsBuilder.isTransactional());
+                TimestampType.CREATE_TIME, 0L);
Just curious: it seems that the transactional flag here was the same as the parent batch's before the change. Wouldn't that preserve the transactional flag?
Yes, I attempted to fix this previously, but the logic was incorrect. The builder requires that the producerId and the producer epoch are both set if `isTransactional` is set to true. I felt like this was a useful sanity check, so I changed the logic to pass `isTransactional` at the same time that we set the producer id and epoch. Does that make sense?
@@ -232,6 +232,15 @@ public RuntimeException lastError() {
         return lastError;
     }

+    public synchronized boolean ensurePartitionAdded(TopicPartition tp) {
+        if (isInTransaction() && !partitionsInTransaction.contains(tp)) {
+            transitionToFatalError(new IllegalStateException("Attempted to dequeue a record batch to send " +
Would this ever happen, assuming we do not have bugs? If yes, then we should probably throw IllegalStateException directly to indicate a bug?
It should not happen normally, but if there is a bug in the code, then the result is basically a corrupted topic, so I felt it is worth having the check. Transitioning to the fatal error state ensures that the user will see the error and that no further progress can be made. If we just threw the exception, the Sender thread would simply die and the user might never see the error.
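A minimal sketch of the trade-off described above (hypothetical names, not the actual TransactionManager): the fatal-error state is latched and re-raised on the next user-facing call, whereas an exception thrown on the background Sender thread would be lost with it.

```java
// Hypothetical sketch: latching the error lets user-facing calls surface it
// later, instead of the error disappearing when the Sender thread dies.
public class FatalStateSketch {
    private RuntimeException lastError;

    void transitionToFatalError(RuntimeException e) {
        this.lastError = e; // latched: all further operations will fail
    }

    // Called on user-facing paths (e.g. send or commit): the user sees the
    // original error, and no further progress can be made.
    void maybeFailWithError() {
        if (lastError != null)
            throw lastError;
    }
}
```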
Makes sense. I was following the ordinary rule of using an RTE for any unexpected bug, but this is definitely better operationally.
     if (transactionManager != null)
         isTransactional = transactionManager.isInTransaction();
-    return MemoryRecords.builder(buffer, maxUsableMagic, compression, TimestampType.CREATE_TIME, 0L, isTransactional);
+    return MemoryRecords.builder(buffer, maxUsableMagic, compression, TimestampType.CREATE_TIME, 0L);
It seems the `builder()` function with `isTransactional` at line 377 in `MemoryRecords` is not externally used any more; could we remove it then?
     int maxRetries = 1;
-    String topic = "testSplitBatchAndSend";
+    String topic = tp.topic();
Nice cleanup!
LGTM.
Left a minor comment, but otherwise this looks good to me. Thanks!
RecordBatch recordBatch = recordBatchIter.next();
if (recordBatch.magic() < MAGIC_VALUE_V2 && !recordBatch.isCompressed())
    throw new IllegalArgumentException("Batch splitting cannot be used with non-compressed messages " +
This wording could be improved: "Batch splitting cannot be used with non-compressed messages, NOR with message format versions v0 and v1"
The rewording is not quite right. Batch splitting can be used for v0 and v1 if compression is enabled.
Pushed a fix to address @becketqin's comments. If there are no further comments, I will merge later this evening.

Thanks for the patch. LGTM.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Jiangjie Qin <becket.qin@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Closes #3162 from hachikuji/KAFKA-5340
(cherry picked from commit e4a6b50)
Signed-off-by: Jason Gustafson <jason@confluent.io>