[SPARK-24720][STREAMING-KAFKA] add option to align ranges with offset having records to support kafka transaction #21917
Conversation
@QuentinAmbard, thanks! I am a bot who has found some folks who might be able to help with the review: @tdas, @zsxwing and @koeninger
jenkins, ok to test
Test build #93803 has finished for PR 21917 at commit
Test build #94055 has finished for PR 21917 at commit
Test build #94056 has finished for PR 21917 at commit
Test build #94058 has finished for PR 21917 at commit
```scala
      val offsetAndCount = localRw.getLastOffsetAndCount(localOffsets(tp), tp, o)
      (tp, offsetAndCount)
    }
  }).collect()
```
What exactly is the benefit gained by doing a duplicate read of all the messages?
With this solution we don't read the data another time "just to support transactions".
If you use an offset range of [1,5), you will try to read offset 4 but won't receive any data. In this scenario you can't tell whether data is simply missing at 4 (which is ok) or whether you lost data because kafka is down (not ok). To deal with this, we first scan the offsets and stop at the last offset where we had data: in the example, instead of [1,5) we would go with [1,4), because offset 3 has data, so it's safe to stop there.
Does that make sense?
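To make that concrete, here is a minimal sketch of the trimming idea, assuming a plain Kafka consumer; `lastOffsetWithRecord` and `pollTimeoutMs` are illustrative names, not the PR's actual API:

```scala
import org.apache.kafka.clients.consumer.Consumer
import org.apache.kafka.common.TopicPartition

// Sketch: shrink [fromOffset, untilOffset) so the new exclusive upper bound
// sits just past the last offset we actually fetched a record for.
def lastOffsetWithRecord[K, V](
    consumer: Consumer[K, V],
    tp: TopicPartition,
    fromOffset: Long,
    untilOffset: Long,
    pollTimeoutMs: Long): Long = {
  consumer.seek(tp, fromOffset)
  var trimmedUntil = fromOffset // empty range until we see a record
  var done = false
  while (!done) {
    val it = consumer.poll(pollTimeoutMs).records(tp).iterator()
    var sawNew = false
    while (it.hasNext) {
      val r = it.next()
      if (r.offset() < untilOffset) {
        trimmedUntil = r.offset() + 1 // this offset is known to hold a record
        sawNew = true
      }
    }
    // Stop once a poll yields nothing new below untilOffset: either we have
    // reached the requested end, or the remaining offsets return no records.
    done = !sawNew || trimmedUntil >= untilOffset
  }
  trimmedUntil
}
```

With the [1,5) example above, polls that return records at offsets 1, 2 and 3 but nothing at 4 yield a trimmed range of [1,4).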
Still playing devil's advocate here: I don't think stopping at 3 in your example actually tells you anything about the cause of the gap in the sequence at 4. I'm not sure you can know that the gap is because of a transaction marker without a modified kafka consumer library. If the actual problem is that when allowNonConsecutiveOffsets is set we need to allow gaps even at the end of an offset range... why not just fix that directly? Master is updated to kafka 2.0 at this point, so we should be able to write a test for your original JIRA example of a partition consisting of 1 message followed by 1 transaction commit.
I'm not sure I understand your point. The cause of the gap doesn't matter; we just want to stop on an existing offset so that we are able to poll it. It could be a transaction marker, a transaction abort, or even just a temporary poll failure; it's not relevant in this case.
If the last offset in the range as calculated by the driver is 5, and on the executor all you can poll up to after a repeated attempt is 3, and the user already told you to allowNonConsecutiveOffsets... then you're done, no error. Why does it matter whether you do this logic when you're reading all the messages in advance and counting, or when you're actually computing? To put it another way: this PR is a lot of code change and refactoring; why not just change the logic of, e.g., how CompactedKafkaRDDIterator interacts with compactedNext?
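For contrast, a minimal sketch of that executor-side alternative, assuming a plain Kafka consumer; the real code path goes through Spark's cached consumer and has retry logic, both omitted here:

```scala
import scala.collection.mutable
import org.apache.kafka.clients.consumer.{Consumer, ConsumerRecord}
import org.apache.kafka.common.TopicPartition

// Sketch: an iterator that ends the range early, without error, when gaps
// are allowed and a poll returns nothing before untilOffset.
class GapTolerantIterator[K, V](
    consumer: Consumer[K, V],
    tp: TopicPartition,
    untilOffset: Long,
    allowNonConsecutiveOffsets: Boolean,
    pollTimeoutMs: Long) extends Iterator[ConsumerRecord[K, V]] {

  private val queue = mutable.Queue.empty[ConsumerRecord[K, V]]

  private def fill(): Unit = if (queue.isEmpty) {
    val it = consumer.poll(pollTimeoutMs).records(tp).iterator()
    while (it.hasNext) {
      val r = it.next()
      if (r.offset() < untilOffset) queue.enqueue(r)
    }
  }

  override def hasNext: Boolean = {
    fill()
    if (queue.nonEmpty) true
    else if (consumer.position(tp) >= untilOffset) false // consumed whole range
    else if (allowNonConsecutiveOffsets) false // trailing gap: done, no error
    else throw new IllegalStateException(s"Could not reach $untilOffset for $tp")
  }

  override def next(): ConsumerRecord[K, V] = {
    if (!hasNext) throw new NoSuchElementException
    queue.dequeue()
  }
}
```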
If you do it in advance, you'll change the range: for example, you read until 3 and don't get any extra results. Maybe it's because of a transaction offset, maybe another issue; it's ok in both cases.
By failed, you mean returned an empty collection after timing out, even though records should be available? You don't. You also don't know that it isn't just lost because kafka skipped a message. AFAIK from the information you have from a kafka consumer, once you start allowing gaps in offsets, you don't know. I understand your point, but even under your proposal you have no guarantee that the poll won't work in your first pass during RDD construction, and then fail on the executor during computation, right?
Have you tested comparing the results of consumer.endOffsets for consumers with different isolation levels? Your proposal might end up being the best approach anyway, just because of the unfortunate effect of StreamInputInfo and count, but I want to make sure we think this through.
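A throwaway probe along these lines could answer the isolation-level question; broker address and group id are assumptions, and the expectation (per the consumer javadoc) is that a read_committed consumer gets the last stable offset while read_uncommitted gets the log end offset:

```scala
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.ByteArrayDeserializer

// Compare endOffsets as seen by consumers with different isolation levels.
def endOffsetsFor(isolation: String, tps: Seq[TopicPartition]): Map[TopicPartition, Long] = {
  val props = new Properties()
  props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // assumption
  props.put(ConsumerConfig.GROUP_ID_CONFIG, s"probe-$isolation")
  props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, isolation) // "read_committed" or "read_uncommitted"
  props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[ByteArrayDeserializer].getName)
  props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[ByteArrayDeserializer].getName)
  val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
  try consumer.endOffsets(tps.asJava).asScala.map { case (tp, o) => tp -> o.longValue }.toMap
  finally consumer.close()
}
```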
```scala
if (nonConsecutive) {
  val localRw = rewinder()
  val localOffsets = currentOffsets
  context.sparkContext.parallelize(offsets.toList).mapPartitions(tpos => {
```
Because this isn't a kafka rdd, it isn't going to take advantage of preferred locations, which means it's going to create cached consumers on different executors.
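To illustrate the locality point with a self-contained toy: an RDD can pin each (topic, partition) to a host, which is what KafkaRDD does via its LocationStrategy so that cached consumers get reused; plain parallelize provides no such hint. `hostFor` is a hypothetical mapping:

```scala
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Toy RDD that pins each (topic, partition) element to a host so a per-host
// cached consumer would be reused by later batches.
class PinnedRDD(
    sc: SparkContext,
    parts: Array[(String, Int)],
    hostFor: ((String, Int)) => String) // hypothetical tp -> host mapping
  extends RDD[(String, Int)](sc, Nil) {

  override protected def getPartitions: Array[Partition] =
    parts.indices.map(i => new Partition { override def index: Int = i }).toArray

  override def compute(split: Partition, context: TaskContext): Iterator[(String, Int)] =
    Iterator.single(parts(split.index))

  override protected def getPreferredLocations(split: Partition): Seq[String] =
    Seq(hostFor(parts(split.index)))
}
```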
Are you suggesting I should create a new kafkaRDD instead, and consume from this RDD to get the last offset range?
```scala
}

/**
 * Similar to compactedStart but will return None if poll doesn't
```
Did you mean compactedNext?
```scala
  buffer.previous()
}

def seekAndPoll(offset: Long, timeout: Long): ConsumerRecords[K, V] = {
```
Is this used anywhere?
```diff
   val fromOffset: Long,
-  val untilOffset: Long) extends Serializable {
+  val untilOffset: Long,
+  val recordNumber: Long) extends Serializable {
```
Does MiMa actually complain about binary compatibility if you just make recordNumber count? It's just an accessor either way...
If so, and you have to do this, I'd name this recordCount consistently throughout. Number could refer to a lot of things that aren't counts.
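If MiMa does complain, the usual escape hatch in Spark is an entry in project/MimaExcludes.scala; a hypothetical sketch, since the exact problem type MiMa reports may differ:

```scala
import com.typesafe.tools.mima.core._

// Hypothetical exclusion for the changed OffsetRange constructor; adjust the
// problem type to whatever MiMa actually reports.
ProblemFilters.exclude[IncompatibleMethTypeProblem](
  "org.apache.spark.streaming.kafka010.OffsetRange.this")
```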
```scala
  private def records(offsets: Option[Long]*) = {
    offsets.map(o => o.map(new ConsumerRecord("topic", 0, _, "k", "v"))).toList
  }
}
```
These tests aren't really testing the actual scenario we care about (transaction markers at the end of an offset range), which should be directly testable now that kafka has been upgraded to 2.0
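A sketch of that fixture, using the standard transactional producer API (broker address, topic and transactional id are assumptions): one record, then a commit, so the commit marker occupies the last offset of the partition.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

val props = new Properties()
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "offset-range-test")
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)

val producer = new KafkaProducer[String, String](props)
producer.initTransactions()
producer.beginTransaction()
producer.send(new ProducerRecord("topic", "k", "v")) // record at offset 0
producer.commitTransaction() // control marker lands at offset 1
producer.close()
// An offset range computed from the log end offset (2) now ends on an offset
// for which no poll will ever return a record.
```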
Ok, that's interesting. My understanding was that if you successfully poll and get results, you are 100% sure that you don't lose anything. Do you have more details on that? Why would kafka skip a record while consuming?
endOffsets returns the last offset (same as seekToEnd). But you're right that the easiest solution for us would be to have something like a seekToLastRecord method instead. Maybe something we could also ask for?
Example report of skipped offsets in a non-compacted, non-transactional situation:
http://mail-archives.apache.org/mod_mbox/kafka-users/201801.mbox/%3CCAKWX9VXc1cDosqWwWjK3qmyy3SVvtmH+RJDrjyvsBeJSds8ewQ@mail.gmail.com%3E
I asked on the kafka list about ways to tell if an offset is a transactional marker. I also asked about endOffset alternatives, although I think that doesn't totally solve the problem (for instance, in cases where the batch size has been rate limited).
Recursively creating a Kafka RDD during creation of a Kafka RDD would need
a base case, but yeah, some way to have appropriate preferred locations.
For reference, the hunk under discussion, in external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala:

```diff
-      val fo = currentOffsets(tp)
-      OffsetRange(tp.topic, tp.partition, fo, uo)
+  /**
+   * Return the offset range. For non-consecutive offsets, the last offset must have a record.
+   * If offsets have missing data (transaction marker or abort), increases the
+   * range until we get the requested number of records or no more records.
+   * Because we have to iterate over all the records in this case,
+   * we also return the total number of records.
+   * @param offsets the target range we would like if offsets were continuous
+   * @return (totalNumberOfRecords, updated offset)
+   */
+  private def alignRanges(offsets: Map[TopicPartition, Long]): Iterable[OffsetRange] = {
+    if (nonConsecutive) {
+      val localRw = rewinder()
+      val localOffsets = currentOffsets
+      context.sparkContext.parallelize(offsets.toList).mapPartitions(tpos => {
```
SPARK-25005 actually has a far better solution to detect message loss. Will try to apply the same logic...
Based on my limited understanding, I think this PR will fix this issue: https://stackoverflow.com/q/48344055/2272910. We're facing the same issue and would love to see a solution in the Spark-Kafka streaming package.
Can one of the admins verify this patch?
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
Update with a better fix:
With this fix, the offsets are scanned to determine the ranges. We ensure that when we determine the range [fromOffset, untilOffset), untilOffset is always an offset with an existing record that we've been able to fetch at least once.
This logic is applied as soon as allowNonConsecutiveOffsets is enabled.
Since we scan all the records a first time, we use this to count them: OffsetRange now contains the number of records in each partition, and rdd.count() becomes a free operation, as sketched below.
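A simplified sketch of why the count becomes free; `CountedRange` is a stand-in for the extended OffsetRange, not the PR's actual class:

```scala
// Each range carries the record count computed during the driver-side scan,
// so counting the batch is a sum over ranges instead of a Spark job.
case class CountedRange(topic: String, partition: Int,
    fromOffset: Long, untilOffset: Long, recordNumber: Long)

def batchCount(ranges: Seq[CountedRange]): Long =
  ranges.map(_.recordNumber).sum
```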
The read_uncommitted isolation level is unsafe: untilOffset might become "empty" if the transaction is aborted just after the offset range creation. The same thing could happen if the untilOffset gets compacted (that's also a potential issue before this change).
How was this patch tested?
Unit tests for the offset scan. No integration test for transactions since the current kafka version doesn't support transactions. Tested against a custom streaming use case.