KAFKA-7548: KafkaConsumer should not throw away already fetched data for paused partitions (v2) #6988
Conversation
@hachikuji @ijuma Do either of you have some time to review this PR?

@rhauch looks like it will be useful for Kafka Connect too?

Thanks @gwenshap.

Retest this please

Hi @hachikuji. I'm just putting this on your radar again for a review. Thanks for picking it up.

Hi @gwenshap @hachikuji @rhauch, please let me know if you have any questions about the PR or how to reproduce the issue. Thanks.
hachikuji left a comment
Sorry for the delay. The approach makes sense. I had a couple ideas how to simplify it. Let me know what you think.
nit: could be package private
I have a few suggestions to simplify this a little bit:

- The `PartitionRecords` parses records in a streaming fashion. The constructor doesn't do any parsing, so there's no harm constructing it immediately when we find a `CompletedFetch` for a paused partition. So rather than leaving the completed fetch inside `completedFetches`, it seems like we could keep them all in the other queue (maybe `pausedCompletedFetches` instead of `parsedFetchesCache`)? As a matter of fact, I think we can change the type of `completedFetches` to `ConcurrentLinkedQueue<PartitionRecords>`.
- Below we have logic to basically iterate through the `parsedFetchesCache` and move all the partitions which are still paused to the end. Would it be simpler to leave the collection in the same order and …
- Once a partition is unpaused, does it make sense to just move it back to `completedFetches`? Then there is only a single collection that we are pulling the next records from.

So then the logic would be just to iterate through the paused data and check for unpaused partitions which can be moved over to `completedFetches`. Then everything is done as usual.
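The single-queue idea above can be sketched as a toy model. This is a hedged illustration, not the real `Fetcher` code: `PartitionData`, `pausedFetches`, and the method signatures here are hypothetical stand-ins, and records are simplified to integers.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentLinkedQueue;

// Toy sketch: paused fetches are parked in a side buffer, and on each
// fetchedRecords() call any partition that has been resumed is moved back
// into completedFetches so the normal draining logic applies to it.
class SingleQueueSketch {
    static class PartitionData {
        final String partition;
        final List<Integer> records;
        PartitionData(String partition, List<Integer> records) {
            this.partition = partition;
            this.records = records;
        }
    }

    final ConcurrentLinkedQueue<PartitionData> completedFetches = new ConcurrentLinkedQueue<>();
    final List<PartitionData> pausedFetches = new ArrayList<>();

    List<Integer> fetchedRecords(Set<String> pausedPartitions) {
        // Move any now-unpaused data back so the usual drain loop sees it.
        for (Iterator<PartitionData> it = pausedFetches.iterator(); it.hasNext(); ) {
            PartitionData pd = it.next();
            if (!pausedPartitions.contains(pd.partition)) {
                completedFetches.add(pd);
                it.remove();
            }
        }
        List<Integer> out = new ArrayList<>();
        PartitionData pd;
        while ((pd = completedFetches.poll()) != null) {
            if (pausedPartitions.contains(pd.partition))
                pausedFetches.add(pd);   // retain instead of discarding
            else
                out.addAll(pd.records);
        }
        return out;
    }
}
```

The point of the model is only that paused data survives a `fetchedRecords` call instead of being thrown away, which is the bug this PR fixes.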
Thanks for the review @hachikuji! I think it would be nice to have the single `completedFetches` of type `ConcurrentLinkedQueue<PartitionRecords>`. To do this we would have to change the fetch parsing workflow a little bit, but I'm not sure what ramifications this may have. I've made some assumptions below I hope you can confirm before I refactor the PR.
- In the callback for `sendFetches` we would immediately call `parseCompletedFetch` to create a `PartitionRecords` and add it to `completedFetches`, instead of calling `parseCompletedFetch` later in `fetchedRecords`. The javadoc comment on the `parseCompletedFetch` method suggests that at one point in the past this may have been the case: "The callback for fetch completion". If we move the call here I don't think there's any reason to have a `CompletedFetch` type at all.
- The callsite to `parseCompletedFetch` is wrapped in exception handling which discards the fetch in some scenarios when an exception is raised. If we move the callsite to `sendFetches` then I assume we would move the exception handling along with it. One issue is that in this exception handling we check to see if we've accumulated any parsed records to return to the user (`if (fetched.isEmpty())`), but since we haven't parsed any records in the `sendFetches` callback I'm not sure how to translate this logic, or if it's still necessary. EDIT: I looked at this again and the comment says this check is just there to make sure we don't lose data from a previous fetch, but that's not applicable when we move the callsite to `sendFetches` because no records have been parsed yet.
- In `parseCompletedFetch`, if the partition is unfetchable then no records are returned. Based on comments in this method this could be caused by a rebalance that happened while a fetch request was in flight, or when a partition is paused. We would need to change this so that the `PartitionRecords` are still created when a partition is paused, but not in other unfetchable scenarios.
I tried refactoring the PR with the assumptions I made above, but it began to reveal a lot of regression bugs when running `FetcherTest`. I think my idea about moving the `parseCompletedFetch` callsite to `sendFetches` seems to miss a lot of nuance accounted for in the tests.
I created a PR on my fork with the refactor: https://github.com/seglo/kafka/pull/3/files
@seglo Yeah, this is a little trickier than I suggested. The draft looks like it's heading in the right direction. One thought I had is to take the `parseCompletedFetch` logic and turn it into an `initializePartitionRecords` function. So when we first construct `PartitionRecords`, we consider it uninitialized. Then the logic in `fetchedRecords` could look something like this:
```java
nextInLineRecords = completedFetches.peek();
if (nextInLineRecords.notInitialized()) {
    try {
        initializePartitionRecords(nextInLineRecords);
    } catch (Exception e) {
        // existing exception handling
    }
}
// now check if nextInLineRecords has any data to fetch
```

The benefit is that the same logic is still executed from the same location. Would that help?
@hachikuji Yes, that suggestion helped, thanks. I no longer have any `FetcherTest` regression failures. I split up `parseCompletedFetch` into:
- A `parseCompletedFetch` that always returns a `PartitionRecords`, but does no other validation. Called from `sendFetches`.
- An `initializePartitionRecords` which does all the validation that `parseCompletedFetch` did before. Called from `fetchedRecords`, where exceptions are handled.
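The split described here can be illustrated with a hedged toy model. Everything below is a hypothetical stand-in for the real `Fetcher` internals: `onFetchResponse` plays the role of the `sendFetches` callback, records are simplified to integers, and validation is reduced to a single null check.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Toy sketch of lazy initialization: the send-side callback only wraps raw
// fetch data (no validation), and fetchedRecords() runs the validation
// lazily, from the place where exceptions were already being handled.
class LazyInitSketch {
    static class PartitionRecords {
        final List<Integer> rawRecords;
        boolean initialized = false;
        PartitionRecords(List<Integer> rawRecords) { this.rawRecords = rawRecords; }
        boolean notInitialized() { return !initialized; }
    }

    final ConcurrentLinkedQueue<PartitionRecords> completedFetches = new ConcurrentLinkedQueue<>();

    // Stand-in for the new parseCompletedFetch: wrap only, no validation.
    void onFetchResponse(List<Integer> rawRecords) {
        completedFetches.add(new PartitionRecords(rawRecords));
    }

    // Stand-in for initializePartitionRecords: validation happens here.
    void initializePartitionRecords(PartitionRecords pr) {
        if (pr.rawRecords == null)
            throw new IllegalStateException("invalid fetch");
        pr.initialized = true;
    }

    List<Integer> fetchedRecords() {
        PartitionRecords next = completedFetches.peek();
        if (next == null)
            return List.of();
        if (next.notInitialized())
            initializePartitionRecords(next); // may throw; caller handles it
        completedFetches.poll();
        return next.rawRecords;
    }
}
```

The design point is that a `PartitionRecords` can sit in the queue indefinitely (e.g. while its partition is paused) without any validation having been committed to yet.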
`completedFetches` is now used not just as a queue, but as a cache. The queue usage semantics aren't really applicable anymore (we no longer peek or poll), but since we want the ordering I don't think there's a better data structure in `java.util.concurrent` to use.
I've updated this PR.
I made an update to cache each paused completed fetch for the lifetime of a call to `fetchedRecords` and then add them back at the end, so that `completedFetches.poll()` can be used instead of removing by object reference with `completedFetches.remove(records)`. I'm not sure if this is any more efficient, but it preserves the original implementation of `fetchedRecords` better. The change is in this commit: 7cb7943.
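This retain-and-re-append pattern can be sketched as follows. It is a minimal model under assumed names (`Fetch`, `fetchedRecords`), not the actual commit: paused fetches are held in a local list for the duration of one call and re-added to the queue at the end, so the drain loop can keep using `poll()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentLinkedQueue;

// Toy sketch: drain the queue with poll(), divert paused data to a local
// list, and put that list back on the queue before returning.
class RetainPausedSketch {
    record Fetch(String partition, List<Integer> records) {}

    static List<Integer> fetchedRecords(ConcurrentLinkedQueue<Fetch> completedFetches,
                                        Set<String> paused) {
        List<Integer> out = new ArrayList<>();
        List<Fetch> pausedFetches = new ArrayList<>();
        Fetch f;
        while ((f = completedFetches.poll()) != null) {
            if (paused.contains(f.partition()))
                pausedFetches.add(f);       // retain for a later call
            else
                out.addAll(f.records());
        }
        completedFetches.addAll(pausedFetches); // put paused data back
        return out;
    }
}
```

Compared with `remove(Object)`, this avoids scanning the queue for a reference match and keeps the original poll-based control flow intact.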
Retest this please

It's interesting that all the tests ran with no failures, but the check still failed.
@seglo Thanks for the updates. I think there might be some redundant checking. What do you think about this? hachikuji@d46d000. I think the only other thing is that it seems like we have a chance to consolidate …
@hachikuji Good find. I wish I had seen that! I included your commit. I can follow up with another PR for consolidating …
hachikuji left a comment
Thanks, I think this is just about ready. Just a couple more comments.
```java
        assertTrue(client.requests().isEmpty());
    }

    @Test
```
Do we have a test case which covers the case where the user seeks to a new offset while a partition is paused with data available to return? In this case, we expect the data to be discarded when the partition is resumed.
I did a pass over `FetcherTest` and didn't see this scenario exactly. I added another test called `testFetchDiscardedAfterPausedPartitionResumedAndSeekedToNewOffset` (a bit of a mouthful). Does it cover the scenario you were thinking of?
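The semantics that test exercises can be modeled in miniature. This is a hedged toy model of one partition's state, not the real `FetcherTest` or consumer API: the names and the single-buffer simplification are assumptions, and the only claim it encodes is that data buffered while paused is discarded if a `seek()` moved the position before resume.

```java
import java.util.List;

// Toy model: buffered data is returned after resume only if its fetch
// offset still matches the consumer's current position for the partition.
class SeekWhilePausedModel {
    long position;                 // consumer position for one partition
    Long bufferedOffset;           // start offset of buffered records, or null
    List<Integer> bufferedRecords; // records buffered while paused
    boolean paused;

    void bufferFetch(long offset, List<Integer> records) {
        bufferedOffset = offset;
        bufferedRecords = records;
    }
    void pause() { paused = true; }
    void resume() { paused = false; }
    void seek(long offset) { position = offset; }

    // Returns buffered records only if the partition is not paused and the
    // buffer still lines up with the position; otherwise discards them.
    List<Integer> poll() {
        if (paused || bufferedOffset == null)
            return List.of();
        List<Integer> out =
            bufferedOffset == position ? bufferedRecords : List.of();
        bufferedOffset = null;
        bufferedRecords = null;
        return out;
    }
}
```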
This could be done as a separate PR, but would it be helpful to have a benchmark that exercises this use case? The perf improvement could be significant from the previous discussion and it would be nice to be able to quantify that.

@ijuma I can take a crack at implementing a performance test. Before I created the PR I created a small test app that I measured before and after the patch. You can find a reference to the test app and the Grafana dashboard with the comparison in my update of the original Jira issue KAFKA-7548.

That's great!
hachikuji left a comment
LGTM. Thanks for the contribution! I will merge after the build completes.
Thanks for your review and advice, @hachikuji. It was fun working on this PR. It gave me a great opportunity to dig into the Consumer internals. I'll follow up with you soon on some of the loose ends.

Retest this please

The two failures are known to be flaky. I will go ahead and merge.
This is an updated implementation of #5844 by @MayureshGharat (with Mayuresh's permission).
I've reviewed the original PR's feedback from @hachikuji and reimplemented this solution to add completed fetches that belong to paused partitions back to the queue. I also rebased against the latest trunk which caused more changes as a result of subscription event handlers being removed from the fetcher class.
You can find more details in my updated notes in the original Jira issue KAFKA-7548.