# KAFKA-2978: consumer stops fetching when consumed and fetch positions get out of sync (#666)

Changes from 2 commits.
```diff
@@ -214,14 +214,6 @@ public Set<String> groupSubscription() {
         return this.groupSubscription;
     }

-    public Long fetched(TopicPartition tp) {
-        return assignedState(tp).fetched;
-    }
-
-    public void fetched(TopicPartition tp, long offset) {
-        assignedState(tp).fetched(offset);
-    }
-
     private TopicPartitionState assignedState(TopicPartition tp) {
         TopicPartitionState state = this.assignment.get(tp);
         if (state == null)
```
```diff
@@ -270,20 +262,20 @@ public boolean partitionsAutoAssigned() {
         return !this.subscription.isEmpty();
     }

-    public void consumed(TopicPartition tp, long offset) {
-        assignedState(tp).consumed(offset);
+    public void position(TopicPartition tp, long offset) {
+        assignedState(tp).position(offset);
     }

-    public Long consumed(TopicPartition tp) {
-        return assignedState(tp).consumed;
+    public Long position(TopicPartition tp) {
+        return assignedState(tp).position;
     }

     public Map<TopicPartition, OffsetAndMetadata> allConsumed() {
         Map<TopicPartition, OffsetAndMetadata> allConsumed = new HashMap<>();
         for (Map.Entry<TopicPartition, TopicPartitionState> entry : assignment.entrySet()) {
             TopicPartitionState state = entry.getValue();
             if (state.hasValidPosition)
-                allConsumed.put(entry.getKey(), new OffsetAndMetadata(state.consumed));
+                allConsumed.put(entry.getKey(), new OffsetAndMetadata(state.position));
         }
         return allConsumed;
     }
```

Review discussion on the new `position(TopicPartition, long)` setter:

- So, it seems like we could have also solved this problem by fixing the places where the […]
- I agree this is the main question with this simplification. However, keep in mind that we cannot send more than one fetch request at a time because we don't know the following offset to fetch from. So to get any advantage from allowing the fetched position to get farther than a single fetch ahead of the consumed position, we would need to initiate fetches after the last fetch returned from the server, but before the results were returned to the user. I don't see a lot of opportunity for optimization here, but I could be missing something. I think instead the way for consumers to tune the amount of data fetched is with […]
- Agreed, I was trying to figure out under what conditions it would be useful to allow for multiple fetch requests while you haven't asked for the data back. I can think of two cases: […] Both are probably fairly niche, and you probably need to hit an extreme case to warrant adjusting consumer settings much, but I don't think we have a way to control it at the moment.
- These kinds of cases are probably best left to be handled in user code. The user can collect the messages in their own batch until they have enough data to process. Until we have a really compelling use case, I think I prefer the simpler approach in this patch, since maintaining […] There is one unfortunate side effect of this change which doesn't appear to impact current code, but should be mentioned anyway. If you call […]
- Another thing to consider is the memory limit on the consumer (we currently do not have such management as we did in the producer, but it is tracked in KAFKA-2045). I agree that pre-fetch would be helpful with a bursty network or, more generally speaking, you would probably want to get some data in each selector.poll() even if there is data buffered for all partitions, as long as the buffered data does not exceed the memory limit.
- It's also worth saying that if we need to bring back this distinction in order to implement prefetching improvements, it's straightforward to bring it back. I'm also in favour of simplifying for now.
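The key argument in the thread above is that the next fetch request cannot be built until the previous response arrives, because its start offset comes from that response, so a separate "fetched" cursor can never run more than one fetch ahead of the "consumed" cursor. A minimal standalone sketch of that constraint (hypothetical class and method names, not the Kafka source):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class SingleFetchDemo {
    // Stand-in for the broker: returns up to `max` record offsets starting at `offset`.
    static List<Long> fetch(long offset, int max) {
        return LongStream.range(offset, offset + max).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A single cursor suffices: it is both the next offset to fetch
        // and the next offset handed to the user.
        long position = 0;
        for (int i = 0; i < 3; i++) {
            List<Long> records = fetch(position, 5);
            // The next request's offset depends on this response, so only
            // one fetch per partition can ever be in flight.
            position = records.get(records.size() - 1) + 1;
        }
        System.out.println(position); // prints 15
    }
}
```

This is why the patch can drop the fetched/consumed distinction without giving up any prefetching that the current fetch protocol actually permits.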
```diff
@@ -356,8 +348,7 @@ public ConsumerRebalanceListener listener() {
     }

     private static class TopicPartitionState {
-        private Long consumed;  // offset exposed to the user
-        private Long fetched;   // current fetch position
+        private Long position;
         private OffsetAndMetadata committed;  // last committed position

         private boolean hasValidPosition; // whether we have valid consumed and fetched positions

@@ -367,8 +358,7 @@ private static class TopicPartitionState {

         public TopicPartitionState() {
             this.paused = false;
-            this.consumed = null;
-            this.fetched = null;
+            this.position = null;
             this.committed = null;
             this.awaitingReset = false;
             this.hasValidPosition = false;

@@ -378,29 +368,21 @@ public TopicPartitionState() {
         private void awaitReset(OffsetResetStrategy strategy) {
             this.awaitingReset = true;
             this.resetStrategy = strategy;
-            this.consumed = null;
-            this.fetched = null;
+            this.position = null;
             this.hasValidPosition = false;
         }

         private void seek(long offset) {
-            this.consumed = offset;
-            this.fetched = offset;
+            this.position = offset;
             this.awaitingReset = false;
             this.resetStrategy = null;
             this.hasValidPosition = true;
         }

-        private void fetched(long offset) {
+        private void position(long offset) {
             if (!hasValidPosition)
                 throw new IllegalStateException("Cannot update fetch position without valid consumed/fetched positions");
-            this.fetched = offset;
-        }
-
-        private void consumed(long offset) {
-            if (!hasValidPosition)
-                throw new IllegalStateException("Cannot update consumed position without valid consumed/fetched positions");
-            this.consumed = offset;
+            this.position = offset;
         }

         private void committed(OffsetAndMetadata offset) {
```
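The invariant the simplified state enforces is that `position` may only be advanced after a `seek` (or offset reset) has established a valid starting point. A self-contained sketch mirroring that behavior (simplified names, not the actual Kafka class):

```java
// Hypothetical condensed version of TopicPartitionState after the patch:
// one `position` field replaces the old consumed/fetched pair.
public class TopicPartitionStateSketch {
    private Long position = null;
    private boolean hasValidPosition = false;

    void seek(long offset) {
        // Seeking establishes a valid position directly.
        this.position = offset;
        this.hasValidPosition = true;
    }

    void position(long offset) {
        // Advancing is only legal once a valid position exists.
        if (!hasValidPosition)
            throw new IllegalStateException("Cannot update position without a valid starting position");
        this.position = offset;
    }

    public static void main(String[] args) {
        TopicPartitionStateSketch state = new TopicPartitionStateSketch();
        boolean threw = false;
        try {
            state.position(42); // advancing before seek() must fail
        } catch (IllegalStateException e) {
            threw = true;
        }
        state.seek(0);
        state.position(42); // fine after seek
        System.out.println(threw + " " + state.position); // prints "true 42"
    }
}
```

Collapsing the two cursors also removes two of the four places where this guard had to be duplicated in the original class.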
Review discussion elsewhere in the diff, on the fetcher's logging:

- I think we might want to consider dropping some of these `log.debug`s to `log.trace`. Some of the logs in error conditions make sense at `debug`, but logging every fetch request and response at `debug` might make changing from `info` to `debug` a bit overwhelming.
- Trace level is fine with me.
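The split being proposed is: per-request chatter at trace, error-path messages at debug, so that turning on debug stays readable. An illustrative sketch of that split using `java.util.logging` (where `FINE` roughly corresponds to debug and `FINEST` to trace) rather than Kafka's actual slf4j logger:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class FetchLogLevels {
    private static final Logger log = Logger.getLogger("fetcher");

    public static void main(String[] args) {
        log.setLevel(Level.FINE); // "debug" enabled, "trace" still off

        // Happens on every fetch: keep at trace so debug output stays readable.
        log.finest("Sending fetch request for partition 0 at offset 15");

        // Error condition: worth seeing as soon as debug is enabled.
        log.fine("Fetch offset 15 is out of range, resetting offset");

        System.out.println(log.isLoggable(Level.FINEST)); // prints "false"
    }
}
```

With slf4j the same idea is `log.trace(...)` for the per-request lines and `log.debug(...)` for the error paths, exactly as the comment suggests.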