KAFKA-16156: beginningOrEndOffsets does not need to build an OffsetAndTimestamps object upon completion #15525
Conversation
Hey @lucasbru - Would it be possible for you to review this PR? Thanks!
mostly looking good to me. I'm just wondering if we could get by without using generics here, as there are just two cases.
Hey @lucasbru - Thanks for taking the time to review this PR. Let me know if there's anything to add to the PR.
Thanks for the update. One Q about timeout handling (sorry, didn't notice it before). Otherwise looking good to me.
Map<TopicPartition, OffsetAndTimestamp> offsetAndTimestampMap = applicationEventHandler.addAndGet(
// shortcut the request if the timeout is zero.
if (timeout.isZero()) {
Sorry, only noticed this now, but the original consumer seems to send a list offset request even if the timeout is 0, and you are specifically introducing code to avoid that. Isn't that going against what Kirk is trying to achieve? cc @kirktrue
I've tried to dig into this a bit. So short-cutting is definitely the right thing to do, since otherwise we'll run into a timeout exception. The old consumer is a bit weird in that it fires the list offset request but never returns a result. I also couldn't find a cache that is influenced by the list offsets request, so what's the point of sending the request?
Replicating the old consumer behavior would mean creating the event for the background thread, but not waiting for the result. We can consider changing the behavior here, but let's make sure we do it consciously.
Hey @lucasbru - It seems like one of the handlers would also update the subscription state upon completion. See the snippet below:
public void onSuccess(ListOffsetResult value) {
    synchronized (future) {
        result.fetchedOffsets.putAll(value.fetchedOffsets);
        remainingToSearch.keySet().retainAll(value.partitionsToRetry);
        offsetFetcherUtils.updateSubscriptionState(value.fetchedOffsets, isolationLevel);
    }
}
I think addAndGet seems to be sufficient to handle such logic so I'll revert this code.
Scratch the previous comment - addAndGet actually doesn't. We will need to explicitly return an empty result. See the code change.
@lucasbru - Thanks again for reviewing the PR. Sorry about the misinterpretation of the short-circuiting logic; I've updated the beginningOrEndOffsets API here. It seems like the right thing to do is to still send out the request but return immediately for a zero timeout (a bit strange because it does throw a timeout when time runs out, which seems inconsistent).
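The zero-timeout behavior being discussed can be sketched roughly as follows. This is a hedged illustration, not the actual AsyncKafkaConsumer code: the class and parameter names are hypothetical, and the real implementation goes through the application event handler and a completable event rather than a plain queue.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch: the event is still handed to the background thread, so any
// side effects of its completion handler can still happen, but for a
// zero timeout the caller returns an empty result immediately instead
// of blocking on the event's future.
class ZeroTimeoutSketch {
    static <K> Map<K, Long> beginningOffsets(Queue<Object> backgroundQueue,
                                             Object listOffsetsEvent,
                                             Duration timeout) {
        backgroundQueue.add(listOffsetsEvent); // the request is still sent
        if (timeout.isZero()) {
            return Collections.emptyMap();     // ...but we do not wait for it
        }
        // non-zero timeout: block on the event's future (elided in this sketch)
        throw new UnsupportedOperationException("blocking path elided in this sketch");
    }
}
```

This also illustrates the inconsistency noted above: a zero timeout silently returns empty, while a very small non-zero timeout would block and then throw a timeout exception.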
- public class ListOffsetsEvent extends CompletableApplicationEvent<Map<TopicPartition, OffsetAndTimestamp>> {
+ public class ListOffsetsEvent extends CompletableApplicationEvent<Map<TopicPartition, Long>> {
Kind of a general comment looking for simplification: couldn't we just have an internal new class OffsetAndTimestampInternal (better named) that allows negatives and knows how to build an OffsetAndTimestamp? It seems to solve the problem we have without having to split ListOffsets into 2 events, with separate paths for beginning/endOffsets and offsetsForTimes, when in reality they have everything in common except for the object we use to encapsulate the result (same result). These new split paths leak down to the OffsetsManager event, when in reality, at the request/response level the manager is responsible for, everything is the same for both paths. With this approach the change would only be at the API level, on the consumer, where the result of the event would build the map with Longs for beginning/end, or the map with OffsetAndTimestamp for offsetsForTimes (the data is the same, we just need to change how we return it).
thanks, sounds like a good idea to me.
I'm personally not concerned about having two events, because they are very simple. The alternative is to have a common code path that carries a requiresTimestamp boolean to differentiate behavior again, which isn't really any simpler. But I agree there is a certain amount of code duplication here that we could eliminate using your approach @lianetm, so I'm not against it.
Yes, the behavior of the existing consumer is a bit curious, but it's not the only place where a zero duration is treated differently from 0.000001s. Either way, we probably have to do it this way for compatibility. This part looks good to me now.
@lucasbru - If I'm not mistaken, the current implementations of both beginningOrEndOffsets and offsetsForTimes need to send out a request upon getting a ZERO duration. It seems like both code paths invoke this logic.
But offsetsForTimes in the AsyncConsumer seems to short-circuit it here:
Here is the ticket: https://issues.apache.org/jira/browse/KAFKA-16433
@philipnee Okay, thanks for creating the ticket. Not sure if it's blocker priority though. If it's a quick thing, you could address it in this PR. Are you going to implement Lianet's suggestion?
Hi @lucasbru - Let me address Lianet's comment in this PR and have a separate PR for the behavior inconsistency, as it does require some changes to the unit tests.
@@ -946,7 +956,7 @@ public void testOffsetsForTimesWithZeroTimeout() {
@Test
public void testWakeupCommitted() {
    consumer = newConsumer();
    final HashMap<TopicPartition, OffsetAndMetadata> offsets = mockTopicPartitionOffset();
just cleaning up.
@lucasbru - Thanks for taking the time to review this PR. It is ready for another pass.
Map<TopicPartition, OffsetAndTimestampInternal> offsetAndTimestampMap;
if (timeout.isZero()) {
    applicationEventHandler.add(listOffsetsEvent);
So if I get it right, we are intentionally leaving this? Generating an event to get offsets, when in the end we return right away without waiting for a response? I do get that the old consumer does it, and I could be missing the purpose of it, but it seems to me an unneeded request, even considering the side effect of the onSuccess handler. The handler just updates the positions to reuse the offsets it just retrieved, and it does make sense to reuse the result when we do need to make a request, but I wouldn't say we need to generate an unneeded event/request just for that when the user requested offsets with max-time-to-wait=0.
In any case, if we prefer to keep this, I would suggest 2 things:
- add a comment explaining why (the handler), because it looks like a weird overhead to add the event and return,
- be consistent and generate the event also in the case of offsetsForTimes before the early return (ln 1104). In the case of the old consumer it's common logic, so both paths, offsetsForTimes and beginning/endOffsets, do the same request+return.
Hi @lianetm, thanks for the comment. There's a ticket to align the behavior of the two APIs per your suggestions there. The plan is to do that in a separate PR: https://issues.apache.org/jira/browse/KAFKA-16433
Back to your first comment: it is not immediately obvious why people use these two APIs with a zero timeout. The only sensible thing it does is update the local high watermark, as you mentioned. I think it is worth addressing this ambiguity after the 4.0 release, so I'll leave a comment per your request.
Thanks for the explanation! Totally ok to tackle it with that separate Jira.
import java.util.Optional;

/**
 * Internal representation of {@link OffsetAndTimestamp}.
I would add : Internal representation of {@link OffsetAndTimestamp} that allows negative offsets
Timestamps I assume.
Uhm... what OffsetAndTimestamp does not allow is negative offsets here, and that's the requirement this new one is removing. Am I missing something?
it's actually both! he he, so let's maybe add negative offsets and timestamps
I think the problem is a negative timestamp in the response causing org.apache.kafka.common.KafkaException: java.lang.IllegalArgumentException: Invalid negative timestamp. More specifically, this is the part that was complaining:
if (timestamp < 0)
    throw new IllegalArgumentException("Invalid negative timestamp");
Yes, agree that's the failure we noticed in the sys tests, but conceptually we're creating a new OffsetAndTimestampInternal class that is the same as the existing OffsetAndTimestamp, with the only difference that the former does not throw on negative offsets or negative timestamps, right? So for the class doc it makes sense to mention it.
static Map<TopicPartition, OffsetAndTimestamp> buildOffsetsForTimesResult(final Map<TopicPartition, Long> timestampsToSearch,
                                                                          final Map<TopicPartition, ListOffsetData> fetchedOffsets) {
    HashMap<TopicPartition, OffsetAndTimestamp> offsetsByTimes = new HashMap<>(timestampsToSearch.size());

static <T> Map<TopicPartition, T> buildListOffsetsResult(
This generic buildListOffsetsResult is currently only being used from buildOffsetsForTimesResult; was the intention to use it also from buildOffsetsForTimeInternalResult?
good catch.
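The generic-builder idea being reviewed here can be sketched as follows. This is a hedged illustration with hypothetical names (plain String keys and a long[] pair stand in for TopicPartition and ListOffsetData): one method serves both APIs by taking a function that maps the raw (offset, timestamp) pair to the per-API result type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Sketch: a single generic builder for both list-offsets result shapes.
// beginning/endOffsets can map to a plain Long, while offsetsForTimes
// can map to an OffsetAndTimestamp-style object.
class ListOffsetsResultSketch {
    static <T> Map<String, T> buildListOffsetsResult(Map<String, long[]> fetchedOffsets,
                                                     BiFunction<Long, Long, T> resultMapper) {
        Map<String, T> result = new HashMap<>(fetchedOffsets.size());
        // data[0] = offset, data[1] = timestamp (may be -1 for beginning/end)
        fetchedOffsets.forEach((tp, data) -> result.put(tp, resultMapper.apply(data[0], data[1])));
        return result;
    }
}
```

The beginning/endOffsets path would pass `(offset, ts) -> offset`, never touching the possibly-negative timestamp, while offsetsForTimes would pass a constructor reference for its result object.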
@ParameterizedTest(name = TestInfoUtils.TestWithParameterizedQuorumAndGroupProtocolNames)
@MethodSource(Array("getTestQuorumAndGroupProtocolParametersAll"))
def testSubscribeAndCommitSync(quorum: String, groupProtocol: String): Unit = {
We have split the consumer tests into separate files grouped by feature, and there is now one PlaintextConsumerCommitTest; I would expect this test to go there.
I think this is an error from the rebase, so this should be removed from the PR. Thanks for catching it.
@ParameterizedTest(name = TestInfoUtils.TestWithParameterizedQuorumAndGroupProtocolNames)
@MethodSource(Array("getTestQuorumAndGroupProtocolParametersAll"))
def testTimestampsToSearch(quorum: String, groupProtocol: String): Unit = {
Including the name of the function we're testing (offsetsAndTimestamps) would probably make the test name clearer... maybe something like testOffsetsAndTimestampsTargetTimestamps?
Maybe testFetchOffsetsForTime, which already implies searching at given timestamps.
@@ -304,6 +304,60 @@ class PlaintextConsumerCommitTest extends AbstractConsumerTest {
    consumeAndVerifyRecords(consumer = otherConsumer, numRecords = 1, startingOffset = 5, startingTimestamp = startingTimestamp)
}

@ParameterizedTest(name = TestInfoUtils.TestWithParameterizedQuorumAndGroupProtocolNames)
@MethodSource(Array("getTestQuorumAndGroupProtocolParametersAll"))
def testEndOffsets(quorum: String, groupProtocol: String): Unit = {
Oh, this one (and the one below) relate to partition offsets, not committed offsets, so I would say they need to stay in the PlaintextConsumer, where you had them (I was only suggesting moving testSubscribeAndCommitSync here, because it relates to committed offsets).
Sorry - I wasn't looking at it carefully. Putting things back in their original place.
@@ -808,16 +808,55 @@ class PlaintextConsumerTest extends BaseConsumerTest {

@ParameterizedTest(name = TestInfoUtils.TestWithParameterizedQuorumAndGroupProtocolNames)
@MethodSource(Array("getTestQuorumAndGroupProtocolParametersAll"))
def testSubscribeAndCommitSync(quorum: String, groupProtocol: String): Unit = {
Is this one being removed intentionally? The suggestion was only to move it to the PlaintextConsumerCommitTest file, where all tests related to committing offsets now live. Ok for me if you think it's not worth keeping, but just making sure it's intentional.
This appears in the PR because I didn't rebase correctly. You've actually moved the test here:
kafka/core/src/test/scala/integration/kafka/api/PlaintextConsumerCommitTest.scala, line 261 in 21479a3:
def testSubscribeAndCommitSync(quorum: String, groupProtocol: String): Unit = {
All good then, just wanted to make sure we're not losing it.
Hey @philipnee, thanks for the updates, just one minor comment left above.
Thanks for the changes @philipnee! Test failures seem unrelated. LGTM.
A subtle difference in the behavior of the two APIs causes the failures with Invalid negative timestamp. In this PR, the list offsets response is processed differently based on the API. For beginningOffsets/endOffsets, the offset response is returned directly. For offsetsForTimes, an OffsetAndTimestamp object is constructed for each requested TopicPartition before being returned. The reason is that for beginningOffsets and endOffsets we expect a -1 timestamp in the response, which subsequently causes the invalid timestamp exception because the original code tried to construct an OffsetAndTimestamp object upon returning.
In this PR, the following missing tasks are added: