Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-13794: Follow up to fix comparator #12006

Merged
merged 2 commits into from
Apr 9, 2022
Merged

Conversation

ddrid
Copy link
Contributor

@ddrid ddrid commented Apr 7, 2022

Follow up to original PR #11991

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@ddrid
Copy link
Contributor Author

ddrid commented Apr 7, 2022

Hi @artemlivshits, please take a look if you have time

@ddrid
Copy link
Contributor Author

ddrid commented Apr 7, 2022

also cc @hachikuji

Copy link
Contributor

@showuon showuon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ddrid , thanks for the PR. Left a comment.

@@ -187,7 +187,7 @@ private void startSequencesAtBeginning(TopicPartition topicPartition, ProducerId
private static final Comparator<ProducerBatch> PRODUCER_BATCH_COMPARATOR = (b1, b2) -> {
if (b1.baseSequence() < b2.baseSequence()) return -1;
else if (b1.baseSequence() > b2.baseSequence()) return 1;
else return b1.equals(b2) ? 0 : 1;
else return b1.equals(b2) ? 0 : Integer.compare(b1.hashCode(), b2.hashCode());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should override the equals and hashcode method in ProducerBatch class? Otherwise, the hashcode result could have possibility to be identical and makes the original issue happened again.

Copy link
Contributor Author

@ddrid ddrid Apr 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @showuon, thanks for you comment. I found some trouble when adding equals method in ProducerBatch. The only two final fields I can use is createdMs and topicPartition, I don't think it make sense if I use these two fields since they usually have duplicates. Do you have any suggestions? Thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProducerBatch doesn't override hashCode, so a default implementation is used. It's not fully specified what the default implementation of hashCode has to return, but it looks like the suggested implementation is to return a value based on the object address, so it should be different for different objects.
BTW, we could probably just do

   else return Integer.compare(b1.hashCode(), b2.hashCode());

if the objects are equal, their hash code must be equal too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@artemlivshits done, thanks for your suggestion

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. Thanks for the update.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why do we have to limit ourselves to these 3 options? We can do the equals check and then do the hashCode thing if unequal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If equals is true, then hash codes must be equal too, so comparing hashCode for equal objects would return 0 in that case. If equals is false, then we do hashCode comparison. If we fold these 2 cases, then if we just do hashCode comparison, the result is the same as comparing equals first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, we can't rely on hashcode if equals is false either. I'll take a look at the code a bit later, but it should be possible to avoid the latent bug here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the release perspective, it improves over the previous release, so for now we could take this as is and file a bug to work on a more precise solution in a future release. We could either add a unique long id (for every new object just increment a static atomic and get a new value) or instead of a producer batch use a HashSet of batches.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that there are a couple of simple solutions, see #12096.

@ddrid ddrid requested a review from showuon April 8, 2022 01:36
@artemlivshits
Copy link
Contributor

LGTM, thank you for the fix.

Copy link
Contributor

@showuon showuon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@showuon
Copy link
Contributor

showuon commented Apr 9, 2022

Failed tests are unrelated

Build / ARM / kafka.network.ConnectionQuotasTest.testListenerConnectionRateLimitWhenActualRateAboveLimit()

@showuon showuon merged commit 9596c7b into apache:trunk Apr 9, 2022
showuon pushed a commit that referenced this pull request Apr 9, 2022
In comparator, objects that are not equal need to have a stable order otherwise, binary search may not find the objects. Improve the producer batch comparator

Reviewers: Luke Chen <showuon@gmail.com>
showuon pushed a commit that referenced this pull request Apr 9, 2022
In comparator, objects that are not equal need to have a stable order otherwise, binary search may not find the objects. Improve the producer batch comparator

Reviewers: Luke Chen <showuon@gmail.com>
showuon pushed a commit that referenced this pull request Apr 9, 2022
In comparator, objects that are not equal need to have a stable order otherwise, binary search may not find the objects. Improve the producer batch comparator

Reviewers: Luke Chen <showuon@gmail.com>
@showuon
Copy link
Contributor

showuon commented Apr 9, 2022

Cherry-pick back to 3.2, 3.1, 3.0 as #11991 did.

@dajac
Copy link
Contributor

dajac commented Apr 9, 2022

@tombentley We might want to get this one into 3.1.1 to fix the previous fix.

@@ -187,7 +187,7 @@ private static class TopicPartitionEntry {
private static final Comparator<ProducerBatch> PRODUCER_BATCH_COMPARATOR = (b1, b2) -> {
if (b1.baseSequence() < b2.baseSequence()) return -1;
else if (b1.baseSequence() > b2.baseSequence()) return 1;
else return b1.equals(b2) ? 0 : 1;
else return Integer.compare(b1.hashCode(), b2.hashCode());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, hashCode can have collisions. This doesn't seem right either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same question like you. But then, I think since we can't have a perfect way to compare two ProducerBatch object, the hashCode could be an alternative approach. That's my thought. Thanks.

Copy link
Contributor

@ijuma ijuma Apr 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replied to the original discussion clarifying that some of the assumptions there are incorrect. We should not return true for the case where hashCode is true and equals is false.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replied to the other comment as well. I wouldn't say the assumptions are incorrect -- the assumption is that hashCode would provide enough differentiation to significantly improve over original logic, without introducing regressions or requiring more complex implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assumption about object addresses and hashCode is what I was referring to.

@tombentley
Copy link
Contributor

@ijuma did you have any ideas about how to fix this without the problem with hash collisions?

@ijuma
Copy link
Contributor

ijuma commented Apr 21, 2022

@tombentley I had a chat with @hachikuji about this and it seems that the behavior we want is slightly differently. I'll see if I can submit a PR today for discussion.

@ijuma
Copy link
Contributor

ijuma commented Apr 28, 2022

@tombentley See #12096.

jeffkbkim pushed a commit to confluentinc/kafka that referenced this pull request May 12, 2022
In comparator, objects that are not equal need to have a stable order otherwise, binary search may not find the objects. Improve the producer batch comparator

Reviewers: Luke Chen <showuon@gmail.com>
jeffkbkim added a commit to confluentinc/kafka that referenced this pull request May 12, 2022
…cs-11-may-2022

* apache-kafka/3.1: (51 commits)
  MINOR: reload4j build dependency fixes (apache#12144)
  KAFKA-13255: Use config.properties.exclude when mirroring topics (apache#11401)
  KAFKA-13794: Fix comparator of inflightBatchesBySequence in TransactionsManager (round 3) (apache#12096)
  KAFKA-13794: Follow up to fix producer batch comparator (apache#12006)
  fix: make sliding window works without grace period (#kafka-13739) (apache#11980)
  3.1.1 release notes (apache#12001)
  KAFKA-13794; Fix comparator of `inflightBatchesBySequence` in `TransactionManager` (apache#11991)
  KAFKA-13782; Ensure correct partition added to txn after abort on full batch (apache#11995)
  KAFKA-13748: Do not include file stream connectors in Connect's CLASSPATH and plugin.path by default (apache#11908)
  KAFKA-13775: CVE-2020-36518 - Upgrade jackson-databind to 2.12.6.1 (apache#11962)
  ...
jeffkbkim added a commit to confluentinc/kafka that referenced this pull request May 12, 2022
…cs-12-may-2022

* apache-kafka/3.0: (14 commits)
  fix: make sliding window works without grace period (#kafka-13739) (apache#11980)
  KAFKA-13794: Follow up to fix producer batch comparator (apache#12006)
  KAFKA-13794; Fix comparator of `inflightBatchesBySequence` in `TransactionManager` (apache#11991)
  KAFKA-13748: Do not include file stream connectors in Connect's CLASSPATH and plugin.path by default (apache#11908)
  KAFKA-13418: Support key updates with TLS 1.3 (apache#11966)
  KAFKA-13770: Restore compatibility with KafkaBasedLog using older Kafka brokers (apache#11946)
  KAFKA-13761: KafkaLog4jAppender deadlocks when idempotence is enabled (apache#11939)
  KAFKA-13759: Disable idempotence by default in producers instantiated by Connect (apache#11933)
  MINOR: Fix `ConsumerConfig.ISOLATION_LEVEL_DOC` (apache#11915)
  KAFKA-13750; Client Compatability KafkaTest uses invalid idempotency configs (apache#11909)
  ...
@sql888
Copy link

sql888 commented Apr 10, 2023

Failed tests are unrelated

Build / ARM / kafka.network.ConnectionQuotasTest.testListenerConnectionRateLimitWhenActualRateAboveLimit()

How to fix this failed test? getting this in M1 ARM Macbook. Thanks

kafka.network.ConnectionQuotasTest.testListenerConnectionRateLimitWhenActualRateAboveLimit() failed, log available in /Users/gli/git/qaas-kafka/core/build/reports/testOutput/kafka.network.ConnectionQuotasTest.testListenerConnectionRateLimitWhenActualRateAboveLimit().test.stdout

ConnectionQuotasTest > testListenerConnectionRateLimitWhenActualRateAboveLimit() FAILED
    java.util.concurrent.ExecutionException: org.opentest4j.AssertionFailedError: Expected rate (30 +- 7), but got 37.15860531368056 (600 connections / 16.147 sec) ==> expected: <30.0> but was: <37.15860531368056>
        at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:205)
        at kafka.network.ConnectionQuotasTest.$anonfun$testListenerConnectionRateLimitWhenActualRateAboveLimit$3(ConnectionQuotasTest.scala:411)
        at scala.collection.immutable.List.foreach(List.scala:333)
        at kafka.network.ConnectionQuotasTest.testListenerConnectionRateLimitWhenActualRateAboveLimit(ConnectionQuotasTest.scala:411)

        Caused by:
        org.opentest4j.AssertionFailedError: Expected rate (30 +- 7), but got 37.15860531368056 (600 connections / 16.147 sec) ==> expected: <30.0> but was: <37.15860531368056>
            at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
            at org.junit.jupiter.api.AssertionUtils.failNotEqual(AssertionUtils.java:62)
            at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:86)
            at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1003)
            at kafka.network.ConnectionQuotasTest.acceptConnectionsAndVerifyRate(ConnectionQuotasTest.scala:903)
            at kafka.network.ConnectionQuotasTest.$anonfun$testListenerConnectionRateLimitWhenActualRateAboveLimit$2(ConnectionQuotasTest.scala:409)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants