
KAFKA-8264: decrease the record size for flaky test #8885

Closed
wants to merge 1 commit

Conversation

showuon
Contributor

@showuon showuon commented Jun 17, 2020

This flaky test has existed for a long time, and it has been failing more frequently recently (it also failed in my PR testing!! :( ). KAFKA-8264 and KAFKA-8460 describe the issue for this test as:

Timed out before consuming expected 2700 records. The number consumed was xxxx

I did some investigation. The purpose of this test is to verify that:

we consume all partitions if fetch max bytes and max.partition.fetch.bytes are low.

To do that, it creates 3 topics with 30 partitions each, then iterates through all 90 partitions and sends 30 records to each. Finally, it verifies that we can consume all the records successfully.

What the error message is saying is that we cannot consume all the records in time (possibly on a busy system). So we can decrease the number of records to avoid this. I checked all the error messages collected in KAFKA-8264 and KAFKA-8460: the failed cases always consumed at least 1440 records (out of 2700 in total). So I set the number of records per partition to half of the original setting, which gives 1350 records in total. This should make the test more stable.
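For reference, a minimal sketch of the arithmetic behind this change (the object and value names here are illustrative, not from the test itself):

```scala
// Illustrative sketch (not test code): why halving the records per partition
// drops the expected total from 2700 to 1350.
object RecordCountSketch extends App {
  val topics = 3
  val partitionsPerTopic = 30
  val totalPartitions = topics * partitionsPerTopic   // 90 partitions in total

  val oldRecordsPerPartition = 30                     // original setting
  val newRecordsPerPartition = 15                     // this PR's setting

  println(totalPartitions * oldRecordsPerPartition)   // 2700
  println(totalPartitions * newRecordsPerPartition)   // 1350
}
```

Since the failed runs always reached at least 1440 consumed records, a 1350-record target stays under what even the slowest observed runs managed.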

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@showuon
Contributor Author

showuon commented Jun 17, 2020

@nepal @mjsax @abbccdda , could you review this PR to fix a flaky test? Thanks.

@mjsax mjsax added core Kafka Broker tests Test fixes (including flaky tests) labels Jun 18, 2020
@mjsax
Member

mjsax commented Jun 18, 2020

I am not familiar with the details of this test and would leave it to others to review and merge. Maybe @omkreddy can help?

@showuon
Contributor Author

showuon commented Jun 22, 2020

@omkreddy , could you help review this small PR? Thanks.

@omkreddy
Contributor

retest this please

@@ -800,7 +800,7 @@ class PlaintextConsumerTest extends BaseConsumerTest {
     awaitAssignment(consumer, partitions.toSet)

     val producer = createProducer()
-    val producerRecords = partitions.flatMap(sendRecords(producer, partitionCount, _))
+    val producerRecords = partitions.flatMap(sendRecords(producer, numRecords = 15, _))
Contributor


This test aims to send records to all the partitions (30). With this change, we only send to 15 partitions. Maybe we need to enable debug logs to understand the root cause.

Contributor Author

@showuon showuon Jun 24, 2020


Hi @omkreddy, no, with this change we won't send to only 15 partitions.

The variable partitions in this line is the set of all partitions across all topics: the test has 3 topics with 30 partitions each, so partitions has size 90. Here, we send numRecords records to each of the 90 partitions.

So, in the test, we first collect all the partitions, and then send numRecords records to each partition.

def testLowMaxFetchSizeForRequestAndPartition(): Unit = {
    val topic1 = "topic1"
    val topic2 = "topic2"
    val topic3 = "topic3"
    val partitionCount = 30
    val topics = Seq(topic1, topic2, topic3)
    ....
    // we collect all the TopicPartition info, so partitions.size will be 90 after this line
    val partitions = topics.flatMap { topic =>
      (0 until partitionCount).map(new TopicPartition(topic, _))
    }
    ....
    // so later, we send `numRecords` records to each of the 90 partitions.
    // that is, changing the number of records sent won't affect which partitions the test covers
    val producerRecords = partitions.flatMap(sendRecords(producer, partitionCount, _))
   ...
}

Thank you.

Contributor


Thanks for the explanation. I am still not sure why we are not able to consume in 60 seconds. Let us see if we can reproduce the issues with this PR.

Contributor Author


Sure. Thank you.

@omkreddy
Contributor

retest this please

1 similar comment
@omkreddy
Contributor

retest this please

@showuon
Contributor Author

showuon commented Jul 6, 2020

Hi @omkreddy, after running the tests twice, do you think we should run more tests for it? Thanks.

@omkreddy
Contributor

omkreddy commented Jul 7, 2020

retest this please

@omkreddy omkreddy requested a review from hachikuji July 8, 2020 10:44
@showuon
Contributor Author

showuon commented Jul 10, 2020

@hachikuji , could you help review this small PR to fix a flaky test? Thanks.

@showuon
Contributor Author

showuon commented Jul 15, 2020

@hachikuji , please help review this small PR. Thanks.

@showuon
Contributor Author

showuon commented Jul 21, 2020

Hi @omkreddy, it looks like @hachikuji has not been available recently. Do you think we still need other people's comments? I think this change should be pretty straightforward and safe. Thanks.

Contributor

@omkreddy omkreddy left a comment


@showuon Thanks for the PR. LGTM.

@omkreddy omkreddy closed this in 9b36de3 Jul 23, 2020