KAFKA-5281: System tests for transactions #3149
Refer to this link for build results (access rights to CI server needed):
        self.num_brokers = 3

        # Test parameters
        self.num_input_partitions = 1
What is the reason for using a single input partition?
It's temporary. The idea is to have two parallel copy jobs so that there are two in-flight transactions at any given time. Each job needs to process a single partition to ensure there is no overlap in the copied data.
Right now, we have only a single copy job, so there is only one input partition.
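The non-overlap idea described in this reply can be sketched as follows. This is only an illustration, not code from the PR: `assign_partitions` and the copier names are hypothetical, and the sketch simply gives each copy job its own input partition so that two transactions can be in flight without any record being copied twice.

```python
def assign_partitions(num_input_partitions, copier_ids):
    """Give each copier a disjoint set of input partitions (hypothetical helper)."""
    if num_input_partitions < len(copier_ids):
        raise ValueError("need at least one input partition per copier")
    assignment = {copier_id: [] for copier_id in copier_ids}
    for partition in range(num_input_partitions):
        # Round-robin so that no partition is ever owned by two copiers.
        owner = copier_ids[partition % len(copier_ids)]
        assignment[owner].append(partition)
    return assignment


# Current state of the test: one copy job, one input partition.
single = assign_partitions(1, ["copier-0"])
# Planned state: two parallel copy jobs, one partition each.
parallel = assign_partitions(2, ["copier-0", "copier-1"])
```

With this layout, raising the number of copy jobs requires raising the number of input partitions to match, which is consistent with the single-partition setting being temporary.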
The hard bounce test seems to have revealed at least one bug: https://issues.apache.org/jira/browse/KAFKA-5339
… because we can't find the coordinator
…e copy is actually finished even when the copier is bounced.
…sary error messages to print on jenkins
4df0f6b to e26e872
…tderr and stdout from transactional message copier
A quick update: The tests have revealed a bug: The … The … Unfortunately, due to an oversight in the Python test code, we never collected the stderr stream from the message copier, so there was nothing suspicious in the collected logs. I only found this out because, very occasionally, the background sender thread would run before the process was shut down, detect that the transaction manager was in an error state, and log it. Since we collected the transactional_message_copier logs (i.e. the actual producer logs), I was alerted to this issue. I have now updated the code to both handle the REQUEST_TIMED_OUT error on the TxnOffsetCommitRequest (just like the …
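The stderr oversight mentioned above is easy to reproduce outside the test framework. The sketch below is a local stand-in, not the PR's actual fix: `run_and_capture` is a hypothetical helper, the child process only imitates a tool that reports errors on stderr, and the message text is made up for illustration.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and return (exit_code, combined stdout and stderr text)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the captured stream
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


# Stand-in for the message copier: a child that writes only to stderr.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('transaction manager in error state\\n')",
]
exit_code, output = run_and_capture(child)
```

Without `stderr=subprocess.STDOUT`, `output` would be empty here, which mirrors how an error written only to stderr can go unnoticed when just stdout is collected.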
        return self.message_copy_finished

    def progress_percent(self):
        if self.remaining < 0:
Might need the lock here since you access mutable variables.
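A minimal sketch of what this suggestion could look like, assuming the progress fields are updated from a separate thread that parses the copier's output. `CopierProgress`, `on_progress`, and the `consumed` field are hypothetical names; only `remaining` and `progress_percent` appear in the diff.

```python
import threading


class CopierProgress:
    """Hypothetical stand-in for the copier service's progress tracking."""

    def __init__(self):
        self.lock = threading.Lock()
        self.remaining = -1  # negative until the first progress report arrives
        self.consumed = 0

    def on_progress(self, consumed, remaining):
        # Called from the thread that parses the copier's output.
        with self.lock:
            self.consumed = consumed
            self.remaining = remaining

    def progress_percent(self):
        # Read both fields under the lock so they come from the same update.
        with self.lock:
            if self.remaining < 0:
                return 0
            total = self.consumed + self.remaining
            if total == 0:
                return 100
            return 100.0 * self.consumed / total
```

Holding the lock for the whole read keeps `consumed` and `remaining` from being observed halfway through an update, which is the risk the comment points at.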
                remainingMessages.set(maxMessages - numMessagesProcessed.addAndGet(messagesInCurrentTransaction));
            }
        } catch (Exception e) {
            e.printStackTrace();
In VerifiableProducer and VerifiableConsumer, we pass uncaught exceptions to the argument parser handleError. It would be nice to be consistent.
Isn't that just for parsing arguments though?
Yeah, that might be right. Looks like we just let application errors bubble up. Guess we could do the same here since the exception would get printed anyway. Up to you.

    @cluster(num_nodes=8)
    @matrix(failure_mode=["clean_bounce", "hard_bounce"],
            bounce_target=["brokers", "clients"])
Are we going to disable the brokers bounce tests?
I don't think so: with the bug fix for KAFKA-5351, they seem to be quite stable.
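For context on what is at stake in this thread: the `@matrix` parameters in the diff yield one test run per combination, so the broker bounce cases account for half of the runs. The sketch below only illustrates that cross product with `itertools.product`; `expand_matrix` is a hypothetical helper and not how ducktape implements the decorator.

```python
from itertools import product


def expand_matrix(**params):
    """Return one kwargs dict per combination of the given parameter lists."""
    names = sorted(params)
    value_lists = [params[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


combos = expand_matrix(
    failure_mode=["clean_bounce", "hard_bounce"],
    bounce_target=["brokers", "clients"],
)
for combo in combos:
    print(combo)
```

Disabling the broker bounce tests would amount to dropping "brokers" from `bounce_target`, leaving only the two client bounce combinations.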
The latest full system test suite passed, except for the log appender test, which I think is unrelated.
The bounce test failures are being fixed in #3184.
The bounce tests seem to be stable after incorporating #3184.
…copy as gets very close to the limit on hard bounces
Thanks for the tests and fixes. LGTM.
Author: Apurva Mehta <apurva@confluent.io> Reviewers: Jason Gustafson <jason@confluent.io> Closes #3149 from apurvam/KAFKA-5281-transactions-system-tests (cherry picked from commit 1959835) Signed-off-by: Jason Gustafson <jason@confluent.io>