[BEAM-744] A runner should be able to override KafkaIO max wait prope… #1125
Jenkins failures seem to be related to |
I think that this is generally on the wrong path. Runners should not need to override temporal constants in specific transforms to get sane behavior. I believe the simple rule of thumb should be "readers should return as soon as they are able" + "runners may poll advance() in a loop for a certain period of time if it returned too fast" + "runners must tolerate sources that take a long time to start or advance, because real systems operate that way". I think we're violating all of these in various places, but that combined these principles add up to a good solution. Thoughts? (Also, if we reach agreement we should probably summarize to dev@ list?)
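To make the rule of thumb above concrete, here is a minimal sketch of a runner-side loop that polls a non-blocking reader for a bounded period. Everything in it is hypothetical: `SimpleReader`, `readBundle`, and `slowStartReader` are illustration-only names, not Beam's `UnboundedReader` API or any runner's actual code, and the 1 ms back-off is an arbitrary choice.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

/** Hypothetical sketch; SimpleReader is a stand-in, NOT Beam's UnboundedReader. */
class AdvancePollingSketch {

  /** Reader contract assumed here: both calls return promptly, true if a record is ready. */
  interface SimpleReader {
    boolean start();
    boolean advance();
    String getCurrent();
  }

  /**
   * Runner-side loop: tolerate 'false' from start() and keep calling advance()
   * until maxRecords are read or the time budget is spent.
   */
  static List<String> readBundle(SimpleReader reader, int maxRecords, long budgetMillis) {
    List<String> out = new ArrayList<>();
    long deadline = System.nanoTime() + budgetMillis * 1_000_000L;
    boolean available = reader.start();
    while (out.size() < maxRecords) {
      if (available) {
        out.add(reader.getCurrent());
      } else if (deadline - System.nanoTime() <= 0) {
        break; // budget spent; the runner can retry this reader later
      } else {
        try {
          Thread.sleep(1); // back off briefly instead of spinning
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          break;
        }
      }
      if (out.size() < maxRecords) {
        available = reader.advance();
      }
    }
    return out;
  }

  /** Test reader: reports "nothing yet" for the first emptyCalls calls, then drains records. */
  static SimpleReader slowStartReader(int emptyCalls, List<String> records) {
    final Queue<String> pending = new ArrayDeque<>(records);
    final int[] remainingEmpty = {emptyCalls};
    final String[] current = {null};
    return new SimpleReader() {
      @Override public boolean start() { return advance(); }
      @Override public boolean advance() {
        if (remainingEmpty[0] > 0) {
          remainingEmpty[0]--;
          return false;
        }
        current[0] = pending.poll();
        return current[0] != null;
      }
      @Override public String getCurrent() { return current[0]; }
    };
  }

  public static void main(String[] args) {
    SimpleReader reader = slowStartReader(3, Arrays.asList("a", "b"));
    System.out.println(readBundle(reader, 2, 1000)); // prints [a, b]
  }
}
```

With this shape the reader never needs a long internal wait: a slow source just costs the runner a few extra `advance()` calls within its own budget.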
I'd be happy to summarize this once we have something, and I agree with what you write Dan, but it seems that this was an issue for the |
Per conversation with @tgroh, I believe we can and should do this now.
The |
@@ -757,7 +757,7 @@ public void validate() {
 private static final Duration KAFKA_POLL_TIMEOUT = Duration.millis(1000);
 // how long to wait for new records from kafka consumer inside start()
-private static final Duration START_NEW_RECORDS_POLL_TIMEOUT = Duration.standardSeconds(5);
+private static final Duration START_NEW_RECORDS_POLL_TIMEOUT = Duration.millis(10);
Better to remove this then.
Recently when I modified KafkaIOTest, I removed a bit of extra code that handled 'false' from start(). I need to put that back. I can send a separate PR for that.
So we'll only have NEW_RECORDS_POLL_TIMEOUT, sure, why not.
I like these. 👍 |
LGTM.
Suggested a minor improvement to comment.
I will send another CL with a fix to KafkaIOTest (otherwise it would occasionally flake).
-// how long to wait for new records from kafka consumer inside start()
-private static final Duration START_NEW_RECORDS_POLL_TIMEOUT = Duration.standardSeconds(5);
-// how long to wait for new records from kafka consumer inside advance()
+// how long to wait for new records from kafka consumer.
Can you add 'inside advance()' or 'inside advance()/start()' to this comment? It would make it clearer where the timeout applies.
Fixed KafkaIOTest in #1133 |
@@ -968,7 +966,7 @@ public void run() {
// Wait for longer than normal when fetching a batch to improve chances a record is available
// when start() returns.
nextBatch(START_NEW_RECORDS_POLL_TIMEOUT);
Actually, can you remove this arg for nextBatch and use NEW_RECORDS_POLL_TIMEOUT directly inside nextBatch()?
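For illustration, the suggested shape could look roughly like the sketch below, where `nextBatch()` takes no argument and applies one shared timeout to the queue poll. This is a simplified stand-in, not the actual KafkaIO reader: the class name, the `String` record type, and the 10 ms value are assumptions, and a plain `long` replaces the `Duration` constant used in the PR.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; not the actual KafkaIO UnboundedKafkaReader. */
class NextBatchSketch {
  // One timeout shared by start() and advance(); the value is illustrative only.
  static final long NEW_RECORDS_POLL_TIMEOUT_MILLIS = 10;

  // In a real reader a background consumer thread would fill this queue.
  final BlockingQueue<List<String>> availableRecordsQueue = new LinkedBlockingQueue<>();

  /** Returns the next batch, or an empty list if none arrives within the timeout. */
  List<String> nextBatch() {
    try {
      // poll available records, wait (if necessary) up to the shared timeout
      List<String> records = availableRecordsQueue.poll(
          NEW_RECORDS_POLL_TIMEOUT_MILLIS, TimeUnit.MILLISECONDS);
      return records == null ? Collections.<String>emptyList() : records;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return Collections.<String>emptyList();
    }
  }

  public static void main(String[] args) {
    NextBatchSketch reader = new NextBatchSketch();
    System.out.println(reader.nextBatch()); // prints [] after the short timeout
    reader.availableRecordsQueue.add(Arrays.asList("r1", "r2"));
    System.out.println(reader.nextBatch()); // prints [r1, r2]
  }
}
```

Keeping the timeout inside `nextBatch()` means callers cannot pass a longer wait for `start()`, which matches the goal of having a single constant.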
@@ -968,7 +966,7 @@ public void run() {
// Wait for longer than normal when fetching a batch to improve chances a record is available
Remove this comment.
@rangadi I've addressed your comments, PTAL. |
TimeUnit.MILLISECONDS);
// poll available records, wait (if necessary) up to the specified timeout.
records = availableRecordsQueue.poll(NEW_RECORDS_POLL_TIMEOUT.getMillis(),
TimeUnit.MILLISECONDS);
optional/minor : align the args?
I really don't mind but it seems like we don't have a consensus on arg-alignment..
I'll align and commit, thanks!
👍 LGTM.
Thanks for the updates.
…rties.
Add KafkaOptions for the UnboundedKafkaReader.