[FLINK-1489] Fixes blocking scheduleOrUpdateConsumers message calls #378
Very nice. I will have a detailed look later. @zentol Can you also test it with the Python API? I think you initially noticed the problem.
// double check to resolve race conditions
if(consumerVertex.getExecutionState() == RUNNING){
    consumerVertex.sendPartitionInfos();
Just to verify: the double check & send relies on the fact that update messages at the task manager are idempotent, right?
The UpdateTask messages are idempotent in the BufferReader. But my intention was not to send any UpdateTask messages twice. The ConcurrentLinkedQueue should make sure that every element is only dequeued once.
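To illustrate the point about dequeuing once, here is a minimal sketch of the buffer-then-double-check pattern. It is not the code from this PR: `BufferedConsumer`, `cachePartitionInfo`, `switchToRunning`, and `getSent` are hypothetical names, plain strings stand in for partition infos, and only `sendPartitionInfos` and the use of `ConcurrentLinkedQueue` come from the discussion above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for a consumer vertex; not Flink's actual class.
public class BufferedConsumer {
    // Partition infos that arrived before the consumer was running.
    private final ConcurrentLinkedQueue<String> pendingInfos = new ConcurrentLinkedQueue<>();
    // Records what was "sent"; a real implementation would message the TaskManager.
    private final List<String> sent = Collections.synchronizedList(new ArrayList<>());
    private volatile boolean running = false;

    // Buffer the info, then double check the state to resolve the race with
    // a concurrent switch to running.
    public void cachePartitionInfo(String info) {
        pendingInfos.add(info);
        if (running) {
            sendPartitionInfos();
        }
    }

    // Mark the consumer as running and flush everything buffered so far.
    public void switchToRunning() {
        running = true;
        sendPartitionInfos();
    }

    // poll() removes the head atomically, so two threads draining at the
    // same time never both receive the same element.
    public void sendPartitionInfos() {
        String info;
        while ((info = pendingInfos.poll()) != null) {
            sent.add(info);
        }
    }

    // Snapshot of everything sent so far.
    public List<String> getSent() {
        synchronized (sent) {
            return new ArrayList<>(sent);
        }
    }
}
```

In this sketch the double check can trigger a second drain, but the second drain only finds whatever the first one has not already polled, so nothing is sent twice.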
Yeah, true. :-)
Looks good to me. +1 We chatted about batching update task calls. Did you realize a problem with it or can we open an "improvement" issue for it?
You're right. At the moment there is no aggregation of messages. I'll add it.
There is a problem: https://travis-ci.org/apache/flink/jobs/50215407
dd6208b to bf94b4f
…s with asynchronous futures. Buffers PartitionInfos at the JobManager in case that the respective consumer has not been scheduled.
Conflicts: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala
Adds TaskUpdate message aggregation before sending the messages to the TaskManagers
bf94b4f to 1827d02
I added the UpdateTask message aggregation. I also had to rework the PartitionInfo creation to make it work with the concurrent task updates. This requires another review of the code before we can merge it.
Cool. I'm testing this PR on a cluster now.
The job that was previously failing is fixed with this change. We should merge this change ASAP, because it's kinda impossible right now to seriously use Flink 0.9-SNAPSHOT without it.
…s with asynchronous futures. Buffers PartitionInfos at the JobManager in case that the respective consumer has not been scheduled.
Conflicts: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala
Adds TaskUpdate message aggregation before sending the messages to the TaskManagers
This closes apache#378
Replaces the blocking calls with futures which, in case of an exception, let the respective task fail. Furthermore, the PartitionInfos are buffered on the JobManager in case some of the consumers are not yet scheduled. Once the state of a consumer switches to running, all partition infos buffered for it are sent to that consumer.
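As a rough illustration of the non-blocking idea, the sketch below registers a callback on the reply future instead of waiting for it, and marks the task as failed if the future completes with an exception. It uses Java's `CompletableFuture` as a stand-in and is not the futures API used in this PR; `AsyncUpdateSketch`, `sendUpdate`, `isFailed`, and `getFailureCause` are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: fail a task from a future callback instead of blocking.
public class AsyncUpdateSketch {
    // Holds the first failure cause; null means the task has not failed.
    private final AtomicReference<Throwable> failureCause = new AtomicReference<>();

    // Instead of blocking on the reply, attach a callback that fails the
    // task when the update call completes exceptionally.
    public void sendUpdate(CompletableFuture<String> reply) {
        reply.whenComplete((ack, error) -> {
            if (error != null) {
                fail(error);
            }
        });
    }

    // Only the first failure is recorded; later ones are ignored.
    private void fail(Throwable cause) {
        failureCause.compareAndSet(null, cause);
    }

    public boolean isFailed() {
        return failureCause.get() != null;
    }

    public Throwable getFailureCause() {
        return failureCause.get();
    }
}
```

The caller returns immediately after `sendUpdate`, so a slow or lost reply no longer stalls the thread that issued the update.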