-
Notifications
You must be signed in to change notification settings - Fork 13.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-1019] Implementation of akka based RPC system #149
Conversation
I vote to merge this rather soon:
|
What are the plans for merging this change? |
There are some minor issues we discovered when going over the code with Stephan. I first have to address them. These include amongst others:
|
3048efc
to
10a1a3a
Compare
import scala.concurrent.duration.FiniteDuration | ||
import scala.concurrent.{Future, Await} | ||
|
||
abstract class FlinkMiniCluster(userConfiguration: Configuration) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user config argument is a great idea. I was checking this, because I need something similar for my changes. I will add a similar thing to my branch for now.
Look like rebase is needed for this PR? |
Yes it is. But there are still some performance issues I have to figure out first. |
49a28ca
to
d396915
Compare
val nextInputSplit = currentJobs.get(jobID) match { | ||
case Some((executionGraph,_)) => executionGraph.getJobVertex(vertexID) match { | ||
case vertex: ExecutionJobVertex => vertex.getSplitAssigner match { | ||
case splitAssigner: InputSplitAssigner => splitAssigner.getNextInputSplit(null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When passing a null
hostname here, input split localization is impossible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I probably just copied the old bug.
a9c7381
to
077fdfb
Compare
import org.junit.BeforeClass; | ||
import org.junit.Test; | ||
|
||
//TODO: Update test case |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this TODO still valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I forgot to delete it.
Looks like a big ass change ;-) My inline comments were more or less random. I just wanted to get a feeling of the changes. I can have a look in the next days. :-) |
Thanks for the feedback Ufuk. I tried to address the points you have mentioned. Concerning the problem of failing travis builds, it turned out to be race condition in the execution graph I stumbled upon. Due to this race condition, it was possible for a job to finish before all vertices have properly reached a finished state. It is fixed with the latest commit. |
It looks like it was really hard to catch. :-) Congrats! I don't know how much work it is, but we might want to make sure to have a test case for this (if possible) or add an extra comment that no one messes with the patch. ;-) |
a9f4379
to
221f100
Compare
… blocking calls. Fixed ExecutionVertexCancelTest after removing submitTask and cancelTask.
…s reregistration in case of disconnect. Introduced akka.ask.timeout config parameter to configure akka timeouts.
…er their tasks. Replaced the scheduler's execution service with akka's futures. Introduced TestStreamEnvironment to use ForkableFlinkMiniCluster for test execution.
…output for maven verify.
…ch mechanism is the current mean to detect dead instances.
…AllocatedSlot, Instance, CoLocationConstraint, SharedSlot and SlotSharingGroupAssignment serializable. Integrated Kryo to be used to serialize Akka messages.
…ificant performance loss.
…integration tests. Increase akka logger startup timeout.
…travis fork count to 2.
…connection manager if a single task manager is used for local execution. Remove synchronized blcok in getReceiverList of ChannelManager which effectively serialized the connection lookup calls of a single task manager. Fix Java6 problem that File has no method toPath
…urrently within futures
…ndency conflicts. Adjust code to comply to respective Akka API. Remove obsolete TODO.
… dependency conflict
… all vertices have called the finalizeOnMaster method.
…nager is still alive. Terminate waiting for a response in case of a job manager outage.
105f576
to
88e64fc
Compare
This closes apache#149
…park Module closes apache#149
Replaced the old Nephele RPC service with akka based system. Thus, several components are now implemented as actors. This includes the JobManager, TaskManager, MemoryArchivist, JobClient. The legacy RPC service and the corresponding protocols are removed.
Replaced also the execution service of the ExecutionGraph by akka's futures to unify the system.
Removed the LocalInstanceManager whose task is now handled by the InstanceManager. The responsibility to create local task managers is now delegated to the FlinkMiniCluster.
The EventCollector was removed and the respective event classes. The events are now directly sent to the respective listeners.
Moved the resources of the WebInfoServer and the WebInterfaceServer to the resource folders of the corresponding projects. As a consequence these resources are bundled with the jars and directly served from them by Jetty.
The yarn client was adapted to communicate with the actors. The former ApplicationMaster is combined with the JobManager to simplify the system. The uber-jar is now created with maven's shading plugin.
Since this is a big change I would be happy if another pair of eyes could take a look at it.