[REEF-726] Race condition with completed Containers #476
This addresses the issue by not releasing Evaluators directly in `YarnContainerManager` and instead delegating the job to `EvaluatorManager` on container completion. JIRA: [REEF-726](https://issues.apache.org/jira/browse/REEF-726)
Tested on HDInsight.
@afchung Did you run the whole Java test suite as well?
@markusweimer Yes. Please have a look. Thanks!
@afchung Awesome! I'll do a pass.
REEF-pull-request-windows3 #406 SUCCESS
This assumes that the release message from YARN is the last message, correct? What if we get the YARN message first and then the final heartbeat? I believe that is what
@markusweimer No, this works in either case because both end up calling
In other words, if we get the final heartbeat later, we will just ignore the message because
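The order-independence described above can be illustrated with a minimal sketch, assuming both the container-completed path and the final-heartbeat path funnel into one guarded close step. Every name below is hypothetical and chosen for illustration only; the actual method names are cut off in this thread, and this is not REEF's implementation.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a per-Evaluator manager; not REEF's actual class. */
final class EvaluatorCloseSketch {
  private final AtomicBoolean closed = new AtomicBoolean(false);
  private final AtomicInteger releases = new AtomicInteger(0);

  /** Assumed entry point: the resource manager reports the container as completed. */
  void onContainerCompleted() {
    closeOnce();
  }

  /** Assumed entry point: the Evaluator's final heartbeat arrives. */
  void onFinalHeartbeat() {
    closeOnce();
  }

  private void closeOnce() {
    // compareAndSet succeeds for exactly one caller, so the release runs once
    // no matter which message arrives first; the later message is ignored.
    if (closed.compareAndSet(false, true)) {
      releases.incrementAndGet(); // placeholder for the real release work
    }
  }

  int releaseCount() {
    return releases.get();
  }

  boolean isClosed() {
    return closed.get();
  }
}
```

With this shape, calling `onContainerCompleted()` and then `onFinalHeartbeat()`, or the reverse, performs the release exactly once.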
Reef-pull-request-ubuntu #505 SUCCESS
@markusweimer Are there any concerns that I should address here?
@afchung Not really, I will do another pass now.
This removes the releasing of Evaluators directly in `YarnContainerManager` and instead delegates the job to `EvaluatorManager` on container complete. JIRA: [REEF-726](https://issues.apache.org/jira/browse/REEF-726) Pull Request: This closes apache#476