Skip to content
This repository has been archived by the owner on Aug 4, 2022. It is now read-only.

[REEF-726] Race condition with completed Containers #476

Closed
wants to merge 1 commit into from

Conversation

afchung
Copy link
Contributor

@afchung afchung commented Sep 9, 2015

This addressed the issue by

  • Not releasing Evaluators directly in YarnContainerManager and instead delegate the job to EvaluatorManager on container complete.

JIRA:
REEF-726

This addressed the issue by
  * Not releasing Evaluators directly in ``YarnContainerManager`` and instead delegate the job to ``EvaluatorManager`` on container complete.

JIRA:
  [REEF-726](https://issues.apache.org/jira/browse/REEF-726)
@afchung
Copy link
Contributor Author

afchung commented Sep 9, 2015

Tested on HDInsight.

@markusweimer
Copy link
Contributor

@afchung Did you run the whole Java test suite as well?

@afchung
Copy link
Contributor Author

afchung commented Sep 9, 2015

@markusweimer Yes. Please have a look. Thanks!

@markusweimer
Copy link
Contributor

@afchung Awesome! I'll do a pass.

@asfbot
Copy link

asfbot commented Sep 9, 2015

REEF-pull-request-windows3 #406 SUCCESS
This pull request looks good

@markusweimer
Copy link
Contributor

This assumes that the release message from YARN is the last message, correct? What if we get the YARN message first and then the final heartbeat? I believe that is what EvaluatorManager.isResourceReleased was introduced for.

@afchung
Copy link
Contributor Author

afchung commented Sep 9, 2015

@markusweimer No, this works in either case because both end up calling close(), and whoever wins the race will be the one to release the Evaluator.

@afchung
Copy link
Contributor Author

afchung commented Sep 9, 2015

In other words, if we get the final heartbeat later, we will just ignore the message because this.stateManager.isDoneOrFailedOrKilled() would have been set to true. onResourceStatusMessage() and onEvaluatorHeartbeatMessage() are both synchronized on this.evaluatorDescriptor.

@asfbot
Copy link

asfbot commented Sep 10, 2015

Reef-pull-request-ubuntu #505 SUCCESS
This pull request looks good

@afchung
Copy link
Contributor Author

afchung commented Sep 14, 2015

@markusweimer Are there any concerns that I should address here?

@markusweimer
Copy link
Contributor

@afchung Not really, I will do another pass now.

@asfgit asfgit closed this in 48c16b8 Sep 14, 2015
markusweimer pushed a commit to markusweimer/reef that referenced this pull request Sep 16, 2015
This removes the releasing of Evaluators directly in `YarnContainerManager` and
instead delegates the job to `EvaluatorManager` on container complete.

JIRA:
  [REEF-726](https://issues.apache.org/jira/browse/REEF-726)

Pull Request:
  This closes apache#476
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
3 participants