
Adding known issue for MESOS-1688 #1860

Closed
wants to merge 3 commits into from

Conversation

MartinWeindel
Contributor

When using Mesos in fine-grained mode, a Spark job can run into a deadlock when the allocatable memory on a Mesos slave is low. As a workaround, 32 MB (= Mesos MIN_MEM) are allocated for each task, to ensure that Mesos makes new offers after task completion.
From my perspective, it would be better to fix this problem in Mesos by dropping the memory constraint on offers, but as a temporary solution this patch helps to avoid the deadlock on current Mesos versions.
See [MESOS-1688] No offers if no memory is allocatable for details on this problem.
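
For illustration, a minimal sketch of this kind of workaround, assuming the Mesos Java protobuf API (org.apache.mesos.Protos); the helper name is hypothetical and this is not the actual patch:

```scala
import org.apache.mesos.Protos.{Resource, TaskInfo, Value}

// Sketch only: attach a 32 MB "mem" resource to every fine-grained task,
// so that when the task finishes the slave regains >= MIN_MEM and Mesos
// resumes making offers. `withMinMem` is a hypothetical helper name.
def withMinMem(task: TaskInfo.Builder): TaskInfo.Builder =
  task.addResources(
    Resource.newBuilder()
      .setName("mem")
      .setType(Value.Type.SCALAR)
      .setScalar(Value.Scalar.newBuilder().setValue(32.0)) // Mesos MIN_MEM, in MB
      .build())
```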

@AmplabJenkins

Can one of the admins verify this patch?

@pwendell
Contributor

Hey Martin,

I'm having a bit of trouble seeing how this works around the issue. From what I can tell, the issue is that if someone creates executors that consume all memory, Mesos will refuse to make offers for the tasks. However, this fix just adds 32 MB of memory as a requirement for each task... but it seems like if the offer is never made in the first place, this will make no difference. Can you describe a sequence of offers where this change alters the execution? Thanks for looking into this!

  • Patrick

@MartinWeindel
Contributor Author

Hey Patrick,

First of all, let me emphasize again that this is only a workaround. The real problem is that Mesos only makes offers if at least 32 MB of memory are available, which conflicts with allocating memory only for the Spark executors and none for the tasks.
You seem to be right: this workaround does not help if the executors already consume all memory (leaving a remainder of <= 31 MB), so I don't know whether it avoids the deadlock in all cases.

I can only argue from an experimental point of view: I have not seen the deadlock in my cluster anymore after applying this patch (tested under very heavy workload). I suspect the chance is very small that another executor starts before at least one task of the first executor has started. In any case, after a task finishes, at least 32 MB of memory become allocatable, so Mesos will always make offers and the deadlock is avoided.

BTW, I have also played with changing the executor memory so that some Mesos slave memory is always left over, but to my surprise this did not avoid the deadlocks reliably.

So I'm not sure whether this patch should be integrated into the Spark source code. But I hope it helps in understanding the issue, and maybe it makes fine-grained mode usable for setups similar to mine until a better solution is found.

If I can help in any way, just tell me.

Best regards,
Martin


@mateiz
Contributor

mateiz commented Aug 25, 2014

From my knowledge of Mesos, this seems like a good fix. I think we should do this until MESOS-1688 is fixed.

@mateiz
Contributor

mateiz commented Aug 25, 2014

Jenkins, test this please

@mateiz
Contributor

mateiz commented Aug 25, 2014

BTW @MartinWeindel, one small request -- can you update the docs/running-on-mesos.md page to explain that each task will consume 32 MB? Otherwise people might set Spark's executor memory to all of the memory on the Mesos worker, which would mean no tasks can be launched.

@SparkQA

SparkQA commented Aug 25, 2014

QA tests have started for PR 1860 at commit d9d2ca6.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Aug 25, 2014

QA tests have finished for PR 1860 at commit d9d2ca6.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • $FWDIR/bin/spark-submit --class org.apache.spark.repl.Main "$
    • $FWDIR/bin/spark-submit --class org.apache.spark.repl.Main "$

@mateiz
Contributor

mateiz commented Aug 25, 2014

BTW this failure is due to a style check -- you can run sbt scalastyle locally to find all style issues (the Jenkins log also lists the problem).

@iven

iven commented Aug 25, 2014

@MartinWeindel I think you should check if there's enough memory in the offer first.

@mateiz
Contributor

mateiz commented Aug 25, 2014

That's true; now that we take an extra 32 MB per task, you need to change the logic for how many tasks we can allocate. That will make it trickier.
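
To make that concrete, here is a hypothetical sketch of the extra accounting being described; the names and structure are illustrative, not Spark's actual scheduler code:

```scala
// With 32 MB reserved per task, an offer bounds the number of launchable
// tasks by its free memory as well as by its CPUs.
def maxTasksForOffer(offerCpus: Double,
                     offerMemMb: Double,
                     cpusPerTask: Double,
                     memPerTaskMb: Double = 32.0): Int = {
  val byCpu = (offerCpus / cpusPerTask).toInt
  val byMem = (offerMemMb / memPerTaskMb).toInt
  math.min(byCpu, byMem) // 0 means the offer cannot host a single task
}
```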

@pwendell
Contributor

Hey @MartinWeindel - I'm curious, which of the following cases are you in:

Case 1. You have individual executors that attempt to acquire all the memory on the node.

Case 2. You have multiple executors per node, but their total memory adds up to the total amount of memory on the node.

I could see how this would help with Case 2, because it could prevent a second executor from being launched in a way that acquires all of the host memory. But I'm still wondering whether it affects Case 1.

@MartinWeindel
Contributor Author

Yes, this becomes tricky, and I don't see a satisfying solution, as I would have to predict how many tasks will run in parallel to ensure that there is enough memory for each task.
This patch solves one problem but will introduce new ones, because it only deals with the symptoms, not the cause. I think it is better not to integrate it.
I've already created a pull request to get the cause fixed in Mesos:
apache/mesos#24


@mateiz
Contributor

mateiz commented Aug 25, 2014

After thinking about this more, it seems that another workaround is to make sure your executors always leave 32 MB free on each node (even if you launch multiple executors, make sure their sizes don't add up to quite the full memory). Would that work? If so, we can just add that to the docs.
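
As a rough illustration of that sizing rule (the helper and the 8 GB figure are assumptions, not an official recommendation):

```scala
import org.apache.spark.SparkConf

// Leave at least 32 MB of the slave's advertised memory unallocated so
// Mesos keeps making offers. `executorMemoryFor` is a hypothetical helper.
def executorMemoryFor(slaveMemMb: Int, headroomMb: Int = 32): String =
  s"${slaveMemMb - headroomMb}m"

val conf = new SparkConf()
  .set("spark.executor.memory", executorMemoryFor(8192)) // "8160m" on an 8 GB slave
```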

@MartinWeindel MartinWeindel changed the title work around for problem with Mesos offering semantic Adding known issue for MESOS-1688 Aug 25, 2014
@MartinWeindel
Contributor Author

OK, so I have reverted the work-around patch and added a known issue paragraph to the running-on-mesos documentation.

@mateiz
Contributor

mateiz commented Aug 27, 2014

Cool, thanks, that looks great.

@asfgit asfgit closed this in be043e3 Aug 27, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
When using Mesos in fine-grained mode, a Spark job can run into a deadlock when the allocatable memory on a Mesos slave is low. As a workaround, 32 MB (= Mesos MIN_MEM) are allocated for each task, to ensure that Mesos makes new offers after task completion.
From my perspective, it would be better to fix this problem in Mesos by dropping the memory constraint on offers, but as a temporary solution this patch helps to avoid the deadlock on current Mesos versions.
See [[MESOS-1688] No offers if no memory is allocatable](https://issues.apache.org/jira/browse/MESOS-1688) for details on this problem.

Author: Martin Weindel <martin.weindel@gmail.com>

Closes apache#1860 from MartinWeindel/master and squashes the following commits:

5762030 [Martin Weindel] reverting work-around
a6bf837 [Martin Weindel] added known issue for issue MESOS-1688
d9d2ca6 [Martin Weindel] work around for problem with Mesos offering semantic (see [https://issues.apache.org/jira/browse/MESOS-1688])
@timothysc

Just as a cross-reference: MESOS-1688 has been committed and will be part of the 0.21.0 release cycle.

@mateiz
Contributor

mateiz commented Sep 20, 2014

Great! I'll create a JIRA to update Spark to it when that comes out.
