mesos: no resources available to schedule container #1183
Hi there, I think there might be some problems with resource accounting in the Swarm Mesos scheduler that we need to figure out. Also, we're refactoring the scheduler so that it no longer uses a full offer per task, but tries to run multiple tasks on one offer when it can. So the short answer is that, hopefully, this will be fixed once the refactoring is merged.
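As a rough illustration of the refactoring idea (packing several tasks into one offer instead of spending a whole offer per task), here is a minimal first-fit sketch in Go. The `Offer` and `Task` types and the `packOffer` function are hypothetical simplifications, not Swarm's actual scheduler code, and a real Mesos offer carries more resource types than CPU and memory.

```go
package main

import "fmt"

// Offer is a hypothetical, simplified view of a Mesos resource offer
// (only CPU and memory are modeled here).
type Offer struct {
	ID    string
	CPUs  float64
	MemMB float64
}

// Task is a hypothetical container request, e.g. 1 CPU and 100 MB.
type Task struct {
	Name  string
	CPUs  float64
	MemMB float64
}

// packOffer places as many tasks as fit into a single offer (first-fit)
// instead of consuming the whole offer for one task. Tasks that do not
// fit are returned in leftover so they can wait for a later offer.
func packOffer(o Offer, tasks []Task) (placed, leftover []Task) {
	cpus, mem := o.CPUs, o.MemMB
	for _, t := range tasks {
		if t.CPUs <= cpus && t.MemMB <= mem {
			// Deduct this task's share from what remains of the offer.
			cpus -= t.CPUs
			mem -= t.MemMB
			placed = append(placed, t)
		} else {
			leftover = append(leftover, t)
		}
	}
	return placed, leftover
}

func main() {
	offer := Offer{ID: "offer-1", CPUs: 2, MemMB: 1024}
	tasks := []Task{
		{"a", 1, 100},
		{"b", 1, 100},
		{"c", 1, 100},
	}
	placed, leftover := packOffer(offer, tasks)
	fmt.Println(len(placed), len(leftover)) // 2 1
}
```

First-fit is used only because it is the simplest packing strategy; a real scheduler might sort tasks first or pick a different bin-packing heuristic.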
+1
This makes Swarm+Mesos completely unusable. Even worse, Swarm is also taking all the resources away from the other frameworks. @tnachen, can you point me to the refactoring work you mentioned? This is a blocker for me right now, and I'd like to follow along and at least help with testing, if not patches.
@ahmetalpbalkan hey, sorry for the delay, we are aware of the issue. Can you try #1212 to see if it helps? Thanks.
This won't work with the current PR yet. I'll update here as soon as the PR has a reasonable solution; it's trickier than expected because:
TL;DR swarm only Also it feels wrong to lock a
Not sure what you mean by relying on docker/runc for the resource reservation, but I think there's a bit of an impedance mismatch here: if you rely only on Docker for resources, your available resources are still reported by Mesos rather than by Docker, so the two could get out of sync. As for locking: if you don't lock, then you effectively need compareAndInc/Dec for accurate accounting, but since you have multiple resources, I'm not sure that makes your logic any clearer.
Sorry for the confusion; Mesos was a bad example for the whole second part, because we rely on Mesos for resource offers and to tell us which resources are available. But since the refactoring (in #1212) doesn't concern only Mesos, I got lost in my thoughts... The second part mostly concerns Swarm and hypothetical cluster drivers relying on
I'm hitting a similar issue in my cluster (Swarm 0.5.0-dev + Mesos 0.25.0): when I run `sleep 10000000`, the Docker CLI returns "no resources available", but the container is running on the slave host and shows up in the Mesos web UI.
I confirm that this appears to work. At least on the tests I've done so far I'm not seeing the issue. Thank you @vieux |
Hi, I have a Mesos cluster consisting of 2 agent nodes:
I started 2 containers yesterday like the following:
and I can see they are completed now:
However when I try to schedule another container I'm getting this “no resources available” error:
I tried cleaning up some stopped containers, but that is taking forever (it's been a few minutes and it still hasn't returned; `docker rm` is getting stuck). I'm setting `-c 1 -m 100`, which seemed okay, and the cluster has plenty of resources offered. Any ideas what's going wrong here?