
add 'balanced' scheduler strategy #227

Closed
wants to merge 3 commits into from

Conversation

@phemmer commented Jan 7, 2015

This adds a 'balanced' strategy: a very basic strategy that evenly distributes containers among the Docker hosts.
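For readers skimming the thread, here is a minimal, self-contained sketch of what 'balanced' placement means in practice. This is not the code from this PR; the node type and the score function below are illustrative assumptions only.

package main

import "fmt"

// Hypothetical node type for illustration; swarm's real cluster.Node differs.
type node struct {
    name        string
    usedMemory  int64
    totalMemory int64
    usedCPUs    int64
    totalCPUs   int64
}

// score combines memory and CPU usage into a single fraction (0 = empty node).
func score(n *node) float64 {
    return float64(n.usedMemory)/float64(n.totalMemory) +
        float64(n.usedCPUs)/float64(n.totalCPUs)
}

// pickBalanced returns the least loaded node, so containers spread evenly.
func pickBalanced(nodes []*node) (*node, error) {
    if len(nodes) == 0 {
        return nil, fmt.Errorf("no nodes available")
    }
    best := nodes[0]
    for _, n := range nodes[1:] {
        if score(n) < score(best) {
            best = n
        }
    }
    return best, nil
}

func main() {
    nodes := []*node{
        {name: "node-a", usedMemory: 2 << 30, totalMemory: 4 << 30, usedCPUs: 2, totalCPUs: 4},
        {name: "node-b", usedMemory: 1 << 30, totalMemory: 4 << 30, usedCPUs: 1, totalCPUs: 4},
    }
    n, _ := pickBalanced(nodes)
    fmt.Println("placing container on", n.name) // node-b, the emptier host
}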

@MrGossett

+1

@vieux (Contributor) commented Jan 7, 2015

@phemmer don't you need the overcommit parameter here as well?

Docker doesn't return the exact amount of memory available; the system returns a little less.

@phemmer (Author) commented Jan 7, 2015

Not sure. I opted to leave it out as I thought the strategy should be basic.

The idea behind the binpacking algorithm is to fill a single 'bin' to its max and then continue to the next one, so overcommit is necessary there, as you need to know the maximum capacity (I'm sure you're aware of this, just adding it here for completeness).

This algorithm is meant to be a lot simpler in that it evenly distributes everything, so a maximum isn't as important, and it won't refuse to start a container due to lack of available resources. Because it won't refuse to start a container, overcommit doesn't make much sense.

My reasoning for not limiting it is that I think it would be unexpected for swarm to refuse to start containers, as this is not how docker behaves. There's also currently nothing that warns the user that they are approaching maximum capacity, so a sudden refusal to start containers could be very bad.

I can easily add this in if desired. Perhaps with an insanely high overcommit default value so the same effect is reached.

@vieux (Contributor) commented Jan 7, 2015

The thing is, if all the machines in your cluster have 2 gigs of RAM and you want to run a container with -m 2g, with your strategy it won't work.

@phemmer (Author) commented Jan 7, 2015

I think you have an incomplete thought: "if all the machines in your cluster" what?

@vieux (Contributor) commented Jan 7, 2015

@phemmer sorry, updated

@vieux (Contributor) commented Jan 7, 2015

But we are thinking about moving overcommit outside of the strategies; it would be a top-level thing.

@phemmer (Author) commented Jan 7, 2015

Ah, yes, because of the pre-check node filter. Valid point, I'll add it in.

@phemmer (Author) commented Jan 7, 2015

Ok, added in.

However, one comment on the overcommit behavior used by swarm: it has a critical difference from Linux's overcommit. In Linux, an overcommit of 100 means no overcommit, i.e. use exactly how much memory is available, whereas swarm treats 0 as no overcommit. The advantage of using 100 as no overcommit is that if you want to prevent the system from launching something that uses nearly all the available resources, you can, by setting the overcommit value to 90 or so.
I think this would be a good idea for swarm to adopt, but I kept this strategy using the same behavior as binpack for consistency.

I'll start on some tests for the strategy if there are no further changes requested.

@vieux (Contributor) commented Jan 7, 2015

If we use this, it means our default would be 105?

@phemmer (Author) commented Jan 7, 2015

You could keep the scale if you wanted and use 1.00 as no overcommit, or you could use 100.
So either 1.05 or 105 would be the equivalent of the current 0.05. 1.05 is probably more intuitive, since it becomes standard multiplication (100mb * 1.05 = 105mb).
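To make the two conventions concrete, here is a small hedged sketch of the arithmetic being discussed (the variable names are invented for the example; this is not swarm's actual code):

package main

import "fmt"

func main() {
    total := 100.0 // MB of memory reported for a node (example value)

    // Current swarm/binpack convention: 0 means "no overcommit",
    // so usable capacity is total * (1 + overcommit).
    overcommit := 0.05
    fmt.Println(total * (1 + overcommit)) // 105

    // Proposed convention: 1.00 (or 100) means "no overcommit",
    // so capacity is plain multiplication: total * factor.
    factor := 1.05
    fmt.Println(total * factor) // 105

    // Bonus of the factor form: values below 1.0 reserve headroom,
    // e.g. 0.90 caps usable capacity at 90% of what the node reports.
    fmt.Println(total * 0.90) // 90
}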

@chanwit (Contributor) commented Jan 8, 2015

+1
@phemmer I'm about to propose the "least running containers" strategy to balance my cluster. Hopefully I can use yours instead of inventing a new one. Cheers!

@phemmer (Author) commented Jan 8, 2015

Well, there is one thing that might be unexpected: this strategy doesn't consider whether a container is running or not. If you have 10 stopped containers and 0 running on one node, and 5 running containers on another node, it will place the new container on the node with 5 containers.
This was done because it's how the binpack strategy behaves, though it might be a good idea to add a flag to control the behavior.
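A hypothetical sketch of what such a flag could look like (the types and names below are invented for illustration and are not part of this PR): when a running-only option is set, stopped containers stop counting toward placement.

package main

import "fmt"

// container is a stand-in type for illustration only.
type container struct{ running bool }

// countForPlacement is a hypothetical helper: with runningOnly set,
// stopped containers no longer influence placement decisions.
func countForPlacement(cs []container, runningOnly bool) int {
    if !runningOnly {
        return len(cs)
    }
    n := 0
    for _, c := range cs {
        if c.running {
            n++
        }
    }
    return n
}

func main() {
    nodeA := make([]container, 10)                               // 10 stopped containers
    nodeB := []container{{true}, {true}, {true}, {true}, {true}} // 5 running containers

    // Default behavior (matches binpack): node B looks emptier and wins.
    fmt.Println(countForPlacement(nodeA, false), countForPlacement(nodeB, false)) // 10 5
    // With the flag: node A looks emptier and would win instead.
    fmt.Println(countForPlacement(nodeA, true), countForPlacement(nodeB, true)) // 0 5
}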

@phemmer (Author) commented Jan 14, 2015

Will rebase for #228 and work on adding some tests.

@vieux (Contributor) commented Jan 14, 2015

thanks @phemmer, sorry about that.

@vieux (Contributor) commented Jan 17, 2015

@phemmer could you please add some tests similar to binpacking_test.go?

@@ -57,7 +57,7 @@ var (
}
flStrategy = cli.StringFlag{
Name: "strategy",
Usage: "placement strategy to use [binpacking, random]",
Usage: "PlacementStrategy to use [balanced, binpacking, random]",
Contributor

While you add the tests, can you switch this back to placement strategy (lowercase + space)?

@phemmer (Author) commented Jan 17, 2015

Sorry for the delay. Rebased, node.go fixed, and tests added.

@phemmer (Author) commented Jan 17, 2015

I'm also still uncertain about the overcommit thing. As mentioned earlier, this scheduler will allow you to add 2x 2gb containers to a 3gb node, and this isn't how binpack behaves.
I don't much like the idea of having it refuse to launch containers if they exceed the limits, but the reasons for doing so are pretty strong, and consistency is another big factor.

I see a few possible solutions:

  • Leave as is (perhaps open a ticket to continue discussion on the matter)
  • Enforce the limit, and add an option to toggle it.
  • Have the binpack scheduler fall back to the balanced scheduler if all nodes are full, so that it gets the same behavior.

@vieux added this to the Swarm Beta 0.1.0 milestone Jan 19, 2015
@aluzzardi (Contributor)

@phemmer Can you please clarify?

@phemmer (Author) commented Jan 19, 2015

@aluzzardi Beyond the explanation already provided, I'm not sure how. Perhaps an example:

Let's say you have 2 nodes, each with 1.5gb of memory.
You start one container using 1gb, it goes to node A.
You start another container using 1gb, it goes to node B.
You start a third container using 1gb, it goes to node A.
Node A's total reserved memory is now 2gb.

@aluzzardi (Contributor)

@phemmer With how much overcommit?

@phemmer (Author) commented Jan 19, 2015

None, or even the default 0.05; anything less than 150%.

@vieux (Contributor) commented Jan 19, 2015

Ok, in my opinion it should work the same way as binpacking: the 3rd container shouldn't start because there are not enough resources.

// What do you think of the name "spread" instead of "balanced"?

@phemmer (Author) commented Jan 19, 2015

But then why doesn't docker behave that way? Docker will happily let you launch a container even if it exceeds the node's capacity.

As for the name, I'm not too fond of "spread", as I think the term is rather ambiguous. The "random" strategy spreads containers mostly evenly based on count, so the name should differentiate how the strategy behaves.

@vieux (Contributor) commented Jan 21, 2015

In any case, we are going to postpone the merge of this PR until after the RC, so we can try to find a solution that works for everybody.

@vieux removed this from the Swarm Beta 0.1.0 milestone Jan 21, 2015
@rgbkrk commented Jan 21, 2015

Looking forward to this one so long as it doesn't overcommit. For JupyterHub's dockerspawner and tmpnb we expect to fill up some expected amount of memory, pooling them in advance. We'd rather not see it overcommit. If it's configurable though, that's a different thing.

@jhamrick mentioned this pull request Jan 21, 2015
@vieux (Contributor) commented Jan 27, 2015

@jhamrick are you using this PR as is, or tweaked in some ways?

@jhamrick (Contributor)

@vieux I'm not actually using this PR; I just made a small modification to the binpacking strategy (just reverses the sort order, so it does more of a round robin thing). I may switch to this strategy once the PR is merged, though.

@dustbyte (Contributor)

If I may suggest, why not consider composability through a pipeline of applicable strategies?

That is, each strategy returns a set of potential candidates that are passed to the next one. In the end, the first element of the resulting set (whose members are considered equivalent) is chosen.

It would add a little more complexity within each strategy but would remove the need to repeat code.

From the CLI point of view, this could be expressed as:

swarm manage --strategies=binpacking,balanced ...

in which case the binpacking strategy would be applied before the balanced strategy.
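A rough sketch of what such a pipeline could look like, with invented types and names (this is not swarm's actual strategy interface): each stage narrows the candidate set and the first survivor is picked.

package main

import "fmt"

// node and config are stand-in types for illustration.
type node struct{ name string }
type config struct{ memory int64 }

// strategy is a hypothetical pipeline stage that narrows the candidate set.
type strategy interface {
    Filter(c config, candidates []node) []node
}

// place runs each strategy in order and returns the first remaining node.
func place(c config, nodes []node, pipeline []strategy) (node, error) {
    candidates := nodes
    for _, s := range pipeline {
        candidates = s.Filter(c, candidates)
        if len(candidates) == 0 {
            return node{}, fmt.Errorf("no node satisfies the pipeline")
        }
    }
    return candidates[0], nil
}

// passThrough is a trivial stage used only to make the example run.
type passThrough struct{}

func (passThrough) Filter(c config, candidates []node) []node { return candidates }

func main() {
    chosen, err := place(config{memory: 1 << 30},
        []node{{"node-a"}, {"node-b"}},
        []strategy{passThrough{}, passThrough{}})
    fmt.Println(chosen.name, err) // node-a <nil>
}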

@tnachen (Contributor) commented Jan 27, 2015

@mota I'm not sure strategies really compose, since they often have competing priorities, so at least I don't see a good use case for it yet.
And IMO it becomes harder to implement a strategy, since each strategy can't simply take a list of candidates from the last one: lots of them still have to look at the global state as a whole (i.e. balanced needs to balance across the cluster) and then see whether any of those nodes match the passed-in candidates.
I'm more in favor of a single strategy, with per-strategy configuration if we really need to support different scenarios.

@dustbyte (Contributor)

@tnachen You're right, strategies interfere with each other.

Nevertheless, I don't agree with your idea of one monolithic strategy that is applied statefully.

In the first place, because filters are applied before strategies: they cannot operate on the cluster as a whole, but only on a subset of it.

Secondly, because I think it is best to let the user choose which priority he or she values the most.

However, I'd like to revise my proposal and add a little subtlety regarding the implementation.

The idea would now be to associate a score with each node at each pipeline execution.
Each strategy would still be free to remove unsatisfying candidates, but would also add a value to each remaining node's score.

At the end of the pipeline, the node with the best score is chosen.

That way, the balanced strategy would be implemented so that it only adds its own value to each node's score.
Therefore, the current implementation of balanced could be matched with either balanced,binpacking or binpacking,balanced.
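As a hedged sketch of the scoring variant (again with invented types and names, not a concrete proposal for swarm's API): each stage may prune candidates and adds to a running per-node score, and the highest total wins at the end.

package main

import "fmt"

type node struct{ name string }

// scorer is a hypothetical stage: it may drop candidates and adds a
// value to each surviving node's score.
type scorer func(scores map[string]float64, candidates []node) []node

func pick(nodes []node, pipeline []scorer) (string, error) {
    scores := make(map[string]float64)
    candidates := nodes
    for _, stage := range pipeline {
        candidates = stage(scores, candidates)
        if len(candidates) == 0 {
            return "", fmt.Errorf("no candidates left")
        }
    }
    best, bestScore := "", -1.0
    for _, n := range candidates {
        if s := scores[n.name]; best == "" || s > bestScore {
            best, bestScore = n.name, s
        }
    }
    return best, nil
}

func main() {
    // A toy stage that simply favors node-b; a real "balanced" stage would
    // score nodes by how empty they are.
    favorB := func(scores map[string]float64, cs []node) []node {
        for _, n := range cs {
            if n.name == "node-b" {
                scores[n.name]++
            }
        }
        return cs
    }
    winner, _ := pick([]node{{"node-a"}, {"node-b"}}, []scorer{favorB})
    fmt.Println(winner) // node-b
}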

@tnachen (Contributor) commented Jan 28, 2015

Since a filter is user defined, it supports cases where users can explicitly prune the selection to what they want, and that subset is what I refer to as global state.

I think what's missing in your proposal is a concrete use case that makes this necessary. I'm not against the idea, but I'm hoping to keep the scheduler simple to begin with, as what you describe can become very hard to reason about.

@tnachen (Contributor) commented Jan 28, 2015

And balanced composed with binpack together doesn't make much sense to me either.

@phemmer (Author) commented Jan 28, 2015

@mota Perhaps you can provide some examples of how you expect a combination of schedulers to work. Because binpacking is pretty much the complete opposite of balanced. I don't see how they can co-exist.

@dustbyte (Contributor)

@phemmer sure thing.

Maybe my view is flawed but as things evolve, I don't see strategies as a code of conduct but more as a best-effort behavior.

First thing, I think the elimination of incapable nodes in terms of resource usage should be made in the filtering phase, not in the strategy phase as it is done currently.

Thus, what I propose is simply to tune the behavior of the scheduler. Let's say you want to favor spreading instead of stacking: you use the balanced behavior. If on the contrary you favor stacking, you use the binpacking behavior. And finally, if you want to reach a best-effort candidate, you apply both.

Tell me if I'm wrong but as I see it, the implementation of balanced you provide is pretty much the binpacking strategy with one added dimension.

@bacongobbler (Contributor)

The strategy README should be updated to reflect this new strategy as well

return nil, ErrNoResourcesAvailable
}

sort.Sort(scores)
Contributor

Instead of reinventing the wheel here, you can just call sort.Sort(sort.Reverse(scores)) on a scores structure. That should clean up a lot of the boilerplate.
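For reference, a minimal illustration of the suggestion (the scoredNode and scores types here are invented; the PR's actual types differ): implement sort.Interface once in ascending order and wrap it with sort.Reverse to get a descending sort.

package main

import (
    "fmt"
    "sort"
)

// scoredNode and scores are invented types for illustration.
type scoredNode struct {
    name  string
    score int
}

type scores []scoredNode

// scores implements sort.Interface in ascending order...
func (s scores) Len() int           { return len(s) }
func (s scores) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }
func (s scores) Less(i, j int) bool { return s[i].score < s[j].score }

func main() {
    s := scores{{"a", 2}, {"b", 5}, {"c", 1}}
    // ...and sort.Reverse flips it to descending without any extra code.
    sort.Sort(sort.Reverse(s))
    fmt.Println(s) // [{b 5} {a 2} {c 1}]
}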

@aluzzardi (Contributor)

@phemmer I just merged #458 since it's more recent and merges properly (there have been many changes to the scheduler since this PR was opened).

Does it fit your needs?

@vieux (Contributor) commented Mar 17, 2015

ping @phemmer?

@vieux added current and removed next labels Mar 17, 2015
@phemmer (Author) commented Mar 17, 2015

Sorry, haven't had a chance to actually build and use it. But from looking at the implementation, this appears to have the same effect as the strategy in this PR, so I think it's good.

@vieux (Contributor) commented Mar 25, 2015

Cool. @phemmer, I'm closing this; please comment if you have any issues.

@vieux closed this Mar 25, 2015