Feature: crmd: Support "actions-limit" - the number of jobs that the TE is allowed to execute in parallel on a node #360

Closed
wants to merge 1 commit into from

Conversation

gao-yan (Member) commented Sep 26, 2013

Except "batch-limit", which is a cluster-wide concurrency limit,
there are two per-node concurrency limits can be configured now:

  • actions-limit:
    The number of jobs that the TE is allowed to execute in parallel on a
    node. If set to zero or unset, defaults to twice the number of CPU
    cores on the DC, with a minimum of 4. -1 means unlimited.
  • migration-limit:
    The number of migration (migrate_to/migrate_from) jobs that the TE is
    allowed to execute in parallel on a node. If set to zero or unset,
    defaults to half the number of CPU cores on the DC. -1 means unlimited.
  1. They can be configured as a global property in crm_config.
  2. They can be configured as a per-node attribute, which overrides
    the global property (see the sketch below).
  3. In practice, depending on the configuration, either limit can be
    reached first.
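
For illustration, here is a minimal C sketch of the precedence and value
semantics described above. It is not the patch itself; the function and
variable names are hypothetical, and the computed default is passed in
rather than derived here.

```c
/* Hypothetical sketch of the lookup order and value semantics:
 *   - a per-node attribute overrides the crm_config property
 *   - 0 or unset falls back to a computed default
 *   - -1 means unlimited
 */
#include <stdlib.h>

#define LIMIT_UNLIMITED (-1)

static int
effective_limit(const char *node_attr,    /* per-node attribute value, or NULL */
                const char *cluster_prop, /* crm_config property value, or NULL */
                int computed_default)     /* e.g. max(4, 2 * DC cores) */
{
    const char *raw = node_attr ? node_attr : cluster_prop;
    long value = raw ? strtol(raw, NULL, 10) : 0;

    if (value == -1) {
        return LIMIT_UNLIMITED;  /* explicitly unlimited */
    }
    if (value <= 0) {
        return computed_default; /* zero or unset: choose a sane default */
    }
    return (int) value;
}
```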

l-mb (Contributor) commented Sep 26, 2013

Looks good. A few minor suggestions on the wording of the limits are in-line. Thank you!

gao-yan (Member, Author) commented Sep 26, 2013

Good suggestions. Changed, thanks!

beekhof (Member) commented Oct 2, 2013

Can we put this on hold for a bit?
I'd like to see if we can auto-tune this (David and I have been kicking around some ideas) instead of adding a new option.

actions-limit=0 <--- wouldn't that mean the node can't run any actions by default?

gao-yan (Member, Author) commented Oct 2, 2013

I left "-1" for unlimited, which could still be needed by some users. In practice, no user would want to block the cluster transition by setting it to zero, I think. So I chose zero for the default, which means to choose an appropriate default limit for user.

Personally, I don't think adding a new option would be a problem. We'd just need to make sure the default behavior is sane.
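
To make "an appropriate default limit" concrete, here is a small sketch of
the computed defaults described in the proposal. The function names are
hypothetical, and the floor of 1 for migrations is an assumption, not part
of the patch.

```c
/* Illustrative only; names are hypothetical.
 *   actions-limit default:   twice the DC's CPU cores, but at least 4
 *   migration-limit default: half the DC's CPU cores
 */
static int
default_actions_limit(int dc_cores)
{
    int limit = 2 * dc_cores;
    return (limit < 4) ? 4 : limit;
}

static int
default_migration_limit(int dc_cores)
{
    int limit = dc_cores / 2;
    return (limit < 1) ? 1 : limit; /* assumption: never drop below 1 */
}
```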

l-mb (Contributor) commented Oct 2, 2013

I do not believe auto-tuning this is feasible for all scenarios. We can try to choose a sane default, yes (some factor times the number of cores), but nodes have varying capabilities, and different resources have different load impacts. Users have a legitimate need to tune this as required for their configuration.

I like choosing good defaults and auto-tuning what we can, but that doesn't mean hiding tunables is always a good idea. We should not err from the "let's make everything configurable" to the "oh no, a tunable" philosophy ;-) ("What power users hate about GNOME/Apple, chapter 1")

l-mb (Contributor) commented Oct 2, 2013

(And, for what it is worth, this is an upgrade regression for some users who can't switch, or who hit node overload situations after an update. So for us, it's fairly high priority.)

beekhof (Member) commented Oct 3, 2013

On 03/10/2013, at 4:09 AM, Lars Marowsky-Brée notifications@github.com wrote:

> I do not believe auto-tuning this is feasible for all scenarios. We can try to choose a sane default, yes (some factor times the number of cores), but nodes have varying capabilities, and different resources have different load impacts. Users have a legitimate need to tune this as required for their configuration.
>
> I like choosing good defaults and auto-tuning what we can, but that doesn't mean hiding tunables is always a good idea. We should not err from the "let's make everything configurable" to the "oh no, a tunable" philosophy ;-) ("What power users hate about GNOME/Apple, chapter 1")

Yep, fair call.

I need to figure out how to start a few hundred LXC containers, so it happens to be an area I'll be looking at closely RealSoonNow.

Just give me a couple of days to see what comes out of my auto tuning ideas.
I'm not saying we don't need any options, but maybe we need some different ones or for them to work in a different way.

beekhof (Member) commented Oct 3, 2013

On 02/10/2013, at 10:50 PM, "Gao,Yan" notifications@github.com wrote:

I left "-1" for unlimited, which could still be needed by some users. In practice, no user would want to block the cluster transition by setting it to zero, I think. So I chose zero for the default,

But doesn't that make it a bad default? Everyone getting the value no one would want?

> which means an appropriate limit is chosen for the user.

> Personally, I don't think adding a new option would be a problem. We'd just need to make sure the default behavior is sane.



gao-yan (Member, Author) commented Oct 3, 2013

Actually, the default value is not fixed; it depends on the number of CPU cores on the DC. The issue is how we should express the "default_value" field in pe_opts[]. Shouldn't we still indicate it with a special integer?
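
One way to read the question: keep a fixed string in the option table and
treat it as a sentinel. A hypothetical, simplified entry (not the real
pe_opts[] layout) might look like this:

```c
/* Hypothetical, simplified option entry; the real pe_opts[] fields differ.
 * The string "0" acts as a sentinel meaning "compute the default at runtime". */
struct option_entry {
    const char *name;
    const char *type;
    const char *default_value;
    const char *description;
};

static const struct option_entry actions_limit_opt = {
    "actions-limit",
    "integer",
    "0", /* sentinel: replaced by e.g. max(4, 2 * DC cores) when evaluated */
    "Number of jobs the TE may execute in parallel on a node (-1 = unlimited)",
};
```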

l-mb (Contributor) commented Oct 3, 2013

Twice the number of cores is not an unreasonable default. I think most users won't have to touch it further, so I hope it's not the default no one wants ;-)

Ultimately, users may ask for per-resource-type limits. We could add class/provider/type then. This might help with tons of VMs while still starting everything else at higher concurrency. But we didn't want to make this too complex in v1, preferring to wait for actual requests.

beekhof (Member) commented Oct 28, 2013

I think most are reasonably happy with the throttling code now. Closing.

beekhof closed this Oct 28, 2013