Feature: crmd: Support "actions-limit" - the number of jobs that the TE is allowed to execute in parallel on a node #360
Conversation
Looks good. A few minor suggestions on the wording of the limits are in-line. Thank you! |
…TE is allowed to execute in parallel on a node

Except for "batch-limit", which is a cluster-wide concurrency limit, there are two per-node concurrency limits that can be configured now:

* actions-limit: The number of jobs that the TE is allowed to execute in parallel on a node. If set to zero or unset, defaults to at least 4 or twice the number of CPU cores on the DC. -1 means unlimited.
* migration-limit: The number of migration (migrate_to/migrate_from) jobs that the TE is allowed to execute in parallel on a node. If set to zero or unset, defaults to half the number of CPU cores on the DC. -1 means unlimited.

1. They can be configured as a global property in crm_config.
2. They can be configured as a per-node attribute, which overrides the global property.
3. In practice, depending on the configuration, either limit can be reached first.
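A configuration following the two placement options above might look like the sketch below. This is illustrative only: the option names come from this proposal, and the `id` values and nodes are made up for the example.

```xml
<!-- Sketch: global defaults in crm_config, with a per-node override.
     Option names are from this proposal; ids/unames are illustrative. -->
<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <!-- 0 (or unset) = auto-tuned default, -1 = unlimited -->
    <nvpair id="opt-actions-limit" name="actions-limit" value="8"/>
    <nvpair id="opt-migration-limit" name="migration-limit" value="2"/>
  </cluster_property_set>
</crm_config>

<node id="node1" uname="node1">
  <instance_attributes id="node1-attrs">
    <!-- Per-node attribute overrides the global property on this node -->
    <nvpair id="node1-actions-limit" name="actions-limit" value="4"/>
  </instance_attributes>
</node>
```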
Good suggestions. Changed, thanks! |
Can we put this on hold for a bit? actions-limit=0 <--- wouldn't that mean the node can't run any actions by default? |
I left "-1" for unlimited, which could still be needed by some users. In practice, no user would want to block the cluster transition by setting it to zero, I think. So I chose zero for the default, which means to choose an appropriate default limit for user. Personally, I don't think adding a new option would be a problem. We'd just need to make sure the default behavior is sane. |
I do not believe auto-tuning this is feasible for all scenarios. We can try to choose a sane default, yes (whatever factor times the number of cores), but nodes have various capabilities, and resources differ in load impact. Users have a legitimate need to tune this as required for their configuration. I like choosing good defaults and auto-tuning what we can, but that doesn't mean that not exposing tunables is always a good idea. We should not err from the "let's make everything configurable" philosophy to the "oh no, a tunable" one ;-) ("What power users hate about GNOME/Apple, chapter 1") |
(And, for what it is worth, this is an upgrade regression for some users who can't switch, or who hit node overload situations after an update. So for us, it's fairly high priority.) |
On 03/10/2013, at 4:09 AM, Lars Marowsky-Brée notifications@github.com wrote:
Yep, fair call. I need to figure out how to start a few hundred LXC containers, so it happens to be an area I'll be looking at closely RealSoonNow. Just give me a couple of days to see what comes out of my auto tuning ideas. |
On 02/10/2013, at 10:50 PM, "Gao,Yan" notifications@github.com wrote:
But doesn't that make it a bad default? Everyone getting the value no-one would want? |
The default value is actually not fixed; it depends on the number of CPU cores on the DC. The issue is how we should indicate the "default_value" field in pe_opts[]. Shouldn't we still indicate it with a special integer? |
Twice the number of cores is not an unreasonable default. I think most users won't have to touch it further, so I hope it's not the default no-one wants ;-) Ultimately, users may ask for per-resource-type limits. We could add class/provider/type then. This might help with tons of VMs while still starting everything else at higher concurrency. But we didn't want to make this too complex in v1, instead waiting for actual requests. |
I think most are reasonably happy with the throttling code now. Closing. |