Add an optional cpu hard limit #1775

Merged
ssalinas merged 37 commits into master from cpu_hard_limit on Apr 19, 2018

ssalinas commented Apr 4, 2018

If a hard limit on cpu is not enabled on the mesos slave/agent, cpu usage is controlled only by cpu shares. It can be useful to add a hard limit as well, so that a single task cannot overwhelm a slave. This adds cpuHardLimit and cpuHardLimitScaleFactor configuration parameters to the base SingularityConfiguration, which will:

  • propagate a hard limit to ExecutorData for the task. If the hard limit is below the task's requested cpus, that task's hard limit becomes requested cpus * cpuHardLimitScaleFactor
  • trigger the executor to echo a calculated cfs quota and cfs period (defaults to the recommended 100000) to the cgroup settings for the task in runner.sh. For docker tasks, this adds the --cpu-quota/--cpu-period flags to the docker run command
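
For anyone following along, the arithmetic described above can be sketched roughly as below. This is not the code from this PR: the function names are hypothetical, and it assumes the usual CFS relationship where quota = cpus * period.

```shell
# Sketch only: function names are hypothetical, not taken from Singularity.

# effective_hard_limit REQUESTED_CPUS HARD_LIMIT SCALE_FACTOR
# Prints the hard limit in cpus. When the configured limit is below the
# request, the limit becomes requested cpus * scale factor.
effective_hard_limit() {
  awk -v req="$1" -v limit="$2" -v scale="$3" 'BEGIN {
    if (limit < req) limit = req * scale
    print limit
  }'
}

# cfs_quota_us HARD_LIMIT_CPUS [PERIOD_US]
# Prints the CFS quota in microseconds, rounded to the nearest integer.
# The period defaults to 100000, matching the default mentioned above.
cfs_quota_us() {
  awk -v cpus="$1" -v period="${2:-100000}" 'BEGIN {
    printf "%d\n", cpus * period + 0.5
  }'
}
```

With a request of 8 cpus, a configured limit of 4 and a scale factor of 1.5, `effective_hard_limit 8 4 1.5` gives 12, and `cfs_quota_us 12` gives 1200000. For a docker task, that pair would correspond to `--cpu-quota=1200000 --cpu-period=100000`.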

@ssalinas ssalinas added the hs_staging label Apr 4, 2018

ssalinas commented Apr 5, 2018

Additionally, started tracking cpu throttled time in the metrics and reporting it in the UI. Also fixed a UI bug where the cpu resource usage wasn't updating frequently enough.

ssalinas commented Apr 5, 2018

Interesting tidbit: nr_throttled and nr_periods are not actually collected by the mesos slave when cgroups_enable_cfs is not set.
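
For context, these counters live in the cpu.stat file of a task's cpu cgroup (cgroup v1), so they can also be read directly from there. A minimal sketch, assuming a hypothetical helper name and a caller that supplies the path to cpu.stat:

```shell
# Sketch only: throttled_percent is a hypothetical helper.

# throttled_percent CPU_STAT_FILE
# Prints the percentage of CFS periods in which the cgroup was throttled,
# computed from the nr_periods and nr_throttled counters in cpu.stat.
throttled_percent() {
  awk '
    $1 == "nr_periods"   { periods = $2 }
    $1 == "nr_throttled" { throttled = $2 }
    END {
      if (periods > 0) printf "%.1f\n", 100 * throttled / periods
      else print "0.0"
    }' "$1"
}
```

The counters are cumulative, so a monitor would usually take the difference between two samples instead of using the raw totals.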

ssalinas added some commits Apr 6, 2018

{{#if runContext.cfsQuota }}
function get_base_cgroup_directory {
if [ -d "/cgroup" ]; then
echo "/cgroup"

PaulFurtado Apr 9, 2018

Member

These are just mountpoints and people/distros are free to mount the cgroup controllers anywhere. You can get this robustly with findmnt from util-linux (which is installed by default on pretty much every distro).

findmnt --kernel --first-only --types cgroup --options cpu --noheadings --output TARGET

will output /cgroup/cpu on a HubSpot machine.
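
A minimal sketch of how that lookup could be wrapped for use in a script. The function name is hypothetical and this is not necessarily what ended up in runner.sh; the findmnt invocation is the one quoted above.

```shell
# Sketch only: get_cpu_cgroup_mount is a hypothetical helper name.

# Prints the mountpoint of the cgroup v1 cpu controller, or returns
# non-zero when findmnt fails or reports nothing.
get_cpu_cgroup_mount() {
  cpu_mount="$(findmnt --kernel --first-only --types cgroup --options cpu \
    --noheadings --output TARGET)" || return 1
  # Guard against empty output as well.
  [ -n "$cpu_mount" ] || return 1
  printf '%s\n' "$cpu_mount"
}
```

A caller could fall back to a fixed path such as /cgroup when the function returns non-zero. On a host that only mounts cgroup v2, no mount of type cgroup should match, so the fallback path would be taken.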

ssalinas Apr 9, 2018

Author Member

👍 thanks, will update

ssalinas added some commits Apr 9, 2018

@ssalinas ssalinas added the hs_qa label Apr 9, 2018

ssalinas added some commits Apr 9, 2018

@ssalinas ssalinas added this to the 0.20.0 milestone Apr 11, 2018

cfsChecker.watch();
cgroupCheckers.put(task.getTaskId(), cfsChecker);
} catch (Throwable t) {
LOG.error("Could not start cgorup checker for task {}", task.getTaskId(), t);

baconmania Apr 11, 2018

Contributor

minor typo: "cgorup" should be "cgroup"

ssalinas added some commits Apr 11, 2018

baconmania commented Apr 12, 2018

🚢

@ssalinas ssalinas added the hs_stable label Apr 18, 2018

baconmania commented Apr 18, 2018

🚢

@ssalinas ssalinas merged commit 8e67cad into master Apr 19, 2018

1 of 2 checks passed

continuous-integration/travis-ci/pr: The Travis CI build failed
continuous-integration/travis-ci/push: The Travis CI build passed

@ssalinas ssalinas deleted the cpu_hard_limit branch Apr 19, 2018
