Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-20.2: backport cgroup related changes to improve CPU and memory accounting #59184

Merged

Conversation

darinpp
Copy link
Contributor

@darinpp darinpp commented Jan 20, 2021

Backport:

Please see individual PRs for details.

/cc @cockroachdb/release

itsbilal and others added 3 commits January 19, 2021 20:05
Enhances the util/cgroups package to also support getting
CPU limits from the process' current cgroup. The cpu limit / share
for the current cgroup is denoted in two separate variables:
the cpu period, and the cpu quota. When (quota / period) = numCPUs,
this cgroup has access to all the CPUs on the system.

This PR also updates SampleEnvironment to call this method
and adjust the cpu usage % accordingly.

Release note (bug fix): Improve accuracy of reported CPU usage
when running in containers.
The cgroup information is constructed by looking at `/proc/self/cgroup`,
matching with the mount point in `/proc/self/mountinfo`
and then inpsecting the `memory.stat` file. The matching between the
cgroup and the mount point was working only in case that the mount and
the cgroup are exact match relative to cgroup NS. This is however
inadequate as while true for the docker case, it isn't true in the general
case. In this example determining the cgroup memory limit isn't possible:
```
cgcreate -t $USER:$USER -a $USER:$USER -g memory:crdb_test
echo 3999997952 > /sys/fs/cgroup/memory/crdb_test/memory.limit_in_bytes
cgexec -g memory:crdb_test ./cockroach start-single-nod
```
To address this, this patch instead constructs the expected
relative path of the cgroup's memory info by starting with the
mount path and then adding the cgroup.

Release note: None
Having a GOMAXPROCS value defaulting to the Go default
(numCPUs) is inefficient if there's a lower CPU limit
set for the cgroup the cockroach process is in. The
Go runtime could be scheduling multiple OS-level
threads, not all of which would get to run concurrently.

This change sees if the GOMAXPROCS env variable
was set, denoting an overriden value. If it isn't
set, and we're inside a cgroup, the value of
GOMAXPROCS is now lowered to the maximum

This is an implementation of part of the advice given
in cockroachdb/docs#9001 .
The CPU requests/shares part of that equation cannot
be implemented in Cockroach, so the docs issue remains
unchanged.

Release note (general change): Run fewer threads in parallel if
running inside a container with a CPU limit.
@darinpp darinpp requested a review from a team as a code owner January 20, 2021 04:06
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@darinpp darinpp merged commit d18660d into cockroachdb:release-20.2 Jan 20, 2021
@knz knz added this to In progress in DB Server & Security via automation Jan 20, 2021
@darinpp darinpp requested a review from itsbilal January 20, 2021 21:31
Copy link
Member

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: thanks for the backport!

Reviewed 3 of 3 files at r1, 2 of 2 files at r2, 3 of 3 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @darinpp)


pkg/cli/start.go, line 361 at r3 (raw file):

	// The GO default for GOMAXPROCS is runtime.NumCPU(), however this is less
	// than ideal if the cgruop is limited to a number lower than that.
	cgroups.AdjustMaxProcs(ctx)

Does this (and maybe the cpuShares substitution in runtime) need to be behind a feature flag if it's a change being made to a stable release? I'm fine with it either way, just confirming if this is required under our release rules.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @bdarnell)


pkg/cli/start.go, line 361 at r3 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Does this (and maybe the cpuShares substitution in runtime) need to be behind a feature flag if it's a change being made to a stable release? I'm fine with it either way, just confirming if this is required under our release rules.

@bdarnell can we get your input on this question? (Context is that the PR was merged without checking this first; if you think we need to do something about configurability here, this needs to be done switfly.)

@bdarnell
Copy link
Member

I'm OK calling this a bugfix and doing it without a feature flag for a patch release.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice thanks 💯

@knz knz moved this from In progress to Done 21.2 in DB Server & Security Jun 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
DB Server & Security
  
Done 21.2
Development

Successfully merging this pull request may close these issues.

None yet

5 participants