Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kubernetes / containers: Advise adjusting GOMAXPROCS based on cpu requests/limits #9001

Open
itsbilal opened this issue Nov 26, 2020 · 3 comments

Comments

@itsbilal
Copy link
Member

itsbilal commented Nov 26, 2020

Bilal Akhtar (itsbilal) commented:

Currently, our docs on using Cockroach with Kubernetes mention CPU requests and limits:

https://www.cockroachlabs.com/docs/v20.2/kubernetes-performance#resource-requests

The same knobs also exist in all cases of running Cockroach in cgroup-driven clusters (eg. Docker), but for simplicity I'm going to use k8s as an umbrella term to refer to all container-based setups with CPU resource limiting in this issue.

Our docs make little reference made to the need to adjust the GOMAXPROCS environment variable based on either setting. The Cockroach process has visibility into CPU limits when it reports CPU usage percentages (since cockroachdb/cockroach#56461), however it doesn't adjust GOMAXPROCS on its own based on CPU limits. Furthermore, the cockroach process cannot calculate the percentage of all CPU shares in the parent cgroup that it has access to, as it doesn't have visibility into the shares given to other sibling cgroup controllers / k8s pods.

If GOMAXPROCS is higher than the CPU limit set, then some goroutines will get scheduled for execution by the Go runtime but will then have to block for one of the (fewer) available cpus to be ready to execute it, resulting in degraded concurrent performance.

Similarly, for CPU requests, let's say if there are 8 CPUs on a machine, and there are 4 pods that all request the same CPU request amount, each pod would get 25% of CPUs (so ~2 cores or core-equivalent runtime) when all pods are maxing out CPU utilization. If such a workload persists for a while, and GOMAXPROCS is greater than 2, then it's possible for performance to be less ideal than if GOMAXPROCS was right at 2.

The ideal GOMAXPROCS setting is going to vary based on the workload, but it's probably a good idea to advise setting it to something like this:

GOMAXPROCS = floor(num_cores * min(cpu_limit_ratio_for_this_container, 2 * cpu_share_ratio_for_this_container))

The 2 * is there to allow for some burst utilization, in cases where cpu requests are much lower than cpu limits.

Jira Issue: DOC-834

@itsbilal
Copy link
Member Author

Assigning @taroface for now, feel free to re-assign as appropriate

@knz
Copy link
Contributor

knz commented Dec 1, 2020

Discussed offline - @itsbilal to fish out how this is computed in CC, as an example best practice.

@itsbilal
Copy link
Member Author

itsbilal commented Dec 2, 2020

Had a conversation with CC engineers, seems like I was mistaken on CC adjusting for this. cockroachdb/cockroach#57390 is what they expect Cockroach to do in this situation, so I've gone ahead and implemented it. The advice outlined in this issue would still be necessary for all users of containers, especially those that use cpu weights/shares.

itsbilal added a commit to itsbilal/cockroach that referenced this issue Dec 2, 2020
Having a GOMAXPROCS value defaulting to the Go default
(numCPUs) is inefficient if there's a lower CPU limit
set for the cgroup the cockroach process is in. The
Go runtime could be scheduling multiple OS-level
threads, not all of which would get to run concurrently.

This change sees if the GOMAXPROCS env variable
was set, denoting an overriden value. If it isn't
set, and we're inside a cgroup, the value of
GOMAXPROCS is now lowered to the maximum

This is an implementation of part of the advice given
in cockroachdb/docs#9001 .
The CPU requests/shares part of that equation cannot
be implemented in Cockroach, so the docs issue remains
unchanged.

Release note (general change): Run fewer threads in parallel if
running inside a container with a CPU limit.
itsbilal added a commit to itsbilal/cockroach that referenced this issue Dec 3, 2020
Having a GOMAXPROCS value defaulting to the Go default
(numCPUs) is inefficient if there's a lower CPU limit
set for the cgroup the cockroach process is in. The
Go runtime could be scheduling multiple OS-level
threads, not all of which would get to run concurrently.

This change sees if the GOMAXPROCS env variable
was set, denoting an overriden value. If it isn't
set, and we're inside a cgroup, the value of
GOMAXPROCS is now lowered to the maximum

This is an implementation of part of the advice given
in cockroachdb/docs#9001 .
The CPU requests/shares part of that equation cannot
be implemented in Cockroach, so the docs issue remains
unchanged.

Release note (general change): Run fewer threads in parallel if
running inside a container with a CPU limit.
itsbilal added a commit to itsbilal/cockroach that referenced this issue Dec 4, 2020
Having a GOMAXPROCS value defaulting to the Go default
(numCPUs) is inefficient if there's a lower CPU limit
set for the cgroup the cockroach process is in. The
Go runtime could be scheduling multiple OS-level
threads, not all of which would get to run concurrently.

This change sees if the GOMAXPROCS env variable
was set, denoting an overriden value. If it isn't
set, and we're inside a cgroup, the value of
GOMAXPROCS is now lowered to the maximum

This is an implementation of part of the advice given
in cockroachdb/docs#9001 .
The CPU requests/shares part of that equation cannot
be implemented in Cockroach, so the docs issue remains
unchanged.

Release note (general change): Run fewer threads in parallel if
running inside a container with a CPU limit.
craig bot pushed a commit to cockroachdb/cockroach that referenced this issue Dec 4, 2020
55829: kvserver: improve logging of draining lease transfer failures  r=andreimatei a=andreimatei

The respective log message now tells you more about why the transfer
failed; in particular it tells you when no transfer targets were found.

Release note: None

56468: server: include more heap info in runtime stats r=andreimatei a=andreimatei

This patch includes a few more memory stats in the runtime infor that we
log periodically. This is to attempt to clarify the relationship between
the go heap and the go "total" more. For example, currently you see
lines like this:
runtime stats: 13 GiB RSS, 544 goroutines, 5.3 GiB/3.6 GiB/8.2 GiB GO alloc/idle/total

The relationship between the idle and the total is ambiguous; some part
of idle is included, but another part (HeapReleased) was already
released to the OS and not included. By printing more, we'll hopefully
disambiguate, as well as get info on Go heap "fragmentation" - space
reserved in the heap spans but not currently in use.

Release note: None

57295: sql: support regr_avgx() r=yuzefovich a=obahareth

Support aggregate function `regr_avgx()`. It referred to the PostgreSQL implementation, but the implementation details follow SQL 2003. See: postgresql.org/docs/10/functions-aggregate.html

See #41274

57390: cli: Set GOMAXPROCS on start if in a CPU-limited cgroup r=knz a=itsbilal

Having a GOMAXPROCS value defaulting to the Go default
(numCPUs) is inefficient if there's a lower CPU limit
set for the cgroup the cockroach process is in. The
Go runtime could be scheduling multiple OS-level
threads, not all of which would get to run concurrently.

This change sees if the GOMAXPROCS env variable
was set, denoting an overriden value. If it isn't
set, and we're inside a cgroup, the value of
GOMAXPROCS is now lowered to the maximum

This is an implementation of part of the advice given
in cockroachdb/docs#9001 .
The CPU requests/shares part of that equation cannot
be implemented in Cockroach, so the docs issue remains
unchanged.

Release note (general change): Run fewer threads in parallel if
running inside a container with a CPU limit.

57412: backupccl: modify show backup privileges r=dt a=adityamaru

Previously, show backup was admin only. This change makes it in line
with BACKUP which determines the need for an admin or not based on the
URI and creds.

Fixes: #57294

Release note (sql change): Show backup is no longer admin only. It
depends on the URI construct and the creds specified in the SHOW BACKUP
query.

57544: jobs: quiet logging an error when context is canceled r=ajwerner a=ajwerner

This was showing up during shutdown. Presumably this indicates that we should
be waiting for these jobs to terminate and is a hint that there's a graceful
shutdown problem, however, we already know that we have these.

Release note: None

57555: sql: fix ProducerMetadata use after Release r=RaduBerinde,yuzefovich a=asubiotto

The DistSQL receiver was reading ProducerMetadata fields after releasing it
back to a sync.Pool. This would cause races. This commit fixes the ordering
of events and moves the release of the ProducerMetadata object outside a
conditional (it was previously only released when Metrics were set).

Release note: None

Closes #57527 

cc @RaduBerinde 

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Co-authored-by: Omar Bahareth <omar@omar.engineer>
Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>
Co-authored-by: Aditya Maru <adityamaru@gmail.com>
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Co-authored-by: Alfonso Subiotto Marques <alfonso@cockroachlabs.com>
darinpp pushed a commit to darinpp/cockroach that referenced this issue Jan 20, 2021
Having a GOMAXPROCS value defaulting to the Go default
(numCPUs) is inefficient if there's a lower CPU limit
set for the cgroup the cockroach process is in. The
Go runtime could be scheduling multiple OS-level
threads, not all of which would get to run concurrently.

This change sees if the GOMAXPROCS env variable
was set, denoting an overriden value. If it isn't
set, and we're inside a cgroup, the value of
GOMAXPROCS is now lowered to the maximum

This is an implementation of part of the advice given
in cockroachdb/docs#9001 .
The CPU requests/shares part of that equation cannot
be implemented in Cockroach, so the docs issue remains
unchanged.

Release note (general change): Run fewer threads in parallel if
running inside a container with a CPU limit.
darinpp pushed a commit to darinpp/cockroach that referenced this issue Jan 20, 2021
Having a GOMAXPROCS value defaulting to the Go default
(numCPUs) is inefficient if there's a lower CPU limit
set for the cgroup the cockroach process is in. The
Go runtime could be scheduling multiple OS-level
threads, not all of which would get to run concurrently.

This change sees if the GOMAXPROCS env variable
was set, denoting an overriden value. If it isn't
set, and we're inside a cgroup, the value of
GOMAXPROCS is now lowered to the maximum

This is an implementation of part of the advice given
in cockroachdb/docs#9001 .
The CPU requests/shares part of that equation cannot
be implemented in Cockroach, so the docs issue remains
unchanged.

Release note (general change): Run fewer threads in parallel if
running inside a container with a CPU limit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants