-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Conversation
src/gc/unix/gcenv.unix.cpp
Outdated
@@ -530,6 +532,9 @@ uint32_t GCToOSInterface::GetCurrentProcessCpuCount() | |||
if (count == 0 || count > 64) | |||
count = 64; | |||
|
|||
if (cgroup.GetCpuLimit(&cgroupCpuCount)){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the opening brace missing a closing brace below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes indeed sorry about that. I wonder how it could have passed all the checks.
I will update the code on Monday. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I was able to change it now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gboucher90 This code (the files in src/gc/unix
) doesn't build by default, you can run build.cmd <arch> <config> buildstandalonegc
to test it and see if it builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I was able to check the build for this target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I was going to push a fix for the broken build but I see you just dit it here 8b82312)
@dotnet-bot test Ubuntu Checked standalone_gc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also bring your changes to this file to https://github.com/dotnet/coreclr/blob/master/src/pal/src/misc/cgroup.cpp ? There's some code duplication in this area because the GC has the option to build as a standalone dynamic library, and when it does so it does not depend on the runtime's PAL to interact with the operating system.
It's a little confusing at the moment (and I'm sorry for that, I'll strive to improve the docs a little), but when building standalone (build.cmd/sh <arch> <config> buildstandalonegc
), the GC uses the files in src\gc\unix
to provide its own implementation of GCToOSInterface
instead of using the runtime's implementation of GCToOSInterface
, which mostly delegates to the PAL (see the implementation here: https://github.com/dotnet/coreclr/blob/master/src/vm/gcenv.os.cpp#L353)
Also, if you want your changes to be visible from the runtime (the Environment.ProcessorCount
) scenario, you'll need to bring your changes to the PAL since the runtime will be using it to talk to the underlying OS for the forseeable future.
The code changes themselves look good to me!
5d3945b
to
139a61b
Compare
Thank you for this thorough explanation it helped me a lot. I made the changes ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes under src/gc
look fine to me (assuming you've validated this locally). I'll defer to others (particularly @janvorli) for a review on the PAL code, I'm not particularly familiar with PAL conventions.
src/vm/gcenv.os.cpp
Outdated
#else | ||
size_t cpuLimit; | ||
|
||
if (PAL_GetCpuLimit(&cpuLimit)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you still need to do this if GetCurrentProcessCpuCount
also checks the CPU limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nop indeed this is useless
cd09507
to
3e8bf5d
Compare
@gboucher90 I'll take a look today or early tomorrow. I've just got back from a couple of weeks of vacation. |
src/gc/unix/cgroup.cpp
Outdated
if (errno != 0) | ||
goto done; | ||
|
||
result = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nit - incorrect text alignment
src/pal/inc/pal.h
Outdated
@@ -2569,6 +2569,11 @@ PALAPI | |||
PAL_GetWorkingSetSize(size_t* val); | |||
|
|||
PALIMPORT | |||
BOOL | |||
PALAPI | |||
PAL_GetCpuLimit(size_t* val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please change the parameter type to unsigned int? There is no need for it to be bitness specific.
src/gc/unix/cgroup.cpp
Outdated
|
||
static char* FindMemoryCgroupPath(){ | ||
char* memory_cgroup_path = nullptr; | ||
char* memoryHierarchyMount = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nit - could you please be consistent with naming convention of the locals?
src/gc/unix/cgroup.cpp
Outdated
cpu_count = quota / period; | ||
if (cpu_count >= 1 && cpu_count < UINT32_MAX){ | ||
*val = cpu_count; | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nit - the coding convention we use is to have the opening curly brace and also the else on separate lines like this:
if (x)
{
}
else
{
}
src/utilcode/util.cpp
Outdated
size_t cpuLimit; | ||
|
||
if (PAL_GetCpuLimit(&cpuLimit)) | ||
count = cpuLimit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The limit should only limit the count, but it should not be able to provide a larger value than the real count of processors if someone sets the period and quota so that their ratio is larger than the real count of processors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes good point
src/gc/unix/cgroup.cpp
Outdated
} | ||
|
||
static bool IsCpuSubsystem(const char *strTok){ | ||
return strncmp("cpu", strTok, 3) == 0 && strlen(strTok) == 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same as for the memory case above.
src/gc/unix/cgroup.cpp
Outdated
result = ReadLongLongValueFromFile(cfs_period_filename, &period); | ||
free(cfs_period_filename); | ||
if (!result || period <= 0) | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to extract the lines 82..91 into a separate function and using it for both the period and quota instead of repeating the same block of code twice. You could make it so that the return value would be either positive for success and negative for failure and the function would get the CFS_PERIOD_FILENAME or CFS_QUOTA_FILENAME as a parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes indeed, I introduced "long long ReadCpuCGroupValue(const char* subsystemFilename)" function.
src/gc/unix/cgroup.cpp
Outdated
if (cpu_count >= 1 && cpu_count < UINT32_MAX){ | ||
*val = cpu_count; | ||
} else { | ||
*val = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we want to apply that value as a limit as I have mentioned in another comment, it seems that we should rather return the UINT32_MAX in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, indeed, it makes more sense
size_t cpuLimit; | ||
|
||
if (PAL_GetCpuLimit(&cpuLimit)) | ||
processorCount = cpuLimit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The limit should only limit the count, but it should not be able to provide a larger value than the real count of processors if someone sets the period and quota so that their ratio is larger than the real count of processors.
src/gc/unix/gcenv.unix.cpp
Outdated
@@ -530,6 +532,9 @@ uint32_t GCToOSInterface::GetCurrentProcessCpuCount() | |||
if (count == 0 || count > 64) | |||
count = 64; | |||
|
|||
if (GetCpuLimit(&cgroupCpuCount)) | |||
count = cgroupCpuCount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The limit should only limit the count, but it should not be able to provide a larger value than the real count of processors if someone sets the period and quota so that their ratio is larger than the real count of processors.
ea38399
to
d7800f1
Compare
Hello @janvorli, thank you for the review! I made the suggested changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you for making this change!
There are 2 Docker CLI command line options available that we are interested in here: - `--cpus`: this limits the amount of CPU time available to the container (ex: 1.8 means 180% CPU time, ie on 2 cores 90% for each core, on 4 cores 45% on each core, etc.) - `--cpuset-cpus`: this limits the number of processors we have access to on the CPU; it also specifies which specific processor we have access to, but that’s irrelevant here All the runtime components depending on the number of processors available are: - ThreadPool - GC - `Environment.ProcessorCount` via `SystemNative::GetProcessorCount` - `SimpleRWLock::m_spinCount` - `BaseDomain::m_iNumberOfProcessors` (it's used to determine the GC heap to affinitize to) All components but `Environment.ProcessorCount` above are aware and take advantage of the values passed to `--cpus` and `--cpuset-cpus`. **`--cpus`** dotnet#12797 has already been done. It impacts all above runtime components (allowing to optimize performance in a container/machine with limited resources). This makes sure the runtime components makes the best use of available resources. In the case of `Environment.ProcessorCount`, the behavior is such that passing `--cpus=1.5` on a machine with 8 processors will return `1` as shown in https://github.com/dotnet/coreclr/issues/22302#issuecomment-459092299. This behavior is not consistent with [Windows Job Objects](https://docs.microsoft.com/en-us/windows/desktop/api/winnt/ns-winnt-jobobject_cpu_rate_control_information) which still returns the number of processors for the container/machine even if it only gets parts of the total number of cycles. This behavior is erroneous because the container still has access to the full range of processors on the machine, and only its _processor time_ is limited. For example, in the case of a 4 processors machine, with a value of `--cpus=1.8`, there can be 4 threads running in parallel even though each thread will only get `1.8 / 8 = .45` or 45% of all cycles of each processor. The work consist in reverting the behavior of `SystemNative::GetProcessorCount` to pre dotnet#12797. **`--cpuset-cpus`** The work has been done here for all runtime components except `Environment.ProcessorCount`. The work consist in fixing `PAL_GetLogicalCpuCountFromOS` to use `sched_getaffinity`. Fixes https://github.com/dotnet/coreclr/issues/22302
This focuses on better supporting Docker CLI's parameter `--cpus`, which limits the amount of CPU time available to the container (ex: 1.8 means 180% CPU time, ie on 2 cores 90% for each core, on 4 cores 45% on each core, etc.) All the runtime components depending on the number of processors available are: - ThreadPool - GC - `Environment.ProcessorCount` via `SystemNative::GetProcessorCount` - `SimpleRWLock::m_spinCount` - `BaseDomain::m_iNumberOfProcessors` (it's used to determine the GC heap to affinitize to) All the above components take advantage of `--cpus` via `CGroup::GetCpuLimit` with dotnet#12797, allowing to optimize performance in a container/machine with limited resources. This makes sure the runtime components makes the best use of available resources. In the case of `Environment.ProcessorCount`, the behavior is such that passing `--cpus=1.5` on a machine with 8 processors will return `1` as shown in https://github.com/dotnet/coreclr/issues/22302#issuecomment-459092299. This behavior is not consistent with [Windows Job Objects](https://docs.microsoft.com/en-us/windows/desktop/api/winnt/ns-winnt-jobobject_cpu_rate_control_information) which still returns the number of processors for the container/machine even if it only gets parts of the total number of cycles. This behavior is erroneous because the container still has access to the full range of processors on the machine, and only its _processor time_ is limited. For example, in the case of a 4 processors machine, with a value of `--cpus=1.8`, there can be 4 threads running in parallel even though each thread will only get `1.8 / 8 = .45` or 45% of all cycles of each processor. The work consist in reverting the behavior of `SystemNative::GetProcessorCount` to pre dotnet#12797.
This focuses on better supporting Docker CLI's parameter `--cpus`, which limits the amount of CPU time available to the container (ex: 1.8 means 180% CPU time, ie on 2 cores 90% for each core, on 4 cores 45% on each core, etc.) All the runtime components depending on the number of processors available are: - ThreadPool - GC - `Environment.ProcessorCount` via `SystemNative::GetProcessorCount` - `SimpleRWLock::m_spinCount` - `BaseDomain::m_iNumberOfProcessors` (it's used to determine the GC heap to affinitize to) All the above components take advantage of `--cpus` via `CGroup::GetCpuLimit` with dotnet#12797, allowing to optimize performance in a container/machine with limited resources. This makes sure the runtime components makes the best use of available resources. In the case of `Environment.ProcessorCount`, the behavior is such that passing `--cpus=1.5` on a machine with 8 processors will return `1` as shown in https://github.com/dotnet/coreclr/issues/22302#issuecomment-459092299. This behavior is not consistent with [Windows Job Objects](https://docs.microsoft.com/en-us/windows/desktop/api/winnt/ns-winnt-jobobject_cpu_rate_control_information) which still returns the number of processors for the container/machine even if it only gets parts of the total number of cycles. This behavior is erroneous because the container still has access to the full range of processors on the machine, and only its _processor time_ is limited. For example, in the case of a 4 processors machine, with a value of `--cpus=1.8`, there can be 4 threads running in parallel even though each thread will only get `1.8 / 8 = .45` or 45% of all cycles of each processor. The work consist in reverting the behavior of `SystemNative::GetProcessorCount` to pre dotnet#12797.
This focuses on better supporting Docker CLI's parameter `--cpus`, which limits the amount of CPU time available to the container (ex: 1.8 means 180% CPU time, ie on 2 cores 90% for each core, on 4 cores 45% on each core, etc.) All the runtime components depending on the number of processors available are: - ThreadPool - GC - `Environment.ProcessorCount` via `SystemNative::GetProcessorCount` - `SimpleRWLock::m_spinCount` - `BaseDomain::m_iNumberOfProcessors` (it's used to determine the GC heap to affinitize to) All the above components take advantage of `--cpus` via `CGroup::GetCpuLimit` with dotnet#12797, allowing to optimize performance in a container/machine with limited resources. This makes sure the runtime components makes the best use of available resources. In the case of `Environment.ProcessorCount`, the behavior is such that passing `--cpus=1.5` on a machine with 8 processors will return `1` as shown in https://github.com/dotnet/coreclr/issues/22302#issuecomment-459092299. This behavior is not consistent with [Windows Job Objects](https://docs.microsoft.com/en-us/windows/desktop/api/winnt/ns-winnt-jobobject_cpu_rate_control_information) which still returns the number of processors for the container/machine even if it only gets parts of the total number of cycles. This behavior is erroneous because the container still has access to the full range of processors on the machine, and only its _processor time_ is limited. For example, in the case of a 4 processors machine, with a value of `--cpus=1.8`, there can be 4 threads running in parallel even though each thread will only get `1.8 / 8 = .45` or 45% of all cycles of each processor. The work consist in reverting the behavior of `SystemNative::GetProcessorCount` to pre dotnet#12797.
This focuses on better supporting Docker CLI's parameter `--cpus`, which limits the amount of CPU time available to the container (ex: 1.8 means 180% CPU time, ie on 2 cores 90% for each core, on 4 cores 45% on each core, etc.) All the runtime components depending on the number of processors available are: - ThreadPool - GC - `Environment.ProcessorCount` via `SystemNative::GetProcessorCount` - `SimpleRWLock::m_spinCount` - `BaseDomain::m_iNumberOfProcessors` (it's used to determine the GC heap to affinitize to) All the above components take advantage of `--cpus` via `CGroup::GetCpuLimit` with dotnet#12797, allowing to optimize performance in a container/machine with limited resources. This makes sure the runtime components makes the best use of available resources. In the case of `Environment.ProcessorCount`, the behavior is such that passing `--cpus=1.5` on a machine with 8 processors will return `1` as shown in https://github.com/dotnet/coreclr/issues/22302#issuecomment-459092299. This behavior is not consistent with [Windows Job Objects](https://docs.microsoft.com/en-us/windows/desktop/api/winnt/ns-winnt-jobobject_cpu_rate_control_information) which still returns the number of processors for the container/machine even if it only gets parts of the total number of cycles. This behavior is erroneous because the container still has access to the full range of processors on the machine, and only its _processor time_ is limited. For example, in the case of a 4 processors machine, with a value of `--cpus=1.8`, there can be 4 threads running in parallel even though each thread will only get `1.8 / 8 = .45` or 45% of all cycles of each processor. The work consist in reverting the behavior of `SystemNative::GetProcessorCount` to pre dotnet#12797.
This focuses on better supporting Docker CLI's parameter `--cpus`, which limits the amount of CPU time available to the container (ex: 1.8 means 180% CPU time, ie on 2 cores 90% for each core, on 4 cores 45% on each core, etc.) All the runtime components depending on the number of processors available are: - ThreadPool - GC - `Environment.ProcessorCount` via `SystemNative::GetProcessorCount` - `SimpleRWLock::m_spinCount` - `BaseDomain::m_iNumberOfProcessors` (it's used to determine the GC heap to affinitize to) All the above components take advantage of `--cpus` via `CGroup::GetCpuLimit` with dotnet#12797, allowing to optimize performance in a container/machine with limited resources. This makes sure the runtime components makes the best use of available resources. In the case of `Environment.ProcessorCount`, the behavior is such that passing `--cpus=1.5` on a machine with 8 processors will return `1` as shown in https://github.com/dotnet/coreclr/issues/22302#issuecomment-459092299. This behavior is not consistent with [Windows Job Objects](https://docs.microsoft.com/en-us/windows/desktop/api/winnt/ns-winnt-jobobject_cpu_rate_control_information) which still returns the number of processors for the container/machine even if it only gets parts of the total number of cycles. This behavior is erroneous because the container still has access to the full range of processors on the machine, and only its _processor time_ is limited. For example, in the case of a 4 processors machine, with a value of `--cpus=1.8`, there can be 4 threads running in parallel even though each thread will only get `1.8 / 8 = .45` or 45% of all cycles of each processor. The work consist in reverting the behavior of `SystemNative::GetProcessorCount` to pre dotnet#12797.
This focuses on better supporting Docker CLI's parameter `--cpus`, which limits the amount of CPU time available to the container (ex: 1.8 means 180% CPU time, ie on 2 cores 90% for each core, on 4 cores 45% on each core, etc.) All the runtime components depending on the number of processors available are: - ThreadPool - GC - `Environment.ProcessorCount` via `SystemNative::GetProcessorCount` - `SimpleRWLock::m_spinCount` - `BaseDomain::m_iNumberOfProcessors` (it's used to determine the GC heap to affinitize to) All the above components take advantage of `--cpus` via `CGroup::GetCpuLimit` with dotnet#12797, allowing to optimize performance in a container/machine with limited resources. This makes sure the runtime components makes the best use of available resources. In the case of `Environment.ProcessorCount`, the behavior is such that passing `--cpus=1.5` on a machine with 8 processors will return `1` as shown in https://github.com/dotnet/coreclr/issues/22302#issuecomment-459092299. This behavior is not consistent with [Windows Job Objects](https://docs.microsoft.com/en-us/windows/desktop/api/winnt/ns-winnt-jobobject_cpu_rate_control_information) which still returns the number of processors for the container/machine even if it only gets parts of the total number of cycles. This behavior is erroneous because the container still has access to the full range of processors on the machine, and only its _processor time_ is limited. For example, in the case of a 4 processors machine, with a value of `--cpus=1.8`, there can be 4 threads running in parallel even though each thread will only get `1.8 / 8 = .45` or 45% of all cycles of each processor. The work consist in reverting the behavior of `SystemNative::GetProcessorCount` to pre dotnet#12797.
This focuses on better supporting Docker CLI's parameter `--cpus`, which limits the amount of CPU time available to the container (ex: 1.8 means 180% CPU time, ie on 2 cores 90% for each core, on 4 cores 45% on each core, etc.) All the runtime components depending on the number of processors available are: - ThreadPool - GC - `Environment.ProcessorCount` via `SystemNative::GetProcessorCount` - `SimpleRWLock::m_spinCount` - `BaseDomain::m_iNumberOfProcessors` (it's used to determine the GC heap to affinitize to) All the above components take advantage of `--cpus` via `CGroup::GetCpuLimit` with dotnet#12797, allowing to optimize performance in a container/machine with limited resources. This makes sure the runtime components makes the best use of available resources. In the case of `Environment.ProcessorCount`, the behavior is such that passing `--cpus=1.5` on a machine with 8 processors will return `1` as shown in https://github.com/dotnet/coreclr/issues/22302#issuecomment-459092299. This behavior is not consistent with [Windows Job Objects](https://docs.microsoft.com/en-us/windows/desktop/api/winnt/ns-winnt-jobobject_cpu_rate_control_information) which still returns the number of processors for the container/machine even if it only gets parts of the total number of cycles. This behavior is erroneous because the container still has access to the full range of processors on the machine, and only its _processor time_ is limited. For example, in the case of a 4 processors machine, with a value of `--cpus=1.8`, there can be 4 threads running in parallel even though each thread will only get `1.8 / 8 = .45` or 45% of all cycles of each processor. The work consist in reverting the behavior of `SystemNative::GetProcessorCount` to pre dotnet#12797.
This focuses on better supporting Docker CLI's parameter `--cpus`, which limits the amount of CPU time available to the container (ex: 1.8 means 180% CPU time, ie on 2 cores 90% for each core, on 4 cores 45% on each core, etc.) All the runtime components depending on the number of processors available are: - ThreadPool - GC - `Environment.ProcessorCount` via `SystemNative::GetProcessorCount` - `SimpleRWLock::m_spinCount` - `BaseDomain::m_iNumberOfProcessors` (it's used to determine the GC heap to affinitize to) All the above components take advantage of `--cpus` via `CGroup::GetCpuLimit` with dotnet#12797, allowing to optimize performance in a container/machine with limited resources. This makes sure the runtime components makes the best use of available resources. In the case of `Environment.ProcessorCount`, the behavior is such that passing `--cpus=1.5` on a machine with 8 processors will return `1` as shown in https://github.com/dotnet/coreclr/issues/22302#issuecomment-459092299. This behavior is not consistent with [Windows Job Objects](https://docs.microsoft.com/en-us/windows/desktop/api/winnt/ns-winnt-jobobject_cpu_rate_control_information) which still returns the number of processors for the container/machine even if it only gets parts of the total number of cycles. This behavior is erroneous because the container still has access to the full range of processors on the machine, and only its _processor time_ is limited. For example, in the case of a 4 processors machine, with a value of `--cpus=1.8`, there can be 4 threads running in parallel even though each thread will only get `1.8 / 8 = .45` or 45% of all cycles of each processor. The work consist in reverting the behavior of `SystemNative::GetProcessorCount` to pre dotnet#12797.
Virtualization containers systems (such as Docker, Mesos, Kubernetes...) often relies on cgroup cpu to limit resources.
Currently "Environment.ProcessorCount" in such systems return the physical number of processors. The problem is that this variable is used to properly scale the thread pool.
In our case the cpu limit in Mesos was set to 2 while the machine had 24 processors. Then the application (running a Kestrel HTTP Server) performed very badly and was often irresponsive.
Memory limit was already correctly implemented. Thus it makes sense to do the same for CPU.