kvm: Add support for cgroupv2 #8252
Conversation
Codecov Report

Attention:

Additional details and impacted files

```diff
@@             Coverage Diff              @@
##               4.18    #8252      +/-   ##
============================================
+ Coverage     13.02%   13.13%    +0.10%
- Complexity     9032     9141      +109
============================================
  Files          2720     2720
  Lines        257080   257710      +630
  Branches      40088    40172       +84
============================================
+ Hits          33476    33838      +362
- Misses       219400   219582      +182
- Partials       4204     4290       +86
```

☔ View full report in Codecov by Sentry.
Thanks @BryanMLima, for picking this up and for the extensive explanation. I have two questions:

regards,

Oh, and number 3
@DaanHoogland, regarding the first question, ACS calculates the shares by multiplying the frequency by the number of cores, both specified in the compute offering; this is done in method

About the second question, I think I did not understand it fully; could you add more details? By live systems, do you mean hosts or VMs?
This PR already addresses the migration of VMs between hosts with different cgroup versions, as the shares value is recalculated in this process considering the VM's destination host. Thus, two VMs with the exact same compute offering will have different shares values under cgroups v1 and v2. The shares value is only a proportional weight; as long as all VMs on the same host are on the same scale, CPU time will be distributed accordingly. If the shares value is not set in the domain XML for libvirt (this never happens in ACS; it is always set), the OS default value is used, which, for cgroup v2, is 1002; thus, the default behaviour for processes in the same cgroup is to have proportional CPU access time. This PR, however, does not address updating the shares of VMs already running on hosts with cgroup v2, which requires restarting, migrating or scaling the VM.
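For reference, this weight lives in the `<cputune>` element of the libvirt domain XML; a minimal, hand-written fragment (the value is illustrative only, not taken from this PR):

```xml
<!-- Fragment of a libvirt domain XML; the value is illustrative only -->
<cputune>
  <!-- Relative CPU weight. On cgroup v2 hosts this is applied as cpu.weight,
       whose valid range is [1, 10000]; larger values make the domain fail to start. -->
  <shares>2500</shares>
</cputune>
```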
Thanks @BryanMLima, good work.
code lgtm
not tested yet
@blueorangutan package

@shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7890

@blueorangutan test

@DaanHoogland a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan package

@rohityadavcloud a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

This makes sense from an implementation perspective. I have not dug into the code in great detail.
The shares are all relative, so as long as we calculate some proportional scale that is compatible with the KVM host, it should work.
Code LGTM, will need testing.
@BryanMLima @DaanHoogland @rohityadavcloud @weizhouapache should this be considered for 4.19? I think we should.

agreed, or even for the 4.18 branch

yes, @DaanHoogland @shwstppr
@blueorangutan package

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

Packaging result [SF]: ✖️ el7 ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 7956

@blueorangutan package

@vladimirpetrov a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

Packaging result [SF]: ✖️ el7 ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 7964

@blueorangutan package

@vladimirpetrov a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7973

@blueorangutan test alma9 kvm-alma9

@DaanHoogland a [SL] Trillian-Jenkins test job (alma9 mgmt + kvm-alma9) has been kicked to run smoke tests
[SF] Trillian test result (tid-8523)
@blueorangutan package

@BryanMLima a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8016

@blueorangutan test

@shwstppr a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests
LGTM based on manual testing against 4.19 and 4.18.

Testing in an Ubuntu 22 environment, with a host of 6 CPUs x 2000 MHz, the calculated shares value is 9524 (overprovisioning factor 1). Without the fix it is 12000 and deployment fails.

Just one clarification: the actual formula for calculating shares is

`shares = NUMBER_CPUS * MIN_SPEED_OR_SPEED` (SPEED is a legacy parameter for compatibility with ACS 4.0/4.1)

where MIN_SPEED is calculated like this:

```java
int minspeed = (int)(offering.getSpeed() / (divideCpuByOverprovisioning ? vmProfile.getCpuOvercommitRatio() : 1));
```

So the overprovisioning factor is also used in the equation.
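To make the effect of the overprovisioning factor concrete, here is a minimal, self-contained sketch of the requested-shares computation described above (class, method and variable names are invented for this example; this is not the actual ACS code):

```java
public class RequestedSharesExample {
    // Illustrative stand-in for the calculation described above.
    static int requestedShares(int cores, int offeringSpeedMhz,
                               boolean divideCpuByOverprovisioning, double cpuOvercommitRatio) {
        // MIN_SPEED: the offering speed divided by the overcommit ratio when enabled
        int minSpeed = (int) (offeringSpeedMhz / (divideCpuByOverprovisioning ? cpuOvercommitRatio : 1));
        return cores * minSpeed; // shares = NUMBER_CPUS * MIN_SPEED
    }

    public static void main(String[] args) {
        // 6 cores at 2000 MHz, overprovisioning factor 1 -> 12000 requested shares
        System.out.println(requestedShares(6, 2000, true, 1.0));
        // Same offering with an overcommit ratio of 2 -> 6000 requested shares
        System.out.println(requestedShares(6, 2000, true, 2.0));
    }
}
```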
Let's merge this once integration test results are in.

Test results from backend:
Description

1. Problem description

In Apache CloudStack (ACS), when a VM is deployed on a host with the KVM hypervisor, an XML file is created on the assigned host, which has a property `shares` that defines the weight of the VM for accessing the host's CPU. The value of this property has no unit; it is a relative measure used to calculate how much CPU time a given VM will get on the host. However, this value has a limit, which depends on the cgroup version used by the host's kernel. The problem lies in the valid range of `shares`, which differs between the two versions: [2, 262144] for cgroups version 1, and [1, 10000] for cgroups version 2. Currently, ACS calculates the value of `shares` using Equation 1, presented below, where `CPU` is the number of cores and `speed` is the CPU frequency, both specified in the VM's compute offering. Therefore, if a compute offering has, for example, 6 cores at 2 GHz, the `shares` value will be 12000 and libvirt will throw an exception if the host uses cgroup v2. As version 2 is becoming the default in current Linux distributions, it is necessary to address this limitation.

Equation 1: `shares = CPU * speed`
Fixes: #6744
2. Proposed changes

To address the problem described, we propose applying a scale conversion that considers the maximum `shares` of the host. Using the same formula currently used by ACS, it is possible to calculate the maximum `shares` a VM could request on a given host; in other words, the number of cores and the nominal speed of the host's CPU define the upper limit of `shares` allowed for a VM. This value is then scaled to the allowed cgroup v2 interval of [1, 10000] using a linear scale conversion.

The VM `shares` would be calculated as Equation 2, presented below, where `VM requested shares` is the requested shares value calculated using Equation 1, `cgroup upper limit` is fixed at 10000 (the cgroups v2 upper limit), and `host max shares` is the maximum shares value of the host, calculated using Equation 1. Using Equation 2, the only case where a VM exceeds the cgroup v2 limit is when the user requests more resources than the host has, which is not possible with the current implementation of ACS.

Equation 2: `shares = (VM requested shares * cgroup upper limit) / host max shares`
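For illustration, a minimal, self-contained sketch of Equations 1 and 2 (class and method names are invented for this example; this is not the actual ACS implementation):

```java
public class CgroupV2SharesExample {
    static final int CGROUP_V2_UPPER_LIMIT = 10_000; // cgroup v2 cpu.weight upper bound

    // Equation 1: shares = CPU * speed. Uses the offering's cores and MHz for a VM,
    // or the host's core count and nominal speed when computing the host maximum.
    static int shares(int cpuCores, int speedMhz) {
        return cpuCores * speedMhz;
    }

    // Equation 2: linear scale conversion into the [1, 10000] cgroup v2 range
    static int scaledShares(int vmRequestedShares, int hostMaxShares) {
        return vmRequestedShares * CGROUP_V2_UPPER_LIMIT / hostMaxShares;
    }

    public static void main(String[] args) {
        int vmRequested = shares(8, 2000);   // 8 cores x 2000 MHz = 16000
        int hostMax = shares(32, 2000);      // 32 cores x 2000 MHz = 64000
        System.out.println(scaledShares(vmRequested, hostMax)); // prints 2500
    }
}
```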
To implement the proposal, the following APIs will be updated: `deployVirtualMachine`, `migrateVirtualMachine` and `scaleVirtualMachine`. When a VM is being deployed, a new verification will be added when finding a suitable host: the max `shares` of each candidate host will be calculated, and the VM's calculated `shares` will be checked against the host's value (see the sketch below). Likewise, VM migration will gain a similar verification. Lastly, VM scaling will perform the same verification against the VM's host.

To determine the max `shares` of a given host, we will use the same equation currently used in ACS for calculating the `shares` of VMs, presented in Section 1. When Equation 1 is used to determine the maximum shares of a host, `CPU` is the number of cores of the host, and `speed` is the nominal CPU speed, i.e., the CPU's base frequency. It is important to note that, for now, these changes apply only to hosts with the KVM hypervisor using cgroup v2.
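A sketch of the suitability check described above, under the same assumptions (invented names, not the actual ACS code paths):

```java
// Hypothetical pre-deployment check: a host is only suitable if its maximum
// shares (Equation 1 with the host's cores and nominal speed) can accommodate
// the VM's requested shares (Equation 1 with the offering's cores and speed).
static boolean hostCanHonorShares(int vmCores, int vmSpeedMhz,
                                  int hostCores, int hostNominalSpeedMhz) {
    int vmRequestedShares = vmCores * vmSpeedMhz;
    int hostMaxShares = hostCores * hostNominalSpeedMhz;
    return vmRequestedShares <= hostMaxShares;
}
```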
To exemplify the proposed changes, consider a host with the following specification: 32 CPU cores at a nominal speed of 2 GHz, and a VM with a compute offering of 8 CPU cores at 2 GHz. With the current ACS implementation, the `shares` of the VM would be calculated with Equation 1; thus, the VM `shares` would be 16000, over the cgroup v2 limit of 10000.

With the proposed changes, the VM `shares` would be calculated with Equation 2. In this example, `VM requested shares` is 16000, `cgroup upper limit` is fixed at 10000, and `host max shares` is 64000. Therefore, the VM `shares` results in 2500, well below the cgroup v2 limit.

To demonstrate real-case scenarios, consider the following hosts:
Table 1 below presents a set of VMs with their requested resources, alongside the `shares` values under the current implementation and the new `shares` value for each host under the proposed change using Equation 2.

Table 2 below presents whether the same VMs from Table 1 would be allowed to be allocated to a given host, or whether an exception would be thrown, under the current and proposed implementations. As we can see, with the current ACS implementation, VMs 3 through 6 would throw an exception when deployed on host A, even though the host has enough resources. VM 6 should throw an exception when deployed on host B under both implementations, as the host does not have enough resources to allocate it.

It is important to note that Equation 2 rounds the `shares` value; thus, there is a precision loss in the conversion. Nevertheless, this precision loss should not be noticeable to the end user, as `shares` values would need to be in a very close interval to collide; e.g., values of `3997`, `3998` and `3999` would all be converted to `1249` on host B under the new implementation. However, the precision loss is a small drawback for enabling cgroup v2 support in ACS.

With the current proposal, only cgroups version 2 is addressed, as it has the more impactful limitations. As future work, cgroups version 1 will also be addressed using the same linear scale conversion strategy.
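A small demonstration of that precision loss, assuming host B's maximum shares is 32000 (a value consistent with the mapping described above; the actual host specifications from the tables are not reproduced here):

```java
public class PrecisionLossExample {
    // Integer division collapses nearby requested-shares values to the same result.
    static int scaledShares(int vmRequestedShares, int hostMaxShares) {
        return vmRequestedShares * 10_000 / hostMaxShares; // 10000 = cgroup v2 upper limit
    }

    public static void main(String[] args) {
        for (int requested : new int[] {3997, 3998, 3999}) {
            System.out.println(scaledShares(requested, 32_000)); // prints 1249 each time
        }
    }
}
```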
How Has This Been Tested?

Consider the host `host-1` with cgroup v2, the host `host-2` with cgroup v1, and the custom constrained compute offering below:

Deploy of VMs

I created the VM `vm-tmpfs` with 5 cores and allocated it to host `host-2`.

I created the VM `vm-cgroupv2` with 5 cores and allocated it to host `host-1`.

As expected, ACS considered the host resources to set the `shares` values. When the host uses cgroup v1, the default behavior is not changed.

VM live scale
I live scaled the VM `vm-tmpfs`, changing its number of cores from 5 to 6.

I live scaled the VM `vm-cgroupv2`, changing its number of cores from 5 to 6.

As expected, ACS considered the host resources to set the `shares` values. When the host uses cgroup v1, the default behavior is not changed.

VM migration
I migrated VM `vm-tmpfs` from host `host-2` to host `host-1` (from cgroup v1 to cgroup v2). After the migration, the `shares` value was changed to `8000`, as expected.

I migrated VM `vm-cgroupv2` from host `host-1` to host `host-2` (from cgroup v2 to cgroup v1). After the migration, the `shares` value was changed to `12000`, as expected.
, as expected.How did you try to break this feature and the system with this change?
I migrated VMs between hosts with different cgroup versions, the VM migration section above describes this.