high performance hook: irqbalance: incorrect handling of short affinity mask #6145
Comments
The high performance hook supports dynamic IRQ balancing thanks to a well-known annotation. Pods can opt out of IRQ handling by exposing this annotation. On pod startup, crio recomputes the irqbalance ban mask and reapplies irqbalance to make sure the CPUs assigned to these pods will not handle IRQs, being added to the irqbalance ban mask. This works only for guaranteed pods requesting integral CPUs.
To support this feature, crio needs to recompute the cpu ban mask. In turn, crio depends on encoding/hex.DecodeString for some of the heavy lifting. The function expects even-sized inbound strings (which are trivial transformations of the irqbalance ban mask), so crio left-pads the irq affinity mask with zeros at the beginning of the computation. The left-padded mask is then adjusted to exclude any relevant cpu, and then inverted.
Here lies the problem: the padding character should always be "0" (ASCII zero), not the "f" obtained by inverting the original, correct zero character. The outcome of this behavior is that stray 'f' characters end up in the IRQ ban list in the irqbalance config file, and they are never removed.
Fixing the padding or the mask inversion requires relatively invasive code changes, for example passing around the expected mask length and acting accordingly. While not particularly challenging per se, these changes don't fit smoothly in the current code. Another option is to clamp the computed masks to the expected length once the computation and the inversion have been done. We pursue this approach in this change.
Fixes: cri-o#6145 Signed-off-by: Francesco Romani <fromani@redhat.com>
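The clamping approach described above can be sketched as follows. This is a minimal illustration, not the actual cri-o code; the function name and structure are hypothetical:

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// invertMaskClamped left-pads a hex CPU mask to an even length so that
// encoding/hex can decode it, inverts every byte to obtain the ban
// mask, then clamps the result back to the original length so the
// inverted padding nibble never leaks a stray "f" into the ban list.
func invertMaskClamped(mask string) (string, error) {
	origLen := len(mask)
	padded := mask
	if origLen%2 != 0 {
		// hex.DecodeString requires an even number of hex digits.
		padded = "0" + padded
	}
	data, err := hex.DecodeString(padded)
	if err != nil {
		return "", err
	}
	for i := range data {
		data[i] = ^data[i] // ban mask = complement of the affinity mask
	}
	inverted := hex.EncodeToString(data)
	// Clamp: keep only the trailing origLen digits, dropping the
	// inverted padding digit when the input length was odd.
	return inverted[len(inverted)-origLen:], nil
}

func main() {
	// 3-digit mask for a 12-CPU system: affinity "00f" -> ban mask "ff0".
	// Without the clamp the result would be "fff0", with a stray "f".
	out, err := invertMaskClamped("00f")
	fmt.Println(out, err) // prints "ff0 <nil>"
}
```

The clamp keeps the fix local: the padding and inversion steps stay untouched, and only the final re-encoded string is trimmed to the expected length.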
Testing the irqbalance handling on tuned restart highlighted a crio bug when handling odd-length cpu affinity masks. We expect this bug to have little impact on production environments, because it's unlikely they will have a number of CPUs that is a multiple of 4 but not a multiple of 8; still, we filed cri-o/cri-o#6145. In order to have a backportable change, we fix our utility code to deal with the incorrect padding. Signed-off-by: Francesco Romani <fromani@redhat.com>
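The "multiple of 4 but not of 8" condition can be made concrete with a small sketch (illustrative helper, not the actual utility code): the kernel prints one hex digit per 4 CPUs, so exactly those CPU counts yield an odd-length mask that needs a padding digit.

```go
package main

import "fmt"

// maskDigits returns the number of hex digits in the kernel's CPU
// affinity mask for a machine with nCPUs CPUs: one digit per 4 CPUs.
// An odd result (4, 12, 20... CPUs) is the case that needs an extra
// padding digit and therefore trips the inversion bug.
func maskDigits(nCPUs int) int {
	return (nCPUs + 3) / 4
}

func main() {
	for _, n := range []int{4, 8, 12, 16, 20} {
		fmt.Printf("%2d cpus -> %d hex digits (odd: %v)\n",
			n, maskDigits(n), maskDigits(n)%2 == 1)
	}
}
```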
* test: perfprof: utils: make sure to unquote cpus
Make sure to unquote the cpumask output to prevent false negatives. Signed-off-by: Francesco Romani <fromani@redhat.com>
* perfprof: utils: robustness fixes
Add testcases and log enhancements that emerged during local testing. Signed-off-by: Francesco Romani <fromani@redhat.com>
* perfprof: tuned: disable the irqbalance plugin
The tuned irqbalance plugin clears the irqbalance banned CPUs list when tuned starts. The list is then managed dynamically by the runtime handlers. On node restart, the tuned pod can be started AFTER the workload pods (neither kubelet nor kubernetes offers ordering guarantees when recovering the node state), clearing the banned CPUs list while pods are running and compromising the IRQ isolation guarantees. The same holds true if the NTO pod restarts for whatever reason. To prevent this disruption, we disable the irqbalance plugin entirely. Another component in the stack must now clear the irqbalance cpu ban list once per node reboot. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2105123 Signed-off-by: Francesco Romani <fromani@redhat.com>
* e2e: perfprof: add tests for cpu ban list handling
Add a test to verify that a tuned restart will not clear the irqbalance cpu ban list, which is the key reason why we disabled the irqbalance tuned plugin earlier. Note there is no guarantee that any component in the stack will reset the irqbalance cpu ban list exactly once. crio unconditionally does a restore from a snapshot taken the first time the server runs, which is likely but not guaranteed to be correct. There's no way to declare or check the content of the value crio would reset. Signed-off-by: Francesco Romani <fromani@redhat.com>
* perfprof: functests: utils: fix cpu mask padding
Testing the irqbalance handling on tuned restart highlighted a crio bug when handling odd-length cpu affinity masks. We expect this bug to have little impact on production environments, because it's unlikely they will have a number of CPUs that is a multiple of 4 but not a multiple of 8; still, we filed cri-o/cri-o#6145. In order to have a backportable change, we fix our utility code to deal with the incorrect padding. Signed-off-by: Francesco Romani <fromani@redhat.com>
In commit c5cf0bd we added a workaround for the incorrect crio mask padding (see: cri-o/cri-o#6145), but the fix implemented there was partial. Address the gap. Signed-off-by: Francesco Romani <fromani@redhat.com>
* podsecurity: disable OCP label sync
Per the latest OCP recommendation (internal doc), it is not sufficient to declare the pod security labels; we also need to disable the label sync, which we do here. Signed-off-by: Francesco Romani <fromani@redhat.com>
* e2e: perfprof: crio mask fix workaround part 2
In commit c5cf0bd we added a workaround for the incorrect crio mask padding (see: cri-o/cri-o#6145), but the fix implemented there was partial. Address the gap. Signed-off-by: Francesco Romani <fromani@redhat.com>
A friendly reminder that this issue had no activity for 30 days.
Closing this issue since it had no activity in the past 90 days.
/reopen
@fromanirh: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-lifecycle stale
A friendly reminder that this issue had no activity for 30 days.
@ffromani is this a CRI-O issue or one that should live somewhere else?
A friendly reminder that this issue had no activity for 30 days.
Closing this issue since it had no activity in the past 90 days.
/reopen
The issue, AFAIK, still exists.
@ffromani: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
A friendly reminder that this issue had no activity for 30 days.
Closing this issue since it had no activity in the past 90 days.
What happened?
The high performance hook supports dynamic IRQ balancing thanks to a well-known annotation.
Pods can opt out of IRQ handling by exposing this annotation. On pod startup, crio recomputes the irqbalance ban mask and reapplies irqbalance to make sure the CPUs assigned to these pods will not handle IRQs, being added to the irqbalance ban mask.
This works only for guaranteed pods requesting integral CPUs.
To support this feature, crio needs to recompute the cpu ban mask. In turn, crio depends on
encoding/hex.DecodeString
for some of the heavy lifting. The function expects even-sized inbound strings (which are trivial transformations of the irqbalance ban mask). So crio left-pads the irq affinity mask with zeros at the beginning of the computation.
The left-padded mask is then adjusted to exclude any relevant cpu, and then inverted.
Here lies the problem: the padding character should always be "0" (ASCII zero), not "f", which is what you get by inverting the original, correct zero character.
The outcome of this behavior is that stray 'f' characters end up in the IRQ ban list in the irqbalance config file, and they are never removed.
To the best of my knowledge, this happens only when the number of CPUs is a multiple of 4 but not of 8, so on systems with 4, 12, 20... cpus, which will have an odd-length irq affinity mask reported by the kernel.
Machines like this are expected to be quite rare in production environments, but relatively common in CI (especially 4 and 12 cpus).
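To make the failure mode concrete, here is a minimal sketch of the buggy computation (a hypothetical helper, not the cri-o source): because the zero padding digit is added before the inversion, it gets inverted along with the real mask and surfaces as a stray "f".

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// invertMaskNaive reproduces the faulty computation: pad to an even
// length, decode, invert, re-encode -- without restoring the original
// length afterwards. The zero padding digit is inverted together with
// the real mask and shows up as a stray leading "f".
func invertMaskNaive(mask string) string {
	if len(mask)%2 != 0 {
		mask = "0" + mask // even length required by hex.DecodeString
	}
	data, _ := hex.DecodeString(mask)
	for i := range data {
		data[i] = ^data[i]
	}
	return hex.EncodeToString(data)
}

func main() {
	// 12-CPU system: affinity mask "fff" (all CPUs may handle IRQs).
	// The correct inverted ban mask is "000"; the naive computation
	// emits "f000", where the leading "f" is the inverted padding digit.
	fmt.Println(invertMaskNaive("fff")) // prints "f000"
}
```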
What did you expect to happen?
cri-o should compute the padding characters correctly, using ASCII zero ("0"), and not just trivially invert the computed mask, which was itself padded correctly.
In other words, the problem lies in the computation of the inverted mask (= the ban list).
How can we reproduce it (as minimally and precisely as possible)?
run the dynamic IRQ balancing code on a system with 4 cpus, like the openshift CI: see openshift/cluster-node-tuning-operator@4821953 (e2e test added as part of openshift/cluster-node-tuning-operator#396)
Excerpt of a test run which demonstrates the behavior
Note the stray, unexpected "f" in the banned cpu mask.
Anything else we need to know?
This issue has likely been present since the introduction of the dynamic IRQ balancing feature.
CRI-O and Kubernetes version
OS version
Additional environment details (AWS, VirtualBox, physical, etc.)