
high performance hook: irqbalance: incorrect handling of short affinity mask #6145

Open
ffromani opened this issue Aug 11, 2022 · 13 comments · May be fixed by #6146
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@ffromani
Contributor

What happened?

The high performance hook supports dynamic IRQ balancing, driven by a well-known annotation.
Pods can opt out of IRQ load balancing by setting this annotation. On pod startup, CRI-O recomputes the irqbalance ban mask and reapplies it to irqbalance, to make sure the CPUs assigned to these pods will not handle IRQs (they are added to the irqbalance ban mask).
This works only for guaranteed pods requesting integral CPUs.

To support this feature, CRI-O needs to recompute the CPU ban mask. In turn, CRI-O depends on encoding/hex.DecodeString for some of the heavy lifting. That function only accepts even-length input strings (which are trivial transformations of the irqbalance ban mask).

So CRI-O left-pads the IRQ affinity mask with zeros at the beginning of the computation.

The left-padded mask is then adjusted to exclude the relevant CPUs, and finally inverted.
Here lies the problem: the padding character should always be "0" (ASCII zero), but inverting the mask also turns the (originally correct) zero padding into "f".

The outcome of this behavior is that stray "f" characters end up in the IRQ ban list in the irqbalance config file, and they are never removed.

To the best of my knowledge, this happens only when the IRQ affinity mask has an odd number of hex digits, i.e. on systems whose CPU count is a multiple of 4 but not of 8 (4, 12, 20... CPUs), for which the kernel reports an odd-length IRQ affinity mask.
Machines like this are expected to be quite rare in production environments, but they are relatively common in CI (especially with 4 and 12 CPUs).
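For illustration, here is a minimal, self-contained Go sketch of the faulty flow described above. The function name `invertMaskBuggy` is hypothetical (this is not CRI-O's actual code): the mask is left-padded to an even length so `hex.DecodeString` accepts it, and then every byte is inverted, padding nibble included.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// invertMaskBuggy is a hypothetical sketch of the faulty flow: left-pad
// the hex mask to an even length (as hex.DecodeString requires), then
// invert every byte -- padding nibble included.
func invertMaskBuggy(mask string) (string, error) {
	if len(mask)%2 != 0 {
		mask = "0" + mask // correct padding: ASCII zero
	}
	raw, err := hex.DecodeString(mask)
	if err != nil {
		return "", err
	}
	for i := range raw {
		raw[i] = ^raw[i] // inversion also flips the padding "0" into "f"
	}
	return hex.EncodeToString(raw), nil
}

func main() {
	// A 4-cpu system reports a single-nibble affinity mask, e.g. "f".
	// The ban mask (the inverse) should be "0", but the padding inverts too.
	banned, _ := invertMaskBuggy("f")
	fmt.Println(banned) // prints "f0": the stray "f" is the inverted padding
}
```

The high nibble of the result is pure padding, yet it comes out as "f" and stays in the ban mask from then on.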

What did you expect to happen?

CRI-O should emit the padding characters as ASCII zero ("0") instead of trivially inverting the whole computed mask, which was padded correctly up to that point.
In other words, the problem lies in the computation of the inverted mask (i.e. the ban list).
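Under the same assumptions (hypothetical helper, not CRI-O's code), a sketch of the expected behavior: remember the kernel-reported mask width and clamp the inverted result back to it, so the inverted padding nibble never reaches the ban mask.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// invertMask is a hypothetical sketch of a corrected flow: pad for
// hex.DecodeString as before, invert, then clamp the result back to the
// original width so the inverted padding is discarded.
func invertMask(mask string) (string, error) {
	origLen := len(mask)
	padded := mask
	if origLen%2 != 0 {
		padded = "0" + padded // hex.DecodeString needs even-length input
	}
	raw, err := hex.DecodeString(padded)
	if err != nil {
		return "", err
	}
	for i := range raw {
		raw[i] = ^raw[i]
	}
	inverted := hex.EncodeToString(raw)
	// keep only the kernel-reported number of hex digits
	return inverted[len(inverted)-origLen:], nil
}

func main() {
	banned, _ := invertMask("f") // 4-cpu affinity mask, all CPUs allowed
	fmt.Println(banned)          // prints "0": no CPU banned, no stray "f"
}
```

Clamping after inversion is one way to keep the padding out; re-padding the inverted significant nibbles with ASCII zeros would be equivalent.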

How can we reproduce it (as minimally and precisely as possible)?

Run the dynamic IRQ balancing code on a system with 4 CPUs, like the OpenShift CI: see openshift/cluster-node-tuning-operator@4821953 (e2e test added as part of openshift/cluster-node-tuning-operator#396).

Excerpt of a test run that demonstrates the behavior:

[performance] Checking IRQBalance settings Verify irqbalance configuration handling 
  Should not overwrite the banned CPU set on tuned restart
  /home/fromani/go/src/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/1_performance/irqbalance.go:70
Aug 11 13:39:53.221: [INFO]: MCP "worker" is targeting 3 node(s)
Aug 11 13:39:53.786: [INFO]: MCP "worker" is targeting 3 node(s)
STEP: verifying worker node "ci-ln-2h3mjpb-72292-2wnt7-worker-a-zbxpm"
Aug 11 13:39:54.058: [INFO]: found mcd machine-config-daemon-s5g7t for node ci-ln-2h3mjpb-72292-2wnt7-worker-a-zbxpm
Aug 11 13:39:55.062: [INFO]: banned CPUs setting: "IRQBALANCE_BANNED_CPUS=\"00000000,000000f0\""
Aug 11 13:39:55.062: [INFO]: banned CPUs: "00000000,000000f0"
Aug 11 13:39:55.062: [INFO]: banned CPUs on "ci-ln-2h3mjpb-72292-2wnt7-worker-a-zbxpm" when test begins: {4-7}
Aug 11 13:39:55.200: [INFO]: found mcd machine-config-daemon-s5g7t for node ci-ln-2h3mjpb-72292-2wnt7-worker-a-zbxpm
Aug 11 13:39:56.363: [INFO]: found mcd machine-config-daemon-s5g7t for node ci-ln-2h3mjpb-72292-2wnt7-worker-a-zbxpm
Aug 11 13:39:57.456: [INFO]: using testpod:
{"metadata":{"generateName":"test-","namespace":"performance-addon-operators-testing","creationTimestamp":null,"labels":{"test":""},"annotations":{"cpu-quota.crio.io":"disable","irq-load-balancing.crio.io":"disable"}},"spec":{"containers":[{"name":"test","image":"quay.io/openshift-kni/cnf-tests:4.9","command":["sleep","10h"],"resources":{"limits":{"cpu":"2","memory":"256Mi"}}}],"nodeName":"ci-ln-2h3mjpb-72292-2wnt7-worker-a-zbxpm","runtimeClassName":"performance-perfprof"},"status":{}}
Aug 11 13:40:02.009: [WARNING]: ->  AddedInterface Add eth0 [10.128.2.19/23] from ovn-kubernetes
Aug 11 13:40:02.009: [WARNING]: ->  Pulled Container image "quay.io/openshift-kni/cnf-tests:4.9" already present on machine
Aug 11 13:40:02.009: [WARNING]: ->  Created Created container test
Aug 11 13:40:02.009: [WARNING]: ->  Started Started container test
Aug 11 13:40:02.146: [INFO]: found mcd machine-config-daemon-s5g7t for node ci-ln-2h3mjpb-72292-2wnt7-worker-a-zbxpm
Aug 11 13:40:03.405: [INFO]: banned CPUs setting: "IRQBALANCE_BANNED_CPUS=\"00000000,000000fa\""
Aug 11 13:40:03.405: [INFO]: banned CPUs: "00000000,000000fa"
Aug 11 13:40:03.405: [INFO]: banned CPUs on "ci-ln-2h3mjpb-72292-2wnt7-worker-a-zbxpm" just before the tuned restart: {1,3-7}
STEP: getting a TuneD Pod running on node ci-ln-2h3mjpb-72292-2wnt7-worker-a-zbxpm
STEP: causing a restart of the tuned pod (deleting the pod) on ci-ln-2h3mjpb-72292-2wnt7-worker-a-zbxpm
run command 'oc [delete pod --wait=true -n openshift-cluster-node-tuning-operator tuned-2cb42]':
  out=pod "tuned-2cb42" deleted
  err=
  ret=<nil>
STEP: getting again a TuneD Pod running on node ci-ln-2h3mjpb-72292-2wnt7-worker-a-zbxpm
STEP: waiting for the TuneD daemon running on node ci-ln-2h3mjpb-72292-2wnt7-worker-a-zbxpm
STEP: re-verifying worker node "ci-ln-2h3mjpb-72292-2wnt7-worker-a-zbxpm" after TuneD restart
Aug 11 13:40:13.896: [INFO]: found mcd machine-config-daemon-s5g7t for node ci-ln-2h3mjpb-72292-2wnt7-worker-a-zbxpm
Aug 11 13:40:14.880: [INFO]: banned CPUs setting: "IRQBALANCE_BANNED_CPUS=\"00000000,000000fa\""
Aug 11 13:40:14.880: [INFO]: banned CPUs: "00000000,000000fa"
Aug 11 13:40:14.880: [INFO]: banned CPUs on "ci-ln-2h3mjpb-72292-2wnt7-worker-a-zbxpm" after the tuned restart: {1,3-7}
Aug 11 13:40:14.880: [INFO]: deleting pod "test-w577h"
Aug 11 13:40:47.564: [INFO]: found mcd machine-config-daemon-s5g7t for node ci-ln-2h3mjpb-72292-2wnt7-worker-a-zbxpm
Aug 11 13:40:48.577: [INFO]: banned CPUs setting: "IRQBALANCE_BANNED_CPUS=\"00000000,000000f0\""
Aug 11 13:40:48.577: [INFO]: banned CPUs: "00000000,000000f0"
Aug 11 13:40:48.577: [INFO]: banned CPUs on "ci-ln-2h3mjpb-72292-2wnt7-worker-a-zbxpm" when test ends: {4-7}

note this:

Aug 11 13:40:48.577: [INFO]: banned CPUs setting: "IRQBALANCE_BANNED_CPUS=\"00000000,000000f0\""

There is a stray, unexpected "f" in the banned CPU mask.

Anything else we need to know?

This issue has likely been present since the introduction of the dynamic IRQ balancing feature.

CRI-O and Kubernetes version

$ crio --version
crio version 1.24.1-11.rhaos4.11.gitb0d2ef3.el8
Version:          1.24.1-11.rhaos4.11.gitb0d2ef3.el8
GoVersion:        go1.18.1
Compiler:         gc
Platform:         linux/amd64
Linkmode:         dynamic
BuildTags:        exclude_graphdriver_devicemapper, containers_image_ostree_stub, seccomp, selinux
SeccompEnabled:   true
AppArmorEnabled:  false
$ kubectl --version
$ oc version
Client Version: 4.10.9
Server Version: 4.11.0-0.nightly-2022-07-24-151159
Kubernetes Version: v1.24.0+9546431

OS version

# On Linux:
$ cat /etc/os-release
NAME="Red Hat Enterprise Linux CoreOS"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION="411.86.202207210724-0"
VERSION_ID="4.11"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 411.86.202207210724-0 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::coreos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.11/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.11"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.11"
OPENSHIFT_VERSION="4.11"
RHEL_VERSION="8.6"
OSTREE_VERSION="411.86.202207210724-0"

$ uname -a
Linux cnfdt12 4.18.0-372.16.1.el8_6.x86_64 #1 SMP Tue Jun 28 03:02:21 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

Additional environment details (AWS, VirtualBox, physical, etc.)

bare metal
@ffromani ffromani added the kind/bug Categorizes issue or PR as related to a bug. label Aug 11, 2022
ffromani added a commit to ffromani/cri-o that referenced this issue Aug 11, 2022
The high performance hook supports dynamic IRQ balancing, driven by a
well-known annotation.
Pods can opt out of IRQ load balancing by setting this annotation. On pod
startup, crio recomputes the irqbalance ban mask and reapplies it to
irqbalance, to make sure the cpus assigned to these pods will not handle
IRQs (they are added to the irqbalance ban mask).
This works only for guaranteed pods requesting integral CPUs.

To support this feature, crio needs to recompute the cpu ban mask.
In turn, crio depends on encoding/hex.DecodeString for some of the heavy lifting.
That function only accepts even-length input strings (which are trivial
transformations of the irqbalance ban mask).

So crio left-pads the irq affinity mask with zeros at the beginning
of the computation.

The left-padded mask is then adjusted to exclude the relevant cpus, and finally inverted.
Here lies the problem: the padding character should always be "0" (ASCII zero), but
inverting the mask also turns the (originally correct) zero padding into "f".

The outcome is that stray "f" characters end up in the IRQ ban list in the
irqbalance config file, and they are never removed.

Fixing the padding or the mask inversion requires relatively invasive
code changes, for example passing around the expected mask length and
acting accordingly. While not particularly challenging per se, these
changes don't fit smoothly into the current code.

Another option is to clamp the computed masks to the expected length
once the computation and the inversion are done. We pursue this
approach in this change.

Fixes: cri-o#6145
Signed-off-by: Francesco Romani <fromani@redhat.com>
@ffromani ffromani linked a pull request Aug 11, 2022 that will close this issue
ffromani added a commit to ffromani/cluster-node-tuning-operator that referenced this issue Aug 11, 2022
Testing the irqbalance handling on tuned restart highlighted
a crio bug in the handling of odd-length cpu affinity masks.
We expect this bug to have little impact on production environments,
because a CPU count that is a multiple of 4 but not of 8 is unlikely
there; still, we filed
cri-o/cri-o#6145

In order to have a backportable change, we fix our utility code
to deal with the incorrect padding.

Signed-off-by: Francesco Romani <fromani@redhat.com>
ffromani added a commit to ffromani/cri-o that referenced this issue Aug 11, 2022
ffromani added a commit to ffromani/cluster-node-tuning-operator that referenced this issue Aug 11, 2022
openshift-merge-robot pushed a commit to openshift/cluster-node-tuning-operator that referenced this issue Aug 17, 2022
* test: perfprof: utils: make sure to unquote cpus

Make sure to unquote the cpumask output to prevent false negatives

Signed-off-by: Francesco Romani <fromani@redhat.com>

* perfprof: utils: robustness fixes

Add testcases and log enhancement emerged during the local testing.

Signed-off-by: Francesco Romani <fromani@redhat.com>

* perfprof: tuned: disable the irqbalance plugin

The tuned irqbalance plugin clears the irqbalance banned CPUs list
when tuned starts. The list is then managed dynamically by the runtime
handlers.

On node restart, the tuned pod can be started AFTER the workload pods
(neither kubelet nor kubernetes offers ordering guarantees when recovering
the node state); clearing the banned CPUs list while pods are running
compromises the IRQ isolation guarantees. The same holds true if the
NTO pod restarts for whatever reason.

To prevent this disruption, we disable the irqbalance plugin entirely.
Another component in the stack must now clear the irqbalance cpu ban
list once per node reboot.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2105123
Signed-off-by: Francesco Romani <fromani@redhat.com>

* e2e: perfprof: add tests for cpu ban list handling

Add a test to verify that a tuned restart will not clear
the irqbalance cpu ban list, which is the key reason
why we disabled the irqbalance tuned plugin earlier.

Note there is no guarantee that any component in the stack
will reset the irqbalance cpu ban list exactly once.

crio unconditionally restores from a snapshot
taken the first time the server runs, which is likely
but not guaranteed to be correct.

There's no way to declare or check the content of the value
crio would reset.

Signed-off-by: Francesco Romani <fromani@redhat.com>

* perfprof: functests: utils: fix cpu mask padding

Testing the irqbalance handling on tuned restart highlighted
a crio bug in the handling of odd-length cpu affinity masks.
We expect this bug to have little impact on production environments,
because a CPU count that is a multiple of 4 but not of 8 is unlikely
there; still, we filed
cri-o/cri-o#6145

In order to have a backportable change, we fix our utility code
to deal with the incorrect padding.

Signed-off-by: Francesco Romani <fromani@redhat.com>

Signed-off-by: Francesco Romani <fromani@redhat.com>
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/cluster-node-tuning-operator that referenced this issue Aug 18, 2022
ffromani added a commit to ffromani/cluster-node-tuning-operator that referenced this issue Aug 19, 2022
In commit c5cf0bd we added
a workaround for the incorrect crio mask padding (see
cri-o/cri-o#6145).
But that fix was partial; this change addresses the gap.

Signed-off-by: Francesco Romani <fromani@redhat.com>
ffromani added a commit to ffromani/cluster-node-tuning-operator that referenced this issue Aug 19, 2022
ffromani added a commit to ffromani/cluster-node-tuning-operator that referenced this issue Aug 19, 2022
openshift-merge-robot pushed a commit to openshift/cluster-node-tuning-operator that referenced this issue Aug 19, 2022
* podsecurity: disable OCP label sync

Per the latest OCP recommendation (internal doc), it is not sufficient
to declare the pod security labels; we also need to disable the
label sync, which we do here.

Signed-off-by: Francesco Romani <fromani@redhat.com>

* e2e: perfprof: crio mask fix workaround part 2

In commit c5cf0bd we added
a workaround for the incorrect crio mask padding (see
cri-o/cri-o#6145).
But that fix was partial; this change addresses the gap.

Signed-off-by: Francesco Romani <fromani@redhat.com>

Signed-off-by: Francesco Romani <fromani@redhat.com>
openshift-merge-robot pushed a commit to openshift/cluster-node-tuning-operator that referenced this issue Aug 25, 2022
* test: perfprof: utils: make sure to unquote cpus

Make sure to unquote the cpumask output to prevent false negatives

Signed-off-by: Francesco Romani <fromani@redhat.com>

* perfprof: utils: robustness fixes

Add testcases and log enhancement emerged during the local testing.

Signed-off-by: Francesco Romani <fromani@redhat.com>

* perfprof: tuned: disable the irqbalance plugin

The tuned irqbalance plugin clears the irqbalance banned CPUs list
when tuned starts. The list is then managed dynamically by the runtime
handlers.

On node restart, the tuned pod can be started AFTER the workload pods
(kubelet nor kubernetes offers ordering guarantees when recovering the
node state); clearing the banned CPUs list while pods are running and
compromising the IRQ isolation guarantees. Same holds true if the
NTO pod restarts for whatever reason.

To prevent this disruption, we disable the irqbalance plugin entirely.
Another component in the stack must now clear the irqbalance cpu ban
list once per node reboot.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2105123
Signed-off-by: Francesco Romani <fromani@redhat.com>

* e2e: perfprof: add tests for cpu ban list handling

Add a test to verify that a tuned restart will not clear
the irqbalance cpu ban list, which is the key reason
why we disabled the irqbalance tuned plugin earlier.

Note there is no guarantee that any component in the stack
will reset the irqbalance copu ban list once.

crio inconditionally does a restore depending on a snapshot
taken the first time the server runs, which is likely
but not guaranteed to be correct.

There's no way to declare or check the content of the value
crio would reset.

Signed-off-by: Francesco Romani <fromani@redhat.com>

* perfprof: functests: utils: fix cpu mask padding

Testing the irqbalance handling on tuned restart highlighted
a crio bug when handling odd-numbered cpu affinity masks.
We expect this bug to have little impact on production environments
because it's unlikely they will have a number of CPUs multiple by 4
but not multiple by 8, but still we filed
cri-o/cri-o#6145

In order to have a backportable change, we fix our utility code
to deal with incorrect padding.

Signed-off-by: Francesco Romani <fromani@redhat.com>

Signed-off-by: Francesco Romani <fromani@redhat.com>
Co-authored-by: Francesco Romani <fromani@redhat.com>
@github-actions

A friendly reminder that this issue had no activity for 30 days.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 11, 2022
marioferh pushed a commit to marioferh/cluster-node-tuning-operator that referenced this issue Sep 12, 2022
openshift-merge-robot pushed a commit to openshift/cluster-node-tuning-operator that referenced this issue Sep 20, 2022
…ane (#467)

@github-actions

Closing this issue since it had no activity in the past 90 days.

@github-actions github-actions bot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Dec 11, 2022
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 11, 2022
@ffromani
Contributor Author

/reopen

@openshift-ci openshift-ci bot reopened this Dec 11, 2022
@openshift-ci
Contributor

openshift-ci bot commented Dec 11, 2022

@fromanirh: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@github-actions github-actions bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Dec 12, 2022
@ffromani
Contributor Author

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 28, 2023
@github-actions

A friendly reminder that this issue had no activity for 30 days.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 28, 2023
IlyaTyomkin pushed a commit to IlyaTyomkin/cluster-node-tuning-operator that referenced this issue May 23, 2023
IlyaTyomkin pushed a commit to IlyaTyomkin/cluster-node-tuning-operator that referenced this issue May 23, 2023
@haircommander
Member

@ffromani is this a CRI-O issue or one that should live somewhere else?

@haircommander haircommander added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 24, 2023
IlyaTyomkin pushed a commit to IlyaTyomkin/cluster-node-tuning-operator that referenced this issue Jun 13, 2023
IlyaTyomkin pushed a commit to IlyaTyomkin/cluster-node-tuning-operator that referenced this issue Jun 13, 2023
@github-actions

A friendly reminder that this issue had no activity for 30 days.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 24, 2023
@github-actions

Closing this issue since it had no activity in the past 90 days.

@github-actions github-actions bot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Sep 23, 2023
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 23, 2023
ffromani added a commit to ffromani/cri-o that referenced this issue Oct 13, 2023
The high performance hook support the dynamic IRQ balancing thanks some
well known annotation.
Pods can opt out from IRQ handling exposing this annotation. On pod startup,
crio will recompute the irqbalance ban mask and reapply irqbalance to make
sure the cpus assigned to these pods will not handle IRQs - being added to
the irqbalance ban mask.
This works only for guaranteed pods requesting integral CPUs.

To support this feature, crio needs to recompute the cpu ban mask.
In turn, crio depends on encoding/hex.DecodeString for some of the heavy lifting.
The function expects to deal with even-sized inbound strings (which are trivial
transformations of the irqbalance ban mask).

So crio does left-aligned zero padding of the irq affinity mask at the beginning
of computation.

The left-padded mask is then adjusted excluding any relevant cpu, and then inverted.
Here lies the problem: the padding character should always be "0" (ASCII zero), not
"f", which is obtained inverting the original correct zero character.

The outcome of this behavior is that stray 'f' end up in the IRQ ban list in the
irqbalance config file, and they are never removed.

Fixing the padding or the mask inversion require relatively invasive
code changes, for example to pass around the expected mask length and
act accordingly. While not particularly challenging per se, these change
don't fit smoothly in the current code.

Another option is to clamp the computed masks to the expected length
once the computation and the inversion have been done. We pursue this
approach in this change.
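The clamping idea can be sketched as follows (clampMask is a hypothetical helper under the assumptions above, not the actual patch): after padding, exclusion, and inversion, trim the result back to the original mask length so the spurious padding digits are dropped.

```go
package main

import "fmt"

// clampMask trims a computed hex mask down to the expected number of
// hex digits, dropping any spurious left padding left over after
// inversion. Sketch of the approach, not crio's exact code.
func clampMask(mask string, wantLen int) string {
	if len(mask) > wantLen {
		return mask[len(mask)-wantLen:]
	}
	return mask
}

func main() {
	// 4-cpu machine: original mask "f" is one hex digit, but the
	// padded-and-inverted ban mask came out as "fc".
	fmt.Println(clampMask("fc", 1)) // "c": only the real cpus remain
}
```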

Fixes: cri-o#6145
Signed-off-by: Francesco Romani <fromani@redhat.com>
@ffromani
Contributor Author

ffromani commented Mar 6, 2024

/reopen

the issue, AFAIK, still exists

Contributor

openshift-ci bot commented Mar 6, 2024

@ffromani: Reopened this issue.

In response to this:

/reopen

the issue, AFAIK, still exists

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot reopened this Mar 6, 2024
ffromani added a commit to ffromani/cri-o that referenced this issue Mar 6, 2024
The high performance hook supports dynamic IRQ balancing thanks to a
well-known annotation.
Pods can opt out from IRQ handling by exposing this annotation. On pod
startup, crio will recompute the irqbalance ban mask and reapply
irqbalance, so that the cpus assigned to these pods are added to the
irqbalance ban mask and thus no longer handle IRQs.
This works only for guaranteed pods requesting integral CPUs.

To support this feature, crio needs to recompute the cpu ban mask.
In turn, crio depends on encoding/hex.DecodeString for some of the heavy lifting.
That function only accepts even-length input strings (which are trivial
transformations of the irqbalance ban mask).

So crio left-pads the irq affinity mask with zeros at the beginning of
the computation.

The left-padded mask is then adjusted to exclude the relevant cpus, and
then inverted. Here lies the problem: the padding character should always
be "0" (ASCII zero), but inverting the mask turns the originally correct
zero padding into "f".

The outcome of this behavior is that stray 'f' characters end up in the
IRQ ban list in the irqbalance config file, and they are never removed.

Fixing the padding or the mask inversion requires relatively invasive
code changes, for example passing around the expected mask length and
acting accordingly. While not particularly challenging per se, these
changes don't fit smoothly in the current code.

Another option is to clamp the computed masks to the expected length
once the computation and the inversion have been done. We pursue this
approach in this change.

Fixes: cri-o#6145
Signed-off-by: Francesco Romani <fromani@redhat.com>
@github-actions github-actions bot removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 7, 2024

github-actions bot commented Apr 6, 2024

A friendly reminder that this issue had no activity for 30 days.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 6, 2024

github-actions bot commented Jul 6, 2024

Closing this issue since it had no activity in the past 90 days.

@github-actions github-actions bot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 6, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 6, 2024
@kwilczynski kwilczynski removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Jul 11, 2024
@kwilczynski kwilczynski reopened this Jul 11, 2024