Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve the irqbalance restore flow #6388

Merged
merged 6 commits into from
Feb 28, 2023

Conversation

jlojosnegros
Copy link
Contributor

@jlojosnegros jlojosnegros commented Nov 22, 2022

What type of PR is this?

/kind cleanup
/kind feature

note: this PR is substitute #6105 PR from @fromanirh due to workload issues

What this PR does / why we need it:

Since #4441 , to support the dynamic irqbalance configuration in turn needed by the high
performance hooks, CRI-O runs the restore code at startup.

In some configuration scenarios it's preferred to have more control over this step. Some configuration wants
to have more control over the mask to be restored, or to disable the restore flow entirely.

For this reason, we add an option to provide a user-supplied restore file. Users wishing to exert more control can provide
their data, or they can disable the flow entirely setting the path to the restore file to empty string ("").

The new option defaults are meant to ensure full backward compatibility.

Which issue(s) this PR fixes:

None

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Add configuration option to control the irqbalance configuration restore process for high performance hooks dynamic IRQ pinning.

@openshift-ci openshift-ci bot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. kind/feature Categorizes issue or PR as related to a new feature. labels Nov 22, 2022
@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Nov 22, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 22, 2022

Hi @jlojosnegros. Thanks for your PR.

I'm waiting for a cri-o member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Contributor

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for taking over!

test/irqbalance.bats Outdated Show resolved Hide resolved
test/irqbalance.bats Outdated Show resolved Hide resolved
@ffromani
Copy link
Contributor

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 22, 2022
@ffromani
Copy link
Contributor

/retest
let's rule out glitches

@ffromani
Copy link
Contributor

possibly relevant: #6392

@openshift-ci openshift-ci bot added dco-signoff: no Indicates the PR's author has not DCO signed all their commits. and removed dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Nov 24, 2022
@openshift-ci openshift-ci bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. and removed dco-signoff: no Indicates the PR's author has not DCO signed all their commits. labels Nov 24, 2022
@jlojosnegros jlojosnegros force-pushed the irqbalance-restore branch 5 times, most recently from d722aa1 to ab817fd Compare November 28, 2022 09:43
@openshift-ci openshift-ci bot added dco-signoff: no Indicates the PR's author has not DCO signed all their commits. and removed dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Nov 30, 2022
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 10, 2023
@jlojosnegros
Copy link
Contributor Author

/assign @haircommander

@codecov
Copy link

codecov bot commented Feb 20, 2023

Codecov Report

Merging #6388 (1a488f5) into main (2b11872) will decrease coverage by 0.01%.
The diff coverage is 70.68%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6388      +/-   ##
==========================================
- Coverage   44.55%   44.55%   -0.01%     
==========================================
  Files         128      127       -1     
  Lines       14888    14919      +31     
==========================================
+ Hits         6634     6647      +13     
- Misses       7458     7477      +19     
+ Partials      796      795       -1     

@@ -124,6 +124,8 @@ const (
const (
// DefaultIrqBalanceConfigFile default irqbalance service configuration file path
DefaultIrqBalanceConfigFile = "/etc/sysconfig/irqbalance"
// DefaultIrqBalanceConfigRestoreFile contains the banned cpu mask configuration to restore. Name due to backward compatibility.
DefaultIrqBalanceConfigRestoreFile = "/etc/sysconfig/orig_irq_banned_cpus"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect this config option would mean we remove the hard-coded one in internal/runtimehandlerhooks/high_performance_hooks.go

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, given the name of the file, I wouldn't expect it to be a restore file. Is this file's name/format documented anywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR provides an alternative param for the user to have more control over the restore process while keeping the already working way to do it, so it will keep the exact same behaviour if the user do not give the new option.
At least that is how I have understood the change but I could be wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. We keep the filename originally added back in time for full backward compatibility.

docs/crio.8.md Outdated Show resolved Hide resolved
Use `internal/log`, not logrus directly in the rest
of the high performance hooks code. To enable this,
change the functions signatures to accept context.Context,
which is a good idea in general.

Signed-off-by: Francesco Romani <fromani@redhat.com>
Use ExpectWithOffset in the common hjelper to get a more
immediate understanding of where a test failed.

Signed-off-by: Francesco Romani <fromani@redhat.com>
Add logs in the irqbalance restore flow to improve
the debuggability. This code runs once at startup, so
the extra weight in the logs should be negligible.

Signed-off-by: Francesco Romani <fromani@redhat.com>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Feb 22, 2023
@jlojosnegros
Copy link
Contributor Author

rebase and minor typo ...

@ffromani
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 22, 2023
@haircommander
Copy link
Member

/override ci/prow/e2e-gcp-ovn
/override ci/prow/e2e-aws-ovn
LGTM

@saschagrunert PTAL for final approval

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 22, 2023

@haircommander: Overrode contexts on behalf of haircommander: ci/prow/e2e-aws-ovn, ci/prow/e2e-gcp-ovn

In response to this:

/override ci/prow/e2e-gcp-ovn
/override ci/prow/e2e-aws-ovn
LGTM

@saschagrunert PTAL for final approval

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

ffromani and others added 3 commits February 23, 2023 09:55
Since cri-o#4441 , to support
the dynamic irqbalance configuration in turn needed by the high
performance hooks, CRI-O runs the restore code at startup.

In some configuration scenarios it's preferred to have
more control over this step. Some configuration wants
to have more control over the mask to be restored, or to
disable the restore flow entirely.

For this reason, we add an option to provide a user-supplied
restore file. Users wishing to exert more control can provide
their data, or they can disable the flow entirely setting
the path to the restore file "disable".

The new option defaults are meant to ensure full backward
compatibility.

Signed-off-by: Francesco Romani <fromani@redhat.com>
Add integration tests for the irqbalance CPU ban list
save/restore feature.

Signed-off-by: Francesco Romani <fromani@redhat.com>
irqbalance tests relay on some file contents to work. If tests are run
in parallel test will not find the expected content in the files and
will fail.

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Feb 23, 2023
@jlojosnegros
Copy link
Contributor Author

Minor changes to solve doc and completion validations failures.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 23, 2023

@jlojosnegros: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/ci-rhel-integration 7ebbbe0 link true /test ci-rhel-integration

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@ffromani
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 23, 2023
@haircommander
Copy link
Member

/override ci/prow/e2e-aws-ovn
/override ci/prow/e2e-gcp-ovn
/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 27, 2023

@haircommander: Overrode contexts on behalf of haircommander: ci/prow/e2e-aws-ovn, ci/prow/e2e-gcp-ovn

In response to this:

/override ci/prow/e2e-aws-ovn
/override ci/prow/e2e-gcp-ovn
/approve

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 27, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, jlojosnegros

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 27, 2023
@openshift-merge-robot openshift-merge-robot merged commit a3efa73 into cri-o:main Feb 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants