Exec cgroup #5837

Merged
merged 3 commits into from May 10, 2022

Contributor

@donpenney donpenney commented May 2, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

Currently, exec sync containers are launched under the crio cgroup. For applications with a large number of exec probes, this can impact CPU usage for reserved platform cores, in addition to affecting accounting. To address this, this PR introduces an opt-in exec_cgroup option (defaults to false) that moves the exec probe process to the pod's cgroup. This ensures CPU usage and resource accounting are restricted to the pod's allowances.
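
As a rough illustration of the mechanism described above, the sketch below moves a PID into a cgroup by writing it to that cgroup's `cgroup.procs` file. The names `movePidToCgroup` and `demoMove` are hypothetical and are not taken from this PR; the demo targets a temporary directory standing in for a real cgroup so that it runs without root.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// movePidToCgroup is a hypothetical helper (not CRI-O's actual code). It
// writes pid to the cgroup.procs file under cgroupDir, which is how a
// process is migrated into a cgroup.
func movePidToCgroup(cgroupDir string, pid int) error {
	procs := filepath.Join(cgroupDir, "cgroup.procs")
	if err := os.WriteFile(procs, []byte(strconv.Itoa(pid)), 0o644); err != nil {
		return fmt.Errorf("move pid %d to %s: %w", pid, cgroupDir, err)
	}
	return nil
}

// demoMove exercises movePidToCgroup against a temporary directory that
// stands in for a real cgroup directory (an assumption made for the demo).
func demoMove(pid int) (string, error) {
	dir, err := os.MkdirTemp("", "fake-cgroup")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	if err := movePidToCgroup(dir, pid); err != nil {
		return "", err
	}
	data, err := os.ReadFile(filepath.Join(dir, "cgroup.procs"))
	return string(data), err
}

func main() {
	got, err := demoMove(1234)
	fmt.Println(got, err)
}
```

On a real host the directory would be the pod's or container's cgroup path under the cgroup filesystem, and the write would need sufficient privileges.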

Which issue(s) this PR fixes:

None

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Add `monitor_exec_cgroup` to the configuration's runtime handler struct. This allows an admin to specify which cgroup the monitor for exec sync requests runs in (defaults to that of CRI-O).

@openshift-ci openshift-ci bot added release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/bug Categorizes issue or PR as related to a bug. labels May 2, 2022
@openshift-ci openshift-ci bot requested review from klihub and QiWang19 May 2, 2022 15:59
@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 2, 2022
@openshift-ci
Contributor

openshift-ci bot commented May 2, 2022

Hi @donpenney. Thanks for your PR.

I'm waiting for a cri-o member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@donpenney
Contributor Author

/cc @haircommander

@openshift-ci openshift-ci bot requested a review from haircommander May 2, 2022 16:00
@@ -427,6 +435,32 @@ func (r *runtimeOCI) ExecContainer(ctx context.Context, c *Container, cmd []stri
return cmdErr
}

// moveExecToCgroup moves Exec command pid to the pod group
func moveExecToCgroup(containerPid, commandPid int) error {
Member

1: is there any way we can merge this functionality with the CreateContainer variant?

2: I think we want this to be in internal/config/cgmgr so we can toggle based on cgroup type and version

Contributor Author

I'll look into this. Thanks for the suggestion!

Contributor Author

Moved the function to cgmgr.go. I don't see a way to merge this with CreateContainer, though.
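
To make the "toggle based on cgroup type and version" point concrete, here is a minimal sketch of choosing the `cgroup.procs` file by cgroup version. It is not the cgmgr code from this PR: `isCgroupV2` and `procsPath` are hypothetical names, the `cgroup.controllers` check is a common heuristic for detecting the v2 unified hierarchy, and the `cpu` controller is picked purely for illustration on v1.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// isCgroupV2 reports whether root looks like a cgroup v2 unified hierarchy,
// using the presence of cgroup.controllers at the root as a heuristic.
func isCgroupV2(root string) bool {
	_, err := os.Stat(filepath.Join(root, "cgroup.controllers"))
	return err == nil
}

// procsPath is a hypothetical helper returning the cgroup.procs file that
// would be written for cgroupPath under root.
func procsPath(root, cgroupPath string, v2 bool) string {
	if v2 {
		return filepath.Join(root, cgroupPath, "cgroup.procs")
	}
	// On v1 each controller has its own hierarchy, so a real implementation
	// would repeat the move per controller; "cpu" is only an example here.
	return filepath.Join(root, "cpu", cgroupPath, "cgroup.procs")
}

func main() {
	root := "/sys/fs/cgroup"
	v2 := isCgroupV2(root)
	// The cgroup path below is a made-up example, not one from this PR.
	fmt.Println(v2, procsPath(root, "example.slice/example.scope", v2))
}
```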

@haircommander
Member

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 2, 2022
@haircommander haircommander added this to the 1.24 milestone May 2, 2022
@donpenney
Contributor Author

/retest

@donpenney
Contributor Author

Rebased to pick up newly merged PR #5823

@@ -521,6 +544,42 @@ func (r *runtimeOCI) ExecSyncContainer(ctx context.Context, c *Container, comman

// We don't need childPipe on the parent side
childPipe.Close()
childStartPipe.Close()

if r.handler.MonitorExecCgroup && r.config.InfraCtrCPUSet != "" {
Member

we need to defer the container process kill if it errors here (with a closure like in CreateContainer), otherwise we'll leak a process because conmon will be waiting forever on the pipe

Contributor Author

thanks for the comment... I've updated the code to do the same kill as CreateContainer
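
For readers unfamiliar with the pattern discussed in this thread, the following is a minimal sketch of a deferred closure that kills a child process when a later setup step fails. It is not the code from this PR or from CreateContainer: `startWithSetup` and `childExitedAfterFailure` are hypothetical names, and `sleep 60` stands in for the exec'd process (this assumes a `sleep` binary is on the PATH).

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// startWithSetup starts a child process and runs setup against its PID.
// If setup fails, the deferred closure kills and reaps the child so that
// it is not leaked. The named return value retErr lets the closure see
// the final error.
func startWithSetup(setup func(pid int) error) (cmd *exec.Cmd, retErr error) {
	cmd = exec.Command("sleep", "60") // stand-in for the real child process
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	defer func() {
		if retErr != nil {
			_ = cmd.Process.Kill()
			_ = cmd.Wait() // reap the killed child
		}
	}()

	if err := setup(cmd.Process.Pid); err != nil {
		return cmd, fmt.Errorf("setup failed: %w", err)
	}
	return cmd, nil
}

// childExitedAfterFailure simulates a failing setup step (for example a
// failed cgroup move) and reports whether the child was killed and reaped.
func childExitedAfterFailure() bool {
	cmd, err := startWithSetup(func(int) error {
		return errors.New("simulated cgroup move failure")
	})
	return err != nil && cmd != nil && cmd.ProcessState != nil
}

func main() {
	fmt.Println(childExitedAfterFailure())
}
```

The deferred closure covers every error return added after the process has started, which is why it is preferred over killing the process at each failure site.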

@haircommander
Member

I would love to dedup this code in the future if possible, but I would not consider that blocking it in the first place

/approve

@openshift-ci
Contributor

openshift-ci bot commented May 6, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: donpenney, haircommander

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 6, 2022
@haircommander
Member

@saschagrunert PTAL

@openshift-ci openshift-ci bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels May 9, 2022
@haircommander
Member

/hold

Actually, I had a thought that this should be a string instead of a bool. If it is empty, we keep the default behavior; if the value is "container", it goes to the container cgroup. Otherwise we fail. I can help put together a patch if you'd like @donpenney

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 9, 2022
Signed-off-by: Don Penney <dpenney@redhat.com>
Added runtime handler config option, monitor_exec_cgroup, to enable
moving the exec probe process to the container's cgroup. Users can
enable the feature by setting monitor_exec_cgroup to "container".

Signed-off-by: Don Penney <dpenney@redhat.com>
Updated ExecSyncContainer to move the exec probe process to the
container's cgroup, if the runtime handler monitor_exec_cgroup config
option is set to "container". This ensures cpu usage and resource
accounting are restricted to the container's allowances.

Co-Authored-By: Brent Rowsell <browsell@redhat.com>
Signed-off-by: Don Penney <dpenney@redhat.com>
@donpenney
Contributor Author

Done. If monitor_exec_cgroup is set to a string other than "container", you'll see an error log like the following:

I0509 16:46:55.212517 3089960 prober.go:121] "Probe failed" probeType="Readiness" pod="xxxxxx" podUID=c6ba8458475c9dd26941a04586b8e318 containerName="xxxxxx" probeResult=failure output="Unsupported monitor_exec_cgroup value: not-container"
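
The three cases described in this thread (empty, "container", anything else) can be sketched as a small validation function. This is an illustration only, not the code merged in this PR: `useContainerCgroup` is a hypothetical name, and the error text simply mirrors the message in the log line above.

```go
package main

import "fmt"

// monitorExecCgroupContainer is the only non-empty value mentioned in the thread.
const monitorExecCgroupContainer = "container"

// useContainerCgroup is a hypothetical helper that interprets the
// monitor_exec_cgroup setting. It reports whether the exec process should
// be moved to the container's cgroup, or returns an error for any
// unsupported value.
func useContainerCgroup(value string) (bool, error) {
	switch value {
	case "":
		// Default behavior: leave the process in CRI-O's cgroup.
		return false, nil
	case monitorExecCgroupContainer:
		return true, nil
	default:
		// Message format copied from the probe output quoted above.
		return false, fmt.Errorf("Unsupported monitor_exec_cgroup value: %s", value)
	}
}

func main() {
	for _, v := range []string{"", "container", "not-container"} {
		move, err := useContainerCgroup(v)
		fmt.Printf("%q -> move=%v err=%v\n", v, move, err)
	}
}
```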

@donpenney
Contributor Author

/retest

@haircommander
Member

/lgtm

thanks for all of your work @donpenney

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 9, 2022
@haircommander
Member

/hold cancel
/retest

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 9, 2022
@haircommander
Member

/retest

@openshift-bot

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@haircommander
Member

/override ci/openshift-jenkins/integration_crun

@openshift-ci
Contributor

openshift-ci bot commented May 9, 2022

@haircommander: Overrode contexts on behalf of haircommander: ci/openshift-jenkins/integration_crun

In response to this:

/override ci/openshift-jenkins/integration_crun


@openshift-merge-robot openshift-merge-robot merged commit 0ba47c9 into cri-o:main May 10, 2022
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes.
4 participants