oci: keep track of exec PIDs and stop them on container stop #7937

Merged
merged 1 commit into cri-o:main from stop-exec-pid
Apr 24, 2024

Conversation

haircommander
Member

What type of PR is this?

/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Keep track of exec calls for a container, and make sure to kill them when a container is being stopped
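
As a rough illustration of the release note above, here is a minimal sketch of per-container exec PID tracking. The `execTracker` type, its methods, and the injected `kill` callback are all hypothetical names chosen for this example; they are not taken from CRI-O's `internal/oci` package, and the real change may be structured differently.

```go
package main

import (
	"fmt"
	"sync"
)

// execTracker is a hypothetical helper (not CRI-O's actual type) that records
// the PIDs of exec sessions started inside one container.
type execTracker struct {
	mu   sync.Mutex
	pids map[int]struct{}
}

func newExecTracker() *execTracker {
	return &execTracker{pids: make(map[int]struct{})}
}

// Add records a newly started exec process.
func (t *execTracker) Add(pid int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.pids[pid] = struct{}{}
}

// Delete forgets an exec process that has exited on its own.
func (t *execTracker) Delete(pid int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.pids, pid)
}

// KillAll calls kill once for every tracked PID, forgets each PID afterwards,
// and returns the PIDs for which kill reported an error.
func (t *execTracker) KillAll(kill func(pid int) error) []int {
	t.mu.Lock()
	defer t.mu.Unlock()
	var failed []int
	for pid := range t.pids {
		if err := kill(pid); err != nil {
			failed = append(failed, pid)
		}
		delete(t.pids, pid) // deleting during range is safe in Go
	}
	return failed
}

func main() {
	tracker := newExecTracker()
	tracker.Add(101)
	tracker.Add(102)
	tracker.Delete(101) // this exec exited before the container was stopped

	// Stand-in for a real signal call; no process is touched in this sketch.
	fakeKill := func(pid int) error {
		fmt.Println("killing exec pid", pid)
		return nil
	}
	failed := tracker.KillAll(fakeKill)
	fmt.Println("failed:", len(failed))
}
```

Passing the kill step in as a function keeps the sketch testable without real processes; on a Unix system that callback could wrap a signal call instead.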

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress and release-note labels Mar 26, 2024
Contributor

openshift-ci bot commented Mar 26, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the kind/bug, dco-signoff: yes, and approved labels Mar 26, 2024

codecov bot commented Mar 26, 2024

Codecov Report

Merging #7937 (fa7e972) into main (01b30dd) will decrease coverage by 0.06%.
Report is 2 commits behind head on main.
The diff coverage is 18.18%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7937      +/-   ##
==========================================
- Coverage   48.93%   48.88%   -0.06%     
==========================================
  Files         152      152              
  Lines       16452    16485      +33     
==========================================
+ Hits         8051     8058       +7     
- Misses       7425     7450      +25     
- Partials      976      977       +1     

@haircommander haircommander force-pushed the stop-exec-pid branch 2 times, most recently from 36bf1b4 to 67638d4 Compare March 27, 2024 19:11
Collaborator

@kolyshkin kolyshkin left a comment

Hmm, I was thinking cri-o maintains a list of running execs per container somewhere, at least those with terminals, as it has to read from those terminals (doing that in a separate goroutine, I guess). Can we reuse that list?

(and yes, we don't have to kill all execs, just those with a terminal, although that's not important)

@haircommander
Member Author

> Hmm, I was thinking cri-o maintains a list of running execs per container somewhere, at least those with terminals, as it has to read from those terminals (doing that in a separate goroutine, I guess). Can we reuse that list?
>
> (and yes, we don't have to kill all execs, just those with a terminal, although that's not important)

yeah right it's in a separate goroutine, and even the context diverges between the different CRI calls, so wiring them together would be tricky.

test/ctr.bats (outdated review thread, resolved)
@haircommander haircommander force-pushed the stop-exec-pid branch 2 times, most recently from 0932325 to ee8e9d2 Compare April 19, 2024 03:22
@haircommander haircommander marked this pull request as ready for review April 19, 2024 03:23
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label Apr 19, 2024
@haircommander
Member Author

@cri-o/cri-o-maintainers this is ready, PTAL

@haircommander
Member Author

/retest

@kwilczynski
Member

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm label Apr 24, 2024
Contributor

openshift-ci bot commented Apr 24, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, kwilczynski

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 2f72dc6 into cri-o:main Apr 24, 2024
60 of 63 checks passed
@haircommander
Member Author

/cherry-pick release-1.29

@openshift-cherrypick-robot

@haircommander: new pull request created: #8072

In response to this:

/cherry-pick release-1.29

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kwilczynski
Member

kwilczynski commented Apr 30, 2024

@haircommander, anything against cherry-picks to releases from 1.28 to 1.25? I will be happy to do it manually, if needed.

@haircommander
Member Author

nothing at all, sounds good!

@kwilczynski
Member

/cherry-pick release-1.28
/cherry-pick release-1.27
/cherry-pick release-1.26
/cherry-pick release-1.25

@openshift-cherrypick-robot

@kwilczynski: #7937 failed to apply on top of branch "release-1.28":

Applying: oci: keep track of exec PIDs and stop them on container stop
Using index info to reconstruct a base tree...
M	internal/oci/container.go
M	internal/oci/runtime_oci.go
M	test/ctr.bats
Falling back to patching base and 3-way merge...
Auto-merging test/ctr.bats
Auto-merging internal/oci/runtime_oci.go
CONFLICT (content): Merge conflict in internal/oci/runtime_oci.go
Auto-merging internal/oci/container.go
CONFLICT (content): Merge conflict in internal/oci/container.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 oci: keep track of exec PIDs and stop them on container stop
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

@kwilczynski
Member

OK. Needs a manual cherry-pick. No worries.

saschagrunert added a commit to saschagrunert/cri-o that referenced this pull request May 10, 2024
Before this patch, we killed the exec PIDs right away on container
stop, which caused the following e2e test to fail:

```
[sig-node] [NodeFeature:SidecarContainers] Containers Lifecycle should terminate sidecars simultaneously if prestop doesn't exit
```

This regression is now fixed by killing the exec PIDs after the main
container has been stopped, and by doing so in the same thread.

Fixes kubernetes/kubernetes#124743
Follow-up on cri-o#7937

Needs a cherry-pick since the enhancement has already been backported
to the supported release branches.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/cri-o that referenced this pull request May 15, 2024
github-actions bot pushed a commit that referenced this pull request Jun 1, 2024
Labels
- approved (Indicates a PR has been approved by an approver from all required OWNERS files.)
- dco-signoff: yes (Indicates the PR's author has DCO signed all their commits.)
- kind/bug (Categorizes issue or PR as related to a bug.)
- lgtm (Indicates that a PR is ready to be merged.)
- release-note (Denotes a PR that will be considered when it comes time to generate release notes.)

5 participants