Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/1.2 backport] backport exec fixes #3755

Conversation

thaJeztah
Copy link
Member

Backports of:

First two cherry-picks were clean:

# https://github.com/containerd/containerd/pull/3361 Add timeout for I/O waitgroups
git cherry-pick -s -S -x 245052243d23c8de21fcc95bbf47fb1dbc731ab4

# https://github.com/containerd/containerd/pull/3366 Robust pid locking for shim processes
git cherry-pick -s -S -x 719a2c594e4aad6a2de5cd9c298ab95309c2135c

# https://github.com/containerd/containerd/pull/3711 Use cached state instead of `runc state`
git cherry-pick -s -S -x 18be6e37140e778dffd91804dab2bc66ba54493f

Last cherry-pick didn't apply clean:

both modified:   runtime/v1/linux/proc/init.go

This was because #3085 (Shim pluggable logging) and #3374 (Refactor runtime package for code usage) are not in the 1.2 branch,
and respectively changed runc.IO -> *processIO, and proc.Platform->stdio.Platform`:

diff --cc runtime/v1/linux/proc/init.go
index 4a87b2455,539f9c24a..000000000
--- a/runtime/v1/linux/proc/init.go
+++ b/runtime/v1/linux/proc/init.go
@@@ -59,12 -56,14 +59,23 @@@ type Init struct 
  
        WorkDir string
  
++<<<<<<< HEAD:runtime/v1/linux/proc/init.go
 +      id           string
 +      Bundle       string
 +      console      console.Console
 +      Platform     proc.Platform
 +      io           runc.IO
 +      runtime      *runc.Runc
++=======
+       id       string
+       Bundle   string
+       console  console.Console
+       Platform stdio.Platform
+       io       *processIO
+       runtime  *runc.Runc
+       // pausing preserves the pausing state.
+       pausing      *atomicBool
++>>>>>>> 18be6e371... Use cached state instead of `runc state`.:pkg/process/init.go
        status       int
        exited       time.Time
        pid          safePid

Closes containerd#3286

This and a combination of a couple Docker changes are needed to fully
resolve the issue on the Docker side.  However, this ensures that after
processes exit, we still leave some time for the I/O to fully flush
before closing.  Without this timeout, the delete methods would block
forever.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
(cherry picked from commit 2450522)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Closes containerd#2832

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
(cherry picked from commit 719a2c5)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@thaJeztah thaJeztah changed the title [release/1.2 backport] backport exec fixes [wip][release/1.2 backport] backport exec fixes Oct 14, 2019
@thaJeztah
Copy link
Member Author

Temporarily marked as "WIP", because the first two backports were not marked for backporting, so I want to be sure they look reasonable for backporting to this branch.

I'll remove "WIP" if they are ok, and non-risky to backport

ping @Random-Liu @crosbymichael @estesp PTAL

@thaJeztah thaJeztah changed the title [wip][release/1.2 backport] backport exec fixes [WIP][release/1.2 backport] backport exec fixes Oct 14, 2019
@thaJeztah
Copy link
Member Author

Keeping as "WIP" per a discussion on Slack with Michael Crosby;

we were not going to cherry pick that change just yet, we wanted more testing in master before the runc state changes are added to a release

Signed-off-by: Lantao Liu <lantaol@google.com>
(cherry picked from commit 18be6e3)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@thaJeztah thaJeztah force-pushed the 1.2_backport_avoid_unnecessary_runc_state branch from 5189864 to 0877136 Compare October 14, 2019 18:01
@Random-Liu
Copy link
Member

we were not going to cherry pick that change just yet, we wanted more testing in master before the runc state changes are added to a release

We don't see obvious test failures caused by that change in HEAD. It seems safe to cherrypick.

@thaJeztah thaJeztah changed the title [WIP][release/1.2 backport] backport exec fixes [release/1.2 backport] backport exec fixes Oct 24, 2019
@thaJeztah
Copy link
Member Author

removed "WIP"

@crosbymichael @dmcgowan PTAL

@Random-Liu
Copy link
Member

LGTM

Copy link
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@codecov-io
Copy link

Codecov Report

Merging #3755 into release/1.2 will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##           release/1.2    #3755   +/-   ##
============================================
  Coverage        44.19%   44.19%           
============================================
  Files              100      100           
  Lines            10847    10847           
============================================
  Hits              4794     4794           
  Misses            5313     5313           
  Partials           740      740
Flag Coverage Δ
#linux 47.87% <ø> (ø) ⬆️
#windows 41% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b0d7ef6...0877136. Read the comment docs.

@thaJeztah
Copy link
Member Author

Don't merge yet; discussing with @crosbymichael, and he's still on the fence on backporting this one in a patch release

@sorenmogensen
Copy link

Assuming this does not get backported to 1.2 - when is a rough estimate for when a containerd with the fix will get released? (We are within last 2 hardening sprints before a product release, and had hoped - maybe naively - that we could get this fix in)

@thaJeztah
Copy link
Member Author

I'm not a maintainer, but from the quick chats I had with @crosbymichael, the main concern was that there may be risk involved in this patch (or at least, it should be battle tested). Which brings us in a bit of a chicken & egg situation; to get the patch widely tested, ideally there would be packages for people to test (beyond the automated tests in this repository and the kubernetes test suites).

There's been a proposal (and implementation) to have "nightly" builds of containerd (see #3702, and https://github.com/kind-ci/containerd-nightlies/releases), which would have this patch, but the master branch has progressed quite a bit since 1.2.x, so (without having looked closely how much this particular codepath has diverged) no guarantees that it works equally well on the 1.2 codebase.

I'm open to suggestions on how to move this forward; perhaps a runtime options (env var?) to enable/disable this could be considered, but that's just thinking out loud 😂

@crosbymichael
Copy link
Member

LGTM

@crosbymichael crosbymichael merged commit 79f4c65 into containerd:release/1.2 Nov 28, 2019
@thaJeztah thaJeztah deleted the 1.2_backport_avoid_unnecessary_runc_state branch November 28, 2019 08:49
@sorenmogensen
Copy link

Just wanted to update you that this (git build after merge) did indeed fix the high CPU issue we were experiencing.

dmcgowan pushed a commit to thaJeztah/containerd that referenced this pull request Feb 4, 2020
* Update the runc vendor to v1.0.0-rc10 which includes a mitigation for [CVE-2019-19921](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-19921).
* Update the opencontainers/selinux which includes a mitigation for [CVE-2019-16884](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-16884).
* Update Golang runtime to 1.12.16, mitigating the [CVE-2020-0601](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-0601) certificate verification bypass on Windows, and [CVE-2020-7919](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-7919), which only affects 32-bit architectures.
* Update Golang runtime to 1.12.15, which includes a fix to the runtime (Go 1.12.14, Go 1.12.15) and and the `net/http` package (Go 1.12.15)
* A fix to prevent `SIGSEGV` when starting containerd-shim [containerd#3960](containerd#3960)
* Fixes to `exec` [containerd#3755](containerd#3755)
    - Prevent `docker exec` hanging if an earlier `docker exec` left a zombie process
    - Prevent High system load/CPU utilization with liveness and readiness probes
    - Prevent Docker healthcheck causing high CPU utilization

* CRI fixes:
    - Update the `gopkg.in/yaml.v2` vendor to v2.2.8 with a mitigation for [CVE-2019-11253](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-11253)

* API
    - Fix API filters to properly handle and return parse errors [containerd#3950](containerd#3950)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
thaJeztah added a commit to thaJeztah/docker that referenced this pull request Feb 4, 2020
full diff: containerd/containerd@v1.2.11...v1.2.12

Welcome to the v1.2.12 release of containerd!

The twelfth patch release for containerd 1.2 includes an updated runc with
a fix for CVE-2019-19921, an updated version of the opencontainers/selinux
dependency, which includes a fix for CVE-2019-16884, an updated version of the
gopkg.in/yaml.v2 dependency to address CVE-2019-11253, and a Golang update.

Notable Updates

- Update the runc vendor to v1.0.0-rc10 which includes a mitigation for CVE-2019-19921.
- Update the opencontainers/selinux which includes a mitigation for CVE-2019-16884.
- Update Golang runtime to 1.12.16, mitigating the CVE-2020-0601 certificate verification
  bypass on Windows, and CVE-2020-7919, which only affects 32-bit architectures.
- Update Golang runtime to 1.12.15, which includes a fix to the runtime (Go 1.12.14,
  Go 1.12.15) and and the net/http package (Go 1.12.15)
- A fix to prevent SIGSEGV when starting containerd-shim containerd/containerd#3960
- Fixes to exec containerd/containerd#3755
    - Prevent docker exec hanging if an earlier docker exec left a zombie process
    - Prevent High system load/CPU utilization with liveness and readiness probes
    - Prevent Docker healthcheck causing high CPU utilization

CRI fixes:

- Update the gopkg.in/yaml.v2 vendor to v2.2.8 with a mitigation for CVE-2019-11253

API

- Fix API filters to properly handle and return parse errors containerd/containerd#3950

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
docker-jenkins pushed a commit to docker-archive/docker-ce that referenced this pull request Feb 5, 2020
full diff: containerd/containerd@v1.2.11...v1.2.12

Welcome to the v1.2.12 release of containerd!

The twelfth patch release for containerd 1.2 includes an updated runc with
a fix for CVE-2019-19921, an updated version of the opencontainers/selinux
dependency, which includes a fix for CVE-2019-16884, an updated version of the
gopkg.in/yaml.v2 dependency to address CVE-2019-11253, and a Golang update.

Notable Updates

- Update the runc vendor to v1.0.0-rc10 which includes a mitigation for CVE-2019-19921.
- Update the opencontainers/selinux which includes a mitigation for CVE-2019-16884.
- Update Golang runtime to 1.12.16, mitigating the CVE-2020-0601 certificate verification
  bypass on Windows, and CVE-2020-7919, which only affects 32-bit architectures.
- Update Golang runtime to 1.12.15, which includes a fix to the runtime (Go 1.12.14,
  Go 1.12.15) and and the net/http package (Go 1.12.15)
- A fix to prevent SIGSEGV when starting containerd-shim containerd/containerd#3960
- Fixes to exec containerd/containerd#3755
    - Prevent docker exec hanging if an earlier docker exec left a zombie process
    - Prevent High system load/CPU utilization with liveness and readiness probes
    - Prevent Docker healthcheck causing high CPU utilization

CRI fixes:

- Update the gopkg.in/yaml.v2 vendor to v2.2.8 with a mitigation for CVE-2019-11253

API

- Fix API filters to properly handle and return parse errors containerd/containerd#3950

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: f8cfa7947cd0a2750bd0b4ebf616044a98a07a24
Component: engine
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants