[release/1.6] test: introduce failpoint control to runc-shimv2 and cni #7455

Merged
merged 9 commits into from
Sep 30, 2022


Contributor

@qiutongs qiutongs commented Sep 30, 2022

Backport #7069 to 1.6 branch. This is a prerequisite to backport #5904.

Testing

$ CONTAINERD_RUNTIME=runc FOCUS=TestRunPodSandboxWithShimStartFailure make cri-integration
PASS

A failpoint lets a test force a failure during an API call, which is
especially useful when the API is complicated, like CRI RunPodSandbox. It
helps us test unexpected behavior without mocks. The control design is
based on FreeBSD fail(9), but simpler.

REF: https://www.freebsd.org/cgi/man.cgi?query=fail&sektion=9&apropos=0&manpath=FreeBSD%2B10.0-RELEASE
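
To make the idea concrete, here is a minimal sketch of a counted failpoint. It is not the code from this PR: the `Failpoint` type, `ParseFailpoint`, and `Evaluate` are hypothetical names, and the sketch assumes only two term forms, `N*error(msg)` and `off`, which is a small subset of what fail(9) describes.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
	"sync"
)

// Failpoint is a hypothetical, simplified failpoint: it fails the first
// `remaining` evaluations with err and then behaves as if it were off.
type Failpoint struct {
	mu        sync.Mutex
	remaining int
	err       error
}

// ParseFailpoint accepts terms of the assumed form "N*error(msg)" or "off".
func ParseFailpoint(term string) (*Failpoint, error) {
	if term == "off" {
		return &Failpoint{}, nil
	}
	countStr, action, ok := strings.Cut(term, "*")
	if !ok {
		return nil, fmt.Errorf("invalid term %q: missing count", term)
	}
	count, err := strconv.Atoi(countStr)
	if err != nil || count < 0 {
		return nil, fmt.Errorf("invalid count in term %q", term)
	}
	if !strings.HasPrefix(action, "error(") || !strings.HasSuffix(action, ")") {
		return nil, fmt.Errorf("unsupported action in term %q", term)
	}
	msg := strings.TrimSuffix(strings.TrimPrefix(action, "error("), ")")
	return &Failpoint{remaining: count, err: errors.New(msg)}, nil
}

// Evaluate returns the injected error while the count is positive and
// nil once the count is used up.
func (fp *Failpoint) Evaluate() error {
	fp.mu.Lock()
	defer fp.mu.Unlock()
	if fp.remaining <= 0 {
		return nil
	}
	fp.remaining--
	return fp.err
}

func main() {
	fp, err := ParseFailpoint("1*error(sorry)")
	if err != nil {
		panic(err)
	}
	fmt.Println(fp.Evaluate()) // the first call is failed by the failpoint
	fmt.Println(fp.Evaluate()) // later calls pass through
}
```

An API handler would call `Evaluate` at its entry and return early on a non-nil error, so the test controls the failure from outside the handler.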

Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit ffd59ba)
Signed-off-by: Qiutong Song <songqt01@gmail.com>
Currently, the runc shimv2 command-line manager doesn't support
customized ttrpc server options, for example, a ttrpc server interceptor.
This commit allows the task plugin to return a
`UnaryServerInterceptor` option to the manager so that the task plugin
can do extra work before handling an incoming request, such as API-level
failpoint control.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit 822cc51)
Signed-off-by: Qiutong Song <songqt01@gmail.com>
@k8s-ci-robot

Hi @qiutongs. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@qiutongs
Contributor Author

/cc @fuweid @samuelkarp

@qiutongs
Contributor Author

qiutongs commented Sep 30, 2022

github.com/containerd/containerd/api/runtime/task/v2: module github.com/containerd/containerd/api@latest found (v1.6.0-beta.3), but does not contain package github.com/containerd/containerd/api/runtime/task/v2

The failure was due to #6827

@qiutongs qiutongs force-pushed the backport-failpoint-1.6 branch 2 times, most recently from ca2e110 to 1db5e31 Compare September 30, 2022 05:40
@fuweid fuweid changed the title Backport failpoint control to 1.6 [release/1.6] test: introduce failpoint control to runc-shimv2 and cni Sep 30, 2022
Member

@fuweid fuweid left a comment

LGTM

To me, this belongs to the test framework, and it is useful for testing new features or bug fixes from the main branch. I agree with the backport.

@samuelkarp
Member

Would you mind squashing "Use the old path of runtime v2 task, prior to PR 6827" with "bin/ctr,integration: new runc-shim with failpoint"? It's better for each commit to be buildable so things like git bisect can work properly on the codebase.

Add a new runc shim binary for integration testing.

The shim is named io.containerd.runc-fp.v1. It allows us to use the
additional OCI annotation `io.containerd.runtime.v2.shim.failpoint.*` to
set up failpoints for the shim task API. Since a shim can be shared by
multiple containers, as in a Kubernetes pod, the failpoints are
initialized while the shim server is being set up. So failpoint
annotations on containers that join the shim later will not take effect.

This commit also updates the ctr tool so that `--annotation` can be used
to specify annotations when running a container. For example:

```bash
➜  ctr run -d --runtime runc-fp.v1 \
     --annotation "io.containerd.runtime.v2.shim.failpoint.Kill=1*error(sorry)" \
     docker.io/library/alpine:latest testing sleep 1d

➜  ctr t ls
TASK       PID       STATUS
testing    147304    RUNNING

➜  ctr t kill -s SIGKILL testing
ctr: sorry: unknown

➜  ctr t kill -s SIGKILL testing

➜  sudo ctr t ls
TASK       PID       STATUS
testing    147304    STOPPED
```

The runc-fp.v1 shim is based on core runc.v2. We can use it to inject
failpoints when testing complicated or large transactional APIs, like
CRI RunPodSandbox.
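
As an illustration of the annotation scheme, the following sketch collects failpoint terms from a container's OCI annotations, keyed by task API method name. The annotation prefix comes from the commit message; the `failpointTerms` helper and the second annotation in `main` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// failpointPrefix is the annotation prefix named in the commit message.
const failpointPrefix = "io.containerd.runtime.v2.shim.failpoint."

// failpointTerms returns a map from task API method name (the part after
// the prefix) to the failpoint term given as the annotation value.
// Annotations without the prefix are ignored.
func failpointTerms(annotations map[string]string) map[string]string {
	terms := make(map[string]string)
	for key, value := range annotations {
		if !strings.HasPrefix(key, failpointPrefix) {
			continue
		}
		method := strings.TrimPrefix(key, failpointPrefix)
		if method == "" {
			continue
		}
		terms[method] = value
	}
	return terms
}

func main() {
	annotations := map[string]string{
		"io.containerd.runtime.v2.shim.failpoint.Kill": "1*error(sorry)",
		"example.com/unrelated":                        "ignored", // hypothetical annotation
	}
	fmt.Println(failpointTerms(annotations))
}
```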

Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit 5f9b318)
Signed-off-by: Qiutong Song <songqt01@gmail.com>
If there is any unskippable error while setting up the shim plugins, we
should fail and return the error to prevent a leaked shim instance. For
example, if the task plugin fails to initialize, the shim ttrpc server
will not contain any shim API method. Any call to the shim will then
receive

  failed to create shim task: service containerd.task.v2.Task: not implemented

Then containerd can't use `Shutdown` to make the shim exit, and the shim
is leaked. Also fail and return an error if there is no ttrpc service.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit b297775)
Signed-off-by: Qiutong Song <songqt01@gmail.com>
Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit 1ae6e8b)
Signed-off-by: Qiutong Song <songqt01@gmail.com>
Introduce cni-bridge-fp, a wrapper binary around the CNI bridge plugin,
for CRI testing.

With the CNI `io.kubernetes.cri.pod-annotations` capability enabled, the
user can inject the failpoint setting through the pod annotation
`cniFailpointControlStateDir`, which points to the directory that stores
each pod's failpoint setting in a file named
`${K8S_POD_NAMESPACE}-${K8S_POD_NAME}.json`.

When the plugin is invoked, it checks CNI_ARGS to load the failpoint for
the current CNI_COMMAND from disk. For testing, the user can prepare the
setting before calling RunPodSandbox.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit be91a21)
Signed-off-by: Qiutong Song <songqt01@gmail.com>
Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit 3c5e80b)
Signed-off-by: Qiutong Song <songqt01@gmail.com>
Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit cbebeb9)
Signed-off-by: Qiutong Song <songqt01@gmail.com>
* Use a delegated plugin call to simplify cni-bridge-fp
* Add README.md for cni-bridge-fp

Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit e6a2c07)
Signed-off-by: Qiutong Song <songqt01@gmail.com>
@qiutongs
Contributor Author

> Would you mind squashing "Use the old path of runtime v2 task, prior to PR 6827" with "bin/ctr,integration: new runc-shim with failpoint"? It's better for each commit to be buildable so things like git bisect can work properly on the codebase.

Done

@samuelkarp
Member

/ok-to-test

@samuelkarp samuelkarp merged commit 6338ef8 into containerd:release/1.6 Sep 30, 2022
@qiutongs qiutongs deleted the backport-failpoint-1.6 branch September 30, 2022 21:11