[release/1.6] test: introduce failpoint control to runc-shimv2 and cni #7455
A failpoint controls injected failures during API calls in tests, which is especially useful when the API is complicated, like CRI RunPodSandbox. It helps us test unexpected behavior without mocks. The control design is based on FreeBSD fail(9), but simpler.

REF: https://www.freebsd.org/cgi/man.cgi?query=fail&sektion=9&apropos=0&manpath=FreeBSD%2B10.0-RELEASE

Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit ffd59ba)
Signed-off-by: Qiutong Song <songqt01@gmail.com>
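To make the idea concrete, here is a minimal sketch of a counted failpoint that understands terms such as `1*error(sorry)` (the form used in the `ctr` example in this PR). The `Failpoint` type, `Parse`, and `Evaluate` are hypothetical names for illustration only; they are not the API added by this commit, and the real grammar may accept more term types than `off` and `error(...)`.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
	"sync"
)

// Failpoint is a hypothetical, simplified failpoint. It injects an error for
// a limited number of evaluations and then becomes a no-op.
type Failpoint struct {
	mu        sync.Mutex
	off       bool
	remaining int    // failing evaluations left; -1 means unlimited
	msg       string // message of the injected error
}

// Parse accepts "off", "error(msg)", or "N*error(msg)" (assumed subset).
func Parse(term string) (*Failpoint, error) {
	if term == "off" {
		return &Failpoint{off: true}, nil
	}
	remaining := -1
	// Treat "N*" as a count only when the '*' appears before any '('.
	if i := strings.Index(term, "*"); i >= 0 && !strings.Contains(term[:i], "(") {
		n, err := strconv.Atoi(term[:i])
		if err != nil || n < 1 {
			return nil, fmt.Errorf("invalid count in %q", term)
		}
		remaining, term = n, term[i+1:]
	}
	if !strings.HasPrefix(term, "error(") || !strings.HasSuffix(term, ")") {
		return nil, fmt.Errorf("unsupported term %q", term)
	}
	msg := term[len("error(") : len(term)-1]
	return &Failpoint{remaining: remaining, msg: msg}, nil
}

// Evaluate returns the injected error while the failpoint is armed and nil
// once the count is exhausted or the failpoint is off.
func (fp *Failpoint) Evaluate() error {
	fp.mu.Lock()
	defer fp.mu.Unlock()
	if fp.off || fp.remaining == 0 {
		return nil
	}
	if fp.remaining > 0 {
		fp.remaining--
	}
	return errors.New(fp.msg)
}

func main() {
	fp, err := Parse("1*error(sorry)")
	if err != nil {
		panic(err)
	}
	fmt.Println(fp.Evaluate()) // first call returns the injected error
	fmt.Println(fp.Evaluate()) // later calls pass through
}
```

An API handler would call `Evaluate` at its entry and return early when the result is non-nil, which is how a test can force a specific call to fail without mocking the whole service.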
Currently, the runc shimv2 command-line manager doesn't support customized ttrpc server options, for example, a ttrpc server interceptor.

This commit allows the task plugin to return a `UnaryServerInterceptor` option to the manager, so that the task plugin can add behavior before an incoming request is handled, such as API-level failpoint control.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit 822cc51)
Signed-off-by: Qiutong Song <songqt01@gmail.com>
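The interceptor pattern described above can be sketched with standard-library types only. The `Method`, `UnaryServerInfo`, and `UnaryServerInterceptor` types below are local stand-ins shaped like a unary server interceptor; they are assumptions for illustration and are not imported from the ttrpc package. The full method name `/containerd.task.v2.Task/Kill` is likewise an assumed format.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// Local stand-in types (assumptions, not the ttrpc definitions).
type Method func(ctx context.Context, unmarshal func(interface{}) error) (interface{}, error)

type UnaryServerInfo struct {
	FullMethod string
}

type UnaryServerInterceptor func(ctx context.Context, unmarshal func(interface{}) error, info *UnaryServerInfo, method Method) (interface{}, error)

// FailpointInterceptor evaluates the failpoint registered for the called
// method (keyed by short name, e.g. "Kill") before invoking the real handler.
func FailpointInterceptor(failpoints map[string]func() error) UnaryServerInterceptor {
	return func(ctx context.Context, unmarshal func(interface{}) error, info *UnaryServerInfo, method Method) (interface{}, error) {
		name := info.FullMethod[strings.LastIndex(info.FullMethod, "/")+1:]
		if evaluate, ok := failpoints[name]; ok {
			if err := evaluate(); err != nil {
				return nil, err // injected failure: the handler never runs
			}
		}
		return method(ctx, unmarshal)
	}
}

// failOnce returns an evaluator that fails only on its first call.
// It is not goroutine-safe; it exists only to keep the sketch short.
func failOnce(msg string) func() error {
	fired := false
	return func() error {
		if fired {
			return nil
		}
		fired = true
		return errors.New(msg)
	}
}

// echoHandler is a dummy handler that always succeeds with resp.
func echoHandler(resp string) Method {
	return func(ctx context.Context, unmarshal func(interface{}) error) (interface{}, error) {
		return resp, nil
	}
}

func main() {
	interceptor := FailpointInterceptor(map[string]func() error{"Kill": failOnce("sorry")})
	info := &UnaryServerInfo{FullMethod: "/containerd.task.v2.Task/Kill"}
	for i := 0; i < 2; i++ {
		resp, err := interceptor(context.Background(), nil, info, echoHandler("killed"))
		fmt.Println(resp, err)
	}
}
```

Because the check sits in an interceptor, no individual task API handler needs to know about failpoints, which keeps the test-only logic out of the core shim code.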
/cc @fuweid @samuelkarp
The failure was due to #6827
ca2e110 to 1db5e31
LGTM
To me, this belongs to the test framework, and it is useful for testing new features or bug fixes from the main branch. I agree with the backport.
Would you mind squashing "Use the old path of runtime v2 task, prior to PR 6827" with "bin/ctr,integration: new runc-shim with failpoint"? It's better for each commit to be buildable so things like |
Add a new runc shim binary for integration testing.

The shim is named io.containerd.runc-fp.v1 and lets us use additional OCI annotations of the form `io.containerd.runtime.v2.shim.failpoint.*` to set up failpoints for the shim task API. Because a shim can be shared by multiple containers, as in a Kubernetes pod, the failpoints are initialized once while the shim server is set up. Failpoint annotations on containers that join the shim later therefore have no effect.

This commit also updates the ctr tool so that `--annotation` can be used to specify annotations when running a container. For example:

```bash
➜ ctr run -d --runtime runc-fp.v1 \
    --annotation "io.containerd.runtime.v2.shim.failpoint.Kill=1*error(sorry)" \
    docker.io/library/alpine:latest testing sleep 1d

➜ ctr t ls
TASK       PID       STATUS
testing    147304    RUNNING

➜ ctr t kill -s SIGKILL testing
ctr: sorry: unknown

➜ ctr t kill -s SIGKILL testing

➜ sudo ctr t ls
TASK       PID       STATUS
testing    147304    STOPPED
```

The runc-fp.v1 shim is based on the core runc.v2 shim. We can use it to inject failpoints when testing complicated or large transactional APIs, like Kubernetes RunPodSandbox.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit 5f9b318)
Signed-off-by: Qiutong Song <songqt01@gmail.com>
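As a rough illustration of how the annotation prefix maps to per-method failpoints, the sketch below collects every annotation under `io.containerd.runtime.v2.shim.failpoint.` into a map keyed by task API method name. The function name `FailpointsFromAnnotations` is hypothetical, and the actual shim may read and validate annotations differently.

```go
package main

import (
	"fmt"
	"strings"
)

// failpointPrefix is the annotation prefix named in the commit message.
const failpointPrefix = "io.containerd.runtime.v2.shim.failpoint."

// FailpointsFromAnnotations (hypothetical helper) returns a map from task
// API method name to failpoint terms, ignoring unrelated annotations.
func FailpointsFromAnnotations(annotations map[string]string) map[string]string {
	out := make(map[string]string)
	for key, terms := range annotations {
		method := strings.TrimPrefix(key, failpointPrefix)
		if method == key || method == "" {
			continue // prefix not present, or no method name after it
		}
		out[method] = terms
	}
	return out
}

func main() {
	annotations := map[string]string{
		"io.containerd.runtime.v2.shim.failpoint.Kill": "1*error(sorry)",
		"example.com/unrelated":                        "value",
	}
	fmt.Println(FailpointsFromAnnotations(annotations))
}
```

Doing this once at shim server setup matches the behavior described above: only the annotations present when the shim starts are seen.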
If there is any unskippable error while setting up shim plugins, we should fail and return the error to prevent a leaked shim instance. For example, if an error occurs while initializing the task plugin, the shim ttrpc server will not contain any shim API method. Any call to the shim will then receive:

    failed to create shim task: service containerd.task.v2.Task: not implemented

containerd then can't use `Shutdown` to make the shim exit, so the shim is leaked.

Also fail and return if there is no ttrpc service.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit b297775)
Signed-off-by: Qiutong Song <songqt01@gmail.com>
Signed-off-by: Wei Fu <fuweid89@gmail.com> (cherry picked from commit 1ae6e8b) Signed-off-by: Qiutong Song <songqt01@gmail.com>
Introduce cni-bridge-fp as a wrapper binary around the CNI bridge plugin for CRI testing.

With the CNI `io.kubernetes.cri.pod-annotations` capability enabled, the user can inject failpoint settings through the pod annotation `cniFailpointControlStateDir`, which points to a directory that stores each pod's failpoint settings in a file named `${K8S_POD_NAMESPACE}-${K8S_POD_NAME}.json`.

When the plugin is invoked, it checks CNI_ARGS to find the pod and loads the failpoint for the current CNI_COMMAND from disk. For testing, the user can prepare the settings before calling RunPodSandbox.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit be91a21)
Signed-off-by: Qiutong Song <songqt01@gmail.com>
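The file lookup described above can be sketched as follows. The sketch assumes CNI_ARGS is a semicolon-separated list of `KEY=VALUE` pairs that includes `K8S_POD_NAMESPACE` and `K8S_POD_NAME`; the helper names are hypothetical, and the JSON layout of the settings file is deliberately left out because the commit message does not specify it.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// parseCNIArgs splits "K1=V1;K2=V2" into a map, ignoring malformed pairs.
func parseCNIArgs(args string) map[string]string {
	kv := make(map[string]string)
	for _, pair := range strings.Split(args, ";") {
		if k, v, ok := strings.Cut(pair, "="); ok {
			kv[k] = v
		}
	}
	return kv
}

// FailpointStatePath (hypothetical helper) returns the path of the pod's
// failpoint settings file inside stateDir, following the naming scheme
// ${K8S_POD_NAMESPACE}-${K8S_POD_NAME}.json from the commit message.
func FailpointStatePath(stateDir, cniArgs string) (string, error) {
	kv := parseCNIArgs(cniArgs)
	namespace, name := kv["K8S_POD_NAMESPACE"], kv["K8S_POD_NAME"]
	if namespace == "" || name == "" {
		return "", errors.New("CNI_ARGS is missing K8S_POD_NAMESPACE or K8S_POD_NAME")
	}
	return filepath.Join(stateDir, namespace+"-"+name+".json"), nil
}

func main() {
	// Example values only; a real plugin would read CNI_ARGS from the
	// environment and the state directory from the pod annotation.
	path, err := FailpointStatePath("/tmp/cni-fp", "IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=nginx")
	if err != nil {
		panic(err)
	}
	fmt.Println(path)
}
```

A test would write the settings file at this path before calling RunPodSandbox, so the wrapper finds it on the first CNI invocation for that pod.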
Signed-off-by: Wei Fu <fuweid89@gmail.com> (cherry picked from commit 3c5e80b) Signed-off-by: Qiutong Song <songqt01@gmail.com>
Signed-off-by: Wei Fu <fuweid89@gmail.com> (cherry picked from commit cbebeb9) Signed-off-by: Qiutong Song <songqt01@gmail.com>
* Use delegated plugin call to simplify cni-bridge-cni
* Add README.md for cni-bridge-cni

Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit e6a2c07)
Signed-off-by: Qiutong Song <songqt01@gmail.com>
1db5e31 to a85709c
Done
/ok-to-test
Backport #7069 to 1.6 branch. This is a prerequisite to backport #5904.
Testing