Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8337: add test cases for guest pull images #1

Merged
merged 1 commit into from
Dec 18, 2023

Conversation

liudalibj
Copy link

  • add test cases for guest pull images
  • need revist after we use container2.0 with 'image pull per runtime class' feature

for kata-containers#8337 and kata-containers#8407

@liudalibj
Copy link
Author

liudalibj commented Dec 13, 2023

This pr only include the test cases for guest pull with nydus-snapshotter
I tested those cases with my kcli cluster with nydus-snapshotter be installed and config by hand and the runtime class is "kata-qemu".
A good result looks like:

root@liudali-x86-libvirt:~/kata-containers/tests/integration/kubernetes# export KATA_HYPERVISOR=qemu
root@liudali-x86-libvirt:~/kata-containers/tests/integration/kubernetes# export PULL_TYPE=guest-pull
root@liudali-x86-libvirt:~/kata-containers/tests/integration/kubernetes# bats k8s-guest-pull-image.bats
 ✓ Test we can pull an unencrypted image (busybox) outside the guest with runc and then inside the guest successfully
 ✓ Test we can pull an unencrypted image (busybox) inside the guest twice in a row and then outside the guest successfully
 ✓ Test we can pull an other unencrypted image (alpine) outside the guest and then inside the guest successfully

3 tests, 0 failures

@liudalibj liudalibj force-pushed the guest-pull branch 6 times, most recently from 9494b5b to af3ee22 Compare December 13, 2023 15:32
Copy link

@stevenhorsman stevenhorsman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey DaLi, I've made a few comments for possible ways to make the code a bit easier to read, which @wainersm might also have input on.

The other thing that we might want to consider is the reverse case of issue 8337 (unless you have it and I've missed it), where we pull without a cri-runtime handler annotation, then pull with it and verify that it's pulled on the guest properly. You seem to have the reverse, pulled with annotation, on guest and then without and (TODO) check that it isn't pulled on the guest, but maybe both ways would be useful unless you think the runc case covers that?

I also think that for test like this one that we know will fail it's fine to write them as they should work, but then add a skip, and add a note in kata-containers#8337 to unskip them in future.

Thanks for the great work!

Comment on lines 22 to 25
runc_pod_config="$(new_pod_config "$unencrypted_image_1" "kata-${KATA_HYPERVISOR}")"
sed -i '/runtimeClassName:/d' $runc_pod_config

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an idea rather than direct suggestion: Rather than setting the runtime class to kata- and then deleting it, I wonder if it would be clearer to update new_pod_config to allow it to either allow a blank runtime class, or maybe nicer create a separate to pod-config.yaml.in that doesn't have a runtimeclass and let new_pod_config optional take the base config as the final parameter? I just wonder if that would be clearer to read and understand?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought (and considering some of the other suggestions I've made about extending it), maybe overloading new_pod_config with a base runtime class is too much of an edge-case and we'd be better off just adding a comment here to say that although the config is initialised with a runtimeclass it is then immediately deleted, such that it runs as runc

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks the comment, updated the new_pod_config function and pod-config.yaml.in template.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, my review comments were very confusing, but I left it so that people could give their input. In my first comment I suggested making runtimeclass optional, but after thinking about the idea of making metadata_annotations optional instead I decided that might be more useful, so suggested leaving as is with a comment. Not stuck with shell scripts and not being able to use multiple option values is an annoyance though.

Anyway these comments are mostly related to readability and re-use of code, so they can be refactored later as we add more tests and see the common patterns, so I don't want to hold up this PR due to these thoughts

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments to say we want to get runc pod here.


# Get pod specification
kubectl wait --for=condition=Ready --timeout=$timeout pod "test-e2e"
set_node "$runc_pod_config" "$node"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need to do set_node for all our pod_configs, should we consider integrating it into new_pod_config and pass it as a third parameter to check the test files cleaner?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of me is wondering about adding metadata_annotations to the new_pod_config too, but I can see scenarios, like in this file that they will need to be optional, so that would make it need to be the last parameter. Maybe still worth considering though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to move $node to new_pod_config function but failed, because the runtimeClassName is one optional parameter too one function can't have 2+ optional parameters.
I didn't make the $node as a required parameter, I assume that we do NOT need set in for all other test cases?
So just leave the set_node call asis

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wainersm - any thoughts on this. I think you added the node call as we are running these tests on AKS now, so getting the logs requires us to know which node it's running on, so do you think it should be optional? At the moment the only usage of new_pod_config is the k8s-measured-rootfs.bats test which also calls set_node

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the set_node to new_pod_config function, the codes looks better than before.

@liudalibj
Copy link
Author

Hey DaLi, I've made a few comments for possible ways to make the code a bit easier to read, which @wainersm might also have input on.

The other thing that we might want to consider is the reverse case of issue 8337 (unless you have it and I've missed it), where we pull without a cri-runtime handler annotation, then pull with it and verify that it's pulled on the guest properly. You seem to have the reverse, pulled with annotation, on guest and then without and (TODO) check that it isn't pulled on the guest, but maybe both ways would be useful unless you think the runc case covers that?

Both case are reversed, there are two TODO items, Test we can pull an other unencrypted image (alpine) outside the guest and then inside the guest successfully is for the case we pull without a cri-runtime handler annotation, then pull with it and verify that it's pulled on the guest properly

I also think that for test like this one that we know will fail it's fine to write them as they should work, but then add a skip, and add a note in kata-containers#8337 to unskip them in future.

Thanks for the great work!

@liudalibj
Copy link
Author

@stevenhorsman @ChengyuZhu6 Many thanks for your review comments, I updated the codes, please help to take a look when you are free.

@liudalibj liudalibj force-pushed the guest-pull branch 5 times, most recently from 1ad9581 to 64b8c46 Compare December 14, 2023 08:41
@stevenhorsman
Copy link

Both case are reversed, there are two TODO items

Sorry, I'm not sure how I missed this last night.
Just to ensure I completely understand the Test we can pull an other unencrypted image (alpine) outside the guest and then inside the guest successfully scenario - it is first doing a host pull (but still kata-containers, not runc) using the vanilla kata logic, as it doesn't set the cri-handler annotation, then it does a second pull with the annotation and (in future) checks that it is still pulled on the guest?
If so that works great for me, sorry I got distracted with the first two tests.

I guess something to think about here, and maybe it needs wider discussion is whether having TODOs and changing the expected behaviour is the correct thing to do. There is an argument that this is a pretty serious limitation, so having the tests correct and initially failing might drive useful conversations in the community and potentially get any container 2.0 fixes backported to 1.7 maybe, so we can start using this as soon as possible? I'm a bit worried that what you've done, although it passes the test, hides the problem a bit?

@liudalibj
Copy link
Author

Sorry, I'm not sure how I missed this last night. Just to ensure I completely understand the Test we can pull an other unencrypted image (alpine) outside the guest and then inside the guest successfully scenario - it is first doing a host pull (but still kata-containers, not runc) using the vanilla kata logic, as it doesn't set the cri-handler annotation, then it does a second pull with the annotation and (in future) checks that it is still pulled on the guest? If so that works great for me, sorry I got distracted with the first two tests.

Yeah, that case try to pull image on host(without annotation) first and then try to pull image on guest(with annotation)

I guess something to think about here, and maybe it needs wider discussion is whether having TODOs and changing the expected behaviour is the correct thing to do. There is an argument that this is a pretty serious limitation, so having the tests correct and initially failing might drive useful conversations in the community and potentially get any container 2.0 fixes backported to 1.7 maybe, so we can start using this as soon as possible? I'm a bit worried that what you've done, although it passes the test, hides the problem a bit?

Yeah, agree with you @stevenhorsman that current way hides the problem.
So we should update the test case to expect all correct count of rootfs from now, and then:

  • option1: failed the test case
  • option2: skip the test case, enable it after we have containerd 2.0
    Which options we should use here @stevenhorsman @ChengyuZhu6 ?

@liudalibj liudalibj force-pushed the guest-pull branch 2 times, most recently from f7efa72 to 344aa76 Compare December 15, 2023 03:07
@stevenhorsman
Copy link

So we should update the test case to expect all correct count of rootfs from now, and then:

  • option1: failed the test case
  • option2: skip the test case, enable it after we have containerd 2.0
    Which options we should use here @stevenhorsman @ChengyuZhu6 ?

So this might be kicking the can down the road a bit, but for this PR I'd recommend option 1 where we let the test case fail and then as part of the main PR 8484 (IIRC) we can test it out and have the wider discussion with the community to highlight that it is still an issue and try and get broader agreement on a path forward.

- add test cases for guest pull images
- need revist after we use container2.0 with 'image pull per runtime class' feature

for kata-containers#8337 and kata-containers#8407

Signed-off-by: Da Li Liu <liudali@cn.ibm.com>
@liudalibj
Copy link
Author

So this might be kicking the can down the road a bit, but for this PR I'd recommend option 1 where we let the test case fail and then as part of the main PR 8484 (IIRC) we can test it out and have the wider discussion with the community to highlight that it is still an issue and try and get broader agreement on a path forward.

Updated the test cases to use option 1 with comments provided.

Copy link

@stevenhorsman stevenhorsman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks DaLi for the great work!

@ChengyuZhu6
Copy link
Owner

Thanks @liudalibj LGTM!

@ChengyuZhu6 ChengyuZhu6 merged commit 17970a5 into ChengyuZhu6:guest-pull Dec 18, 2023
1 check failed
@liudalibj liudalibj deleted the guest-pull branch December 18, 2023 06:34
ChengyuZhu6 pushed a commit that referenced this pull request May 22, 2024
Preparation for complete GPU rootfs build step #1/#N

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants