Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

e2e: use busybox to reduce e2e-test execution time #1605

Merged
merged 2 commits into from
Dec 22, 2023

Conversation

liudalibj
Copy link
Member

@liudalibj liudalibj commented Nov 28, 2023

  • image layer order issue is fixed by versions: bump kata ref #1632, thanks @mkulke
  • change to check container status for pod with running and testcommands.
  • keep simplepod test using nginx
  • use busybox to reduce e2e-test execution time
  • add a new doTestNginxDeployement test case

part of #1450

Signed-off-by: Da Li Liu liudali@cn.ibm.com

@huoqifeng
Copy link
Contributor

I knew @sudharshanibm3 and @stevenhorsman did many investigation on this unstable nginx pod, any comments on that?

@sudharshanibm3
Copy link
Contributor

sudharshanibm3 commented Nov 28, 2023

Hi @huoqifeng
Here we found some errors while using nginx pods which is reproducible while running test cases
#1450 (comment)
might be caused by
nginx-issue

@mkulke
Copy link
Contributor

mkulke commented Nov 29, 2023

I still find it weird that library/nginx is causing flaky tests for us and works presumably for the rest of the world. I'm all in for getting rid of flaky tests, but I worry that if we replace nginx w/ busybox+sleep we might obscure a nasty race condition.

Do we have any suspicion what a root cause might be, maybe the way a specific test is written? the linked issue above doesn't make it look like a well-known issue with library/nginx.

I would at least suggest documenting the behaviour as a bug-issue "Can't use nginx in e2e tests" or sth and provided context, w/ a link to this and previous discussions.

@liudalibj liudalibj force-pushed the e2e-tests-update branch 4 times, most recently from b0b5bf5 to 8549685 Compare December 5, 2023 10:39
@liudalibj
Copy link
Member Author

I still find it weird that library/nginx is causing flaky tests for us and works presumably for the rest of the world. I'm all in for getting rid of flaky tests, but I worry that if we replace nginx w/ busybox+sleep we might obscure a nasty race condition.

Do we have any suspicion what a root cause might be, maybe the way a specific test is written? the linked issue above doesn't make it look like a well-known issue with library/nginx.

I would at least suggest documenting the behaviour as a bug-issue "Can't use nginx in e2e tests" or sth and provided context, w/ a link to this and previous discussions.

Yeah, it looks like not a well-known issue with library/nginx, I will try to do some debug on the failed test case, maybe we can pass the nginx issue with tiny changes.

@liudalibj liudalibj force-pushed the e2e-tests-update branch 15 times, most recently from 71ab108 to a938e6a Compare December 6, 2023 09:19
@liudalibj liudalibj marked this pull request as draft December 6, 2023 14:34
@stevenhorsman
Copy link
Member

We discussed this on the community call and the suggestion is that whilst we don't want to hide and lose the nginx issue, having a range of tests that occasionally fail isn't helpful either, so I propose that we swap the defaultimages for tests to busybox, but at the same time we add an nginx specific test, maybe using the approach Magnus mentioned: #1450 (comment) to add multiple replicas to get this reliable.

@liudalibj
Copy link
Member Author

liudalibj commented Dec 20, 2023

Note: at the moment, a fix cannot be applied to kata, b/c of an issue w/ guest-components.

So this pr still valid that we replace replace nginx image with busybox image, I would like skip the new added deployment nginx test cases now, enable them after we have confidential-containers/guest-components#404 in kata side.

WDYT @stevenhorsman @mkulke @wainersm @bpradipt ?

@stevenhorsman
Copy link
Member

So this pr still valid that we replace replace nginx image with busybox image, I would like skip the new added deployment nginx test cases now, enable them after we have confidential-containers/guest-components#404 in kata side.

WDYT @stevenhorsman @mkulke @wainersm @bpradipt ?

So the kata fix should be merged into CAA very soon (#1632), so I think my suggested course, based on discussions last week

  1. Leave the tests as is and see if the image-rs laying problem, once merged has made the e2e tests with nginx reliable to check if it is working for us - I'm assuming that it will
  2. Update the 'feature tests' e.g. things testing secrets, config map, authentication to use busybox as this PR does so we don't have an extra variable that might cause failures, but also add a new 'multi-layer' image test that we run multiple replicas of to check that the laying issue doesn't regress. In the short-term we could use nginx for this, but I think Magnus has an idea of making a specific test image with layers arranged to make failures more reliable if the order gets scrambled. That test image should be tested in image-rs, but we could also use it here for peer pods?

I'm not sure if that makes sense and is a reasonale plan?

@mkulke
Copy link
Contributor

mkulke commented Dec 20, 2023

So this pr still valid that we replace replace nginx image with busybox image, I would like skip the new added deployment nginx test cases now, enable them after we have confidential-containers/guest-components#404 in kata side.
WDYT @stevenhorsman @mkulke @wainersm @bpradipt ?

So the kata fix should be merged into CAA very soon (#1632), so I think my suggested course, based on discussions last week

  1. Leave the tests as is and see if the image-rs laying problem, once merged has made the e2e tests with nginx reliable to check if it is working for us - I'm assuming that it will
  2. Update the 'feature tests' e.g. things testing secrets, config map, authentication to use busybox as this PR does so we don't have an extra variable that might cause failures, but also add a new 'multi-layer' image test that we run multiple replicas of to check that the laying issue doesn't regress. In the short-term we could use nginx for this, but I think Magnus has an idea of making a specific test image with layers arranged to make failures more reliable if the order gets scrambled. That test image should be tested in image-rs, but we could also use it here for peer pods?

I'm not sure if that makes sense and is a reasonale plan?

sounds good. maybe we can leave the simplepod test using nginx and switch to busybox for others, if just to reduce test execution time. A specific multi-layer test w/ a contrived image should be part of image-rs's unit tests, I think, we might not need it in CAA.

@liudalibj liudalibj force-pushed the e2e-tests-update branch 2 times, most recently from a1d4aeb to 9f2479b Compare December 22, 2023 08:31
@liudalibj liudalibj changed the title e2e: use busybox to reduce unexpected e2e-test fail e2e: use busybox to reduce e2e-test execution time Dec 22, 2023
@liudalibj
Copy link
Member Author

@mkulke @stevenhorsman I updated this PR please help to check again:

  • image layer order issue is fixed by versions: bump kata ref #1632, thanks @mkulke
  • change to check container status for pod with running and testcommands.
  • keep simplepod test using nginx
  • use busybox to reduce e2e-test execution time
  • add a new doTestNginxDeployement test case

@mkulke
Copy link
Contributor

mkulke commented Dec 22, 2023

@mkulke @stevenhorsman I updated this PR please help to check again:

  • image layer order issue is fixed by versions: bump kata ref #1632, thanks @mkulke
  • change to check container status for pod with running and testcommands.
  • keep simplepod test using nginx
  • use busybox to reduce e2e-test execution time
  • add a new doTestNginxDeployement test case

is doTestNginxDeployement redundant if we keep simplepod using nginx?

@liudalibj
Copy link
Member Author

@mkulke @stevenhorsman I updated this PR please help to check again:

  • image layer order issue is fixed by versions: bump kata ref #1632, thanks @mkulke
  • change to check container status for pod with running and testcommands.
  • keep simplepod test using nginx
  • use busybox to reduce e2e-test execution time
  • add a new doTestNginxDeployement test case

is doTestNginxDeployement redundant if we keep simplepod using nginx?

We didn't have a test case to create one deployement with replicates, from this view, I prefer it is a new valid case?

@liudalibj liudalibj added the test_e2e_libvirt Run Libvirt e2e tests label Dec 22, 2023
Copy link
Member

@stevenhorsman stevenhorsman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey DaLi, thanks a lot for this, it's looking pretty good. I have a few questions and comments:
0. I initially thought that refactoring some of the generic methods in nginx_deployment_test into a common file e.g. asssessment_runner_test.go would be better as they could be re-used, but on reflection I think you've done the correct thing and we have refactor out later if we want to re-used them rather than doing it straight away, so no need to change here

  1. Should we add the doTestNginxDeployment test to the other cloud-providers e.g. ibmcloud and azure, or where you thinking about doing that in a future PR instead? I think it's a good test, so don't want to see it forgotten
  2. What is you reason for keeping createSimplePod test using nginx, given that we have the new nginx deployment test which more effectively tests the multi-layers issue? If we could switch it over to busybox (to match deleteSimplePodTest for example) then it should reduce the complexity of that test case as it only deals with a single layer image and would allow us to remove some of the extra code like newNginxPod and newNginxPodWithName which are only used in that single case?
  3. I think there is an argument that the PR would be 'cleaner' if it was two separate commits - one for switching over nginx to busybox and another that adds the nginx deployment test, but this isn't a blocker

Sorry if I'm not coming across as being helpful in this review, or overly picking, I think the PR is good and none of these are blockers for merging, just thoughts that I had that I wanted to share and hear your comments on. Thanks!

test/e2e/nginx_deployment_test.go Outdated Show resolved Hide resolved
- change to check container status for pod with running and testcommands.
- use busybox to reduce test execution time
- reduce WAIT_POD_RUNNING_TIMEOUT from 900 to 600

fixes for confidential-containers#1450

Signed-off-by: Da Li Liu <liudali@cn.ibm.com>
@liudalibj liudalibj removed the test_e2e_libvirt Run Libvirt e2e tests label Dec 22, 2023
Copy link
Member

@stevenhorsman stevenhorsman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for the updates!

@stevenhorsman stevenhorsman added the test_e2e_libvirt Run Libvirt e2e tests label Dec 22, 2023
@liudalibj
Copy link
Member Author

Hey DaLi, thanks a lot for this, it's looking pretty good. I have a few questions and comments: 0. I initially thought that refactoring some of the generic methods in nginx_deployment_test into a common file e.g. asssessment_runner_test.go would be better as they could be re-used, but on reflection I think you've done the correct thing and we have refactor out later if we want to re-used them rather than doing it straight away, so no need to change here

  1. Should we add the doTestNginxDeployment test to the other cloud-providers e.g. ibmcloud and azure, or where you thinking about doing that in a future PR instead? I think it's a good test, so don't want to see it forgotten

Added this test cases to all provider.

  1. What is you reason for keeping createSimplePod test using nginx, given that we have the new nginx deployment test which more effectively tests the multi-layers issue? If we could switch it over to busybox (to match deleteSimplePodTest for example) then it should reduce the complexity of that test case as it only deals with a single layer image and would allow us to remove some of the extra code like newNginxPod and newNginxPodWithName which are only used in that single case?

Use busybox for createSimplePod too, it looks simpler now.

  1. I think there is an argument that the PR would be 'cleaner' if it was two separate commits - one for switching over nginx to busybox and another that adds the nginx deployment test, but this isn't a blocker

Split the codes to two commits.

Sorry if I'm not coming across as being helpful in this review, or overly picking, I think the PR is good and none of these are blockers for merging, just thoughts that I had that I wanted to share and hear your comments on. Thanks!

Thanks @stevenhorsman good comments, I updated this PR.

@liudalibj liudalibj force-pushed the e2e-tests-update branch 3 times, most recently from e8f86c8 to 3372f4a Compare December 22, 2023 12:10
- add a new test case create one nginx deployment with 2 replicas

Signed-off-by: Da Li Liu <liudali@cn.ibm.com>
@liudalibj
Copy link
Member Author

CI build passed when replicas of nginx deployment test is 2, it is failed when replicas is 5 it maybe caused by the limit resources.
Merge this pr now.

@liudalibj liudalibj merged commit 4ac087d into confidential-containers:main Dec 22, 2023
25 checks passed
@liudalibj liudalibj deleted the e2e-tests-update branch December 22, 2023 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test_e2e_libvirt Run Libvirt e2e tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants