Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test/system: make some tests faster part 1 #22821

Merged
merged 9 commits into from
May 28, 2024

Conversation

Luap99
Copy link
Member

@Luap99 Luap99 commented May 27, 2024

So many tests use sleep or run into the default 10s stop timeout, this is just part 1 of many. I try to fix as much as I can but this is very slow tedious work.

see commits for details

Does this PR introduce a user-facing change?

None

It takes over 10 seconds for this test as it uses --wait 5 twice which
runs into the timeout. IMO this tests is just redundant as it is already
covered in the e2e tests much better. Thus remove it here.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Defining a timer with a fixed interval is not a good idea as we first
have to wait until the timer triggers, while the interval was every two
seconds it means that we have to wait at least 2s for it to start.
However much worse it means it triggers the unit over and over, this
seems to cause some soft of race with the output check. I have seen
this test run 10-60s which does not make much sense.

Switching the timer to trgger once on start seem to make the test run
consistently in 7s locally for me so this is much better.

There still is the question if we really have to test this at all on
each upstream PR but I left it for now.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Merge two podman event tests into one to speed them up as they did
mostly the same anyway. This way we only have to do the setup/teardown
once and only run one container.

Second, add the --since option because reading the journal can be slow
if you have thousands of event entries. This is not so critical in CI as
we run on fresh systems but on local dev machines I have almost 100k
events in the journal so parsing all of them makes this test slow (like
30s), with this change I can get it under 1s.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This test waits 15 seconds to send sigterm for no good reason, we can
just make the timeout shorter. Also make sure the podman command quit on
sigterm by looking for the output message.

While at it fix the tests to use $PODMAN_TMPDIR not /tmp and define the
yaml in the test instead of using the podman create && podman kube
generate && podman rm way to create the yaml as it is a bit slower as we
have to call three podman commands for it.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
There is really no point in waiting 10s for the kill, let's use 2 this
should be good enough to observe the timing.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
IMO it is not important to cover each case with each sdnotify policy, to
speed them up we run all the exit code cases only once just twice for
each policy while switching the sdnotify policy between each case. This
way we safe 50% of runs and should still have sufficient coverage.

Before it took around 24 seconds, with this it is around 12 seconds now.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Both tests take 10s longer than they need to because they run the sleep
command int he container which does not react to sigterm, as such podman
waits 10s before killing it with sigkill.

To fix it just stop them with podman rm -fa -t0 to avoid the wait and do
not use podman kube down as we cannot set a timeout there. podman kube
down is still covered in many other tests so this is not an issue.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
The test does a normal stop on a command that does not react to sigterm.
As I cannot fix the system stop logic use a command which does. This
safes us 10s as it no longer waits for the timeout.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Copy link
Contributor

openshift-ci bot commented May 27, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 27, 2024
@edsantiago
Copy link
Member

Thanks for doing this. I know it's tedious. I will review tomorrow.

FWIW I've given up on my initial approach to parallelizing tests. My new thinking involves splitting system tests into two subdirectories, parallel and serial. Some of the huge time hogs (help, completion) can easily be parallelized. Again, will explore all that later in the week.

@Luap99
Copy link
Member Author

Luap99 commented May 27, 2024

FWIW I've given up on my initial approach to parallelizing tests. My new thinking involves splitting system tests into two subdirectories, parallel and serial. Some of the huge time hogs (help, completion) can easily be parallelized. Again, will explore all that later in the week.

That is my conclusion as well. I was thinking splitting it in some form of (OS dependent/independent), e.g. the slow podman --help, shell completion, --format checks all do not have to run in on every os combination. We could use only two tasks for it local/remote which should also safe some compute time overall.

Copy link
Member

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but one fix needed. Thank you for taking this on.

test/system/250-systemd.bats Outdated Show resolved Hide resolved
This container did not react to sigterm thus we always waited 10s for it
to stop. Also do not wait 2s for the logs instead use a retry loop.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
@edsantiago
Copy link
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 28, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit aca5a7b into containers:main May 28, 2024
89 checks passed
@Luap99 Luap99 deleted the fast-system-test branch May 28, 2024 14:46
edsantiago added a commit to edsantiago/libpod that referenced this pull request Jun 20, 2024
PR containers#22821 (CI speedup) was overly aggressive in one kube test.
It's flaking. Bump up timeout from 3s to 4.

Signed-off-by: Ed Santiago <santiago@redhat.com>
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Aug 27, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Aug 27, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note-none
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants