test/system: make some tests faster part 1 #22821

Luap99 · 2024-05-27T17:02:25Z

Many tests use sleep or run into the default 10s stop timeout. This is just part 1 of many; I am trying to fix as much as I can, but it is slow, tedious work.

See the commits for details.

Does this PR introduce a user-facing change?

None

This test takes over 10 seconds because it uses --wait 5 twice, and both runs hit the timeout. IMO this test is just redundant, as the e2e tests already cover it much better. Thus remove it here. Signed-off-by: Paul Holzinger <pholzing@redhat.com>

Defining a timer with a fixed interval is not a good idea, as we first have to wait until the timer triggers; with an interval of two seconds, that means we have to wait at least 2s for it to start. Much worse, however, it triggers the unit over and over, which seems to cause some sort of race with the output check. I have seen this test take 10-60s, which does not make much sense. Switching the timer to trigger once on start seems to make the test run consistently in 7s locally for me, so this is much better. There is still the question of whether we really have to test this at all on each upstream PR, but I left it for now. Signed-off-by: Paul Holzinger <pholzing@redhat.com>

Merge two podman event tests into one to speed them up, as they did mostly the same thing anyway. This way we only have to do the setup/teardown once and only run one container. Second, add the --since option, because reading the journal can be slow if you have thousands of event entries. This is not so critical in CI, as we run on fresh systems, but on local dev machines I have almost 100k events in the journal, so parsing all of them makes this test slow (around 30s). With this change I can get it under 1s. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
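A minimal sketch of the --since idea, not the code from the commit. The helper name `recent_events` and the default window of one minute are assumptions made for illustration; `--since` (which accepts a Go duration such as `1m`) and `--stream=false` are existing `podman events` options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only events from a recent time window, so podman
# does not have to parse the whole event history in the journal.
# $1 is an optional window given as a Go duration string (default: 1m).
recent_events() {
    local window="${1:-1m}"
    # --stream=false prints the matching events and exits instead of following.
    podman events --since "$window" --stream=false
}
```

Calling `recent_events 30s` right after the container under test has run would limit the output to events from the last 30 seconds.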

This test waits 15 seconds before sending sigterm for no good reason; we can just make the timeout shorter. Also make sure the podman command quits on sigterm by looking for the output message. While at it, fix the tests to use $PODMAN_TMPDIR instead of /tmp, and define the yaml in the test instead of creating it via podman create && podman kube generate && podman rm, which is a bit slower because we have to call three podman commands for it. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
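A rough sketch of defining the yaml inline, assuming a bash test. The file name, pod name, container name and image are placeholders, not values from the commit; `PODMAN_TMPDIR` is the variable named above, and the `mktemp -d` fallback exists only so the sketch runs on its own.

```shell
#!/usr/bin/env bash
# PODMAN_TMPDIR is provided by the system test framework; fall back to a
# fresh temporary directory so this sketch also runs standalone.
tmpdir="${PODMAN_TMPDIR:-$(mktemp -d)}"
yaml_file="$tmpdir/test.yaml"

# Write the pod definition directly instead of generating it with three
# podman commands. All names and the image below are placeholders.
cat > "$yaml_file" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: test
    image: example.invalid/testimage:latest
    command:
    - top
EOF

echo "wrote $yaml_file"
```

The quoted heredoc delimiter (`'EOF'`) keeps the shell from expanding anything inside the yaml.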

There is really no point in waiting 10s for the kill; let's use 2s, which should be good enough to observe the timing. Signed-off-by: Paul Holzinger <pholzing@redhat.com>

IMO it is not important to cover each case with each sdnotify policy. To speed the tests up, we run each exit code case only once instead of once for each policy, switching the sdnotify policy between cases. This way we save 50% of the runs and should still have sufficient coverage. Before, it took around 24 seconds; with this it is around 12 seconds. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
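The alternation could look roughly like this. It is a sketch under assumptions: the case names and the two policy names are placeholders, and the real test may hold its cases and policies differently.

```shell
#!/usr/bin/env bash
# Placeholder lists: the real test's exit-code cases and policies may differ.
cases=(all any none)
policies=(conmon container)

# Build "case:policy" pairs, advancing the policy on every case so that each
# case runs once while every policy is still exercised.
plan=()
i=0
for c in "${cases[@]}"; do
    policy="${policies[i % ${#policies[@]}]}"
    plan+=("$c:$policy")
    i=$((i + 1))
done

printf '%s\n' "${plan[@]}"
```

With two policies this runs half as many combinations as the full cross product, which matches the 50% saving described above.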

Both tests take 10s longer than they need to because they run the sleep command in the container, which does not react to sigterm; as such, podman waits 10s before killing it with sigkill. To fix it, just stop them with podman rm -fa -t0 to avoid the wait, and do not use podman kube down, as we cannot set a timeout there. podman kube down is still covered in many other tests, so this is not an issue. Signed-off-by: Paul Holzinger <pholzing@redhat.com>

The test does a normal stop on a command that does not react to sigterm. As I cannot fix the system stop logic, use a command which does. This saves us 10s, as it no longer waits for the timeout. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
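To illustrate what "a command which does" react can mean, here is a sketch that runs outside any container and is not taken from the commit. It starts a loop that installs its own sigterm handler and checks that it exits promptly. A plain sleep running as PID 1 of a container has no such handler, so, as far as I understand, the signal has no effect there and the stop timeout applies.

```shell
#!/usr/bin/env bash
# Start a background loop that installs a SIGTERM handler and exits cleanly.
# Short sleeps keep the loop responsive, because the trap only runs after the
# current foreground command has returned.
bash -c 'trap "exit 0" TERM; while :; do sleep 0.1; done' &
pid=$!

sleep 0.3            # give the child time to install its trap
start=$SECONDS
kill -TERM "$pid"
wait "$pid"
status=$?
elapsed=$((SECONDS - start))

echo "exit status: $status, waited ${elapsed}s"
```

The same `trap ... TERM` loop could serve as a container command in place of a bare sleep.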

openshift-ci · 2024-05-27T17:02:32Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99

Needs approval from an approver in each of these files:

~~OWNERS~~ [Luap99]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

edsantiago · 2024-05-27T17:05:14Z

Thanks for doing this. I know it's tedious. I will review tomorrow.

FWIW I've given up on my initial approach to parallelizing tests. My new thinking involves splitting system tests into two subdirectories, parallel and serial. Some of the huge time hogs (help, completion) can easily be parallelized. Again, will explore all that later in the week.

Luap99 · 2024-05-27T17:15:44Z

> FWIW I've given up on my initial approach to parallelizing tests. My new thinking involves splitting system tests into two subdirectories, parallel and serial. Some of the huge time hogs (help, completion) can easily be parallelized. Again, will explore all that later in the week.

That is my conclusion as well. I was thinking of splitting it in some form of OS dependent/independent, e.g. the slow podman --help, shell completion, and --format checks do not have to run on every OS combination. We could use only two tasks for them, local and remote, which should also save some compute time overall.

edsantiago

LGTM but one fix needed. Thank you for taking this on.

test/system/250-systemd.bats

This container did not react to sigterm, thus we always waited 10s for it to stop. Also, do not wait 2s for the logs; use a retry loop instead. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
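A retry loop of this kind can be sketched as follows. The helper name `wait_for_output`, its argument order and the 0.1s poll interval are assumptions for illustration, not the helper used in the test suite.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-run a command until its output contains the expected
# text, giving up after a number of attempts.
# usage: wait_for_output EXPECTED ATTEMPTS COMMAND [ARGS...]
wait_for_output() {
    local expected="$1" attempts="$2"
    shift 2
    local output=""
    while [ "$attempts" -gt 0 ]; do
        output="$("$@" 2>&1)"
        if [[ "$output" == *"$expected"* ]]; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 0.1
    done
    echo "timed out waiting for '$expected'; last output: $output" >&2
    return 1
}
```

Unlike a fixed sleep, the loop returns as soon as the expected text shows up and only pays the full wait in the failure case. A call might look like `wait_for_output "READY" 20 podman logs mycontainer`, where the container name and the expected text are placeholders.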

edsantiago · 2024-05-28T11:58:31Z

/lgtm

PR containers#22821 (CI speedup) was overly aggressive in one kube test. It's flaking. Bump up timeout from 3s to 4. Signed-off-by: Ed Santiago <santiago@redhat.com>

Luap99 added 8 commits May 27, 2024 15:54

test/system: speed up "podman run --timeout"

94ba2cf

openshift-ci bot added the release-note-none label May 27, 2024

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 27, 2024

edsantiago requested changes May 28, 2024

test/system: speed up podman generate systemd - envar

1093ebb

Luap99 force-pushed the fast-system-test branch from f5283c3 to 1093ebb Compare May 28, 2024 11:54

openshift-ci bot assigned edsantiago May 28, 2024

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 28, 2024

openshift-merge-bot bot merged commit aca5a7b into containers:main May 28, 2024
89 checks passed

Luap99 deleted the fast-system-test branch May 28, 2024 14:46

This was referenced May 28, 2024

test/system: make some tests faster part 2 #22826

Merged

test/system: Add test for journald log check in quadlet #22736

Merged

This was referenced May 29, 2024

Windows: unable to connect to Podman socket: failed to read identity #22843

Open

windows: system connection: unexpected ports #22844

Closed

Luap99 mentioned this pull request Jun 3, 2024

test/system: make some tests faster part 3 #22886

Merged

edsantiago mentioned this pull request Jun 20, 2024

systests: kube: bump up a timeout #23059

Merged

stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Aug 27, 2024

stale-locking-app bot locked as resolved and limited conversation to collaborators Aug 27, 2024
