Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration Tests Fix #3638

Closed
wants to merge 4 commits into from

Conversation

geyslan
Copy link
Member

@geyslan geyslan commented Oct 31, 2023

Close: #3440

1. Explain what the PR does

a8d94ad chore: bring some colour into tests life
0650e61 fix(tests): on error stop tracee gracefully
12228df fix(tests): integration timeout handling
d240add chore(tests): integration changes

0650e61 fix(tests): on error stop tracee gracefully

Return all errors to the main test loop to check and abort the tracee
execution in the correct way.

This also:

- Introduces cmdEvents.waitFor duration as a guarantee before starting
  to collect events after executing a command.
- Changes the following tests to be certain of the events collected:
  . pid: trace events from pid 1
  . pid: event: args: trace event sched_switch with args from pid 0
  . pid: trace new (should be empty)

12228df fix(tests): integration timeout handling

Updated the Tracee integration tests to address a bug with the
time.After usage. Previously, at each iteration of the loop, the
time.After was reset whenever its corresponding select clause was
reached. This behavior resulted in the creation of a new timer on each
loop iteration without freeing the previous one, leading to a memory
leak.

The solution involved replacing time.After with a dedicated
timeoutTicker, ensuring consistent timeout handling without unnecessary
resource consumption.

d240add chore(tests): integration changes

- add coolDown to sleep before to run the next test.
- change runCmd to receive the number of expected events.
- ExpectAnyOfEvts (ExpectAnyOfEach before) now call runCmd passing 1 as
  expected events.
- eventBuffer.clear() now sets a new empty slice for it, instead of
  relying on the old underlying array.
- replaced copyActualEvents() with eventBuffer.getCopy().
- refactored test functions, improved comments and logs.

2. Explain how to test it

3. Other comments

@aqua-ci

This comment was marked as outdated.

@geyslan
Copy link
Member Author

geyslan commented Oct 31, 2023

@rafaeldtinoco, marked you but it's just for v0.20.

@geyslan
Copy link
Member Author

geyslan commented Oct 31, 2023

https://github.com/aquasecurity/tracee/actions/runs/6713255929/job/18244550422?pr=3305#step:4:487

image

It is no longer reproducible in local tests after increasing the cooldown period to 2s between the two kill -SIGHUP 1, however in the github workflow the second SIGHUP called in sequence continues not emitting events to tracee. I have already reproduced a fix moving one of these tests away from the other, but before doing that I would like to know if it is safe to have tests dealing with the init process - I suspect not. @rafaeldtinoco any ideas?

- add coolDown to sleep before to run the next test.
- change runCmd to receive the number of expected events.
- ExpectAnyOfEvts (ExpectAnyOfEach before) now call runCmd passing 1 as
  expected events.
- eventBuffer.clear() now sets a new empty slice for it, instead of
  relying on the old underlying array.
- replaced copyActualEvents() with eventBuffer.getCopy().
- refactored test functions, improved comments and logs.
Updated the Tracee integration tests to address a bug with the
time.After usage. Previously, at each iteration of the loop, the
time.After was reset whenever its corresponding select clause was
reached. This behavior resulted in the creation of a new timer on each
loop iteration without freeing the previous one, leading to a memory
leak.

The solution involved replacing time.After with a dedicated
timeoutTicker, ensuring consistent timeout handling without unnecessary
resource consumption.
Return all errors to the main test loop to check and abort the tracee
execution in the correct way.

This also:

- Introduces cmdEvents.waitFor duration as a guarantee before starting
  to collect events after executing a command.
- Changes the following tests to be certain of the events collected:
  . pid: trace events from pid 1
  . pid: event: args: trace event sched_switch with args from pid 0
  . pid: trace new (should be empty)
@geyslan geyslan force-pushed the 3440-integration-timeout branch 2 times, most recently from 0e89284 to a778c39 Compare November 3, 2023 21:26
@geyslan
Copy link
Member Author

geyslan commented Nov 3, 2023

It is no longer reproducible in local tests after increasing the cooldown period to 2s between the two kill -SIGHUP 1, however in the github workflow the second SIGHUP called in sequence continues not emitting events to tracee. I have already reproduced a fix moving one of these tests away from the other, but before doing that I would like to know if it is safe to have tests dealing with the init process - I suspect not. @rafaeldtinoco any ideas?

I discovered that sequential calls to systemd's reload system will be disregarded if it's still processing:

However instead of relocating the sequential tests that signaled SIGHUP (what would solve that issue), I changed them to different signals like SIGUSR1 which tries to reconnect to the D-Bus bus.

I also removed the attempt to get events from init - sysvinit.

@geyslan
Copy link
Member Author

geyslan commented Nov 6, 2023

Ready for review.

@@ -704,6 +704,31 @@ clean-e2e-inst-signatures:
#
$(CMD_RM) -rf $(OUTPUT_DIR)/e2e-inst-signatures


AWK_CMD_TEST_COLOR = '{ \
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment below. I'm +1 without the colors/awk change in Makefile. I'm giving +1, feel free to merge, but please remove the colors logic (we have -1 other changes that did similar things in the past).

@@ -704,6 +704,31 @@ clean-e2e-inst-signatures:
#
$(CMD_RM) -rf $(OUTPUT_DIR)/e2e-inst-signatures


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a no for me, sorry. I love colors but dont want to start having to deal with ansi color codes and bugs born from that. Also, makes the makefile very ugly, hope you dont mind removing this.

Copy link
Contributor

@rafaeldtinoco rafaeldtinoco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (check my comments). Thanks.

@rafaeldtinoco
Copy link
Contributor

Im rebasing this at: #3683 and removing the colors change since you're on vacation and its better to have this merged to avoid failures in testing. Cheers!

@geyslan geyslan deleted the 3440-integration-timeout branch November 28, 2023 13:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Integration tests timeout
3 participants