Skip to content

sigusr2 autest: simplify Process and Ready objects#13021

Merged
bneradt merged 1 commit intoapache:masterfrom
bneradt:rewrite-sigusr2-autest
Mar 27, 2026
Merged

sigusr2 autest: simplify Process and Ready objects#13021
bneradt merged 1 commit intoapache:masterfrom
bneradt:rewrite-sigusr2-autest

Conversation

@bneradt
Copy link
Copy Markdown
Contributor

@bneradt bneradt commented Mar 25, 2026

Rewrite the sigusr2 autest to drive each scenario from a single
shel helper script instead of a chain of AuTest Processes.

This removes the Ready-condition races from the old test, keeps the
signal and log rotation ordering in one place, and makes the log file
expectations easier to follow. This should address flakiness in the
test that has been seen in CI.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the sigusr2 gold test to reduce flakiness by consolidating scenario orchestration into dedicated shell helper scripts instead of chaining multiple AuTest Process objects with Ready conditions.

Changes:

  • Added sigusr2_rotate_diags.sh to perform diags.log rotation orchestration (wait → rename → SIGUSR2 → verify).
  • Added sigusr2_rotate_custom_log.sh to drive configured access log rotation via requests + SIGUSR2 and verify log placement.
  • Simplified sigusr2.test.py to use one default process per scenario that invokes the helper scripts, and moved log assertions into clearer per-scenario methods.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
tests/gold_tests/logging/sigusr2_rotate_diags.sh New helper script that deterministically sequences diags.log rename + SIGUSR2 + content checks.
tests/gold_tests/logging/sigusr2_rotate_custom_log.sh New helper script that drives /first→rotate→/second→SIGUSR2→/third and waits for expected log writes.
tests/gold_tests/logging/sigusr2.test.py Reworked test to call the helper scripts (single-process orchestration per run) and to simplify ATS/server setup and assertions.

Copy link
Copy Markdown
Contributor

@bryancall bryancall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shell-script approach is a big improvement over the old Process chain. Much easier to follow the sequencing, and the explicit polling loops with 60-second timeouts should handle the ASAN/CI slowness that was causing the flakiness.

Two things I noticed:

  1. Missing StillRunningAfter -- The old test had tr1.StillRunningAfter = diags_test.ts (and the server) to verify ATS survived the signal. Neither add_system_log_test nor add_configured_log_test sets this. Without it, if ATS crashes during SIGUSR2 handling, the test could still pass as long as the shell script exits 0 (the shell script's wait_for_contains would succeed on files written before the crash). Might be worth adding back.

  2. Typo in commit message -- "shel helper script" should be "shell helper script" (appears twice).

Rewrite the sigusr2 autest to drive each scenario from a single
shell helper script instead of a chain of AuTest Processes.

This removes the Ready-condition races from the old test, keeps the
signal and log rotation ordering in one place, and makes the log file
expectations easier to follow. This should address flakiness in the
test that has been seen in CI.
@bneradt bneradt force-pushed the rewrite-sigusr2-autest branch from 231992d to cd14573 Compare March 27, 2026 16:32
@bneradt
Copy link
Copy Markdown
Contributor Author

bneradt commented Mar 27, 2026

The shell-script approach is a big improvement over the old Process chain. Much easier to follow the sequencing, and the explicit polling loops with 60-second timeouts should handle the ASAN/CI slowness that was causing the flakiness.

Two things I noticed:

  1. Missing StillRunningAfter -- The old test had tr1.StillRunningAfter = diags_test.ts (and the server) to verify ATS survived the signal. Neither add_system_log_test nor add_configured_log_test sets this. Without it, if ATS crashes during SIGUSR2 handling, the test could still pass as long as the shell script exits 0 (the shell script's wait_for_contains would succeed on files written before the crash). Might be worth adding back.

Actually, with this change, we were able to associate the ATS instance with the more specific TestRun itself rather than the global Test namespace. That's a good thing since it is a cleaner solution - it's easier to think through the processes if they are all neatly associated with their own TestRun. Therefore the TS instance will not be "StillRunningAfter" - it intentionally is cleaned up when the TestRun completes. That said, any crash would manifest itself as a non-zero return in TS and will be seen via that mechanism.

  1. Typo in commit message -- "shel helper script" should be "shell helper script" (appears twice).

Fixed.

@bryancall bryancall self-requested a review March 27, 2026 17:48
@bneradt bneradt merged commit 15e879b into apache:master Mar 27, 2026
15 checks passed
@bneradt bneradt deleted the rewrite-sigusr2-autest branch March 27, 2026 22:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants