Skip to content

Fix flaky autests for timeout, sigusr2, and thread_config#13012

Open
bryancall wants to merge 3 commits intoapache:masterfrom
bryancall:fix/flaky-autests
Open

Fix flaky autests for timeout, sigusr2, and thread_config#13012
bryancall wants to merge 3 commits intoapache:masterfrom
bryancall:fix/flaky-autests

Conversation

@bryancall
Copy link
Contributor

Summary

  • Stabilize three flaky AuTests by hardening helper/process behavior under parallel ASAN runs.
  • Ignore SIGPIPE and retry accept() on EINTR in ssl-delay-server to avoid spurious helper exits.
  • Remove deadlocking startup sequencing in sigusr2 and improve thread_config process matching for ASAN command lines.

Test plan

  • Run AuTests in ASAN configuration and verify these tests pass reliably.
  • Validate no production code paths are changed (test-only PR).
  • Re-run Apache CI to confirm cross-platform stability.

The ssl-delay-server test helper could die unexpectedly when a
client disconnects during the handshake delay. SIGPIPE from the
broken connection kills the process, or accept() returns EINTR
under heavy parallel load. Add SIGPIPE ignore and EINTR retry to
keep the server alive for the StillRunningAfter check.
Test 1's Default process had Ready = When.FileExists(diags.log),
but by the time Default starts, rotate_diags_log has already moved
diags.log to diags.log_old. This creates a deadlock: Default waits
for diags.log to exist, but only SIGUSR2 (sent by Default) would
cause TS to recreate it. The StartBefore chain already guarantees
correct ordering (ts → rotate → Default), so the Ready condition
is unnecessary and harmful.
Under ASAN, the ATS process CWD may differ from the
expected ts_path. Fall back to matching ts_path in the
process command line arguments so the test can find the
correct traffic_server process.
@bryancall bryancall self-assigned this Mar 23, 2026
@bryancall bryancall added AuTest Tests ASan Address Sanitizer labels Mar 23, 2026
@bryancall bryancall added this to the 11.0.0 milestone Mar 23, 2026
@bryancall bryancall requested a review from Copilot March 23, 2026 17:44
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Test-only PR to stabilize three flaky AuTests under parallel ASAN runs by hardening helper behavior and improving AuTest process sequencing / matching.

Changes:

  • Update ssl-delay-server helper to ignore SIGPIPE and retry accept() on EINTR.
  • Make thread_config’s thread-count helper identify the correct traffic_server process more reliably under ASAN by matching via CWD or command line.
  • Adjust sigusr2 test process ordering to remove a deadlocking Ready condition and clarify the intended startup chain.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
tests/gold_tests/timeout/ssl-delay-server.cc Improves helper robustness against SIGPIPE and EINTR during accept.
tests/gold_tests/thread_config/check_threads.py Improves ATS process identification under ASAN by broadening the matching criteria.
tests/gold_tests/logging/sigusr2.test.py Removes deadlocking Ready gating and documents intended process ordering for SIGUSR2 log rotation.

@@ -199,8 +204,10 @@ main(int argc, char *argv[])
for (;;) {
sfd = accept(listenfd, (struct sockaddr *)nullptr, nullptr);
if (sfd <= 0) {
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accept() returns -1 on error, but can legally return fd 0 on success (e.g., if stdin is closed). The if (sfd <= 0) check will treat a valid connection as failure and exit, potentially reintroducing flakiness. Use sfd < 0 for the error check, keeping the EINTR retry logic the same.

Suggested change
if (sfd <= 0) {
if (sfd < 0) {

Copilot uses AI. Check for mistakes.
Comment on lines +133 to 137
# Configure process order: ts starts first, then rotate moves diags.log,
# then Default sends SIGUSR2. No Ready condition needed on Default since the
# StartBefore chain already ensures ts is fully started before rotate runs.
tr1.Processes.Default.StartBefore(rotate_diags_log)
rotate_diags_log.StartBefore(diags_test.ts)
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says the StartBefore chain ensures TS is "fully started", but StartBefore only enforces launch ordering; it doesn’t guarantee diags.log exists before mv runs. With the Ready condition removed from Default, rotate_diags_log can still race TS startup and fail if diags.log hasn’t been created yet. Consider moving the file-exists gating to the rotate step (e.g., via a ready= condition on the StartBefore edge or by making the rotate command wait/retry) and/or adjust the comment to avoid implying readiness guarantees.

Copilot uses AI. Check for mistakes.
@bryancall bryancall requested a review from bneradt March 23, 2026 22:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ASan Address Sanitizer AuTest Tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants