Fix flaky autests for timeout, sigusr2, and thread_config #13012
bryancall wants to merge 3 commits into apache:master
The ssl-delay-server test helper could die unexpectedly when a client disconnects during the handshake delay. SIGPIPE from the broken connection kills the process, or accept() returns EINTR under heavy parallel load. Add SIGPIPE ignore and EINTR retry to keep the server alive for the StillRunningAfter check.
Test 1's Default process had Ready = When.FileExists(diags.log), but by the time Default starts, rotate_diags_log has already moved diags.log to diags.log_old. This creates a deadlock: Default waits for diags.log to exist, but only SIGUSR2 (sent by Default) would cause TS to recreate it. The StartBefore chain already guarantees correct ordering (ts → rotate → Default), so the Ready condition is unnecessary and harmful.
Under ASAN, the ATS process CWD may differ from the expected ts_path. Fall back to matching ts_path in the process command line arguments so the test can find the correct traffic_server process.
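The CWD-first, command-line-fallback matching described above might look roughly like the sketch below. It is not the code from `check_threads.py`: the function name `find_ts_pid` and the plain-dict process records are hypothetical, chosen so the logic can be run without a live `traffic_server`. The record keys (`pid`, `name`, `cwd`, `cmdline`) are assumed to mirror what a process-listing library would report.

```python
import os
from typing import Iterable, Mapping, Optional


def find_ts_pid(procs: Iterable[Mapping], ts_path: str,
                name: str = "traffic_server") -> Optional[int]:
    """Return the pid of the process that belongs to ts_path, or None.

    Hypothetical helper: each record in procs is assumed to carry the keys
    'pid', 'name', 'cwd' and 'cmdline'.
    """
    ts_path = os.path.normpath(ts_path)
    fallback = None
    for proc in procs:
        if proc.get("name") != name:
            continue
        cwd = proc.get("cwd")
        # Preferred match: the process was started from ts_path.
        if cwd and os.path.normpath(cwd) == ts_path:
            return proc["pid"]
        # Fallback: ts_path appears somewhere in the command line arguments.
        # Remember only the first such process and keep scanning for a CWD match.
        if fallback is None and any(ts_path in arg
                                    for arg in (proc.get("cmdline") or [])):
            fallback = proc["pid"]
    return fallback
```

A CWD match wins over a command-line match even when the command-line match is seen first, so behavior on non-ASAN runs stays the same and the fallback only matters when no process reports the expected CWD.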
Pull request overview
Test-only PR to stabilize three flaky AuTests under parallel ASAN runs by hardening helper behavior and improving AuTest process sequencing / matching.
Changes:
- Update `ssl-delay-server` helper to ignore `SIGPIPE` and retry `accept()` on `EINTR`.
- Make `thread_config`'s thread-count helper identify the correct `traffic_server` process more reliably under ASAN by matching via CWD or command line.
- Adjust `sigusr2` test process ordering to remove a deadlocking Ready condition and clarify the intended startup chain.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `tests/gold_tests/timeout/ssl-delay-server.cc` | Improves helper robustness against SIGPIPE and EINTR during accept. |
| `tests/gold_tests/thread_config/check_threads.py` | Improves ATS process identification under ASAN by broadening the matching criteria. |
| `tests/gold_tests/logging/sigusr2.test.py` | Removes deadlocking Ready gating and documents intended process ordering for SIGUSR2 log rotation. |

    @@ -199,8 +204,10 @@ main(int argc, char *argv[])
       for (;;) {
         sfd = accept(listenfd, (struct sockaddr *)nullptr, nullptr);
         if (sfd <= 0) {

`accept()` returns -1 on error, but can legally return fd 0 on success (e.g., if stdin is closed). The `if (sfd <= 0)` check will treat a valid connection as a failure and exit, potentially reintroducing flakiness. Use `sfd < 0` for the error check, keeping the EINTR retry logic the same.
Suggested change:

    -    if (sfd <= 0) {
    +    if (sfd < 0) {


    # Configure process order: ts starts first, then rotate moves diags.log,
    # then Default sends SIGUSR2. No Ready condition needed on Default since the
    # StartBefore chain already ensures ts is fully started before rotate runs.
    tr1.Processes.Default.StartBefore(rotate_diags_log)
    rotate_diags_log.StartBefore(diags_test.ts)

The comment says the StartBefore chain ensures TS is "fully started", but StartBefore only enforces launch ordering; it doesn’t guarantee diags.log exists before mv runs. With the Ready condition removed from Default, rotate_diags_log can still race TS startup and fail if diags.log hasn’t been created yet. Consider moving the file-exists gating to the rotate step (e.g., via a ready= condition on the StartBefore edge or by making the rotate command wait/retry) and/or adjust the comment to avoid implying readiness guarantees.
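One way to read the "wait/retry" option from the comment above is a rotate step that polls until the log file exists before moving it. The sketch below is only an illustration of that idea, not code from the PR or an AuTest API: `rotate_when_present`, its timeout, and its polling interval are all hypothetical.

```python
import os
import time


def rotate_when_present(src: str, dst: str,
                        timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Move src to dst, retrying until src exists or timeout seconds pass.

    Returns True if the file was moved and False if it never appeared.
    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Attempt the move directly rather than checking existence first,
            # so there is no gap between the check and the rename.
            os.replace(src, dst)
            return True
        except FileNotFoundError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With a helper like this in place of a bare `mv`, the rotate step tolerates being launched before `diags.log` has been created, so it no longer depends on launch ordering alone.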
Summary
- `accept()` on EINTR in `ssl-delay-server` to avoid spurious helper exits.
- `sigusr2` and improve `thread_config` process matching for ASAN command lines.

Test plan