Skip to content

fix(podlogs): Follow mode exit bug and test improvements#9599

Merged
leonardoce merged 2 commits intomainfrom
dev/fix-flaky-podlogs-test
Jan 12, 2026
Merged

fix(podlogs): Follow mode exit bug and test improvements#9599
leonardoce merged 2 commits intomainfrom
dev/fix-flaky-podlogs-test

Conversation

@mnencia
Copy link
Member

@mnencia mnencia commented Dec 29, 2025

The test "should catch extra logs if given the follow option" was flaky in CI because it tested implementation details (counting loop iterations with tight timing) rather than actual behavior.

Redesigned the test to verify what Follow=true actually does: it keeps streaming until the context is cancelled. The test now uses Eventually/Consistently patterns that handle timing variations gracefully, making it robust across different environments.

The improved test exposed a bug in the production code: the streaming function would exit when all current log streams completed, even when Follow=true was set. This caused premature exit instead of continuing to poll for new or restarted pods. Fixed by changing the exit condition to only apply when Follow=false.

@mnencia mnencia requested a review from a team as a code owner December 29, 2025 16:34
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Dec 29, 2025
@cnpg-bot cnpg-bot added backport-requested ◀️ This pull request should be backported to all supported releases release-1.25 release-1.27 release-1.28 labels Dec 29, 2025
@github-actions
Copy link
Contributor

❗ By default, the pull request is configured to backport to all release branches.

  • To stop backporting this pr, remove the label: backport-requested ◀️ or add the label 'do not backport'
  • To stop backporting this pr to a certain release branch, remove the specific branch label: release-x.y

@NiccoloFei NiccoloFei force-pushed the dev/fix-flaky-podlogs-test branch from d6cc96d to 027c158 Compare December 29, 2025 16:36
@mnencia
Copy link
Member Author

mnencia commented Dec 29, 2025

/ok-to-merge test only change

@mnencia mnencia added the ok to merge 👌 This PR can be merged label Dec 29, 2025
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Dec 30, 2025
@mnencia mnencia force-pushed the dev/fix-flaky-podlogs-test branch from 027c158 to 2bda92a Compare December 30, 2025 12:17
@mnencia mnencia force-pushed the dev/fix-flaky-podlogs-test branch from 2bda92a to 73a08a5 Compare January 7, 2026 17:55
@mnencia mnencia changed the title test(podlogs): make Follow=true test robust and behavior-focused fix(podlogs): Follow mode exit bug and test improvements Jan 10, 2026
@mnencia
Copy link
Member Author

mnencia commented Jan 10, 2026

One of the executions of the improved test failed in CI: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/20791171388/job/59713483662?pr=9599

The failure exposed a bug in cluster_writer.go: the streaming function would exit when streamSet.isZero() returned true, even with Follow=true enabled. This violated the Follow semantics which should keep polling for new/restarted pods.

The race occurred because log streams could complete before the main loop checked for active streams, causing premature exit. Fixed by adding !csr.Options.Follow to the exit condition, ensuring Follow mode continues looping as intended.

@mnencia mnencia force-pushed the dev/fix-flaky-podlogs-test branch from 3203816 to a48a08c Compare January 10, 2026 22:35
@mnencia
Copy link
Member Author

mnencia commented Jan 10, 2026

/test

@github-actions
Copy link
Contributor

@mnencia, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/20885430224

@mnencia mnencia force-pushed the dev/fix-flaky-podlogs-test branch from a48a08c to 22baf8e Compare January 10, 2026 22:42
The Follow=true test was flaky in CI because it tested implementation
details (counting loop iterations with tight timing) rather than actual
behavior.

Redesigned the test to verify what Follow=true actually does: it keeps
streaming until the context is cancelled, rather than exiting after one
read. The test now checks that streaming starts, continues running
without exiting on its own, and only stops when we cancel the context.

This approach uses Eventually/Consistently patterns that handle timing
variations gracefully, making it robust across different environments.

Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Fixed a bug where Follow=true would exit prematurely when all current
log streams completed, instead of continuing to poll for new or
restarted pods.

Changed the exit condition on zero active streams to only apply when
Follow=false, allowing Follow mode to keep looping as intended.

Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
@leonardoce leonardoce force-pushed the dev/fix-flaky-podlogs-test branch from 22baf8e to 53265c6 Compare January 12, 2026 16:03
@leonardoce leonardoce merged commit 76817ec into main Jan 12, 2026
34 checks passed
@leonardoce leonardoce deleted the dev/fix-flaky-podlogs-test branch January 12, 2026 16:19
cnpg-bot pushed a commit that referenced this pull request Jan 12, 2026
The test "should catch extra logs if given the follow option" was flaky
in CI because it tested implementation details (counting loop iterations
with tight timing) rather than actual behavior.

Redesigned the test to verify what Follow=true actually does: it keeps
streaming until the context is cancelled. The test now uses
Eventually/Consistently patterns that handle timing variations
gracefully, making it robust across different environments.

The improved test exposed a bug in the production code: the streaming
function would exit when all current log streams completed, even when
Follow=true was set. This caused premature exit instead of continuing to
poll for new or restarted pods. Fixed by changing the exit condition to
only apply when Follow=false.

Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
(cherry picked from commit 76817ec)
cnpg-bot pushed a commit that referenced this pull request Jan 12, 2026
The test "should catch extra logs if given the follow option" was flaky
in CI because it tested implementation details (counting loop iterations
with tight timing) rather than actual behavior.

Redesigned the test to verify what Follow=true actually does: it keeps
streaming until the context is cancelled. The test now uses
Eventually/Consistently patterns that handle timing variations
gracefully, making it robust across different environments.

The improved test exposed a bug in the production code: the streaming
function would exit when all current log streams completed, even when
Follow=true was set. This caused premature exit instead of continuing to
poll for new or restarted pods. Fixed by changing the exit condition to
only apply when Follow=false.

Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
(cherry picked from commit 76817ec)
cnpg-bot pushed a commit that referenced this pull request Jan 12, 2026
The test "should catch extra logs if given the follow option" was flaky
in CI because it tested implementation details (counting loop iterations
with tight timing) rather than actual behavior.

Redesigned the test to verify what Follow=true actually does: it keeps
streaming until the context is cancelled. The test now uses
Eventually/Consistently patterns that handle timing variations
gracefully, making it robust across different environments.

The improved test exposed a bug in the production code: the streaming
function would exit when all current log streams completed, even when
Follow=true was set. This caused premature exit instead of continuing to
poll for new or restarted pods. Fixed by changing the exit condition to
only apply when Follow=false.

Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
(cherry picked from commit 76817ec)
mnencia added a commit that referenced this pull request Jan 20, 2026
The test "should catch extra logs if given the follow option" was flaky
in CI because it tested implementation details (counting loop iterations
with tight timing) rather than actual behavior.

Redesigned the test to verify what Follow=true actually does: it keeps
streaming until the context is cancelled. The test now uses
Eventually/Consistently patterns that handle timing variations
gracefully, making it robust across different environments.

The improved test exposed a bug in the production code: the streaming
function would exit when all current log streams completed, even when
Follow=true was set. This caused premature exit instead of continuing to
poll for new or restarted pods. Fixed by changing the exit condition to
only apply when Follow=false.

Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
(cherry picked from commit 76817ec)
mnencia added a commit that referenced this pull request Feb 5, 2026
Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
mnencia added a commit that referenced this pull request Feb 5, 2026
Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-requested ◀️ This pull request should be backported to all supported releases lgtm This PR has been approved by a maintainer no-issue ok to merge 👌 This PR can be merged release-1.25 release-1.27 release-1.28 size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants