Do not attribute logical decoding aborts to xact_rollback#28
Open
Logical decoding aborts the current transaction after decoding committed transactions to clean up catalog access and other transaction-local state. In a logical walsender this can be a top-level abort, so pg_stat_database.xact_rollback is incremented even though no user-visible transaction rolled back.

Keep these internal cleanup aborts out of xact_rollback by adding AbortCurrentTransactionWithoutXactStats(), a narrow wrapper around AbortCurrentTransaction() that suppresses only the pg_stat_database xact_commit/xact_rollback counter update while preserving the rest of transaction cleanup.

Add a TAP test that fails without the fix: five committed transactions decoded by a subscription produce a publisher xact_rollback delta of 5 when the walsender exits. With the fix, the delta remains 0.

Reported-by: Rafael Thofehrn Castro
Discussion: https://postgr.es/m/CAG0ozMo_xWQn%2BAvv8jzbbhePGp5OnhdO%2BYWTkdg4faWSXz0Jzg%40mail.gmail.com
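The wrapper described in this commit message can be pictured with a small self-contained C model. This is only a sketch, not PostgreSQL source: every `toy_` name is hypothetical, and the real `AbortCurrentTransaction()` does far more than the stand-in here. It illustrates one way a wrapper might suppress only the counter update while the rest of cleanup still runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for backend-local state; not PostgreSQL's real variables. */
static int  toy_xact_rollback = 0;       /* models the xact_rollback counter */
static int  toy_cleanups_run = 0;        /* models "the rest of cleanup" */
static bool toy_skip_xact_stats = false; /* assumed suppression switch */

/* Models the end-of-transaction stats hook for the abort case. */
static void
toy_at_eoxact_stats(void)
{
    if (toy_skip_xact_stats)
        return;                 /* internal cleanup abort: do not count it */
    toy_xact_rollback++;
}

/* Models a top-level abort: cleanup always runs, then the stats hook. */
static void
toy_abort_current_transaction(void)
{
    toy_cleanups_run++;
    toy_at_eoxact_stats();
}

/* Models the narrow wrapper: same abort, counter update suppressed. */
static void
toy_abort_without_xact_stats(void)
{
    toy_skip_xact_stats = true;
    toy_abort_current_transaction();
    toy_skip_xact_stats = false;
}

/* Usage: five internal cleanup aborts leave the counter untouched. */
static void
toy_demo(void)
{
    for (int i = 0; i < 5; i++)
        toy_abort_without_xact_stats();
    assert(toy_xact_rollback == 0);
    assert(toy_cleanups_run == 5);
}
```

In this model an ordinary abort after the five wrapped ones still increments the counter, which is the property the commit message asks for: only the internal cleanup aborts are hidden from the statistics.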
Summary
v2: reworked after a thorough synthetic review pass. Key shape changes vs. v1:

- Dropped `pgstat_cancel_xact_rollback()` as a "decrement the counter" helper. The patch now uses a module-local flag in `pgstat_database.c` (`pgstat_begin_internal_xact_cleanup()` / `pgstat_end_internal_xact_cleanup()`), which `AtEOXact_PgStat_Database()` reads to skip the bump entirely. No more bump-and-undo.
- Also fixed `SnapBuildClearExportedSnapshot()` (same-shape bug, hit on replication-snapshot cleanup).
- The test lives in `src/test/subscription/t/100_bugs.pl` as a new bug section (not a new file). It asserts a delta from baseline (not a literal `0` after `pg_stat_reset`), runs only 5 INSERTs, and polls for walsender-gone filtered by `application_name`.
- `Backpatch-through: 15` trailer, no `Co-Authored-By` in the PG commit body.

Fixes report CAG0ozMo_xWQn+Avv8jzbbhePGp5OnhdO+YWTkdg4faWSXz0Jzg@mail.gmail.com (Rafael Thofehrn Castro, 2024-06-14), still present in `master` as of 2026-04-16. See also postgres-ai/tests-and-benchmarks#39 for the production-incident context.

Root cause
- `ReorderBufferProcessTXN()` ends each decoded transaction with `AbortCurrentTransaction()` for catalog cleanup.
- In the walsender path (`using_subtxn == false`, entered via `StartTransactionCommand()`), that abort is top-level, so `AtEOXact_PgStat_Database(isCommit=false)` bumps the backend-local `pgStatXactRollback`. Counts flush to shared stats on walsender exit, which produces the spike.
- In the `pg_logical_slot_get_changes()` path (`using_subtxn == true`), the abort is of an internal subtransaction and goes through `AtEOSubXact_PgStat`, which never touches the counter.

Fix

- `pgStatInternalXactCleanup` flag in `pgstat_database.c`, set and cleared by two public helpers.
- `AtEOXact_PgStat_Database()` early-returns when the flag is set (same slot as the existing `parallel` skip).
- `ReorderBufferProcessTXN()`: bracket `AbortCurrentTransaction()` with the begin/end pair, gated on `!using_subtxn`.
- `SnapBuildClearExportedSnapshot()` gets the same bracket (unconditionally; that path is always top-level).
- `applyparallelworker.c:1458` is intentionally left alone: that is a real user-visible rollback on the subscriber side.

Reproduction evidence
Reproduced with two `postgres:17` Docker containers. With the patch, the post-`DISABLE` rollback value stays at baseline.

TAP test (RED / GREEN)
New section in `src/test/subscription/t/100_bugs.pl`: 5 autocommit INSERTs, `wait_for_catchup`, `ALTER SUBSCRIPTION xrb_sub DISABLE`, poll for walsender-gone filtered by `application_name='xrb_sub'`, assert `cmp_ok($final - $baseline, '==', 0)`.

- Without the fix: `got: 5 / expected: 0` (FAIL).

Test plan
- `meson test --suite subscription --suite test_decoding`: 42/42 + 3/3 pass (incl. the new `100_bugs.pl` section).
- `meson test postgresql:recovery/006_logical_decoding postgresql:recovery/010_logical_decoding_timelines postgresql:pg_basebackup/030_pg_recvlogical`: pass.
- `meson test postgresql:recovery/029_stats_restart postgresql:pg_stat_statements/regress`: pass.
- `meson test postgresql:regress/regress`: 248/248 pass.

Review trail

Review verdicts: `needs revision` / `accept with minor` / `needs work`. Consensus blockers: API shape (don't expose `pgStatXactRollback` through a public helper), `> 0` silent-flooring guard, too-tight `is(…, '0')` assertion, missing fix in `snapbuild.c`. All addressed here.

🤖 Generated with Claude Code
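The root cause and the flag-based fix described in this PR can be modeled in a few lines of C. This is a hedged sketch rather than the patch itself: all `model_` names are hypothetical stand-ins, the `patched` parameter exists only to compare behavior with and without the bracket, and none of the real transaction machinery is reproduced. It shows why the top-level (walsender) path counts a rollback per decoded transaction, why the subtransaction path does not, and how a begin/end bracket gated on `!using_subtxn` keeps the delta from baseline at zero.

```c
#include <assert.h>
#include <stdbool.h>

static int  model_xact_rollback = 0;        /* stands in for pgStatXactRollback */
static bool model_internal_cleanup = false; /* stands in for pgStatInternalXactCleanup */

static void model_begin_internal_xact_cleanup(void) { model_internal_cleanup = true; }
static void model_end_internal_xact_cleanup(void)   { model_internal_cleanup = false; }

/* Top-level end-of-transaction hook: early-returns while the flag is set. */
static void
model_at_eoxact(bool is_commit)
{
    if (model_internal_cleanup)
        return;
    if (!is_commit)
        model_xact_rollback++;
}

/* Subtransaction end hook: never touches the rollback counter. */
static void
model_at_eosubxact(void)
{
}

/* Cleanup abort issued after one decoded transaction. */
static void
model_decode_cleanup(bool using_subtxn, bool patched)
{
    if (using_subtxn)
    {
        model_at_eosubxact();   /* SQL-function path: subtransaction abort */
        return;
    }
    if (patched)
        model_begin_internal_xact_cleanup();
    model_at_eoxact(false);     /* walsender path: top-level abort */
    if (patched)
        model_end_internal_xact_cleanup();
}

/* Returns the counter delta from baseline after n decoded transactions. */
static int
model_rollback_delta(int n, bool using_subtxn, bool patched)
{
    int baseline = model_xact_rollback;

    for (int i = 0; i < n; i++)
        model_decode_cleanup(using_subtxn, patched);
    return model_xact_rollback - baseline;
}

/* Usage: mirrors the 5-INSERT scenario from the TAP test. */
static void
model_demo(void)
{
    assert(model_rollback_delta(5, false, false) == 5); /* walsender, no bracket */
    assert(model_rollback_delta(5, false, true) == 0);  /* walsender, bracketed */
    assert(model_rollback_delta(5, true, false) == 0);  /* subtransaction path */
}
```

Measuring a delta from baseline, as the model does, avoids depending on the counter starting at zero, which matches the reason the TAP test compares `$final - $baseline` instead of asserting a literal `0`.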