Skip to content

Fix: re-open connection between migration-retry attempts (PostgresqlMigrator)#299

Merged
jeremydmiller merged 1 commit into
masterfrom
fix/migration-retry-reopens-connection
May 28, 2026
Merged

Fix: re-open connection between migration-retry attempts (PostgresqlMigrator)#299
jeremydmiller merged 1 commit into
masterfrom
fix/migration-retry-reopens-connection

Conversation

@jeremydmiller
Copy link
Copy Markdown
Member

What

PostgresqlMigrator.executeWithConcurrencyRetryAsync (added in #294 / closes #293) re-invokes cmd.ExecuteNonQueryAsync on the same DbCommand after a transient PostgresException, but doesn't account for Npgsql moving the underlying connection to Closed or Broken when the server-side error has aborted the session — typically 40P01 deadlock_detected and XX000 \"tuple concurrently updated\".

The next call then throws InvalidOperationException(\"Connection is not open\"), which isn't a PostgresException — so it slips past the catch filter and surfaces to the caller as a hard migration failure, defeating the whole point of the retry.

Why

Recurring intermittent failure across JasperFx/marten CI runs:

EventSourcingTests.end_to_end_event_capture_and_fetching_the_stream.
query_before_saving(tenancyStyle: Conjoined) [FAIL]
  Marten.Exceptions.MartenSchemaException : DDL Execution for 'All Configured Changes' Failed!
  ---- System.InvalidOperationException : Connection is not open
       at Npgsql.NpgsqlCommand.CheckAndGetConnection()
       at Weasel.Postgresql.PostgresqlMigrator.executeWithConcurrencyRetryAsync(DbCommand cmd, CancellationToken ct)
         in /_/src/Weasel.Postgresql/PostgresqlMigrator.cs:line 179

Hit on PRs JasperFx/marten#4576, #4578, #4582, #4584 — all the same stack frame.

Fix

After the backoff delay, call EnsureConnectionOpenAsync:

internal static async Task EnsureConnectionOpenAsync(DbCommand cmd, CancellationToken ct)
{
    if (cmd.Connection is null) return;
    if (cmd.Connection.State == ConnectionState.Open) return;

    // A Broken connection cannot be reopened without a Close first.
    if (cmd.Connection.State == ConnectionState.Broken)
    {
        await cmd.Connection.CloseAsync().ConfigureAwait(false);
    }

    await cmd.Connection.OpenAsync(ct).ConfigureAwait(false);
}

Pulled out as an internal static helper so the reopen rules are unit-testable against a fake DbCommand / DbConnection.

Tests

4 new unit tests in PostgresqlMigratorConcurrencyRetryTests:

Test Pins
ensure_connection_open_is_a_noop_when_already_open No call to Open/Close
ensure_connection_open_reopens_a_closed_connection Open called once, no Close
ensure_connection_open_closes_then_reopens_a_broken_connection Close then Open (in that order — OpenAsync on Broken throws)
ensure_connection_open_does_nothing_when_command_has_no_connection Defensive guard, doesn't throw

The retry loop itself races on the catalog and can't be exercised deterministically (called out in the existing test class header comment); the reopen helper is a pure function over (DbCommand, ConnectionState) and is fully testable.

Weasel.Postgresql.Tests.PostgresqlMigratorConcurrencyRetryTests: 16/16 passing locally (12 original + 4 new) on both net9.0 and net10.0.

🤖 Generated with Claude Code

…igrator)

The bounded retry in PostgresqlMigrator.executeWithConcurrencyRetryAsync
(#293 / #294) re-invokes cmd.ExecuteNonQueryAsync on the same DbCommand
after a transient PostgresException, but doesn't account for Npgsql
moving the underlying connection to Closed or Broken when the
server-side error has aborted the session (40P01 deadlock_detected and
XX000 "tuple concurrently updated" in particular). The retry then
throws InvalidOperationException("Connection is not open") — which
isn't a PostgresException, so it slips past the catch filter and
surfaces to the caller as a hard migration failure, defeating the
whole point of the retry.

Repro: recurring intermittent failure on
EventSourcingTests.end_to_end_event_capture_and_fetching_the_stream.
query_before_saving(tenancyStyle: Conjoined) in JasperFx/marten PRs
#4576, #4578, #4582, #4584 — all hit this exact path.

Fix: after the backoff delay, call EnsureConnectionOpenAsync — a small
internal helper that puts the connection back into Open state before
the next ExecuteNonQueryAsync attempt. Broken connections require a
Close before OpenAsync (OpenAsync on Broken throws).

Tests: 4 new unit tests in PostgresqlMigratorConcurrencyRetryTests
exercise the reopen rules deterministically against a fake DbCommand /
DbConnection — Open is a no-op, Closed reopens, Broken closes-then-reopens,
null Connection short-circuits. The retry loop itself races on the
catalog and can't be exercised deterministically, but the reopen
helper is a pure function over (DbCommand, ConnectionState) and is
fully testable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jeremydmiller jeremydmiller merged commit 4244d56 into master May 28, 2026
21 checks passed
@jeremydmiller jeremydmiller deleted the fix/migration-retry-reopens-connection branch May 28, 2026 21:05
jeremydmiller added a commit that referenced this pull request May 28, 2026
Picks up the PostgresqlMigrator connection-reopen fix from #299 — the
bounded retry in executeWithConcurrencyRetryAsync now reopens a Closed
or Broken connection before re-invoking the command, addressing the
recurring "Connection is not open" failure that surfaced on
JasperFx/marten PRs #4576, #4578, #4582, #4584.
jeremydmiller added a commit to JasperFx/marten that referenced this pull request May 28, 2026
… fix

Weasel 9.0.2 (JasperFx/weasel#299) fixes
PostgresqlMigrator.executeWithConcurrencyRetryAsync so it reopens a
Closed/Broken connection before the retry attempt. That eliminates
the recurring "Connection is not open" failure on the conjoined
`EventSourcingTests.end_to_end_event_capture_and_fetching_the_stream.
query_before_saving` test that hit this PR + #4576 + #4578 + #4582.

Bumps Weasel.Postgresql + Weasel.EntityFrameworkCore 9.0.1 → 9.0.2 in
Directory.Packages.props (CPM). Marten.MemoryPack.Tests still 8/8
locally on top of the new Weasel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jeremydmiller added a commit to JasperFx/marten that referenced this pull request May 28, 2026
#4584)

* #4515 Phase 2: binary event serialization on Quick + BulkEventAppender

#4578 shipped the foundation with `Rich` mode only; the Quick paths
(QuickWithServerTimestamps default + Quick + bulk COPY) all guarded
against binary events at store-build time. This lifts that constraint —
binary serialization now works on every EventAppendMode and through the
BulkEventAppender.

## Wire-format change: `mt_quick_append_events` grows a `bdatas bytea[]` param

The PostgreSQL function used by both Quick variants now accepts a
parallel `bdatas bytea[]` parameter right after `bodies jsonb[]`. The
INSERT writes `bdatas[index]` into `mt_events.bdata`. For JSON events
the array slot is NULL; for binary events `bodies[index]` is the `{}`
placeholder and `bdatas[index]` carries the real payload. Same
on-disk row shape as the Rich path: `bdata IS NULL` remains the
discriminator that the existing read path keys off.

Weasel's standard function-diff migration handles the signature
change as DROP + CREATE on existing installations; existing JSON rows
are untouched.

## Call-site dispatch — same shape as Rich

`PostgresEventStoreDialect.BuildQuickDescriptor` and
`BuildQuickWithServerTimestampsDescriptor` install the same
`serializeEventData` / `serializeEventBdata` closures the Rich
descriptor uses (look up `EventMapping`, branch on `IsBinary`).
`QuickAppendEventsOperationBase.writeBasicParameters` now accepts an
optional `Func<IEvent, byte[]?> serializeEventBdata` and binds the
parallel `bdatas bytea[]` array.

## BulkEventAppender — bdata in the COPY column list

`buildEventColumns()` adds `bdata` right after `data`; `writeEventRow`
looks up the EventMapping per event and writes either the binary
payload (for `[BinaryEvent]` types) or NULL (for JSON). The COPY
format already supports NULL values per column, so no schema
relaxation is needed.

## Removed: AssertNoBinaryEventsForQuickMode

The Phase 1 guard in `PostgresEventStoreDialect` that threw at
store-build time if a binary event type was registered with a
non-Rich AppendMode is gone — no longer needed.

## Tests

Three new tests in `QuickModeBinaryEventTests` (separate fixture so
each test can dial in its own AppendMode):

- `quick_with_server_timestamps_round_trips_binary_events` — mixed
  binary + JSON stream on the default mode, round-trip via the PG
  function.
- `quick_mode_round_trips_binary_events` — explicit `Quick` mode.
- `quick_mode_binary_events_land_in_bdata_column` — on-disk shape
  verification: binary rows have `data = '{}'` + `bdata != NULL`;
  JSON rows have `data = real JSON` + `bdata = NULL`.

Regression checks:
- Full Marten.MemoryPack.Tests suite: 8/8 ✅
- EventSourcingTests.end_to_end_event_capture_and_fetching: 83/83 ✅
- DaemonTests.Bug_3059_double_application: 1/1 ✅ (re-running the
  test that flushed out the column-count bug in PR #4578's first CI run)

## Docs

`docs/events/binary-serialization.md` updated:
- Removed the "EventAppendMode.Rich only" + "No bulk appender support"
  constraints from the Constraints section.
- Added a new "Append modes" section explaining the feature works
  across all three modes + BulkEventAppender.
- Quick-start example no longer forces `AppendMode = Rich`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Consume Weasel 9.0.2 — picks up the migration-retry connection-reopen fix

Weasel 9.0.2 (JasperFx/weasel#299) fixes
PostgresqlMigrator.executeWithConcurrencyRetryAsync so it reopens a
Closed/Broken connection before the retry attempt. That eliminates
the recurring "Connection is not open" failure on the conjoined
`EventSourcingTests.end_to_end_event_capture_and_fetching_the_stream.
query_before_saving` test that hit this PR + #4576 + #4578 + #4582.

Bumps Weasel.Postgresql + Weasel.EntityFrameworkCore 9.0.1 → 9.0.2 in
Directory.Packages.props (CPM). Marten.MemoryPack.Tests still 8/8
locally on top of the new Weasel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Document the versioned-event-types pattern for binary schema evolution

Closes #4579 with a docs-only answer rather than building a binary-side
upcaster framework. The JSON upcasters
(Marten.Services.Json.Transformations) operate on the JSON wire form and
don't generalize to a byte[] payload; designing a typed transform shape
for binary events is non-trivial and the use case can be addressed
end-to-end today by leaning on Marten's existing per-event-type
registry.

The recommendation: introduce a new event type for each schema change
(e.g. TripStarted -> TripStartedV2), have the aggregate handle both
versions, and let the coexistence design carry old rows + new rows on
the same stream without migration. The only caveat is that MemoryPack's
in-place backward-compatible field evolution works for additive-only
changes too, but stops at the serializer's tolerance rules (renames,
type changes, splits) — versioned event types work for every shape of
change and stay explicit.

Replaces the "No upcaster support" constraint section with a "Schema
evolution — use versioned event types" section that gives the
recommended pattern with code samples + a sub-section on the
"why-not-in-place" tradeoff + a note on mixing binary + JSON.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Migration DDL still races under concurrent EnsureStorageExists: XX000 "tuple concurrently updated" (follow-up to #282)

1 participant