Fix: re-open connection between migration-retry attempts (PostgresqlMigrator)#299
Merged
Merged
Conversation
…igrator) The bounded retry in PostgresqlMigrator.executeWithConcurrencyRetryAsync (#293 / #294) re-invokes cmd.ExecuteNonQueryAsync on the same DbCommand after a transient PostgresException, but doesn't account for Npgsql moving the underlying connection to Closed or Broken when the server-side error has aborted the session (40P01 deadlock_detected and XX000 "tuple concurrently updated" in particular). The retry then throws InvalidOperationException("Connection is not open") — which isn't a PostgresException, so it slips past the catch filter and surfaces to the caller as a hard migration failure, defeating the whole point of the retry. Repro: recurring intermittent failure on EventSourcingTests.end_to_end_event_capture_and_fetching_the_stream. query_before_saving(tenancyStyle: Conjoined) in JasperFx/marten PRs #4576, #4578, #4582, #4584 — all hit this exact path. Fix: after the backoff delay, call EnsureConnectionOpenAsync — a small internal helper that puts the connection back into Open state before the next ExecuteNonQueryAsync attempt. Broken connections require a Close before OpenAsync (OpenAsync on Broken throws). Tests: 4 new unit tests in PostgresqlMigratorConcurrencyRetryTests exercise the reopen rules deterministically against a fake DbCommand / DbConnection — Open is a no-op, Closed reopens, Broken closes-then-reopens, null Connection short-circuits. The retry loop itself races on the catalog and can't be exercised deterministically, but the reopen helper is a pure function over (DbCommand, ConnectionState) and is fully testable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jeremydmiller
added a commit
that referenced
this pull request
May 28, 2026
Picks up the PostgresqlMigrator connection-reopen fix from #299 — the bounded retry in executeWithConcurrencyRetryAsync now reopens a Closed or Broken connection before re-invoking the command, addressing the recurring "Connection is not open" failure that surfaced on JasperFx/marten PRs #4576, #4578, #4582, #4584.
jeremydmiller
added a commit
to JasperFx/marten
that referenced
this pull request
May 28, 2026
… fix Weasel 9.0.2 (JasperFx/weasel#299) fixes PostgresqlMigrator.executeWithConcurrencyRetryAsync so it reopens a Closed/Broken connection before the retry attempt. That eliminates the recurring "Connection is not open" failure on the conjoined `EventSourcingTests.end_to_end_event_capture_and_fetching_the_stream. query_before_saving` test that hit this PR + #4576 + #4578 + #4582. Bumps Weasel.Postgresql + Weasel.EntityFrameworkCore 9.0.1 → 9.0.2 in Directory.Packages.props (CPM). Marten.MemoryPack.Tests still 8/8 locally on top of the new Weasel. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jeremydmiller
added a commit
to JasperFx/marten
that referenced
this pull request
May 28, 2026
#4584) * #4515 Phase 2: binary event serialization on Quick + BulkEventAppender #4578 shipped the foundation with `Rich` mode only; the Quick paths (QuickWithServerTimestamps default + Quick + bulk COPY) all guarded against binary events at store-build time. This lifts that constraint — binary serialization now works on every EventAppendMode and through the BulkEventAppender. ## Wire-format change: `mt_quick_append_events` grows a `bdatas bytea[]` param The PostgreSQL function used by both Quick variants now accepts a parallel `bdatas bytea[]` parameter right after `bodies jsonb[]`. The INSERT writes `bdatas[index]` into `mt_events.bdata`. For JSON events the array slot is NULL; for binary events `bodies[index]` is the `{}` placeholder and `bdatas[index]` carries the real payload. Same on-disk row shape as the Rich path: `bdata IS NULL` remains the discriminator that the existing read path keys off. Weasel's standard function-diff migration handles the signature change as DROP + CREATE on existing installations; existing JSON rows are untouched. ## Call-site dispatch — same shape as Rich `PostgresEventStoreDialect.BuildQuickDescriptor` and `BuildQuickWithServerTimestampsDescriptor` install the same `serializeEventData` / `serializeEventBdata` closures the Rich descriptor uses (look up `EventMapping`, branch on `IsBinary`). `QuickAppendEventsOperationBase.writeBasicParameters` now accepts an optional `Func<IEvent, byte[]?> serializeEventBdata` and binds the parallel `bdatas bytea[]` array. ## BulkEventAppender — bdata in the COPY column list `buildEventColumns()` adds `bdata` right after `data`; `writeEventRow` looks up the EventMapping per event and writes either the binary payload (for `[BinaryEvent]` types) or NULL (for JSON). The COPY format already supports NULL values per column, so no schema relaxation is needed. ## Removed: AssertNoBinaryEventsForQuickMode The Phase 1 guard in `PostgresEventStoreDialect` that threw at store-build time if a binary event type was registered with a non-Rich AppendMode is gone — no longer needed. ## Tests Three new tests in `QuickModeBinaryEventTests` (separate fixture so each test can dial in its own AppendMode): - `quick_with_server_timestamps_round_trips_binary_events` — mixed binary + JSON stream on the default mode, round-trip via the PG function. - `quick_mode_round_trips_binary_events` — explicit `Quick` mode. - `quick_mode_binary_events_land_in_bdata_column` — on-disk shape verification: binary rows have `data = '{}'` + `bdata != NULL`; JSON rows have `data = real JSON` + `bdata = NULL`. Regression checks: - Full Marten.MemoryPack.Tests suite: 8/8 ✅ - EventSourcingTests.end_to_end_event_capture_and_fetching: 83/83 ✅ - DaemonTests.Bug_3059_double_application: 1/1 ✅ (re-running the test that flushed out the column-count bug in PR #4578's first CI run) ## Docs `docs/events/binary-serialization.md` updated: - Removed the "EventAppendMode.Rich only" + "No bulk appender support" constraints from the Constraints section. - Added a new "Append modes" section explaining the feature works across all three modes + BulkEventAppender. - Quick-start example no longer forces `AppendMode = Rich`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Consume Weasel 9.0.2 — picks up the migration-retry connection-reopen fix Weasel 9.0.2 (JasperFx/weasel#299) fixes PostgresqlMigrator.executeWithConcurrencyRetryAsync so it reopens a Closed/Broken connection before the retry attempt. That eliminates the recurring "Connection is not open" failure on the conjoined `EventSourcingTests.end_to_end_event_capture_and_fetching_the_stream. query_before_saving` test that hit this PR + #4576 + #4578 + #4582. Bumps Weasel.Postgresql + Weasel.EntityFrameworkCore 9.0.1 → 9.0.2 in Directory.Packages.props (CPM). Marten.MemoryPack.Tests still 8/8 locally on top of the new Weasel. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Document the versioned-event-types pattern for binary schema evolution Closes #4579 with a docs-only answer rather than building a binary-side upcaster framework. The JSON upcasters (Marten.Services.Json.Transformations) operate on the JSON wire form and don't generalize to a byte[] payload; designing a typed transform shape for binary events is non-trivial and the use case can be addressed end-to-end today by leaning on Marten's existing per-event-type registry. The recommendation: introduce a new event type for each schema change (e.g. TripStarted -> TripStartedV2), have the aggregate handle both versions, and let the coexistence design carry old rows + new rows on the same stream without migration. The only caveat is that MemoryPack's in-place backward-compatible field evolution works for additive-only changes too, but stops at the serializer's tolerance rules (renames, type changes, splits) — versioned event types work for every shape of change and stay explicit. Replaces the "No upcaster support" constraint section with a "Schema evolution — use versioned event types" section that gives the recommended pattern with code samples + a sub-section on the "why-not-in-place" tradeoff + a note on mixing binary + JSON. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
PostgresqlMigrator.executeWithConcurrencyRetryAsync(added in #294 / closes #293) re-invokescmd.ExecuteNonQueryAsyncon the sameDbCommandafter a transientPostgresException, but doesn't account for Npgsql moving the underlying connection toClosedorBrokenwhen the server-side error has aborted the session — typically40P01 deadlock_detectedandXX000 \"tuple concurrently updated\".The next call then throws
InvalidOperationException(\"Connection is not open\"), which isn't aPostgresException— so it slips past the catch filter and surfaces to the caller as a hard migration failure, defeating the whole point of the retry.Why
Recurring intermittent failure across JasperFx/marten CI runs:
Hit on PRs JasperFx/marten#4576, #4578, #4582, #4584 — all the same stack frame.
Fix
After the backoff delay, call
EnsureConnectionOpenAsync:Pulled out as an
internal statichelper so the reopen rules are unit-testable against a fakeDbCommand/DbConnection.Tests
4 new unit tests in
PostgresqlMigratorConcurrencyRetryTests:ensure_connection_open_is_a_noop_when_already_openOpen/Closeensure_connection_open_reopens_a_closed_connectionOpencalled once, noCloseensure_connection_open_closes_then_reopens_a_broken_connectionClosethenOpen(in that order —OpenAsynconBrokenthrows)ensure_connection_open_does_nothing_when_command_has_no_connectionThe retry loop itself races on the catalog and can't be exercised deterministically (called out in the existing test class header comment); the reopen helper is a pure function over
(DbCommand, ConnectionState)and is fully testable.Weasel.Postgresql.Tests.PostgresqlMigratorConcurrencyRetryTests: 16/16 passing locally (12 original + 4 new) on bothnet9.0andnet10.0.🤖 Generated with Claude Code