Database Utilities for our boxes #4039

Draft
iancooper wants to merge 112 commits into master from database_migration

@iancooper iancooper commented Mar 1, 2026

We had some useful feedback that the two biggest usability issues were the complexity of configuration (covered by a separate ADR) and the management of a box (inbox/outbox).

This PR addresses the management of a box. It derives from lessons from the WebAPI sample and leans into Aspire, because that is the expectation for developers.

Scope

This PR delivers both specifications together, so the bootstrap path is production-ready when the feature first ships:

  • Spec 0023 — Box Database Migration (ADR 0053)
    Core provisioning infrastructure: IAmABoxProvisioner / IAmABoxMigrationRunner, BoxTableState, __BrighterMigrationHistory, advisory locking per backend, fail-fast hosted service, Aspire connection-name overloads.

  • Spec 0027 — Box Schema Versioning and Migrations (ADR 0057)
    Proper version chain (V1..V_latest) for outbox (7 versions) and inbox (2 versions) across MSSQL/Postgres/MySQL/SQLite, plus subset-based version detection so existing pre-DataRef/SpecVersion tables bootstrap correctly. Spanner is fresh-install-only.

Bundling these resolves review finding R1 — without Spec 0027, the Spec 0023 bootstrap path treats any pre-V_latest table as unrecognised and the runner attempts a CREATE TABLE that fails. Spec 0027 also addresses R2 (TOCTOU race on bootstrap insert), R4 (Spanner concurrency), and R5 (payload-mode validator test coverage on non-MSSQL backends).

Breaking Changes

  • IAmARelationalDatabaseConfiguration.SchemaName — a new string? SchemaName { get; } member is added to this public interface. The core Paramore.Brighter package targets netstandard2.0, which does not support default interface members, so this is a source-breaking change for any external code that implements IAmARelationalDatabaseConfiguration. All in-tree implementors are updated; external implementors must add the property (returning null reproduces the previous behaviour).

    ADR 0057 §10 documents the rationale; commit 297ca030f records the explicit decision to accept the abstract member rather than split target frameworks for one property.

iancooper and others added 2 commits March 1, 2026 19:55
Requirements and architectural design for a modular library that creates
and migrates Inbox/Outbox database tables with .NET Aspire integration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@iancooper iancooper self-assigned this Mar 1, 2026
@iancooper iancooper added the 2 - In Progress, feature request, .NET, and V10.X labels Mar 1, 2026
codescene-delta-analysis[bot]

This comment was marked as outdated.

claude Bot commented Mar 1, 2026

PR Review: Box Database Migration — ADR & Specification

This is a well-thought-out design for an important gap in Brighter's usability story. The problem statement is clear, the role-based architecture follows Brighter's Responsibility-Driven Design principles, and the alternatives-considered section is thorough. The concurrency control and bootstrap path for pre-migration installations are the two trickiest aspects — both are addressed.

Here are my findings, ranging from a design bug to minor consistency issues:


Design Bug: internal Registrations Breaks Cross-Package Extension

File: docs/adr/0053-box-database-migration.md (section 3)

The ADR defines BoxProvisioningOptions with an internal list:

public class BoxProvisioningOptions
{
    internal List<Action<IServiceCollection>> Registrations { get; } = [];
}

But then the MSSQL backend package (Paramore.Brighter.BoxProvisioning.MsSql) is shown calling options.Registrations.Add(...) from its extension methods. Because Registrations is internal, it is only visible within the Paramore.Brighter.BoxProvisioning assembly. Extension methods in a separate package (Paramore.Brighter.BoxProvisioning.MsSql) will get a compile error.

Suggested fix: Expose a public registration method while keeping the list internal:

public class BoxProvisioningOptions
{
    private readonly List<Action<IServiceCollection>> _registrations = [];
    internal IReadOnlyList<Action<IServiceCollection>> Registrations => _registrations;

    public void Add(Action<IServiceCollection> registration)
        => _registrations.Add(registration);
}

Then backend extension methods call options.Add(services => { ... }) instead of directly accessing options.Registrations.


Missing Error Handling in BoxProvisioningHostedService

File: docs/adr/0053-box-database-migration.md (section 2)

If ProvisionAsync throws (e.g., database unreachable, migration failed), the exception propagates unhandled through StartAsync and crashes the generic host at startup. While fail-fast on schema errors is often desirable, it should be a documented decision. The current code also logs "Provisioned {BoxType}" before the outcome is known — if the provisioner throws, the success log is never emitted but no failure log appears either.

Consider: Either wrap each ProvisionAsync in try/catch with structured error logging and re-throw, or explicitly document in the ADR that fail-fast on provisioning errors is intentional.
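
The catch, log, and re-throw option could look roughly like the sketch below. It is a minimal illustration rather than the ADR's code: `FailFastProvisioning`, the `logError` callback, and the tuple shape are hypothetical, and `InvalidOperationException` stands in for whatever exception type the library would actually wrap failures in.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class FailFastProvisioning
{
    // Runs each provisioning step in order. On failure it logs which box failed,
    // then re-throws a wrapping exception so host startup still fails fast.
    public static async Task ProvisionAllAsync(
        IEnumerable<(string BoxType, Func<CancellationToken, Task> Provision)> steps,
        Action<string> logError,
        CancellationToken cancellationToken)
    {
        foreach (var (boxType, provision) in steps)
        {
            try
            {
                await provision(cancellationToken);
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                logError($"Provisioning {boxType} failed: {ex.Message}");
                // Stand-in exception type (assumption); the original exception is kept as InnerException.
                throw new InvalidOperationException($"Box provisioning failed for {boxType}", ex);
            }
        }
    }
}
```

Cancellation is deliberately left unwrapped so a host shutdown during startup is not reported as a provisioning failure.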


Invalid MSSQL Syntax in Migration History DDL

File: docs/adr/0053-box-database-migration.md (section 5)

The ADR shows:

CREATE TABLE IF NOT EXISTS [__BrighterMigrationHistory] (

MSSQL does not support IF NOT EXISTS in CREATE TABLE statements. The correct MSSQL idiom uses sys.tables:

IF NOT EXISTS (SELECT 1 FROM sys.tables WHERE name = '__BrighterMigrationHistory' AND schema_id = SCHEMA_ID('dbo'))
BEGIN
    CREATE TABLE [__BrighterMigrationHistory] ( ... )
END

This is a documentation-level issue now but will become a runtime bug when implemented. Worth correcting in the ADR so the implementation has accurate reference material.


Provisioner Ordering Not Enforced

File: docs/adr/0053-box-database-migration.md (section 1 / section 2)

The ADR states the BoxProvisioningHostedService "decides ordering (outbox before inbox if both present)" but the shown implementation iterates IEnumerable<IAmABoxProvisioner> in DI registration order with no ordering logic. If the ordering guarantee matters, enforce it explicitly:

var ordered = _provisioners.OrderBy(p => p.BoxType == BoxType.Outbox ? 0 : 1);

If ordering is deliberately left to DI registration order, the ADR should say so rather than implying the service itself handles ordering.
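
As a self-contained illustration of the explicit ordering, the sketch below assumes a `BoxType` enum with `Outbox` and `Inbox` members and a `BoxType` property on the provisioner, as the one-liner above implies. `FakeProvisioner` and `ProvisionerOrdering` are hypothetical names used only for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed shape: the snippet above implies a BoxType property on each provisioner.
public enum BoxType { Inbox, Outbox }

public interface IAmABoxProvisioner
{
    BoxType BoxType { get; }
}

// Hypothetical stand-in used only to exercise the ordering.
public sealed record FakeProvisioner(BoxType BoxType, string Name) : IAmABoxProvisioner;

public static class ProvisionerOrdering
{
    // Enumerable.OrderBy is a stable sort, so provisioners of the same box type
    // keep their DI registration order relative to each other.
    public static IReadOnlyList<IAmABoxProvisioner> OutboxFirst(
        IEnumerable<IAmABoxProvisioner> provisioners)
        => provisioners.OrderBy(p => p.BoxType == BoxType.Outbox ? 0 : 1).ToList();
}
```

Because the sort is stable, this only promotes outboxes; it does not otherwise disturb the order in which backends were registered.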


ADR Status Is "Accepted" on a Draft PR

The ADR header reads Status: Accepted but the PR is in DRAFT state. The conventional status for an ADR under review is "Proposed" — update to "Accepted" when the PR is merged.


Naming Inconsistency Between Requirements and ADR

File: specs/0023-box_database_migration/requirements.md (FR-3)

The requirements doc names the package Paramore.Brighter.Extensions.Hosting.MsSql, but the ADR uses Paramore.Brighter.BoxProvisioning.MsSql. Both documents should agree before implementation begins. The ADR naming (BoxProvisioning.*) is more consistent with the feature name and Brighter's existing conventions.


Forward-Only Migrations Should Be Explicitly Documented

File: docs/adr/0053-box-database-migration.md (section 2)

IAmABoxMigration has UpScript but no DownScript. Since the property name implies there could be a DownScript, the ADR should explicitly state this is a forward-only design and why (e.g., DDL rollbacks are rarely safe or needed for additive column changes).


MySQL Advisory Lock Timeout Unspecified

File: docs/adr/0053-box-database-migration.md (section 5, Concurrency Control)

The MySQL locking entry shows GET_LOCK('BrighterMigration_{tableName}', timeout) with timeout undefined. In slow environments (cold-starting containers, overloaded databases), a too-short timeout causes spurious failures; too long blocks startup indefinitely. Recommend specifying a default (e.g., 30 seconds) and making it configurable in BoxProvisioningOptions.


What is Done Well

  • Problem statement is precise: The GetExistsQuery inconsistency table across backends is excellent motivation for a unified abstraction.
  • Bootstrap path (section 7): Handling pre-migration installations via DetectCurrentVersionAsync() is the right approach and is well-explained.
  • Concurrency control (section 5): Documenting per-backend locking mechanisms is exactly the right level of detail for an ADR.
  • Alternatives considered: FluentMigrator, EF Core, single-package, and extending existing builders are all reasonably rejected with clear rationale.
  • Backward compatibility: Preserving the existing static builders while the new library calls them internally is a clean approach.
  • UpScript naming: The note about naming UpScript rather than UpSql for future non-SQL backend extensibility is a thoughtful detail.

Summary: The core architecture is sound. The internal Registrations accessor is the only issue that would prevent the design from compiling as written — worth resolving before tasks are created from this ADR. The MSSQL IF NOT EXISTS syntax and the other points are lower priority but worth addressing in the ADR before implementation begins so the implementation has accurate reference material.

Fix seven issues identified during design review of box database migration:

- Fix BoxProvisioningOptions internal Registrations breaking cross-package
  access: expose public Add() method, keep list internal
- Add fail-fast error handling: catch, log diagnostics, wrap in
  ConfigurationException consistent with Brighter conventions
- Fix invalid MSSQL DDL: replace CREATE TABLE IF NOT EXISTS with
  IF NOT EXISTS (SELECT FROM sys.tables) pattern
- Enforce outbox-before-inbox provisioner ordering in hosted service
- Align package naming: requirements FR-3 now uses BoxProvisioning.*
  matching the ADR
- Document forward-only migration design (no DownScript) with rationale
- Specify MySQL GET_LOCK timeout default (30s) and make configurable
  via BoxProvisioningOptions.MigrationLockTimeout

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

claude Bot commented Mar 2, 2026

Follow-up Review: ADR 0053 after Design Review Fixes

All seven fixes from the previous commit have been applied correctly. This follow-up covers remaining issues in the updated document.


1. Design Gap: MigrationLockTimeout Not Wired to the Migration Runner

File: docs/adr/0053-box-database-migration.md (sections 3 and 5)

MigrationLockTimeout is added to BoxProvisioningOptions but the provisioner shown in section 4 does not receive it. The runner is created as new MsSqlBoxMigrationRunner(_configuration) with no timeout parameter, and the extension method registers new MsSqlOutboxProvisioner(configuration) without passing options.MigrationLockTimeout.

The ADR should specify how this value flows to the runner. One approach: backend extension methods capture the timeout and pass it to the provisioner constructor. Without this, MigrationLockTimeout is configurable but silently ignored at runtime.


2. MSSQL Advisory Lock Resource Not Per-Table

File: docs/adr/0053-box-database-migration.md (section 5, Concurrency Control)

MSSQL uses sp_getapplock with Resource='BrighterMigration' (a global string), while MySQL correctly scopes the lock to 'BrighterMigration_{tableName}'. Because sp_getapplock resources are scoped to the current database, this serializes migrations across all box tables in that database rather than just the target table. Since provisioners run sequentially this will not deadlock, but it adds unnecessary contention between unrelated applications that share the database. The MSSQL lock should include the table name for consistency with MySQL.


3. PostgreSQL Advisory Lock Hash Function Not Specified

File: docs/adr/0053-box-database-migration.md (section 5, Concurrency Control)

The table says pg_advisory_lock(hash) where hash is derived from the table name but does not specify the hash function. Without this, two implementers could choose different algorithms and derive different lock keys for the same table, so their locks would never contend and the mutual exclusion would silently fail. Recommend specifying explicitly, e.g. hashtext('BrighterMigration_' || tableName) cast to bigint.


4. Component Table Still Says up/down DDL

File: docs/adr/0053-box-database-migration.md (section 1)

The IAmABoxMigration row in the component overview table says 'Describes a single migration step: version number, description, up/down DDL'. The phrase up/down DDL implies a DownScript exists, which contradicts the forward-only design documented in section 2. Should say 'forward migration DDL' or 'up DDL only'.


5. README Status Is Stale

File: specs/0023-box_database_migration/README.md

The README shows Status: Requirements and marks both checklist items as 'awaiting approval'. The third commit added both .requirements-approved and .design-approved marker files but did not update the README. It should reflect Status: Design (Approved) with both items checked as approved.


What the Fixes Got Right

  • BoxProvisioningOptions pattern (private _registrations, internal IReadOnlyList Registrations, public Add()) correctly solves the cross-assembly accessibility problem.
  • StartAsync ordering (outbox before inbox via OrderBy) is correct; the explanatory comment is valuable.
  • Fail-fast catch/log/ConfigurationException wrapping is consistent with Brighter conventions.
  • MSSQL history table DDL is now correct (sys.tables pattern).
  • MigrationLockTimeout with 30-second default is the right approach for MySQL.
  • Package naming is now consistent between requirements and ADR.

Summary: Items 2-5 are minor. Item 1 (MigrationLockTimeout wiring) is worth resolving in the ADR before tasks are written — an implementer following the ADR as written would produce code where the timeout is configurable but has no effect.

Wire MigrationLockTimeout through provisioner to migration runner,
scope MSSQL advisory lock per-table, specify PostgreSQL hash function,
fix component table description, and update spec README status.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

claude Bot commented Mar 4, 2026

Test comment from review

claude Bot commented Mar 4, 2026

PR Review: Database Utilities for Box Management (ADR + Spec)

This is a well-structured spec PR. The ADR is thorough and the decision to build a dedicated BoxProvisioning library rather than extending existing static builders is sound.


Strengths

  • Clear problem statement - The four gaps (no migrations, no unified abstraction, sample-only orchestration, no Aspire integration) are well-articulated. The table of inconsistent GetExistsQuery signatures is a compelling motivator.
  • Role-Based Architecture naming - Labelling components with RDD roles (Coordinator, Service Provider, Information Holder) is consistent with Brighter design vocabulary.
  • Deliberate forward-only migrations - The rationale for no DownScript is sound for a library-level tool.
  • Bootstrap for pre-migration installs - The synthetic history row approach is pragmatic and avoids breaking existing deployments.
  • Alternatives considered - Rejecting FluentMigrator and EF Core Migrations is well-reasoned; avoids forcing heavy dependencies on consumers.
  • UpScript over UpSql - Smart naming for future non-SQL backend extensibility.

Issues

1. Bug in BoxProvisioningHostedService log statement

The example has one {BoxType} placeholder but passes the value twice. The second argument is unused. Either remove the duplicate or add a second named placeholder.

2. DetectCurrentVersionAsync() bootstrap assumption

The default returns 1, assuming all pre-migration installs are at version 1. If a version-2 table was created manually, the bootstrapper could insert a synthetic v1 history row and then try to apply v2 migrations to a table that already has v2 columns. Recommend documenting the v1 assumption in the interface contract, and considering column-inspection via INFORMATION_SCHEMA.COLUMNS on MSSQL/PostgreSQL to infer a more accurate starting version.

3. PostgreSQL advisory lock hash collision

pg_advisory_lock(hashtext('BrighterMigration_' || tableName)::bigint) - two different table names could theoretically produce the same hash, causing spurious contention. The two-argument form is pg_advisory_lock(int, int), not (bigint, bigint); using it with a Brighter namespace constant as arg1 and the hash as arg2 does not remove collisions between table names, but it keeps Brighter's keys out of the single-bigint key space, which PostgreSQL treats as separate. Worth noting as a known trade-off at minimum.

4. MigrationLockTimeout unit conversion

MigrationLockTimeout is a TimeSpan, but MySQL GET_LOCK expects whole seconds and MSSQL sp_getapplock expects milliseconds as an int. The ADR (Section 5) should document the per-backend unit conversion explicitly to prevent implementation bugs.
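
The conversions described above can be sketched as follows. `LockTimeoutConversion` and its method names are hypothetical; the only facts relied on are that GET_LOCK takes seconds and sp_getapplock takes milliseconds as an int. Rounding up is an assumption made here so that a sub-second `TimeSpan` does not become a zero timeout; negative "wait forever" values are not handled.

```csharp
using System;

public static class LockTimeoutConversion
{
    // MySQL GET_LOCK takes its timeout in seconds; round up so that
    // e.g. 1.5 seconds becomes 2 rather than being truncated to 1.
    public static int ToMySqlSeconds(TimeSpan timeout)
        => checked((int)Math.Ceiling(timeout.TotalSeconds));

    // MSSQL sp_getapplock takes @LockTimeout in milliseconds as an int.
    // The checked conversion throws OverflowException instead of wrapping.
    public static int ToSqlServerMilliseconds(TimeSpan timeout)
        => checked((int)Math.Ceiling(timeout.TotalMilliseconds));
}
```

Keeping both conversions in one place makes the unit mismatch visible in review instead of leaving it scattered across backend packages.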

5. Aspire Hosting extension - insufficient detail

Section 8 mentions WithBrighterOutbox(sqlServer, tableName: "Outbox") but does not say what it does. Does it provision database infrastructure, annotate the resource, or set environment variables? This needs more detail for implementers.

6. Binary vs text payload and migration versioning

binaryMessagePayload produces a structurally different table schema. If a user switches this flag after initial provisioning, version 1 is already marked applied in the history table but the schema does not match the new mode. Recommend documenting this as unsupported post-creation, or adding a schema validation step.

7. UseBoxProvisioning - double-registration

Calling UseBoxProvisioning more than once would register multiple BoxProvisioningHostedService instances, running all provisioners multiple times at startup. A simple registration guard is worth adding.


Minor Nits

  • PostgreSQL outbox binary variant missing IF NOT EXISTS - confirm this is addressed in migration v1 DDL or tracked separately.
  • No tasks.md yet - expected for a DRAFT PR, but this is the next step before implementation.

Summary

The architecture is coherent, trade-offs are well-reasoned, and the modular package structure is the right call. Key items before creating tasks.md:

  1. Fix the duplicate argument in the LogError call.
  2. Clarify DetectCurrentVersionAsync contract and v1 assumption.
  3. Expand the Aspire Hosting extension description in Section 8.
  4. Document TimeSpan-to-backend-unit conversions in Section 5.
  5. Document binary/text payload mode as unsupported to change post-creation, or add a guard.

Looking forward to tasks.md and the implementation!

Fix duplicate log placeholder, document DetectCurrentVersionAsync v1
assumption, add PostgreSQL hash collision trade-off note, document
per-backend TimeSpan unit conversions, expand Aspire hosting extension
description, document binary/text payload mode as unsupported to change
post-creation, and guard against double-registration of hosted service.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

claude Bot commented Mar 4, 2026

PR Review: Database Utilities for Boxes (ADR 53 + Spec 0023)

This is a design-only PR (ADR + requirements spec, no implementation yet), so the review focuses on the architectural decisions and their implications.

Overall Assessment

The design is well-thought-out. The problem statement is precise, the role-based architecture is consistent with Brighter's RDD approach, and the alternatives analysis is thorough. A few areas worth discussing before implementation begins.


Strengths

  • Fail-fast behavior on startup is the right call - an app that cannot provision its box tables should not start.
  • Forward-only migrations is the right safety choice; DDL rollbacks risk data loss.
  • Bootstrap path for pre-migration installations is well-designed and necessary for brownfield adoption.
  • Advisory lock concurrency control is properly per-backend - MySQL GET_LOCK, MSSQL sp_getapplock, PostgreSQL pg_advisory_lock. The table of backend/unit conversions is a nice touch.
  • Reusing existing builder DDL for version-1 migrations avoids DDL drift and is the right call.
  • Modular package structure is consistent with Brighter's existing per-backend pattern.

Issues and Questions

1. PostgreSQL advisory lock has no timeout

The design notes:

PostgreSQL: pg_advisory_lock ... blocks indefinitely

In a Kubernetes rolling deployment, a pod crashing mid-migration releases the session-level lock on connection close, so true indefinite deadlock is not the main concern. The concern is that an indefinitely blocking startup gives operators no signal about what is happening. Consider using pg_try_advisory_lock with a retry loop and the configurable MigrationLockTimeout, matching the MSSQL/MySQL behavior. At minimum, emit a "waiting for migration lock" log message so operators understand why startup is slow.
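
A retry loop of this kind might be shaped like the sketch below. It is backend-agnostic on purpose: `tryAcquire` is a hypothetical delegate that would run something like `SELECT pg_try_advisory_lock(@key)` on the migration connection and return the boolean result. `AdvisoryLockRetry`, `pollInterval`, and the log text are assumptions, not names from the ADR.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class AdvisoryLockRetry
{
    // Polls tryAcquire until it succeeds or the timeout elapses.
    // Returns true if the lock was acquired, false on timeout.
    public static async Task<bool> AcquireAsync(
        Func<CancellationToken, Task<bool>> tryAcquire,
        TimeSpan timeout,
        TimeSpan pollInterval,
        Action<string> log,
        CancellationToken cancellationToken = default)
    {
        var stopwatch = Stopwatch.StartNew();
        var announced = false;

        while (true)
        {
            if (await tryAcquire(cancellationToken)) return true;
            if (stopwatch.Elapsed >= timeout) return false;

            if (!announced)
            {
                // Emitted once, so operators can see why startup is slow.
                log("Waiting for migration lock...");
                announced = true;
            }

            await Task.Delay(pollInterval, cancellationToken);
        }
    }
}
```

The caller decides what a `false` result means; treating it as a startup failure would match the fail-fast behaviour described for the other backends.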

2. DetectCurrentVersionAsync default of 1 is risky for newer fresh installs

The default implementation returns 1, assuming the table was created by the original static builder. But if a user creates a fresh database with a current Brighter version that already includes DataRef/SpecVersion columns in the DDL, the bootstrap path would record version 1 as applied and then try to apply version 2's ALTER TABLE ADD COLUMN - which fails with "column already exists" on MySQL (no IF NOT EXISTS COLUMN support).

The mitigation ("backend-specific provisioners should override to introspect actual column existence") needs to be called out as required for every backend that ships migrations beyond version 1, not optional. The default returning 1 should be a last resort, not the expected path for installs created with newer DDL.
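
One way to infer the starting version from column introspection is sketched below, under the assumption that each schema version only adds columns. `SchemaVersionDetector` is a hypothetical helper; the caller would supply the column names read from the database catalog, and the per-version column lists would come from the real migration chain.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SchemaVersionDetector
{
    // columnsAddedPerVersion[0] holds the columns introduced by version 1,
    // [1] those introduced by version 2, and so on.
    // Returns the highest version N such that every column introduced by
    // versions 1..N exists in the table, or 0 if even version 1 is incomplete.
    public static int Detect(
        IReadOnlyList<IReadOnlyCollection<string>> columnsAddedPerVersion,
        IEnumerable<string> actualColumns)
    {
        // Case-insensitive because backends differ in how they report column names.
        var actual = new HashSet<string>(actualColumns, StringComparer.OrdinalIgnoreCase);
        var detected = 0;

        for (var i = 0; i < columnsAddedPerVersion.Count; i++)
        {
            if (!columnsAddedPerVersion[i].All(actual.Contains)) break;
            detected = i + 1;
        }

        return detected;
    }
}
```

With this shape a table that already has the DataRef and SpecVersion columns is recorded at the later version, so the runner does not attempt to add them again.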

3. __BrighterMigrationHistory PK does not include schema name

The composite PK is (BoxTableName, MigrationVersion). In a multi-tenant deployment where dbo.Outbox and tenant2.Outbox coexist in the same database, migration history rows will collide. Consider (SchemaName, BoxTableName, MigrationVersion) as the PK, with SchemaName defaulting to the database default schema where omitted.

4. MsSqlOutboxProvisioner constructs MsSqlBoxMigrationRunner internally

The ADR says IAmABoxProvisioner and IAmABoxMigrationRunner are interfaces testable via mocks, but the example implementation hard-codes runner construction inside ProvisionAsync:

public async Task ProvisionAsync(CancellationToken cancellationToken = default)
{
    var runner = new MsSqlBoxMigrationRunner(_configuration, _migrationLockTimeout);

The runner should be injected (constructor injection or via a factory) to preserve the testability guarantee stated in the ADR.
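
A constructor-injected variant could look like the following sketch. The `IAmABoxMigrationRunner` shown here is a deliberately simplified stand-in (the ADR's `MigrateAsync` takes more parameters), and `RecordingRunner` is a hypothetical test double added to show why injection helps.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Simplified stand-in for the runner role; not the ADR's full signature.
public interface IAmABoxMigrationRunner
{
    Task MigrateAsync(string tableName, CancellationToken cancellationToken = default);
}

public sealed class MsSqlOutboxProvisioner
{
    private readonly IAmABoxMigrationRunner _runner;
    private readonly string _tableName;

    // The runner arrives through the constructor, so a test can pass a fake
    // instead of a runner that needs a live database.
    public MsSqlOutboxProvisioner(IAmABoxMigrationRunner runner, string tableName)
    {
        _runner = runner;
        _tableName = tableName;
    }

    public Task ProvisionAsync(CancellationToken cancellationToken = default)
        => _runner.MigrateAsync(_tableName, cancellationToken);
}

// Hypothetical test double that records which tables it was asked to migrate.
public sealed class RecordingRunner : IAmABoxMigrationRunner
{
    public List<string> MigratedTables { get; } = new();

    public Task MigrateAsync(string tableName, CancellationToken cancellationToken = default)
    {
        MigratedTables.Add(tableName);
        return Task.CompletedTask;
    }
}
```

A unit test can then assert on `MigratedTables` without opening a connection.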

5. binaryMessagePayload mode mismatch is silent

The provisioner does not validate that the existing table schema matches the configured payload mode.

A misconfigured payload mode risks silent data corruption (binary data stored as text or vice versa) rather than a startup failure. Consider adding a schema introspection step that checks the actual column type against the configured mode and logs a warning or throws on mismatch. This class of misconfiguration can be detected at startup.

6. Connection lifecycle is unspecified in the design

Provisioners receive IAmARelationalDatabaseConfiguration (connection string) but the design does not show how connections are opened and closed inside the migration runner. Is this raw SqlConnection, IAmARelationalDbConnectionProvider, or something else? The implementation tasks should be explicit about the connection strategy, especially for the advisory lock which must be held on the same connection for the duration of migrations.

7. specs/.current-spec pointer change

The PR changes .current-spec from 0021-Error-Examples to 0023-box_database_migration. Is spec 0021 complete? If still active, this could affect spec tooling for that spec.


Minor Notes

  • NFR-2 says "Migration version checks should be lightweight (single query to check current version)", but the runner makes several queries per startup (create history table, query applied versions, conditionally run migrations). "A small number of lightweight queries" would be a more accurate requirement.
  • The Aspire hosting package is named Paramore.Brighter.BoxProvisioning.Aspire.Hosting. As Aspire integration grows beyond BoxProvisioning (e.g. transport provisioning), a broader Paramore.Brighter.Aspire.Hosting may be a better long-term home for all Aspire AppHost extensions - worth considering before the package name is published.
  • The BoxType ordering in BoxProvisioningHostedService uses OrderBy(p => p.BoxType == BoxType.Outbox ? 0 : 1). A comment explaining the rationale ("outbox first: critical path for message production") would help future maintainers.

Summary

The architecture is solid and ready for task breakdown and implementation. The most important issues to address:

  1. PostgreSQL lock timeout - add a timeout/retry or at minimum a "waiting for lock" log message
  2. DetectCurrentVersionAsync correctness - make backend override mandatory for backends with v2+ migrations; ensure bootstrap is safe for fresh installs on newer Brighter builds
  3. Migration history PK - include schema name to support multi-tenant scenarios
  4. Runner injection - inject IAmABoxMigrationRunner via constructor rather than constructing inline, to preserve the stated testability guarantee

- PostgreSQL: use pg_try_advisory_lock with retry loop and timeout
  instead of blocking indefinitely
- DetectCurrentVersionAsync: make backend override mandatory for v2+
  migrations to handle fresh installs with newer DDL
- Migration history PK: add SchemaName column to support multi-tenant
  scenarios with same table name in different schemas
- Runner injection: accept IAmABoxMigrationRunner via constructor
  instead of constructing inline, preserving testability
- Payload mode validation: introspect column type at startup and
  fail-fast on mismatch with configured binaryMessagePayload
- Connection lifecycle: specify single DbConnection strategy with
  advisory lock held for duration of all migrations
- NFR-2: fix wording to reflect actual multi-query startup behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

claude Bot commented Mar 4, 2026

PR Review: ADR-0053 Box Database Migration

This is a well-thought-out design PR with a clear ADR and requirements document. The architecture is solid and addresses a genuine usability gap. Below is detailed feedback.


Overall Assessment

The design is sound. The Responsibility-Driven approach (coordinator + service providers), forward-only migrations, fail-fast hosted service, and advisory locking for concurrency are all good choices. The ADR does an excellent job documenting rejected alternatives.


Design Issues Worth Addressing Before Implementation

1. Design Gap: Who Calls DetectCurrentVersionAsync?

Section 7 (Bootstrap) says the migration runner calls provisioner.DetectCurrentVersionAsync(), but IAmABoxMigrationRunner.MigrateAsync receives only (tableName, schemaName, migrations, cancellationToken) -- it has no reference to the provisioner.

Either:

  • The provisioner calls DetectCurrentVersionAsync() itself inside ProvisionAsync() before delegating to the runner, passing a currentVersion to MigrateAsync; or
  • MigrateAsync receives a Func<CancellationToken, Task<int>> detectVersion delegate; or
  • A new int currentVersion parameter is added to MigrateAsync.

The current ADR leaves this unresolved. Recommend option 1: the provisioner is the right owner since it holds backend-specific schema introspection knowledge, and the runner's responsibility is purely "apply these migrations starting from this version".
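
Under option 1 the runner's job reduces to selecting and applying the migrations above a version it is handed. A minimal sketch of that selection follows; `BoxMigration` as a record and `MigrationPlanner` are hypothetical, and the `Version` and `Description` property names are assumed from the component description (only `UpScript` is named in the ADR).

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical record form of a migration step.
public sealed record BoxMigration(int Version, string Description, string UpScript);

public static class MigrationPlanner
{
    // The provisioner detects currentVersion and passes it in; the runner then
    // applies only the migrations above that version, in ascending order.
    public static IReadOnlyList<BoxMigration> Pending(
        IEnumerable<BoxMigration> all, int currentVersion)
        => all.Where(m => m.Version > currentVersion)
              .OrderBy(m => m.Version)
              .ToList();
}
```

A `currentVersion` of 0 would mean a fresh install, in which case every migration is pending.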

2. MigrationLockTimeout Capture Timing

In the extension methods, lockTimeout is captured when AddMsSqlOutbox() is called:

var lockTimeout = options.MigrationLockTimeout; // captured NOW
options.Add(services => { var runner = new MsSqlBoxMigrationRunner(config, lockTimeout); });

If a caller sets MigrationLockTimeout after calling AddMsSqlOutbox(), their value is silently ignored. This is a developer footgun. Consider capturing it lazily (read from options inside the lambda) or documenting clearly that MigrationLockTimeout must be set before calling Add*Outbox().
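
The difference between the two capture strategies is easy to demonstrate in isolation. In the sketch below `BoxProvisioningOptions` is a cut-down stand-in with only the timeout property (defaulting to the 30 seconds mentioned earlier in the thread), and `TimeoutCapture` is a hypothetical helper.

```csharp
using System;

// Cut-down stand-in for the options type; only the property under discussion.
public sealed class BoxProvisioningOptions
{
    public TimeSpan MigrationLockTimeout { get; set; } = TimeSpan.FromSeconds(30);
}

public static class TimeoutCapture
{
    // Eager: copies the value when the registration is made,
    // so later changes to the options are lost.
    public static Func<TimeSpan> Eager(BoxProvisioningOptions options)
    {
        var lockTimeout = options.MigrationLockTimeout;
        return () => lockTimeout;
    }

    // Lazy: closes over the options object and reads the property
    // only when the factory actually runs.
    public static Func<TimeSpan> Lazy(BoxProvisioningOptions options)
        => () => options.MigrationLockTimeout;
}
```

If the caller sets the timeout to 60 seconds after registration, the eager factory still yields 30 seconds while the lazy one yields 60.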

3. Spanner DDL Cannot Run in Transactions

The ADR states "Spanner transactions provide serializable isolation by default" and implies DDL runs within transactions. However, Cloud Spanner DDL statements cannot be run inside read-write transactions -- they must be submitted via ExecuteDdlAsync (a separate DDL batch operation). The concurrency model for Spanner migrations therefore needs a different design (Spanner-level DDL is inherently serialized; the history table update can use a normal transaction, but the DDL itself cannot). This should be called out explicitly in the ADR rather than leaving it to the implementer to discover.

4. Missing Symmetric Inbox Overloads

The ADR shows AddMsSqlOutbox(connectionName, ...) but only AddMsSqlInbox(configuration). If AddMsSqlOutbox has a connection-name overload for Aspire, AddMsSqlInbox should too. The same applies to all other backends. The ADR should explicitly address this or note it as a follow-up.


Minor Issues

5. PostgreSQL Advisory Lock Hash Collision Scope

The ADR correctly notes that hashtext is 32-bit and collisions are possible, and concludes this causes "spurious contention but no correctness issue." However, pg_advisory_lock is database-scoped -- a collision with an application-level advisory lock could cause unrelated operations to block. The two-argument form pg_advisory_lock(namespaceConstant, hashtext(tableName)) uses a key space separate from the single-bigint form, so it would largely avoid cross-application interference and is only marginally more complex.

6. MSSQL Nested Transaction Semantics

The ADR describes MSSQL as: a "lock transaction" spans the entire run (for sp_getapplock), while each migration also runs "in its own transaction". MSSQL does not support true nested transactions -- BEGIN TRANSACTION inside an outer transaction increments @@TRANCOUNT but the inner COMMIT only decrements the count. A rollback at any level rolls back everything. The ADR should clarify whether per-migration transactions are implemented via savepoints or whether the lock and migration transactions are actually the same transaction.

7. IAmABoxMigration as Interface

IAmABoxMigration is a read-only data contract with three properties. An interface here implies extensibility, but the ADR does not identify a scenario where a custom implementation is needed. A record or sealed record for BoxMigration used directly in the list would be simpler and harder to misuse.

8. Bootstrap Concurrency Not Explicitly Covered by Lock

Section 7 describes the bootstrap path but does not explicitly state it is protected by the advisory lock. Two instances concurrently bootstrapping the same table could each insert synthetic history rows. Explicitly noting "the bootstrap path runs within the advisory lock" would prevent implementers from inadvertently placing the lock acquisition after the exists-check.

9. StopAsync No-op and Cancellation

If the CancellationToken passed to StartAsync is cancelled mid-migration (e.g. the host is shutting down before startup completes), individual migration steps will continue unless they propagate the token to the DbCommand. The ADR should note that DbCommand.ExecuteNonQueryAsync(cancellationToken) must be used throughout.
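
Token propagation through the ADO.NET calls might look like this sketch. `MigrationExecutor` is a hypothetical name and the method ignores transactions and history-table updates; it only shows the token being checked between steps and passed to `DbCommand.ExecuteNonQueryAsync`.

```csharp
using System.Collections.Generic;
using System.Data.Common;
using System.Threading;
using System.Threading.Tasks;

public static class MigrationExecutor
{
    // Applies each script on the given (already open) connection,
    // passing the caller's token to every database call.
    public static async Task ApplyAsync(
        DbConnection connection,
        IEnumerable<string> upScripts,
        CancellationToken cancellationToken)
    {
        foreach (var script in upScripts)
        {
            // Stop between migrations if the host is shutting down.
            cancellationToken.ThrowIfCancellationRequested();

            await using var command = connection.CreateCommand();
            command.CommandText = script;

            // The token-accepting overload lets providers that support
            // cancellation abort an in-flight statement.
            await command.ExecuteNonQueryAsync(cancellationToken);
        }
    }
}
```

Whether an in-flight DDL statement is actually interrupted depends on the provider, so the check between steps is the part that can be relied on.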


Nits

  • MySQL ALTER TABLE ... ADD COLUMN IF NOT EXISTS: this is MariaDB syntax; MySQL itself, including 8.0, does not accept IF NOT EXISTS on ADD COLUMN, so this safety net is not available there. Worth documenting which MySQL and MariaDB versions are supported for migrations.
  • specs/.current-spec: This file is changed from 0021-Error-Examples to 0023-box_database_migration. If spec-0021 implementation is still in progress, this tracking file change could confuse the tooling. Intentional?
  • The PostgreSQL locking cell in the concurrency table (section 5) is notably longer than the other cells -- consider extracting it to a separate paragraph for readability.

Summary

The design is solid and ready for implementation with the following clarifications needed:

| Priority | Issue |
| --- | --- |
| Must address | Bootstrap / DetectCurrentVersionAsync ownership (section 7 design gap) |
| Must address | Spanner DDL transaction model |
| Should address | MigrationLockTimeout capture timing footgun |
| Should address | Symmetric Add*/Inbox(connectionName) overloads |
| Nice to have | MSSQL nested transaction clarification |
| Nice to have | PostgreSQL advisory lock scope discussion |

Great work on the thorough requirements and ADR -- the "Alternatives Considered" section is especially useful for future maintainers.

iancooper and others added 4 commits March 5, 2026 23:05
Clarify DetectCurrentVersionAsync ownership (provisioner calls it, passes
currentVersion to MigrateAsync), fix Spanner DDL transaction model, add
symmetric AddMsSqlInbox connectionName overload, clarify MSSQL single-
transaction semantics, note bootstrap runs within advisory lock, and
require CancellationToken propagation to DbCommand calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ADR 0053 changes:
- Replace DetectCurrentVersionAsync with BoxTableState record for
  unambiguous provisioner-to-runner state passing
- Document Spanner DDL failure window and idempotency requirement
- Add payload mode validation column names and type mappings per backend
- Add prerequisites: SchemaName on interface, Spanner builder fix
- Defer Aspire to ADR 0054

ADR 0054 (new, Proposed):
- Aspire integration with open questions on IConfiguration scope,
  package structure, testing patterns, and API stability

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Aspire integration will be handled separately since the provisioning
tool accepts connection strings directly. Replaced FR-4 (Aspire) with
a requirement to update WebAPI samples to use the new box provisioning
library for inbox/outbox instead of DbMaker. Deleted ADR 0054.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@claude

claude Bot commented Mar 9, 2026

ADR Review: Box Database Migration (ADR 0053 / Spec 0023)

This is a well-researched design document that addresses a real pain point in Brighter. The architecture is clearly reasoned, the alternatives are well-considered, and the Responsibility-Driven Design role table in section 1 is exemplary. The following is a detailed review to harden the design before implementation begins.

Strengths

  • Bootstrap path is explicitly modeled as a first-class BoxTableState record, eliminating the ambiguity of implicit null or sentinel version numbers.
  • Forward-only migrations rationale is well-articulated. The call-out that the Up prefix is retained for convention alignment (not to imply a Down) avoids future confusion.
  • Concurrency section is thorough: backend-specific locking mechanisms, TimeSpan-to-backend-unit conversion table, and the Spanner DDL/history atomicity failure window are all documented with mitigations.
  • The UpScript naming rationale (avoiding UpSql for future non-SQL backend extensibility) is a nice forward-looking detail.

Issues to Address Before Implementation

1. Bootstrap Race Condition (Medium)

The provisioner detects BoxTableState before the advisory lock is acquired (the lock lives in the migration runner, not the provisioner). Two concurrent instances could both detect BoxTableState(TableExists: true, HistoryExists: false, CurrentVersion: 1) before either holds the lock. The runner for the second instance would then receive stale state and attempt to insert synthetic history rows that the first instance already inserted, causing a primary-key violation on (SchemaName, BoxTableName, MigrationVersion).

The ADR states "The entire bootstrap path runs within the advisory lock", but does not show the runner re-verifying HistoryExists after lock acquisition.

Recommendation: Either (a) move DetectTableStateAsync to occur after the advisory lock is acquired inside the runner, or (b) use INSERT IF NOT EXISTS / MERGE semantics for synthetic history rows so duplicate inserts are idempotent.
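
Both options can be illustrated with an in-memory stand-in. Everything in this sketch is hypothetical: a SemaphoreSlim plays the role of the backend advisory lock and a HashSet keyed on (schema, table, version) plays the role of the history table's primary key.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// In-memory stand-in for the history table; all names are hypothetical.
public sealed class FakeHistoryStore
{
    private readonly HashSet<(string Schema, string Table, int Version)> _rows = new();
    private readonly SemaphoreSlim _advisoryLock = new(1, 1); // stands in for the backend lock

    // Read this only after all bootstrap calls have completed.
    public int RowCount => _rows.Count;

    public async Task<bool> BootstrapAsync(string schema, string table, int detectedVersion)
    {
        await _advisoryLock.WaitAsync();
        try
        {
            // Option (a): history state is read after the lock is held, so it cannot be stale.
            if (_rows.Contains((schema, table, detectedVersion)))
                return false; // another instance bootstrapped first

            // Option (b): HashSet.Add ignores a duplicate key, like an INSERT that
            // only writes when the row is missing.
            for (var version = 1; version <= detectedVersion; version++)
                _rows.Add((schema, table, version));
            return true;
        }
        finally
        {
            _advisoryLock.Release();
        }
    }
}
```

With two concurrent callers, only one performs the bootstrap and the row count matches the detected version, which is the behaviour the recommendation is after.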


2. MSSQL Single-Transaction Design (Medium)

Section 5 describes using a single transaction for both the advisory lock and all migration DDL/history inserts. While MSSQL supports transactional DDL (unlike MySQL), there are practical risks: with many migrations, holding one open transaction throughout all steps may cause lock escalation or timeout issues in high-concurrency schemas. There is also ambiguity about whether partial success is desired.

Recommendation: Clarify whether the design intends one transaction per migration (each UpScript + history insert as one unit) vs. one transaction for all migrations. If the latter, document the rationale explicitly.


3. SchemaName Interface Gap: Prerequisite Ordering (Minor)

The ADR correctly identifies in section 10 that IAmARelationalDatabaseConfiguration must gain a SchemaName property. However, this is a breaking change to a public interface — all implementors, including third-party ones, must add the member.

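Recommendation: In the tasks document, mark this as the very first task and note it requires either a SemVer major bump or a default interface member (string? SchemaName => null;) to remain backward-compatible. The second option is only available where the target framework supports default interface members; the PR description notes that the core package targets netstandard2.0, which does not.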


4. Existing Bug in SchemaCreation.cs (Informational)

There is a parameter-order mismatch in samples/WebAPI/WebAPI_Common/DbMaker/SchemaCreation.cs. It calls SqlInboxBuilder.GetExistsQuery(tableSchema, INBOX_TABLE_NAME) but the actual method signature is GetExistsQuery(string inboxTableName, string schemaName). The sample currently swaps schema and table name when checking inbox existence.

This is a pre-existing bug, not introduced by this PR. Worth fixing as part of the FR-4 sample update task.


5. BoxProvisioningOptions API Underspecified (Minor)

The ADR shows UseBoxProvisioning(options => { options.AddMsSqlOutbox(config); }) but does not define:

  • The full signature of AddMsSqlOutbox(config) vs. AddMsSqlOutbox() (the Aspire variant)
  • Whether BoxProvisioningOptions holds IAmABoxProvisioner instances directly or factories
  • How MigrationLockTimeout is configured (referenced in the concurrency table but not shown on any options class)

Recommendation: Add a BoxProvisioningOptions class definition to the ADR to make the registration API concrete before implementation starts.


6. BoxMigration Record Undefined (Minor)

BoxMigration is listed in the package structure as a "Simple record implementation" and is used in migration factory examples, but its definition is never shown. Worth adding to the ADR since it is the canonical concrete type implementors will use.


7. Payload Mode Validation Adds Startup Coupling (Minor)

Section 6 describes introspecting the actual column type at startup to validate binaryMessagePayload matches the deployed schema. This adds complexity (an extra DB round-trip, per-backend column-type mapping with ambiguous MySQL entries: LONGTEXT / TEXT) and creates a failure mode if INFORMATION_SCHEMA is slow or blocked.

Recommendation: Consider whether this validation is strictly necessary given that mismatches would surface at runtime via serialisation failures. If retained, the MySQL column type match logic needs to handle both LONGTEXT and TEXT explicitly.


Nits

  • The BoxType enum uses { Inbox, Outbox } without explicit integer values. Consider { Outbox = 0, Inbox = 1 } to make the ordering in OrderBy(p => p.BoxType == BoxType.Outbox ? 0 : 1) less fragile if the enum ever gains new members.
  • ADR title is "Database Migration" but the PR title is "Database Utilities" — worth aligning.
  • Section 8 states Aspire integration is out of scope, but the PR body says "It also leans into Aspire." The ADR and PR description should align on this scope boundary.
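
For the first nit above, a small sketch of ordering by explicit enum value. The enum mirrors the ADR's BoxType; the ProvisionerOrdering helper is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Explicit values, as the nit suggests.
public enum BoxType { Outbox = 0, Inbox = 1 }

// Hypothetical helper: not part of the ADR.
public static class ProvisionerOrdering
{
    // Sorting on the numeric value means a new member takes its position from
    // its assigned value instead of falling into the "else" arm of a ternary.
    public static IReadOnlyList<BoxType> InProvisioningOrder(IEnumerable<BoxType> boxes) =>
        boxes.OrderBy(box => (int)box).ToList();
}
```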

Summary

The design is architecturally sound and well-suited for the Brighter ecosystem. The main item to resolve before implementation is the bootstrap race condition (Issue 1). The spec/ADR workflow has been correctly followed. Looking forward to seeing the tasks document and MSSQL backend implementation.

Reviewed by Claude Sonnet 4.6

25 tasks across 8 phases covering prerequisites, core abstractions,
MSSQL/PostgreSQL/MySQL/SQLite/Spanner backends, and sample updates.
All behavioral tasks use TDD/test-first format with approval gates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

iancooper and others added 6 commits May 3, 2026 20:58
SQLite analogue of MSSQL Task 1.8a (09907f7) and Postgres Task 2.7a
(1d522c2). Per ADR §5a SQLite is whole-chain transactional like
MSSQL/Postgres — the BEGIN IMMEDIATE transaction wraps
EnsureHistoryTableAsync + bootstrap path + per-migration ALTER + history
inserts, so a mid-chain failure rolls back everything atomically. Task
4.4's runner already implements this; this commit ships a regression
guard proving the contract holds.

New TestDoubles/BrokenMigrationFactory.cs clones a real migration list,
substitutes one entry's UpScript with arbitrary failing SQL while
preserving Version / Description / LogicalColumns / SourceReference /
IdempotencyCheckSql. Identical to the MSSQL / Postgres / MySQL
counterparts (purely operates on BoxMigration records, no provider
deps).

Single [Fact] Should_roll_back_all_history_and_ddl_then_succeed_on_retry:

Phase 1 — broken-V6 chain. Seeds V3 outbox + marker, calls runner with
broken-V6 list. SQLite has no RAISE-EXCEPTION-from-DML form (RAISE
only valid inside triggers), so we force a SQL error by referencing a
non-existent table — SqliteException with code 1 (SQLITE_ERROR).
Bootstrap path inserts synthetic V3 + V4 + V5 (DDL + history); V6's
broken UpScript throws. Asserts:
- SqliteException raised
- Zero history rows for the box (synthetic V3 + V4/V5 applied rows
  all rolled back; if the history table itself was created in this
  transaction, that's also rolled back — defended via try/catch
  SqliteException in GetHistoryRowsByVersion)
- V3 column shape preserved: ContentType (V3) present (pre-transactional);
  PartitionKey (V4 ALTER) and Source (V5 ALTER) absent (rolled back)
- Marker row preserved (no DROP/recreate)

Phase 2 — retry with the real list via SqliteOutboxProvisioner. Runner
re-enters bootstrap (no history exists), re-detects V3, stamps synthetic
V3 + applies V4..V7 cleanly. Asserts:
- Exactly OutboxLatest - 3 + 1 = 5 history rows (V3 synthetic + V4..V7
  applied non-synthetic, no duplicates)
- DataRef (V7) and PartitionKey (V4) columns present
- Marker still preserved

GREEN on first run — Task 4.4's runner already wraps the chain in a
single BEGIN IMMEDIATE transaction with a try/catch that rolls back on
any throw. SQLite supports transactional DDL (ALTER TABLE ADD COLUMN
inside a transaction is fully rollback-able), so the V4 / V5 ALTERs
cleanly revert. No production change. Closes ADR §5a verification for
SQLite.

23/23 BoxProvisioning tests green sequentially on net9.0 + net10.0
(was 22/22; +1 Fact).

Also ticks tasks.md Task 4.8a.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e-out)

SQLite analogue of MSSQL Task 1.9 (3109da5), Postgres Task 2.8
(8c2d84c), MySQL Task 3.8 (da46dcc). Closes Phase 4 + AC-6 for
SQLite — verifies that a spec-0023-era prod install (V_latest-shape
outbox table from the live builder + a single V1 history row described
"spec 0023 fresh install") transitions cleanly to OutboxLatest=7
without DDL changes and without duplicate history.

Single [Fact] Should_transition_cleanly_via_normal_path_with_no_ddl_changes_and_no_duplicate_history:
- Builds V_latest-shape outbox via live SqliteOutboxBuilder.GetDDL(
  _tableName, hasBinaryMessagePayload: false) — the exact 22-column
  shape a spec-0023-era prod install lands at.
- Manually creates the history table (matching the runner's schema:
  MigrationVersion / BoxTableName / Description / AppliedAt; PK on
  (BoxTableName, MigrationVersion)) and seeds a single V1 row
  described "spec 0023 fresh install" — modelling the honest "I made
  this table" history that spec 0023 would have stamped.
- Runs SqliteOutboxProvisioner.ProvisionAsync(). Runner sees
  TableExists=true + HistoryExists=true under the BEGIN IMMEDIATE
  transaction, dispatches to RunNormalPathAsync. Normal path walks
  V2..V7; per ADR §6, ApplyOrSkipAsync evaluates each migration's
  IdempotencyCheckSql (pragma_table_info probe) — every probe returns
  >0 because the V_latest builder shape already has the column — so
  UpScript is skipped, history row inserted only.

Asserts:
- Column set unchanged (every V2..V7 ALTER no-op'd via
  IdempotencyCheckSql skip — captured columnsBefore vs columnsAfter)
- Exactly OutboxLatest=7 history rows total
- V1 description preserved verbatim ("spec 0023 fresh install" — the
  IsMigrationAppliedAsync gate in RunNormalPathAsync sees V1 already
  present and skips it entirely, not overwriting the description)
- V2..V7 are normal-path applied (descriptions don't start with
  "bootstrap:" or "fresh install")

GREEN on first run — Task 4.4's runner already covers all three
moving parts (normal path + IsMigrationAppliedAsync gate +
IdempotencyCheckSql skip). No production change. Validates the
"every column already present" arm of ApplyOrSkipAsync that wasn't
exercised directly by Tasks 4.6 / 4.7 (which seed legacy V_k shapes
where SOME columns are missing) — this test ensures the runner
correctly handles the case where ALL ALTERs are no-ops.

96/96 SQLite tests green sequentially on net9.0 + net10.0
(was 95/95; +1 Fact).

Closes Phase 4 (SQLite). All ten phase-4 tasks (4.1 / 4.2 / 4.3 / 4.4
test+impl / 4.5 / 4.6 / 4.7 / 4.8 / 4.8a / 4.9) complete. Spec 0023 R2
TOCTOU finding now resolved across all four relational backends
(MSSQL / Postgres / MySQL / SQLite). ADR §5a whole-chain rollback
verified across all three transactional backends. AC-6 closed for the
final relational backend.

Also ticks tasks.md Task 4.9.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors 6d61d77's pattern (Phase 1 + 2 close-out): sweeps the
"Acceptance checklist" section of tasks.md to mark every AC whose
referenced tasks are now all complete.

Ticked (12 entries — Phase 4 close-out):
- AC-1   outbox V1..V6 → V7    (1.6, 2.6, 3.6, 4.6)
- AC-2   inbox V1 → V2          (1.7, 3.6a, 4.7)
- AC-4   no-op re-run idempotency (existing + 4.9)
- AC-5/AC-18 concurrent bootstrap (1.8, 2.7, 3.7, 4.8)
- AC-6/AC-19 spec-0023-era transition (1.9, 2.8, 3.8, 4.9)
- AC-9   V1..V7 outbox          (1.2, 2.2, 3.2, 4.2)
- AC-10  V1..V2 inbox           (1.3, 2.3, 3.3, 4.3)
- AC-11  housekeeping preserved (drift 1.1, 2.1, 3.1, 4.1)
- AC-12  SourceReference populated (1.2, 1.3, 2.2, 2.3, 3.2, 3.3, 4.2, 4.3)
- AC-16  bootstrap-at-V_k tests (1.6/1.7, 2.6, 3.6/3.6a, 4.6/4.7)
- AC-17  per-backend idempotency tests (existing + 4.9 IdempotencyCheckSql)
- ADR §5a whole-chain rollback (1.8a, 2.7a, 4.8a; MySQL 3.4)

Still open (gated on Phase 5/6/7):
- AC-3   fresh install assertion (needs Task 5.1 for Spanner)
- AC-7   Spanner fresh-only (Tasks 5.1, 5.2)
- AC-8   Spanner with history (Task 5.3)
- AC-14  box_provisioning.md rule (Task 7.1)
- AC-15  fresh-install test (needs Task 5.1)

No production change. Ticking only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… IsMigrationAppliedAsync gate

Closes spec 0023 R4 (Spanner history INSERT was unprotected) and AC-15
(per-backend fresh-install test asserts V_latest + history row).

Per ADR 0057 §6 the Spanner runner is degenerate (no V_k chain). On a
fresh install it now executes the current builder DDL and stamps history
at V_latest under an IsMigrationAppliedAsync gate, with description
"fresh install at V{N}". V_latest is chosen by BoxType: 7 outbox / 2 inbox.

Existing-table paths (TableExists=true) are preserved untouched and remain
the scope of Tasks 5.2 (existing-table-without-history with discriminator
gate) and 5.3 (existing-table-with-history no-op + delete Phase-0.3
compile bridges).

New tests in tests/Paramore.Brighter.Gcp.Tests/Spanner/BoxProvisioning/
When_spanner_fresh_install_runs_it_should_create_table_and_stamp_v_latest_and_skip_duplicate_history_insert.cs:
- SpannerOutboxFreshInstallTests + SpannerInboxFreshInstallTests
- Each fact: provision absent table → assert single history row at
  V_latest with description starting with "fresh install" → second
  ProvisionAsync() → assert no duplicate, description preserved.

Side effects: existing fresh-install tests Should_create_outbox_table /
Should_create_inbox_table (added by Phase 0.3a) flip RED → GREEN. The 2
existing When_spanner_*_provisioner_finds_existing_table_without_history
tests stay RED — pre-existing, slated for Task 5.2.

Spanner BoxProvisioning suite: 4/6 passing on net9.0 + net10.0 against
the Spanner emulator (was 0/6 RED).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tstrap gated by discriminator

Runner now branches on `tableState.HistoryExists` after the table-exists
check: absent history triggers a discriminator-gated bootstrap that queries
information_schema.columns for `HeaderBag` (outbox) / `CommandBody` (inbox);
absent -> throws ConfigurationException "not a Brighter outbox/inbox";
present -> stamps V_latest with the ADR section 6 description
"bootstrap: spanner-assumed-current (no known legacy installations, A-2)",
gated by IsMigrationAppliedAsync for re-run safety.

New helper SpannerBoxDetectionHelpers.DiscriminatorFor(BoxType) returns
the case-sensitive Ordinal column name (Spanner builders use PascalCase).

Side effect: the two pre-existing
When_spanner_*_provisioner_finds_existing_table_without_history tests
flip RED -> GREEN — they always asserted V_latest history rows, which the
new bootstrap path now produces.

10/10 Spanner BoxProvisioning tests green on net9.0 + net10.0
(was 4/6 — 2 RED bootstrap + 4 new RED arms now all green; +4 facts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ows on out-of-sync; delete Phase-0.3 bridges

Runner now has the third branch on tableState { TableExists:true, HistoryExists:true }:
MAX(V) == V_latest -> no-op (clean return); MAX(V) != V_latest -> throws
ConfigurationException "Migration list out of sync for table '...': installed V=X,
expected V=Y. Manual recovery required per ADR 0057 section 6." (covers MAX(V) > V_latest
and the undefined MAX(V) < V_latest path symmetrically — manual recovery either way).
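
The three Spanner branches described across these commits can be sketched as a pure decision function. This is an illustration only: the BoxTableState shape is reduced to the two flags named in the commit messages, SpannerAction is a hypothetical enum, and InvalidOperationException stands in for Brighter's ConfigurationException.

```csharp
using System;

// Reduced shape; the real BoxTableState may carry more members.
public sealed record BoxTableState(bool TableExists, bool HistoryExists);

// Hypothetical enum naming the three outcomes.
public enum SpannerAction { FreshInstall, DiscriminatorGatedBootstrap, NoOp }

public static class SpannerDispatch
{
    // maxInstalledVersion only matters when a history table exists.
    public static SpannerAction Decide(BoxTableState state, int maxInstalledVersion, int latestVersion) =>
        state switch
        {
            { TableExists: false } => SpannerAction.FreshInstall,
            { HistoryExists: false } => SpannerAction.DiscriminatorGatedBootstrap,
            _ when maxInstalledVersion == latestVersion => SpannerAction.NoOp,
            // Both MAX(V) > V_latest and MAX(V) < V_latest end here: manual recovery.
            _ => throw new InvalidOperationException(
                $"Migration list out of sync: installed V={maxInstalledVersion}, expected V={latestVersion}.")
        };
}
```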

Phase-0.3 compile bridges deleted:
- src/Paramore.Brighter.BoxProvisioning.Spanner/SpannerOutboxMigrations.cs
- src/Paramore.Brighter.BoxProvisioning.Spanner/SpannerInboxMigrations.cs

The runner's FreshInstallAsync now calls SpannerOutboxBuilder.GetDDL /
SpannerInboxBuilder.GetDDL directly via boxType dispatch (outbox respects
configuration.BinaryMessagePayload). Both provisioners pass Array.Empty<IAmABoxMigration>()
to MigrateAsync — the IAmABoxMigrationRunner contract is unchanged; the Spanner
runner ignores migrations because it is fresh-only per ADR section 6.

14/14 Spanner BoxProvisioning tests green on net9.0 + net10.0 (was 10/10; +4 facts).
Closes Phase 5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@claude

claude Bot commented May 4, 2026

Code Review — PR #4039 Database Utilities for our boxes

Reviewed commit range as of 2026-05-04. This is a fresh independent pass; items from prior automated reviews are verified against current code rather than taken on faith.


Architecture & Design

The overall design is well-considered. The three-path dispatch (fresh / bootstrap / normal), TOCTOU defeat via under-lock re-read, and discriminator-based identification of Brighter tables all reflect careful thought documented thoroughly in ADR 0053/0057. The MSSQL implementation in particular is production-quality: sp_getapplock-inside-transaction, schema-qualified queries, and the lock-name length guard are all done correctly.


Confirmed Open Issues

These were raised in earlier review passes and remain unaddressed in the current code.

M1 — MySQL GET_LOCK name has no length guard (MySqlBoxMigrationRunner.cs:191)

var lockName = $"BrighterMigration_{tableName}";
// No length check

MySQL limits GET_LOCK names to 64 characters; 5.7 and later reject a longer name with an error rather than truncating it (earlier versions enforced no limit). The BrighterMigration_ prefix is 18 characters, so a table name longer than 46 characters pushes BrighterMigration_{tableName} over the limit, and the failure surfaces as an opaque server error during lock acquisition instead of a clear configuration message. MSSQL already has the analogous guard at lines 193–199 of MsSqlBoxMigrationRunner.cs. Recommend adding:

if (lockName.Length > 64)
    throw new ArgumentException(
        $"GET_LOCK name '{lockName}' exceeds MySQL's 64-char limit. Use a shorter table name.",
        nameof(tableName));

M2 — PostgreSQL pg_advisory_unlock result silently discarded (PostgreSqlBoxMigrationRunner.cs:238)

await command.ExecuteScalarAsync(cancellationToken); // result ignored

pg_advisory_unlock returns false if the calling session does not hold the lock. Discarding this silently swallows a bug where a lock is released that was never acquired (e.g., after a connection recycle). Recommend:

var released = (bool)(await command.ExecuteScalarAsync(cancellationToken))!;
if (!released)
    _logger.LogWarning("pg_advisory_unlock returned false for '{TableName}' — lock was not held", tableName);

M3 — MSSQL EnsureHistoryTableAsync checks sys.tables without schema filter (MsSqlBoxMigrationRunner.cs:231)

IF NOT EXISTS (SELECT 1 FROM sys.tables WHERE name = '__BrighterMigrationHistory')

This matches any schema. If a table named __BrighterMigrationHistory exists in a schema other than dbo, the CREATE TABLE is skipped, but subsequent queries against [__BrighterMigrationHistory] resolve against the session's default schema (typically dbo) and find nothing. The fix is to add a schema filter:

IF NOT EXISTS (
    SELECT 1 FROM sys.tables t
    JOIN sys.schemas s ON t.schema_id = s.schema_id
    WHERE t.name = '__BrighterMigrationHistory' AND s.name = 'dbo'
)

And qualify the CREATE TABLE itself: CREATE TABLE [dbo].[__BrighterMigrationHistory].

M5 — IAmABoxMigration.LogicalColumns is mutable ISet<string> (IAmABoxMigration.cs:36, BoxMigration.cs:29)

The interface documents the read-only-by-convention invariant but still exposes ISet<string>, which allows callers to mutate the column set and corrupt version detection. On netstandard2.0, IReadOnlySet<T> is unavailable, but IReadOnlyCollection<string> is, and IsSubsetOf can be replicated with LINQ or a helper. HashSet<string> already implements IReadOnlyCollection<string>. For a netstandard2.0-compatible fix:

IReadOnlyCollection<string> LogicalColumns { get; }

The DetectCurrentVersionAsync callers can use actualColumns.IsSupersetOf(migration.LogicalColumns) since HashSet<T>.IsSupersetOf accepts any IEnumerable<T>.
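
A minimal sketch of the subset walk over a read-only column collection. MigrationStep is a simplified stand-in for IAmABoxMigration, and the walk assumes the list is already in ascending version order; whether each step lists only its added columns or the cumulative set is an assumption the sketch does not depend on.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for IAmABoxMigration; only the members this sketch needs.
public sealed record MigrationStep(int Version, IReadOnlyCollection<string> LogicalColumns);

public static class VersionDetection
{
    // Returns the highest version whose columns are all present in the table.
    // 0 means not even the first step matches.
    public static int Detect(ISet<string> actualColumns, IReadOnlyList<MigrationStep> ascending)
    {
        var detected = 0;
        foreach (var step in ascending)
        {
            // IsSupersetOf accepts any IEnumerable<string>, so the read-only
            // collection needs no conversion back to a set.
            if (!actualColumns.IsSupersetOf(step.LogicalColumns)) break;
            detected = step.Version;
        }
        return detected;
    }
}
```

Callers can still build each step from a HashSet<string>; they just cannot mutate it through the interface.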


New Findings

N1 — Fresh path has no guard that migrations[0] is actually V1

In all five backends, the fresh path does:

await ExecuteUpScriptAsync(connection, transaction, migrations[0], cancellationToken);
var latest = migrations[migrations.Count - 1];
await InsertHistoryRowAsync(..., latest.Version, $"fresh install at V{latest.Version}", ...);

This assumes migrations[0].Version == 1 and that migrations[0].UpScript creates a V_latest-shape table. If migrations is passed out-of-order or with a missing V1 entry, the wrong DDL runs and the history is stamped at the wrong version — silently creating an inconsistent table. A defensive guard:

if (migrations.Count == 0 || migrations[0].Version != 1)
    throw new ArgumentException("migrations must begin at V1", nameof(migrations));

would catch this at call time rather than at a confusing data consistency failure later.

N2 — No validation that migrations list is ordered by ascending version

RunBootstrapPathAsync and RunNormalPathAsync iterate migrations assuming ascending version order for the migration.Version <= detected/maxVersion short-circuit. DetectCurrentVersionAsync also relies on ascending order for its subset walk. Nothing in MigrateAsync's signature or body validates this. A caller passing an unsorted list could skip migrations or detect the wrong version without any error. Consider either validating at the top of MigrateAsync:

for (var i = 1; i < migrations.Count; i++)
    if (migrations[i].Version <= migrations[i - 1].Version)
        throw new ArgumentException("migrations must be ordered by ascending Version", nameof(migrations));

or enforcing ordering in each backend's provisioner before calling MigrateAsync.

N3 — PostgreSQL history table created without schema qualification

EnsureHistoryTableAsync (PostgreSqlBoxMigrationRunner.cs:249):

CREATE TABLE IF NOT EXISTS "__BrighterMigrationHistory" (...)

Without schema qualification, the table is created in the first schema on the session's search_path (typically public, but not guaranteed). If two provisioners run with different search_path settings they will each create separate history tables and neither will see the other's records. Consider creating explicitly in public (or in the effectiveSchema) and qualifying all history queries consistently.


Minor Observations

  • BoxProvisioningHostedService log messages omit the table name (BoxProvisioningHostedService.cs:34,38). In deployments with multiple outbox/inbox tables the log only says "Provisioning Outbox..." with no table identifier. If IAmABoxProvisioner exposes the table name, include it in the log.

  • BoxMigration is a record with ISet<string> LogicalColumns. A record's generated Equals/GetHashCode compares each property with EqualityComparer<T>.Default, which for a HashSet<string> is reference equality. Two BoxMigration instances with the same version and identical column sets but different HashSet instances will therefore not be equal. This is likely fine in practice (migrations aren't compared for equality) but worth noting if deduplication logic is ever added.

  • MySQL/SQLite/Spanner migration chains are V1-only — appropriate for DRAFT, but blocking for exit.
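
The record-equality observation above can be reproduced with a reduced stand-in for BoxMigration. MigrationSketch and the demo class are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Reduced stand-in for BoxMigration with just two members.
public sealed record MigrationSketch(int Version, ISet<string> LogicalColumns);

public static class RecordEqualityDemo
{
    // Same version, same column names, but two separate HashSet instances.
    public static bool SameContentsAreEqual()
    {
        var a = new MigrationSketch(1, new HashSet<string> { "Id", "Body" });
        var b = new MigrationSketch(1, new HashSet<string> { "Id", "Body" });
        return a == b; // compares the HashSet references, not their contents
    }

    // Same version and the very same HashSet instance.
    public static bool SharedSetIsEqual()
    {
        var columns = new HashSet<string> { "Id", "Body" };
        return new MigrationSketch(1, columns) == new MigrationSketch(1, columns);
    }
}
```

If deduplication is ever needed, comparing on Version alone or with HashSet<string>.SetEquals would avoid the reference comparison.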


Summary

The MSSQL and PostgreSQL backends are close to production quality. The blocking items before exiting DRAFT are:

| Priority | Issue | Backends affected |
| --- | --- | --- |
| Must-fix | M1: MySQL GET_LOCK name length guard | MySQL |
| Must-fix | M3: History table schema check missing schema filter | MSSQL |
| Must-fix | N1: Fresh path no guard on migrations[0].Version == 1 | All |
| Must-fix | N2: No validation that migrations list is sorted | All |
| Should-fix | M2: pg_advisory_unlock result silently discarded | PostgreSQL |
| Should-fix | M5: LogicalColumns mutable ISet | All |
| Nice-to-have | N3: PostgreSQL history table lacks schema qualification | PostgreSQL |

The ADR quality and MSSQL implementation set a high bar — the remaining items are correctness gaps rather than design concerns.

iancooper and others added 9 commits May 4, 2026 02:20
…ose-out for Postgres)

New test file `tests/Paramore.Brighter.PostgresSQL.Tests/BoxProvisioning/When_postgres_provisioner_runs_against_existing_table_with_mismatched_payload_mode_it_should_throw_configuration_exception.cs` (4 facts, single class).

Each fact builds an existing table via the live `PostgreSqlOutboxBuilder` / `PostgreSqlInboxBuilder` in one payload mode, then provisions in the opposite mode and asserts `ConfigurationException` with "mismatch" in the message. Symmetric coverage:

- Outbox: existing text `Body` + binary-configured provisioner → throws
- Outbox: existing bytea `Body` + text-configured provisioner → throws
- Inbox: existing text `CommandBody` + binary-configured provisioner → throws
- Inbox: existing bytea `CommandBody` + text-configured provisioner → throws

GREEN on first run — `PostgreSqlPayloadModeValidator` + the `ValidatePayloadModeAsync` wiring on both `PostgreSqlOutboxProvisioner` (column `body`) and `PostgreSqlInboxProvisioner` (column `commandbody`) shipped by spec 0023. No production change. Closes spec 0023 R5 for Postgres; the inbox arm specifically pins that the validator is wired through both provisioner paths, not just outbox.

29/29 Postgres BoxProvisioning tests green sequentially on net9.0 (was 25/25; +4 facts); 4/4 new facts green on net10.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…5 close-out for MySQL)

New test file `tests/Paramore.Brighter.MySQL.Tests/BoxProvisioning/When_mysql_provisioner_runs_against_existing_outbox_with_mismatched_payload_mode_it_should_throw_configuration_exception.cs` (2 facts, single class).

Outbox-only per task 6.2 spec. Symmetric coverage:

- Existing TEXT-mode `Body` + binary-configured provisioner → throws `ConfigurationException`
- Existing BLOB-mode `Body` + text-configured provisioner → throws `ConfigurationException`

Each fact builds the existing table via the live `MySqlOutboxBuilder` in one payload mode, then provisions in the opposite mode and asserts `ConfigurationException` with "mismatch" in the message. `MySqlPayloadModeValidator` accepts `text`/`longtext` and `blob`/`longblob` data types — both arms exercise the symmetric throw paths.

GREEN on first run — `MySqlPayloadModeValidator` + `MySqlOutboxProvisioner.ValidatePayloadModeAsync` (column `Body`) shipped by spec 0023. No production change. Closes spec 0023 R5 for MySQL.

27/27 MySQL BoxProvisioning tests green sequentially on net9.0 (was 25/25; +2 facts). MySQL test project is net9.0-only per `BrighterTestNineOnlyTargetFrameworks` so no net10.0 run required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…R5 close-out for SQLite)

New test file `tests/Paramore.Brighter.Sqlite.Tests/BoxProvisioning/When_sqlite_provisioner_runs_against_existing_outbox_with_mismatched_payload_mode_it_should_throw_configuration_exception.cs` (2 facts, single class).

Outbox-only per task 6.3 spec. Symmetric coverage:

- Existing TEXT-mode `[Body]` + binary-configured provisioner → throws `ConfigurationException`
- Existing BLOB-mode `[Body]` + text-configured provisioner → throws `ConfigurationException`

Each fact builds the existing table via the live `SqliteOutboxBuilder` in one payload mode, then provisions in the opposite mode and asserts `ConfigurationException` with "mismatch" in the message. `SqlitePayloadModeValidator` queries `pragma_table_info` and accepts `TEXT`/`NTEXT` and `BLOB` declared types.

GREEN on first run — `SqlitePayloadModeValidator` + `SqliteOutboxProvisioner.ValidatePayloadModeAsync` (column `Body`) shipped by spec 0023. No production change. Closes spec 0023 R5 for SQLite.

26/26 SQLite BoxProvisioning tests green sequentially on net9.0 (was 24/24; +2 facts); 2/2 new facts green on net10.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…(R5 close-out for Spanner; closes Phase 6)

New test file `tests/Paramore.Brighter.Gcp.Tests/Spanner/BoxProvisioning/When_spanner_provisioner_runs_against_existing_outbox_with_mismatched_payload_mode_it_should_throw_configuration_exception.cs` (2 facts, single class, `[Collection("SpannerBoxProvisioning")]` for serialization).

Outbox-only per task 6.4 spec. Symmetric coverage:

- Existing STRING(MAX)-mode `Body` + binary-configured provisioner → throws `ConfigurationException`
- Existing BYTES(MAX)-mode `Body` + text-configured provisioner → throws `ConfigurationException`

Each fact builds the existing table via the live `SpannerOutboxBuilder` in one payload mode (using `SpannerConnection.CreateDdlCommand` since Spanner DDL is split from DML), then provisions in the opposite mode and asserts `ConfigurationException` with "mismatch" in the message. `SpannerPayloadModeValidator` queries `INFORMATION_SCHEMA.COLUMNS` and matches on `STRING`/`BYTES` prefixes via `SPANNER_TYPE`.

GREEN on first run — `SpannerPayloadModeValidator` + `SpannerOutboxProvisioner.ValidatePayloadModeAsync` (column `Body`) shipped by spec 0023. No production change. Closes spec 0023 R5 for Spanner; closes Phase 6 end-to-end.

16/16 Spanner BoxProvisioning tests green sequentially on net9.0 against the emulator (was 14/14; +2 facts); 2/2 new facts green on net10.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… 5.1/5.2/5.3

Acceptance-criteria sweep matching the close-out actually delivered by Phase 0.3a (relational fresh-install retargets) and Phase 5 (Spanner runner — 5.1 fresh-install, 5.2 bootstrap-on-existing, 5.3 normal-path no-op + out-of-sync throw):

- AC-3 fresh install produces V_latest + single history row — closed end-to-end by 0.3a (MSSQL/Postgres/MySQL/SQLite) + 5.1 (Spanner)
- AC-7 Spanner fresh-only — closed by 5.1, 5.2
- AC-8 Spanner with history — closed by 5.3

Open ACs after this commit: AC-14 (`.agent_instructions/box_provisioning.md` rule, Task 7.1) only. AC-15 was already ticked in commit `ad7c7b3b5`. Tests-only docs change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… columns

Adds a prominent "Mandatory Rule" section near the top of
.agent_instructions/box_provisioning.md enforcing that every column added
to a *OutboxBuilder or *InboxBuilder MUST ship with a new V(N+1)
BoxMigration entry, with required fields (LogicalColumns, SourceReference,
IdempotencyCheckSql for SQLite, idempotent provider-specific UpScript) and
a CI-enforced drift-detection backstop. References ADR 0057 and spec 0027
README. Also tightens the existing "Adding New Columns" section so its
checklist agrees with the strengthened rules (no NOT-NULL adds, V1
UpScript stays the live builder DDL, drift test must continue to pass).

Closes spec 0027 AC-14.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n source-break

Adds a "Box Schema Versioning and Migrations (spec 0027)" subsection under
## Master in release_notes.md describing:
- The new versioned migration chain (V1..V_latest outbox; V1..V2 inbox on
  MSSQL/MySQL/SQLite, V1-only on Postgres; Spanner fresh-only).
- The source-breaking IAmABoxMigration additions (LogicalColumns required;
  SourceReference required from V2; IdempotencyCheckSql SQLite-only) — same
  break model as spec 0023's SchemaName addition because netstandard2.0
  cannot support default interface members.
- IAmABoxMigrationRunner.MigrateAsync gains a BoxType parameter so the
  runner can pick the discriminator under the lock for legacy bootstrap.
- AC-6 transition note: spec-0023-era V=1 history rows are preserved
  verbatim and the runner advances to V_latest without re-running DDL.
- Spanner degenerate-runner contract per ADR 0057 §6.

Doc-only — no tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds inline "CLOSED by spec 0027" resolution blocks to the three remaining
open findings in specs/0023-box_database_migration/review-code.md:

- R2 (TOCTOU race in bootstrap path) — closed by per-backend TOCTOU
  re-check + IsMigrationAppliedAsync gate (Tasks 1.4 / 2.4 / 3.4 / 4.4)
  and concurrent-bootstrap tests (1.8 / 2.7 / 3.7 / 4.8).
- R4 (Spanner history INSERT unprotected) — closed by Task 5.1's
  IsMigrationAppliedAsync gate on fresh install + 5.2's discriminator-
  gated bootstrap path.
- R5 (payload-mode tests MSSQL-only) — closed by Phase 6 Tasks 6.1–6.4
  adding one payload-mode-mismatch test per backend (Postgres / MySQL /
  SQLite / Spanner).

Each block links to the closing commit SHA, the relevant ADR 0057 section,
and spec 0027 tasks.md. Adds a top-level "Resolution status" pointer near
the metadata block so the historical "NEEDS WORK" verdict is preserved
alongside the current state.

Doc-only — no code changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mentation complete)

Closes Phase 7 in tasks.md:
- Task 7.1 ticked, references commit c3f0c6e (box_provisioning.md rule)
- Task 7.2 ticked, references commit 5ced21c (release_notes.md entry)
- Task 7.3 ticked, references commit e42b8c5 (spec 0023 R2/R4/R5 close-out)
- AC-14 ticked (closed by Task 7.1)
- AC-15 ticked (closed by Task 0.3a + Task 5.1; was missed in 59e7431)

All 19 acceptance criteria are now ticked. Spec 0027 implementation is
complete end-to-end across 4 relational backends (MSSQL, PostgreSQL,
MySQL, SQLite) and Spanner; spec 0023 R1/R2/R4/R5 are all closed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

claude Bot commented May 4, 2026

Code Review -- PR 4039: Database Utilities for our boxes

Thanks for a well-designed PR. The ADRs are excellent, the three-path provisioning model (fresh / bootstrap / normal) is solid, and the TOCTOU fix via re-reading state under the advisory lock is exactly right. Notes below are in priority order.

High Priority

1. Thin test coverage for the migration runner and detection logic

The only new test file is DdlColumnExtractorTests (4 cases for the DDL parser helper). The riskiest code -- MsSqlBoxMigrationRunner, DetectCurrentVersionAsync, the three-path dispatch, BoxProvisioningHostedService -- has no unit tests. At minimum the bootstrap path deserves:

  • Unit tests for DetectCurrentVersionAsync: column sets matching each of V1..V7, a set with nothing matching (returns 0), a set missing the discriminator (returns -1).
  • Tests for each migration path: fresh, bootstrap at various detected versions, normal with some already-applied.
  • A BoxProvisioningHostedService test verifying outbox-before-inbox ordering and that a provisioner failure throws ConfigurationException.

The ADR mentions drift tests in tests/Paramore.Brighter.MSSQL.Tests/BoxProvisioning but none appear in the PR file list. Are those in a follow-up?

2. sp_getapplock timeout cast may overflow

In MsSqlBoxMigrationRunner.AcquireLockAsync:

command.Parameters.AddWithValue("@lockTimeoutMs", (int)lockTimeout.TotalMilliseconds);

TimeSpan.TotalMilliseconds is a double. Casting it to int produces a wrong value, with no exception, once the timeout exceeds int.MaxValue milliseconds (~24.85 days). A range check against int.MaxValue before the cast eliminates the footgun.

Medium Priority

3. Three database connections per ProvisionAsync call

Each ProvisionAsync opens up to three connections: DetectTableStateAsync, ValidatePayloadModeAsync, MigrateAsync. Payload validation could be folded into the runner under the lock (it only runs when the table exists, which is re-confirmed under the lock anyway). Passing a single connection into both helpers avoids two extra pool acquisitions per startup.

4. No SQL command timeout on migrations

None of the SqlCommand instances set CommandTimeout. The ADO.NET default is 30 seconds. A V1 CREATE TABLE bootstrapped against a large existing table could exceed that. Consider threading a configurable command timeout through BoxProvisioningOptions.

5. IsMigrationAppliedAsync is redundant in the normal path

RunNormalPathAsync fetches maxVersion, skips anything <= maxVersion, then calls IsMigrationAppliedAsync for each remaining migration. Since the history table is only written inside this same transaction, a version above maxVersion will never be in history. The extra per-migration round-trip adds latency for no safety gain. Remove it, or add a comment explaining what specific race it defends against.

6. Dual API for lock timeout in UseBoxProvisioning

The migrationLockTimeout optional parameter is applied before configure is called, so the delegate can silently override it -- and the delegate can also set options.MigrationLockTimeout directly. Two mechanisms for one value is confusing. Consider removing the parameter and making BoxProvisioningOptions.MigrationLockTimeout the single configuration point.

Low Priority / Nits

7. __BrighterMigrationHistory hardcoded to dbo

EnsureHistoryTableAsync and DoesHistoryExistAsync hardcode dbo as the history table schema. Operators who restrict the app DB user to a custom schema must also grant dbo access. Worth surfacing in the XML summary on the class, not just the ADR.

8. BoxMigration record value-equality caveat with ISet

BoxMigration (a record) compares ISet<T> fields by reference, not set equality. Two instances built from separately constructed sets therefore have record1 == record2 false even when both hold identical columns; a with-expression copy is shallow and shares the original set reference, so it only compares equal until the set property is replaced. Worth noting for test authors asserting migration equality.
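
A minimal sketch of the caveat in item 8, assuming a cut-down record. `MigrationSketch` and `RecordEqualityDemo` are hypothetical names used only for illustration; they are not types from this PR, and the real BoxMigration has more members.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for BoxMigration, reduced to two members.
public sealed record MigrationSketch(int Version, IReadOnlyCollection<string> LogicalColumns);

public static class RecordEqualityDemo
{
    // Builds two records with identical contents but separately allocated sets,
    // then reports record equality and set equality side by side.
    public static (bool RecordsEqual, bool SetsEqual) Compare()
    {
        var a = new MigrationSketch(1, new HashSet<string> { "Id", "Body" });
        var b = new MigrationSketch(1, new HashSet<string> { "Id", "Body" });

        // Record equality compares the collection member with its default
        // comparer, which for HashSet<string> is reference equality.
        var recordsEqual = a == b;

        // Comparing contents explicitly gives the answer a test usually wants.
        var setsEqual = new HashSet<string>(a.LogicalColumns).SetEquals(b.LogicalColumns);

        return (recordsEqual, setsEqual);
    }
}
```

In a test, asserting on `Version` plus a set-based comparison of the columns is likely to be more robust than asserting equality of whole records.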

9. Default table names in connection-name overloads

Verify that "Outbox" and "Inbox" used as defaults in the connection-name overloads match the defaults inside RelationalDatabaseConfiguration's own constructor. Diverging defaults cause inconsistent table names depending on which registration path is used.

10. Leftover Console.WriteLine in SchemaCreation.cs

Pre-existing, but SchemaCreation.cs was substantially changed in this PR -- good time to convert the catch-block Console.WriteLine calls to ILogger.

Summary

The architecture is clean and the ADRs give future maintainers the context they need. The main work remaining before this leaves DRAFT is integration-level test coverage for the migration runner paths. The bootstrap detection logic is the riskiest code (runs on every existing deployment's first upgrade) and has no test net currently.

Items 2 (overflow) and 5 (redundant history check) are the most actionable code fixes. Everything else is polish.

iancooper and others added 9 commits May 4, 2026 15:20
Extract MySqlMigrationLockName.For(tableName) helper. Names <=46 chars keep
the historical `BrighterMigration_<name>` format (preserves interlock with
running deployments holding a lock under the old name); longer names get a
SHA-256 hashed suffix that guarantees <=64 chars (MySQL GET_LOCK limit per
ER_USER_LOCK_WRONG_NAME from 5.7.5+) and remains collision-resistant
across distinct tables sharing a long common prefix. Both call sites in
MySqlBoxMigrationRunner (Acquire + Release) delegate to the helper.
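
A minimal sketch of the naming rule described above, assuming .NET 5 or later for `SHA256.HashData` and `Convert.ToHexString`. The class name, the 16-hex-character suffix, and the way the table name is truncated are illustrative assumptions; the commit only states that long names get a SHA-256 hashed suffix and stay within 64 characters.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch; not the PR's MySqlMigrationLockName implementation.
public static class MigrationLockNameSketch
{
    private const string Prefix = "BrighterMigration_"; // 18 characters
    private const int MaxLockNameLength = 64;            // MySQL GET_LOCK name limit

    public static string For(string tableName)
    {
        var plain = Prefix + tableName;

        // Table names of 46 characters or fewer keep the historical format.
        if (plain.Length <= MaxLockNameLength) return plain;

        // Assumed suffix: "_" plus the first 16 hex characters of SHA-256
        // computed over the full table name.
        var hash = Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(tableName)));
        var suffix = "_" + hash.Substring(0, 16);

        // Keep as much of the readable name as fits: 64 - 18 - 17 = 29 characters.
        var keep = MaxLockNameLength - Prefix.Length - suffix.Length;
        return Prefix + tableName.Substring(0, keep) + suffix;
    }
}
```

Hashing the whole table name rather than only the truncated tail is what keeps two long names with a shared prefix apart in this sketch.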

3 new unit facts in MySqlMigrationLockNameTests cover short-form
preservation, length-cap on long names, and collision resistance for two
long names sharing a 46-char prefix. 30/30 MySQL BoxProvisioning tests
green sequentially on net9.0 against live MySQL.

Closes Boy Scout item A from PR #4039 reviews #46 M1 / #45 M1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EnsureHistoryTableAsync's IF NOT EXISTS filtered sys.tables by name only,
so any unrelated schema containing a [__BrighterMigrationHistory] would
make the check misfire and skip the [dbo] create — leaving subsequent
unqualified INSERT/SELECT statements to fail with SqlException 208
"Invalid object name". Now filter by schema_id = SCHEMA_ID('dbo') and
schema-qualify every history-table reference (CREATE, SELECT, INSERT)
in MsSqlBoxMigrationRunner + MsSqlBoxDetectionHelpers. New
HISTORY_TABLE_SCHEMA = "dbo" const documents the design intent that the
history table is global, regardless of the configured box schema.

Test: When_history_table_exists_in_a_non_dbo_schema_runner_should_still_create_it_in_dbo
pre-creates [stage_for_history_clash_test].[__BrighterMigrationHistory]
with a deliberately wrong shape, drops the dbo history table, runs the
provisioner, asserts no exception + dbo table created + 1 history row +
colliding stage table untouched. Self-restoring DisposeAsync. Sequential
execution required (per branch convention).

31/31 MSSQL BoxProvisioning tests green sequentially on net9.0 + net10.0
against live azure-sql-edge.

Closes Boy Scout item B from PR #4039 reviews #46 M3 / #39 B4 /
#42 #6 (was R9).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every reference to "__BrighterMigrationHistory" in the Postgres runner +
detection helpers was unqualified, so an unqualified CREATE/SELECT/INSERT
resolved through the connection's search_path. A connection whose
search_path put a non-public schema first (or a colliding history table
in any earlier-resolved schema) would scatter history rows across the
cluster — or hit a wrong-shape table and raise PostgresException 42703
"undefined column". Now schema-qualify every history-table reference with
"public" in PostgreSqlBoxMigrationRunner + PostgreSqlBoxDetectionHelpers.
New HISTORY_TABLE_SCHEMA = "public" const documents the design intent
that the history table is global, regardless of the configured box schema.

Test: When_history_table_exists_in_a_non_public_schema_runner_should_still_create_it_in_public
opens the runner connection with `Search Path=stage_for_history_clash_test,public`,
pre-creates the colliding bogus-shape table in stage_for_history_clash_test,
drops public's history table, runs the provisioner, asserts no exception
+ public table created + 1 history row for the box + colliding stage
table untouched. Self-restoring DisposeAsync. Sequential execution
required (per branch convention).

30/30 Postgres BoxProvisioning tests green sequentially on net9.0 +
net10.0 against live postgres container.

Closes Boy Scout item C from PR #4039 review #46 N3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… guard

sp_getapplock takes @LockTimeout as a SQL Server INT (milliseconds), so
any TimeSpan whose TotalMilliseconds exceeds int.MaxValue (~24.85 days)
silently overflows when cast and may produce -1 — which sp_getapplock
interprets as "wait indefinitely". Negative timeouts overflow into
sp_getapplock's reserved-value range too. Validate at construction with
a static ValidateLockTimeout that rejects both overflow and negative
inputs with ArgumentOutOfRangeException; the validated value is held in
_lockTimeout and used by AcquireLockAsync in place of the primary-ctor
parameter.
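
A minimal sketch of the guard described above. The commit names a static `ValidateLockTimeout`; the containing class, the `int` return type, and the exception message here are assumptions made for illustration.

```csharp
using System;

// Hypothetical sketch; not the PR's MsSqlBoxMigrationRunner code.
public static class LockTimeoutSketch
{
    // Returns the timeout in whole milliseconds, or throws when the value
    // cannot be represented as a non-negative SQL Server INT.
    public static int ValidateLockTimeout(TimeSpan lockTimeout)
    {
        var milliseconds = lockTimeout.TotalMilliseconds;

        if (milliseconds < 0 || milliseconds > int.MaxValue)
        {
            throw new ArgumentOutOfRangeException(
                nameof(lockTimeout),
                lockTimeout,
                "Lock timeout must be between 0 and int.MaxValue milliseconds.");
        }

        // Safe: the range check above guarantees the cast cannot overflow.
        return (int)milliseconds;
    }
}
```

Validating once at construction, as the commit describes, means a bad value fails at startup instead of at the first lock acquisition.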

Tests in MsSqlBoxMigrationRunnerLockTimeoutValidationTests cover three
fact rows: overflow above int.MaxValue throws, negative throws, and the
boundary value at int.MaxValue ms is accepted.

34/34 MSSQL BoxProvisioning tests green sequentially on net9.0 + net10.0
against live azure-sql-edge.

Closes Boy Scout item E from PR #4039 review #47 #2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
IAmABoxMigration.LogicalColumns and BoxMigration.LogicalColumns were
exposed as ISet<string> — mutable through the public surface (Add /
Remove / Clear), contradicting the documented "populate once at
construction and never mutate" invariant. Tighten to
IReadOnlyCollection<string>: callers can still enumerate and the runner's
IsSupersetOf / SetEquals continue to work because they accept
IEnumerable<T> arguments. Implementations stay backed by HashSet<string>
with the backend-appropriate StringComparer (Ordinal vs OrdinalIgnoreCase
per ADR 0057 §1) — only the public seam is read-only.

Each backend's private static `Cumulative(int upToVersion)` helper
returned ISet<string>, which is NOT assignable to IReadOnlyCollection<T>
(ICollection<T> doesn't inherit from IReadOnlyCollection<T>); switched
all 7 helpers to return IReadOnlyCollection<string> — bodies unchanged
because HashSet<T> IS IReadOnlyCollection<T>. The release_notes.md entry
was already forward-written documenting the IReadOnlyCollection<string>
contract, so this commit aligns the implementation with the already-
published source-break — no additional release_notes update needed.

Tests in LogicalColumnsPublicApiTests verify the property declared types
on both IAmABoxMigration and BoxMigration via reflection. 120/120
BoxProvisioning tests green sequentially across MSSQL/Postgres/MySQL/
SQLite on net9.0; Spanner builds clean.

Closes Boy Scout item F from PR #4039 review #46 M5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the 12 review-driven items surfaced from re-reading PR #4039
Claude reviews after Phase 7 closed. Tier 1 correctness items A/B/C/E
all closed (commits d71162e, 7c6b32f, 950a12b, be910cf). Tier 2
public-API source-break has F closed (b8a629d); G remains. Tiers 3-5
(H/I/D/K/J/L) all pending. Phase 8 sits after the Acceptance checklist
so the spec retains a permanent record of the post-merge-review work
without disturbing the AC-* numbering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-lock-timeout API

Removes the `TimeSpan? migrationLockTimeout = null` parameter from
`BrighterBuilderBoxProvisioningExtensions.UseBoxProvisioning`; the timeout is
set exclusively through `BoxProvisioningOptions.MigrationLockTimeout` inside
the configure delegate. The previous dual surface had a real ordering bug —
the parameter was applied to options BEFORE the delegate ran, so a delegate
that called `AddXxxOutbox(...)` and then assigned `opts.MigrationLockTimeout`
would silently lose the assignment because backend extensions capture the
timeout at registration time.

New `UseBoxProvisioningPublicApiTests` (3 facts) pins the consolidated
signature via reflection: exactly one overload, parameters
`(IBrighterBuilder, Action<BoxProvisioningOptions>)`, no parameter named
`migrationLockTimeout`. RED before fix (2/3 fail — extra `TimeSpan?`
parameter present); GREEN after. All 6 in-tree call sites (2 tests, 4
samples) used the default and required no source change.

`release_notes.md` extended under spec 0027 with a "Source-breaking change:
`UseBoxProvisioning` overload consolidation" subsection that includes a
before/after migration snippet for downstream callers.

Closes review #47 #6 / #37 #5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ard (4 backends)

Each `RunFreshPathAsync` in the MSSQL / Postgres / MySQL / SQLite runners now
checks `migrations[0].Version != 1` after the empty-list guard and throws
`ConfigurationException` with an actionable "the first migration must be V1,
but the supplied migrations list starts at V{n}" message before any DDL
fires. Without the guard, a misordered or filtered list (e.g. callers
passing `realMigrations.Skip(1)`) would silently execute V2's
`ALTER TABLE ... ADD COLUMN` against the not-yet-created box table — the
runner surfaced the provider's opaque "object not found" exception
(SQL 208 / 42P01 / 1146 / SQLITE_ERROR=1) instead of the actionable
misconfiguration error.

SQLite's variant uses `'{tableName}'` only (no schema concept). Spanner
runner is exempt: degenerate per ADR §6, ignores the migrations parameter.

4 new RED-then-GREEN integration tests
`When_{mssql,postgres,mysql,sqlite}_runner_fresh_path_is_called_with_migrations_not_starting_at_v1_it_should_throw`
exercise the guard with a `realMigrations.Skip(1)` list (first entry V2).
Each test asserts: `ConfigurationException` thrown, message contains "V1",
and the box table was NOT created (the guard fires before any DDL, the
surrounding transaction also rolls back `EnsureHistoryTableAsync`'s create
on MSSQL/PG/SQLite; MySQL has implicit per-DDL commit but the history-
table create is idempotent so harmless). All 4 tests verified RED first by
reverting the impls; then GREEN after re-applying. Backend BoxProvisioning
test counts: MSSQL 35/35, Postgres 31/31, MySQL 31/31, SQLite 27/27 green
sequentially per TFM (MSSQL/PG/SQLite on net9.0 + net10.0; MySQL is
net9.0-only). Pre-existing multi-TFM-parallel flake on
`When_history_table_exists_in_a_non_public_schema...` and
`When_two_postgres_provisioners_race_on_legacy_table` confirmed unrelated:
both pass per-TFM sequential, fail only when net9.0+net10.0 race the
shared Postgres DB.

Closes review #46 N1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ding (4 backends)

Each `MigrateAsync` in the MSSQL / Postgres / MySQL / SQLite runners now
calls a new private static `ValidateMigrationsMonotonic` helper as its
first action — before opening a connection — and throws
`ConfigurationException` when the supplied migrations list is not
contiguous and strictly ascending (i.e. each `V_{i+1} == V_i + 1`).
Catches duplicates, gaps, and out-of-order pairs uniformly. The message
names the offending pair: `"Migration list for '{schema}.{table}' is not
contiguous and ascending: V{prev} followed by V{curr} (expected
V{prev+1})."` (SQLite variant uses just `'{table}'` — no schema concept).
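
A minimal sketch of the pairwise check described above, working on bare version numbers. `MonotonicSketch` and `ConfigurationExceptionSketch` are hypothetical stand-ins (the real helper takes the migrations list and throws Brighter's `ConfigurationException`); the message format follows the one quoted in this commit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Brighter's ConfigurationException.
public sealed class ConfigurationExceptionSketch : Exception
{
    public ConfigurationExceptionSketch(string message) : base(message) { }
}

public static class MonotonicSketch
{
    // Throws when any adjacent pair is not (V, V + 1). Empty and
    // single-element lists have no pair to compare and pass through.
    public static void ValidateMigrationsMonotonic(
        IReadOnlyList<int> versions, string schema, string table)
    {
        for (var i = 1; i < versions.Count; i++)
        {
            var prev = versions[i - 1];
            var curr = versions[i];

            if (curr != prev + 1)
            {
                throw new ConfigurationExceptionSketch(
                    $"Migration list for '{schema}.{table}' is not contiguous and ascending: " +
                    $"V{prev} followed by V{curr} (expected V{prev + 1}).");
            }
        }
    }
}
```

A single `curr != prev + 1` comparison covers duplicates, gaps, and descents, which is why one helper can report all three cases with the same message shape.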

Validation sits at MigrateAsync entry rather than inside one of the path
branches so the rule applies uniformly across fresh / bootstrap / normal
paths. A malformed list corrupts any of them: history-table PK violation
on duplicate inserts, silently-skipped ALTERs that V_latest depends on,
double-applied DDL, or stamping rows in the wrong sequence. Without the
guard, a misordered list (e.g. caller appending V2 to a [V1, V3] list, or
filtering a List with `Where(m => m.Version != 2)`) would surface as the
provider's opaque PK-violation or syntax error rather than the actionable
misconfiguration.

Empty and single-element lists pass through unchanged (the existing
`RunFreshPathAsync` empty-list contract is preserved; a single-element
list trivially has no pair to compare). Item H's `migrations[0].Version
== 1` fresh-path guard is complementary, not redundant — Item H rejects
non-V1-rooted lists (any path); Item I rejects malformed pairwise
sequences. Spanner runner is exempt: degenerate per ADR §6, ignores the
migrations parameter.

4 new RED-then-GREEN integration tests
`When_{mssql,postgres,mysql,sqlite}_runner_is_called_with_non_monotonic_migrations_it_should_throw`
each contain 3 `[Fact]`s plus a parametrised helper:

- `Should_throw_when_versions_contain_a_duplicate` — list is `[V1, V1]`,
  asserts `"V1 followed by V1"` in the exception message.
- `Should_throw_when_versions_have_a_gap` — list is `[V1, V3]`, asserts
  `"V1 followed by V3"`.
- `Should_throw_when_versions_are_not_strictly_ascending` — list is
  `[V1, V2, V3, V2]` (valid prefix isolates the V3→V2 descent as the
  sole violation), asserts `"V3 followed by V2"`.

Each `[Fact]` also asserts the box table was NOT created (proves the
guard fires before any DDL). All 3 malformed lists start at V1 so Item
H's V1-must-be-first guard does not pre-empt the new check. Lists are
constructed by indexing `realMigrations` (the live V1..V7 chain) so the
test stays anchored to the production migration set.

The descending case originally used `[V1, V3, V2]`, which was triggered
by the V1→V3 gap rather than the V3→V2 descent (caught by initial GREEN
sweep on SQLite); fixed to `[V1, V2, V3, V2]` so the descent is the
first and only violation.

Backend BoxProvisioning test counts after this commit (sequential per
TFM):
- MSSQL: 38/38 net9.0 + 38/38 net10.0 (was 35/35; +3 facts)
- Postgres: 34/34 net9.0 + 34/34 net10.0 (was 31/31; +3)
- MySQL: 34/34 net9.0 (net9-only project; was 31/31; +3)
- SQLite: 30/30 net9.0 + 30/30 net10.0 (was 27/27; +3)
- Shared `BoxProvisioning.Tests`: 9/9 net9.0 + 9/9 net10.0 (no change)

All sweeps run with `--settings /tmp/xunit-seq.runsettings` against live
docker-compose containers (MSSQL/PG/MySQL on local ports
11433/5432/3306; SQLite in-process). Closes review #46 N2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codescene-delta-analysis codescene-delta-analysis Bot left a comment

Gates Failed
New code is healthy (2 new files with code health below 9.00)
Enforce critical code health rules (1 file with Bumpy Road Ahead)
Enforce advisory code health rules (17 files with Code Duplication, Missing Arguments Abstractions, Primitive Obsession, Excess Number of Function Arguments)

Gates Passed
1 Quality Gates Passed

See analysis details in CodeScene

Reason for failure

New code is healthy

| File | Violations | Code Health Impact |
| --- | --- | --- |
| MsSqlBoxMigrationRunner.cs | 4 rules | 8.55 |
| PostgreSqlBoxMigrationRunner.cs | 4 rules | 8.55 |

Enforce critical code health rules

| File | Violations | Code Health Impact |
| --- | --- | --- |
| SqliteBoxMigrationRunner.cs | 1 critical rule | 9.54 |

Enforce advisory code health rules

| File | Violations | Code Health Impact |
| --- | --- | --- |
| MsSqlBoxMigrationRunner.cs | 4 advisory rules | 8.55 |
| PostgreSqlBoxMigrationRunner.cs | 4 advisory rules | 8.55 |
| MySqlBoxMigrationRunner.cs | 2 advisory rules | 9.39 |
| SpannerBoxMigrationRunner.cs | 2 advisory rules | 9.39 |
| SqliteBoxMigrationRunner.cs | 1 advisory rule | 9.54 |
| MsSqlBoxProvisioningExtensions.cs | 1 advisory rule | 9.69 |
| MsSqlPayloadModeValidator.cs | 1 advisory rule | 9.69 |
| PostgreSqlBoxProvisioningExtensions.cs | 1 advisory rule | 9.69 |
| MySqlBoxProvisioningExtensions.cs | 1 advisory rule | 9.69 |
| MySqlPayloadModeValidator.cs | 1 advisory rule | 9.69 |
| PostgreSqlPayloadModeValidator.cs | 1 advisory rule | 9.69 |
| SpannerPayloadModeValidator.cs | 1 advisory rule | 9.69 |
| SqlitePayloadModeValidator.cs | 1 advisory rule | 9.69 |
| MsSqlBoxDetectionHelpers.cs | 1 advisory rule | 9.69 |
| MySqlBoxDetectionHelpers.cs | 1 advisory rule | 9.69 |
| PostgreSqlBoxDetectionHelpers.cs | 1 advisory rule | 9.69 |
| SqliteBoxDetectionHelpers.cs | 1 advisory rule | 9.69 |

Quality Gate Profile: Clean Code Collective

Comment on lines +320 to +336
private static async Task<bool> IsMigrationAppliedAsync(
    SqlConnection connection, SqlTransaction transaction,
    string schemaName, string tableName,
    int version, CancellationToken cancellationToken)
{
    using var command = connection.CreateCommand();
    command.Transaction = transaction;
    command.CommandText = $@"
        SELECT COUNT(1) FROM [{HISTORY_TABLE_SCHEMA}].[{MIGRATION_HISTORY_TABLE}]
        WHERE [SchemaName] = @SchemaName AND [BoxTableName] = @BoxTableName AND [MigrationVersion] = @Version";
    command.Parameters.AddWithValue("@SchemaName", schemaName);
    command.Parameters.AddWithValue("@BoxTableName", tableName);
    command.Parameters.AddWithValue("@Version", version);

    var count = (int)(await command.ExecuteScalarAsync(cancellationToken))!;
    return count > 0;
}

❌ New issue: Code Duplication
The module contains 2 functions with similar structure: InsertHistoryRowAsync,IsMigrationAppliedAsync

@@ -0,0 +1,355 @@
#region Licence

❌ New issue: Missing Arguments Abstractions
The average number of function arguments in this module is 4.82 across 11 functions. The average arguments threshold is 4.00

@@ -0,0 +1,355 @@
#region Licence

❌ New issue: Primitive Obsession
In this module, 34.0% of all function arguments are primitive types, threshold = 30.0%

Comment on lines +305 to +321
private static async Task<bool> IsMigrationAppliedAsync(
    NpgsqlConnection connection, NpgsqlTransaction transaction,
    string schemaName, string tableName,
    int version, CancellationToken cancellationToken)
{
    using var command = connection.CreateCommand();
    command.Transaction = transaction;
    command.CommandText = $@"
        SELECT COUNT(1) FROM ""{HISTORY_TABLE_SCHEMA}"".""{MIGRATION_HISTORY_TABLE}""
        WHERE ""SchemaName"" = @SchemaName AND ""BoxTableName"" = @BoxTableName AND ""MigrationVersion"" = @Version";
    command.Parameters.AddWithValue("@SchemaName", schemaName);
    command.Parameters.AddWithValue("@BoxTableName", tableName);
    command.Parameters.AddWithValue("@Version", version);

    var count = (long)(await command.ExecuteScalarAsync(cancellationToken))!;
    return count > 0;
}

❌ New issue: Code Duplication
The module contains 2 functions with similar structure: InsertHistoryRowAsync,IsMigrationAppliedAsync

@@ -0,0 +1,340 @@
#region Licence

❌ New issue: Missing Arguments Abstractions
The average number of function arguments in this module is 4.91 across 11 functions. The average arguments threshold is 4.00

@@ -0,0 +1,340 @@
#region Licence

❌ New issue: Primitive Obsession
In this module, 35.2% of all function arguments are primitive types, threshold = 30.0%

@@ -0,0 +1,320 @@
#region Licence

❌ New issue: Primitive Obsession
In this module, 41.7% of all function arguments are primitive types, threshold = 30.0%

claude Bot commented May 4, 2026

Code Review — PR #4039: Database Utilities for our boxes

Note: This PR is a DRAFT. Review is offered early to surface issues before the final pass.


Overview

This is a well-architected feature that addresses a genuine usability gap: the complexity of managing Brighter's inbox/outbox schema lifecycle. The bundling of specs 0023 and 0027 makes sense — shipping 0023 without 0027 would have left the bootstrap path broken for pre-DataRef tables.

The three-path dispatch (fresh / bootstrap / normal), the TOCTOU-defeating re-detection under lock, and the per-backend lock strategies are all solid design decisions clearly grounded in the ADRs. The drift-detection test utility (DdlColumnExtractor) is a clever way to keep LogicalColumns declarations honest without duplicating schema definitions.


Issues

🐛 Bug: XML doc comment misattributed in MySqlBoxMigrationRunner.cs

The <summary> block that describes EnsureAllowUserVariables is physically placed on ValidateMigrationsMonotonic (line ~287). The actual EnsureAllowUserVariables method at the bottom of the file has no doc comment. This is clearly a copy-paste slip but it will appear in generated API docs on the wrong method.

/// <summary>
/// Ensures the connection string sets <c>AllowUserVariables=true</c>. ...
/// </summary>
private static void ValidateMigrationsMonotonic(   // ← wrong target

⚠️ Potential issue: MySQL lock timeout precision silently truncated

var timeoutSeconds = (int)lockTimeout.TotalSeconds;

TotalSeconds returns a double, and casting to int truncates toward zero. If the caller passes TimeSpan.FromSeconds(30.9) the effective timeout becomes 30 s. Consider (int)Math.Ceiling(lockTimeout.TotalSeconds) to avoid giving users less time than they asked for, or document that sub-second precision is discarded.
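
A minimal sketch of the rounding-up alternative suggested above. `GetLockTimeoutSketch` and `ToWholeSeconds` are hypothetical names; rejecting negative values and clamping at int.MaxValue are choices made for this sketch, not behaviour confirmed by the PR.

```csharp
using System;

// Hypothetical sketch; not the PR's MySqlBoxMigrationRunner code.
public static class GetLockTimeoutSketch
{
    // Converts a TimeSpan to whole seconds, rounding up so the caller never
    // gets less time than requested.
    public static int ToWholeSeconds(TimeSpan lockTimeout)
    {
        if (lockTimeout < TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(lockTimeout));

        var seconds = Math.Ceiling(lockTimeout.TotalSeconds);

        // Clamp rather than overflow for very large timeouts.
        return seconds >= int.MaxValue ? int.MaxValue : (int)seconds;
    }
}
```

With this helper, a 30.9 s request becomes 31 s, while a plain `(int)` cast would give 30 s.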


⚠️ Potential issue: Spanner provisioners hardcode column sets independently of builder DDL

SpannerOutboxProvisioner and SpannerInboxProvisioner each contain hardcoded column name sets used for bootstrap detection (e.g., the 21-column Spanner outbox set). These are not derived from the SpannerOutboxBuilder/SpannerInboxBuilder DDL or from the migrations list.

If a future V8 migration adds a column to the Spanner outbox, these hardcoded sets become stale and bootstrap detection will silently downgrade (returning an earlier version or 0). The other backends avoid this by deriving LogicalColumns from the migration objects themselves. Spanner's degenerate-fresh-only design makes this harder to fix cleanly, but the risk should at least be called out in the code with a // NOTE: must be kept in sync with SpannerOutboxBuilder comment, or ideally driven from the builder.


⚠️ Concern: PostgreSQL advisory lock namespace (74726) could collide

The lock key 74726 is used as the first argument to pg_try_advisory_lock. PostgreSQL advisory locks are scoped to the database, not to a schema or an application, so any other library or application using the same database that happens to pick 74726 (perhaps another framework with a similar magic-number derivation) will block or be blocked. ADR 0057 should document that this value was chosen deliberately and record what it represents. If it is derived from a hash of "BRIGHTER" or similar, showing the derivation makes it defensible.


⚠️ Concern: No integration tests for migration runners

The tests shipped with this PR are:

  • DdlColumnExtractor unit tests (DDL parsing)
  • Two reflection-based public API tests (LogicalColumns type, UseBoxProvisioning signature)

The migration runners contain the most complex and risky logic in the PR — the three-path dispatch, lock acquisition, history table management, and idempotency guarantees — and none of it is covered by integration tests. A mid-chain MySQL failure, a TOCTOU race, or an incorrect LogicalColumns declaration in a migration would not be caught.

I understand integration tests against real databases are expensive to run, but given the breaking-change risk and the number of backends, at minimum consider:

  • SQLite integration tests (no external dependency, fast) covering all three paths for both box types.
  • A drift test that compares LogicalColumns union across all versions against the V1 builder DDL for each backend (the DdlColumnExtractor infrastructure is already there).
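The core of such a drift test is a set comparison. The sketch below shows only that comparison, with plain string sets standing in for the per-version LogicalColumns and for the DdlColumnExtractor output. The `FindDrift` helper and the sample column names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: reports columns present on one side but not the other.
static (string[] MissingFromMigrations, string[] MissingFromDdl) FindDrift(
    IEnumerable<IEnumerable<string>> logicalColumnsPerVersion,
    IEnumerable<string> ddlColumns)
{
    var comparer = StringComparer.OrdinalIgnoreCase;
    var fromMigrations = new HashSet<string>(logicalColumnsPerVersion.SelectMany(c => c), comparer);
    var fromDdl = new HashSet<string>(ddlColumns, comparer);

    return (
        fromDdl.Except(fromMigrations, comparer).OrderBy(c => c, comparer).ToArray(),
        fromMigrations.Except(fromDdl, comparer).OrderBy(c => c, comparer).ToArray());
}

// Illustrative data: two versions' worth of columns versus what the DDL declares.
var perVersion = new[]
{
    new[] { "Id", "Body" },   // columns introduced by the first version
    new[] { "DataRef" },      // columns introduced by a later version
};
var ddl = new[] { "id", "body", "dataref", "SpecVersion" };

var drift = FindDrift(perVersion, ddl);
Console.WriteLine($"In DDL but no migration: {string.Join(", ", drift.MissingFromMigrations)}");
Console.WriteLine($"In migrations but not DDL: {string.Join(", ", drift.MissingFromDdl)}");
```

A real test would assert that both arrays are empty for every backend and box type, so a forgotten LogicalColumns entry fails the build.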

⚠️ Concern: No tests for PayloadModeValidator or BoxProvisioningHostedService

MsSqlPayloadModeValidator, PostgreSqlPayloadModeValidator, etc., throw ConfigurationException when binary/text mode mismatches existing schema. These are startup-fail-fast paths: getting them wrong would either stop a correctly configured application from booting or let a mismatched schema through unnoticed. They warrant at least unit-level tests with a mock/stub connection that returns the expected INFORMATION_SCHEMA shape.

Similarly, BoxProvisioningHostedService orchestrates provisioner ordering (outbox before inbox) and wraps exceptions — some unit tests for its failure modes would add confidence.


Observations / Minor Points

MySQL connection string mutation (EnsureAllowUserVariables)
This is well-documented in the code, but it is a side effect that could surprise users who discover it via connection pool metrics or audit logs. Worth a one-liner in the public extension method XML doc as well (not just the internal method).

InsertHistoryRowAsync in MySQL bootstrap path has no pre-existence check
In RunBootstrapPathAsync, the bootstrap history row is inserted without first checking for a duplicate. The code is correct because the GET_LOCK ensures single-writer access, but a comment explaining why the IsMigrationAppliedAsync guard present in the normal path is intentionally absent here would help reviewers and future maintainers.

DdlColumnExtractor does not handle inline SQL comments
Column declarations containing -- trailing comments or /* ... */ blocks would likely confuse the scanner. This is acceptable for the current use (all builder DDL is under team control), but is worth noting in the class summary so nobody feeds it arbitrary DDL.
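If comment handling were ever needed, one option is to strip comments before scanning. This is a rough sketch under stated assumptions: the `StripSqlComments` helper is hypothetical, it does not handle nested block comments, and it will misread comment markers that appear inside string literals or quoted identifiers:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical pre-processing step, not part of DdlColumnExtractor.
static string StripSqlComments(string ddl)
{
    // Replace /* ... */ blocks with a space so adjacent tokens stay separated.
    var withoutBlocks = Regex.Replace(ddl, @"/\*.*?\*/", " ", RegexOptions.Singleline);

    // Drop -- comments up to (but not including) the end of each line.
    return Regex.Replace(withoutBlocks, @"--[^\r\n]*", "");
}

var sample = "[Id] INT NOT NULL, -- primary key\n[Body] /* payload */ TEXT NULL";
Console.WriteLine(StripSqlComments(sample));
```

Because the builder DDL is under team control, documenting the limitation in the class summary may still be the simpler choice.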

SQLite PRAGMA busy_timeout = 0 is connection-scoped
Since each runner opens its own using var connection, the pragma doesn't leak — good. A brief comment confirming this is intentional (as opposed to accidentally affecting a long-lived connection) would aid future readers.


Summary

The architecture is sound and the ADR coverage is commendable. The main pre-merge risks are:

  1. The misattributed XML doc (trivially fixed).
  2. Absent integration tests — especially for the three-path dispatch logic. SQLite requires no external infrastructure and would cover a large portion of the shared logic.
  3. Spanner column set maintenance risk — hardcoded lists will silently drift if the schema evolves.

The breaking change to IAmARelationalDatabaseConfiguration is appropriately flagged and the migration path (return null) is clear. No security concerns identified.
