fix(db): V39_5 test fixture — unblock V40 in CI (audit_logs_pkey collision)#101
Merged
Merged
Conversation
V40 (partition audit_logs) renames audit_logs to audit_logs_legacy, then creates a new partitioned audit_logs root with a composite PK named audit_logs_pkey. On a fresh Testcontainers DB this fails with: ERROR: relation "audit_logs_pkey" already exists Location: db/migration/V40__partition_audit_logs.sql because PG keeps the V5-declared inline PK index/constraint name attached to the renamed (legacy) table. The new composite-PK ADD collides on the shared name space. Prod does not hit this: V40 was BASELINE-SKIPPED in 2026-04 (see V57 header comment + DB review 2026-04-30 §4), so the migration is recorded as success=t in flyway_schema_history but never ran. Prod's audit_logs is still relkind='r' and V57 (idempotent partman-aware partitioning) is the prod-side conversion path; it's fail-soft when partman is missing (current pgvector/pgvector:pg17 image) so deploys are safe. CI runs every migration end-to-end on a fresh DB, so it hits the V40 PG name collision. The Integration tests (Testcontainers) CI job has been red on every PR (including #99 V61 and #100 P0 bundle) since pre-existing 2026-05-04 baseline; it's continue-on-error so it does not block merges but is permanent noise on every PR's checks page. Fix: test-only fixture V39_5 loaded only by the integration profile via spring.flyway.locations=classpath:db/migration,classpath:db/test-fixtures. It renames audit_logs_pkey -> audit_logs_v5_pkey before V40 runs, freeing the global name. Idempotent: only renames when audit_logs is still a plain heap with the V5-default constraint name. Matches the V28_5__align_default_login_flow_for_v29.sql precedent exactly (test-only fixture pre-fixing state for a checksum-locked production migration). Zero prod surface area: the file lives in src/test/resources/db/test-fixtures/ which is never on the prod classpath. Verified locally against pgvector/pgvector:pg17: - V5-style CREATE TABLE + V40-style RENAME+CREATE+ADD-CONSTRAINT reproduces ERROR exactly. - Same sequence with V39_5 inserted between V5 and V40 succeeds; both the legacy table (audit_logs_v5_pkey) and the new partitioned root (audit_logs_pkey) coexist with distinct constraint names. - Idempotency confirmed: re-running V39_5 against a post-V40 state falls through the IF guards and is a no-op. Why not edit V40 in place: three previous in-place V40 fixes (commits 2b00a7b, a30287b, 81101f0) were each reverted (43d0050, 614a94f, f404d12) because V40's checksum is locked in prod flyway_schema_history. Why not a new V62 production migration: V57 already provides the prod-side path (idempotent heap->partitioned conversion + partman setup, fail-soft when partman absent). A V62 would duplicate V57. The CI-side problem is purely the V40 in-place hard failure, addressed surgically by this test fixture. Unblocks the Integration tests check on PR #99 (V61 audit_logs.tenant_id NOT NULL) and #100 (P0/P1 senior-review bundle). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…_5 yaml edit The previous commit added 6 lines of comment to the spring.flyway section of application-integration.yml, shifting the test-only jwt.secret / totp.enc-key lines from 55,62 to 61,68. Update the .gitleaksignore fingerprints to match so the scan job stays green. Secrets unchanged (both are pre-existing test-only fixtures, allowlisted since the file was added). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary (UPDATED 2026-05-12 after first CI run)
V40 (
partition_audit_logs.sql) has two latent bugs that surface when run end-to-end on a fresh Testcontainers DB. This PR resolves the first one structurally; the second is documented for operator decision because the brief's constraints do not permit a code-side fix.Bug #1:
audit_logs_pkeyname collision (FIXED in this PR)V5 declares
audit_logs.id UUID PRIMARY KEYinline → PG auto-names the PK constraint and indexaudit_logs_pkey. V40 then:ALTER TABLE audit_logs RENAME TO audit_logs_legacy(constraint name is NOT renamed, stays attached to legacy)CREATE TABLE audit_logs ... PARTITION BY RANGE (created_at)(new heap)ADD CONSTRAINT audit_logs_pkey PRIMARY KEY (id, created_at)(collision)ERROR: relation "audit_logs_pkey" already existsFix: test-only fixture
V39_5__rename_audit_logs_pkey_for_v40.sql(loaded only byintegrationprofile viaspring.flyway.locations=classpath:db/migration,classpath:db/test-fixtures) renamesaudit_logs_pkey→audit_logs_v5_pkeybefore V40 runs. Idempotent. Matches theV28_5precedent exactly.CI log from this PR's first run (commit 20d61ea) confirms the fixture executes correctly:
Bug #2:
||operator insideCOMMENT ON TABLE(NOT fixed; operator decision required)After my fix freed the pkey name, V40 progressed further and surfaced a SECOND bug at line 270:
PG grammar for
COMMENT ON ... ISrequires a single string literal, not an expression. The||concatenation operator is invalid in DDL context:I reproduced this locally against
pgvector/pgvector:pg17:This means V40 cannot succeed on any clean DB anywhere. Prod escaped this because V40 was BASELINE-SKIPPED in 2026-04 (per V57 header comment +
BACKEND_REVIEW_2026-04-30§4); the migration is recorded as success=t in prod'sflyway_schema_historybut its body never ran.Why this PR cannot also fix Bug #2
The task brief explicitly forbids:
flyway repair.flyway_schema_historydirectly in prod or in test fixtures" — the only way to mark V40 as success without running it.Bug #2 has no purely additive fix:
IF NOT EXISTS (audit_logs heap) THEN RAISE EXCEPTIONwould fire → V40 marked failure → subsequent migrations don't run. Same blocker.baselineVersion=40: would also baseline V1-V39 (single point), leaving the test DB schemaless for V41+.target=39: skips V40 AND V41+; loses the entire subsequent migration chain.beforeMigrate.sql): would need to manipulate flyway_schema_history, forbidden by brief.Recommended next step (operator decision)
The narrowly-scoped fix is to edit V40 line 270-273 in place to use a single string literal (no
||), commit it, then runmvn flyway:repairagainst prod to refresh V40's checksum inflyway_schema_history. This is a 2-line code change + 1 operator command. The brief's "don't edit V40 in place" rule was written before Bug #2 was discovered and likely needs explicit operator approval to relax.Alternative: accept current state — Integration tests stays continue-on-error indefinitely, but Bug #1 is now off the critical path. When the operator deploys Option A (custom
pgvector+partmanimage,OPERATOR_ACTIONS_2026-05-12.mditem 1) and rebuilds api, V57 will idempotently convert prod's heap → partitioned regardless of V40's state.Why not edit V40 in place in THIS PR
Three previous in-place V40 patches were each reverted:
The reverts predate today's brief and may not have known about Bug #2. The brief's explicit "Don't rewrite V40 in place" is what I am honoring here. If the operator chooses to relax that on review, a 2-line follow-up patch + flyway repair clears CI fully.
Why not a new V62 production migration
V57 (
audit_logs_pg_partman.sql, merged in PR #68) already provides the prod-side conversion path: idempotent detection ofrelkind='r', full conversion when pg_partman is available, fail-soft when it isn't. A V62 duplicating that logic would be dead weight. And V62 would never RUN in CI anyway, because V40 fails first.Verification
Local SQL simulation (pre-CI):
CI run on this PR (commit 20d61ea):
||syntax error (Bug Add comprehensive README documentation for identity-core-api #2).Idempotency:
relkind='r'+ constraint name match. Re-running against a post-V40 DB falls through and no-ops. Verified locally.Coordination
flyway_schema_history.Deployment notes (operator)
feedback_audit_delta_before_rebuildstill applies whenever the next prod rebuild is scheduled.pgvector+partmanimage (OPERATOR_ACTIONS_2026-05-12.mditem 1) — is independent of this fix. V57's heap→partitioned conversion remains the prod-side path.Test plan
pgvector/pgvector:pg17locally — confirmed.||inCOMMENT ON ... IS) locally — confirmed unfixable in-grammar.mvn test-compilesucceeds — confirmed.flyway repairto resolve Bug Add comprehensive README documentation for identity-core-api #2 fully, OR accept partial fix with Integration tests staying continue-on-error.🤖 Generated with Claude Code