Skip to content

fix(lang-var-migration): use JDBC savepoint to isolate unique_fields failures#35565

Merged
dsilvam merged 2 commits intomainfrom
fix/lang-var-migration-unique-fields-cascade
May 5, 2026
Merged

fix(lang-var-migration): use JDBC savepoint to isolate unique_fields failures#35565
dsilvam merged 2 commits intomainfrom
fix/lang-var-migration-unique-fields-cascade

Conversation

@fabrizzio-dotCMS
Copy link
Copy Markdown
Member

@fabrizzio-dotCMS fabrizzio-dotCMS commented May 5, 2026

Problem

Closes #35568
Related to support ticket: https://helpdesk.dotcms.com/a/tickets/35427

When StartupTasksExecutor.executeDataUpgrades() wraps the entire migration task in a single PostgreSQL transaction via HibernateUtil.startTransaction(), a duplicate-key violation in unique_fields aborts the entire shared connection. Every subsequent variable then fails with:

ERROR: current transaction is aborted, commands ignored until end of transaction block

even though only the first variable had a conflict. In production this caused hundreds of Language Variables to fail in cascade after a single collision.

Root Cause of the Orphan

Orphaned unique_fields rows exist because DBUniqueFieldValidationStrategy.innerValidate() is annotated @CloseDBIfOpened, which forces the INSERT INTO unique_fields to commit on a separate connection independently of the outer contentlet transaction. If that outer transaction rolls back, the unique_fields row persists. Root cause fix is in PR #35567 (issue #35566).

Fix

Added publishWithSavepoint() in LegacyLangVarMigrationHelper. Before each publish() call:

  1. A JDBC savepoint is set on the current connection.
  2. On failure: conn.rollback(savepoint) resets the PostgreSQL connection from aborted back to active, without touching the outer transaction.
  3. On success: the savepoint is released.

Only the conflicting variable is skipped; all remaining variables in the file continue normally.

Test

Added testMigrationVariableFailureDoesNotCascadeToSubsequentVariables() which:

  1. Starts an outer transaction reproducing the StartupTasksExecutor context.
  2. Pre-inserts an orphaned unique_fields row for key_conflict.
  3. Runs the migration with a two-entry file (key_conflict, key_valid).
  4. Asserts key_conflict fails (expected) and key_valid succeeds (was also failing before fix).

Verified: test fails before the fix, passes after.

Scope

Only LegacyLangVarMigrationHelper.java and the integration test are modified. No changes to unique_fields infrastructure.

Related

Test plan

  • Task240306MigrateLegacyLanguageVariablesTest#testMigrationVariableFailureDoesNotCascadeToSubsequentVariables — must pass
  • Full Task240306MigrateLegacyLanguageVariablesTest suite — no regressions
  • testExecuteUpgrade, testDataTaskIdempotency, testDropThenRecreateLanguageVariableContentType — must pass

🤖 Generated with Claude Code

…failures

When StartupTasksExecutor wraps the migration in a single PostgreSQL
transaction, a duplicate-key violation in unique_fields aborts the entire
connection, causing all subsequent variables to fail with "current
transaction is aborted". Using a JDBC savepoint before each publish()
resets the connection state after a failure so only the conflicting
variable is skipped.

Adds a regression test that reproduces the production cascade failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 5, 2026

❌ Issue Linking Required

This PR could not be linked to an issue. All PRs must be linked to an issue for tracking purposes.

How to fix this:

Option 1: Add keyword to PR body (Recommended - auto-removes this comment)
Edit this PR description and add one of these lines:

  • This PR fixes #123 or Fixes: #123

  • This PR closes #123 or Closes: #123

  • This PR resolves #123 or Resolves: #123

  • Other supported keywords: fix, fixed, close, closed, resolve, resolved
    Option 2: Link via GitHub UI (Note: won't clear the failed check)

  1. Go to the PR → Development section (right sidebar)

  2. Click "Link issue" and select an existing issue

  3. Push a new commit or re-run the workflow to clear the failed check
    Option 3: Use branch naming
    Create a new branch with one of these patterns:

  • 123-feature-description (number at start)

  • issue-123-feature-description (issue-number at start)

  • feature-issue-123 (issue-number anywhere)

Why is this required?

Issue linking ensures proper tracking, documentation, and helps maintain project history. It connects your code changes to the problem they solve.---

This comment was automatically generated by the issue linking workflow

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 5, 2026

Claude finished @fabrizzio-dotCMS's task in 2m 45s —— View job


PR Review: fix(lang-var-migration): JDBC savepoint isolation

Overall: The fix is conceptually correct and well-targeted. The savepoint approach is appropriate for this problem. Several implementation issues worth addressing.


Issues

1. Duplicate catch blocks — use finally instead

LegacyLangVarMigrationHelper.java lines 378–391: Both catch blocks do identical savepoint rollback logic. If publish() throws an Error (e.g., OutOfMemoryError), neither fires and the connection stays in aborted state.

Replace with a finally block:

private Contentlet publishWithSavepoint(final Contentlet contentlet) throws DotSecurityException, DotDataException {
    final Connection conn = Try.of(DbConnectionFactory::getConnection).getOrNull();
    final Savepoint sp = conn != null ? Try.of(() -> conn.setSavepoint()).getOrNull() : null;
    boolean success = false;
    try {
        final Contentlet result = this.publish(contentlet);
        success = true;
        return result;
    } finally {
        if (sp != null) {
            if (success) {
                Try.run(() -> conn.releaseSavepoint(sp))
                        .onFailure(ex -> Logger.warn(this, "Could not release savepoint: " + ex.getMessage()));
            } else {
                Try.run(() -> conn.rollback(sp))
                        .onFailure(ex -> Logger.warn(this, "Could not rollback savepoint: " + ex.getMessage()));
            }
        }
    }
}

Fix this →


2. Silent degradation when savepoint cannot be set

Lines 368–369:

final Connection conn = Try.of(DbConnectionFactory::getConnection).getOrNull();
final Savepoint sp = conn != null ? Try.of(() -> conn.setSavepoint()).getOrNull() : null;

If getConnection() fails or setSavepoint() throws (e.g., connection already in auto-commit mode), sp is null and the method silently falls back to the original unsafe behavior — no warning is logged. A failed savepoint setup should at least emit a Logger.warn so the situation is visible in production logs.

Fix this →


3. Test success assertion doesn't verify key_valid specifically

Task240306MigrateLegacyLanguageVariablesTest.java (new test, near end):

assertFalse("'key_valid' must have been successfully migrated", successes.isEmpty());

successes contains inodes, not keys — there's no way to confirm the inode belongs to key_valid without querying the contentlet. But at minimum the assertion should also confirm key_valid is not missing from a content search, or restructure to check the inode is non-null after the migration call. As written, this passes if any variable was previously migrated and left a lingering entry in successSummary across test state.

Consider adding a check: after migration, look up the contentlet by language and key key_valid and assert it exists.


4. Minor: Docstring in UniqueFieldCriteria.criteria() has inverted logic

UniqueFieldCriteria.java line 113–117 (not changed in this PR, but related): The docstring says "If uniquePerSite is true, concatenate without site ID; otherwise add site ID" — which is exactly backwards from what the code does. Low risk here since the test's criteria construction is correct, but the misleading docstring could trap the next person touching this code.


Not an issue

  • DbConnectionFactory.getConnection() is thread-local and ManagedConnection/DotConnectionWrapper properly delegates setSavepoint()/rollback(Savepoint)/releaseSavepoint() to the inner connection — verified in source. The savepoint is on the right connection.
  • The test's hand-constructed criteria string (contentTypeId + fieldVar + languageId + value) matches what UniqueFieldCriteria.criteria() actually produces for uniquePerSite=false. The hash collision will fire correctly.
  • The HibernateUtil.rollbackTransaction() in the test's finally block correctly cleans up both the orphan row and any contentlets created during the migration.

@dsilvam dsilvam added this pull request to the merge queue May 5, 2026
Merged via the queue into main with commit 47956c3 May 5, 2026
51 checks passed
@dsilvam dsilvam deleted the fix/lang-var-migration-unique-fields-cascade branch May 5, 2026 15:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI: Safe To Rollback Area : Backend PR changes Java/Maven backend code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

fix(lang-var-migration): unique_fields duplicate key cascades and aborts entire migration transaction

2 participants