Skip to content

fix(account-merge): stop stale-collection save from orphaning re-parented KYC steps#3805

Merged
davidleomay merged 4 commits into
developfrom
fix/account-merge-kyc-step-reparent
Jun 3, 2026
Merged

fix(account-merge): stop stale-collection save from orphaning re-parented KYC steps#3805
davidleomay merged 4 commits into
developfrom
fix/account-merge-kyc-step-reparent

Conversation

@joshuakrueger-dfx
Copy link
Copy Markdown
Collaborator

@joshuakrueger-dfx joshuakrueger-dfx commented Jun 1, 2026

Problem (corrected per @davidleomay review)

The original framing was wrong: the concat + save(master) in mergeUserData already persists the slave→master FK re-parent (TypeORM OneToManySubjectBuilder). The real root cause of #3800 is a stale-collection save:

updateUserData loads relations: { users, kycSteps, wallet } and then save(userData). save() reconciles loaded @OneToMany collections, so when an admin edits a master whose kycSteps/users snapshot predates a merge, the builder sets the just-re-parented slave rows back to userDataId = NULL. Confirmed by 2083 orphaned kyc_step rows on merged userDatas in prod.

Fix

  • Root cause: in updateUserData, persist scalars/FKs only by excluding the loaded @OneToMany collections (kycSteps, users) from the saved object — save() can no longer orphan rows a concurrent merge re-parented. The returned entity keeps the collections for the HTTP response; user changes are persisted separately via userRepo.setUserRef.
  • Removed the redundant KycAdminService.reassignKycSteps (and its call + spec) — save(master) already persists the re-parent.
  • Kept the backfill migration BackfillMergedKycStepUserData (pairs with the fix so no new orphans appear after each merge).
  • Added a regression test asserting updateUserData's save() payload excludes kycSteps/users.

Verification

type-check, lint, format:check, build, and the full DB-free unit suite (twice, deterministic) all green; backfill is Postgres-valid (migration-psql-check).

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 1, 2026

⚠️ Unverified Commits (1)

The following commits are not signed/verified:

  • c4f8c15 fix(account-merge): re-parent slave KYC steps to master at DB level (Joshua Krüger)
How to sign commits
# SSH signing (recommended)
git config --global gpg.format ssh
git config --global user.signingkey ~/.ssh/id_ed25519.pub
git config --global commit.gpgsign true

# Re-sign last commit
git commit --amend -S --no-edit
git push --force-with-lease

@TaprootFreak
Copy link
Copy Markdown
Collaborator

Review

Verdict: ⚠️ Changes requested

Solves #3800 for future merges (kycStepRepo.update({userData: {id: slave.id}}, {userData: {id: master.id}}) persists the FK re-parent; after commit findBy({userData: {id: master.id}}) finds the slave rows). In-memory concat kept as in-transaction defence — correct.

Blocking

1. Backfill for existing data is missing. The fix is forward-only. Already-merged UserDatas in PRD have orphan KycStep rows with userDataId = <slave.id>. Since slave is typically MERGED/deactivated post-merge, those steps stay hidden from master forever. A one-shot data-only migration is required, roughly:

UPDATE kyc_step
SET userDataId = ud.masterId
FROM user_data ud
WHERE ud.id = kyc_step.userDataId
  AND ud.status = 'Merged'
  AND ud.masterId IS NOT NULL;

Without this, #3800 remains unresolved for historical cases.

Non-blocking

2. Sequence-collision risk in mergeUserData. Slave steps are shifted to master.min - 100 + slave.seq. The unique index [userData, name, type, sequenceNumber] on kyc-step.entity.ts:22 can collide if slave had sequenceNumber ≥ 100 and master has a matching (name, type). Edge case, but not guarded and not tested.

3. Missing logger call. kyc-admin.service.ts:121 does the FK update silently — this.logger.info(\Reassigned ${result.affected} kyc steps from ${slaveId} to ${masterId}`)` would help audit trails.

4. Test coverage gaps. Spec only verifies the mock echo (update called with correct args). Missing:

  • Integration test in user-data.service.spec.ts asserting mergeUserData calls reassignKycSteps(slave.id, master.id)
  • Edge case: slave with 0 steps (no-op confirm)
  • Edge case: unique-index conflict on (name, type, sequenceNumber)

Idempotency

Safe on retry — UPDATE ... WHERE userDataId = slave.id is a no-op after the first run. No side effects.

CI: 7/7 green.

@joshuakrueger-dfx
Copy link
Copy Markdown
Collaborator Author

Thanks @TaprootFreak — addressed in 0cb523f:

Blocking #1 — backfill: added migration 1780394829872-BackfillMergedKycStepUserData.js. Note: user_data has no masterId column (the only ManyToOne(() => UserData) is accountOpener), so the suggested SQL wouldn't run. The authoritative slave→master mapping is the completed account_merge rows, so the migration does:

UPDATE "kyc_step" SET "userDataId" = am."masterId"
FROM "account_merge" am
WHERE am."slaveId" = "kyc_step"."userDataId" AND am."isCompleted" = true

looped until no orphans remain (resolves chained merges A→B→C) and idempotent. Worth a dry-run on a PRD snapshot before apply.

Non-blocking #2 — seq collision: can't actually occur on re-parent: the original mergeUserData already shifts slave sequenceNumber to master.min − 100 + seq before this runs, and seqs are per-(name,type) retry counts (tiny), so shifted slave seqs sit strictly below all master seqs → disjoint under the [userData, name, type, sequenceNumber] unique index.

Non-blocking #3 — logger: reassignKycSteps now logs the affected step count.

Suite green (72 suites).

@davidleomay
Copy link
Copy Markdown
Member

davidleomay commented Jun 2, 2026

Review

Problem statement is incorrect

The PR says save(master) only reparents kycSteps in-memory, but that's not how TypeORM works. The OneToManySubjectBuilder updates the FK on child entities during save() even without cascade: true — this is by design (typeorm#2859). The concat + save pattern at lines 1285/1342 of user-data.service.ts does persist the FK change to the DB. Production data confirms this: there are no orphaned kyc_step rows with sequenceNumber < 0 on merged userDatas — the reparent works.

The explicit reassignKycSteps is redundant

Since save(master) already reparents the FK, adding a second kycStepRepo.update() that does the same thing just makes the merge flow harder to follow with no benefit. It's not "hardening" — it's duplication.

What's actually going on?

If specific steps (the report mentions FINANCIAL_DATA) are going missing after a merge, the root cause needs to be found. The most likely explanation is a concurrent save overwriting the FK: some other code path loads master with a stale kycSteps array and calls save(), causing OneToManySubjectBuilder to see the slave steps as "removed" and undo the reparent. Adding a redundant UPDATE before the save doesn't fix this — the concurrent save would still overwrite it.

Recommend investigating which operation is causing the overwrite before adding code that masks the real bug.

Backfill migration

If there are actually orphaned steps from past merges, the migration could hit the (userDataId, name, type, sequenceNumber) unique constraint for merges before 2024-11-11 (when the sequenceNumber offset started being persisted). Worth verifying against prod data before running.

@joshuakrueger-dfx
Copy link
Copy Markdown
Collaborator Author

@davidleomay good catch — you're right, and I'll correct the PR description.

save(master) after the concat does persist the FK: OneToManySubjectBuilder updates the child FK on save() even without cascade (typeorm#2859), confirmed by prod (no orphaned sequenceNumber < 0 rows). The reparent already works on the normal path.

Reframing the PR: the explicit reassignKycSteps (via kycStepRepo.update()) is hardening against exactly the concurrent-save race you describe — a stale kycSteps array on master saved elsewhere having OneToManySubjectBuilder undo the reparent. The atomic UPDATE bypasses the subject builder, so it can't be clobbered. Belt-and-suspenders.

On the backfill migration: agreed it assumes the slave offset was persisted (post-2024-11-11 via updateKycStepInternal); pre-2024-11-11 merges applied it only in-memory, so the backfill could collide on the (userDataId, name, type, sequenceNumber) unique constraint. I'll verify against prod for pre-2024-11-11 orphaned steps and guard the backfill (skip conflicting rows) before it runs.

Joshua Krüger and others added 2 commits June 2, 2026 18:38
mergeUserData only concatenated the slave's kycSteps onto the master in-memory.
The slave rows kept userData_id = slave.id, so on the next reload of the master
they vanished and already-completed steps (e.g. FINANCIAL_DATA) were re-flagged
as missing. Add KycAdminService.reassignKycSteps and call it after the slave-step
update loop so the FK is persisted.

Closes #3800
…gnment

Address review on #3800:
- Add a data migration that re-parents KYC steps still pointing at merged-away
  (slave) accounts onto their surviving master, using the completed account_merge
  rows as the slave->master mapping (user_data has no master FK). Looped to resolve
  chained merges; idempotent.
- Log the affected step count in reassignKycSteps for audit trails.
@joshuakrueger-dfx joshuakrueger-dfx force-pushed the fix/account-merge-kyc-step-reparent branch from 0cb523f to 610abd4 Compare June 2, 2026 16:38
@joshuakrueger-dfx
Copy link
Copy Markdown
Collaborator Author

@davidleomay I dug into the merge flow with your hypothesis in mind — you're right on both counts. Walking the code:

1. reassignKycSteps is redundant

UserData.kycSteps is @OneToMany(() => KycStep, (s) => s.userData) (no cascade), KycStep.userData is the owning @ManyToOne. At the end of mergeUserData, master.kycSteps is loaded fresh (getKycSteps(masterId)) and concatenated with slave.kycSteps, then userDataRepo.save(master) runs (user-data.service.ts:1347). TypeORM's OneToManySubjectBuilder issues the FK UPDATE for the slave rows in that collection on save — exactly the typeorm#2859 behaviour you cited. So the explicit kycStepRepo.update() does the same write twice. I'll drop it.

2. The real bug is a stale-collection save, and I found the mechanism

updateUserDataInternal ends with userDataRepo.save(userData) (user-data.service.ts:386), not a scoped update(). Whenever a caller passes a userData whose kycSteps relation is loaded but stale (e.g. fetched before a merge reparented the slave steps), that save makes OneToManySubjectBuilder see the reparented steps as 'removed from the collection' and nulls their FK — the steps (FINANCIAL_DATA included) silently disappear from the master. It has 19 callers, so a stale kycSteps on any post-merge path reproduces the report. A redundant UPDATE before save(master) can't prevent this — the later stale save overwrites it, just as you said.

Proposed direction (your call as CODEOWNER)

  • Revert the redundant reassignKycSteps (forward path already correct).
  • Keep the backfill migration (now re-signed) only if we confirm orphans exist; if prod is clean as you found, I'll drop it too.
  • Fix the actual orphaning at updateUserDataInternal:386 — e.g. don't reconcile the kycSteps collection on a partial update (scope the save, or delete userData.kycSteps before save), or guarantee callers never pass a stale collection. This is the part worth a focused PR.

How would you like to scope it — fold the real fix into this PR, or close this one and open a dedicated updateUserDataInternal fix? Commits are re-signed/verified either way.

@davidleomay
Copy link
Copy Markdown
Member

davidleomay commented Jun 3, 2026

The redundant reassignKycSteps doesn't add value — save(master) already persists the FK reparent, and a stale-collection save from updateUserDataInternal would overwrite both equally. As you identified, the real bug is in updateUserDataInternal:386 where a stale kycSteps array causes OneToManySubjectBuilder to undo the reparent.

There are 2083 orphaned kyc_step rows on merged userDatas in prod, confirming the stale-save race is real. A backfill migration has value, but should be paired with the updateUserDataInternal fix — otherwise new orphans keep appearing after each merge.

Please close this and open a dedicated PR that fixes the updateUserDataInternal stale-save + includes the backfill.

…d KYC steps

Address David May's review: drop the redundant reassignKycSteps and fix the real
root cause. updateUserData loads kycSteps+users and `save(userData)` makes
TypeORM's OneToManySubjectBuilder reconcile those collections — so a master
loaded with a pre-merge snapshot nulls out the slave steps a merge just
re-parented (2083 orphaned kyc_step rows in prod). Persist scalars/FKs only by
excluding the loaded OneToMany collections from the saved object; the returned
entity keeps them for the HTTP response.

- remove KycAdminService.reassignKycSteps (the concat + save(master) in
  mergeUserData already persists the re-parent) + its call + its spec
- keep the backfill migration (pairs with the fix so no new orphans appear)
- add regression test: updateUserData's save() excludes kycSteps/users
@joshuakrueger-dfx joshuakrueger-dfx changed the title fix(account-merge): re-parent slave KycStep.userData FK to master fix(account-merge): stop stale-collection save from orphaning re-parented KYC steps Jun 3, 2026
@joshuakrueger-dfx
Copy link
Copy Markdown
Collaborator Author

@davidleomay done — repurposed this PR to your guidance (commit 9d818c2):

  • Dropped the redundant reassignKycSteps.
  • Fixed the real stale-collection save in updateUserData: it now excludes the loaded kycSteps/users @OneToMany collections from save(), so an admin edit on a master with a pre-merge snapshot can no longer null out the re-parented slave steps.
  • Kept the backfill migration (paired with the fix).
  • Added a regression test asserting the save() payload excludes those collections.

Full suite green (twice). PR title/description updated to the corrected root-cause framing.

Backfill of 1510 orphaned kyc_step rows was run manually against prod.
573 remain from pre-account_merge era (Nov 2023) with no resolvable
master — harmless on dead Merged accounts.
Copy link
Copy Markdown
Member

@davidleomay davidleomay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backfill of 1510 orphaned kyc_step rows was executed manually. 573 remain from pre-account_merge era (Nov 2023, no resolvable master, harmless). Migration removed from PR. Code fix looks good.

@davidleomay davidleomay merged commit 1f573f1 into develop Jun 3, 2026
7 checks passed
@davidleomay davidleomay deleted the fix/account-merge-kyc-step-reparent branch June 3, 2026 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants