fix: Add paginated merge and load-vocab-source command #13
Open
nicoloesch wants to merge 2 commits into
Summary
Fixes #12
- Adds paginated merge to `orm-loader` so large staging-to-target merges commit in bounded batches rather than one transaction.
- Adds a `load-vocab-source` command to `omop-alchemy` with bulk mode, progress feedback, and crash-resilient retry.

orm-loader: paginated merge via `_rownum`

Staging tables now get a `_rownum BIGINT GENERATED ALWAYS AS IDENTITY` column at creation time. `merge_insert`, `merge_replace`, and `merge_upsert` all accept a `merge_batch_size` parameter (default 1 million rows). For tables larger than one batch, a `_rownum` index is built on the staging table and rows are processed in range-keyed batches, each committed independently. This bounds WAL accumulation to one batch per transaction instead of the full table. Small tables (below `merge_batch_size`) fall through to the original single-statement path.

The COPY statement was updated to include an explicit column list so the identity column is excluded from input.
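As a rough illustration of the range-keyed batching described above, here is a minimal sketch. It is not the `orm-loader` implementation: it uses SQLite from the Python standard library as a stand-in for PostgreSQL, and the names `rownum_ranges`, `paginated_merge_insert`, `staging`, and `target` are hypothetical, chosen only for this example.

```python
import sqlite3
from typing import Iterator, Tuple


def rownum_ranges(max_rownum: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (low, high) _rownum ranges covering 1..max_rownum."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    low = 1
    while low <= max_rownum:
        high = min(low + batch_size - 1, max_rownum)
        yield low, high
        low = high + 1


def paginated_merge_insert(conn: sqlite3.Connection, merge_batch_size: int) -> int:
    """Copy staging rows into target one _rownum range at a time.

    Hypothetical helper: returns the number of batches committed.
    """
    max_rownum = conn.execute(
        "SELECT COALESCE(MAX(_rownum), 0) FROM staging"
    ).fetchone()[0]
    batches = 0
    for low, high in rownum_ranges(max_rownum, merge_batch_size):
        conn.execute(
            "INSERT INTO target (id, name) "
            "SELECT id, name FROM staging WHERE _rownum BETWEEN ? AND ?",
            (low, high),
        )
        conn.commit()  # each batch is committed as its own transaction
        batches += 1
    return batches


def demo() -> Tuple[int, int]:
    """Load 10 staging rows and merge them in batches of 4."""
    conn = sqlite3.connect(":memory:")
    # SQLite stand-in for the identity column on the staging table.
    conn.execute(
        "CREATE TABLE staging ("
        "_rownum INTEGER PRIMARY KEY AUTOINCREMENT, id INTEGER, name TEXT)"
    )
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO staging (id, name) VALUES (?, ?)",
        [(i, f"concept {i}") for i in range(1, 11)],
    )
    conn.commit()
    batches = paginated_merge_insert(conn, merge_batch_size=4)
    merged = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    conn.close()
    return batches, merged
```

Because each range is keyed on `_rownum` rather than on an offset, a batch that selects no rows (for example, where identity values have gaps) is harmless, and the work done per transaction stays bounded by the batch size.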
omop-alchemy: `load-vocab-source` command

New `cli_vocab.py` implementing a `load-vocab-source` command with:

- `--bulk-mode`: disables FK triggers and drops indexes before loading, then rebuilds them after loading. Substantially faster than per-table management for a full vocabulary reload.
- `--merge-strategy`: `replace`, `upsert`, or `insert_if_empty`.
- `--merge-batch-size`: passed through to the orm-loader paginated merge.
- … `concept_ancestor`.
- … `insert_if_empty`, the partially loaded table is truncated before retrying. This is safe because FK triggers are disabled via `ALTER TABLE ... DISABLE TRIGGER ALL` and that state persists across crash and recovery.
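The bulk-mode sequence (disable triggers, drop indexes, load, rebuild) can be sketched as a small statement planner. This is an assumption-laden illustration, not the code in `cli_vocab.py`: the helper name `bulk_mode_plan`, the `index_ddl` mapping, and the index name used in the usage example are all hypothetical. Only the `ALTER TABLE ... DISABLE TRIGGER ALL` statement is taken from the description above.

```python
from typing import Dict, List, Tuple


def bulk_mode_plan(table: str, index_ddl: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (before_load, after_load) PostgreSQL statements for one table.

    index_ddl maps an index name to the CREATE INDEX statement that rebuilds it.
    Identifiers are interpolated as-is, so they must come from trusted metadata.
    """
    # Before loading: switch off triggers (including FK triggers), then drop indexes.
    before = [f"ALTER TABLE {table} DISABLE TRIGGER ALL"]
    before += [f"DROP INDEX IF EXISTS {name}" for name in index_ddl]
    # After loading: rebuild the indexes, then switch triggers back on.
    after = list(index_ddl.values())
    after.append(f"ALTER TABLE {table} ENABLE TRIGGER ALL")
    return before, after
```

A possible use, with a made-up index on the `concept` table:

```python
before, after = bulk_mode_plan(
    "concept",
    {"idx_concept_code": "CREATE INDEX idx_concept_code ON concept (concept_code)"},
)
```

Keeping the plan as plain lists of statements makes it easy to log or dry-run the sequence before executing it against a real database.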