refactor!: explicit backend api cleanup#6
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors orm-loader to make database-specific behavior explicit via a backend abstraction, while updating the PostgreSQL path to use psycopg (v3) and expanding tests/docs accordingly.
Changes:
- Introduces a
DatabaseBackendcontract with concrete SQLite/PostgreSQL implementations plusresolve_backend()dispatch. - Routes staging-table creation, fast-path loading, merge behavior, FK toggling, and materialized view operations through the resolved backend.
- Updates packaging to add a
postgresextra (psycopg[binary]), bumps version to0.4.0, and adds/updates backend-focused test coverage and docs.
Reviewed changes
Copilot reviewed 34 out of 36 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| uv.lock | Bumps package version and adds locked dependencies for new postgres extra (psycopg). |
| TODO.txt | Adds future-facing notes about optional malformed text repair. |
| tests/models.py | Adds an index to support index-management behavior in merge tests. |
| tests/conftest.py | Switches PostgreSQL test DSN to SQLAlchemy psycopg driver. |
| tests/backends/test_sqlite_backend.py | Adds unit tests for SQLite backend staging, FK toggling, merge SQL, and pragmas hooks. |
| tests/backends/test_postgres_backend.py | Adds unit tests for Postgres backend staging, FK toggling, merge SQL, and materialized view SQL emission. |
| tests/backends/test_base_backend.py | Adds contract tests for backend base class, capabilities, backend resolution, and import behavior without psycopg installed. |
| src/orm_loader/tables/typing.py | Tightens typing (generics, TypedDict/Unpack) and adds index_strategy/updated merge signatures to the protocol. |
| src/orm_loader/tables/serialisable_table.py | Improves type annotations for JSON/dict serialization helpers. |
| src/orm_loader/tables/orm_table.py | Tightens ORMTableBase typing and reduces cast() usage. |
| src/orm_loader/tables/loadable_table.py | Refactors staging/merge/index-management to use backend resolution and introduces index_strategy. |
| src/orm_loader/mappers/materialised_view_mixin.py | Routes MV create/refresh through backend implementations and improves typing. |
| src/orm_loader/loaders/loading_helpers.py | Updates PostgreSQL COPY helper to a psycopg3-style copy loop and introduces COPY_BLOCK_SIZE. |
| src/orm_loader/loaders/data_classes.py | Removes stale commented-out dedupe code and trailing whitespace. |
| src/orm_loader/helpers/sqlite.py | Replaces direct event-hook/pragmas logic with backend-delegated helpers for SQLite. |
| src/orm_loader/helpers/logging.py | Simplifies log-level coercion and adds typing for formatter methods. |
| src/orm_loader/helpers/discovery.py | Updates typing and changes defaulting behavior for the base parameter. |
| src/orm_loader/helpers/bulk.py | Delegates FK toggling and bulk-load context behavior to the resolved backend; delegates replica-role engine context. |
| src/orm_loader/helpers/bootstrap.py | Adds type annotations for bootstrap/schema creation helpers. |
| src/orm_loader/helpers/init.py | Re-exports new/renamed SQLite helpers. |
| src/orm_loader/backends/sqlite.py | Adds SQLite backend implementation (staging, merge SQL, FK toggling, connect hooks, FK error explanation, journal restore). |
| src/orm_loader/backends/resolve.py | Adds backend resolver selecting a concrete backend based on SQLAlchemy dialect. |
| src/orm_loader/backends/postgres.py | Adds Postgres backend implementation (unlogged staging, COPY fast-path, merge SQL, MV operations, replica-role context). |
| src/orm_loader/backends/base.py | Adds backend base contract, capabilities, shared helpers, and a generic bulk-load context manager. |
| src/orm_loader/backends/init.py | Exposes backend public API and resolver. |
| README.md | Updates project description and documents backend support and staged ingestion approach. |
| pyproject.toml | Bumps version, adds postgres optional dependency, and expands metadata/classifiers and tooling config. |
| docs/tables/mat_view.md | Updates MV docs to reflect backend resolution and PostgreSQL-only support. |
| docs/tables/loadable_table.md | Updates staged ingestion docs to reflect backend-specific behavior and index handling. |
| docs/loaders/loaders.md | Clarifies loader responsibilities vs table/backend merge behavior and updates dedupe description. |
| docs/loaders/index.md | Updates loader overview to emphasize staged loading and table/backend ownership of merge semantics. |
| docs/loaders/helpers.md | Updates helper docs, including PostgreSQL COPY behavior and fallback behavior. |
| docs/loaders/context.md | Documents quote_mode in loader context. |
| docs/index.md | Updates project overview and explicitly documents current backend support limits. |
| CHANGELOG.md | Adds 0.4.0 notes about psycopg and backend API cleanup. |
| .gitignore | Adds _temp/ ignore entry. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
nicoloesch
left a comment
There was a problem hiding this comment.
Solid work overall. The backend abstraction is clean and the three stated goals are met. A few things to address before merge:
Medium
-
SQL injection in
quick_load_pg(loading_helpers.py):encodinganddelimiterare interpolated directly into the COPY SQL string without validation.delimiteris safe in practice (only,or\t), butencodingcomes from chardet and should be checked against an allow-list before interpolation. -
quote_modedefault mismatch:load_csvdefaults to"csv",quick_load_pgdefaults to"auto". Auto-detection is the main new feature but it's unreachable through the normal API path. Suggest changingload_csv's default to"auto"(and updatingLoaderContextto match) so it actually fires by default.
Low / Cleanup
-
session.binddeprecation (loadable_table.pylines 96, 120, 180, 221, 441): Deprecated in SA 2.x; can returnNonesilently with the recommended session construction pattern. Replace withsession.get_bind(). -
Bare
assert isinstance(...)in production backend code (postgres.pylines 101, 107;sqlite.pylines 115, 121): Assertions are stripped bypython -O. Useif not isinstance(...): raise RuntimeError(...). -
Orphaned comment block in
data_classes.py(lines 174–199): The_dedupe_dbmethod stub was removed but its commented-out body wasn't. Delete these lines. -
get_model_by_tablename(discovery.py): Only searches directBasesubclasses. Won't find models behind a mixin layer. Functionality is also not in main but we may need to address this. -
helpers/sqlite.pyclarity: File is a valid compat shim (old callers usingenable_sqlite_foreign_keysas a bare event hook function still work). The docstring should say so explicitly and point callers towardattach_sqlite_bulk_load_pragmasas the preferred path.
nicoloesch
left a comment
There was a problem hiding this comment.
2 asserts are still present, one comment about docstring. Apart from that, approved!
|
All comments addressed - merging |
BREAKING CHANGE: known breaking change - removal of the dialect argument from _merge_replace() and _merge_upsert().
Opinionated backend now means that in the unlikely event that someone was using this on some other dialect, it will no longer work, but normal usage on SQLite and PostgreSQL is intended to remain functionally the same.
CLOSES: #3, CLOSES: #5, CLOSES: #7