feat(api): bulk upload id contract — sequence advance and sanity bound (#2362) by jh-RLI · Pull Request #2367 · OpenEnergyPlatform/oeplatform

jh-RLI · 2026-07-03T20:47:47Z

Summary of the discussion

This PR stacks ontop of #2364 and #2366 wait until both are merged and rebase this PR.

Part of #2362 (Slice 4 — id contract). Clients may include or omit the id
column in a bulk upload:

Omitted: the table's id sequence assigns ids as usual.
Included: after COPY, still inside the upload's transaction, the id
sequence is advanced to the table's max(id) (setval(GREATEST(…))), so
a subsequent row-API insert can't hit a duplicate key. A
pg_advisory_xact_lock on the sequence serializes concurrent uploads —
setval is non-transactional, and two racing reads could otherwise move
the sequence backwards. The sequence never moves backwards.
Sanity bound: uploads that raise the table's max(id) above 2^48
are rejected and rolled back, so one upload can't exhaust the sequence
for every writer of a shared table. The bound judges only ids introduced
by the upload itself — a pre-existing high id (the row API enforces no
bound) does not poison the table for future bulk uploads.

The review pass caught two real defects before commit: the bound originally
judged the whole table's max(id) (a table with one legitimately huge
pre-existing id would reject every future id-bearing upload), and the
unserialized setval race. Both fixed and tested.

Tests: module grows to 25 HTTP-seam tests (no-id sequence assignment,
explicit-ids-then-row-insert collision check, bound rejection with
rollback, monotonic sequence, pre-existing-high-id regression). Full suite
green. Changelog entry included.

Automation

Closes #

PR-Assignee

🐙 Follow the workflow in
CONTRIBUTING.md
📝 Update the
CHANGELOG.md
📙 Update the documentation on
mkdocs

Reviewer

🐙 Follow the
Reviewer Guidelines
🐙 Provided feedback and show sufficient appreciation for the work done

Adds POST /api/v0/tables/<table>/bulk-upload - the tracer bullet of the bulk upload path (slice 2 of #2362): - The request body IS the CSV (text/csv); the server streams it into PostgreSQL COPY FROM STDIN without buffering the file in memory. - Append-only, one transaction per request: a malformed row anywhere rolls back the entire upload. - The delimiter is a required, whitelisted parameter (comma, semicolon, tab) - never inferred from metadata or content. - The CSV header (required) maps columns by name; header names are whitelisted against the table's actual columns and quoted, so no unvalidated identifier ever reaches the SQL. - Same authorization chain as the row API: token auth, write permission, embargo check, table-registry resolution (internal tables are unreachable by construction). - Deliberately bypasses the edit-journal meta tables: bulk-loaded rows have no per-row change history. This trades the (currently unread) per-row provenance for order-of-magnitude ingestion speed; an audit event record follows in a later slice. - COPY is FROM STDIN only; no code path for COPY FROM file/PROGRAM. New HTTP-seam test module api/tests/test_bulk_upload.py (14 tests) covers the happy path per delimiter, auth/permission/embargo denials, all-or-nothing rollback, journal bypass, and target-table containment. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Slice 3 of #2362, on top of the bulk upload tracer bullet: - Header preflight before the body streams: reject duplicate column names, names not in the table, and missing required columns (NOT NULL without default), each with a 400 naming the offenders. - Strip a UTF-8 BOM from the header (Excel exports). - FORCE_NULL on all uploaded columns: an empty field is NULL whether quoted or not - a deliberate deviation from COPY's native CSV rule, because many writers quote every field and would silently store empty strings instead of NULLs. - Sanitized failure responses: the database's data-level message plus the CSV line number and column (header-adjusted), never raw SQL, server context dumps, or internal paths - including the no-diagnostics fallback (lost connection), which stays generic. Test module grows to 20 HTTP-seam tests covering each contract rule. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

#2362) Slice 4 of #2362. Clients may include or omit the id column: - Omitted: the table's id sequence assigns ids as usual. - Included: after COPY, still inside the upload's transaction, the id sequence is advanced to the table's max(id) via setval(GREATEST(...)), so a subsequent row-API insert cannot hit a duplicate key. GREATEST plus a pg_advisory_xact_lock on the sequence keep the sequence from ever moving backwards, including under concurrent uploads (setval is non-transactional, so racing reads could otherwise regress it). - Uploads that RAISE the table's max(id) above a generous sanity bound (2^48) are rejected and rolled back, so a single upload cannot exhaust the id sequence for every writer of a shared table. The bound only judges ids introduced by the upload itself: a pre-existing high id (the row API enforces no bound) does not poison the table for future bulk uploads. Test module grows to 25 HTTP-seam tests. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

jh-RLI and others added 3 commits July 3, 2026 18:51

jh-RLI self-assigned this Jul 3, 2026

jh-RLI added the blocked label Jul 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(api): bulk upload id contract — sequence advance and sanity bound (#2362)#2367

feat(api): bulk upload id contract — sequence advance and sanity bound (#2362)#2367
jh-RLI wants to merge 3 commits into
developfrom
feature-2362-bulk-upload-id-contract

jh-RLI commented Jul 3, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

jh-RLI commented Jul 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary of the discussion

Automation

PR-Assignee

Reviewer

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jh-RLI commented Jul 3, 2026 •

edited

Loading