Skip to content

fix(postgres): strip trailing semicolon before subquery-wrapping#2407

Merged
goldmedal merged 3 commits into
Canner:mainfrom
Bartok9:fix/postgres-strip-trailing-semicolon
Jun 30, 2026
Merged

fix(postgres): strip trailing semicolon before subquery-wrapping#2407
goldmedal merged 3 commits into
Canner:mainfrom
Bartok9:fix/postgres-strip-trailing-semicolon

Conversation

@Bartok9

@Bartok9 Bartok9 commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Summary

  • PostgresConnector.query() and dry_run() wrap user SQL as SELECT * FROM ({sql}) AS _sub LIMIT N. A trailing ; in the user SQL made the wrapped subquery invalid: SELECT * FROM (SELECT 1;) AS _sub LIMIT N → Postgres syntax error at or near ";".
  • So any query ending in a semicolon failed whenever a limit or dry-run wrap was applied — even though the same query runs fine unwrapped.

Motivation

The canner, trino, and clickhouse connectors already strip the terminating semicolon run before subquery-wrapping (each has a _strip_trailing_semicolon helper). The postgres connector — the reference psycopg implementation — was the inconsistent one and wrapped the raw SQL:

sql = f"SELECT * FROM ({sql}) AS _sub LIMIT {limit}"   # (SELECT 1;) -> invalid

Fix

Add the same _strip_trailing_semicolon helper (regex [;\s]+\Z, so it strips only the trailing run and preserves semicolons inside string literals like SELECT 'a;b') and call it at both wrap sites (query and dry_run).

Real behavior proof

Real environment: Python 3.12, core/wren via uv run --no-sync (with the postgres extra installed for the import), repo main.

Commands:

uv run --no-sync pytest tests/unit/test_postgres_strip_semicolon.py -q
uv run --no-sync pytest tests/unit -q

AFTER fix:

4 passed                       # test_postgres_strip_semicolon.py
780 passed, 42 skipped          # full unit suite

Pre-fix wrap (demonstrates the bug):

pre-fix sent: SELECT * FROM (SELECT 1;) AS _sub LIMIT 5   # invalid subquery
post-fix:     SELECT * FROM (SELECT 1) AS _sub LIMIT 5

Tests

tests/unit/test_postgres_strip_semicolon.py uses a mocked connection (no live DB) and asserts the SQL the connector actually executes for query and dry_run is the stripped, valid form. The two behavior tests fail without the call-site fix.

Scope / risk

Apache-2.0 path (core/wren). Touches only the two wrap sites + a new pure helper. SQL without a trailing semicolon is byte-identical to before; semicolons inside string literals are preserved. (The same gap exists in redshift/duckdb/datafusion; happy to follow up with a consistent sweep if you'd prefer one PR for all.)

Summary by CodeRabbit

  • Bug Fixes

    • Improved PostgreSQL query handling so trailing semicolons and extra whitespace are removed before SQL is wrapped and limited.
    • Applied the same behavior to both normal queries and dry runs, ensuring more reliable results.
    • Preserved semicolons that appear inside quoted text, avoiding unintended query changes.
  • Tests

    • Added coverage for trailing-semicolon handling in both query execution and dry-run paths.
    • Added checks to confirm quoted semicolons remain unchanged.

query()/dry_run() wrap user SQL as 'SELECT * FROM ({sql}) AS _sub LIMIT N'.
A trailing ';' made the subquery invalid (Postgres: syntax error at or
near ";"), so any query ending in a semicolon failed when a limit/dry-run
wrap was applied. The canner, trino, and clickhouse connectors already
strip the terminating semicolon run before wrapping; postgres did not.
Add the same _strip_trailing_semicolon helper (preserves semicolons inside
string literals) and use it at both wrap sites.
@github-actions github-actions Bot added python Pull requests that update Python code core labels Jun 28, 2026
@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ba3e6927-4051-44c9-985e-9d23572350cb

📥 Commits

Reviewing files that changed from the base of the PR and between f419c90 and 5cf7eb0.

📒 Files selected for processing (1)
  • core/wren/tests/connectors/test_postgres.py

Walkthrough

The PostgreSQL connector now strips trailing semicolons and whitespace from SQL before wrapping it in a subquery with LIMIT. A _strip_trailing_semicolon helper and compiled regex were added, applied in both query() and dry_run(). Unit tests cover all new code paths without a live database.

Changes

Trailing semicolon stripping in Postgres connector

Layer / File(s) Summary
Helper, regex, and call sites
core/wren/src/wren/connector/postgres.py
Adds re import, _TRAILING_SEMICOLONS_RE compiled regex, and _strip_trailing_semicolon(sql) helper. Applies the helper in query() and dry_run() when constructing the SELECT * FROM (...) AS _sub LIMIT ... wrapper.
Unit tests
core/wren/tests/connectors/test_postgres.py
Adds a non-container test section with a mocked connector factory and tests for query(), dry_run(), and direct helper behavior including embedded and absent semicolons.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

A semicolon crept in at the end,
But the rabbit said "No, I won't bend!"
🐇 Strip it clean, wrap it neat,
The subquery's complete —
No syntax errors around the bend! ✂️

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.75% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: Postgres subquery wrapping now strips trailing semicolons first.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@goldmedal

Copy link
Copy Markdown
Collaborator

@Bartok9, please check the CI fails. Thanks.

The 'unit tests' CI job does not install the psycopg/postgres extra (only the
dedicated 'postgres tests' job does). test_postgres_strip_semicolon.py imports
wren.connector.postgres, which imports psycopg at module load, so collection
raised ModuleNotFoundError: No module named 'psycopg' and failed the entire
unit suite.

Add pytest.importorskip('psycopg') so the test skips cleanly in the unit job
and still runs in the postgres job (which has the driver). No product code
change.
@Bartok9

Bartok9 commented Jun 29, 2026

Copy link
Copy Markdown
Contributor Author

Thanks @goldmedal — CI fix pushed. ✅

Root cause: the unit tests job failed at collection with ModuleNotFoundError: No module named 'psycopg'. test_postgres_strip_semicolon.py imports wren.connector.postgres, which imports psycopg at module load — but the lightweight unit tests job doesn't install the postgres extra (only the dedicated postgres tests job does), so the whole unit suite errored out.

Fix: added pytest.importorskip("psycopg") so the test skips cleanly in the unit tests job and still runs in the postgres tests job (which has the driver). No product-code change.

Verified locally: without psycopg the file now reports 1 skipped instead of a collection error. CI re-running on the new commit.

@goldmedal

Copy link
Copy Markdown
Collaborator

Thanks for the fix — the change itself is correct and nicely consistent with the canner/clickhouse helpers. One thing on the test coverage though:

The new test doesn't actually run in any CI job, so the regression it's meant to guard isn't protected.

Tracing the two jobs in .github/workflows/wren-ci.yml:

  • unit tests job (pytest tests/unit/ -v ...) installs only the default + dev deps — no postgres extra, so psycopg is absent. The pytest.importorskip("psycopg") therefore makes the whole file skip here. (That's the same reason the first CI run failed — psycopg genuinely isn't installed in this job.)
  • postgres tests job runs pytest tests/connectors/test_postgres.py -v -m postgres — it only collects that single file, with -m postgres. It does not collect tests/unit/, and the new file is marked pytest.mark.unit anyway.

So importorskip keeps CI green by making the test disappear, not by running it in the postgres job.

Suggestion: move these four tests into tests/connectors/test_postgres.py (it's already pytestmark = pytest.mark.postgres, runs in the postgres job, and has psycopg). They're all mocked/pure and don't need the testcontainer, so they'll execute cleanly there alongside the existing test_query_limit_parameter / test_dry_run. Then the separate tests/unit/ file and the importorskip can be dropped.

@Bartok9

Bartok9 commented Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

Good catch, and you're exactly right — importorskip was keeping CI green by making the test vanish, not by running it. Fixed properly:

Moved the four mocked tests into tests/connectors/test_postgres.py (already pytestmark = pytest.mark.postgres, runs in the postgres tests job which installs psycopg) and deleted tests/unit/test_postgres_strip_semicolon.py entirely. They're fully mocked/pure, so no container needed — they collect and run alongside the live postgres tests.

Verified locally in the postgres job's scope:

uv run --no-sync pytest tests/connectors/test_postgres.py -q -k semicolon
4 passed, 16 deselected

Now the regression is actually guarded. Thanks for the careful trace through the CI jobs. 🙏

@goldmedal goldmedal left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Bartok9 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core python Pull requests that update Python code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants