
Add LEFT SEMI JOIN pushdown for EXISTS subqueries #58

Merged: theory merged 1 commit into main from subquery-pushdown on Dec 2, 2025


@iskakaushik iskakaushik commented Nov 25, 2025

Push down EXISTS subqueries to ClickHouse using its native LEFT SEMI JOIN syntax. This change enables all but one TPC-H query to execute efficiently and 12 to fully push down.

Key changes:

  • Add JOIN_SEMI case in chfdw_get_jointype_name(), returning "LEFT SEMI"
  • Handle SEMI joins without the ALL modifier in deparseFromExprForRel()
  • Add semijoin_target_ok() to validate that a target doesn't reference columns from the inner relation (it may only reference the outer relation)
  • Add JOIN_SEMI support to foreign_join_ok() with proper condition routing
  • Adjust cost estimates to make join pushdown more attractive to planner
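
The first two changes above can be pictured with a small standalone sketch. It is not the extension's code: `SketchJoinType`, `sketch_jointype_name()` and `sketch_deparse_join()` are hypothetical stand-ins for PostgreSQL's `JoinType`, `chfdw_get_jointype_name()` and the relevant part of `deparseFromExprForRel()`. It also assumes, as the second bullet implies, that non-SEMI joins are emitted with the `ALL` modifier.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the PostgreSQL JoinType values that matter here. */
typedef enum {
    SK_JOIN_INNER,
    SK_JOIN_LEFT,
    SK_JOIN_FULL,
    SK_JOIN_SEMI
} SketchJoinType;

/* Map a join type to the keyword used in the remote query text. */
static const char *sketch_jointype_name(SketchJoinType jt)
{
    switch (jt) {
    case SK_JOIN_INNER: return "INNER";
    case SK_JOIN_LEFT:  return "LEFT";
    case SK_JOIN_FULL:  return "FULL";
    case SK_JOIN_SEMI:  return "LEFT SEMI";
    }
    return ""; /* not reached for valid input */
}

/*
 * Write "<outer> [ALL ]<type> JOIN <inner> ON (<cond>)" into buf.
 * SEMI joins skip the ALL modifier (assumed to be added for other joins).
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int sketch_deparse_join(char *buf, size_t len, SketchJoinType jt,
                               const char *outer, const char *inner,
                               const char *on_cond)
{
    assert(buf != NULL && outer != NULL && inner != NULL && on_cond != NULL);

    const char *modifier = (jt == SK_JOIN_SEMI) ? "" : "ALL ";
    int n = snprintf(buf, len, "%s %s%s JOIN %s ON (%s)",
                     outer, modifier, sketch_jointype_name(jt), inner, on_cond);

    return (n < 0 || (size_t) n >= len) ? -1 : n;
}
```

Called with `SK_JOIN_SEMI`, the relation aliases `orders r1` and `lineitem r2`, and an illustrative key condition, this produces a FROM fragment of the same shape as the ClickHouse query shown in this description.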

Add test/sql/where_sub.sql, a test for the basic pushdown behavior of WHERE EXISTS(), and test/sql/subquery_pushdown.sql to test pushdown for TPC-H query 4.

Example query against pg_clickhouse tables:

SELECT * FROM orders WHERE EXISTS (SELECT 1 FROM lineitem ...)

Resolves to ClickHouse query:

SELECT ... FROM orders r1 LEFT SEMI JOIN lineitem r2 ON (...)

Note that this change improves the plans for existing queries in binary_queries.sql and http.sql, pushing sorts down to ClickHouse that previously ran in PostgreSQL.

Other changes:

  • Add Makefile targets to build compile_commands.json for IDE/clangd support
  • Add the tempcheck target to the Makefile to run tests against a temporary PostgreSQL cluster
  • Add additional files and directories to .gitignore

@iskakaushik iskakaushik force-pushed the subquery-pushdown branch 2 times, most recently from f758f49 to 3aa87a3 Compare November 25, 2025 02:44
@iskakaushik

Hey @theory, I (with generous help from the postgres_fdw code) got TPC-H Q4-style EXISTS subqueries pushing down to ClickHouse. The planner converts this kind of subquery to a JOIN_SEMI. ClickHouse doesn't support correlated EXISTS (only uncorrelated), but it does support LEFT SEMI JOIN directly, so we deparse to that instead.

Key things:

  1. chfdw_get_jointype_name() returns "LEFT SEMI" for JOIN_SEMI (no ALL modifier)
  2. semijoin_target_ok() validates target only references outer relation
  3. extract_join_equals() moves join keys to ON clause (ClickHouse requires this)
  4. Cost model tweaks so planner prefers pushdown over local nested loop
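
The target check in item 2 amounts to a subset test: every column in the target list must come from a relation on the outer side. Below is a minimal standalone sketch of that idea. `SketchVar`, `SketchRelids` and `sketch_semijoin_target_ok()` are hypothetical; the real `semijoin_target_ok()` works on PostgreSQL planner structures, which are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit i set means relation i is a member of the set (relid must be 0..31). */
typedef uint32_t SketchRelids;

/* A column reference: which relation it belongs to, plus a name for readability. */
typedef struct {
    int         relid;
    const char *name;
} SketchVar;

/*
 * Return true when every target column belongs to the outer side of the
 * semi join. A column from the inner side makes the target unsafe, because
 * a semi join returns no columns from its inner relation.
 */
static bool sketch_semijoin_target_ok(const SketchVar *target, size_t ntarget,
                                      SketchRelids outer_relids)
{
    for (size_t i = 0; i < ntarget; i++) {
        assert(target[i].relid >= 0 && target[i].relid < 32);

        SketchRelids bit = (SketchRelids) 1 << target[i].relid;
        if ((bit & outer_relids) == 0)
            return false;
    }
    return true;
}
```

With `orders` as relation 1 on the outer side, a target of `o_orderkey, o_orderdate` passes, while a target that also asks for a `lineitem` column (relation 2) is rejected. The column names are only illustrative.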

When you get a chance, can you take a look at the commit: #58

Caveats:

  • I only tested on one PG version - would appreciate help getting it tested across supported versions
  • Could use a review of edge cases (multiple EXISTS, complex correlations, etc.)
  • Only simple Var = Var equalities extracted as join keys. Complex expressions like l_orderkey = o_orderkey + 1 won't be recognized.
  • Multiple EXISTS should work via chained LEFT SEMI JOINs, but needs more testing.
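
The third caveat can be illustrated with a toy expression tree. This is a sketch under stated assumptions, not the logic of `extract_join_equals()`: `SketchExpr` and `sketch_is_join_key()` are hypothetical, and the requirement that the two columns come from different relations is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SK_VAR, SK_CONST, SK_OP } SketchKind;

/* A tiny expression node: a column, a constant, or a binary operator. */
typedef struct SketchExpr {
    SketchKind               kind;
    int                      relid; /* SK_VAR: relation the column belongs to */
    char                     op;    /* SK_OP: '=' or '+', for example */
    const struct SketchExpr *left;  /* SK_OP: left operand */
    const struct SketchExpr *right; /* SK_OP: right operand */
} SketchExpr;

/*
 * Accept only "Var = Var" where the two columns come from different
 * relations. Anything else, such as "Var = Var + 1", is not treated
 * as a join key.
 */
static bool sketch_is_join_key(const SketchExpr *e)
{
    assert(e != NULL);

    if (e->kind != SK_OP || e->op != '=')
        return false;
    if (e->left == NULL || e->right == NULL)
        return false;
    if (e->left->kind != SK_VAR || e->right->kind != SK_VAR)
        return false;

    return e->left->relid != e->right->relid;
}
```

Under these assumptions `l_orderkey = o_orderkey` qualifies as a join key, whereas `l_orderkey = o_orderkey + 1` does not, because its right-hand side is an operator node and not a plain column.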

Comment thread Makefile Outdated
Comment thread Makefile Outdated
Comment thread test/expected/subquery_pushdown.out
Comment thread test/expected/subquery_pushdown.out
Comment thread src/include/fdw.h Outdated
@theory theory force-pushed the subquery-pushdown branch 2 times, most recently from d91e78f to 0bef1b6 Compare December 1, 2025 22:47
@theory theory marked this pull request as ready for review December 1, 2025 22:48
@theory theory force-pushed the subquery-pushdown branch from 0bef1b6 to 0a771cd Compare December 1, 2025 22:57
@theory theory self-assigned this Dec 1, 2025
@theory theory added the enhancement New feature or request label Dec 1, 2025
@theory theory force-pushed the subquery-pushdown branch from 0a771cd to 93edd46 Compare December 1, 2025 23:08

theory commented Dec 1, 2025

Added my own test and then updated the test results for all supported versions. I moved your to-dos to #62. I think this is ready to go!

@theory theory changed the title from "SEMI JOIN Push down" to "Add LEFT SEMI JOIN pushdown for EXISTS subqueries" Dec 1, 2025
Comment thread src/fdw.c.in Outdated
}

/*
* ClickHouse requires SEMI JOINs to have an ON clause with join conditions.

@serprex serprex Dec 1, 2025


what's wrong with injecting ON TRUE?


I think in Postgres you can do something like

SELECT * FROM x
WHERE EXISTS(SELECT true from y);

There is no join from x to y; it's an uncorrelated subquery. ClickHouse doesn't support expressing that as a SEMI JOIN: if you have a SEMI JOIN, you need an ON expression. So the above is incompatible, but this is compatible:

SELECT * FROM x
WHERE EXISTS(SELECT true from y WHERE y.a = x.a);
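
One way to picture the distinction between the two queries is a helper that assembles the ON clause from whatever join keys were found and reports failure when there are none. This is only a sketch: `sketch_build_on_clause()` is hypothetical, and declining pushdown when no key is found is an assumption based on the code comment under review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Join the key conditions with " AND " into buf.
 * Returns false when there are no keys (nothing to put in ON) or when
 * buf is too small; a caller would then skip the SEMI JOIN pushdown.
 */
static bool sketch_build_on_clause(char *buf, size_t len,
                                   const char *const *keys, size_t nkeys)
{
    assert(buf != NULL && len > 0);

    buf[0] = '\0';
    if (nkeys == 0)
        return false;

    size_t used = 0;
    for (size_t i = 0; i < nkeys; i++) {
        int n = snprintf(buf + used, len - used, "%s%s",
                         i > 0 ? " AND " : "", keys[i]);
        if (n < 0 || (size_t) n >= len - used)
            return false;
        used += (size_t) n;
    }
    return true;
}
```

For the correlated query the single key `y.a = x.a` becomes the ON expression; for the uncorrelated one the key list is empty and the helper returns false.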


@theory theory Dec 1, 2025


Though I could see eventually allowing it to use ClickHouse EXISTS in that case. What do you think, @iskakaushik?


@serprex serprex Dec 1, 2025


select * from test1 left semi join test2 on true;

SELECT *
FROM test1
SEMI LEFT JOIN test2 ON true

Query id: 1bb46678-a9e1-44eb-992a-2058290e9aec

   ┌─id─┬─val──┬─test2.id─┬─test2.val─┐
1. │  1 │ arst │        3 │ qwfp      │
2. │  2 │ asdf │        3 │ qwfp      │
   └────┴──────┴──────────┴───────────┘

The syntactic hurdle can be worked around, & in your example I think this would be the correct result (note in my example test2 has a 2nd row (4, qwer) which is omitted for some reason, I'm still not solid on semi joins since I didn't expect to see test2.id & test2.val columns in result)


Seems like your example is effectively a FULL JOIN; weird that (4, qwer) is missing tho.


However, WHERE EXISTS translates just fine:

SELECT *
FROM test1
WHERE exists((
    SELECT 1
    FROM test2
))

   ┌─id─┬─val──┐
1. │  1 │ arst │
2. │  2 │ asdf │
   └────┴──────┘

@theory theory requested a review from serprex December 1, 2025 23:37
Push down `EXISTS` subqueries to ClickHouse using its native `LEFT SEMI
JOIN` syntax. This change enables all but one TPC-H query to execute
efficiently and 12 to fully push down.

Key changes:
- Add `JOIN_SEMI` case in `chfdw_get_jointype_name()`, returning "LEFT
  SEMI"
- Handle SEMI joins without the `ALL` modifier in
  `deparseFromExprForRel()`
- Add `semijoin_target_ok()` to validate that a target doesn't reference
  columns from the inner relation
- Add JOIN_SEMI support to `foreign_join_ok()` with proper condition
  routing
- Adjust cost estimates to make join pushdown more attractive to planner

Add `test/sql/where_sub.sql`, a test for the basic pushdown behavior
of `WHERE EXISTS()`, and `test/sql/subquery_pushdown.sql` to test
pushdown for TPC-H query 4.

Example query against pg_clickhouse tables:

    SELECT * FROM orders WHERE EXISTS (SELECT 1 FROM lineitem ...)

Resolves to ClickHouse query:

    SELECT ... FROM orders r1 LEFT SEMI JOIN lineitem r2 ON (...)

Note that this change improves the plans for existing queries in
`binary_queries.sql` and `http.sql`, pushing sorts down to ClickHouse
that previously ran in PostgreSQL.

Other changes:

- Add `Makefile` targets to build `compile_commands.json` for IDE/clangd
  support
- Add the `tempcheck` target to the `Makefile` to run tests against a
  temporary PostgreSQL cluster
- Add additional files and directories to `.gitignore`

Signed-off-by: David E. Wheeler <david.wheeler@clickhouse.com>
@theory theory force-pushed the subquery-pushdown branch from 93edd46 to b345682 Compare December 2, 2025 03:01
@theory theory merged commit b345682 into main Dec 2, 2025
60 checks passed
@theory theory deleted the subquery-pushdown branch December 2, 2025 03:06
@theory theory mentioned this pull request Dec 10, 2025

3 participants