Add LEFT SEMI JOIN pushdown for EXISTS subqueries #58
Force-pushed from f758f49 to 3aa87a3.
Hey @theory, with generous help from the postgres_fdw code, I got TPC-H Q4-style `EXISTS` subqueries pushing down to ClickHouse. The planner converts these to a `JOIN_SEMI`. ClickHouse doesn't support correlated `EXISTS` (only uncorrelated), but it does support `LEFT SEMI JOIN` directly, so we deparse to that instead. Key things:
When you get a chance, can you take a look at the commit in #58? Caveats:
Force-pushed from d91e78f to 0bef1b6.
Force-pushed from 0bef1b6 to 0a771cd.
Force-pushed from 0a771cd to 93edd46.
Added my own test and then updated the test results for all supported versions. I moved your to-dos to #62. I think this is ready to go!
```c
}

/*
 * ClickHouse requires SEMI JOINs to have an ON clause with join conditions.
```
What's wrong with injecting `ON TRUE`?
I think in Postgres you can do something like:

```sql
SELECT * FROM x
WHERE EXISTS(SELECT true FROM y);
```

There is no join from x to y; it's an uncorrelated subquery. ClickHouse can't express that as a SEMI JOIN, because a SEMI JOIN needs an `ON` expression. So the above is incompatible, but this is compatible:

```sql
SELECT * FROM x
WHERE EXISTS(SELECT true FROM y WHERE y.a = x.a);
```
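A toy model of why the correlated form maps onto a semi join: `EXISTS` keeps each outer row at most once when some inner row satisfies the correlation predicate, whereas an inner join on the same predicate repeats the outer row once per match. The sketch below assumes single-column integer tables standing in for `x(a)` and `y(a)`; `semi_join_count()` and `inner_join_count()` are hypothetical helpers, not pg_clickhouse code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy tables: each row is just its join column "a". */

/* Correlated EXISTS / semi join: count x rows for which at least one
 * y row satisfies y.a = x.a. Each x row contributes at most once. */
static size_t semi_join_count(const int *x, size_t nx, const int *y, size_t ny)
{
    size_t out = 0;
    for (size_t i = 0; i < nx; i++) {
        for (size_t j = 0; j < ny; j++) {
            if (y[j] == x[i]) {
                out++;  /* emit x[i] once ... */
                break;  /* ... and stop probing y for this row */
            }
        }
    }
    return out;
}

/* Inner join for comparison: one output row per matching (x, y) pair. */
static size_t inner_join_count(const int *x, size_t nx, const int *y, size_t ny)
{
    size_t out = 0;
    for (size_t i = 0; i < nx; i++)
        for (size_t j = 0; j < ny; j++)
            if (y[j] == x[i])
                out++;
    return out;
}
```

With `x = {1, 2, 3}` and `y = {1, 1, 3}`, `semi_join_count()` returns 2 while `inner_join_count()` returns 3, because `x.a = 1` matches two `y` rows. The `break` after the first match is the only difference between the two loops.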
Though I could see eventually allowing it to use ClickHouse EXISTS in that case. What do you think, @iskakaushik?
```
select * from test1 left semi join test2 on true;

SELECT *
FROM test1
SEMI LEFT JOIN test2 ON true

Query id: 1bb46678-a9e1-44eb-992a-2058290e9aec

   ┌─id─┬─val──┬─test2.id─┬─test2.val─┐
1. │  1 │ arst │        3 │ qwfp      │
2. │  2 │ asdf │        3 │ qwfp      │
   └────┴──────┴──────────┴───────────┘
```
The syntactic hurdle can be worked around, and for your example I think this would be the correct result. (Note that in my example test2 has a second row, (4, qwer), which is omitted for some reason. I'm still not solid on semi joins, since I didn't expect to see the test2.id and test2.val columns in the result.)
Seems like your example is effectively a FULL JOIN; weird that (4, qwer) is missing tho.
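As we read the ClickHouse `JOIN` documentation, a `LEFT SEMI JOIN` returns each left-table row that has at least one match in the right table, and only the first match found supplies the right-table columns; no Cartesian product is produced. If that reading is right, it accounts for both surprises: the `test2` columns appear, and `(4, qwer)` never shows up because `(3, qwfp)` was matched first. The sketch below models that behavior on the example tables; `Row`, `SemiRow`, and `left_semi_join()` are hypothetical, and "first" here means first in array order, whereas ClickHouse itself need not scan rows in any particular order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical row type shared by test1 and test2 in the example. */
typedef struct { int id; const char *val; } Row;

/* One output row: left columns plus the right columns of the first match. */
typedef struct { Row left; Row right; } SemiRow;

/* Join predicate; the example uses ON true, so every pair matches. */
typedef int (*JoinPred)(const Row *l, const Row *r);

static int on_true(const Row *l, const Row *r) { (void) l; (void) r; return 1; }

/* Model of LEFT SEMI JOIN as we read the docs: each left row with at least
 * one match is emitted once, carrying only the first matching right row. */
static size_t left_semi_join(const Row *left, size_t nl,
                             const Row *right, size_t nr,
                             JoinPred pred, SemiRow *out)
{
    size_t n = 0;
    for (size_t i = 0; i < nl; i++) {
        for (size_t j = 0; j < nr; j++) {
            if (pred(&left[i], &right[j])) {
                out[n].left = left[i];
                out[n].right = right[j];
                n++;
                break;  /* later matches, such as (4, qwer), are skipped */
            }
        }
    }
    return n;
}
```

Called with `test1 = {(1, arst), (2, asdf)}`, `test2 = {(3, qwfp), (4, qwer)}`, and `on_true`, it yields two rows, each paired with `(3, qwfp)`, which matches the output above.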
However, WHERE EXISTS translates just fine:
```
SELECT *
FROM test1
WHERE exists((
    SELECT 1
    FROM test2
))

   ┌─id─┬─val──┐
1. │  1 │ arst │
2. │  2 │ asdf │
   └────┴──────┘
```
Push down `EXISTS` subqueries to ClickHouse using its native
`LEFT SEMI JOIN` syntax. This change enables all but one TPC-H query to
execute efficiently, and 12 of them to push down fully.
Key changes:
- Add a `JOIN_SEMI` case in `chfdw_get_jointype_name()`, returning
  "LEFT SEMI"
- Handle SEMI joins without the `ALL` modifier in
`deparseFromExprForRel()`
- Add `semijoin_target_ok()` to validate that a target doesn't reference
  columns from the inner relation
- Add `JOIN_SEMI` support to `foreign_join_ok()` with proper condition
  routing
- Adjust cost estimates to make join pushdown more attractive to the
  planner
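One way to picture the target check: a semi join emits only rows from its outer side, so a target column that exists only on the inner side has nothing to return. The sketch below encodes relation sets as bits in a `uint64_t`, a simplification of PostgreSQL's `Relids` bitmapsets; `Relset` and `semijoin_target_ok_sketch()` are hypothetical stand-ins, not the function this commit adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: relation index n is bit (1 << n) in a Relset. */
typedef uint64_t Relset;

/* Returns true when no target column comes from a relation that is on the
 * inner side of the semi join but not on the outer side. var_rels[i] is
 * the relation set referenced by target column i. */
static bool semijoin_target_ok_sketch(const Relset *var_rels, size_t nvars,
                                      Relset outer_rels, Relset inner_rels)
{
    Relset inner_only = inner_rels & ~outer_rels;

    for (size_t i = 0; i < nvars; i++)
        if (var_rels[i] & inner_only)
            return false;  /* the semi join cannot produce this column */
    return true;
}
```

With the outer side `{r1}` and inner side `{r2}`, a target that reads only `r1` columns passes, while one that also reads an `r2` column is rejected, presumably leaving that join to run locally.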
Add `test/sql/where_sub.sql`, a test for the basic pushdown behavior
of `WHERE EXISTS()`, and `test/sql/subquery_pushdown.sql`, to test
pushdown of TPC-H query 4.
Example query against pg_clickhouse tables:

```sql
SELECT * FROM orders WHERE EXISTS (SELECT 1 FROM lineitem ...)
```

Resolves to the ClickHouse query:

```sql
SELECT ... FROM orders r1 LEFT SEMI JOIN lineitem r2 ON (...)
```
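The `chfdw_get_jointype_name()` and `deparseFromExprForRel()` changes listed above combine to produce that `FROM` clause. A minimal sketch, assuming a simplified join-type enum and assuming the deparser prefixes other joins with ClickHouse's `ALL` strictness (which the "without the `ALL` modifier" wording suggests); `JoinKind`, `jointype_name()`, and `deparse_join()` are hypothetical, and the `ON` condition arrives pre-rendered.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for PostgreSQL's JoinType (hypothetical subset). */
typedef enum { JT_INNER, JT_LEFT, JT_RIGHT, JT_FULL, JT_SEMI } JoinKind;

/* Hypothetical analogue of chfdw_get_jointype_name(). */
static const char *jointype_name(JoinKind jt)
{
    switch (jt) {
    case JT_INNER: return "INNER";
    case JT_LEFT:  return "LEFT";
    case JT_RIGHT: return "RIGHT";
    case JT_FULL:  return "FULL";
    case JT_SEMI:  return "LEFT SEMI";
    }
    return NULL;
}

/* Hypothetical analogue of the join branch of deparseFromExprForRel():
 * writes "<outer> [ALL ]<type> JOIN <inner> ON (<cond>)" into buf. */
static int deparse_join(char *buf, size_t len, JoinKind jt,
                        const char *outer, const char *inner, const char *cond)
{
    /* SEMI is itself a join strictness in ClickHouse, so it takes no ALL. */
    const char *strictness = (jt == JT_SEMI) ? "" : "ALL ";

    return snprintf(buf, len, "%s %s%s JOIN %s ON (%s)",
                    outer, strictness, jointype_name(jt), inner, cond);
}
```

For example, `deparse_join(buf, sizeof buf, JT_SEMI, "orders r1", "lineitem r2", "r1.k = r2.k")` writes `orders r1 LEFT SEMI JOIN lineitem r2 ON (r1.k = r2.k)`, where `r1.k = r2.k` is a placeholder for the elided condition.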
Note that this change improves the plans for existing queries in the
`binary_queries.sql` and `http.sql` tests, pushing down to ClickHouse
sorts that previously ran in PostgreSQL.
Other changes:
- Add `Makefile` targets to build `compile_commands.json` for IDE/clangd
support
- Add the `tempcheck` target to the `Makefile` to run tests against a
temporary PostgreSQL cluster
- Add additional files and directories to `.gitignore`
Signed-off-by: David E. Wheeler <david.wheeler@clickhouse.com>
Force-pushed from 93edd46 to b345682.