Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL case, This feature is not implemented: Physical plan does not support logical expression EXISTS (<subquery>) #5789

Open
Tracked by #5483
jiangzhx opened this issue Mar 30, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@jiangzhx
Copy link
Contributor

jiangzhx commented Mar 30, 2023

Describe the bug

DataFusion v21.0.0

CREATE EXTERNAL TABLE t1 (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
CREATE EXTERNAL TABLE t2 (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
SELECT a, b FROM t1 WHERE EXISTS (SELECT count(*) FROM t2);

return:
This feature is not implemented: Physical plan does not support logical expression EXISTS ()

worked branch
19.0.0

To Reproduce

create data.csv

echo "1,2" > data.csv

use datafusion-cli


CREATE EXTERNAL TABLE t1 (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
CREATE EXTERNAL TABLE t2 (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
SELECT a, b FROM t1 WHERE EXISTS (SELECT count(*) FROM t2);

Expected behavior

No response

Additional context

No response

@jiangzhx jiangzhx added the bug Something isn't working label Mar 30, 2023
@jiangzhx jiangzhx closed this as not planned Won't fix, can't repro, duplicate, stale Mar 30, 2023
@jiangzhx jiangzhx reopened this Mar 30, 2023
@jiangzhx
Copy link
Contributor Author

jiangzhx commented Mar 30, 2023

i'm not sure,does this pr #5419 cause this issue.

@alamb
Copy link
Contributor

alamb commented Mar 30, 2023

I agree that #5419 likely caused this issue.

I think the fix is to simply ignore such errors when creating pruning predicates (aka when the predicate is not supported).

What do you think @crepererum ?

@jiangzhx jiangzhx changed the title This feature is not implemented: Physical plan does not support logical expression EXISTS (<subquery>) SQL case, This feature is not implemented: Physical plan does not support logical expression EXISTS (<subquery>) Mar 31, 2023
@crepererum
Copy link
Contributor

To me this looks like a bug. Who's trying to push down / apply a sub-query predicate to a parquet file read? Shouldn't the logical optimizer remove these kind of expressions?

@jiangzhx
Copy link
Contributor Author

jiangzhx commented Apr 6, 2023

@crepererum I think you're right, just your PR let this issue expose.

these Exists, InSubquery, ScalarSubquery Expr did not complete the corresponding processing
in the create_physical_expr method of planner.rs.
https://github.com/apache/arrow-datafusion/blob/a1c60a1ba98e089d7551637f2a78663e66772d88/datafusion/physical-expr/src/planner.rs#L501-L503

Before your PR takes effect, it also does not actually handle the subquery scene at datasource
https://github.com/apache/arrow-datafusion/blob/a1c60a1ba98e089d7551637f2a78663e66772d88/datafusion/core/src/datasource/file_format/mod.rs#L83-L89

so, i think the way is to optimize non-correlated subquery at decorrelate_where_exists optimizer
https://github.com/apache/arrow-datafusion/blob/a1c60a1ba98e089d7551637f2a78663e66772d88/datafusion/optimizer/src/decorrelate_where_exists.rs#L185-L191

the current decorrelate_where_exists only optimize

SELECT t1.id FROM t1
WHERE exists
(
   SELECT t2.id FROM t2 WHERE t1.id = t2.id
)
/// and optimizes it into:
SELECT t1.id
FROM t1 LEFT SEMI
JOIN t2
ON t1.id = t2.id

we need process ScalarSubquery in the create_physical_expr and then rewrite non-correlated exists subquery to ScalarSubquery

WHERE EXISTS (SELECT A FROM TABLE B WHERE COL1 > 10)
///will be rewritten to
WHERE (SELECT 1 FROM (SELECT A FROM TABLE B WHERE COL1 > 10) LIMIT 1) IS NOT NULL

@alamb
Copy link
Contributor

alamb commented Feb 2, 2024

For the record this still happens:

(venv-310) andrewlamb@Andrews-MacBook-Pro:~/Downloads$ datafusion-cli
DataFusion CLI v35.0.0
❯ CREATE EXTERNAL TABLE t1 (a INT, b INT) STORED AS CSV LOCATION 'data.csv';

0 rows in set. Query took 0.030 seconds.

❯ CREATE EXTERNAL TABLE t2 (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
0 rows in set. Query took 0.001 seconds.

❯ SELECT a, b FROM t1 WHERE EXISTS (SELECT count(*) FROM t2);

This feature is not implemented: Physical plan does not support logical expression Exists(Exists { subquery: <subquery>, negated: false })
❯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants