Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eligible filters are not pushed down if any are ineligible, resulting in none being pushed down #16339

Open
proddata opened this issue Jul 18, 2024 · 3 comments
Labels

Comments

@proddata
Copy link
Member

proddata commented Jul 18, 2024

CrateDB version

5.7.3

CrateDB setup information

Local 5.7.3
Remote Postgres 16

Problem description

While not strictly defined as supported:

WHERE clauses can in some circumstances be pushed to the foreign system, but that depends on the concrete foreign data wrapper implementation. You can check if this is the case by using the EXPLAIN statement.

I believe that filters which can be pushed down to the foreign system should be. Currently, if any filter cannot be pushed down, none of the filters are. This means that a simple sub-select or additional filter can turn a very fast query into a very slow one.

Steps to Reproduce

CREATE FOREIGN TABLE doc.foreign_t01 (
    c1 TEXT,
    c2 OBJECT(IGNORED)
) SERVER neon OPTIONS (schema_name 'public', table_name 't01');

EXPLAIN SELECT * FROM doc.foreign_t01 WHERE c1 = 'Hello';
EXPLAIN SELECT * FROM doc.foreign_t01 WHERE c1 = 'Hello' and c2['module']  = 'Test%';

Actual Result

✅ filters are pushed down

EXPLAIN SELECT * FROM doc.foreign_t01 WHERE c1 = 'Hello';
+----------------------------------------------------------------------------+
| QUERY PLAN                                                                 |
+----------------------------------------------------------------------------+
| ForeignCollect[doc.foreign_t01 | [c1, c2] | (c1 = 'Hello')] (rows=unknown) |
+----------------------------------------------------------------------------+

❌ filters are not pushed down

EXPLAIN SELECT * FROM doc.foreign_t01 WHERE c1 = 'Hello' or c1 LIKE 'Test%';
+----------------------------------------------------------------------+
| QUERY PLAN                                                           |
+----------------------------------------------------------------------+
| Filter[((c1 = 'Hello') AND ('Test%' = c2['module']))] (rows=0)       |
|   └ ForeignCollect[doc.foreign_t01 | [c1, c2] | true] (rows=unknown) |
+----------------------------------------------------------------------+

Expected Result

🆗 some filters are pushed down

EXPLAIN SELECT * FROM doc.foreign_t01 WHERE c1 = 'Hello' or c1 LIKE 'Test%';
+----------------------------------------------------------------------+
| QUERY PLAN                                                           |
+----------------------------------------------------------------------+
| Filter[(('Test%' = c2['module']))] (rows=0)                          |
|   └ ForeignCollect[doc.foreign_t01 | [c1, c2] |  (c1 = 'Hello')]] (rows=unknown) |
+----------------------------------------------------------------------+
@proddata proddata added the triage An issue that needs to be triaged by a maintainer label Jul 18, 2024
@matriv matriv added feature: performance feature: fdw Foreign data wrapper and removed triage An issue that needs to be triaged by a maintainer labels Jul 18, 2024
@matriv
Copy link
Contributor

matriv commented Jul 18, 2024

In the first version of filter push down: #15653 we only enabled this for safe operators, which is the standard comparison and logical operators, and if anything else is included we don't push down the filter at all.

I see 2 improvements here:

  • Maybe enable more functions for which we have 100% compatibility with PostgreSQL
  • When expressions are combined with AND split the query and do partial push-down.

@mfussenegger
Copy link
Member

🆗 some filters are pushed down

EXPLAIN SELECT * FROM doc.foreign_t01 WHERE c1 = 'Hello' or c1 LIKE 'Test%';
-- Filter[(c1 LIKE 'Test%')] (rows=unknown)
-- └ ForeignCollect[doc.foreign_t01 | [c1, c2] | (c1 = 'Hello')] (rows=unknown)

Splitting on OR and pushing down one side won't work.
The given example would exclude cases where c1 = 'Hello' is false but c1 LIKE 'Test%' is true.

+--------------+-----------------+-------+
| c1 = 'Hello' | c1 LIKE 'Test%' |  OR   |
+--------------+-----------------+-------+
| true         | true            | true  |
| true         | false           | true  |
| true         | null            | true  |

^ only case above would be visible for Filter c1 LIKE 'Test%'

                                         |
| false        | true            | true  | < would be missing
| false        | false           | false |
| false        | null            | null  |
                                         |
| null         | true            | true  | < would be missing
| null         | false           | null  |
| null         | null            | null  |
+--------------+-----------------+-------+

Maybe enable more functions for which we have 100% compatibility with PostgreSQL

It is not just about PostgreSQL. JDBC fdw should work for any foreign database that has a JDBC driver.
That said - we could consider using DatabaseMetaData.getFunctions() in some way to figure out if functions are eligible. Or specialize the PostgreSQL case more.

@proddata
Copy link
Member Author

proddata commented Jul 24, 2024

Splitting on OR and pushing down one side won't work.
The given example would exclude cases where c1 = 'Hello' is false but c1 LIKE 'Test%' is true.

Makes sense. The example has been chosen badly in this case. I updated it to better reflect the original issue.


We also tried to explicitly split the filters into a cte like ...

cr> EXPLAIN WITH foreign_select AS (SELECT * FROM doc.foreign_t01 WHERE c1 = 'Hello')
    SELECT * FROM foreign_select WHERE c2['module']  = 'Test%';
+------------------------------------------------------------------------+
| QUERY PLAN                                                             |
+------------------------------------------------------------------------+
| Rename[c1, c2] AS foreign_select (rows=0)                              |
|   └ Filter[(('Test%' = c2['module']) AND (c1 = 'Hello'))] (rows=0)     |
|     └ ForeignCollect[doc.foreign_t01 | [c1, c2] | true] (rows=unknown) |
+------------------------------------------------------------------------+

... which (un)fortunately gets optimised.

With SET optimizer_move_filter_beneath_rename = false; this leads to the desired output

cr> EXPLAIN  WITH foreign_select AS (SELECT * FROM doc.foreign_t01 WHERE c1 = 'Hello')
    SELECT * FROM foreign_select WHERE c2['module']  = 'Test%';
+----------------------------------------------------------------------------------+
| QUERY PLAN                                                                       |
+----------------------------------------------------------------------------------+
| Filter[('Test%' = c2['module'])] (rows=0)                                        |
|   └ Rename[c1, c2] AS foreign_select (rows=unknown)                              |
|     └ ForeignCollect[doc.foreign_t01 | [c1, c2] | (c1 = 'Hello')] (rows=unknown) |
+----------------------------------------------------------------------------------+

Deactivating this by default however seems problematic.


Within PostgreSQL this most likely could be prevented with the MATERIALIZED keyword in the CTE, forcing a separate evaluation of the CTE. https://www.postgresql.org/docs/16/queries-with.html#QUERIES-WITH-CTE-MATERIALIZATION

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants