Skip to content

bug: filter_push_down still ends up scanning postgres table without hitting any indexes for filters #369

@nojvek

Description

@nojvek

What happens?

filter_projection_pushdown doesn't actually push down the filter to postgres. It still ends up scanning the table taking 10s+ when filter should hit the index and return in milliseconds.

To Reproduce

Create a table with roughly 10M rows in postgres.

ATTACH 'postgres://u:pass@....' AS pg_reader (TYPE POSTGRES, READ_ONLY)
explain analyze select count(*) from pg_reader.main.order_lines where updated_at = '2025-06-09 14:35:00.742342+00';

┌────────────────────────────────────────────────┐
│┌──────────────────────────────────────────────┐│
││               Total Time: 9.27s              ││
│└──────────────────────────────────────────────┘│
└────────────────────────────────────────────────┘
┌───────────────────────────┐
│           QUERY           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      EXPLAIN_ANALYZE      │
│    ────────────────────   │
│           0 Rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│    UNGROUPED_AGGREGATE    │
│    ────────────────────   │
│        Aggregates:        │
│        count_star()       │
│                           │
│           1 Rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         TABLE_SCAN        │
│    ────────────────────   │
│     Table: order_lines    │
│                           │
│        Projections:       │
│         updated_at        │
│                           │
│          Filters:         │
│ updated_at='2025-06-09 14 │
│     :35:00.742342+00':    │
│ :TIMESTAMP WITH TIME ZONE │
│                           │
│           2 Rows          │
│          (53.29s)         │
└───────────────────────────┘

Even though query plan shows a filter, the filter doesn't seem to get applied at scan time. Postgres still ends up scanning all rows of table. updated_at column has an index. It ends up taking ~10s.

For our larger tables, with 10M+ rows, duckdb extension places huge load on the database.

Same query via postgres_query runs in milliseconds

explain analyze select count(*) from postgres_query('pg_reader', 'select updated_at from main.order_lines where updated_at = ''2025-06-09 14:35:00.742342+00''');

┌────────────────────────────────────────────────┐
│┌──────────────────────────────────────────────┐│
││              Total Time: 0.525s              ││
│└──────────────────────────────────────────────┘│
└────────────────────────────────────────────────┘
┌───────────────────────────┐
│           QUERY           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      EXPLAIN_ANALYZE      │
│    ────────────────────   │
│           0 Rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│    UNGROUPED_AGGREGATE    │
│    ────────────────────   │
│        Aggregates:        │
│        count_star()       │
│                           │
│           1 Rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         TABLE_SCAN        │
│    ────────────────────   │
│         Function:         │
│       POSTGRES_QUERY      │
│                           │
│        Projections:       │
│         updated_at        │
│                           │
│           2 Rows          │
│          (0.16s)          │
└───────────────────────────┘

use_ctid_scan should be disabled if any filters are specified. So the query is equivalent to postgres_query, but much faster.

pg_use_ctid_scan

SET pg_use_ctid_scan = true; -- default

D .timer on
D select count(*) from pg_reader.main.order_lines where updated_at = '2025-06-09 14:35:00.742342+00'::timestamptz;
┌──────────────┐
│ count_star() │
│    int64     │
├──────────────┤
│      2       │
└──────────────┘
Run Time (s): real 10.491 user 0.108536 sys 0.083701


SET pg_use_ctid_scan = false; -- disable ctid_scan

D select count(*) from pg_reader.main.order_lines where updated_at = '2025-06-09 14:35:00.742342+00'::timestamptz;
┌──────────────┐
│ count_star() │
│    int64     │
├──────────────┤
│      2       │
└──────────────┘
Run Time (s): real 0.246 user 0.001300 sys 0.001340

50x faster when SET pg_use_ctid_scan = false.

OS:

MacOS 14.6.1

PostgreSQL Version:

15

DuckDB Version:

1.3.2

DuckDB Client:

cli

Full Name:

Noj V

Affiliation:

Recurrency

Have you tried this on the latest main branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions