Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

postgres_scan_pushdown doesn't pushdown predicate #162

Closed
2 tasks done
theelderbeever opened this issue Jan 12, 2024 · 5 comments
Closed
2 tasks done

postgres_scan_pushdown doesn't pushdown predicate #162

theelderbeever opened this issue Jan 12, 2024 · 5 comments

Comments

@theelderbeever
Copy link

What happens?

I have a very large database that I want to be able to read time ranges of data from. In order to do this I was going to use the duckdb pushdown to avoid overfetching data (its actually impossible to fetch the whole table).

I ran the query below to test this. However, as this test query should take just milliseconds to run but was taking significantly longer I went and looked for long running queries against the database. The query running in the database is shown below as well. There was no where clause applied.

duckdb.sql(
    f"select account from postgres_scan_pushdown('{libpq_uri}', 'usage', 'events') where timestamp >= NOW() - '1min'::interval limit 1"
)
COPY (SELECT "timestamp", "account" FROM "usage"."events" ) TO STDOUT (FORMAT binary);

To Reproduce

The description above should somewhat suffice. This maybe hard to reproduce without a sufficiently large enough database to stall the query.

The long running query check against postgres is below though.

SELECT
  pid,
  now() - pg_stat_activity.query_start AS duration,
  query,
  state
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '20 sec';

OS:

MacOS Sonoma and Ubuntu 22.04

PostgreSQL Version:

Postgres 15.4/Timescaledb 2.12

DuckDB Version:

duckdb-0.9.2-cp311-cp311-manylinux_2_17_x86_64

DuckDB Client:

Python

Full Name:

Taylor Beever

Affiliation:

QuickNode

Have you tried this on the latest main branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree
@Mytherin
Copy link
Contributor

Thanks for the report! This is actually a limitation in DuckDB itself - not in the Postgres scanner. NOW() as a function is not pushed down into scans currently. You could use postgres_query to run the query within Postgres directly as a work-around.

@theelderbeever
Copy link
Author

@Mytherin Ah okay! So if I understand correctly I could also just not use NOW() and that would pushdown? The actual usage wasn't going to necessarily use NOW() anyway. It would have fixed timestamps. timestamp >= NOW() - '1min'::interval has just always been an easy way to get recent data.

@Mytherin
Copy link
Contributor

Yes, if you use a constant timestamp the value should be pushed down.

@Mytherin
Copy link
Contributor

I've fixed this in upstream DuckDB in the mean-time as well, so NOW() can now be pushed down (see duckdb/duckdb#10426)

@theelderbeever
Copy link
Author

theelderbeever commented Mar 19, 2024

Nice! Thanks @Mytherin!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants