Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missed scan pushdown in self-join #9165

Open
1 task done
jonashaag opened this issue Sep 29, 2023 · 4 comments
Open
1 task done

Missed scan pushdown in self-join #9165

jonashaag opened this issue Sep 29, 2023 · 4 comments

Comments

@jonashaag
Copy link
Contributor

jonashaag commented Sep 29, 2023

What happens?

When doing a self join like

select *
from t as t1
left join t as t2  -- or inner join
  on t1.f = t2.f
where right(t1.f, 1) = '2'

DuckDB pushes down the right(...) filter to one of the ts but not both, missing the opportunity to multiple order of magnitude speedup. In my real world example, adding an extra

  and right(t2.f, 1) = '2'

speeds up the query ~ 100x.

To Reproduce

create table t (f string);

insert into t (f) values ('42'), ('43');

explain
select *
from t as t1
left join t as t2
  on t1.f = t2.f
where right(t1.f, 1) = '2';
┌───────────────────────────┐
│         HASH_JOIN         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            LEFT           │
│           f = f           ├──────────────┐
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │              │
│           EC: 0           │              │
│          Cost: 0          │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│           FILTER          ││         SEQ_SCAN          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│    (right(f, 1) = '2')    ││             t             │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           EC: 0           ││             f             │
│                           ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│                           ││           EC: 0           │
└─────────────┬─────────────┘└───────────────────────────┘
┌─────────────┴─────────────┐
│         SEQ_SCAN          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             t             │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             f             │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           EC: 0           │
└───────────────────────────┘

OS:

macOS

DuckDB Version:

0.9

DuckDB Client:

cli

Full Name:

Jonas Haag

Affiliation:

QuantCo

Have you tried this on the latest main branch?

I have tested with a release build (and could not test with a main build)

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
Copy link

github-actions bot commented Jan 4, 2024

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale label Jan 4, 2024
@jonashaag
Copy link
Contributor Author

Don't think this is fixed

@szarnyasg szarnyasg removed the stale label Jan 4, 2024
Copy link

github-actions bot commented Apr 4, 2024

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale label Apr 4, 2024
@jonashaag
Copy link
Contributor Author

Not fixed in 0.10

@szarnyasg szarnyasg removed the stale label Apr 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants