Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance: propagate WHERE conditions across joins #45242

Closed
twotwotwo opened this issue Jan 13, 2023 · 4 comments
Closed

Performance: propagate WHERE conditions across joins #45242

twotwotwo opened this issue Jan 13, 2023 · 4 comments

Comments

@twotwotwo
Copy link

If I join two tables and put a condition on the join key in one table, applying that condition to the corresponding column in the other table could sometimes make the query run much faster than it does now. Given these tables:

create table a (i int) engine=MergeTree primary key i as select * from numbers(100000000);
create table b (i int) engine=MergeTree primary key i as select * from numbers(100000000);

This query hashes all of b and takes several seconds:

select count(*) from a join b using(i) where a.i < 1000;

However, if I explicitly make a copy of the condition, most of b doesn't isn't read/hashed so I get results instantly:

select count(*) from a join b using(i) where a.i < 1000 and b.i < 1000;

(Concretely, I get eight seconds and 100m rows processed for the first, vs .003 seconds and 16 thousand rows processed for the second.)

I recognize the general case of this is more complicated than my example queries. Even a partial implementation could speed up many queries.

If ClickHouse someday takes advantage of two tables having a (prefix of the) sort key in common for joins, the first example might no longer be slow; in that case, maybe changing the join_algorithm could make it slow again.

@den-crane
Copy link
Contributor

den-crane commented Jan 15, 2023

@twotwotwo
Copy link
Author

@den-crane It's great that this and #45286 are in discussion for the 2023 roadmap! Thinking about CH issue tracker etiquette, should I leave this issue open (to indicate user interest, say) or close it because the feature is already planned?

@alexey-milovidov
Copy link
Member

Duplicate of #10913.

@kitaisreal kitaisreal self-assigned this Apr 24, 2024
@kitaisreal
Copy link
Collaborator

Closed by #61216.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants