Performance: propagate WHERE conditions across joins #45242

twotwotwo · 2023-01-13T01:29:05Z

If I join two tables and put a condition on the join key in one table, applying that condition to the corresponding column in the other table could sometimes make the query run much faster than it does now. Given these tables:

create table a (i int) engine=MergeTree primary key i as select * from numbers(100000000);
create table b (i int) engine=MergeTree primary key i as select * from numbers(100000000);

This query hashes all of b and takes several seconds:

select count(*) from a join b using(i) where a.i < 1000;

However, if I explicitly copy the condition, most of b isn't read or hashed, so I get results instantly:

select count(*) from a join b using(i) where a.i < 1000 and b.i < 1000;

(Concretely, I get eight seconds and 100 million rows processed for the first, vs. 0.003 seconds and 16 thousand rows processed for the second.)
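To illustrate why copying the condition is safe for an inner join on an equality key, here is a toy sketch in Python. It is not how ClickHouse implements joins; `hash_join_count` and its parameters are hypothetical names made up for this illustration, and the tables are scaled down to 100,000 rows. It only shows that filtering the build side with the same predicate leaves the result unchanged while hashing far fewer rows.

```python
def hash_join_count(left, right, left_pred, right_pred=None):
    """Toy inner equi-join on a single key column.

    Returns (match_count, rows_hashed). Hypothetical helper for
    illustration only, not ClickHouse's join implementation.
    """
    # Build side: hash every right-hand key that survives right_pred.
    build = {}
    rows_hashed = 0
    for key in right:
        if right_pred is None or right_pred(key):
            build[key] = build.get(key, 0) + 1
            rows_hashed += 1

    # Probe side: only left rows passing the WHERE condition are probed.
    count = 0
    for key in left:
        if left_pred(key):
            count += build.get(key, 0)
    return count, rows_hashed


# Scaled-down stand-ins for tables a and b (assumption: 100k rows each).
a = range(100_000)
b = range(100_000)
cond = lambda i: i < 1000

# WHERE a.i < 1000 only: the whole of b is hashed.
plain = hash_join_count(a, b, cond)

# WHERE a.i < 1000 AND b.i < 1000: same count, far smaller build side.
propagated = hash_join_count(a, b, cond, right_pred=cond)

print(plain)       # (1000, 100000)
print(propagated)  # (1000, 1000)
```

Because the join requires a.i = b.i, any b row with i >= 1000 can never match an a row that passed the filter, which is why the two counts agree.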

I recognize the general case of this is more complicated than my example queries. Even a partial implementation could speed up many queries.

If ClickHouse someday takes advantage of two tables having a (prefix of the) sort key in common for joins, the first example might no longer be slow; in that case, maybe changing the join_algorithm could make it slow again.

den-crane · 2023-01-15T17:36:08Z

JOINs reordering and extended pushdown. Roadmap 2023 #44767

twotwotwo · 2023-01-15T18:52:13Z

@den-crane It's great that this and #45286 are in discussion for the 2023 roadmap! Thinking about CH issue tracker etiquette, should I leave this issue open (to indicate user interest, say) or close it because the feature is already planned?

alexey-milovidov · 2023-06-19T15:38:01Z

Duplicate of #10913.

kitaisreal · 2024-04-24T18:51:17Z

Closed by #61216.

twotwotwo added the performance label Jan 13, 2023

den-crane added the comp-joins JOINs label Jan 15, 2023

alexey-milovidov added the duplicate label Jun 19, 2023

kitaisreal self-assigned this Apr 24, 2024

kitaisreal closed this as completed Apr 24, 2024
