Join two big tables

**Use case**
In data analysis, fact tables are large, often at TB scale or bigger, so joining them can easily run out of memory (OOM). Yet joining two big tables is unavoidable nowadays.
Impala and Drill can join two big tables, while ClickHouse cannot. I think supporting this would help ClickHouse expand its scope.

Our team has an initial idea: rewrite the join clause into several steps. 1) Create two special temporary distributed tables whose keys are the join keys. 2) Insert data from the base tables into the temporary tables, which shuffles the data to all nodes by join key. 3) Run the join locally on each node, then construct an output stream that fetches the join results for later processing.
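
To make the idea concrete, here is a minimal in-memory sketch of the three steps in Python. It is only an illustration, not ClickHouse code: the "nodes" are plain lists, and the names `shuffle`, `local_join`, and `shuffle_join` are hypothetical helpers made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def shuffle(rows: Iterable[Row], key: str, num_nodes: int) -> List[List[Row]]:
    """Steps 1 and 2: route each row to a node chosen by hashing its join key.

    Rows with equal join keys always land on the same node.
    """
    partitions: List[List[Row]] = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[hash(row[key]) % num_nodes].append(row)
    return partitions


def local_join(left: List[Row], right: List[Row], key: str) -> Iterator[Row]:
    """Step 3: inner hash join of two partitions that live on the same node."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    for l in left:
        for r in index.get(l[key], []):
            yield {**l, **r}


def shuffle_join(left: Iterable[Row], right: Iterable[Row],
                 key: str, num_nodes: int = 4) -> Iterator[Row]:
    """Shuffle both inputs by join key, join per node, stream the results."""
    left_parts = shuffle(left, key, num_nodes)
    right_parts = shuffle(right, key, num_nodes)
    for lp, rp in zip(left_parts, right_parts):
        yield from local_join(lp, rp, key)


if __name__ == "__main__":
    orders = [{"user_id": 1, "amount": 10}, {"user_id": 2, "amount": 20}]
    users = [{"user_id": 1, "name": "a"}, {"user_id": 2, "name": "b"}]
    for row in shuffle_join(orders, users, "user_id", num_nodes=3):
        print(row)
```

Because both tables are partitioned with the same hash function, each node only has to hold its own share of the two tables in memory, which is roughly 1/N of the data if the join keys are not heavily skewed.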

What's the community's plan? Thanks in advance.


Join two big tables #34134
