Join two big tables #34134

@zhanglistar

Use case
In data analysis, fact tables are often big, TB scale or even larger, and joining two of them currently hits OOM very easily. Yet joining two big tables is unavoidable nowadays.
Impala and Drill can join two big tables, while ClickHouse cannot. I think supporting this would help ClickHouse expand its scope.

Our team has an initial idea: rewrite the join clause into several steps. 1) Create two special temporary distributed tables keyed by the join keys. 2) Insert data from the base tables into the temporary tables, which shuffles the data across all nodes by join key. 3) Perform the join locally on each node, then construct an output stream to fetch the join results for later processing.
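
To make the three steps concrete, here is a minimal single-process sketch in Python. It is not ClickHouse code: the in-memory lists stand in for the temporary distributed tables, `num_nodes` stands in for the cluster size, and the function names (`shuffle`, `local_hash_join`, `shuffle_join`) are hypothetical. It only illustrates the idea that, once both sides are partitioned by the join key, matching rows always land on the same node, so each node can join its own partitions independently.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def shuffle(rows: Iterable[Row], key: str, num_nodes: int) -> List[List[Row]]:
    """Steps 1-2: route each row to a 'node' chosen by hashing its join key.

    Each inner list plays the role of one shard of a temporary table.
    """
    partitions: List[List[Row]] = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[hash(row[key]) % num_nodes].append(row)
    return partitions


def local_hash_join(left: List[Row], right: List[Row], key: str) -> Iterator[Row]:
    """Step 3: inner hash join over the partitions held by a single node."""
    build = defaultdict(list)
    for r in right:  # build side: only this node's share of the right table
        build[r[key]].append(r)
    for l in left:  # probe side
        for r in build.get(l[key], []):
            yield {**l, **r}


def shuffle_join(left: Iterable[Row], right: Iterable[Row], key: str,
                 num_nodes: int = 4) -> Iterator[Row]:
    """Shuffle both inputs by the join key, then stream each node's local result."""
    left_parts = shuffle(left, key, num_nodes)
    right_parts = shuffle(right, key, num_nodes)
    for node in range(num_nodes):
        yield from local_hash_join(left_parts[node], right_parts[node], key)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    orders = [{"user_id": 1, "amount": 10}, {"user_id": 2, "amount": 20},
              {"user_id": 1, "amount": 5}, {"user_id": 3, "amount": 7}]
    users = [{"user_id": 1, "name": "a"}, {"user_id": 2, "name": "b"}]
    for row in shuffle_join(orders, users, "user_id", num_nodes=3):
        print(row)
```

In this sketch the hash table built on each node covers only that node's partition of the right-hand table rather than the whole table, which is the property the proposal relies on to avoid OOM. A real implementation would also have to handle skewed keys and the network cost of the shuffle, which this sketch ignores.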

What is the community's plan for this? Thanks in advance.
