Use case
In data analysis, fact tables are large, at the TB level or bigger, so a join can very easily run out of memory (OOM) today. Yet joining two big tables is often unavoidable.
Impala and Drill can join two big tables, while ClickHouse cannot. I think supporting this would help ClickHouse broaden its scope.
Our team has an initial idea: rewrite the join clause into several steps. 1) Create two special temporary distributed tables, keyed on the join keys. 2) Insert data from the base tables into the temporary tables, which shuffles the data to all nodes by join key. 3) Run
the join locally on each node, then construct an output stream to fetch the join results for later processing.
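To make the three steps concrete, here is a minimal in-memory sketch in Python. It is not ClickHouse code, only a simulation of the idea under stated assumptions: rows are `(join_key, payload)` tuples, each "node" is a plain list, and the names `node_for`, `shuffle`, `local_join`, and `shuffle_join` are hypothetical helpers invented for illustration.

```python
import zlib
from collections import defaultdict
from typing import Hashable, Iterable, Iterator, List, Tuple

# Assumed row shape for illustration: (join_key, payload).
Row = Tuple[Hashable, str]


def node_for(key: Hashable, num_nodes: int) -> int:
    """Route a join key to a node with a deterministic hash,
    so equal keys from both tables land on the same node."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_nodes


def shuffle(rows: Iterable[Row], num_nodes: int) -> List[List[Row]]:
    """Steps 1 and 2: build one temporary partition per node and
    insert every row into the partition chosen by its join key."""
    partitions: List[List[Row]] = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[node_for(row[0], num_nodes)].append(row)
    return partitions


def local_join(left: List[Row], right: List[Row]) -> Iterator[Tuple[Hashable, str, str]]:
    """Step 3: inner hash join on a single node. Only this node's
    slice of the right table is held in the hash table."""
    index = defaultdict(list)
    for key, payload in right:
        index[key].append(payload)
    for key, left_payload in left:
        for right_payload in index.get(key, ()):
            yield (key, left_payload, right_payload)


def shuffle_join(left: Iterable[Row], right: Iterable[Row],
                 num_nodes: int) -> Iterator[Tuple[Hashable, str, str]]:
    """Shuffle both tables by join key, join each pair of partitions
    locally, and stream the results back as one output."""
    left_parts = shuffle(left, num_nodes)
    right_parts = shuffle(right, num_nodes)
    for left_part, right_part in zip(left_parts, right_parts):
        yield from local_join(left_part, right_part)
```

If the keys are spread roughly evenly, each node builds a hash table over about 1/N of the right-hand table, which is where the memory relief would come from. Heavily skewed keys would weaken that benefit, and the sketch does nothing to handle them.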
What is the community's plan for this? Thanks in advance.