
How to insert into a local table from a local table on a cluster efficiently? #50917

Open
ZerveN opened this issue Jun 13, 2023 · 6 comments
Labels
question Question?

Comments

ZerveN commented Jun 13, 2023

Our table DDL looks like this:

CREATE TABLE my_table_local ON CLUSTER '{cluster}'
...
CREATE TABLE my_table ON CLUSTER '{cluster}' AS my_table_local
ENGINE = Distributed(cluster, database, my_table_local, rand());

In this way we get tables 'my_table_a' and 'my_table_b', each with both a local and a distributed version, on the cluster.

Now we need to run a process like:

INSERT INTO my_table_a_local SELECT ..... FROM my_table_b_local   ON CLUSTER

or written this way, but expected to run locally on every node of the cluster, like the first one:

INSERT INTO my_table_a SELECT .... FROM my_table_b

The process is meant to insert from the local table into the local table, executing on every cluster node at the same time, without distributed sharding.

We checked the docs and found the setting parallel_distributed_insert_select, but it does not seem to do this.

If we insert into the distributed table, the data is collected at one node and then sharded out to the other nodes.

Is there any other setting that can handle this case, or maybe another way to write the query?

Any help appreciated, thanks.
ZerveN added the question label Jun 13, 2023
cangyin (Contributor) commented Jun 13, 2023

If this is a frequent task, you can write a simple script like:

# Get the list of hosts in the cluster, then run the INSERT on each one
# against the local tables, so no re-sharding happens.
hosts=$(clickhouse-client --host localhost --query \
  "SELECT host_name FROM system.clusters WHERE cluster = 'your_cluster'")

for host in $hosts; do
  clickhouse-client --host "$host" \
    --query "INSERT INTO my_table_a_local SELECT * FROM my_table_b_local"
done

Or you can just do it manually.

AFAIK, there is no such one-line command.

ZerveN (Author) commented Jun 14, 2023

Fine... we did end up with a script like:

for host in $hosts; do   # $hosts: list of cluster host names
  clickhouse-client --host "$host" \
    --query "INSERT INTO my_table_a_local SELECT * FROM my_table_b_local"
done

or

for host in $hosts; do   # $hosts: list of cluster host names
  nohup clickhouse-client --host "$host" \
    --query "INSERT INTO my_table_a_local SELECT * FROM my_table_b_local" &
done

But this causes another problem: running synchronously takes a long time, while running in parallel in the background loses monitoring.
If we need to run a batch covering a month, one day at a time, it will be horrible...
And doing it manually is also inconvenient.
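One way to keep the parallel version monitorable is to collect the PID of each background job and `wait` on each one individually, so every per-host INSERT reports its own exit status. This is a sketch, not a tested production script: the host names are hypothetical, and `run_insert` is a placeholder that in real use would invoke `clickhouse-client --host "$1" --query "INSERT INTO my_table_a_local SELECT * FROM my_table_b_local"`.

```shell
# Hypothetical host list -- substitute the real output of
# SELECT host_name FROM system.clusters WHERE cluster = 'your_cluster'
hosts="host-1 host-2 host-3"

run_insert() {
  # Placeholder for the real per-host INSERT, e.g.:
  #   clickhouse-client --host "$1" \
  #     --query "INSERT INTO my_table_a_local SELECT * FROM my_table_b_local"
  echo "insert on $1"
}

pids=()
names=()
for host in $hosts; do
  run_insert "$host" &   # start all hosts in parallel
  pids+=($!)
  names+=("$host")
done

# Wait on each job individually so failures are visible per host
failed=0
for i in "${!pids[@]}"; do
  if wait "${pids[$i]}"; then
    echo "${names[$i]}: OK"
  else
    echo "${names[$i]}: FAILED"
    failed=1
  fi
done
echo "failed=$failed"
```

This keeps the wall-clock time of the parallel version while still producing a per-host OK/FAILED report and an overall failure flag, which a batch driver (e.g. the per-day loop over a month) can check.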

cangyin (Contributor) commented Jun 14, 2023

Is it meant to be a backup process? If so, consider the built-in backup feature or the clickhouse-backup tool.

SaltTan (Contributor) commented Jun 15, 2023

Have you tried parallel_distributed_insert_select=2?

ZerveN (Author) commented Jun 20, 2023

This time parallel_distributed_insert_select=2 is helpful.

But last time I tried it, something went wrong and it was useless.
Maybe I need more practice before I can rely on it.
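For reference, the setting can be applied per query. A sketch, assuming the distributed tables from the issue description:

```sql
-- With parallel_distributed_insert_select = 2, each shard is expected to
-- run both the SELECT and the INSERT against its own local tables,
-- avoiding the collect-then-reshard step at the initiator node.
INSERT INTO my_table_a
SELECT * FROM my_table_b
SETTINGS parallel_distributed_insert_select = 2;
```

Note that both the source and the target here are the Distributed tables; the pushdown to the local tables is what the setting controls.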

ZerveN (Author) commented Jun 20, 2023

Not good enough:

DB::Exception: Timeout exceeded: elapsed 1254.625784006 seconds, maximum: 1200. (TIMEOUT_EXCEEDED)

When I execute it locally it takes about 400 seconds, but running it through the distributed tables with parallel_distributed_insert_select=2 hits this timeout exception.
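If the 1200-second limit comes from max_execution_time (an assumption; the "Timeout exceeded: elapsed ... maximum: ..." text matches that setting's error message), it can be raised for this query only, e.g.:

```sql
INSERT INTO my_table_a
SELECT * FROM my_table_b
SETTINGS parallel_distributed_insert_select = 2,
         max_execution_time = 3600;  -- per-query limit in seconds (hypothetical value)
```

Whether the limit is enforced on the initiator or on the remote shards would need to be checked against the server configuration.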
