
How to insert into a local table from a local table on a cluster efficiently? #50917

Open
ZerveN opened this issue Jun 13, 2023 · 6 comments
Labels
question Question?

Comments

ZerveN commented Jun 13, 2023

Our table DDL looks like this:

CREATE TABLE my_table_local ON CLUSTER '{cluster}'
...
CREATE TABLE my_table ON CLUSTER '{cluster}' AS my_table_local
ENGINE = Distributed(cluster, database, my_table_local, rand());

In this way we get tables 'my_table_a' and 'my_table_b', each with both a local and a distributed version, on the cluster.

Now we need to run a process like:

INSERT INTO my_table_a_local SELECT ..... FROM my_table_b_local   ON CLUSTER

or written this way, but expected to run locally on every node of the cluster, like the first one:

INSERT INTO my_table_a SELECT .... FROM my_table_b

The process is meant to insert from the local table into the local table, executing on every cluster node at the same time, without distributed sharding.

We checked the docs and found the setting parallel_distributed_insert_select, but it does not seem to do this.

If we insert into the distributed table, the data is collected at one node and then sharded out to the other nodes.

Is there any other setting that can handle this case, or maybe another way to write the query?

Any help appreciated, thanks.
ZerveN added the question label Jun 13, 2023
cangyin (Contributor) commented Jun 13, 2023

If this is a frequent task, you can write a simple script like:

# Get the list of hosts in the cluster, then run the INSERT on each one
# against the local tables, so no re-sharding happens.
hosts=$(clickhouse-client --host localhost --query \
  "SELECT host_name FROM system.clusters WHERE cluster = 'your_cluster'")

for host in $hosts; do
  clickhouse-client --host "$host" \
    --query "INSERT INTO my_table_a_local SELECT * FROM my_table_b_local"
done

Or you can just do it manually.

AFAIK, there is no such one-line command.

ZerveN (Author) commented Jun 14, 2023

Fine... we did end up with a script like:

for host in $hosts; do   # $hosts: list of cluster host names
  clickhouse-client --host "$host" \
    --query "INSERT INTO my_table_a_local SELECT * FROM my_table_b_local"
done

or

for host in $hosts; do   # $hosts: list of cluster host names
  nohup clickhouse-client --host "$host" \
    --query "INSERT INTO my_table_a_local SELECT * FROM my_table_b_local" &
done

But this causes another problem: running synchronously takes a long time, while running in parallel in the background loses monitoring.
If we need to run a batch covering a month, one day at a time, it will be horrible...
And doing it manually is also inconvenient.
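One way to keep the parallel version monitorable is to collect the PID of each background job and `wait` on each one individually, so every per-host INSERT reports its own exit status. This is a sketch, not a tested production script: the host names are hypothetical, and `run_insert` is a placeholder that in real use would invoke `clickhouse-client --host "$1" --query "INSERT INTO my_table_a_local SELECT * FROM my_table_b_local"`.

```shell
# Hypothetical host list -- substitute the real output of
# SELECT host_name FROM system.clusters WHERE cluster = 'your_cluster'
hosts="host-1 host-2 host-3"

run_insert() {
  # Placeholder for the real per-host INSERT, e.g.:
  #   clickhouse-client --host "$1" \
  #     --query "INSERT INTO my_table_a_local SELECT * FROM my_table_b_local"
  echo "insert on $1"
}

pids=()
names=()
for host in $hosts; do
  run_insert "$host" &   # start all hosts in parallel
  pids+=($!)
  names+=("$host")
done

# Wait on each job individually so failures are visible per host
failed=0
for i in "${!pids[@]}"; do
  if wait "${pids[$i]}"; then
    echo "${names[$i]}: OK"
  else
    echo "${names[$i]}: FAILED"
    failed=1
  fi
done
echo "failed=$failed"
```

This keeps the wall-clock time of the parallel version while still producing a per-host OK/FAILED report and an overall failure flag, which a batch driver (e.g. the per-day loop over a month) can check.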

cangyin (Contributor) commented Jun 14, 2023

Is it meant to be a backup process? If so, consider the built-in backup feature or the clickhouse-backup tool.

SaltTan (Contributor) commented Jun 15, 2023

Have you tried parallel_distributed_insert_select=2?

ZerveN (Author) commented Jun 20, 2023

This time parallel_distributed_insert_select=2 is helpful.

But last time I tried it, something went wrong and it was useless.
Maybe I need more practice before I can rely on it.
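For reference, the setting can be applied per query. A sketch, assuming the distributed tables from the issue description:

```sql
-- With parallel_distributed_insert_select = 2, each shard is expected to
-- run both the SELECT and the INSERT against its own local tables,
-- avoiding the collect-then-reshard step at the initiator node.
INSERT INTO my_table_a
SELECT * FROM my_table_b
SETTINGS parallel_distributed_insert_select = 2;
```

Note that both the source and the target here are the Distributed tables; the pushdown to the local tables is what the setting controls.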

ZerveN (Author) commented Jun 20, 2023

Not good enough:

DB::Exception: Timeout exceeded: elapsed 1254.625784006 seconds, maximum: 1200. (TIMEOUT_EXCEEDED)

When I execute it locally it takes about 400 seconds, but running it through the distributed tables with parallel_distributed_insert_select=2 hits this timeout exception.
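If the 1200-second limit comes from max_execution_time (an assumption; the "Timeout exceeded: elapsed ... maximum: ..." text matches that setting's error message), it can be raised for this query only, e.g.:

```sql
INSERT INTO my_table_a
SELECT * FROM my_table_b
SETTINGS parallel_distributed_insert_select = 2,
         max_execution_time = 3600;  -- per-query limit in seconds (hypothetical value)
```

Whether the limit is enforced on the initiator or on the remote shards would need to be checked against the server configuration.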
