
[Feature][Connector] Add clickhouse-file sink to support clickhouse bulk load #1382

Closed
3 tasks done
Hisoka-X opened this issue Mar 3, 2022 · 3 comments · Fixed by #1501

Comments

@Hisoka-X (Member) commented Mar 3, 2022

Search before asking

  • I had searched in the feature requests and found no similar feature requirement.

Description

Summary

In scenarios where massive amounts of data are written to ClickHouse, a traditional JDBC sink cannot handle the volume. Similar to HBase's bulk load feature, SeaTunnel can support ClickHouse by writing data files directly.

Plan

This is the original plan: ClickHouse/ClickHouse#10473
Our plan (diagram attachment: Clickhouse_bulk_load):

Details:

  1. Create the table using clickhouse-local.
  2. Receive data from upstream.
  3. Execute an INSERT SQL statement with the data into the clickhouse-local table.
  4. Send the data file to the ClickHouse server using zero copy; the target path is '${clickhouse_data_location}/${database}/${table}/detached'.
  5. Use a ClickHouse ATTACH statement to make the data file queryable (a sketch of steps 1, 3 and 5 follows this list).
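
A minimal sketch of steps 1, 3 and 5, assuming a Java-based connector that shells out to clickhouse-local and issues the final ATTACH over JDBC (with the ClickHouse JDBC driver on the classpath). The host, schema, paths and part name below are placeholders rather than the final connector API, and exact clickhouse-local flags may vary between versions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ClickhouseFileSinkSketch {

    // Runs one SQL statement through clickhouse-local, persisting data under localDataDir.
    static void runLocal(String clickhouseLocalPath, String localDataDir, String sql)
            throws Exception {
        int exit = new ProcessBuilder(clickhouseLocalPath, "--path", localDataDir, "--query", sql)
                .inheritIO().start().waitFor();
        if (exit != 0) {
            throw new IllegalStateException("clickhouse-local failed: " + sql);
        }
    }

    public static void main(String[] args) throws Exception {
        String clickhouseLocalPath = "/usr/bin/clickhouse-local"; // clickhouse_local_path option
        String localDataDir = "/tmp/seatunnel-ck-local";          // scratch dir on the Spark node
        String host = "ck-node-1";                                // host option
        String database = "default";                              // database option
        String table = "my_table";                                // table option

        // Step 1: create the local table with the same schema/engine as the server table.
        runLocal(clickhouseLocalPath, localDataDir,
                "CREATE TABLE IF NOT EXISTS " + table
                        + " (id UInt64, name String) ENGINE = MergeTree() ORDER BY id");

        // Step 3: insert a batch of upstream rows into the local table (bulk_size rows per batch).
        runLocal(clickhouseLocalPath, localDataDir,
                "INSERT INTO " + table + " VALUES (1, 'a'), (2, 'b')");

        // Step 4 copies the generated part from localDataDir into
        // '${clickhouse_data_location}/${database}/${table}/detached' on the server
        // (rsync/scp; see the comments below on this issue).

        // Step 5: attach the copied part on the server so it becomes queryable.
        String partName = "all_1_1_0"; // placeholder: taken from the directory produced above
        try (Connection conn = DriverManager.getConnection(
                        "jdbc:clickhouse://" + host + ":8123/" + database, "default", "");
             Statement stmt = conn.createStatement()) {
            stmt.execute("ALTER TABLE " + database + "." + table
                    + " ATTACH PART '" + partName + "'");
        }
    }
}
```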

Options

| name | type | required | default value |
|------|------|----------|---------------|
| bulk_size | number | no | 100000 |
| database | string | yes | - |
| clickhouse_local_path | string | yes | - |
| fields | array | no | - |
| host | string | yes | - |
| password | string | no | - |
| table | string | yes | - |
| username | string | no | - |
| sharding_key | string | no | - |
| common-options | string | no | - |
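
As an illustration only (the class and field names here are hypothetical, not the final connector API), the sink could hold these options roughly like this, with the defaults from the table above:

```java
public class ClickhouseFileSinkOptions {
    // required options
    String host;                 // e.g. "ck-node-1:8123"
    String database;
    String table;
    String clickhouseLocalPath;  // path to the clickhouse-local binary on every Spark node

    // optional options
    int bulkSize = 100_000;      // rows buffered per clickhouse-local insert
    String[] fields;             // subset of columns to write; null means all columns
    String username;
    String password;
    String shardingKey;          // column used to pick a shard for Distributed tables

    // Fails fast if any required option is missing.
    void validate() {
        String[][] required = {
                {"host", host}, {"database", database},
                {"table", table}, {"clickhouse_local_path", clickhouseLocalPath}};
        for (String[] pair : required) {
            if (pair[1] == null || pair[1].isEmpty()) {
                throw new IllegalArgumentException("Missing required option: " + pair[0]);
            }
        }
    }
}
```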

Some problems

  1. The clickhouse-local program must be installed at the same path on every Spark node before the SeaTunnel application starts.
  2. Not all table engines will work well; we will make the MergeTree family and the Distributed engine work correctly first.
  3. How to send the data file from the Spark node to the ClickHouse node.

Usage Scenario

No response

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@Hisoka-X (Member, Author) commented Mar 4, 2022

@CalvinKirs Can we talk about this?

@William-GuoWei commented

About the question of how to send files from a Spark node to the ClickHouse node: maybe we could use rsync, with a config file to save the password?
If so, I think we need a document describing how to use it.

@IVitamin-C commented

> About the question of how to send files from a Spark node to the ClickHouse node: maybe we could use rsync, with a config file to save the password? If so, I think we need a document describing how to use it.

rsync can use the SSH protocol or a configured rsync daemon. We could provide a generic rsync daemon config, compatible with both the Ordinary and Atomic database engines.
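
A rough sketch of how the connector could push a generated part to the server via an rsync daemon, as suggested above. The module name clickhouse_detached, the secrets file path and the daemon layout are assumptions that would come from whatever generic rsync daemon config is provided; SSH-based rsync or scp would work similarly.

```java
import java.io.IOException;

public class DetachedPartShipper {

    // Pushes a locally generated part directory to the server's detached dir via an rsync daemon.
    static void shipPart(String localPartDir, String host, String database, String table)
            throws IOException, InterruptedException {
        // Assumed daemon module "clickhouse_detached" exported on the ClickHouse node,
        // mapped to '${clickhouse_data_location}'.
        String target = "rsync://seatunnel@" + host + "/clickhouse_detached/"
                + database + "/" + table + "/detached/";
        Process p = new ProcessBuilder(
                "rsync", "-av",
                "--password-file=/etc/seatunnel/rsyncd.secret", // keeps the password out of the command line
                localPartDir, target)
                .inheritIO()
                .start();
        if (p.waitFor() != 0) {
            throw new IOException("rsync exited with non-zero status for " + localPartDir);
        }
    }
}
```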
