Skip to content

Comments

[INLONG-4814][Sort][Manager] Declare the primary key for sink table when the query contains update/delete#4882

Merged
EMsnap merged 6 commits intoapache:masterfrom
Greedyu:bugfix-sort_clickhouse_conf
Aug 2, 2022
Merged

[INLONG-4814][Sort][Manager] Declare the primary key for sink table when the query contains update/delete#4882
EMsnap merged 6 commits intoapache:masterfrom
Greedyu:bugfix-sort_clickhouse_conf

Conversation

@Greedyu
Copy link
Contributor

@Greedyu Greedyu commented Jul 5, 2022

Handling exceptions: please declare primary key for sink table when query contains update/delete record.

Briefly describe ideas

  1. The problem is that with flink JDBC SQL you have to specify the primary key.
    wecom-temp-5cf30d79023ca92a09bfb0eaee00ac84

  2. When the Sort module sees the processing of primaryKey, it should be a problem with the parameters in the node. Gotta go check it out
    wecom-temp-ae9621b69bbce4feb44abe2b79454ee2

  3. Debug test locates that the primaryKey in the file read by sort is indeed empty

image

  1. To query the source of assignment of GroupInfo.StreamInfo.nodes, there are only CreateSortConfigListenerV2 and CreateStreamSortConfigListener classes, and they are both created by the createNodesForStream() method.
    image

  2. Test after bug fixes

image

@healchow healchow changed the title [INLONG-4814][Sort][Manager] please declare primary key for sink table when query contains update/delete record [INLONG-4814][Sort][Manager] Declare the primary key for sink table when the query contains update/delete. Jul 6, 2022
@healchow healchow changed the title [INLONG-4814][Sort][Manager] Declare the primary key for sink table when the query contains update/delete. [INLONG-4814][Sort][Manager] Declare the primary key for sink table when the query contains update/delete Jul 6, 2022
@thesumery
Copy link
Contributor

thesumery commented Jul 6, 2022

@Greedyu
Hello, thanks for your issue!
There are some reasons that clickhouse sink didn't release the ability of update/delete:

  1. clickhouse update operations is heavy, it will takes precedence over all others operation to execute, so it has a poor performance. It is more suitable for batch scenes but not streaming scenes where operations are performed frequently.

@thesumery
Copy link
Contributor

thesumery commented Jul 6, 2022

@yunqingmoswu @Greedyu
If user want to use update scenes, how about this two different solution:

  1. Import update ability,but should tell users the limitation(all code and docs should comment it). In additional what you have submitte in this pr, you should also change update and delete statement in ClickHouseDialect.
  2. Enable upstream append-mode,so upstream produce record is only append stream. Add time and rowkind field in clickhouse table, those two filed comes from upstream table metadata fileds.Usersdecides how to deduplicate the downstream olap sql.

@Greedyu
Copy link
Contributor Author

Greedyu commented Jul 6, 2022

@yunqingmoswu @Greedyu If user want to use update scenes, how about this two different solution:

  1. Import update ability,but should tell users the limitation(all code and docs should comment it). In additional what you have submitte in this pr, you should also change update and delete statement in ClickHouseDialect.
  2. Enable upstream append-mode,so upstream produce record is only append stream. Add time and rowkind field in clickhouse table, those two filed comes from upstream table metadata fileds.Usersdecides how to deduplicate the downstream olap sql.

I thought this was just a bug before, but I didn't expect this ability to be purposely not supported.
But there is still a question, why does the normal mode create group also generate Flink cdc sql

@dockerzhang dockerzhang requested a review from healchow July 6, 2022 09:16
Copy link
Member

@EMsnap EMsnap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add some commentary on the cost of clickhouse update when added primary key

@Greedyu Greedyu force-pushed the bugfix-sort_clickhouse_conf branch from a6407c1 to 2a4a11e Compare July 30, 2022 06:40
@Greedyu Greedyu force-pushed the bugfix-sort_clickhouse_conf branch from 69bf9aa to 66cbc06 Compare August 1, 2022 03:28
@Greedyu
Copy link
Contributor Author

Greedyu commented Aug 1, 2022

added

Add in ClickHouseLoadNode class

2. JdbcDialects removes clickhouse dialect, specified by table creation
Copy link
Contributor

@yunqingmoswu yunqingmoswu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@EMsnap EMsnap merged commit 635c974 into apache:master Aug 2, 2022
@thesumery
Copy link
Contributor

LGTM

bruceneenhl pushed a commit to bruceneenhl/inlong that referenced this pull request Aug 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug][Sort][Manager] Declare the primary key for sink table when the query contains update/delete

7 participants