[INLONG-7249][Sort] JDBC accurate dirty data archive and metric calculation#7580

Merged
dockerzhang merged 12 commits into apache:master from Yizhou-Yang:jdbc-enhance-standalone
Apr 11, 2023
Yizhou-Yang (Contributor) commented Mar 13, 2023

Prepare a Pull Request

Motivation

JDBC needs to archive dirty data. In this PR, I explore the possibility of accurate dirty data archiving for JDBC by modifying the executors. The modifications are well tested and stable in most cases; in the remaining cases the code logs the warning "jdbc enhance failed for class:{}" and normal processing is not affected.
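
To illustrate what "accurate" archiving can mean for a batched sink, here is a minimal sketch. It assumes (this is not confirmed by the description above) that a failed batch is retried record by record so that only the records that actually fail are archived as dirty. `BatchWriter`, `Result`, and `flush` are hypothetical names, and the writer is assumed to be all-or-nothing per call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DirtyArchiveSketch {
    /** Hypothetical writer: assumed to write the whole list or nothing, throwing on failure. */
    interface BatchWriter {
        void writeBatch(List<String> records) throws Exception;
    }

    /** Outcome of one flush: how many records were written and which ones were dirty. */
    static class Result {
        final List<String> dirty = new ArrayList<>();
        long written;
    }

    static Result flush(List<String> batch, BatchWriter writer) {
        Result result = new Result();
        try {
            // Fast path: the whole batch succeeds in one call.
            writer.writeBatch(batch);
            result.written = batch.size();
            return result;
        } catch (Exception batchFailure) {
            // Fall through to the per-record retry below.
        }
        for (String record : batch) {
            try {
                // Retry each record alone so a failure is attributed to exactly one record.
                writer.writeBatch(Collections.singletonList(record));
                result.written++;
            } catch (Exception recordFailure) {
                // In a real sink this record would go to the dirty data archive.
                result.dirty.add(record);
            }
        }
        return result;
    }
}
```

With this shape, the written count and the dirty list stay consistent with each other, which is what per-record metrics need. The cost is one extra round of single-record writes whenever a batch fails.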

Modification

Support accurate dirty data archiving by using reflection to replace certain kinds of executors, and replace TableSimpleStatementExecutor with a version that adds metrics.
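
The reflection-based replacement can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `RecordExecutor`, `PlainExecutor`, `MeteredExecutor`, `Sink`, and `enhance` are hypothetical stand-ins, and only the warning text is taken from the description above.

```java
import java.lang.reflect.Field;

class ExecutorSwapSketch {
    /** Hypothetical stand-in for a JDBC batch statement executor. */
    interface RecordExecutor {
        void addToBatch(String record);
    }

    static class PlainExecutor implements RecordExecutor {
        @Override
        public void addToBatch(String record) {
            // A real executor would buffer the record for a JDBC batch here.
        }
    }

    /** Wrapper that counts records, then delegates to the original executor. */
    static class MeteredExecutor implements RecordExecutor {
        private final RecordExecutor delegate;
        private long recordsSeen;

        MeteredExecutor(RecordExecutor delegate) {
            this.delegate = delegate;
        }

        @Override
        public void addToBatch(String record) {
            recordsSeen++;
            delegate.addToBatch(record);
        }

        long recordsSeen() {
            return recordsSeen;
        }
    }

    /** Hypothetical sink that keeps its executor in a private field. */
    static class Sink {
        private RecordExecutor executor = new PlainExecutor();

        RecordExecutor executor() {
            return executor;
        }
    }

    /** Swaps the named private field for a metered wrapper; on failure, warns and leaves the target unchanged. */
    static boolean enhance(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            RecordExecutor original = (RecordExecutor) field.get(target);
            field.set(target, new MeteredExecutor(original));
            return true;
        } catch (ReflectiveOperationException | ClassCastException e) {
            System.err.println("jdbc enhance failed for class:" + target.getClass().getName());
            return false;
        }
    }
}
```

Calling `ExecutorSwapSketch.enhance(sink, "executor")` wraps the existing executor in place. If the field is missing or has an unexpected type, the method only prints the warning and the sink keeps running with its original executor.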

To-do and to-improve

1. There are 8 implementations of flink-cdc-executors in total.
The currently supported sink executor types are:
TableBufferedStatementExecutor
TableBufferReducedStatementExecutor
TableSimpleStatementExecutor

Unsupported executor types (dirty data will not be archived, but the code will not break):
KeyedBatchStatementExecutor
NoOPStatementExecutor
TableInsertOrUpdateStatementExecutor

No need to support:
InsertOrUpdateJdbcExecutor (TableInsertOrUpdateStatementExecutor is enough)
SimpleBatchStatementExecutor (TableSimpleStatementExecutor is enough)

2. This PR uses Java reflection and depends on flink-cdc-connectors. The design can be improved by bringing the executors into Inlong-jdbc-connector and modifying the builder to set the executor class directly, instead of relying on reflection. Such a refactoring could also increase throughput.
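
A rough sketch of the builder-based alternative described in item 2, under the assumption that the connector owns both the executor types and the builder. All names here (`RecordExecutor`, `Sink`, `Builder`, `decorateWith`) are hypothetical and do not refer to existing InLong or Flink APIs.

```java
import java.util.function.UnaryOperator;

class SinkBuilderSketch {
    /** Hypothetical executor abstraction owned by the connector itself. */
    interface RecordExecutor {
        String name();
    }

    static class Sink {
        final RecordExecutor executor;

        Sink(RecordExecutor executor) {
            this.executor = executor;
        }
    }

    static class Builder {
        private RecordExecutor executor = () -> "simple";
        private UnaryOperator<RecordExecutor> decorator = UnaryOperator.identity();

        /** Chooses the executor directly; no reflection is involved. */
        Builder withExecutor(RecordExecutor executor) {
            this.executor = executor;
            return this;
        }

        /** Optionally wraps the chosen executor, e.g. with metrics or dirty data handling. */
        Builder decorateWith(UnaryOperator<RecordExecutor> decorator) {
            this.decorator = decorator;
            return this;
        }

        Sink build() {
            return new Sink(decorator.apply(executor));
        }
    }
}
```

Because the wrapper is applied at build time, a type mismatch is caught by the compiler rather than surfacing as a runtime warning.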

Even though this PR still has imperfections, I think it can be merged for the following reasons:

  1. It supports all the common cases of JDBC dirty data archiving, and more can be supported in the future.
  2. For the cases it does not support, it does not throw exceptions; it simply logs a warning and continues.
  3. It does not slow down the existing code or normal data integration.

Yizhou-Yang (Contributor Author) commented

@gong @yunqingmoswu Thanks for the review. I've addressed your comments.

gong commented Apr 4, 2023

@Yizhou-Yang pls resolve conflict

Yizhou Yang added 2 commits April 4, 2023 14:19
dockerzhang merged commit 10a153b into apache:master Apr 11, 2023
Yizhou-Yang deleted the jdbc-enhance-standalone branch May 31, 2023 06:38
Development

Successfully merging this pull request may close these issues.

[Feature][Sort] JDBC accurate dirty data archive and metric calculation