
[pipeline-connector][kafka] add kafka pipeline data sink connector. #2938

Merged
merged 2 commits into from
Apr 23, 2024

Conversation

lvyanquan
Contributor

This closes #2691.

  • Support value formats of debezium-json and canal-json.
  • The Kafka topic written to will be the namespace.schemaName.tableName string of the TableId; this can be changed using the route function of the pipeline.
  • If the target Kafka topic does not exist, it will be created automatically.
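The default topic naming described above could be sketched roughly as follows. This is an illustrative Java fragment, not the actual connector code; the class and method names are assumptions.

```java
// Illustrative sketch: derive the default Kafka topic from the TableId parts
// (namespace, schemaName, tableName) by joining the non-empty parts with '.'.
// Class and method names here are hypothetical, not the real connector API.
public class TopicNaming {
    public static String defaultTopic(String namespace, String schemaName, String tableName) {
        StringBuilder sb = new StringBuilder();
        for (String part : new String[] {namespace, schemaName, tableName}) {
            if (part != null && !part.isEmpty()) {
                if (sb.length() > 0) {
                    sb.append('.');
                }
                sb.append(part);
            }
        }
        return sb.toString();
    }
}
```

A route rule in the pipeline definition can then replace this derived name with any user-chosen topic.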

@github-actions github-actions bot added the docs Improvements or additions to documentation label Dec 28, 2023
@lvyanquan lvyanquan force-pushed the pipeline_kafka branch 2 times, most recently from 7e5fd66 to e571c1a Compare December 28, 2023 07:19
@lvyanquan lvyanquan changed the title [WIP][pipeline-connector][kafka] add kafka pipeline data sink connector. [pipeline-connector][kafka] add kafka pipeline data sink connector. Dec 28, 2023
@lvyanquan
Contributor Author

@Shawn-Hx PTAL.

Contributor

@Shawn-Hx Shawn-Hx left a comment


Thanks for the great contribution. Just left some small comments.

And can log4j2-test.properties be added?

@lvyanquan
Contributor Author

Thanks for the advice; addressed.

Contributor

@Shawn-Hx Shawn-Hx left a comment


LGTM

@lvyanquan
Contributor Author

@leonardBang @PatrickRen CC.

@lvyanquan lvyanquan force-pushed the pipeline_kafka branch 2 times, most recently from b5b18e7 to aa6210d Compare January 24, 2024 13:04
@svea-vip

I tested it but did not get the output I expected. The column names were changed to f1, f2, and I noticed that the schema change event is skipped in the code. I think this should be configurable. I am trying to compile and modify it myself, and I hope to get your help.

@lvyanquan
Contributor Author

@svea-vip Hi, can you share more about your situation, and what output you expected?
What impact would a change in table structure have if only data is output here?

@svea-vip

svea-vip commented Feb 26, 2024

@lvyanquan
now:    col1:1 col2:2 --> f1:1 f2:2
expect: col1:1 col2:2 --> col1:1 col2:2

@svea-vip

@lvyanquan I would like to know your pipeline design for Kafka, which does not have independent metadata like a database. How do you plan to implement MetadataAccessor and MetadataApplier for Kafka.

@maver1ck
Contributor

Can we use Schema Registry as the metadata provider?

@lvyanquan
Contributor Author

@lvyanquan I would like to know your pipeline design for Kafka, which does not have independent metadata like a database. How do you plan to implement MetadataAccessor and MetadataApplier for Kafka.

As Kafka does not have independent metadata like a database, the MetadataApplier actually does nothing, and processing of SchemaChangeEvent is skipped.
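That no-op behavior could look roughly like the sketch below. The interface shapes are simplified stand-ins for illustration, not the real flink-cdc API.

```java
// Hypothetical sketch of a no-op metadata applier for a Kafka sink, per the
// explanation above: schema change events are accepted but nothing is applied,
// because Kafka keeps no table metadata of its own. Interfaces are simplified.
interface SchemaChangeEvent {
    String tableId();
}

interface MetadataApplier {
    void applySchemaChange(SchemaChangeEvent event);
}

class NoOpMetadataApplier implements MetadataApplier {
    @Override
    public void applySchemaChange(SchemaChangeEvent event) {
        // Intentionally empty: there is no Kafka-side metadata to update.
    }
}
```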

@melin

melin commented Feb 28, 2024

"The Kafka topic written to will be the namespace.schemaName.tableName string of the TableId; this can be changed using the route function of the pipeline."

If there are many tables and each table is written to its own topic, too many topics may turn Kafka's writes into effectively random I/O. It should be possible to specify a single topic to write to, with Kafka headers recording the database and table name.
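The single-topic idea could be sketched as below: route every table's records to one fixed topic and identify the source via record headers instead. The header keys ("cdc.database", "cdc.table") are assumptions for illustration, not actual connector options.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of the suggestion above: instead of one topic per table,
// write all records to a single configured topic and attach the source
// database/table as headers. Header key names here are hypothetical.
public class SingleTopicRouting {
    public static Map<String, String> headersFor(String database, String table) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("cdc.database", database);
        headers.put("cdc.table", table);
        return headers;
    }
}
```

In a real producer these entries would be copied into the Kafka record's headers before sending.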

@lvyanquan
Contributor Author

lvyanquan commented Feb 29, 2024

Specify a single topic to write to, with Kafka headers recording the database & table name

I've added two options to specify this, PTAL. @melin

@lvyanquan
Contributor Author

Can we use Schema Registry as the metadata provider?

Yes, but I am still considering whether, and how, to output table structure changes.

@lvyanquan
Contributor Author

Rebased to master.

Contributor

@PatrickRen PatrickRen left a comment


@lvyanquan Thanks for the PR! I left some comments

@github-actions github-actions bot added the build label Apr 11, 2024
@lvyanquan lvyanquan force-pushed the pipeline_kafka branch 3 times, most recently from 440479e to b3164ad Compare April 12, 2024 09:41
@melin

melin commented Apr 15, 2024

Support adding a custom key and value to the Kafka header, where the value is a constant, for example: region = hangzhou

Contributor

@PatrickRen PatrickRen left a comment


@lvyanquan Thanks for the update! LGTM

@PatrickRen PatrickRen merged commit 253ef92 into apache:master Apr 23, 2024
15 checks passed
wuzhenhua01 pushed a commit to wuzhenhua01/flink-cdc-connectors that referenced this pull request Aug 4, 2024
Successfully merging this pull request may close these issues.

[flink-cdc-pipeline-connectors] Add Implementation of DataSink in Kafka
7 participants