
Introduce optional external.version.header config #697

Merged

Conversation

rjosal-indeed
Contributor

Introduce optional external.version.header config to use for ES external version instead of kafka offset for non-datastream indices

Problem

After changing an index mapping, we need to reindex. To do this, we create a new index and run a batch job over all existing input documents while simultaneously streaming realtime updates, ensuring we do not miss any documents. Today the two cannot run at the same time, because a batch write might overwrite a newer streamed update.

Solution

ES external versioning can be used to compare document versions for idempotency. We will be able to pull a version out of the document into a Kafka header and use the same version in the batch job. We chose a header, rather than a field in the record value, so that deletes (records with null values) are also idempotent.
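As a sketch, the feature described above might be enabled with a sink connector configuration like the following. Everything except `external.version.header` is a standard Elasticsearch sink connector setting, and the header name `doc_version` is our own illustrative choice, not something defined by the PR:

```properties
name=es-sink-with-external-version
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
topics=documents
connection.url=http://localhost:9200
key.ignore=false
# New in this PR: take the ES external version from this Kafka record header
# instead of the record's offset (non-datastream indices only).
external.version.header=doc_version
```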

Does this solution apply anywhere else?
  • [x] yes
  • no
If yes, where?

There are a couple of other use cases that may be solved by this control:
https://github.com/confluentinc/kafka-connect-elasticsearch/issues?q=is%3Aissue+is%3Aopen+external+version

Test Strategy

Testing done:
  • [x] Unit tests
  • [ ] Integration tests
  • [ ] System tests
  • [ ] Manual tests

Release Plan

If merged, we will run this connector in Confluent Cloud; otherwise, we will run our fork. Because the feature is optional, with a default that does not affect existing usage, it is backwards compatible.

master

n/a

Thank you, and please pick this apart!

@rjosal-indeed rjosal-indeed requested a review from a team as a code owner May 9, 2023 23:25

cla-assistant bot commented Sep 11, 2023

CLA assistant check
All committers have signed the CLA.

Comment on lines 253 to 254
```java
request.version((Long) record.headers().lastWithName(
        config.externalVersionHeader()).value());
```
Contributor Author


When we put this in service, we noticed that even though we used an SMT to write a long-typed header, it didn't come out that way in this code. We got it to work by using org.apache.kafka.connect.storage.SimpleHeaderConverter#fromConnectHeader and then reading the result as a String and parsing it to a Long. Does that sound correct, or did we misunderstand something about Kafka Connect headers?

Member

@sp-gupta sp-gupta Nov 22, 2023


Hey @rjosal-indeed
Apologies, I missed this before.

Would it be possible for you to elaborate on it? I am not sure I quite understood how you used SimpleHeaderConverter.

Thanks!

Contributor Author


Yes, the code for these lines with the Connect header converter looks like:

```java
// Look up the configured version header on the record.
final Header versionHeader = record.headers().lastWithName(config.externalVersionHeader());
// SimpleHeaderConverter serializes the header value to bytes (a numeric value
// becomes its string form), so decode and parse it back into a Long.
final byte[] versionValue = HEADER_CONVERTER.fromConnectHeader(
        record.topic(),
        versionHeader.key(),
        versionHeader.schema(),
        versionHeader.value()
);
request.version(Long.parseLong(new String(versionValue)));
```

where HEADER_CONVERTER is a constant new SimpleHeaderConverter().
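To make the parse step above concrete without a Kafka Connect dependency, here is a stdlib-only sketch of just that step, assuming the converter has already serialized the header value to UTF-8 bytes. The class and method names are ours for illustration, not the connector's:

```java
import java.nio.charset.StandardCharsets;

public class HeaderVersionParser {

    // versionValue stands in for what fromConnectHeader returns for a
    // numeric header: the number's string form, encoded as UTF-8 bytes.
    static long parseVersion(byte[] versionValue) {
        return Long.parseLong(new String(versionValue, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] serialized = "1699999999".getBytes(StandardCharsets.UTF_8);
        System.out.println(parseVersion(serialized)); // prints 1699999999
    }
}
```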

Member

@sp-gupta sp-gupta Nov 23, 2023


Hey @rjosal-indeed
Great! Thank you so much, this makes sense.

I think we can go ahead with this change. Also, can you refer to the suggestions mentioned here (ignore 2nd point) and make the changes accordingly? After, we can go ahead and merge the changes real quick.

Contributor Author


@sp-gupta ready for review, thanks!

Member


Thanks @rjosal-indeed
There was one import missing due to which the build was failing. I have corrected it.

Again, thank you so much for your contribution. Really appreciated!

@rjosal-indeed rjosal-indeed changed the base branch from master to 10.0.x November 28, 2023 00:16
@rjosal-indeed rjosal-indeed changed the base branch from 10.0.x to master November 29, 2023 18:54
…nal version instead of kafka offset for non-datastream indices
@rjosal-indeed rjosal-indeed changed the base branch from master to 11.0.x December 4, 2023 18:51
@sp-gupta sp-gupta merged commit 9e95174 into confluentinc:11.0.x Dec 5, 2023
2 checks passed