[HUDI-6683] Added kafka key as part of hudi metadata columns for Json & Avro KafkaSource #9403

Merged
danny0405 merged 6 commits into apache:master from prathit06:add-kafka-key-json-source
Aug 15, 2023
Conversation

@prathit06
Contributor

@prathit06 prathit06 commented Aug 9, 2023

Change Logs

This change adds the capability to include the Kafka message key as part of the Hudi metadata columns for JsonKafkaSource & AvroKafkaSource.
For context: #9391
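The mechanism can be sketched independently of the Hudi classes: the Kafka key is simply appended alongside the existing offset/partition/timestamp metadata columns during deserialization. A minimal sketch (the column name follows the pattern discussed later in this PR; the helper itself is hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KafkaKeyMetadataSketch {
    // Column name as referenced later in this PR discussion.
    public static final String KAFKA_SOURCE_KEY_COLUMN = "_hoodie_kafka_source_key";

    // Hypothetical helper: copies a deserialized record and appends the
    // kafka key next to the other kafka source metadata columns.
    public static Map<String, Object> appendKafkaKey(Map<String, Object> record, String kafkaKey) {
        Map<String, Object> out = new LinkedHashMap<>(record);
        // Kafka keys are optional, so the column must tolerate null
        // (the Avro schema in this PR uses a nullable string for the same reason).
        out.put(KAFKA_SOURCE_KEY_COLUMN, kafkaKey);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 1);
        System.out.println(appendKafkaKey(record, "user-42"));
        // prints {id=1, _hoodie_kafka_source_key=user-42}
    }
}
```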

Impact

Describe any public API or user-facing feature change or any performance impact: NA

Risk level (write none, low, medium or high below): None

If medium or high, explain what verification was done to mitigate the risks.

Documentation Update: None

Describe any necessary documentation update if there is any new feature, config, or user-facing change:

  • The config description must be updated if new configs are added or the default value of the configs is changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here, and follow the instructions to make changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@prathit06 prathit06 changed the title from "Added kafka key as part of hudi metadata columns for JsonKafkaSource" to "Added kafka key as part of hudi metadata columns for Json & Avro KafkaSource" on Aug 9, 2023
recordBuilder.set(KAFKA_SOURCE_OFFSET_COLUMN, consumerRecord.offset());
recordBuilder.set(KAFKA_SOURCE_PARTITION_COLUMN, consumerRecord.partition());
recordBuilder.set(KAFKA_SOURCE_TIMESTAMP_COLUMN, consumerRecord.timestamp());
recordBuilder.set(KAFKA_SOURCE_KEY_COLUMN, kafkaKey);
Contributor

Suggested change
recordBuilder.set(KAFKA_SOURCE_KEY_COLUMN, kafkaKey);
recordBuilder.set(KAFKA_SOURCE_KEY_COLUMN, String.valueOf(consumerRecord.key()));

Contributor Author

Accepted the suggestion & have made the changes.

newFieldList.add(new Schema.Field(KAFKA_SOURCE_PARTITION_COLUMN, Schema.create(Schema.Type.INT), "partition column", 0));
newFieldList.add(new Schema.Field(KAFKA_SOURCE_TIMESTAMP_COLUMN, Schema.create(Schema.Type.LONG), "timestamp column", 0));
newFieldList.add(new Schema.Field(KAFKA_SOURCE_KEY_COLUMN, createNullableSchema(Schema.Type.STRING), "kafka key column", JsonProperties.NULL_VALUE));
Schema newSchema = Schema.createRecord(schema.getName() + "_processed", schema.getDoc(), schema.getNamespace(), false, newFieldList);
Contributor

The key is always a string type? Could it be bytes in Kafka?

Contributor Author

@prathit06 prathit06 Aug 10, 2023

The key will always be a string type, please refer: https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JsonKafkaSource.java#L61
"key.deserializer", StringDeserializer.class.getName()

Contributor

Where is the code where the key is used as a record key field? I didn't see it.

Contributor Author

@prathit06 prathit06 Aug 14, 2023

Referring to the discussion here, the idea was to add the kafka key as part of the hudi metadata columns & not as a recordKey.

In order to set the kafka key as the record key, the end user can do so by setting hoodie.datasource.write.recordkey.field to _hoodie_kafka_source_key; please refer here for more context.
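A minimal sketch of that configuration (the config key and column name are taken from the comment above; the options map itself is only illustrative of what a user would pass to the writer):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RecordKeyOptsSketch {
    // Illustrative: the writer option an end user would set to promote the
    // kafka key metadata column to the Hudi record key.
    public static Map<String, String> kafkaKeyAsRecordKey() {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("hoodie.datasource.write.recordkey.field", "_hoodie_kafka_source_key");
        return opts;
    }
}
```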

@prathit06 prathit06 requested a review from danny0405 August 10, 2023 15:41
@prathit06
Contributor Author

@hudi-bot run azure

@prathit06 prathit06 changed the title from "Added kafka key as part of hudi metadata columns for Json & Avro KafkaSource" to "[MINOR] Added kafka key as part of hudi metadata columns for Json & Avro KafkaSource" on Aug 10, 2023
@hudi-bot
Collaborator

CI report:

Bot commands supported by @hudi-bot:
  • @hudi-bot run azure : re-run the last Azure build

@danny0405 danny0405 changed the title from "[MINOR] Added kafka key as part of hudi metadata columns for Json & Avro KafkaSource" to "[HUDI-6683] Added kafka key as part of hudi metadata columns for Json & Avro KafkaSource" on Aug 11, 2023
@prathit06
Contributor Author

Hi @danny0405, if all looks good, can we merge this PR?
Please let me know if there are any other action items.

@danny0405
Contributor

Hi @danny0405, if all looks good, can we merge this PR? Please let me know if there are any other action items.

Have re-triggered the tests.

@prathit06
Contributor Author

Hi @danny0405, if all looks good, can we merge this PR? Please let me know if there are any other action items.

Have re-triggered the tests.

Tests have passed in CI @danny0405

ObjectMapper om = new ObjectMapper();
partitionIterator.forEachRemaining(consumerRecord -> {
String record = consumerRecord.value().toString();
String recordKey = (String) consumerRecord.key();
Member

Suggested change
String recordKey = (String) consumerRecord.key();
String recordKey = consumerRecord.key().toString();

@@ -80,11 +81,13 @@ protected JavaRDD<String> maybeAppendKafkaOffsets(JavaRDD<ConsumerRecord<Object
ObjectMapper om = new ObjectMapper();
partitionIterator.forEachRemaining(consumerRecord -> {
String record = consumerRecord.value().toString();
Member

I think renaming this variable to recordValue might make the code more readable:

Suggested change
String record = consumerRecord.value().toString();
String recordValue = consumerRecord.value().toString();

recordBuilder.set(KAFKA_SOURCE_OFFSET_COLUMN, consumerRecord.offset());
recordBuilder.set(KAFKA_SOURCE_PARTITION_COLUMN, consumerRecord.partition());
recordBuilder.set(KAFKA_SOURCE_TIMESTAMP_COLUMN, consumerRecord.timestamp());
recordBuilder.set(KAFKA_SOURCE_KEY_COLUMN, String.valueOf(consumerRecord.key()));
Member

Suggested change
recordBuilder.set(KAFKA_SOURCE_KEY_COLUMN, String.valueOf(consumerRecord.key()));
recordBuilder.set(KAFKA_SOURCE_KEY_COLUMN, consumerRecord.key().toString());
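One behavioral note worth checking on this suggestion, since it is the actual difference between the two forms: String.valueOf(obj) maps a null kafka key to the literal string "null", while calling toString() directly throws a NullPointerException when the key is absent. A small sketch of the difference (plain Java semantics, not Hudi code):

```java
public class KeyNullSafetySketch {
    public static void main(String[] args) {
        Object key = null; // kafka message keys are optional

        // String.valueOf tolerates null and yields the string "null".
        System.out.println(String.valueOf(key)); // prints null

        // key.toString() throws a NullPointerException on a missing key.
        try {
            System.out.println(key.toString());
        } catch (NullPointerException e) {
            System.out.println("NPE on null key");
        }
    }
}
```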

@danny0405
Contributor

Thanks for the nice feedback @hussein-awala, maybe you can file a separate PR to address it.
