[HUDI-6683] Added Kafka key as part of Hudi metadata columns for Json & Avro KafkaSource #9403
recordBuilder.set(KAFKA_SOURCE_OFFSET_COLUMN, consumerRecord.offset());
recordBuilder.set(KAFKA_SOURCE_PARTITION_COLUMN, consumerRecord.partition());
recordBuilder.set(KAFKA_SOURCE_TIMESTAMP_COLUMN, consumerRecord.timestamp());
recordBuilder.set(KAFKA_SOURCE_KEY_COLUMN, kafkaKey);
Suggested change:
- recordBuilder.set(KAFKA_SOURCE_KEY_COLUMN, kafkaKey);
+ recordBuilder.set(KAFKA_SOURCE_KEY_COLUMN, String.valueOf(consumerRecord.key()));
Accepted the suggestion and made the changes.
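Since Kafka allows records without a key, `consumerRecord.key()` may be null. `String.valueOf(Object)` handles that without throwing, but it writes the four-character string `"null"` into a column the schema declares nullable. A minimal JDK-only sketch contrasting it with a null-preserving conversion; `KafkaKeyColumnSketch` and its methods are hypothetical names used only for illustration:

```java
import java.util.Objects;

/**
 * Hypothetical sketch: two ways to turn a Kafka record key (typed as Object)
 * into the value stored in a nullable string column.
 */
final class KafkaKeyColumnSketch {

  private KafkaKeyColumnSketch() {}

  /** Behaves like String.valueOf(Object): a null key becomes the string "null". */
  static String viaValueOf(Object key) {
    return String.valueOf(key);
  }

  /** Null-preserving alternative: a null key stays null, matching a nullable column. */
  static String nullPreserving(Object key) {
    // Objects.toString(o, nullDefault) returns nullDefault when o is null, else o.toString().
    return Objects.toString(key, null);
  }

  public static void main(String[] args) {
    Object missingKey = null;   // a Kafka record produced without a key
    Object presentKey = "order-42";
    System.out.println(viaValueOf(missingKey));             // prints the string "null"
    System.out.println(nullPreserving(missingKey) == null); // prints true
    System.out.println(viaValueOf(presentKey));             // prints order-42
  }
}
```

One subtlety: `String.valueOf(null)` written with a bare `null` literal resolves to the `char[]` overload and throws `NullPointerException`; a key typed as `Object`, as here, always selects the `Object` overload.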
newFieldList.add(new Schema.Field(KAFKA_SOURCE_PARTITION_COLUMN, Schema.create(Schema.Type.INT), "partition column", 0));
newFieldList.add(new Schema.Field(KAFKA_SOURCE_TIMESTAMP_COLUMN, Schema.create(Schema.Type.LONG), "timestamp column", 0));
newFieldList.add(new Schema.Field(KAFKA_SOURCE_KEY_COLUMN, createNullableSchema(Schema.Type.STRING), "kafka key column", JsonProperties.NULL_VALUE));
Schema newSchema = Schema.createRecord(schema.getName() + "_processed", schema.getDoc(), schema.getNamespace(), false, newFieldList);
Is the key always a string? Could it be bytes in Kafka?
The key will always be a string; please refer to https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JsonKafkaSource.java#L61:
"key.deserializer", StringDeserializer.class.getName()
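With `StringDeserializer` configured as the key deserializer, the raw key bytes are decoded into a `String` before the source sees them, so the key arrives either as a `String` or as null for a keyless record. A JDK-only approximation of that decoding step, assuming the deserializer's default UTF-8 encoding; `StringKeyDecodingSketch` and `decodeKey` are hypothetical names, not Kafka's implementation:

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical approximation of string key decoding: raw key bytes are decoded
 * as UTF-8 (assumed default encoding), and a record with no key yields null.
 */
final class StringKeyDecodingSketch {

  private StringKeyDecodingSketch() {}

  static String decodeKey(byte[] rawKey) {
    // A keyless record carries no key bytes, so there is nothing to decode.
    if (rawKey == null) {
      return null;
    }
    return new String(rawKey, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] raw = "user-7".getBytes(StandardCharsets.UTF_8);
    System.out.println(decodeKey(raw));          // prints user-7
    System.out.println(decodeKey(null) == null); // prints true
  }
}
```

Under this assumption, a key column typed as a nullable string covers both outcomes of the decoding step.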
Where is the code that uses the key as a record key field? I didn't see it.
@hudi-bot run azure
Hi @danny0405, if all looks good, can we merge this PR?
I have re-triggered the tests.
@danny0405 the CI tests have passed.
ObjectMapper om = new ObjectMapper();
partitionIterator.forEachRemaining(consumerRecord -> {
String record = consumerRecord.value().toString();
String recordKey = (String) consumerRecord.key();
Suggested change:
- String recordKey = (String) consumerRecord.key();
+ String recordKey = consumerRecord.key().toString();
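The two forms differ exactly at the edges raised in this thread. Assuming the key is typed `Object` (as with `ConsumerRecord<Object, Object>`), a cast passes a null key through unchanged but throws `ClassCastException` for any non-`String` key, while `toString()` accepts any type but throws `NullPointerException` on a null key; for a `byte[]` key it would also produce an identity string starting with `[B@` rather than the key's content. A small JDK-only sketch; the class and method names are hypothetical:

```java
/**
 * Hypothetical sketch of how the two candidate expressions behave for
 * different key values, with the key typed as Object.
 */
final class KeyCastVersusToString {

  private KeyCastVersusToString() {}

  /** (String) key: a null key passes through; a non-String key throws ClassCastException. */
  static String byCast(Object key) {
    return (String) key;
  }

  /** key.toString(): any non-null type works; a null key throws NullPointerException. */
  static String byToString(Object key) {
    return key.toString();
  }

  public static void main(String[] args) {
    System.out.println(byCast(null));    // prints null
    System.out.println(byToString(42L)); // prints 42
    try {
      byToString(null);
    } catch (NullPointerException e) {
      System.out.println("toString() on a null key: NullPointerException");
    }
    try {
      byCast(new byte[] {1, 2});
    } catch (ClassCastException e) {
      System.out.println("cast on a byte[] key: ClassCastException");
    }
  }
}
```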
@@ -80,11 +81,13 @@ protected JavaRDD<String> maybeAppendKafkaOffsets(JavaRDD<ConsumerRecord<Object
ObjectMapper om = new ObjectMapper();
partitionIterator.forEachRemaining(consumerRecord -> {
String record = consumerRecord.value().toString();
I think renaming this variable to recordValue might make the code more readable:
Suggested change:
- String record = consumerRecord.value().toString();
+ String recordValue = consumerRecord.value().toString();
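The per-record step in `maybeAppendKafkaOffsets` takes the parsed record value and appends Kafka metadata fields, now including the key. A JDK-only sketch of that step, assuming the parsed JSON object can be modeled as a `Map` instead of a Jackson tree; the column-name strings below are hypothetical stand-ins for the `KAFKA_SOURCE_*_COLUMN` constants, whose actual values the PR does not show:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: append Kafka metadata (offset, partition, timestamp, key)
 * to a parsed record value modeled as a Map.
 */
final class JsonKafkaMetadataSketch {

  // Hypothetical column names standing in for the KAFKA_SOURCE_*_COLUMN constants.
  static final String OFFSET_COLUMN = "kafka_source_offset";
  static final String PARTITION_COLUMN = "kafka_source_partition";
  static final String TIMESTAMP_COLUMN = "kafka_source_timestamp";
  static final String KEY_COLUMN = "kafka_source_key";

  private JsonKafkaMetadataSketch() {}

  static Map<String, Object> appendKafkaMetadata(
      Map<String, Object> recordValue, long offset, int partition, long timestamp, String recordKey) {
    // Copy so the caller's parsed value is not mutated; LinkedHashMap keeps field order stable.
    Map<String, Object> out = new LinkedHashMap<>(recordValue);
    out.put(OFFSET_COLUMN, offset);        // boxed to Long
    out.put(PARTITION_COLUMN, partition);  // boxed to Integer
    out.put(TIMESTAMP_COLUMN, timestamp);  // boxed to Long
    out.put(KEY_COLUMN, recordKey);        // may be null for a keyless record
    return out;
  }

  public static void main(String[] args) {
    Map<String, Object> recordValue = new LinkedHashMap<>();
    recordValue.put("id", 1);
    System.out.println(appendKafkaMetadata(recordValue, 10L, 3, 1700000000000L, "order-42"));
  }
}
```

Copying rather than mutating the parsed value is a design choice of this sketch, not necessarily what the Hudi code does.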
recordBuilder.set(KAFKA_SOURCE_OFFSET_COLUMN, consumerRecord.offset());
recordBuilder.set(KAFKA_SOURCE_PARTITION_COLUMN, consumerRecord.partition());
recordBuilder.set(KAFKA_SOURCE_TIMESTAMP_COLUMN, consumerRecord.timestamp());
recordBuilder.set(KAFKA_SOURCE_KEY_COLUMN, String.valueOf(consumerRecord.key()));
Suggested change:
- recordBuilder.set(KAFKA_SOURCE_KEY_COLUMN, String.valueOf(consumerRecord.key()));
+ recordBuilder.set(KAFKA_SOURCE_KEY_COLUMN, consumerRecord.key().toString());
Thanks for the feedback, @hussein-awala. Maybe you could open a separate PR to address it.
… & Avro KafkaSource (#9403)
Change Logs
This change adds the ability to include the Kafka message key as a Hudi metadata column for JsonKafkaSource and AvroKafkaSource.
For context: #9391
Impact
Describe any public API or user-facing feature change or any performance impact: NA
Risk level (write none, low, medium or high below): None
If medium or high, explain what verification was done to mitigate the risks.
Documentation Update: None
Describe any necessary documentation update if there is any new feature, config, or user-facing change. … ticket number here and follow the instruction to make changes to the website.
Contributor's checklist