Pulsar IO - KafkaSource - Implement KeyValue support for KafkaBytesSource #10002
Fix a problem in HttpLookupService#getSchema
Can someone help me review this patch?
Looks good to me, just left a minor comment.
props.put(key, ByteBufferDeserializer.class.getCanonicalName());

Schema<?> result;
if (ByteArrayDeserializer.class.getName().equals(kafkaDeserializerClass)
Use a switch statement instead.
Unfortunately these are not constants, so I cannot use them in a switch.
I like having this strong compile-time binding: if I write string constants, I lose the reference to the class.
This code is not on the hot path, so using a switch would only change the syntax.
So do you feel strongly that we should use the switch?
So do you feel strongly that we should use the switch?
No, it was only a syntax suggestion; the current code works for me.
Okay. Thanks for your review.
Can you please help merge this patch?
CI is green.
@congbobo184 @gaoran10 Could you please also help review this PR?
LGTM! Great work.
Thank you @codelipenghui and @congbobo184. I have merged this patch. I will follow up with other enhancements to the KafkaSource.
Motivation
Add support for different key and value Deserializers for the Kafka Source.
With this change, the Kafka connector supports non-String keys and also applies the correct Schema to the Pulsar topic.
For primitive data types we do not decode the Kafka key/value pair into Java objects; we simply pass a reference to the internal ByteBuffer (which wraps a byte[]).
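The "pass a reference instead of decoding" approach relies on `java.nio.ByteBuffer.wrap(byte[])`, which creates a buffer backed by the given array rather than copying it. The sketch below only demonstrates that JDK behavior; the class and method names are hypothetical and it is not the connector's actual code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ZeroCopyWrapDemo {
    // Hypothetical helper: wraps the raw record bytes without copying them.
    // The returned buffer is a view over the same byte[] (position 0, limit = length).
    static ByteBuffer wrapWithoutCopy(byte[] raw) {
        return ByteBuffer.wrap(raw);
    }

    public static void main(String[] args) {
        byte[] raw = "hello".getBytes(StandardCharsets.UTF_8);
        ByteBuffer view = wrapWithoutCopy(raw);

        // Same backing array, so no bytes were duplicated.
        System.out.println(view.array() == raw); // true

        // A write to the array is visible through the buffer.
        raw[0] = (byte) 'j';
        System.out.println((char) view.get(0)); // j
    }
}
```

Because the buffer shares the array, the consumer must not reuse or mutate the byte[] after handing it over.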
The Schema type is chosen based on the keyDeserializationClass and valueDeserializationClass parameters that you pass in the Kafka Source configuration. The mapping is as follows:
When the key deserializer is StringDeserializer, we use the decoded key as the Pulsar key.
When the key deserializer is not StringDeserializer, we use the Pulsar KeyValue data type with SEPARATED key encoding.
This way the topic has a Schema for the key and a Schema for the value.
The key is encoded into the Pulsar message key (SEPARATED), so it is used for routing and for compaction.
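The decision logic above can be sketched as a lookup from deserializer class name to schema, plus a branch on whether the key deserializer is StringDeserializer. This is a dependency-free illustration under assumptions: the Kafka deserializer class names are real, but the schema names on the right (named after Pulsar `Schema` constants) and the `TopicSchema` record are hypothetical stand-ins, not the connector's exact table or types. In the connector itself, keying such a map on `StringDeserializer.class.getName()` and similar expressions would keep the compile-time binding to the classes discussed in the review while avoiding the if-chain.

```java
import java.util.Map;

public class KafkaSchemaMappingSketch {
    static final String STRING_DESERIALIZER =
            "org.apache.kafka.common.serialization.StringDeserializer";

    // Assumed mapping: Kafka deserializer class -> name of a Pulsar Schema constant.
    private static final Map<String, String> DESERIALIZER_TO_SCHEMA = Map.of(
            STRING_DESERIALIZER, "STRING",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer", "BYTES",
            "org.apache.kafka.common.serialization.ByteBufferDeserializer", "BYTEBUFFER",
            "org.apache.kafka.common.serialization.IntegerDeserializer", "INT32",
            "org.apache.kafka.common.serialization.LongDeserializer", "INT64",
            "org.apache.kafka.common.serialization.FloatDeserializer", "FLOAT",
            "org.apache.kafka.common.serialization.DoubleDeserializer", "DOUBLE");

    // Hypothetical description of the schema the source would publish on the topic.
    record TopicSchema(String keySchema, String valueSchema, boolean keyValueSeparated) {}

    static String schemaFor(String deserializerClass) {
        String schema = DESERIALIZER_TO_SCHEMA.get(deserializerClass);
        if (schema == null) {
            throw new IllegalArgumentException("Unsupported deserializer: " + deserializerClass);
        }
        return schema;
    }

    static TopicSchema topicSchemaFor(String keyDeserializer, String valueDeserializer) {
        String valueSchema = schemaFor(valueDeserializer);
        if (STRING_DESERIALIZER.equals(keyDeserializer)) {
            // String keys: the decoded key becomes the plain Pulsar message key;
            // only the value schema describes the topic.
            return new TopicSchema(null, valueSchema, false);
        }
        // Non-String keys: KeyValue with SEPARATED encoding, so the key keeps its own
        // schema and travels in the Pulsar message key (used for routing and compaction).
        return new TopicSchema(schemaFor(keyDeserializer), valueSchema, true);
    }

    public static void main(String[] args) {
        System.out.println(topicSchemaFor(STRING_DESERIALIZER,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer"));
        System.out.println(topicSchemaFor(
                "org.apache.kafka.common.serialization.LongDeserializer",
                "org.apache.kafka.common.serialization.DoubleDeserializer"));
    }
}
```

Failing fast on an unknown deserializer (rather than silently falling back to BYTES) is a design choice of this sketch, not necessarily what the connector does.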
Limits of this patch:
Modifications
Verifying this change
The patch introduces unit tests that cover the new code.
Documentation