[FLINK-19275][connector-kafka] Support reading and writing Kafka metadata #13732
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community review your pull request.

Automated Checks: last check on commit 3f19217 (Wed Oct 21 16:35:57 UTC 2020). Warnings:
Mention the bot in a comment to re-run the automated checks.

Review Progress: please see the Pull Request Review Guide for a full explanation of the review process. The bot is tracking the review progress through labels; labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands: the @flinkbot bot supports the following commands:
Thanks @twalthr. I left some comments.
```diff
 * @see TableSchema#toPhysicalRowDataType()
 */
-TypeInformation<?> createTypeInformation(DataType consumedDataType);
+<T> TypeInformation<T> createTypeInformation(DataType consumedDataType);
```
This is a nice improvement. But is this a compatible change?
No, it is not a compatible change. But given that those interfaces are still relatively new and not many people have switched to the new sources/sinks yet, we should do this change now or never and avoid `@SuppressWarnings` in almost all implementations.
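A minimal sketch of what the generic signature buys at a call site; `resolve` is a hypothetical helper standing in for the code a connector runs inside `DynamicTableSource#getScanRuntimeProvider`:

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.table.connector.source.DynamicTableSource;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.types.DataType;

class CreateTypeInformationSketch {

    static TypeInformation<RowData> resolve(
            DynamicTableSource.Context context, DataType producedDataType) {
        // With the old TypeInformation<?> signature this line required
        //   (TypeInformation<RowData>) context.createTypeInformation(producedDataType)
        // plus @SuppressWarnings("unchecked"). With <T> TypeInformation<T>,
        // T is inferred from the return type and no cast is needed.
        return context.createTypeInformation(producedDataType);
    }
}
```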
```java
 * support comparing arrays stored in the values of a map. We might update the {@link #equals(Object)}
 * with this implementation in future versions.
 */
public static boolean deepEquals(Row row, Object other) {
```
Shouldn't `other` also be of the `Row` class? The Javadoc says "Compares two Rows".
```java
private static <E> boolean deepEqualsList(List<E> l1, List<?> l2) {
    final Iterator<E> i1 = l1.iterator();
    final Iterator<?> i2 = l2.iterator();
    while (i1.hasNext() && i2.hasNext()) {
```
Why not compare size first?
I copied this implementation from `java.util.AbstractList#equals`, but I don't have a strong opinion on this. `LinkedList`s are usually uncommon, I guess.
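For reference, a sketch of the size-first variant being suggested; `deepEqualsInternal` is a placeholder for whatever element comparison the PR uses:

```java
private static <E> boolean deepEqualsList(List<E> l1, List<?> l2) {
    // Short-circuit on length before walking both lists; size() is O(1)
    // for the common ArrayList and LinkedList implementations.
    if (l1.size() != l2.size()) {
        return false;
    }
    final Iterator<E> i1 = l1.iterator();
    final Iterator<?> i2 = l2.iterator();
    while (i1.hasNext()) {
        if (!deepEqualsInternal(i1.next(), i2.next())) {
            return false;
        }
    }
    return true;
}
```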
```java
 *
 * <p>The current implementation of {@link Row#equals(Object)} is not able to compare all deeply
 * nested row structures that might be created in the table ecosystem. For example, it does not
 * support comparing arrays stored in the values of a map. We might update the {@link #equals(Object)}
```
It seems that we already support this?
I might not understand your comment here.
The comment says "For example, it does not support comparing arrays stored in the values of a map"; however, the tests prove that we do support this.
I was referring to `Row#equals(Object)`, and that doesn't support arrays in the values of maps.
Got it.
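A small sketch of the limitation: `java.util.Map` equality compares values via `Object#equals`, and Java arrays compare by reference only, so `Row#equals(Object)` cannot look into arrays stored as map values:

```java
import java.util.Collections;
import java.util.Map;
import org.apache.flink.types.Row;

class MapArrayEqualitySketch {
    public static void main(String[] args) {
        Map<String, Integer[]> m1 = Collections.singletonMap("k", new Integer[]{1, 2});
        Map<String, Integer[]> m2 = Collections.singletonMap("k", new Integer[]{1, 2});
        System.out.println(m1.equals(m2));                 // false: distinct array instances
        System.out.println(Row.of(m1).equals(Row.of(m2))); // false for the same reason
    }
}
```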
```java
true,
new Integer[]{1, null, 3, 99}, // diff here
Arrays.asList(1, null, 3),
originalMap,
```
This is the same map reference; should we create another map?
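For example (a sketch; `otherMap` is a hypothetical name and the `Integer[]` value type is assumed from the surrounding test data):

```java
// A second map instance with the same entries, so the test exercises deep
// comparison rather than accidentally passing via reference equality.
final Map<String, Integer[]> otherMap = new HashMap<>(originalMap);
```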
```java
),

LEADER_EPOCH(
    "leader-epoch",
```
I just noticed that if we define a metadata key with a dash (`-`) separator, it has to be declared using the `FROM` clause or escaped, e.g. `leader_epoch INT FROM 'leader-epoch'`. What do you think about changing the key to `leader_epoch`, which is more SQL-identifier compliant?
True, then I would also need to change the recommendations in the SupportsMetadata interfaces:

```java
 * <p>Metadata key names follow the same pattern as mentioned in {@link Factory}. In case of duplicate
 * names in format and source keys, format keys shall have higher precedence.
```
I would suggest keeping it as it is. Users can use backticks, and leader epoch is not used very frequently. Furthermore, once we introduce metadata for formats, such as `debezium-json.ingestion-timestamp`, it would be confusing if the format identifier changed from `debezium-json` to `debezium_json` for metadata.
I see.
Is it possible to remove the `debezium-json` prefix? The Javadoc also says "In case of duplicate names in format and source keys, format keys shall have higher precedence."
So far, the metadata keys of the format and the source are very different.
Yes, we can shorten them. The idea was to design the metadata similar to regular options. So if both key and value are defined, they would get a `key.` and `value.` prefix. That should be enough; we don't need the `debezium-json` prefix.
But even for `key.` and `value.`, users would need to use backticks or the `FROM` clause. I would stick to the naming convention to not cause confusion.
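For context, a hedged sketch of where such keys are declared; the key set and data types here are illustrative, not the PR's final list:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.types.DataType;

class MetadataKeySketch {
    // Shape of a SupportsReadingMetadata#listReadableMetadata() result.
    static Map<String, DataType> listReadableMetadata() {
        final Map<String, DataType> metadataMap = new LinkedHashMap<>();
        metadataMap.put("topic", DataTypes.STRING());
        metadataMap.put("leader-epoch", DataTypes.INT()); // dash: needs backticks or FROM in DDL
        metadataMap.put("timestamp", DataTypes.TIMESTAMP_WITH_LOCAL_TIME_ZONE(3));
        return metadataMap;
    }
}
```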
```java
// --------------------------------------------------------------------------------------------

private static class MetadataKafkaDeserializationSchema implements KafkaDeserializationSchema<RowData> {
```
Declare serialVersionUID for this class.
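For example (the value is arbitrary but must stay stable once serialized instances exist):

```java
private static final long serialVersionUID = 1L;
```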
```java
// --------------------------------------------------------------------------------------------

private static final class MetadataAppendingCollector implements Collector<RowData>, Serializable {
```
Declare serialVersionUID for this class.
```java
public void collect(RowData physicalRow) {
    final int metadataArity = metadataConverters.length;
    // shortcut if no metadata is required
    if (metadataArity == 0) {
```
We can have a final `hasMetadata` member variable to allow JIT compiler optimization.
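A sketch of that suggestion; the constructor shape and the `outputCollector` field are assumptions based on the snippet above:

```java
private final MetadataConverter[] metadataConverters;
// Final field computed once; after inlining, the JIT can treat the branch
// in collect() as a constant instead of re-reading the array length.
private final boolean hasMetadata;

MetadataAppendingCollector(MetadataConverter[] metadataConverters) {
    this.metadataConverters = metadataConverters;
    this.hasMetadata = metadataConverters.length > 0;
}

@Override
public void collect(RowData physicalRow) {
    if (!hasMetadata) {
        // shortcut if no metadata is required
        outputCollector.collect(physicalRow);
        return;
    }
    // ... append metadata columns before emitting
}
```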
| Row.of("data 3", 3, "CreateTime", LocalDateTime.parse("2020-03-10T13:12:11.123"), 2L, 0, headers3, 0, topic, true) | ||
| ); | ||
|
|
||
| assertTrue(Row.deepEquals(expected, result)); |
What about adding an exception message to the assertion to print the expected and actual rows? That would be helpful for debugging Azure builds.
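E.g., with JUnit 4's message overload (a sketch, mirroring the call above):

```java
assertTrue(
    String.format("%nExpected: %s%nActual:   %s", expected, result),
    Row.deepEquals(expected, result));
```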
Commits:

…nformation in sources and sinks

…data — This updates the `KafkaDynamicSource` and `KafkaDynamicSink` to read and write metadata according to FLIP-107. Reading and writing metadata of formats is not supported yet.

Force-pushed from 3f19217 to d5bc61d.
@wuchong I updated the PR, I hope I addressed most of your comments.
LGTM.
+1 to merge
…data

This updates the `KafkaDynamicSource` and `KafkaDynamicSink` to read and write metadata according to FLIP-107. Reading and writing metadata of formats is not supported yet. This closes apache#13732.
What is the purpose of the change

This updates the `KafkaDynamicSource` and `KafkaDynamicSink` to read and write metadata according to FLIP-107. Reading and writing metadata of formats is not supported yet.

This PR is based on #13618.

Brief change log

- `SupportsReadingMetadata` and `SupportsWritingMetadata` in source and sink

Verifying this change

- `KafkaTableITCase`

Does this pull request potentially affect one of the following parts:

- `@Public(Evolving)`: yes

Documentation