-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-8630] [table] To support JSON schema to TypeInformation conversion #5491
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, @twalthr. It looks pretty good! I just had some minor comments.
Besides, as the (de)serialization procedures are applied for the byte[]
and Row
types, the JSON can only be considered as an intermediate type. I just wonder whether there are some cases where we must operate a "pure JSON string" instead of a jackson ObjectNode
.
Thanks, Xingcan
} | ||
|
||
@Override | ||
public boolean isEndOfStream(ObjectNode nextElement) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This overridden method seems to be redundant.
private ObjectMapper mapper; | ||
|
||
@Override | ||
public ObjectNode deserialize(byte[] message) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, this method should be thread-safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The framework takes care of thread safety. Same as MapFunction
etc.
/** | ||
* Deserialization schema from JSON to Flink types. | ||
* | ||
* <p>Deserializes the <code>byte[]</code> messages as a JSON object and reads |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
messages -> message
private static final String CONTENT_ENCODING_BASE64 = "base64"; | ||
|
||
/** | ||
* Converts a JSON schema into Flink's type information. Throws an exception of the schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of -> if
* Converts a JSON schema into Flink's type information. It uses {@link Row} for representing | ||
* objects and tuple arrays. | ||
* | ||
* <p>Note: This converter implements just a subset of the JSON schema specification. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the JSON Schema is still evolving. Shall we consider specifying a version for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good point. I added a comment about the version (mostly draft-07). But since we only implement a subset of it and also include some keywords from older drafts it is hard to explain. I will add some examples to the docs to show what we support, this should help in those cases.
} | ||
|
||
@Override | ||
public byte[] serialize(Row row) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The serialize()
method is also not thread-safe since it invokes the method such as SimpleDateFromat.format()
. Not sure if it matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The framework takes care of duplicating the class.
Thanks for the review @xccui. I agree that a pure string-based format would be helpful as well. For this we can simply use a string serialization schema later. In a long-term view we will need to implement scalar functions that can handle a json string and allow accessing such a string as type-safe as possible. |
Merging... |
What is the purpose of the change
This PR implements (almost) full support of the JSON type. It includes:
number
,integer
,string
,object
,array
typesBrief change log
flink-json
inflink-formats
Verifying this change
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: yesDocumentation