[FLINK-32665][format] Support reading null value for csv format #23087
base: master
de7723b to da9ceca
@libenchao Please help review this PR when you're free, thanks.
@@ -185,48 +187,57 @@ private CsvToRowDataConverter createConverter(LogicalType type) {
        }
    }

    private boolean convertToBoolean(JsonNode jsonNode) {
    private <V> V convertStringToValue(JsonNode jsonNode, Function<String, V> function) {
createNullableConverter should have already handled nulls (similar to the JSON format). For CSV, however, this does not work because a blank field is represented as an empty text node (an empty string). I'm not sure whether there is any configuration for the Jackson CSV mapper to represent blank fields as null nodes?
The CSV format already has a config option for the null literal, null-literal. I guess setting it to an empty string would solve the problem?
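As a rough illustration of the suggestion above, the sketch below shows how comparing a raw field against a configured null literal before parsing turns a blank CSV field into null. It is plain Java, not Flink's actual converter code: the class NullLiteralSketch, the helper parseIntField, and the nullLiteral parameter are hypothetical, and only the name convertStringToValue is borrowed from the diff in this PR (the real method takes a JsonNode).

```java
import java.util.function.Function;

// Hypothetical stand-in for the CSV field converter; not Flink's real implementation.
final class NullLiteralSketch {

    // Returns null when the raw field equals the configured null literal,
    // otherwise applies the type-specific parser.
    static <V> V convertStringToValue(
            String field, String nullLiteral, Function<String, V> parser) {
        if (nullLiteral != null && nullLiteral.equals(field)) {
            return null;
        }
        return parser.apply(field);
    }

    // Convenience wrapper for INT columns (assumed helper, for illustration only).
    static Integer parseIntField(String field, String nullLiteral) {
        return convertStringToValue(field, nullLiteral, Integer::parseInt);
    }

    public static void main(String[] args) {
        // With the null literal set to the empty string, a blank field becomes null.
        System.out.println(parseIntField("42", ""));
        System.out.println(parseIntField("", ""));

        // With no null literal configured (modeled here as null), the blank field
        // reaches the parser and fails, which mirrors the reported exception.
        try {
            parseIntField("", null);
        } catch (NumberFormatException e) {
            System.out.println("blank field rejected: " + e.getMessage());
        }
    }
}
```

The check sits in front of the parser so that every column type gets the same null handling without each type-specific parser having to know about the null literal.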
I tried the null-literal option and it works. But I think there is room to improve the CSV format here. Currently, when I create a CSV format table with nullable fields, such as table (a int, b int), and no additional options, I can successfully insert null data into fields a and b. When I then try to select from the table, it throws an exception saying that the empty string ('') cannot be parsed as an int value. I think there are two possible solutions, and we can pick one:
1. Throw an exception and tell users to configure null-literal for their CSV table if they try to insert a null value.
2. Give the null-literal option a default value, such as the empty string ('').
@libenchao What do you think?
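To make the first proposal concrete, here is a minimal sketch of a write-side guard that fails fast when a null value is written and no null literal is configured. It is plain Java and not Flink's serializer: the class NullWriteGuardSketch, the method serializeField, and the error message wording are all assumptions made for illustration.

```java
// Hypothetical sketch of the "throw and tell users to configure null-literal" proposal;
// not Flink's real serializer.
final class NullWriteGuardSketch {

    // Serializes one field, refusing to write null when no null literal is configured
    // (an unset option is modeled here as a null nullLiteral).
    static String serializeField(Object value, String nullLiteral) {
        if (value != null) {
            return String.valueOf(value);
        }
        if (nullLiteral == null) {
            throw new IllegalStateException(
                    "Cannot write a null value: configure 'null-literal' for this CSV table.");
        }
        return nullLiteral;
    }

    public static void main(String[] args) {
        // Non-null values and nulls with a configured literal are written normally.
        System.out.println(serializeField(1, "NULL"));
        System.out.println(serializeField(null, "NULL"));

        // A null without a configured literal is rejected at write time, so the
        // problem surfaces on insert instead of on a later select.
        try {
            serializeField(null, null);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```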
> 2. Give the option null-literal a default value such as empty string ('')
That would be a breaking change for existing users, which I don't think we should do here.
I share the concern about the breaking change, so we can choose option 1 as a temporary solution.
Eventually, though, it would be good to improve the out-of-the-box experience and go with option 2, maybe in Flink 2.0?
> and go for 2 finally, maybe in Flink 2.0?
Do propose it for Flink 2.0 indeed!
What is the purpose of the change
Currently, we can create a CSV format table with nullable columns, but reading throws an exception if a record contains a null value in one of these columns. This PR aims to support reading null values for the CSV format.
Brief change log
Verifying this change
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes / no) no
Documentation