Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-32665][format] Support reading null value for csv format #23087

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

FangYongs
Copy link
Contributor

What is the purpose of the change

Currently we can create table with nullable column for csv format table, but it will throw exception if there's null value in the record for these columns. This PR aims to support reading null value for csv format.

Brief change log

  • Return null value for int/long/double/boolean when the text is empty

Verifying this change

This change added tests and can be verified as follows:

  • Added CsvFormatFactoryTest.testDeserializeNullValue to read null value

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no) no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no) no
  • The serializers: (yes / no / don't know) no
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know) no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know) no
  • The S3 file system connector: (yes / no / don't know) no

Documentation

  • Does this pull request introduce a new feature? (yes / no) no
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Copy link
Collaborator

flinkbot commented Jul 27, 2023

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@FangYongs
Copy link
Contributor Author

@libenchao Please help to review this PR when you're free, thanks

@FangYongs FangYongs requested a review from libenchao July 28, 2023 05:11
@@ -185,48 +187,57 @@ private CsvToRowDataConverter createConverter(LogicalType type) {
}
}

private boolean convertToBoolean(JsonNode jsonNode) {
private <V> V convertStringToValue(JsonNode jsonNode, Function<String, V> function) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createNullableConverter should have already handled nulls (similar to json format), however, for csv, this does not work because blank fields is represented as empty text node (empty string). I'm not sure whether there is any configuration for jackson csv mapper to represent blank fields to a null node?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already a config in csv format about null literal null-literal, I guess setting it to empty string would solve the problem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try the option null-literal and it works. But I think there're some improvement points about csv format here. Currently when I create a csv format table with nullable fields such as table ( a int, b int). There are no additional options for the table and I can successfully insert null data to the field a and b. When I try to select results from the table, it throw exception that empty string('') cannot be parsed as int value. I think there may be two solutions for this, and we can pick one:

  1. Throw an exception and tell users to configure null-literal for their csv table if they try to insert a null value.
  2. Give the option null-literal a default value such as empty string('')

@libenchao What do you think of it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2. Give the option null-literal a default value such as empty string('')

That would be a breaking change for existing users, which I don't think we should do here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also have the concern of the breaking change. Then we can choose 1 as a temporary solution.

But eventually, it would be good to improve the out-of-box experience, and go for 2 finally, maybe in Flink 2.0?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and go for 2 finally, maybe in Flink 2.0?

Do propose it for Flink 2.0 indeed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants