Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-31708][API / Type Serialization System] make DataInputView#read(byte[], int, int) return 0 if len is 0. #22335

Closed
wants to merge 2 commits into from

Conversation

Shenjiaqi
Copy link

What is the purpose of the change

Refine semantic of DataInputView#read(byte[] b, int off, int len) interface when len is 0 and no more data is available.
There are 4 Classes that implement this method, and their hehavior is different when len is 0 and no more data is available:

There is a chance that when deserializing an protobuf generated empty object, DataInputDeserializer#read(byte[],int,int) would return -1, and then cause job failure. (Detail is posted in issue)
The reason cause this problem is much like: #20860.

I think the interface should be clarified, and we can use the same semantic as InputStream:

If len is zero, then no bytes are read and 0 is returned;
otherwise, there is an attempt to read at least one byte.
If no byte is available because the stream is at end of file, the value -1 is returned
...

Brief change log

  • Add comment in DataInputView#read(byte[] byte, int off, int len) and clarify behavior when len == 0 and no more data available.
  • Make DataInputDeserializer#read(byte[] byte, int off, int len) return 0 if len == 0 and no more data available.
  • Make InternalDeSerializer#read(byte[] byte, int off, int len) return 0 if len == 0 and no more data available.

Verifying this change

This change added tests and can be verified as follows:

  • Unittest is added for DataInputDeserializer#read

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (yes)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@flinkbot
Copy link
Collaborator

flinkbot commented Apr 3, 2023

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@pltbkd
Copy link
Contributor

pltbkd commented Apr 6, 2023

Thanks for the contribution @Shenjiaqi .

This PR LGTM. And I think unifying the behavior of implementations of DataInputView#read is a great idea.

Would you like take another look @gaoyunhaii ?

@gaoyunhaii
Copy link
Contributor

Thanks @Shenjiaqi for the PR and @pltbkd for the review! I have two minor comments.

@Shenjiaqi
Copy link
Author

Thanks @gaoyunhaii for review. Now code is fixed.

Copy link
Contributor

@gaoyunhaii gaoyunhaii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Shenjiaqi for the changes! LGTM

gaoyunhaii pushed a commit that referenced this pull request Apr 20, 2023
…byte[], int, int) return 0 if len is 0.

This closes #22335.
gaoyunhaii pushed a commit that referenced this pull request Apr 20, 2023
…byte[], int, int) return 0 if len is 0.

This closes #22335.
RocMarshal pushed a commit to RocMarshal/flink that referenced this pull request May 9, 2024
…byte[], int, int) return 0 if len is 0.

This closes apache#22335.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants