Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large files have negative available bytes due to long to int type cas… #21

Merged
merged 1 commit into from
Jul 26, 2018

Conversation

davydotcom
Copy link
Contributor

…ting..

@davydotcom
Copy link
Contributor Author

It was observed that NFSFileInputStream was coming back with 0 bytes if file was larger than 2.1GB (basically Integer.MAX_VALUE)

@DavidASeibert
Copy link
Contributor

This is actually an interface problem - for practical Java coding reasons, I implemented the InputStream interface, but that interface doesn't support large streams properly. The proposed fix will make the returned value look more sensible, but it will still be wrong, and it makes it harder than necessary for users to see if the value is correct or not.

The correct solution may be to return -1 if the value is larger than Integer.MaxValue, and then to implement an interface extension that handles these larger streams. I have not yet figured out how to design that extension for ease of use, though, and I do not know of any standard fix that has been developed for that interface. Do you have any suggestions for the complete fix?

@davydotcom
Copy link
Contributor Author

davydotcom commented Jun 29, 2018 via email

@davydotcom
Copy link
Contributor Author

The proper answer or definition of the available() method from the javadoc is this...

"
Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking by the next invocation of a method for this input stream. The next invocation might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes. "

So if implementing properly it would estimate this value and never be based on the size of the file itself.

@davydotcom
Copy link
Contributor Author

davydotcom commented Jun 29, 2018

I should highlight this is a minimal fix so that it behaves acceptably with other streams for fetching files larger than 2.1GB (in our case its QCOW2 , RAW , or VMDK image files for backup/recovery)

@DavidASeibert
Copy link
Contributor

DavidASeibert commented Jun 29, 2018

Yes, my initial thoughts were a bit off. Coming back here from other code...

-1 does not imply EOF for available(), but 0 does according to the javadoc. Probably the best estimate the code can give is the following:

bytesLeftInBuffer() if that is > 0, or
0 if _isEof, or
_bytesInBuffer otherwise.

That will give either the remaining bytes that can be read without going to the NFS file, or the best estimate of what the next NFS read will give if the buffer is currently empty and there is still data left to read. That will also give good processing results, as the buffer size is typically set to the optimum NFS read size for that system. That will also ensure that it never returns 0 is there is possibly data to be read. Does that make sense to you?

@davydotcom
Copy link
Contributor Author

davydotcom commented Jun 29, 2018 via email

@DavidASeibert
Copy link
Contributor

DavidASeibert commented Jun 29, 2018

The javadoc says that 0 means you are at the end of the stream. Yes, there may be extensions for subclasses that have special values for -1, but that doesn't really make sense for anything other than special devices, and we're trying to stay as plain vanilla as possible.

https://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#available()

Ignoring -1 as a special value, does that logic make sense?

@davydotcom
Copy link
Contributor Author

davydotcom commented Jun 29, 2018 via email

@DavidASeibert DavidASeibert self-assigned this Jul 26, 2018
Copy link
Contributor

@DavidASeibert DavidASeibert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, although the solution will likely need tweaking before the next release.

@DavidASeibert DavidASeibert merged commit ec62b41 into EMCECS:master Jul 26, 2018
@twincitiesguy
Copy link
Contributor

I'm leaving this comment here, even though this PR has already been merged, because we are trying to get a release out that includes a debug fix (#42). Seems to me, based on the method contract, available() should just return bytesLeftInBuffer(). The intent is to reveal how many bytes can be read without any (blocking) network IO, and I think bytesLeftInBuffer() should provide this accurately, unless I'm missing something. FYI, we may introduce this change prior to the next release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants