The BloscCompressor used InputStream.available() to determine if byte… #13
Conversation
…s were available. InputStream.available() is not guaranteed to be accurate: some implementations fall back to the default implementation in InputStream, which always returns 0, and an EOFException was thrown in that case. The BlockCompressor was updated to simply attempt to read from the stream and check whether bytes were returned.
…ng. Some InputStream implementations may not always return the requested number of bytes from read(). This is the case when reading from an S3 bucket, where the stream is chunked.
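The available() pitfall described above can be reproduced without the AWS SDK. Below is a minimal, hypothetical sketch (the Wrapper class is illustrative, not from the project): a stream that does not override available() inherits the java.io.InputStream default, which always returns 0, so code that gates reads on available() sees a false end-of-stream even though data remains:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class AvailableDemo {
    // Hypothetical wrapper, standing in for streams such as the AWS SDK's
    // ChecksumValidatingInputStream that do not override available().
    static class Wrapper extends InputStream {
        private final InputStream delegate;

        Wrapper(InputStream delegate) {
            this.delegate = delegate;
        }

        @Override
        public int read() throws IOException {
            return delegate.read();
        }
        // available() is intentionally NOT overridden, so the
        // InputStream default applies and always returns 0.
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new Wrapper(new ByteArrayInputStream(new byte[]{42}));
        System.out.println(in.available()); // 0, although one byte remains
        System.out.println(in.read());      // 42, so data was in fact available
        System.out.println(in.read());      // -1, the reliable end-of-stream signal
    }
}
```

Checking the return value of read() for -1, as the fix does, is the only contract-guaranteed way to detect end-of-stream.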
Dear Chris Slater, that is a good hint and valuable information. The streams are created only in the respective store implementations. I like the following 3 ideas to solve the problem:
Since I personally prefer object composition to inheritance, I would prefer idea 3. What do you think? If you want, you can do that; I'm also available to support you, or I can take your pull request. What would you prefer? With kind regards
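The thread does not spell out idea 3, but since composition is preferred here, a composition-based fix could look roughly like the following hypothetical decorator (name and design are illustrative, not from the project). It wraps whatever stream a store implementation produces and loops until a bulk read is satisfied or EOF is reached, so callers never observe partial reads:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical decorator: FilterInputStream already composes the
// wrapped stream in its protected 'in' field.
public class FullReadInputStream extends FilterInputStream {

    public FullReadInputStream(InputStream in) {
        super(in);
    }

    // Loop until 'len' bytes are read or EOF is hit; a single read() on the
    // underlying stream may legally return fewer bytes than requested.
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(b, off + total, len - total);
            if (n < 0) {
                return total == 0 ? -1 : total; // EOF
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a chunked stream that hands out at most 3 bytes per call.
        byte[] payload = {1, 2, 3, 4, 5, 6, 7, 8};
        InputStream chunked = new ByteArrayInputStream(payload) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3));
            }
        };
        byte[] buf = new byte[8];
        int n = new FullReadInputStream(chunked).read(buf, 0, 8);
        System.out.println(n); // 8: the decorator hid the partial reads
    }
}
```

Because the decorator composes rather than subclasses the store's stream, each store implementation can keep returning its own stream type and the compressor wraps it once at the read site.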
Hi Sabine, I ran into another issue when using an S3-based store, where calls to read() in the compressor did not return the requested number of bytes. This was caused by the Apache HttpClient InputStream that the AWS InputStreams wrap returning chunked data, and it only happened when the size of the object being retrieved was above some threshold. The documentation for InputStream states that this is valid behavior. When this happened, the output of the compressor was filled with zeros, because cbufferSizes() returned zero values on repeated passes through the loop in the uncompress method: the header data read on later iterations of that loop was actually filled with payload data. I can take a look at implementing option 3 and see if both problems can be solved with this approach. Even with that, I don't think reading a header in a loop is needed in BloscCompressor.uncompress(). Cheers, Chris
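The partial-read behaviour described above is easy to reproduce without S3. In this hypothetical sketch, a stream that delivers at most 3 bytes per call (mimicking a chunked HTTP response) satisfies a 16-byte header read only partially, which is exactly the situation where code assuming a single read() fills the buffer goes wrong:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class PartialReadDemo {
    public static void main(String[] args) throws IOException {
        // Simulated chunked stream: never more than 3 bytes per read() call,
        // which the InputStream contract explicitly permits.
        byte[] data = new byte[32];
        InputStream chunked = new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3));
            }
        };

        byte[] header = new byte[16];
        int n = chunked.read(header, 0, 16);
        System.out.println(n); // 3, not 16: most of the header is still unread
    }
}
```

If the remaining 13 bytes are later consumed as if they were payload, the decompressed output ends up misaligned or zero-filled, matching the symptom described above.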
Dear Chris, you can try the new S3_AWS branch. Please also take a look into the class. Let me know if this works as expected, so that I can merge the branch into the master branch. Cheers,
Dear Chris, if you like, you can also use the email address from my profile for communication that is not tied to a single specific issue. Or, maybe even better so that others can read our conversations, use the discussion platform associated with this project. Have a nice weekend!
The BloscCompressor used InputStream.available() to determine if bytes were available, but InputStream.available() is not guaranteed to be accurate: some implementations rely on the default provided by InputStream itself, which always returns 0, and an EOFException was thrown in that case. One example of this is software.amazon.awssdk.services.s3.checksums.ChecksumValidatingInputStream. The BlockCompressor was updated to simply attempt to read from the stream and check whether bytes were returned.