
SeekableByteChannel that doesn't use a temp file #103

Open

sbeimin opened this issue Feb 28, 2018 · 4 comments

sbeimin commented Feb 28, 2018

The current implementation of S3SeekableByteChannel uses a temp file, which has a couple of drawbacks. One of the primary reasons to use a SeekableByteChannel is to avoid downloading an entire (say, 2 GB) file when you are only interested in a specific range of bytes within it.

Why do we have a separate S3SeekableByteChannel alongside S3FileChannel? S3FileChannel extends java.nio.channels.FileChannel, which in turn implements java.nio.channels.SeekableByteChannel.

Another problem is the use of Files.createTempFile(). This generally creates a file in /tmp (on Linux, for example), which can be a security risk if the file contains sensitive data.
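As long as the temp file remains, the exposure can at least be narrowed by creating it with owner-only permissions. A minimal sketch using only java.nio.file (note that Files.createTempFile already restricts permissions where the platform supports it; passing the attribute explicitly just makes the intent auditable, and this only works on POSIX file systems):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileAttribute;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;
import java.util.Set;

public class SecureTempFile {
    /** Creates a temp file readable and writable only by the owning user. */
    public static Path create() throws IOException {
        FileAttribute<Set<PosixFilePermission>> ownerOnly =
            PosixFilePermissions.asFileAttribute(
                PosixFilePermissions.fromString("rw-------"));
        return Files.createTempFile("s3fs-", ".tmp", ownerOnly);
    }

    public static void main(String[] args) throws IOException {
        Path p = create();
        System.out.println(Files.getPosixFilePermissions(p));
        Files.delete(p);
    }
}
```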

The com.amazonaws.services.s3.model.GetObjectRequest has a range field (private long[] range;, settable via setRange/withRange) that enables retrieval of a limited range of bytes. Perhaps this could be used to build a better SeekableByteChannel implementation.

However, I can't find any option to write a limited range of bytes to an S3 object... Perhaps we'll have to wait and see what the 2.0 version of the AWS SDK offers us.
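For the read side, a range-based channel could look roughly like the sketch below. LazySeekableByteChannel and RangeReader are hypothetical names, not part of the project: each read() fetches only the requested byte range, an S3-backed RangeReader would issue a GetObjectRequest with setRange(start, end) (inclusive, like an HTTP Range GET), and writes fail fast instead of buffering to a temp file. The demo uses an in-memory array rather than S3:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.NonWritableChannelException;
import java.nio.channels.SeekableByteChannel;
import java.util.Arrays;

/** Read-only channel that fetches only the requested byte range per read. */
public class LazySeekableByteChannel implements SeekableByteChannel {

    /** Returns the bytes in [start, end] inclusive, like an HTTP Range GET. */
    public interface RangeReader {
        byte[] read(long start, long end) throws IOException;
    }

    private final RangeReader reader;
    private final long size;
    private long position;
    private boolean open = true;

    public LazySeekableByteChannel(RangeReader reader, long size) {
        this.reader = reader;
        this.size = size;
    }

    @Override
    public int read(ByteBuffer dst) throws IOException {
        if (!open) throw new ClosedChannelException();
        if (position >= size) return -1;      // past EOF
        if (!dst.hasRemaining()) return 0;
        // Fetch no more than the buffer can hold, clamped to the object size.
        long end = Math.min(size - 1, position + dst.remaining() - 1);
        byte[] chunk = reader.read(position, end);
        dst.put(chunk);
        position += chunk.length;
        return chunk.length;
    }

    @Override
    public int write(ByteBuffer src) {
        // Per the discussion: fail fast rather than spool to a temp file.
        throw new NonWritableChannelException();
    }

    @Override public long position() { return position; }
    @Override public SeekableByteChannel position(long newPosition) {
        this.position = newPosition;
        return this;
    }
    @Override public long size() { return size; }
    @Override public SeekableByteChannel truncate(long s) {
        throw new NonWritableChannelException();
    }
    @Override public boolean isOpen() { return open; }
    @Override public void close() { open = false; }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes();
        SeekableByteChannel ch = new LazySeekableByteChannel(
            (start, end) -> Arrays.copyOfRange(data, (int) start, (int) end + 1),
            data.length);
        ch.position(4);
        ByteBuffer buf = ByteBuffer.allocate(3);
        ch.read(buf);
        System.out.println(new String(buf.array()));
    }
}
```

The sketch deliberately omits caching and read-ahead; a real implementation would want both to avoid issuing one request per small read.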

@magicDGS

I think that an implementation such as the S3SeekableStream in epam/htsjdk-s3-plugin is the way to go. It is similar to what I'm doing in my implementation of an HTTP/S filesystem...

@ryan-williams

I learned about this issue the hard way on a 6GB file just now!

IMO, throwing UnsupportedOperationException would be preferable to downloading the whole file (especially to /tmp, per the OP).

@tomwhite

I did some work to fix this a while back (which I forgot to post at the time). Sorry you suffered from this too @ryan-williams! Here's the branch: https://github.com/tomwhite/Amazon-S3-FileSystem-NIO2/tree/read-seeks

It lacks unit tests, but I did test it manually.

ryan-williams commented Jul 24, 2018

That's great @tomwhite, I got most of the way through my own implementation but yours is cleaner.

I patched a few more fixes (to #106 and #108) into it and released org.lasersonlab:s3fs:2.2.3 to Maven Central from this tag on the lasersonlab fork.

Lots of tests fail, as you mentioned, but it seems to be a mix of mock-based tests that I don't really care to unbreak and integration tests that aren't configured in a way I know how to run.
