
s3fs-nio trying to put back an object downloaded from the bucket #715

Open · stefanofornari opened this issue May 31, 2023 · 11 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed), needs triage (The issue needs to be triaged, before work can commence)

stefanofornari commented May 31, 2023

Bug Description

s3fs-nio tries to put back (re-upload) an object that was only downloaded from the bucket, as per this stack trace:

at org.carlspring.cloud.storage.s3fs.S3SeekableByteChannel.sync(S3SeekableByteChannel.java:182)
at org.carlspring.cloud.storage.s3fs.S3SeekableByteChannel.close(S3SeekableByteChannel.java:146)
at java.base/sun.nio.ch.ChannelInputStream.close(ChannelInputStream.java:279)

Steps To Reproduce

FileSystem fs = FileSystems.newFileSystem(URI.create(file), new HashMap<>());
Path p = fs.getPath(new URI(file).getPath());
Files.readAllBytes(p);
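
For context, a self-contained version of the snippet might look like this (the URI below is a placeholder, not from the original report; s3fs-nio picks up credentials from its usual configuration sources):

import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;

public class ReadOnlyRepro
{
    public static void main(String[] args)
            throws Exception
    {
        // Placeholder endpoint/bucket/key.
        String file = "s3://s3.example.com/my-bucket/some/object.bin";

        FileSystem fs = FileSystems.newFileSystem(URI.create(file), new HashMap<>());
        Path p = fs.getPath(new URI(file).getPath());

        // A plain read; per this report, closing the underlying channel
        // triggers sync(), which re-uploads the object.
        Files.readAllBytes(p);
    }
}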

Expected Behavior

The object is downloaded without its content being synced back to the bucket.

Environment

  • s3fs-nio version: 1.0.0
  • OS: not relevant
  • JDK: not relevant
stefanofornari added the bug (Something isn't working) and needs triage (The issue needs to be triaged, before work can commence) labels on May 31, 2023
carlspring (Owner) commented May 31, 2023

@stefanofornari,

Thanks for reporting and for providing sample code to reproduce it!

How large is the file?

Also, just last night we released version 1.0.1. Would you be able to try it with that version as well?

If you could knock up a simple test case as well, that'll help us figure things out quicker.

stefanofornari (Author) commented

Size of the file: 262,149 bytes.

It happens with 1.0.1 as well. It has to do with a temporary file in S3SeekableByteChannel:

public void close()
        throws IOException
{
    try
    {
        if (!seekable.isOpen())
        {
            return;
        }

        seekable.close();

        if (options.contains(StandardOpenOption.DELETE_ON_CLOSE))
        {
            path.getFileSystem().provider().delete(path);

            return;
        }

        // Read-only short-circuit: this only matches when the option set is
        // exactly {READ}.
        if (options.contains(StandardOpenOption.READ) && options.size() == 1)
        {
            return;
        }

        // Otherwise, any channel backed by a temp file is synced (uploaded)
        // back to the bucket.
        if (this.tempFile != null)
        {
            sync();
        }
    }
    finally
    {
        if (tempFile != null)
        {
            Files.deleteIfExists(tempFile);
        }
    }
}
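
A plausible explanation (an inference, not confirmed in this thread): Files.readAllBytes(p) opens the channel via Files.newByteChannel(path) with an empty option set, so the read-only guard above (options.contains(READ) && options.size() == 1) never matches, and sync() runs even for a pure read. A hypothetical guard along these lines would skip the upload unless the channel was opened with write intent:

// Hypothetical sketch, not the project's actual fix: only sync when an
// option implying write intent is present, so an empty option set (what
// Files.readAllBytes ends up using) never triggers an upload.
private boolean openedForWriting()
{
    return options.contains(StandardOpenOption.WRITE) ||
           options.contains(StandardOpenOption.APPEND) ||
           options.contains(StandardOpenOption.CREATE) ||
           options.contains(StandardOpenOption.CREATE_NEW) ||
           options.contains(StandardOpenOption.TRUNCATE_EXISTING);
}

// ...and in close(), the last branch of the try block would become:
//
//     if (this.tempFile != null && openedForWriting())
//     {
//         sync();
//     }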

carlspring added the help wanted (Extra attention is needed) label on May 31, 2023
steve-todorov (Collaborator) commented

Hey @stefanofornari,

Thanks for reporting this. Are you using S3 or StorJ?

stefanofornari (Author) commented

Hey, I am using StorJ, which I think uses MinIO.

steve-todorov (Collaborator) commented Jun 4, 2023

I'm not entirely convinced this is a bug in s3fs-nio, because we have a test case that covers this exact behavior:

@Test
void shouldReplaceExistingFileExample1()
        throws IOException
{
    Path file = Files.createTempFile("file-it-srefe1-1-", "file");
    Path dw1 = Files.createTempFile("file-it-srefe1-dw-1-", "file");
    Path dw2 = Files.createTempFile("file-it-srefe1-dw-2-", "file");
    Path dw3 = Files.createTempFile("file-it-srefe1-dw-3-", "file");

    Files.write(file, "first".getBytes(), StandardOpenOption.APPEND);

    String filename = randomUUID().toString();
    String key = getTestBasePath() + "/" + filename;

    Path s3file = fileSystemAmazon.getPath(bucket, key);

    String first = "first-write";
    Files.write(s3file, first.getBytes());
    assertThat(Files.readAllBytes(s3file)).isEqualTo(first.getBytes());
    Files.copy(s3file, dw1, StandardCopyOption.REPLACE_EXISTING);
    assertThat(dw1).hasBinaryContent(first.getBytes());

    String second = "second-write";
    Files.write(s3file, second.getBytes());
    assertThat(Files.readAllBytes(s3file)).isEqualTo(second.getBytes());
    Files.copy(s3file, dw2, StandardCopyOption.REPLACE_EXISTING);
    assertThat(dw2).hasBinaryContent(second.getBytes());

    String third = "third-write";
    Files.write(s3file, third.getBytes());
    assertThat(Files.readAllBytes(s3file)).isEqualTo(third.getBytes());
    Files.copy(s3file, dw3, StandardCopyOption.REPLACE_EXISTING);
    assertThat(dw3).hasBinaryContent(third.getBytes());
}

@Test
void shouldReplaceExistingFileExample2()
        throws IOException
{
    Path file = Files.createTempFile("file-it-srefe2-1-", "file");
    Path dw1 = Files.createTempFile("file-it-srefe2-dw-1-", "file");
    Path dw2 = Files.createTempFile("file-it-srefe2-dw-2-", "file");
    Path dw3 = Files.createTempFile("file-it-srefe2-dw-3-", "file");

    Files.write(file, "first".getBytes(), StandardOpenOption.APPEND);

    String filename = randomUUID().toString();
    String key = getTestBasePath() + "/" + filename;

    Path s3file = fileSystemAmazon.getPath(bucket, key);
    Files.createFile(s3file);

    String first = "first-write";
    try (OutputStream out = Files.newOutputStream(s3file, StandardOpenOption.TRUNCATE_EXISTING))
    {
        out.write(first.getBytes());
    }
    assertThat(Files.readAllBytes(s3file)).isEqualTo(first.getBytes());
    Files.copy(s3file, dw1, StandardCopyOption.REPLACE_EXISTING);
    assertThat(dw1).hasBinaryContent(first.getBytes());

    String second = "second-write";
    try (OutputStream out = Files.newOutputStream(s3file, StandardOpenOption.TRUNCATE_EXISTING))
    {
        out.write(second.getBytes());
    }
    assertThat(Files.readAllBytes(s3file)).isEqualTo(second.getBytes());
    Files.copy(s3file, dw2, StandardCopyOption.REPLACE_EXISTING);
    assertThat(dw2).hasBinaryContent(second.getBytes());

    String third = "third-write";
    try (OutputStream out = Files.newOutputStream(s3file, StandardOpenOption.TRUNCATE_EXISTING))
    {
        out.write(third.getBytes());
    }
    assertThat(Files.readAllBytes(s3file)).isEqualTo(third.getBytes());
    Files.copy(s3file, dw3, StandardCopyOption.REPLACE_EXISTING);
    assertThat(dw3).hasBinaryContent(third.getBytes());
}

And it's working fine in S3. Maybe it's something specific to StorJ?

If you could create a reproducible test case with S3, that would be helpful.
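
For reference, a minimal read-only test in the same style as the suite above (hypothetical; not an existing test) would exercise only the download path:

@Test
void shouldNotUploadWhenOnlyReading()
        throws IOException
{
    String filename = randomUUID().toString();
    String key = getTestBasePath() + "/" + filename;

    Path s3file = fileSystemAmazon.getPath(bucket, key);

    String content = "read-only";
    Files.write(s3file, content.getBytes());

    // A plain read; per the report, closing the underlying channel should
    // not issue a second PUT for the same object.
    assertThat(Files.readAllBytes(s3file)).isEqualTo(content.getBytes());
}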

stefanofornari (Author) commented

Hi Steve,
I am not sure the code above represents the same use case. It actually writes a file, while in the code I provided, I read a file. Still, I see that sync() call pushing the file I have just read back to the server. I can provide a full log if needed.

PS: Generally speaking, if you want s3fs-nio to be usable with any S3-compliant service, even service-specific behaviour should be handled by the library itself (but, as I mentioned, StorJ uses MinIO). Additionally, as I pointed out in #715 (comment), this behaviour comes from library code; StorJ may trigger a corner case, but I would not be so quick to claim it is not a bug.

steve-todorov (Collaborator) commented

Maybe you've missed this part of the test:

     String first = "first-write"; 
     Files.write(s3file, first.getBytes()); 
     assertThat(Files.readAllBytes(s3file)).isEqualTo(first.getBytes()); 
                       ^^^^

In essence, the test simulates creating a file in S3, checks that the upstream file contains the expected bytes, and then replaces it a few times.

We aim to be AWS S3 compliant. However, this does not mean "any-cloud-solution-claiming-to-be-S3-compliant". MinIO also has its own deviations in its implementation. For the better part it's compliant, but it's not a 100% one-to-one match. :)

If you could try it out with S3 and maybe provide a test case in a repo that reproduces the problem -- that would be much appreciated.

carlspring (Owner) commented

Ciao, @stefanofornari! :)

StorJ may trigger a corner case, but I would not be so quick in claiming it is not a bug.

We are in no rush to dismiss this issue. Ideally, we would like s3fs-nio to be able to handle different implementations of S3. However, it's currently based on the AWS Java library, so if something is not supported out of the box by it, somebody would have to invest some time into investigating what the differences between MinIO, StorJ and/or other implementations are, and then either provide a fix or, at least, clearly outline what will need to be done. This is why we're trying to understand the problem.

Contributions are very welcome, as we are a small team.

stefanofornari (Author) commented Jun 6, 2023

Maybe you've missed this part of the test:

     String first = "first-write"; 
     Files.write(s3file, first.getBytes()); 
     assertThat(Files.readAllBytes(s3file)).isEqualTo(first.getBytes()); 
                       ^^^^

I think I did not make myself understood, sorry. All tests pass with StorJ. What I mean is that after a file is read with Files.readAllBytes(p) (and the download works fine), the same blob is uploaded back to the bucket when the channel is closed. This upload is highly undesirable because it consumes bandwidth, takes time and, most importantly, touches the object in the bucket. This happens because, as explained in #715 (comment), sync() is called when the channel is closed, which uploads the file. This looks to me like a bug in the library. To see it, you need to raise the log level of the AWS SDK.
Let me know if you want the logs of the transaction, but it should be fairly easy to reproduce with the code in the original description.
Let me know if I can be of any help.
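
(For anyone reproducing this: assuming the AWS SDK for Java v2 with slf4j-simple as the logging backend, the request log can be raised programmatically before touching the filesystem; with Logback or Log4j the same logger name goes into the backend's configuration file instead.)

// Assumption: slf4j-simple is the SLF4J binding on the classpath.
// The software.amazon.awssdk.request logger at DEBUG prints each outgoing
// request, which makes an unexpected PUT after a read visible.
System.setProperty("org.slf4j.simpleLogger.log.software.amazon.awssdk.request", "DEBUG");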

steve-todorov (Collaborator) commented

Thanks for further elaborating! I don't have the time to look into this right now, but it does sound like it needs to be checked out.

carlspring (Owner) commented

@stefanofornari,

Would you be interested in investigating this further and attempting to fix it?
