Cache S3FileAttributes in S3Path instances to dramatically reduce requests to S3 #47
Conversation
Actually, I sent a similar pull request months ago. See #34. Hope both of them will be merged.
Oh, I didn’t even spot that one - let’s hope there’s now enough pressure to get them merged! :: Heikki
Thanks @heikkipora and @pditommaso, I promise to check both PRs during the weekend :)
THAT sounds good! I wasn't yet able to make all the mocked tests pass for my part, but it seemed to actually work for me :)
Ok, I did a first check of the code and it looks great, but I have one question:

```java
Path s3Path = Paths.get(URI.create("s3://path/to/file"));
Files.readAttributes(s3Path, BasicFileAttributes.class); // now s3Path caches the attributes
// modify or delete the object through the Amazon S3 web interface or the command line
Files.readAttributes(s3Path, BasicFileAttributes.class); // wrong result, no?
```

If you store an S3Path in some kind of session and after a while you call `Files.exists(s3Path)`, you are going to get a false positive (if some other process deleted the file before the call). What do you think? I am not sure if it is a real problem...
Yes, I agree on this. A good strategy would be to allow the cached attributes to be used only for the next access.
You are absolutely right, the attributes will be cached for the lifetime of an S3Path instance - which was my purpose to begin with :) Could be a problem for some uses, true.
Thanks for the feedback, I think the @pditommaso strategy is a good point. I'm working on some kind of time-expiration cache: if the cache time expires, then readAttributes regenerates the S3FileAttributes. I am working on this, but what do you think?
Is the time-based expiration an alternative to the "expire by read" strategy I was suggesting, or are they supposed to work together? I think a good strategy could be: the cached attributes expire after the first access. Also, the documentation should stress that, to avoid useless network roundtrips, the user should cache the file attributes when possible.
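A minimal sketch of how the two strategies could be combined - all class and method names here are hypothetical, not the library's actual API. An entry is handed out at most once ("expire by use") and only while its time-to-live has not elapsed (time-based expiration):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.TimeUnit;

public class AttributesCacheSketch {

    // Hypothetical cache holding the attributes of a single path.
    static class AttributesCache {
        private final long ttlNanos;
        private BasicFileAttributes cached; // null = no valid entry
        private long storedAtNanos;

        AttributesCache(long ttl, TimeUnit unit) {
            this.ttlNanos = unit.toNanos(ttl);
        }

        synchronized void put(BasicFileAttributes attrs) {
            cached = attrs;
            storedAtNanos = System.nanoTime();
        }

        /** Returns the entry at most once ("expire by use"), and only within the TTL. */
        synchronized BasicFileAttributes getIfValid() {
            BasicFileAttributes result = cached;
            cached = null; // a single read consumes the entry
            boolean expired = System.nanoTime() - storedAtNanos > ttlNanos;
            return expired ? null : result;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("cache-demo", ".txt");
        AttributesCache cache = new AttributesCache(60, TimeUnit.SECONDS);
        cache.put(Files.readAttributes(tmp, BasicFileAttributes.class));
        System.out.println(cache.getIfValid() != null); // true: first read hits the cache
        System.out.println(cache.getIfValid() != null); // false: entry was consumed
        Files.delete(tmp);
    }
}
```

On a cache miss (null return), the caller would issue a fresh S3 request and re-populate the cache.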
sounds good to me, thx @pditommaso :)
My commits did actually break a few places, pushed a commit to fix that :) Tests are 100% green for me now.
Already integrated in the developer branch, thanks for the commits 👍 I used both the time-based expiration and expire-by-use, as @pditommaso suggested.
I am going to publish a new 1.0.3 version. @heikkipora, do you want to be added to the developers list in the pom.xml? If yes, please tell me your email ;)
Excellent! My email is heikki.pora at gmail dot com
thanks!
Given this code snippet running with S3-FileSystem:

```java
List<Path> files = Files.walk(folder, 1)
        .filter(Files::isRegularFile)
        .collect(Collectors.toList());
```

It makes four (!) S3 HTTP requests for each directory entry (files and immediate subfolders).
With this pull request, the number of S3 HTTP requests is reduced to 1 + N x 2, where N is the number of immediate subfolders.
For a folder containing 10 000 files but no subfolders, the change is 40 000 -> 1 ⚡
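The documentation advice discussed above (let the caller cache file attributes where possible) can also be applied on the user side. A hedged sketch with a hypothetical helper, demonstrated on a local temp directory in place of an S3 path - the same code works against any NIO `Path`:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Stream;

public class WalkOnceDemo {

    // Read each entry's attributes once and reuse them, instead of calling
    // Files.isRegularFile / Files.size separately (on a remote file system
    // each separate call may trigger its own attributes lookup).
    static long countRegularFiles(Path folder) throws IOException {
        Map<Path, BasicFileAttributes> attrs = new LinkedHashMap<>();
        try (Stream<Path> entries = Files.walk(folder, 1)) {
            for (Path p : (Iterable<Path>) entries::iterator) {
                attrs.put(p, Files.readAttributes(p, BasicFileAttributes.class));
            }
        }
        return attrs.values().stream()
                .filter(BasicFileAttributes::isRegularFile)
                .count();
    }

    public static void main(String[] args) throws IOException {
        Path folder = Files.createTempDirectory("walk-demo");
        Files.createFile(folder.resolve("a.txt"));
        Files.createFile(folder.resolve("b.txt"));
        Files.createDirectory(folder.resolve("sub"));
        System.out.println(countRegularFiles(folder)); // prints 2
    }
}
```

The attributes map can then serve later `size()` or `lastModifiedTime()` queries without going back to the file system.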