
Performance characteristics of absolute path vs relative path (and excessive HEAD requests) #500

Closed
apanloco opened this issue Sep 5, 2023 · 1 comment
Labels
question Further information is requested

Comments


apanloco commented Sep 5, 2023

Mountpoint for Amazon S3 version

mount-s3 1.0.0

AWS Region

eu-central-1

Describe the running environment

Running in EC2, Debian 12, using instance profile credentials against an S3 Bucket in the same account

What happened?

Hello fellow coders!
I'm trying to understand the behaviour and performance characteristics of mountpoint-s3.

To demonstrate a potential issue I have a screenshot showing:

  1. I read the first byte of a file in the bucket, using a relative path (the file is in my current working directory).
  2. I hit Enter in both windows, so it's easy to see which logs each command generated.
  3. I read one byte of the same file, but using an absolute path.
[screenshot: two terminal windows showing the commands and the corresponding Mountpoint logs]
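The comparison above can be reproduced with a short script. This is a sketch under assumptions: the file name `testfile` is hypothetical, and a throwaway local file stands in for an object in the bucket; on a real mount you would point both paths at a file inside the Mountpoint mount instead.

```python
import os
import tempfile
import time

def time_first_byte(path):
    """Return the wall-clock seconds taken to open `path` and read one byte."""
    start = time.monotonic()
    with open(path, "rb") as f:
        f.read(1)
    return time.monotonic() - start

# Hypothetical setup: a local temp file stands in for an object in the
# mounted bucket, so this script runs anywhere.
old_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "testfile"), "wb") as f:
        f.write(b"hello")
    os.chdir(d)
    try:
        rel = "testfile"                 # relative path: file is in the cwd
        abs_path = os.path.abspath(rel)  # absolute path to the same file
        print(f"relative: {time_first_byte(rel) * 1000:.3f} ms")
        print(f"absolute: {time_first_byte(abs_path) * 1000:.3f} ms")
    finally:
        os.chdir(old_cwd)
```

On a local filesystem the two timings are effectively identical; the gap only appears on a Mountpoint mount, where each path component costs extra requests.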

A few observations from the logs in the screenshot:

  • Using an absolute path is roughly 40% slower than using a relative path (i.e. the current working directory): 243 ms vs 175 ms.
  • For the relative path, there are 7 connections established for 7 HTTP commands.
  • For the absolute path, there are 10 connections established for 11 HTTP commands.

I wonder about all the HEAD requests -- does this look correct? I have a feeling it's not optimal.
I also wonder why Mountpoint traverses each component of the path like this, instead of just reading the file when it has the absolute path. The deeper the path, the longer it takes to read the byte.

I'd appreciate any thoughts and input on this.

Relevant log output

The log file with --debug and --debug-crt for one of the commands was too long to attach. Any guidance on which parts are relevant, and I'll attach them.
@apanloco apanloco added the "bug (Something isn't working)" label Sep 5, 2023
@passaro (Contributor)

passaro commented Sep 5, 2023

Hi @apanloco, thanks for the detailed issue. The short answer is that the number of requests you are seeing is expected. More details below.

But first, a few suggestions for reading the logs. To better understand why Mountpoint is making a request, you may want to expand your log filter with e.g.:

  • "fuser": this will show you the FUSE requests Mountpoint received from the kernel, e.g.

    2023-09-05T09:25:11.726409Z DEBUG fuser::request: FUSE( 76) ino 0x0000000000000003 LOOKUP name "00"

  • "new request": this marks the start of each S3 client request, decorated with the FUSE command triggering it, e.g.

    2023-09-05T09:39:55.933864Z DEBUG lookup{req=114 ino=1 name="00"}:list_objects{id=73 bucket="<redacted>" continued=false delimiter="/" max_keys="1" prefix="00/"}: mountpoint_s3_client::s3_crt_client::list_objects: new request

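Applying those two filters can be sketched as a small script. This is a hypothetical helper, not part of Mountpoint; the keyword strings are the ones suggested above, and the script name `filter_log.py` is an assumption.

```python
import sys

# Keywords suggested above: "fuser" marks kernel FUSE requests,
# "new request" marks the start of each S3 client request.
KEYWORDS = ("fuser", "new request")

def filter_log(lines):
    """Return only the log lines containing one of the keywords."""
    return [line for line in lines if any(k in line for k in KEYWORDS)]

if __name__ == "__main__":
    # Usage (hypothetical): python filter_log.py < mountpoint.log
    sys.stdout.writelines(filter_log(sys.stdin))
```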

To explain the requests Mountpoint makes to S3, consider that, when resolving a path, the kernel will ask Mountpoint to look up each of its components and return their inode number (ino) and associated stats.

For each lookup, Mountpoint will issue 2 simultaneous requests, HeadObject and ListObjects, to determine whether the name maps to a file or a sub-directory. This is required to implement directory shadowing (described here: https://github.com/awslabs/mountpoint-s3/blob/main/doc/SEMANTICS.md#directories). We are considering potential optimisations here: #12.
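The arithmetic behind the request counts can be sketched as follows. This is a back-of-the-envelope model of the behaviour described above (two concurrent requests per uncached path component); it ignores kernel caching, the GetObject for the actual read, and any requests for the mount point itself.

```python
def estimated_lookup_requests(path):
    """Estimate S3 requests needed to resolve `path` with cold caches:
    each component costs one HeadObject plus one ListObjects."""
    components = [c for c in path.strip("/").split("/") if c]
    return 2 * len(components)

# A file in the current working directory needs a single lookup; every
# extra directory level adds another HeadObject/ListObjects pair.
print(estimated_lookup_requests("file.bin"))        # → 2
print(estimated_lookup_requests("a/b/c/file.bin"))  # → 8
```

This is why an absolute path is slower than a relative one for a single-byte read: the cost grows linearly with path depth.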

The result of lookups can be cached in the kernel for a certain period of time. The longer the expiration time, the fewer repeated Head and List requests are required, with the drawback of returning potentially stale information when the content of the S3 bucket changes. To reduce consistency issues, Mountpoint only caches metadata for up to 1 second and invalidates it on certain operations, like open (see here: https://github.com/awslabs/mountpoint-s3/blob/main/doc/SEMANTICS.md#consistency-and-concurrency).
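The TTL trade-off can be illustrated with a toy cache. This is a sketch of the general idea only, not Mountpoint's actual implementation (the lookup caching happens in the kernel); the class name and stats layout are invented for illustration.

```python
import time

class MetadataCache:
    """Toy TTL cache: longer TTLs mean fewer repeated Head/List requests,
    at the cost of potentially stale results."""

    def __init__(self, ttl=1.0):
        self.ttl = ttl
        self.entries = {}  # name -> (stats, expiry timestamp)

    def put(self, name, stats):
        self.entries[name] = (stats, time.monotonic() + self.ttl)

    def get(self, name):
        entry = self.entries.get(name)
        if entry and time.monotonic() < entry[1]:
            return entry[0]           # fresh: no Head/List round-trip needed
        self.entries.pop(name, None)  # expired or missing: caller must re-lookup
        return None

    def invalidate(self, name):
        # Mirrors invalidation on operations like open(), to limit staleness.
        self.entries.pop(name, None)

cache = MetadataCache(ttl=1.0)
cache.put("testfile", {"ino": 3, "size": 5})
print(cache.get("testfile") is not None)  # → True (entry still fresh)
```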

With this in mind, the difference in performance between relative and absolute paths is not surprising, especially when measuring only the time to read a single byte from a file; high-throughput workloads will not be affected in the same way. That said, if you have a specific use case that is negatively impacted by the current behavior, we would be happy to hear about it.

@passaro passaro added the "question (Further information is requested)" label and removed the "bug (Something isn't working)" label Sep 5, 2023
@passaro passaro closed this as completed Sep 11, 2023