
Rework the way file handle reads are performed #103

Closed
jacobsa opened this issue Aug 4, 2015 · 0 comments

jacobsa commented Aug 4, 2015

I'm currently working on sequential read performance in regions of large objects not cached locally. The throughput isn't great, at least in part because the default chunk size of 16 MiB used by gcsfuse is not large enough to amortize out the overhead of starting a read within a large object. (Cf. Google-internal thread go/vkhor).
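To make the amortization argument concrete, here is a back-of-the-envelope calculation with purely illustrative numbers (the 50 ms per-request overhead and 1 GiB/s single-stream throughput are assumptions, not measurements from gcsfuse):

```go
package main

import "fmt"

func main() {
	// Hypothetical numbers, purely to illustrate the amortization argument:
	const latency = 0.05  // seconds of overhead to start a read within a large object
	const stream = 1024.0 // MiB/s of single-stream throughput once the read is going
	for _, chunkMiB := range []float64{16, 128, 10 * 1024} {
		// Effective throughput = bytes transferred / (startup cost + transfer time).
		effective := chunkMiB / (latency + chunkMiB/stream)
		fmt.Printf("%8.0f MiB chunk -> %6.0f MiB/s effective\n", chunkMiB, effective)
	}
}
```

Under these assumptions the default 16 MiB chunk recovers only about a quarter of the available single-stream throughput, while a read to the end of a 10 GiB object recovers nearly all of it.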

Making the chunk size bigger is not a great solution, because it further exacerbates the problem of vastly over-reading for small random reads. In the end the user is going to be reading randomly or sequentially. In the former case we want to read basically exactly what they ask for, and in the latter case we probably want to start a read from their offset to infinity, and just use it until they stop reading. This is also a lot simpler.

So let's throw out the complicated caching layer with leases, etc. Instead:

  • File handles maintain state in the form of the last read opened, the offset at which it is positioned, and how many bytes remain. If that read can service the next request, the handle reuses it.
  • The first time a read is made through a handle where there is no HTTP response in progress, or the response is positioned in the wrong place, the handle upgrades the read to 1-2 MiB (a sane size for GCS, compared to the 128 KiB reads Linux sends and the 1 MiB reads OS X sends).
  • If that read completes and the next read picks up at its end offset, the next request runs from that offset to infinity.
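The per-handle state machine above can be sketched as follows. This is a minimal illustration, not gcsfuse's actual implementation; the names `reader`, `readAt`, and `firstReadSize` are assumptions:

```go
package main

import "fmt"

// firstReadSize is the hypothetical upgrade size for the first read through a
// handle (the issue suggests 1-2 MiB as a sane minimum request size for GCS).
const firstReadSize int64 = 2 << 20

// reader models the per-handle state proposed above: the position and
// remaining extent of the last HTTP response opened against the object.
type reader struct {
	start int64 // offset of the next byte the in-flight response will yield
	limit int64 // offset one past the last byte of the in-flight response
	valid bool  // is there an in-flight response at all?
}

// readAt decides what GCS request (if any) a kernel read of size bytes at
// offset requires, for an object of length objLen. It returns the request
// range and whether a new HTTP request had to be opened.
func (r *reader) readAt(offset, size, objLen int64) (reqStart, reqLimit int64, newReq bool) {
	switch {
	case r.valid && offset >= r.start && offset < r.limit:
		// The in-flight response covers this offset: reuse it.
		reqStart, reqLimit = r.start, r.limit
	case r.valid && offset == r.limit:
		// The previous response was consumed exactly to its end: the user is
		// reading sequentially, so read from here to "infinity" (EOF).
		reqStart, reqLimit, newReq = offset, objLen, true
	default:
		// First read through the handle, or a seek: upgrade to firstReadSize.
		reqStart, reqLimit, newReq = offset, offset+firstReadSize, true
		if reqLimit > objLen {
			reqLimit = objLen
		}
	}
	// Serving the read consumes bytes from the (possibly new) response.
	r.start, r.limit, r.valid = offset+size, reqLimit, true
	if r.start > r.limit {
		r.start = r.limit
	}
	return
}

func main() {
	const objLen = int64(10) << 30 // a 10 GiB object
	var r reader
	requests := 0
	// Simulate the kernel scanning the first 16 MiB in 128 KiB reads.
	for off := int64(0); off < 16<<20; off += 128 << 10 {
		if _, _, newReq := r.readAt(off, 128<<10, objLen); newReq {
			requests++
		}
	}
	fmt.Println(requests) // 2: one 2 MiB request, then one request to EOF
}
```

A sequential scan triggers exactly two GCS requests, while a seek to an uncovered offset falls back to a fresh 1-2 MiB request.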

Two properties are interesting here:

  • GCS never gets a request smaller than 1-2 MiB for a large object. (This is already true today.)
  • For a large sequential read, we make at most two requests. The first may be slowish, but the second will give us the full possible single-stream throughput.

If it turns out we need caching, we can later cache outside of this machinery by quantizing reads and caching what comes back.
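Quantizing reads for such a cache could look like the sketch below. This is a hypothetical illustration of the idea, not anything gcsfuse ships; the `chunkSize` granularity and the `quantize` helper are assumptions:

```go
package main

import "fmt"

// chunkSize is a hypothetical cache granularity; quantizing every read to
// chunk boundaries means each chunk of the object maps to one cache entry.
const chunkSize int64 = 1 << 20

// quantize expands a read of size bytes at offset to whole cache chunks,
// clamped to the object length, so that identical chunks can be cached and
// reused regardless of the exact offsets the kernel asks for.
func quantize(offset, size, objLen int64) (start, limit int64) {
	start = (offset / chunkSize) * chunkSize
	limit = ((offset + size + chunkSize - 1) / chunkSize) * chunkSize
	if limit > objLen {
		limit = objLen
	}
	return
}

func main() {
	// A 4 KiB read at offset 1.5 MiB maps to the chunk [1 MiB, 2 MiB).
	s, l := quantize(3<<19, 4<<10, 10<<20)
	fmt.Println(s, l) // 1048576 2097152
}
```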

jacobsa added a commit that referenced this issue Aug 4, 2015
jacobsa added a commit that referenced this issue Aug 4, 2015
jacobsa added a commit that referenced this issue Aug 4, 2015
This performs the read upgrade logic discussed in #103. It will be used by a
new FileHandle type.
jacobsa added a commit to jacobsa/fuse that referenced this issue Aug 5, 2015
Allowing the kernel to send multiple reads for the same file handle
concurrently interferes with sequential read detection like that in
GoogleCloudPlatform/gcsfuse#103.
jacobsa added a commit that referenced this issue Aug 5, 2015
This causes a temporary small regression in sequential read throughput,
but should help significantly once the changes for #103 are in.

% go install -v && gcsfuse --temp-dir /mnt/ssd0 jacobsa-standard-asia ~/mp
% go build ./benchmarks/read_full_file && cputime ./read_full_file --dir ~/mp

Before:

    Full-file read times:
      50th ptile: 12.327339ms (5.07 GiB/s)
      90th ptile: 12.962231ms (4.82 GiB/s)
      98th ptile: 13.62993ms (4.59 GiB/s)

After:

    Full-file read times:
      50th ptile: 12.430931ms (5.03 GiB/s)
      90th ptile: 13.076107ms (4.78 GiB/s)
      98th ptile: 13.763207ms (4.54 GiB/s)

% go install -v && gcsfuse --temp-dir /mnt/ssd0 jacobsa-standard-asia ~/mp
% go build ./benchmarks/read_within_file && cputime ./read_within_file --file ~/mp/10g

Before:
    Read 289.00 MiB in 10.140961856s (28.50 MiB/s)

After:
    Read 256.00 MiB in 10.304326813s (24.84 MiB/s)
jacobsa added a commit that referenced this issue Aug 5, 2015
This performs the read upgrade logic discussed in #103. It will be used by a
new FileHandle type.
@jacobsa jacobsa closed this as completed in babd421 Aug 5, 2015