
perf(runtime/fs): optimize readFile by using a single large buffer #12057

Merged · 6 commits · Sep 16, 2021

Conversation

@AaronO (Contributor) commented Sep 13, 2021

This avoids allocating N buffers when reading entire files, and the copying required to concatenate them afterwards.
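
For context, a minimal user-level sketch of the idea, using public Deno APIs rather than the actual runtime/js internals: stat the file to learn its size, allocate one buffer of that size, and read into it directly, instead of readAll-ing into many small chunks and concatenating them. Names like readFileSingleBufferSketch are hypothetical, for illustration only.

function readFileSingleBufferSketch(path) {
  // Illustrative sketch, not the PR's implementation.
  const size = Deno.statSync(path).size;
  const file = Deno.openSync(path);
  try {
    const buf = new Uint8Array(size);
    let cursor = 0;
    // Loop because a single readSync() may return fewer bytes than requested.
    while (cursor < size) {
      const n = file.readSync(buf.subarray(cursor));
      if (n === null || n === 0) break; // EOF, e.g. the file was truncated meanwhile
      cursor += n;
    }
    return buf.subarray(0, cursor);
  } finally {
    file.close();
  }
}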

Benchmarks

# Before
❯ deno run -A ./cli/bench/deno_common.js
read_128k_sync:      	n = 50000, dt = 1.941s, r = 25760/s, t = 38820ns/op
read_128k:           	n = 50000, dt = 9.613s, r = 5201/s, t = 192259ns/op

# After
❯ ./target/release/deno run -A ./cli/bench/deno_common.js
read_128k_sync:      	n = 50000, dt = 1.746s, r = 28637/s, t = 34920ns/op
read_128k:           	n = 50000, dt = 8.865s, r = 5640/s, t = 177300ns/op
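
(Roughly an 11% throughput improvement for read_128k_sync, 25760/s → 28637/s, and roughly 8% for read_128k, 5201/s → 5640/s.)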

Notes

  • The improvements to read_128k_sync are somewhat reduced here by the extra cwd lookups caused by the stat, so perf(runtime): cache cwd lookups #12056 will work hand in hand with this change to improve file reads
  • This operates under the assumption that files won't change (truncated or extended) whilst being read, which isn't guaranteed but seems fair IMO
  • Also simplifies the implementation of readTextFile / readTextFileSync (a sketch follows below this list)
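
As an illustration of that last point, a hedged sketch of the simplified shape (assumed, not the exact PR code): readTextFileSync can simply decode the single buffer returned by readFileSync.

function readTextFileSyncSketch(path) {
  // Illustrative only: one buffer in, one decode out, no chunk concatenation.
  const data = Deno.readFileSync(path);
  return new TextDecoder().decode(data);
}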

(Resolved, outdated review thread on runtime/js/12_io.js)
@lucacasonato (Member) left a comment:


Not in favor of this. File reads are not atomic. We can't start optimizing at the cost of correctness.

@AaronO (Contributor, Author) commented Sep 13, 2021

Quoting @lucacasonato: "Not in favor of this. File reads are not atomic. We can't start optimizing at the cost of correctness."

My point is more that there's no sane use-case for writing to a file and reading it whole at the same time when you don't control the rate of reading; it's uncommon and IMO would be mostly fine as is.

But as I mentioned, we can make this more robust:

Files being truncated should already be handled; files being extended could be handled either by allocating an extra byte in the buffer and checking whether it gets filled, or by doing an extra read.

Essentially, the current implementation would be the fast path for an unchanged file, and it would fall back to a slow path if the file was extended, so this wouldn't hurt or change correctness.
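
A rough sketch of that fast-path/slow-path shape, again at the user level with public Deno APIs (the readAllSyncSketch helper below is a stand-in for the runtime's internal readAllSync, not the PR's actual code):

function readFileWithFallbackSketch(path) {
  const size = Deno.statSync(path).size;
  const file = Deno.openSync(path);
  try {
    // One extra byte: if it ever gets filled, the file grew after the stat.
    const buf = new Uint8Array(size + 1);
    let cursor = 0;
    let n;
    while ((n = file.readSync(buf.subarray(cursor))) !== null && n > 0) {
      cursor += n;
      if (cursor > size) break; // overflow byte hit: file was extended
    }
    if (cursor > size) {
      // Slow path: read whatever else appeared and concatenate.
      const rest = readAllSyncSketch(file);
      const out = new Uint8Array(cursor + rest.length);
      out.set(buf.subarray(0, cursor), 0);
      out.set(rest, cursor);
      return out;
    }
    // Fast path: cursor <= size (== if unchanged, < if truncated meanwhile).
    return buf.subarray(0, cursor);
  } finally {
    file.close();
  }
}

// Minimal stand-in for readAllSync: drain the rest of the file in chunks.
function readAllSyncSketch(file) {
  const chunks = [];
  const chunk = new Uint8Array(16 * 1024);
  let n;
  while ((n = file.readSync(chunk)) !== null && n > 0) {
    chunks.push(chunk.slice(0, n));
  }
  const total = chunks.reduce((sum, c) => sum + c.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}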

@lucacasonato (Member) commented:

With fallback, SGTM

@piscisaureus (Member) commented:

Allocate an extra byte in our read buffer to detect "overflow", then fall back to an unsized readAll for the remainder of the extended file; this is a slow path that should rarely happen in practice.
@AaronO (Contributor, Author) commented Sep 16, 2021

I still think it would be sane/fair not to read beyond the stat'd size; I can't imagine a sane use-case for going past it, but I implemented the slowpath fallback anyway.

@bartlomieju @lucacasonato That should address your concerns

  if (cursor > size) {
    // Read remaining and concat
    return concatBuffers([buf, readAllSync(r)]);
  } else { // cursor == size
A Member left a review comment with a suggested change:
- } else { // cursor == size
+ } else { // cursor <= size
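
(Presumably because a file truncated after the stat makes the read loop stop early, so this branch covers cursor < size as well as cursor == size.)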

@lucacasonato (Member) left a comment:


LGTM
