Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

builtins.readFile: do not truncate content #3546

Merged
merged 3 commits into from May 6, 2020

Conversation

@guibou
Copy link
Contributor

guibou commented Apr 29, 2020

This closes #3026 by allowing builtins.readFile to read a file with a
wrongly reported file size, for example, files in /proc may report a
file size of 0. Reading file in /proc is not a good enough motivation,
however I do think it just makes nix more robust by allowing more file
to be read. Especially, I do considerer the previous behavior to be
dangerous because nix was previously reading truncated files. Examples
of file system which incorrectly report file size may be network file
system or dynamic file system (for performance reason, a dynamic file
system such as FUSE may generate the content of the file on demand).

nix-repl> builtins.readFile "/proc/version"
""

With this commit:

nix-repl> builtins.readFile "/proc/version"
"Linux version 5.6.7 (nixbld@localhost) (gcc version 9.3.0 (GCC)) #1-NixOS SMP Thu Apr 23 08:38:27 UTC 2020\n"

Here is a summary of the behavior changes:

  • If the reported size is smaller, previous implementation
    was silently returning a truncated file content. The new implementation
    is returning the correct file content.

  • If a file had a bigger reported file size, previous implementation was
    failing with an exception, but the new implementation is returning the
    correct file content. This change of behavior is coherent with this pull
    request.

Open questions

  • The behavior is unchanged for correctly reported file size, however
    performances may vary because it uses the more complex sink interface.
    Considering that sink is used a lot, I don't think this impacts the
    performance a lot.
  • builtins.readFile on an infinite file, such as /dev/random may
    fill the memory.
  • it does not support adding file to store, such as ${/proc/version}.
This closes #3026 by allowing `builtins.readFile` to read a file with a
wrongly reported file size, for example, files in `/proc` may report a
file size of 0. Reading file in `/proc` is not a good enough motivation,
however I do think it just makes nix more robust by allowing more file
to be read.  Especially, I do considerer the previous behavior to be
dangerous because nix was previously reading truncated files. Examples
of file system which incorrectly report file size may be network file
system or dynamic file system (for performance reason, a dynamic file
system such as FUSE may generate the content of the file on demand).

```
nix-repl> builtins.readFile "/proc/version"
""
```

With this commit:

```
nix-repl> builtins.readFile "/proc/version"
"Linux version 5.6.7 (nixbld@localhost) (gcc version 9.3.0 (GCC)) #1-NixOS SMP Thu Apr 23 08:38:27 UTC 2020\n"
```

Here is a summary of the behavior changes:

- If the reported size is smaller, previous implementation
was silently returning a truncated file content. The new implementation
is returning the correct file content.

- If a file had a bigger reported file size, previous implementation was
failing with an exception, but the new implementation is returning the
correct file content. This change of behavior is coherent with this pull
request.

Open questions

- The behavior is unchanged for correctly reported file size, however
performances may vary because it uses the more complex sink interface.
Considering that sink is used a lot, I don't think this impacts the
performance a lot.
- `builtins.readFile` on an infinite file, such as `/dev/random` may
fill the memory.
- it does not support adding file to store, such as `${/proc/version}`.
@edolstra
Copy link
Member

edolstra commented Apr 29, 2020

Note that this makes the drain parameter to readFile() rather pointless, since both cases do the same thing now:

return drain ? drainFD(fd.get()) : readFile(fd.get());

where readFile(fd) == return drainFD(fd, true);.

There does appear to be a small but measurable speed penalty btw, e.g. running nix-env -qa:

elapsed time:       median =      4.7344  mean =      4.7326  stddev =      0.0114  min =      4.7089  max =      4.7584  [rejected, p=0.00000, Δ=0.01442±0.00663]

Don't know if it's because of the increased number of system calls or having to reallocate the std::string contents.

@guibou
Copy link
Contributor Author

guibou commented Apr 29, 2020

@edolstra how do you run the performance benchmark? I'll try to see if the std::string size can be reserved using the st_size field.

I'll remove the drain parameter to readFile, thank for seeing that.

guibou added 2 commits Apr 29, 2020
Now it is always `drain` (see previous commit).
When used with `readFile`, we have a pretty good heuristic of the file
size, so `reserve` this in the `string`. This will save some allocation
/ copy when the string is growing.
@guibou
Copy link
Contributor Author

guibou commented Apr 29, 2020

@edolstra drain argument removed, and now the string size is correctly reserved. I hope it fixs the performance regression.

@edolstra edolstra merged commit 74a1bfd into NixOS:master May 6, 2020
2 checks passed
2 checks passed
tests (ubuntu-18.04)
Details
tests (macos)
Details
@guibou guibou deleted the guibou:nix_readfile_on_0_sized_files branch May 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

2 participants
You can’t perform that action at this time.