Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read file type #7447

Merged
merged 2 commits into from Jan 23, 2023
Merged

Read file type #7447

merged 2 commits into from Jan 23, 2023

Conversation

aakropotkin
Copy link
Contributor

Adds a new primop readFileType, allowing readDir to create thunks for unknown types when reading large directories.

@aakropotkin
Copy link
Contributor Author

@Ericson2314 @infinisil

@Ericson2314
Copy link
Member

CC @roberth, it was his idea in #7314

@aakropotkin
Copy link
Contributor Author

As a note to @roberth it looks like on Linux readents is already used.

With that in mind the changes to readDir largely effect Darwin systems.

Benchmarking is TODO.

src/libexpr/primops.cc Show resolved Hide resolved
src/libexpr/primops.cc Outdated Show resolved Hide resolved
src/libexpr/primops.cc Outdated Show resolved Hide resolved
@edolstra
Copy link
Member

This should probably be changed into a stat primop that also returns file size and whether a regular file is executable.

src/libexpr/primops.cc Outdated Show resolved Hide resolved
@roberth
Copy link
Member

roberth commented Dec 12, 2022

This should probably be changed into a stat primop that also returns file size and whether a regular file is executable.

I agree, except stat is just a terrible awful non-name.

How about pathInfo?

@aakropotkin
Copy link
Contributor Author

This should probably be changed into a stat primop that also returns file size and whether a regular file is executable.

I agree, except stat is just a terrible awful non-name.

How about pathInfo?

stat should return the full set of stat info, most importantly permissions. I see that as a useful built-in but separate from our goal here.

@Ericson2314
Copy link
Member

@aakropotkin I think the advantage of this "reduced stat" is that it returns just the attributes we care about in the store's model of the file system.

@roberth
Copy link
Member

roberth commented Dec 12, 2022

I think pathType and the performance improvement combine really well into a single small change that does everything we need it to.

Changing it to include any extra information would be a more significant change to the language, that would be less coherent with the readDir builtin. If we choose to implement a reduced stat, we should "change" readDir to reflect this change.

# completing the current interface
pathType f
  = (readDir (dirOf f)).${baseNameOf f}
  = "file" or ...

# a possible future interface
pathInfo f
  = (directoryNodes (dirOf f)).${baseNameOf f}
  = { type = "file"; executable; size; }
 or { type = "directory"; }
 or ...

I don't think we have a feature request for the latter (except an outlier optimization bug that @aakropotkin subtly referenced here, that we should probably fix in some other way) whereas we do for pathType.

I suggest we complete the current interface, which does not interfere with the possible future interface, thereby achieving the goals in the simplest way.

This way we avoid having to answer questions that we really don't need to answer, such as whether the extra attrsets impact performance, which attributes should even be in there, etc. (+ discussions, extra review rounds that we can better spend on the larger issues)

There is a time for pathInfo, but I don't think it should be a priority now. pathType otoh is low hanging fruit. That will unblock the Nixpkgs Architecture Team and solve a validated use case.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2022-12-12-nixpkgs-architecture-team-meeting-21/23967/1

@aakropotkin
Copy link
Contributor Author

I'm on board with @roberth aside from the function name. I'd personally have a lot of use cases for pathInfo but don't want to weigh down the performance improvements we're chasing in readDir by smuggling those features into this PR.

I'll take another pass at the notes we have here and update the PR.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2022-12-19-nixpkgs-architecture-team-meeting-22/24125/1

@aakropotkin aakropotkin marked this pull request as ready for review December 20, 2022 21:59
@thufschmitt thufschmitt added the feature Feature request or proposal label Jan 2, 2023
@roberth
Copy link
Member

roberth commented Jan 2, 2023

Field testing a PR checklist (#7538), because I messed up. Apologies for not doing this sooner. The Nix team is improving.

  • is the idea good? has it been discussed by the Nix team?
  • unit tests
    - I don't think a unit test would add value here, as the logic is quite simple and heavily reliant on OS interfaces. Mocking this would not be an effective use of our time and code complexity budget.
  • functional tests (tests/**.sh)
    - Please add a test to invoke the new primop
  • documentation in the manual
    - Done through generated builtins.md
  • documentation in the code (if necessary; ideally code is already clear)
    - See review comment
  • documentation in the commit message (why was this change made? for future reference when maintaining the code)
    - Also please squash
  • documentation in the changelog (to announce features and fixes to existing users who might have to do something to finally solve their problem, and to summarize the development history)
    - Please document both the new builtin and the performance improvement, separately

@fricklerhandwerk
Copy link
Contributor

Discussed in Nix team meeting 2022-12-23:

  • original goal of improving performance is good
  • readFileType is nice side catch; no need make a design change to the file type / directory primops API yet
  • assigned to @roberth for merging

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2022-12-23-nix-team-meeting-minutes-19/24400/1

@fricklerhandwerk
Copy link
Contributor

Discussed in Nix team meeting 2023-01-02:

  • @fricklerhandwerk: release notes are missing, this is a significant feature for improving performance in some use cases
  • @thufschmitt: this changes the language API, we currently don't have a process to handle that - we can't change that after the fact.
    • would be good if we could make such added features experimental for some time to make sure to get proper feedback if this was the right design idea
      • it may turn out someone will come around and argue that a stat syscall would be better
    • @edolstra: historically we didn't have this because for builtins so far they were obviously useful, typically for performance reasons
      • in this specific case it's tricky to make it experimental because it's used unconditionally by readDir

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-01-02-nix-team-meeting-minutes-20/24403/1

@Ericson2314
Copy link
Member

in this specific case it's tricky to make it experimental because it's used unconditionally by readDir

Not saying we should do that (I think it is probably fine to just stabilize this) but I think we could make an "anonymous primop" that doesn't require a stable feature but also can't be accessed by user code.

@roberth
Copy link
Member

roberth commented Jan 3, 2023

We definitely could, but we don't need to go there because we have a valid use case #3096 and we've discussed the design, deciding not to over-engineer and stick to the valid use case, as it is sufficient and coherent with the existing readDir interface. Reverting the new interface would be wasteful.

@Ericson2314
Copy link
Member

I agree. I just wanted to point that a middle ground does exist if others insisted -- not advocate for it!

@aameen-tulip
Copy link
Contributor

aameen-tulip commented Jan 16, 2023

@roberth I made the changes you requested. I pushed an empty commit with the message to be used on the squashed form of this PR.
961f349

@aakropotkin aakropotkin force-pushed the read-file-type branch 2 times, most recently from a06cad3 to 8b5d3f6 Compare January 22, 2023 19:40
Allows checking directory entry type of a single file/directory.

This was added to optimize the use of `builtins.readDir` on some
filesystems and operating systems which cannot detect this information
using POSIX's `readdir`.

Previously `builtins.readDir` would eagerly use system calls to lookup
these filetypes using other interfaces; this change makes these
operations lazy in the attribute values for each file with application
of `builtins.readFileType`.
@aameen-tulip
Copy link
Contributor

Merged 2.13.0, squashed, and ready to merge.

doc/manual/src/release-notes/rl-next.md Outdated Show resolved Hide resolved
doc/manual/src/release-notes/rl-next.md Outdated Show resolved Hide resolved
@roberth roberth merged commit 9b56683 into NixOS:master Jan 23, 2023
@infinisil
Copy link
Member

readFileType is nice side catch; no need make a design change to the file type / directory primops API yet

Why was this renamed from readPathType/pathType? There's no arguments written down in any comments or meeting notes.

I think readFileType is inaccurate and misleading, because it also works on directories, not just files. I know that @fricklerhandwerk argued before that in Unix, directories are also technically files, but I don't think most people think of it that way.

We now have this situation:

  • builtins.readDir exists, requires a directory, not a file
  • builtins.readFile exists, requires a file, not a directory
  • builtins.readFileType exists, but it works on files and directories

Also there's already a lib.pathType in Nixpkgs that works on both files and directories. Related is NixOS/nixpkgs#224834

I think this should really have been builtins.pathType or builtins.readPathType, I prefer the latter because of @aakropotkin's argument here.

@aakropotkin
Copy link
Contributor Author

readFileType is nice side catch; no need make a design change to the file type / directory primops API yet

Why was this renamed from readPathType/pathType? There's no arguments written down in any comments or meeting notes.

I think readFileType is inaccurate and misleading, because it also works on directories, not just files. I know that @fricklerhandwerk argued before that in Unix, directories are also technically files, but I don't think most people think of it that way.

We now have this situation:

* `builtins.readDir` exists, requires a directory, not a file

* `builtins.readFile` exists, requires a file, not a directory

* `builtins.readFileType` exists, but it works on files and directories

Also there's already a lib.pathType in Nixpkgs that works on both files and directories. Related is NixOS/nixpkgs#224834

I think this should really have been builtins.pathType or builtins.readPathType, I prefer the latter because of @aakropotkin's argument here.

I think I was intentionally trying to avoid confusing "path type" ( the Nix construct ) with "filesystem path". I felt that builtins.isPath ./foo and builtins.pathType ./foo could both be grokked as "a predicate that checks if ./foo is of Nix type "path". Similarly builtins.readPathType ./foo could be interpreted as "read the file ./foo which must be of Nix type "path" ( more of a stretch ).

In retrospect builtins.readPathType was probably a better name.

@roberth
Copy link
Member

roberth commented Apr 19, 2023

I wasn't super happy about the name, but we had already bikeshedded towards this and at least two people agreed on this name. I guess I have to be more annoying then.
Instead of worrying or breaking something, let's just establish that the builtins suck. They're our equivalent of the kernel syscall interface. Have you seen the number of stat calls?

So lib can choose its own sensible name; problem solved.

inaccurate and misleading

Good. I hope this is the final drop in the bucket and we actively start avoiding direct use of builtins.
Slightly trolling, but I do mean it.

@infinisil
Copy link
Member

Could we backport this to the 2.13 release since that will be the default for NixOS 23.05 and because builtins.readFileType is the only way to get the path type of a Nix store path with pure evaluation?

@roberth
Copy link
Member

roberth commented May 24, 2023

the only way to get the path type of a Nix store path with pure evaluation?

Is it? I thought the readDir solution worked fine.
I suppose maybe not in cases where dirOf leaves the approved paths, but isn't that more of a restrict-eval thing anyway?

I do support a backport; I just don't understand the urgency.

@roberth roberth added the backport 2.13-maintenance Automatically creates a PR against the branch label May 24, 2023
@github-actions
Copy link

Successfully created backport PR for 2.13-maintenance:

@infinisil
Copy link
Member

@roberth The more specific problem I ran into was this:

{
  outputs = { nixpkgs, ... }: {
    test = nixpkgs.lib.pathType ./.;
  };
}
$ nix eval .#test
error: access to absolute path '/nix/store' is forbidden in pure eval mode (use '--impure' to override)
(use '--show-trace' to show detailed location information)

This is caused because the old pathType before this PR relied on readDir (dirOf path), which in this case, since flakes get copied to the store before evaluation, readDir (dirOf /nix/store/...) == readDir /nix/store, therefore trying to list the entire Nix store, obviously disallowed in pure evaluation mode.

This is not specific to flakes though, pure evaluation of any ./. inside a fetched path at the root would cause the same.

@infinisil
Copy link
Member

And even more concretely, pathType is being relied upon for file set combinators, which I would like to announce more widely soon, and this issue would cause problems for flake users wanting to try it out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.13-maintenance Automatically creates a PR against the branch feature Feature request or proposal
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

9 participants