feature | simple-package-paths |
---|---|
start-date | 2022-09-02 |
author | Silvan Mosberger (@infinisil) |
co-authors | Robert Hensing (@roberth) |
pre-RFC reviewers | Thomas Bereknyei (@tomberek), John Ericson (@Ericson2314), Alex Ameen (@aakropotkin) |
shepherd-team | @phaer @06kellyjac @aakropotkin @piegamesde |
shepherd-leader | - |
related-issues | NixOS/nixpkgs#237439, NixOS/nixpkgs#211832 |
Auto-generate trivial top-level attribute definitions in Nixpkgs' `pkgs/top-level/all-packages.nix` from a directory structure that matches the attribute name.
This makes it much easier to contribute new packages, since there's no more guessing needed as to where the package should go, both in the ad-hoc directory categories and in `all-packages.nix`.
- It is not obvious to package contributors where to add files or which ones to edit. These are very common questions:
  - Which directory should my package definition go in?
  - What are all the categories and do they matter?
  - What if the package has multiple matching categories?
  - Why can't I build my package after adding the package file?
  - Where in `all-packages.nix` should my package go?
- Figuring out where an attribute is defined is a bit tricky:
  - First one has to find the definition of it in `all-packages.nix` to see what file it refers to
    - On GitHub this is even more problematic, as the `all-packages.nix` file is too big to be displayed by GitHub
  - Then go to that file's definition, which takes quite some time for navigation (unless you have a plugin that can jump to it directly)
  - It also slows down or even locks up editors due to the file size
  - `nix edit -f . package-attr` works, though that's not yet stable (it relies on the `nix-command` feature being enabled) and doesn't work with packages that don't set `meta.position` correctly.
- `all-packages.nix` frequently causes merge conflicts. It's a point of contention for all new packages
This RFC consists of two parts, each of which is implemented with a PR to Nixpkgs. These PRs should be done after a release to maximize the testing period and minimize merge conflicts.
This part establishes the new directory structure in Nixpkgs. This directory structure is internal to Nixpkgs and not exposed as a public interface. This directory structure must be documented in the Nixpkgs manual. This PR will be backported to the stable release in order to ensure that backports of new packages work.
Create the initially-empty `pkgs/by-name` directory in Nixpkgs, and migrate the `hello` package into it.
Check the following using CI:
- `pkgs/by-name` must only contain subdirectories of the form `${shard}/${name}`, called _package directories_.
  - The `name`s of package directories must be unique when lowercased.
  - `name` is a string only consisting of the ASCII characters `a-z`, `A-Z`, `0-9`, `-` or `_`.
  - `shard` is the lowercased first two letters of `name`, expressed in Nix: `shard = toLower (substring 0 2 name)`.
- Each package directory must contain a `package.nix` file and may contain arbitrary other files.
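The naming rules above can be sketched as a small Nix expression. This is only an illustration: the helper names `shardOf` and `isValidName` are hypothetical, and `<nixpkgs/lib>` is assumed to resolve to the `lib` directory of a Nixpkgs checkout.

```nix
let
  # Assumption: <nixpkgs> is on NIX_PATH; only `lib.toLower` is needed from it.
  lib = import <nixpkgs/lib>;

  # The shard rule as stated above: lowercased first two characters of the name.
  shardOf = name: lib.toLower (builtins.substring 0 2 name);

  # The name rule as stated above: only a-z, A-Z, 0-9, `-` and `_`.
  # builtins.match must match the whole string and returns null otherwise.
  isValidName = name: builtins.match "[a-zA-Z0-9_-]+" name != null;
in
{
  hello = shardOf "hello";          # "he"
  chow = shardOf "CHOWTapeModel";   # "ch"
  single = shardOf "t";             # "t" (names shorter than two characters)
  valid = isValidName "_0verkill";  # true
  invalid = isValidName "foo.bar";  # false
}
```

Saved to a file, this can be evaluated with `nix-instantiate --eval --strict <file>`.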
Introduce code to automatically define `pkgs.${name}` for each package directory as a value equivalent to

    pkgs.callPackage pkgs/by-name/${shard}/${name}/package.nix { }

Optionally there may also be an overriding definition of `pkgs.${name}` in `pkgs/top-level/all-packages.nix` equivalent to

    pkgs.callPackage pkgs/by-name/${shard}/${name}/package.nix args

with an arbitrary `args`.
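One way such automatic definitions could be generated is to walk the directory with `builtins.readDir`. The following is a minimal sketch under stated assumptions, not the actual Nixpkgs implementation: the function's arguments `callPackage` and `byNameDir` are hypothetical, and the directory is assumed to already satisfy the CI checks (only shard directories containing package directories).

```nix
# Hypothetical helper. `byNameDir` is assumed to be a path such as
# ./pkgs/by-name, and `callPackage` is assumed to be pkgs.callPackage.
{ callPackage, byNameDir }:
let
  # Attribute set for one shard: package name -> called package.
  packagesInShard = shard:
    builtins.mapAttrs
      (name: _type: callPackage (byNameDir + "/${shard}/${name}/package.nix") { })
      (builtins.readDir (byNameDir + "/${shard}"));

  # Every entry of byNameDir is assumed to be a shard directory.
  shards = builtins.attrNames (builtins.readDir byNameDir);
in
# Merge the per-shard attribute sets into a single one.
builtins.foldl' (acc: shard: acc // packagesInShard shard) { } shards
```

An overriding definition in `all-packages.nix` would then simply take precedence over the automatically generated attribute of the same name.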
Check the following using CI for each package directory:
- `pkgs.${name}` is defined as above, either automatically or with some `args` in `pkgs/top-level/all-packages.nix`.
- `pkgs.${name}` is a derivation.
- The `package.nix` file evaluated from `pkgs.${name}` must not access files outside its package directory.
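The second check could look roughly like the sketch below. It assumes that `<nixpkgs>` points at the checkout under test, and `definesDerivation` is a hypothetical helper name. How the real CI implements these checks is not specified here.

```nix
let
  # Assumption: <nixpkgs> resolves to the Nixpkgs checkout being checked.
  pkgs = import <nixpkgs> { };
  inherit (pkgs) lib;

  # Hypothetical helper: does the top-level attribute `name` hold a derivation?
  definesDerivation = name: lib.isDerivation pkgs.${name};
in
{
  # A regular package such as hello should pass this check.
  hello = definesDerivation "hello";
  # A fetcher such as fetchFromGitHub is a function, not a derivation,
  # so it should not pass.
  fetchFromGitHub = definesDerivation "fetchFromGitHub";
}
```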
Automatically migrate all satisfying definitions `pkgs.${name}` to the new directory structure, meaning derivations defined as above using `callPackage`.
However automatic migration is only possible if:
- Files don't need to be changed, only moved, with the exception of `pkgs/top-level/all-packages.nix`
- The Nixpkgs package evaluation result does not change
All satisfying definitions that can't be automatically migrated due to the above restrictions will be added to a CI exclusion list. CI is added to ensure that all satisfying definitions except the CI exclusion list must be using the new directory structure. This means that the new directory structure becomes mandatory for new satisfying definitions after this PR. The CI exclusion list should be removed eventually once the non-automatically-migratable satisfying definitions have been manually migrated. Only in very limited circumstances is it allowed to add new entries to the CI exclusion list.
Non-automatic updates may also be done to ensure further correctness, such as:
- GitHub's CODEOWNERS
- Update scripts
- The Nixpkgs manual
This PR will cause merge conflicts with all existing PRs that modify moved files; however, they can trivially be rebased using `git rebase && git push -f`.
Because of this, merging of this PR should be widely announced with a pinned issue on the Nixpkgs issue tracker and a Discourse post.
Additionally this PR can benefit from being merged after a release due to the decreased PR count, leading to fewer conflicts.
To add a new package `pkgs.foobar` to Nixpkgs, one only needs to create the file `pkgs/by-name/fo/foobar/package.nix`.
No need to find an appropriate category nor to modify `all-packages.nix` anymore.
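As an illustration, such a file might look like the following sketch. The package is hypothetical: its name, URL, version and metadata are placeholders, not taken from Nixpkgs.

```nix
# pkgs/by-name/fo/foobar/package.nix
# Hypothetical package: the name, URL and metadata are placeholders.
{ lib, stdenv, fetchurl }:

stdenv.mkDerivation (finalAttrs: {
  pname = "foobar";
  version = "1.0.0";

  src = fetchurl {
    url = "https://example.org/foobar-${finalAttrs.version}.tar.gz";
    # Placeholder; replace with the real hash reported by the first build.
    hash = lib.fakeHash;
  };

  meta = {
    description = "Example package used to illustrate the directory layout";
    license = lib.licenses.mit;
  };
})
```

With this file in place, `nix-build -A foobar` from the Nixpkgs root would be expected to build it without any further edits.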
With some packages, the `pkgs/by-name` directory may look like this:

    pkgs
    └── by-name
       ├── _0
       │  ├── _0verkill
       │  └── _0x
       ┊
       ├── ch
       │  ├── ChowPhaser
       │  ├── CHOWTapeModel
       │  ├── chroma
       │  ┊
       ┊
       ├── t
       │  └── t
       ┊

The sharded structure leads to a distribution as follows:
- There are 17305 total non-alias top-level attribute names in Nixpkgs revision 6948ef4deff7
- These are split into 726 shards
- The top three shards are:
  - "li": 1092 values, coming from the common `lib` prefix
  - "op": 260 values
  - "co": 252 values
- There's only a single directory with over 1 000 entries, which is notably GitHub's display limit, so this means only 92 attributes would be harder to see on GitHub

These stats would be similar for other package sets if this directory structure were to be adopted for them in the future.
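A distribution like this can be computed by grouping attribute names by their shard. The sketch below uses a tiny hand-written list rather than the real Nixpkgs attribute set. `libfoo` and `libbar` are hypothetical names, and `<nixpkgs/lib>` is assumed to be available.

```nix
let
  # Assumption: <nixpkgs> is on NIX_PATH; only lib functions are used.
  lib = import <nixpkgs/lib>;

  # Small sample list; the real computation would use all top-level names.
  names = [ "hello" "libfoo" "libbar" "CHOWTapeModel" "chroma" "t" ];

  shardOf = name: lib.toLower (builtins.substring 0 2 name);
in
# Number of names per shard.
builtins.mapAttrs (_shard: members: builtins.length members) (lib.groupBy shardOf names)
# evaluates to { ch = 2; he = 1; li = 2; t = 1; }
```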
Due to the limitations of the new directory structure, only a limited set of top-level attributes can be automatically migrated:
- No attributes that aren't derivations like `pkgs.fetchFromGitHub` or `pkgs.python3Packages`
- No attributes defined using non-`pkgs.callPackage` functions like `pkgs.python3Packages.callPackage` or `pkgs.haskellPackages.callPackage`. In the future we might consider having a separate namespace for such definitions.

Concretely this can be computed to be 81.2% (14036) of the 17280 total non-alias top-level Nixpkgs attributes in revision 6948ef4deff7.
And the initial automatic migration will be a bit more limited due to the additional constraints:
- No attributes that share common files with other attributes like `pkgs.readline`
- No attributes that reference files from other packages like `pkgs.gettext`

These attributes will need to be moved to the new directory structure manually with some arguably-needed refactoring to improve reusability of common files.
- `nix edit` and search.nixos.org will automatically point to the new location without problems, since they rely on `meta.position` to get the file to edit, which still works.
- Backporting changes to moved files won't be problematic
- `git blame` locally and on GitHub is unaffected, since it follows file moves properly.
A commonly recommended way of building current package directories in Nixpkgs is to use `nix-build --expr 'with import <nixpkgs> {}; callPackage pkgs/applications/misc/hello {}'`.
Since the path changes and `package.nix` is now used, this becomes `nix-build --expr 'with import <nixpkgs> {}; callPackage pkgs/by-name/he/hello/package.nix {}'`, which is harder for users.
However, calling a path like this is an anti-pattern anyway, because it doesn't use the correct Nixpkgs version and it doesn't use the correct argument overrides.
The correct way of doing it was to add the package to `all-packages.nix`, then call `nix-build -A hello`.
This `nix-build --expr` workaround is partially motivated by the difficulty of knowing the mapping from attributes to package paths, which is what this RFC improves upon.
By teaching users that `pkgs/by-name/<shard>/<name>` corresponds to `nix-build -A <name>`, the need for such `nix-build --expr` workarounds should disappear.
While this RFC allows passing custom arguments, doing so means that `all-packages.nix` will have to be maintained for that package.
In specific cases where attributes of custom arguments are of the form `name = value` and `name` isn't a package attribute, they can be avoided without breaking the API.
To do so, ensure that the function in the called file has `value` as an argument and set the default of the `name` argument to `value`.
This notably doesn't work when `name` is already a package attribute or when such a package is added later, because then the default is never used and instead overridden.
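The default-argument technique can be illustrated with a self-contained sketch. Everything here is a stand-in: `fakePkgs` replaces the real package set, `sslLibrary` plays the role of `name` and `openssl_1_1` the role of `value`, and `<nixpkgs/lib>` is assumed to be available for `lib.callPackageWith`.

```nix
let
  # Assumption: <nixpkgs> is on NIX_PATH; only lib.callPackageWith is used.
  lib = import <nixpkgs/lib>;

  # Stand-in package set; real Nixpkgs would provide actual derivations.
  fakePkgs = { openssl_1_1 = "openssl-1.1"; };
  callPackage = lib.callPackageWith fakePkgs;

  # Would live in package.nix. Previously all-packages.nix passed
  # `{ sslLibrary = openssl_1_1; }`; now `openssl_1_1` is taken as an
  # argument and used as the default of `sslLibrary`.
  packageFn = { openssl_1_1, sslLibrary ? openssl_1_1 }: { inherit sslLibrary; };
in
# No custom arguments are needed anymore; evaluates to "openssl-1.1".
(callPackage packageFn { }).sslLibrary
```

If `fakePkgs` later gained an `sslLibrary` attribute, `callPackage` would pass it automatically and the default would no longer apply, which is the limitation described above.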
Sometimes there's a need to create a variant of a package with different `callPackage` arguments. This can be achieved using `.override` as follows:

    {
      graphviz_nox = graphviz.override { withXorg = false; };
    }

However this can cause problems with an overlay that tries to make the variant the default as follows:

    self: super: {
      # Oops, infinite recursion!
      graphviz = self.graphviz_nox;
    }

Because of this, there's the pattern of duplicating the `callPackage` call with the custom arguments as such:

    {
      graphviz_nox = callPackage ../tools/graphics/graphviz { withXorg = false; };
    }

The semantics of how package directories are checked by CI do allow the definition of package variants from package directories:

    {
      graphviz_nox = callPackage ../by-name/gr/graphviz/package.nix { withXorg = false; };
    }

- This directory structure can only be used for top-level packages using `callPackage`, so not for e.g. `python3Packages.requests` or a package defined using `haskellPackages.callPackage`
- It's not possible anymore to be a GitHub code owner of category directories.
- The existing categorization of packages gets lost. Counter-arguments:
  - It was never that useful to begin with.
    - The categorization was always incomplete, because packages defined in the language package sets often don't get their own categorized file path.
    - It was an inconvenient user interface, requiring a checkout or browsing through GitHub
    - Many packages fit multiple categories, leading to multiple locations to search through instead of one
  - There are other, better ways of discovering similar packages, e.g. Repology
- This breaks `builtins.unsafeGetAttrPos "hello" pkgs`. Counter-arguments:
  - We have to draw a line as to what constitutes the public interface of Nixpkgs. We have decided that making attribute position information part of that is not productive. For context, this information is already accepted to be unreliable at the language level, noting the `unsafe` part of the name.
  - Support for this could be added to Nix (make `builtins.readDir` propagate file as a position)
Context: this directory contains the shards, which contain the package directories. We could move the shards to a different location.
Alternatives:
- Use `by-name` in the root directory instead
  - (+) This is future proof in case we want to make the directory structure more general purpose
    - (-) We don't yet know if we want that, so this is out of scope for now
- Use `pkgs` instead, so that the `${shard}`s are siblings to the other current directories in `pkgs` such as `top-level`, with the intention that the other directories would hopefully be removed at some point, then only leaving the shards in `pkgs`
  - (+) If we remove the other directories at some point, only the `${shard}`s will be left in `pkgs`
  - (-) This leads to ambiguities between the directories from the new directory structure and the other directories, requiring special handling in the code and CI, leading to complexities.
  - (-) This makes it hard to pick out the few non-shard directories in directory listings since they will be interleaved with the ~700 shards.
  - (-) This would be harder to document and explain to people, since one always has to disregard all non-sharded directories, with no obvious justification
  - (-) Currently we cannot apply this directory structure to all definitions in `pkgs`, in particular nested packages like `pythonPackages.*`, non-`callPackage`'d definitions like `copyDesktopItems` and non-derivations like `fetchFromGitHub`. Depending on how we want to handle those, it might make more sense to keep `pkgs/by-name` or to use `pkgs` directly once all legacy paths are migrated away to another top-level directory; we don't yet know. `pkgs/by-name` will be easier to migrate to `pkgs` than the other way around though.
  - (-) Causes poor auto-completion for the existing directories
- A variation of the above that improves on this is altering the shards to be prefixed with `_` so that they're always ordered together and not interleaved with non-shards. Non-shards would still be at the bottom of file listings though, but at least together. It shares the same other problems however.
- `pkgs/unit`: This was the name initially used by the RFC until `by-name` was proposed and favored.
  - (+) It's not associated with any pre-existing assumptions about what it means, which should cause people unfamiliar with this directory structure to read the documentation.
    - (-) This is however also a disadvantage: the name doesn't tell people anything about what it does
  - (-) Systemd also has the term "unit", which could be confused with this
  - (+) It makes sense to view package directories as units, because they are discrete entities distinct from other entities of the same type
  - (+) We envision that in the future we could extend the directory structure to not just include a package definition for each directory, but also other parts such as NixOS modules, library components, tests, etc. In this case `unit` would fit even better and could be described as _A collection of standardized files related to the same software component_
- Various other proposals: `pkgs/auto`, `pkgs/pkg`, `pkgs/mod`, `pkgs/component`, `pkgs/part`, `pkgs/comp`, `pkgs/app`, `pkgs/simple`, `pkgs/default`, `pkgs/shards`, `pkgs/top`, `pkgs/main`
  - (-) Generally all of these names have some pre-existing assumptions about them, causing potential confusion when used for this concept
  - `pkgs/default`: Could be interpreted to be some Nix-builtin magic that defaults to that folder. Could also be interpreted as "this is where the default packages go", which then raises the question "which packages are part of the default ones?"
  - `pkgs/shards`: The sharding is a self-evident implementation detail, it shouldn't be repeated
  - `pkgs/simple`: Implies that there's a complicated way to declare packages, which there currently is, but it's something we should get away from. If we migrate everything, simple wouldn't mean anything anymore.
  - `pkgs/top`: Easily confusable with `pkgs/top-level`, though `top` would make sense otherwise if we eventually moved all top-level packages to there.
    - We could consider moving `pkgs/top-level` to another location then, e.g. `pkgs/package-sets`.
  - `pkgs/main`: "If these are the main packages, where do the others go? What even is a main package?". Also could be confused with an entry-point
- `packages/${shard}`
  - (+) Provides a clean starting point without having to be close to the legacy structure
  - (-) This would be very confusing to newcomers because there's now both a `pkgs` and a `packages` directory in the Nixpkgs root, both named alike but with very different contents.
- `pkgs/_`
  - (+) Very short, fast to type (though that can depend on the keyboard layout)
  - (+) Avoids naming discussions, because there is no name
    - (-) Naming things is hard, but we shouldn't avoid the problem by giving it no name, which is arguably the worst name
  - (-) Looks hacky and internal
  - (+) Looks temporary, intention to move to `pkgs` itself once everything is sharded
    - (-) It shouldn't be temporary. While we do hope to migrate all packages to some sharded form at some point, this may never happen, or the direction is completely changed, and this may take years to form.
Context: The structure is `pkgs/by-name/${shard}/${name}` with `shard` being the lowercased two-letter prefix of `name`.
Alternatives:
- A flat directory, where `pkgs.hello` would be in `pkgs/by-name/hello`.
  - (+) Simpler for the user and code.
  - (-) The GitHub web interface only renders the first 1 000 entries when browsing directories, which would make most packages inaccessible in this way.
    - (+) This feature is not used often.
      - (-) A poll showed that about 41% of people rely on this feature every week.
  - (-) Bad because it makes `git` and file listings slower.
- Use three-letter or four-letter prefixes.
- (-) Also leads to directories containing more than 1 000 entries, see above.
- Use multi-level structure, e.g. a two-level two-letter prefix structure where `hello` is in `pkgs/by-name/he/ll/hello`
  - (+) This would allow a virtually unlimited number of packages without performance problems
  - (-) It's hard to understand, type and implement, needs a special case for packages with few characters
    - E.g. `x` could go in `pkgs/by-name/x-/--/x`
  - (-) There aren't enough packages even in Nixpkgs that a two-level 4-letter structure would make sense. Most of the structure would only be filled by a couple of entries.
  - (-) Even Git only uses 2-letter prefixes for its object hex hashes
- Use two-letter prefixes split into two directories, like `pkgs/by-name/h/e/hello`
  - (+) Allows easy traversal by clicking on GitHub file listings, shard directories being limited to under 40 children
  - (-) Requires special-casing single-letter attribute names
    - (+) There are currently only 6 such cases, which could be handled on a one-off basis
  - (-) Makes auto-completion worse, having to tab-complete once more
  - (-) Makes it harder to create shards: if a shard doesn't exist yet, it has to be created with either one or two `mkdir`s, or a `mkdir -p`
- Use a dynamic structure where directories are rebalanced when they have too many entries.
  E.g. `pkgs.foobar` could be in `pkgs/by-name/f/foobar` initially. But when there are more than 1 000 packages starting with `f`, all packages starting with `f` are distributed under 2-letter prefixes, moving `foobar` to `pkgs/by-name/fo/foobar`.
  - (-) The structure then depends not only on the name of the package, making it harder to find packages again and figure out where they should go
  - (-) Complex to implement
Context: The only file that has to exist in package directories is `package.nix`; it must contain a function suitable for `callPackage`.
Alternatives:
- `default.nix`
  - (+) `default.nix` is already a convention most people are used to.
  - (-) We don't benefit from the usual `default.nix` benefits:
    - Removing the need to specify the file name in expressions, but this does not apply because this file will be imported automatically by the code that replaces definitions from `all-packages.nix`.
      - (+) But there's still some support for `all-packages.nix` for custom arguments, which requires people to type out the name
        - (-) This is hopefully only temporary, in the future we should fully get rid of `all-packages.nix`
    - Removing the need to specify the file name on the command line, but this does not apply because a package function must be imported into an expression before it can be used, making `nix build -f pkgs/by-name/he/hello` equally broken regardless of file name.
  - (-) Not using `default.nix` frees up `default.nix` for an expression that is actually buildable, e.g. `(import ../.. {}).hello`, although we don't yet have a use case for this that isn't covered by `nix-build ../.. -A <attrname>`.
  - (-) Using `default.nix` would tempt users to invoke `nix-build .`, which wouldn't work, and making package functions auto-callable is a known anti-pattern.
- `pkg-fun[c].nix`
  - (+) Makes a potential transition to a non-function form of packages in the future easier.
    - (-) There's no problem with introducing versioning later with different filenames.
    - (-) We don't even know if we actually want to have a non-function form of packages.
  - (-) Abbreviations are a bit jarring.
Context: The migration moves files around without providing any backwards compatibility for those moved paths.
Alternative:
- Have a backwards-compatibility layer for moved paths, such as a symlink pointing from the old to the new location, or for Nix files even a `builtins.trace "deprecated" (import ../new/path)`.
  - (-) It would give precedent to file paths being a stable API interface, which definitely shouldn't be the case (bar some exceptions).
  - (-) Leads to worse merge conflicts as the transition is happening, since Git would have to resolve a merge conflict between a symlink and a changed file.
Context: It's possible to override the default `{ }` argument to `callPackage` by manually specifying the full definition in `all-packages.nix`.
The alternative is to not allow that, requiring that `pkgs.${name}` corresponds directly to `callPackage pkgs/by-name/${shard}/${name}/package.nix { }`.
- (-) It's harder to explain to beginners whether their package can use the new directory structure or not
- (+) The direct correspondence ensures that the package directory contains all information about the package, which is very intuitive
  - (-) We're not at the point where we can have that though, custom arguments don't have a good replacement yet
- (-) If a package previously didn't need custom arguments, it would be moved to the new directory structure. But when the need for a custom argument arises, it then requires moving it out of the new directory structure and into the freeform structure of `pkgs/` again.
- (+) It's easier to relax restrictions than to impose new ones
Context: There's a requirement to check that package directories can't access paths outside themselves.
Alternatives:
- Don't have this requirement
- (-) Doesn't discourage the use of file paths as an API.
- (-) Makes further migrations to different file structures harder.
- Make the requirement also apply the other way around: Files outside the package directory cannot access files inside it, with `package.nix` being the only exception, and only for the one attribute in `all-packages.nix`
  - (-) Enforcing this requires a global view of Nixpkgs, which is nasty to implement
  - (-) Package variants would not be possible to define
Context: Custom `callPackage` arguments have to be added to `all-packages.nix`
Alternative: Expand the auto-calling logic according to: Package directories are automatically discovered and transformed to a definition of the form

    # If args.nix doesn't exist
    pkgs.${name} = pkgs.callPackage ${packageDir}/package.nix {};
    # If args.nix does exist
    pkgs.${name} = pkgs.callPackage ${packageDir}/package.nix (import ${packageDir}/args.nix pkgs);

- (+) It makes another class of packages uniform, by picking a solution with restricted expressive power.
- (-) It does not solve the contributor experience problem of having too many rules.
- (-) `args.nix` is another pattern that contributors need to learn how to use, as we have seen that it is not immediately obvious to everyone how it works.
  - (+) A CI check can mitigate the possible lack of uniformity, and we see a simple implementation strategy for it.
- (-) Complicates the directory structure with an optional file
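For illustration only, the `args.nix` alternative could be expressed as a function like the one below. This is a sketch of the rejected alternative, not something this RFC proposes; the function shape and the `packageDir` parameter are assumptions.

```nix
# Hypothetical helper for the args.nix alternative.
# `pkgs` is the package set, `packageDir` a path to one package directory.
{ pkgs }:
packageDir:
let
  argsFile = packageDir + "/args.nix";

  # Use the arguments from args.nix if that file exists, otherwise none.
  args = if builtins.pathExists argsFile then import argsFile pkgs else { };
in
pkgs.callPackage (packageDir + "/package.nix") args
```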
All of these questions are in scope to be addressed in future discussions in the Nixpkgs Architecture Team:
- Expose an API to get access to the package functions directly, without calling them
- Add a meta tagging or categorization system to packages as a replacement for the package categories. Maybe `meta.tags` with `search.nixos.org` integration. Maybe https://repology.org/ integration. See also #146.
- Making the filetree more human-friendly by grouping files together by "topic" rather than technical delineations. For instance, having a package definition, changelog, package-specific config generator and perhaps even NixOS module in one directory makes work on the package in a broad sense easier.
- This RFC only addresses the top-level attribute namespace, aka packages in `pkgs.<name>`; it doesn't do anything about package sets like `pkgs.python3Packages.<name>` or `pkgs.haskell.packages.ghc942.<name>`, which may or may not also benefit from a similar auto-calling
- Improve the semantics of `callPackage` and/or apply a better solution, such as a module-like solution
- Potentially establish an updateScript standard to avoid problems (relates to Flakes too)
- What to do with different versions, e.g. `wlroots = wlroots_0_14`? This goes into version resolution, a different problem to fix
- What to do about e.g. `python3Packages.callPackage`? This goes into overrides, a different problem to fix
- What about aliases like `jami-daemon = jami.jami-daemon`?
- What about `recurseIntoAttrs`? Not single packages, package sets, another problem