Switch store derivations from using ATerm to using JSON or some other mainstream format #5481

Open
catern opened this issue Nov 3, 2021 · 10 comments

Comments

@catern
Contributor

catern commented Nov 3, 2021

Is your feature request related to a problem? Please describe.

Store derivations are currently in the ATerm format, which is at this point only used by Nix. Since ATerm is specific to Nix, Nix has its own pretty printer and other tools to support ATerm.

Describe the solution you'd like

Nix should represent store derivations as JSON or s-expressions (or some other common format) instead.

Backwards-compatibility may be tricky, but some special casing to detect ATerm vs JSON seems like it should work.
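As a rough illustration of that special casing, here is a minimal Python sketch. It assumes an ATerm derivation always begins with `Derive(` and that a JSON one would begin with `{` or `[`; the function name `detect_drv_format` is made up for this example and is not part of Nix.

```python
import json


def detect_drv_format(text: str) -> str:
    """Guess how a store derivation is serialized from its first characters.

    Hypothetical helper: assumes ATerm derivations start with "Derive(" and
    JSON derivations start with "{" or "[".
    """
    head = text.lstrip()
    if head.startswith("Derive("):
        return "aterm"
    if head.startswith(("{", "[")):
        json.loads(text)  # raises ValueError if the text is not valid JSON
        return "json"
    raise ValueError("unrecognised derivation format")


print(detect_drv_format('Derive([],[],[],"builtin","builtin:unpack-channel",[],[])'))
print(detect_drv_format('{"name": "nixos"}'))
```

Because `Derive(` can never be the start of a JSON document, the two formats cannot be confused by this check.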

Describe alternatives you've considered

Some more radical rework of store derivations could also get rid of ATerm, but it's much easier to just switch to JSON.

We could embrace ATerm further; probably the first step would be to publish an actual publicly-accessible spec for the ATerm ASCII format that we use. It's unlikely anyone else will use ATerm, though.

@catern
Contributor Author

catern commented Nov 3, 2021

@puckipedia on Libera #nixos has mentioned that changing the format of store derivations will cause all Nix derivations to hash differently, and thus all derivations (even old, already-realised ones) will need to be rebuilt (except for fixed-output derivations).
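To make the hashing concern concrete, here is a small Python sketch. It is a simplification and not how Nix actually computes store paths: it only hashes the serialized text with SHA-256, and the derivation content and the path `/nix/store/example-out` are invented for the example.

```python
import hashlib
import json

# The same (abbreviated, made-up) derivation content in two serializations.
aterm_text = (
    'Derive([("out","/nix/store/example-out","","")],[],[],'
    '"builtin","builtin:unpack-channel",[],[("name","nixos")])'
)
json_text = json.dumps(
    [
        [["out", "/nix/store/example-out", "", ""]],
        [],
        [],
        "builtin",
        "builtin:unpack-channel",
        [],
        [["name", "nixos"]],
    ],
    separators=(",", ":"),
)

# A hash over the serialized bytes changes as soon as the bytes change,
# even though the logical content is identical.
aterm_digest = hashlib.sha256(aterm_text.encode()).hexdigest()
json_digest = hashlib.sha256(json_text.encode()).hexdigest()
print(aterm_digest == json_digest)  # prints False
```

Anything whose identity depends on such a hash would therefore get a new identity after a format switch, which is why a rebuild would follow.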

One idea for working around that is to add a derivation attribute, say "__store_drv_format", which can be set to "json" to opt-in to the new format. Then some version of Nixpkgs could just start setting that attribute by default.

We might also talk to Guix people, because I think they've planned in the past to replace ATerm with s-exps, and maybe they've thought up a clever migration approach.

@stale

stale bot commented May 2, 2022

I marked this as stale due to inactivity. → More info

@stale stale bot added the stale label May 2, 2022
@lambdadog

I'd very much like to see this. ATerm is an overly obscure format that, notably, doesn't even have any {de,}serialization libraries packaged in nixpkgs itself, at least as far as I've been able to find!

It makes .drv files overly opaque for a text format, and frankly it's hard to find information on the format even with Google.

@stale stale bot removed the stale label May 17, 2022
@flokli
Contributor

flokli commented Jun 6, 2022

Note that there is the nix show-derivation command, which produces JSON output.

I doubt ATerm will go away any time soon, especially considering it's deeply baked into how all the hashing methods work.

You can find some documentation about the ATerm format here: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.35.2195

On the other hand, go-nix recently added a parser for Derivations. I hope you find it useful, and as always, contributions welcome :-)

@toraritte
Contributor

What are the problems with ATerms? Does it have any technical flaws, such as it hinders the addition of certain features, makes processing slow/more expensive, etc.? I'm honestly curious.

One issue I can name right off the bat is that its use by Nix is completely undocumented (except for Dolstra's PhD thesis).

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/why-was-aterms-chosen-for-the-format-of-store-derivations-instead-of-asn-1/27762/1

@lambdadog

> What are the problems with ATerms? Does it have any technical flaws, such as it hinders the addition of certain features, makes processing slow/more expensive, etc.? I'm honestly curious.

As far as actual problems with Nix as an opaque tool, I can't say that there are any I'm aware of. That said, it makes it necessary to create libraries such as the Haskell nix-derivation, or to rely on ATerm libraries (often poorly maintained, if they even exist), rather than simply using a more mainstream format's parsing library when creating tools that interoperate with Nix.

I suspect the author of nix-derivation may not even be aware the ATerm format is what Nix is using. The benefits would be being able to write tools such as nix-diff without having to jump through hoops because Nix uses an obscure serialization format that doesn't even come up when you search its name on Google, assuming you can even find its name in the first place since you'll have to read Dolstra's PhD thesis to find it.

I'm all for innovation in serialization formats, but Nix's usage of ATerm is at best novel, not innovative, given that it stifles creation of tools that interact with derivations and provides no benefit to Nix itself.

That said, it may be more painful than it is valuable to simply switch at this point. If nothing else it would create a lengthy "upgrade" process and would invalidate all hashes unless (likely slow) workarounds were created. Perhaps for a couple of versions Nix could use both ATerm hashes and (e.g.) JSON hashes, preferring JSON hashes for new derivations but checking for ATerm ones in caches first. Then, once a notable portion of the derivations in Nixpkgs were hashed using JSON, ATerm support could be dropped and the remaining ATerm hashes invalidated.

It would be a slowdown (as conversion to ATerm would be required for every derivation hashing), but it may be worth it to use a format that's not unnecessarily obscure.
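A rough Python sketch of that transition scheme, under heavy assumptions: the cache is a plain dictionary keyed by a SHA-256 digest of the serialized derivation, and `find_cached`, `store_result`, and the two serializer arguments are all hypothetical names, not anything Nix provides.

```python
import hashlib
import json
from typing import Callable, Dict, Optional

Serializer = Callable[[dict], str]


def digest(text: str) -> str:
    """Hash the serialized derivation text."""
    return hashlib.sha256(text.encode()).hexdigest()


def find_cached(drv: dict, cache: Dict[str, str],
                to_aterm: Serializer, to_json: Serializer) -> Optional[str]:
    """Look up a build result, trying the legacy ATerm hash before the JSON hash."""
    for serialize in (to_aterm, to_json):
        hit = cache.get(digest(serialize(drv)))
        if hit is not None:
            return hit
    return None


def store_result(drv: dict, cache: Dict[str, str],
                 to_json: Serializer, result: str) -> None:
    """Record new results under the JSON hash only."""
    cache[digest(to_json(drv))] = result


# Toy serializers standing in for the real ones.
def toy_aterm(drv: dict) -> str:
    return 'Derive("' + drv["name"] + '")'


def toy_json(drv: dict) -> str:
    return json.dumps(drv, sort_keys=True)


old = {"name": "nixos"}
cache = {digest(toy_aterm(old)): "/nix/store/example-old"}  # made-up path
print(find_cached(old, cache, toy_aterm, toy_json))
```

The extra cost mentioned above shows up here as the second serialization and hash on every lookup that misses the ATerm key.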

@lambdadog

lambdadog commented May 2, 2023

As @flokli mentioned, nix show-derivation can be used. However, a tool like nix-diff, which iterates through derivation inputs recursively, would have to shell out for every single derivation it encounters. I suspect that is why the nix-derivation library was created in the first place: to avoid such a heavy performance penalty.

@l0b0
Contributor

l0b0 commented Feb 20, 2024

Take as an example the smallest .drv file on my system, /nix/store/y1k8vmb26nwhlir3c5zzwl5mdzbr1nwy-nixos.drv. If I manually pretty-print this file, it looks like this:

Derive(
  [
    (
      "out",
      "/nix/store/rjw7gkfmwc3cs63cky7hv04nimssz26d-nixos",
      "",
      ""
    )
  ],
  [],
  [
    "/nix/store/xv5kn3sxwi38qbnnhlrzqx2lzkxrk5c3-nixexprs.tar.xz"
  ],
  "builtin",
  "builtin:unpack-channel",
  [],
  [
    (
      "builder",
      "builtin:unpack-channel"
    ),
    (
      "channelName",
      "nixos"
    ),
    (
      "name",
      "nixos"
    ),
    (
      "out",
      "/nix/store/rjw7gkfmwc3cs63cky7hv04nimssz26d-nixos"
    ),
    (
      "preferLocalBuild",
      "1"
    ),
    (
      "src",
      "/nix/store/xv5kn3sxwi38qbnnhlrzqx2lzkxrk5c3-nixexprs.tar.xz"
    ),
    (
      "system",
      "builtin"
    )
  ]
)

It looks like the only thing necessary to make this a JSON file is to remove the Derive keyword and change the "tuples" into lists:

[
  [
    [
      "out",
      "/nix/store/rjw7gkfmwc3cs63cky7hv04nimssz26d-nixos",
      "",
      ""
    ]
  ],
  [],
  [
    "/nix/store/xv5kn3sxwi38qbnnhlrzqx2lzkxrk5c3-nixexprs.tar.xz"
  ],
  "builtin",
  "builtin:unpack-channel",
  [],
  [
    [
      "builder",
      "builtin:unpack-channel"
    ],
    [
      "channelName",
      "nixos"
    ],
    [
      "name",
      "nixos"
    ],
    [
      "out",
      "/nix/store/rjw7gkfmwc3cs63cky7hv04nimssz26d-nixos"
    ],
    [
      "preferLocalBuild",
      "1"
    ],
    [
      "src",
      "/nix/store/xv5kn3sxwi38qbnnhlrzqx2lzkxrk5c3-nixexprs.tar.xz"
    ],
    [
      "system",
      "builtin"
    ]
  ]
]
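The transformation described above can be sketched in a few lines of Python. This is only an illustration, not a full ATerm parser: it assumes the input is a single `Derive(...)` term, that parentheses only need rewriting outside string literals, and that every escape sequence used inside the strings is also valid in JSON. The function name `aterm_to_json_text` and the sample paths are invented for the example.

```python
import json


def aterm_to_json_text(aterm: str) -> str:
    """Rewrite ATerm derivation text as JSON text (sketch, not a real parser)."""
    prefix = "Derive("
    body = aterm.strip()
    if not body.startswith(prefix) or not body.endswith(")"):
        raise ValueError("not a Derive(...) term")
    # Drop the constructor name and turn its argument list into a JSON array.
    body = "[" + body[len(prefix):-1] + "]"

    out = []
    in_string = False
    escaped = False
    for ch in body:
        if in_string:
            # Inside a string literal: copy characters untouched.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "(":
            ch = "["  # tuple opener becomes list opener
        elif ch == ")":
            ch = "]"  # tuple closer becomes list closer
        out.append(ch)
    return "".join(out)


sample = (
    'Derive([("out","/nix/store/example-nixos","","")],[],'
    '["/nix/store/example (1).tar.xz"],"builtin","builtin:unpack-channel",'
    '[],[("name","nixos")])'
)
parsed = json.loads(aterm_to_json_text(sample))
print(parsed[0])
```

Tracking whether the scanner is inside a string matters because a store path or environment value may itself contain parentheses, as the made-up `example (1).tar.xz` path does.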

This brings up a few questions:

Is there any other syntax which needs to be supported?

Are ATerm parenthesis-delimited "tuples" meaningfully different from square bracket-delimited "lists"?

Would the Nix language itself be a good substitute for ATerm? It seems like a natural choice, basically "flattening" the Nix expressions into only static values. Based on nix derivation show /nix/store/y1k8vmb26nwhlir3c5zzwl5mdzbr1nwy-nixos.drv:

{
  args = [];
  builder = "builtin:unpack-channel";
  env = {
    builder = "builtin:unpack-channel";
    channelName = "nixos";
    name = "nixos";
    out = "/nix/store/rjw7gkfmwc3cs63cky7hv04nimssz26d-nixos";
    preferLocalBuild = "1";
    src = "/nix/store/xv5kn3sxwi38qbnnhlrzqx2lzkxrk5c3-nixexprs.tar.xz";
    system = "builtin";
  };
  inputDrvs = {};
  inputSrcs = [
    "/nix/store/xv5kn3sxwi38qbnnhlrzqx2lzkxrk5c3-nixexprs.tar.xz"
  ];
  name = "nixos";
  outputs = {
    out = {
      path = "/nix/store/rjw7gkfmwc3cs63cky7hv04nimssz26d-nixos";
    };
  };
  system = "builtin";
}

This seems pretty nice. One less language to worry about, the result is pretty much self-documenting, and we get other advantages of Nixlang such as comments.

@theoparis

theoparis commented Aug 11, 2024

I agree with @l0b0. It would be nice to use Nix as the format for store derivations.

I'm writing a Nix derivation builder in Rust and I want to be able to build existing .drv files. I tried to modify Nix locally to use libexpr, but I ended up with a recursive Meson dependency (libstore -> libexpr -> libstore).
I'll probably just use JSON for the time being, since I don't want to bother with parsing a separate ATerm format that isn't widely used.

8 participants