make it possible to add s3 tag to nars when uploading to s3, to allow different retention policy for debug symbols #8080

Open
wants to merge 2 commits into
base: master
from

Conversation

symphorien
Member

This allows setting a different retention policy for debug symbols in S3-backed binary caches.

Motivation

We currently have debug symbols for very few libraries in NixOS. Recompiling everything to get debug symbols is non-trivial and time consuming.
The main objection (NixOS/nixpkgs#18530) to enabling debug symbols on a large scale is that they are heavy (on my system, about 50% larger on average than the original libraries), so storage costs for the official binary cache would be unacceptable. One possible idea is to keep debug symbols for a shorter period than other packages (which are currently stored forever), for example 7 months (the support duration of a release).
This implies being able to know which nars to remove 7 months from now.

.narinfo files contain the name of the store path. A naive approach could be to download all narinfo, read them for paths ending in -debug, and send DELETE queries to amazon but this will take forever and cost many requests.

A better approach is to rely on the ability of s3 to selectively expire objects after a delay

  • by directory (prefix)
  • by tag

By prefix: it would be possible to move debug related nars to another directory instead of nar/, and point narinfo files to it. Although it's undocumented this breaks the format of binary caches and I fear may have unsuspected consequences. For this reason I tried the approach of tagging debuginfo related nars at upload time.
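To make the tag-based option concrete, here is a minimal sketch of an S3 lifecycle configuration that expires tagged objects after roughly 7 months. The tag key `hint`, the value `debug`, and the 210-day delay are illustrative assumptions, not values taken from this PR; the dictionary follows the shape that S3 lifecycle configurations use (a list of rules, each with a filter and an expiration).

```python
import json


def debug_expiry_rule(tag_key: str, tag_value: str, days: int) -> dict:
    """Build one S3 lifecycle rule that expires objects carrying a given tag."""
    return {
        "ID": f"expire-{tag_value}-after-{days}-days",
        "Status": "Enabled",
        # Only objects tagged tag_key=tag_value are matched by this rule.
        "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
        # S3 counts the delay from the object's creation time.
        "Expiration": {"Days": days},
    }


# Hypothetical tag name/value and retention period (about 7 months).
lifecycle = {"Rules": [debug_expiry_rule("hint", "debug", 210)]}
print(json.dumps(lifecycle, indent=2))
```

Untagged objects are not matched by the rule, so regular nars would keep their current "forever" retention.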

In terms of implementation, binary caches are given a hint when uploading a file, and this hint is set to debug for debug nars.
Only the s3 implementation uses the hint currently. This does not really add special casing for debug nars, as there was already an option about them (index-debug-info).

Note: in my opinion this is of limited usefulness if the people in charge of the official binary cache don't like this approach. Can someone confirm that hydra does not use multipart uploads? I did not find a way to make this work with multipart uploads.

Context

Alternatives: keep living without debug info on NixOS forever. More seriously see the discussions at NixOS/nixpkgs#18530

Checklist for maintainers

Maintainers: tick if completed or explain if not relevant

  • agreed on idea
  • agreed on implementation strategy
  • tests, as appropriate
    • functional tests - tests/**.sh
    • unit tests - src/*/tests
    • integration tests - tests/nixos/*
  • documentation in the manual
  • code and comments are self-explanatory
  • commit message explains why the change was made
  • new feature or incompatible change: updated release notes

Priorities

Add 👍 to pull requests you find important.

@grahamc
Member

grahamc commented Mar 20, 2023

This is an interesting idea. One problem I can imagine is that Nix caches hold the invariant that if a store path exists in the cache, all of its dependencies are also in that same cache.

I think the implication is that we're assuming no store path will ever depend on a debug output. However, I'm not sure that is true, which makes this a rather dangerous proposal.

It may be that we could introduce that invariant(?) but otherwise we can't really know if anything refers to the store path without a garbage collection process.
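The invariant described above can be stated as a small check. This is only a sketch: the `references` mapping and the short names standing in for full store paths are made up for illustration, and this is not how Nix itself verifies a cache.

```python
def missing_dependencies(references, present):
    """Return, for each path present in the cache, its references that are absent.

    If every present path has all of its direct references present, the cache
    is closed under dependencies; an empty result means the invariant holds.
    """
    holes = {}
    for path in sorted(present):
        absent = sorted(dep for dep in references.get(path, ()) if dep not in present)
        if absent:
            holes[path] = absent
    return holes


# Hypothetical reference graph; real entries would be full /nix/store paths.
references = {"app": ["libfoo", "libc"], "libfoo": ["libc"], "libc": []}
print(missing_dependencies(references, {"app", "libfoo", "libc"}))  # closed cache
print(missing_dependencies(references, {"app", "libc"}))            # libfoo expired
```

Running such a check needs the reference graph of the whole cache, which is why expiring objects without a garbage-collection-like pass cannot guarantee the invariant.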

@symphorien
Member Author

This invariant is certainly necessary for stores but does it exist for binary caches as well?

@grahamc
Member

grahamc commented Mar 20, 2023

Yes, the invariant is held for binary caches as well. (Which are technically implemented as a store, but that is not why.)

@symphorien
Member Author

Could you expand on which parts of nix rely on the fact that binary caches are closed under transitive dependencies? I tried to reproduce the issue myself, and failed.
Setup:
Set up a proxy that lacks nixos-16.03's ncurses, but is otherwise identical to cache.nixos.org:

{ config, pkgs, lib, ... }:
{
  services.nginx = {
    enable = true;
    virtualHosts."incomplete.binary.cache" = {
      default = true;
      locations = {
        "/".proxyPass = "https://cache.nixos.org";
        "/ihkcrpmhw7v8gss4zhdfx5zbvxpan06i.narinfo".return = "404";
        "/nar/1k1bsjrvbwfjp26civp4grsxnizmdnccx5xrq41wd881zwij6q12.nar.xz".return = "404";
      };
    };
  };
}

Then attempt to build nixos 16.03's sl:

$  sudo nix --option substituters http://127.0.0.1 --option narinfo-cache-negative-ttl 0 --option narinfo-cache-positive-ttl 0 --extra-experimental-features nix-command --extra-experimental-features flakes -v build -f channel:nixos-16.03 sl
these 2 paths will be fetched (6.79 MiB download, 31.68 MiB unpacked):
  /nix/store/gwl3ppqj4i730nhd4f50ncl5jc4n97ks-glibc-2.23
  /nix/store/whayxmjybkbbdjm82sdmypcmhnlsf6yw-sl-5.02
don't know how to build these paths:
  /nix/store/ihkcrpmhw7v8gss4zhdfx5zbvxpan06i-ncurses-5.9
copying path '/nix/store/gwl3ppqj4i730nhd4f50ncl5jc4n97ks-glibc-2.23' from 'http://127.0.0.1'...
copying path '/nix/store/39jm0yd71yjm8mxw355z7b7wixvc0hpk-sl-5.02-src' from 'http://127.0.0.1'...
copying path '/nix/store/i7hx6w6zy3bv53f2xm1r23ya8qbzn4is-bash-4.3-p42' from 'http://127.0.0.1'...
copying path '/nix/store/rm9fycfaprdr0zmkssmb5rg1g59w82lp-bzip2-1.0.6' from 'http://127.0.0.1'...
copying path '/nix/store/raza8n6f2d65njscffjaj2sgm3j0s5ys-gawk-4.1.3' from 'http://127.0.0.1'...
copying path '/nix/store/bxzwd8nb1jjdh9fcqpii4x7r4gl2s8qb-binutils-2.26' from 'http://127.0.0.1'...
copying path '/nix/store/5xbamcjwscssy80nf0gzqzaaywgrbq0q-ed-1.12' from 'http://127.0.0.1'...
copying path '/nix/store/jzm1pqc3jgwnv7x5cli8j3z3x7678z7c-attr-2.4.47' from 'http://127.0.0.1'...
copying path '/nix/store/x6x1fl9qz7ss4f3l6csxvgryyvh5gz1z-gnumake-4.1' from 'http://127.0.0.1'...
copying path '/nix/store/25lv2pv9c6nlzcxhh4kcln406rnh991q-gnused-4.2.2' from 'http://127.0.0.1'...
copying path '/nix/store/j0iqb8qz273xw7xynlx25s0zbr7y2853-gnutar-1.28' from 'http://127.0.0.1'...
copying path '/nix/store/78i3ia55aq3hbb5gjyz7mmarnb8q1xkk-gzip-1.6' from 'http://127.0.0.1'...
copying path '/nix/store/17izhykpk5zy5r6b216r4wdbq4f6zpmw-ncurses-5.9.tar.gz' from 'http://127.0.0.1'...
copying path '/nix/store/89pbrd7mgdd7dz7a0f85qb5l4i6hb8nf-patch-2.7.5' from 'http://127.0.0.1'...
copying path '/nix/store/z2i4zh64b1ki9zxis5zgf7adddw8pad3-paxctl-0.9' from 'http://127.0.0.1'...
copying path '/nix/store/6w0wrgc9yvzk0lncc6wxpdixdmqspp3f-xz-5.2.2' from 'http://127.0.0.1'...
copying path '/nix/store/mq5a5h2p9wwwbpv0i7lmjzw2a503ph22-acl-2.2.52' from 'http://127.0.0.1'...
copying path '/nix/store/slbij3ypl8w60kdyynylmyc6pfc8k4vf-zlib-1.2.8' from 'http://127.0.0.1'...
copying path '/nix/store/c7ipds48nb7sfzhb7vqp26rrllirxwxv-gcc-5.3.0' from 'http://127.0.0.1'...
copying path '/nix/store/w8vzn0lsahbd9sfh0v30x65qwq6xrpa8-coreutils-8.25' from 'http://127.0.0.1'...
copying path '/nix/store/0c80aywsq3kxifpp10mgm4a4rn7dkny1-diffutils-3.3' from 'http://127.0.0.1'...
copying path '/nix/store/l65knk24c08q0lwdcf0yyh7x6l5shhqj-findutils-4.4.2' from 'http://127.0.0.1'...
copying path '/nix/store/dpkbd1ffigsvbk9n92ihzzicld5z3p7a-pcre-8.38' from 'http://127.0.0.1'...
copying path '/nix/store/9srabrkss2srkghmss1cq2wcwpp4d3nk-patchelf-0.9' from 'http://127.0.0.1'...
copying path '/nix/store/dp6c8mcsashywfkppdzic3l1qz4n9paq-gnugrep-2.22' from 'http://127.0.0.1'...
copying path '/nix/store/m0pbxxvs7zz4ixk4sxyq9shwazpd3kwq-gcc-wrapper-5.3.0' from 'http://127.0.0.1'...
copying path '/nix/store/62h3c4d6rdnlxichixqg8h9jxi8nhxk0-stdenv' from 'http://127.0.0.1'...
building '/nix/store/rr1acqcvkb86zrnyvcyqlcgikwsvhs7i-ncurses-5.9.drv'...
copying path '/nix/store/whayxmjybkbbdjm82sdmypcmhnlsf6yw-sl-5.02' from 'http://127.0.0.1'...

As you can see it transparently rebuilds the missing path, and even respects topological order by substituting, then building, then substituting again. In this experiment at least, nix proved fully resilient to binary caches with holes.

@lheckemann
Member

I really like the idea of having the ability to fetch debug symbols from cache.nixos.org.

I'm not too keen on this implementation, because it's unnecessarily limited to serving this single use case. Allowing attaching extra metadata to store paths has a huge amount of potential for other uses, and attaching the tag based on this very specific shape of output path is an unfortunately specific mechanism.

I'd much prefer if the policy producing metadata like this were kept outside Nix itself:

  • Evaluation-time metadata --- one could for example tag paths with the nixpkgs version they were built with (this should not affect the contents of the output). I imagine it would involve the meta field on derivations, which already has semantics like this -- not affecting;
  • Build-time metadata, which are tightly coupled to the build itself. These could be placed in $output/nix-support/meta.json for example. I think this would be much more sensible for implementing "expiring" debug info.

Regarding the store invariants: these are important especially for something like debug info; I can see the rebuilding behaviour you observed resulting in debug info which doesn't actually match the binaries fetched from the cache, and ends up being worse than useless. We could probably avoid breaking this invariant on the nixpkgs side by not allowing package outputs to reference their dependencies' debug outputs in practice, though I'd much prefer a solution that guarantees the preservation of the invariant as opposed to "hopefully" providing it.

@symphorien
Member Author

Build-time metadata, which are tightly coupled to the build itself. These could be placed in $output/nix-support/meta.json for example. I think this would be much more sensible for implementing "expiring" debug info.

Consider first the case where we put this metadata (in our case "this is debug info") inside a file outside the nar (be it the narinfo or some other file). Then, to be able to remove debuginfo from the binary cache, we need to fetch all narinfos from the dawn of time (because the listing APIs of s3 seem extremely limited; it does not even seem possible to fetch files by date). To get an idea of how slow this is: nix-index takes eight and a half minutes on my machine to fetch 92000 such metadata files. These 92000 paths are an approximation of "all that is in nixpkgs", and with the staging cycle we produce at least one new batch of 92000 every ~two weeks for nixos-unstable, and another for stable. NixOS used to be smaller in the past, but after 10 years this still suggests it would take nearly three days to download one metadata file per store path in the binary cache.
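The back-of-the-envelope estimate above can be reproduced as follows. All figures come from the paragraph itself; the only added assumptions are 26 two-week cycles per year and a constant batch size over the whole 10 years, which makes the result an upper bound.

```python
PATHS_PER_BATCH = 92_000   # roughly "all that is in nixpkgs"
MINUTES_PER_BATCH = 8.5    # nix-index fetch time for one batch
BATCHES_PER_CYCLE = 2      # one for nixos-unstable, one for stable
CYCLES_PER_YEAR = 26       # assumption: one staging cycle every two weeks
YEARS = 10

batches = BATCHES_PER_CYCLE * CYCLES_PER_YEAR * YEARS
total_paths = batches * PATHS_PER_BATCH
total_days = batches * MINUTES_PER_BATCH / (60 * 24)

print(f"{batches} batches, {total_paths:,} paths, about {total_days:.1f} days")
```

With a constant batch size this comes out slightly above three days; since NixOS used to be smaller, "nearly three days" is the same order of magnitude.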

Putting it inside the nar like $output/nix-support/meta.json makes this problem worse.

Final possibility: putting this info inside the db of hydra: well it's exactly as ad-hoc as my solution, and to some extent by opening this PR against nix I'm modifying the part of hydra that uploads to S3 so it's not fundamentally different nor less ugly.

Evaluation-time metadata --- one could for example tag paths with the nixpkgs version they were built with (this should not affect the contents of the output). I imagine it would involve the meta field on derivations, which already has semantics like this -- not affecting;

You suggest we could interpret meta.hint, and this value would become an s3 tag or a file xattr. This seems better to me, but I am still uncomfortable with this solution because

  • writing the documentation for this is weird: this hint may or may not be stored in an unspecified way in binary caches. It may also be ignored.
  • we need to set it per output, but meta is global to the derivation.

Regarding the store invariants: these are important especially for something like debug info; I can see the rebuilding behaviour you observed resulting in debug info which doesn't actually match the binaries fetched from the cache, and ends up being worse than useless. We could probably avoid breaking this invariant on the nixpkgs side by not allowing package outputs to reference their dependencies' debug outputs in practice, though I'd much prefer a solution that guarantees the preservation of the invariant as opposed to "hopefully" providing it.

Note that this problem (rebuilding debug info a second time results in mismatched binary vs debuginfo) already exists today, with nix upholding the invariant; see #7756. This was deemed a bug in nixpkgs for not having reproducible builds, and not a responsibility of nix.

Besides, the current situation is:

  • I fetch sl, it crashes
  • I don't have debuginfo
  • I may rebuild with debuginfo, and if I am not lucky the debug info may not match the coredump I obtained

With debuginfo removed from the cache after, say, 6 months, the situation you deplore is:

  • nix fetches sl and rebuilds mismatching debuginfo
  • the debug info may not match the coredump I obtained

This is bad, but exactly as bad as right now. On the other hand, we get usable debuginfo for 6 months. This is not a regression.

@lheckemann
Member

Consider first the case where we put this metadata (in our case "this is debug info") inside a file outside the nar (be it the narinfo or some other file) then to be able to remove debuginfo from the binary cache, we need to fetch all narinfos from the dawn of time (because listing APIs of s3 seem extremely limited, it does not even seem possible to fetch files by date).

I'm suggesting that this metadata could be applied as S3 tags at upload time, much like you do here -- my main goal with the suggestion was to avoid limiting ourselves to this very specific debug tag based on a very restrictive convention as opposed to being defined by the expression or its build output -- as an example, what if we want to do the same for JS source maps? I could easily imagine those not ending up in lib/debug but still being a separate output that we don't want to keep forever, and it wouldn't be great if we would then need to implement the logic for that in Nix itself rather than in the builders.

$output/nix-support/meta.json would be the source of truth for this, but not the only place that this information may be available.

Regarding evaluation-time metadata: there's probably a sensible scheme through which one could attach such metadata per output as well. However, I don't think it's super relevant here and more of a related thing on my wishlist.

@fricklerhandwerk
Contributor

fricklerhandwerk commented Apr 17, 2023

Triaged in the Nix team meeting 2023-04-10:

  • The solution feels fairly ad-hoc. It should probably work like this:
    1. Don't hardcode the debug label and instead add a more generic tagging mechanism
    2. This could perhaps be done as part of the build and also stored in the local DB
  • We're happy to discuss a more structured proposal that addresses this

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-04-10-nix-team-meeting-minutes-47/27357/1

@github-actions github-actions bot added the store Issues and pull requests concerning the Nix store label Oct 30, 2023
@symphorien
Member Author

Thank you for your feedback. I pushed an implementation of your idea. If a nar contains /nix-support/tags.json that contains {"foo": "bar"}, then the corresponding .nar and .narinfo objects on s3 are tagged nix:foo -> bar. The nix: namespacing is to avoid the forbidden aws: namespace.
If this approach looks good to you, I will make a companion nixpkgs pr that tags the debug outputs this way and test a bit more.
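As an illustration of the mapping described above (not the C++ code in this PR), the following sketch turns the contents of a tags.json file into prefixed tags and into the URL-query-style string that S3 expects for object tagging on upload. The function names are hypothetical, and rejecting non-string values is an assumption based on the "{key: string value}" form.

```python
import json
from urllib.parse import urlencode

TAG_PREFIX = "nix:"  # namespacing, so keys never start with the reserved "aws:"


def s3_tags_from_json(text: str) -> dict:
    """Map {"foo": "bar"} to {"nix:foo": "bar"}, accepting only string values."""
    tags = json.loads(text)
    if not isinstance(tags, dict):
        raise ValueError("tags.json must contain a JSON object")
    result = {}
    for key, value in tags.items():
        if not isinstance(value, str):
            raise ValueError(f"tag {key!r} must have a string value")
        result[TAG_PREFIX + key] = value
    return result


def tagging_header(tags: dict) -> str:
    """Encode tags as URL query parameters (the tag-set format S3 uses on upload)."""
    return urlencode(tags)


tags = s3_tags_from_json('{"foo": "bar"}')
print(tags)
print(tagging_header(tags))
```

A lifecycle rule filtering on the prefixed key (here `nix:foo`) would then select exactly the objects uploaded with that tag.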

@symphorien
Member Author

I fixed the merge conflicts

if a nar contains a nix-support/tags.json file of the form {key: string
value}, then the nar file (plus .narinfo and other accompanying files)
is tagged as "nix:key" -> "value".
@symphorien
Member Author

I fixed the merge conflicts.

@edef1c
Member

edef1c commented Nov 26, 2023

A naive approach could be to download all narinfo, read them for paths ending in -debug, and send DELETE queries to amazon but this will take forever and cost many requests.

FWIW, the archivist team is keeping consolidated dumps of all narinfos. Ingesting all ~205 million of them took a couple of hours and around €100 in S3 requests. We have about a quarter million -debug paths currently, weighing in at ~17TiB, or ~3.6% of total cache size.

DELETE requests are billed like GETs (and are faster to run), so expiring a few million paths is not particularly costly or complex.
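For a rough sense of scale, the figures above can be combined as follows. This sketch takes the numbers from this comment at face value and adds three assumptions: one request per narinfo during ingestion, two DELETE requests per debug path (the .nar and the .narinfo), and the per-request price stated above.

```python
NARINFO_COUNT = 205_000_000  # narinfos ingested
INGEST_COST_EUR = 100.0      # approximate S3 request cost of the ingestion
DEBUG_PATHS = 250_000        # "about a quarter million" -debug paths
DEBUG_SIZE_TIB = 17.0
DEBUG_SHARE = 0.036          # ~3.6% of total cache size

cost_per_request = INGEST_COST_EUR / NARINFO_COUNT  # assumes one request per narinfo
requests_per_path = 2        # assumption: one DELETE each for .nar and .narinfo
delete_cost = cost_per_request * DEBUG_PATHS * requests_per_path

total_cache_tib = DEBUG_SIZE_TIB / DEBUG_SHARE
avg_debug_mib = DEBUG_SIZE_TIB * 1024 * 1024 / DEBUG_PATHS

print(f"deleting all debug paths: about EUR {delete_cost:.2f}")
print(f"implied total cache size: about {total_cache_tib:.0f} TiB")
print(f"average debug nar: about {avg_debug_mib:.0f} MiB")
```

Under these assumptions the request cost of deleting every current -debug path is well under one euro, so the storage itself dominates the bill.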

It's also worth noting that there are .ls files for newer paths in the cache, describing their contents. These are present for about 195 million of the paths in the cache, although those do weigh in at ~415GiB in their current format, and haven't been extracted yet.

A better approach is to rely on the ability of s3 to selectively expire objects after a delay

Naively using object expiration will expire objects that are still in recent closures but haven't been rebuilt recently.
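A small sketch of that failure mode, with made-up paths and dates: S3 expiration counts from the upload date, so an output that is still part of a current closure but has not been rebuilt within the retention window is deleted anyway.

```python
from datetime import date, timedelta


def expired_but_referenced(upload_dates, live_paths, today, retention_days):
    """Paths whose object has passed its expiry although a live closure still needs them."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(
        path
        for path, uploaded in upload_dates.items()
        if uploaded < cutoff and path in live_paths
    )


# Hypothetical data: both outputs are in a current closure, but only one was rebuilt recently.
upload_dates = {
    "libfoo-debug": date(2023, 1, 1),   # unchanged since January, never re-uploaded
    "libbar-debug": date(2023, 10, 1),  # rebuilt recently
}
live_paths = {"libfoo-debug", "libbar-debug"}

print(expired_but_referenced(upload_dates, live_paths, date(2023, 11, 26), 210))
```

Avoiding this would require refreshing the object (or its expiry) whenever a new closure references it, not only when it is first uploaded.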

By prefix: it would be possible to move debug related nars to another directory instead of nar/, and point narinfo files to it. Although it's undocumented this breaks the format of binary caches and I fear may have unsuspected consequences. For this reason I tried the approach of tagging debuginfo related nars at upload time.

Nothing in Nix cache retrieval depends on nars being in nar/. If there are implementations that depend on this, it's probably best to prevent that from ossifying. cache.nixos.org is also not the raw S3 bucket, so we could quite easily rewrite paths on the CDN side if we really needed to.

Contributor

@tomberek tomberek left a comment

Thanks for updating the approach. Let's add some docs regarding the new nix-support file and convention.

Thinking out loud now:

  • this is basically metadata, and can be used to have additional attributes in a NAR model that does not have them.
  • It cannot be used for single-file store objects.
  • other possibilities?
  • pros/cons for other realms than debugging and s3?

@symphorien
Member Author

It's very cool that the archivist team has real numbers about the storage cost of debuginfo.

Naively using object expiration will expire objects that are still in recent closures but haven't been rebuilt recently.

That's a very good point, and I had not thought about that. I expect this does not affect debuginfo too much, as the staging cycle ensures there is a world rebuild every month approximately. But that would definitely be relevant if we wanted to expire sources this way.

other possibilities?

It's possible to encode these tags on file:/// binary caches with xattrs, but I fail to see an application.

I will write documentation later this week.

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-s3-long-term-resolution-phase-1/36493/1

@symphorien symphorien changed the title from "tag nars containing only lib/debug as hint=debug when uploading to s3" to "make it possible to add s3 tag to nars when uploading to s3, to allow different retention policy for debug symbols" Dec 5, 2023
@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-01-26-nix-team-meeting-minutes-118/38851/1

@symphorien
Member Author

Hey @Ericson2314
I see you were assigned to make a broader proposal in the nix team minutes. I suppose it's no use fixing the merge conflicts until you do so, but feel free to tell me if it's not the case.

Labels
store Issues and pull requests concerning the Nix store
Projects
Status: 🏁 Review
8 participants