Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write meta attribute as JSON into the derivation #256296

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

domenkozar
Copy link
Member

This allows to generate SBOMs and filter derivations based on license without relying on evaluation time information.

The downside is that changing meta attributes triggers a rebuild (which shouldn't have much impact with CA derivations).

Another downside is disk space usage, for example pkgs.git closure of .drv files goes from 3.8MB to 4.4MB.

This is part of a larger effort to bring security vulnerability notifications to https://cachix.org and anyone else that wants to have the automation.

@shlevy
Copy link
Member

shlevy commented Sep 20, 2023

which shouldn't have much impact with CA derivations

Aren't CA derivations still an experimental feature? We can't rely on them like this.

Metadata changes should not cause rebuilds and should not be part of the output. That's not metadata, that's data. If the maintainers of glibc change we shouldn't rebuild the world. The proper fix here is for Nix to have some mechanism for attaching metadata to store objects that doesn't impact the hash algorithm.

@domenkozar
Copy link
Member Author

domenkozar commented Sep 20, 2023

Aren't CA derivations still an experimental feature? We can't rely on them like this.

They are of course, I meant that long term we have a solution for this issue so it's less impactful than it may seem.

Metadata changes should not cause rebuilds and should not be part of the output.

The way I see it is that CA derivations are exactly the idea behind this, trivial changes shouldn't impact the whole recompilation of the tree.

Metadata changes should not cause rebuilds and should not be part of the output. That's not metadata, that's data. If the maintainers of glibc change we shouldn't rebuild the world.

There are two conflicting shoulds:

  1. we should have good automation for security vulnerabilities
  2. metadata shoudn't impact rebuilds

I do think (2) brings so many good things that we can delay (1) should until CA derivations are stable.

@shlevy
Copy link
Member

shlevy commented Sep 20, 2023

If this feature is important, implement it properly in Nix. No need to wait for CA derivations.

@domenkozar
Copy link
Member Author

If this feature is important, implement it properly in Nix. No need to wait for CA derivations.

I like to split improvements into three phases:

  1. make it work (here comes the benefit)
  2. make it correct (fix the bugs)
  3. make it fast (CA derivations / Nix allowing derivation inputs not impacting hashing)

@shlevy
Copy link
Member

shlevy commented Sep 20, 2023

You're "making it work" by breaking the semantics of an existing working system for everyone. You're not building some prototype of some opt-in functionality. Again, why not do your step 1 by implementing a prototype of a metadata feature in Nix? In 30 seconds of thought here's an approach you could probably put together in a day or two:

  1. Add a field to the db for metadata, just an arbitrary string optionally associated with any store path
  2. Add a new builtin configurableDerivationStrict that takes a set containing { meta, drvArgs }, passes drvArgs to derivationStrict, and attaches meta to the .drv storepath
  3. Have mkDerivation use configurableDerivationStrict if Nix supports it and there is metadata.

@domenkozar
Copy link
Member Author

domenkozar commented Sep 20, 2023

You're "making it work" by breaking the semantics of an existing working system for everyone. You're not building some prototype of some opt-in functionality. Again, why not do your step 1 by implementing a prototype of a metadata feature in Nix? In 30 seconds of thought here's an approach you could probably put together in a day or two:

  1. Add a field to the db for metadata, just an arbitrary string optionally associated with any store path
  2. Add a new builtin configurableDerivationStrict that takes a set containing { meta, drvArgs }, passes drvArgs to derivationStrict, and attaches meta to the .drv storepath
  3. Have mkDerivation use configurableDerivationStrict if Nix supports it and there is metadata.

I'm not sure creating another type of derivation is the right approach here, given that this is a solved problem already as experimental feature in Nix for ~2 years.

Note that your suggestion breaks most likely the semantics of .drv files if I understand it correctly, for example for existing parsers.

@infinisil infinisil added the significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc. label Sep 20, 2023
@shlevy
Copy link
Member

shlevy commented Sep 20, 2023

I'm not sure creating another type of derivation is the right approach here

I'm not creating another type of derivation. I'm suggesting simply a wrapper primop that adds a new field to the Nix db.

given that this is a solved problem already as experimental feature in Nix for ~2 years.

Do CA derivations work? Are they ready to be enabled by default? Are they stable? If so, then they should be unexperimental. If not, they cannot be relied upon. Even if they were non-experimental, they would not be a proper solution here: nothing should rebuild if you change metadata.

Note that your suggestion breaks most likely the semantics of .drv files if I understand it correctly, for example for existing parsers.

No, no data is stored in the drv. Just in the db associated with the drv path.

@YorikSar
Copy link
Contributor

I think that this pulls in too much information into the build. While license is a part of sources, and license change wouldn't happen without needing a rebuild anyway, other pieces of information like current maintainer or level of brokenness on different platforms shouldn't affect the package.

Maybe we should tie license to the sources of the package and store that in derivations instead? Or maybe even keep it in "source" derivations and not in "package" derivations.

I'm not sure how this helps security vulnerability checks though. If we find a vulnerability in the package, we add it in meta and currently it doesn't trigger package rebuild, so you can compare store path and check if the package that you already have is now vulnerable. If you add this information into derivation, without CA you'll get a new output path and you won't be able to check it on already deployed systems.

@shlevy
Copy link
Member

shlevy commented Sep 20, 2023

Or maybe even keep it in "source" derivations and not in "package" derivations.

Given the package drv, we can almost always identify the src, unpack it, and scan the result for licenses. No new drv needed, probably gets you 95% coverage with a simple script. Seems like a perfect fit for @domenkozar's "make it work" stage.

@YorikSar
Copy link
Contributor

identify the src, unpack it, and scan the result for licenses

I think this is too much effort with a bunch of corner cases, so “caching” the result of such action is a good thing. I also understand the need for having this cached not just in code, but in derivations as often you don’t have sources for drvs that produced these outputs, but drvs themselves are quite easy to keep around with keep-derivations.

@flokli
Copy link
Contributor

flokli commented Sep 20, 2023

identify the src, unpack it, and scan the result for licenses

I think this is too much effort with a bunch of corner cases, so “caching” the result of such action is a good thing.

That's exactly what IFD can already do today. Just because it doesn't fly with Hydra and its scheduler doesn't mean it's a bad idea - meta isn't recursed AFAIK anyways. I still would very much prefer if we specified this stuff manually, rather than blindly trusting a script to gather the right data.

This allows to generate SBOMs and filter derivations based on license
without relying on evaluation time information.

The downside is that changing meta attributes triggers a rebuild
(which shouldn't have much impact with CA derivations).

Another downside is disk space usage, for example pkgs.git closure
of .drv files goes from 3.8MB to 4.4MB.
@domenkozar
Copy link
Member Author

I've changed the implementation to be opt-in, so that we can develop the feature until it's ready to be on by default.

@SuperSandro2000
Copy link
Member

Do CA derivations work? Are they ready to be enabled by default?

No, they are far from ready. At the moment you can't reliable convert a already built system output because each other referencing outputs create deadlocks. In the past multiple such deadlocks where already fixed but I can easily imagine that there are more to be found and edge cases to be fixed.
Also hydra support is not existing and the open PR for it is stale and I couldn't get it work.

@roberth
Copy link
Member

roberth commented Sep 21, 2023

This could be handled by improving the string context mechanism. That way we don't "pollute" the drv files.

@blaggacao
Copy link
Contributor

blaggacao commented Sep 21, 2023

After Domen posted this PR in #slsa:nixos.org a quick exploratory discussion ensued between @RaitoBezarius and myself the outcome of which I want to resume here:

  • More than a question of metadata storage, it can be considered a question of cache storage (since the SBOM data will be reproducible)

  • That renders the question addressed by this PR one of a cache storage implementation

  • .drv for the reasons mentioned above and also from its principle design as "build recipes", may be a very unsuitable implementation of such a cache

  • other options to implement such a cache would be:

    • the local cache in the sql database, as proposed by @shlevy
    • a remote cache, i.e. sbom.cachix.org / sbom.nixos.org

Using .drv as a cache implementation still seemed unsuitable to us even if using it could lead to a quick "make it work" improvement of ecosystem outcomes by potentially improving overall security.

The reasons it seemed unsuitable even under this angle mostly stems from:

  • .drv was not designed to be a cache layer / metadata transport layer even if placing something into a drv has all the effects of a cache via Nix' build cache
  • this may set a precedent that can lead the community in the wrong architectural direction from which it may be difficult to recover. Related questions may be asked, such as: Do we have the governance in place to follow up and correct the implementation and ensure proper migration later? We answered that question such that we remained in favor of not setting that precedent. We did not validate if similar precedents already exist, though.
  • an independent (metadata) cache layer implementation may emerge in a sandbox (whether remote or in sqlight) to validate all assumptions and use cases and a later settling inside Nix or Nixpkgs is a reasonable alternative path forward

I hope I did capture the outcomes appropriatly and that our findings are well received.

@lheckemann
Copy link
Member

NixOS/nix#8080 is somewhat related; the approach currently implemented there only applies metadata from build outputs, but I could see a future where tags can come from evaluation-time info that doesn't land in the derivations.

@Ericson2314
Copy link
Member

Ericson2314 commented Dec 6, 2023

Can't we just have mkDerivation in Nixpkgs write another file with the metadata which refers to the derivation? No Nix changes needed then.

@emilylange
Copy link
Member

I'll mark this as draft, as it's clear this shouldn't be merged until multiple concerns are resolved, and I want to prevent this getting merged by someone by accident (well aware there is currently a merge conflict, that prevents it from being merged right now anyway).

@emilylange emilylange marked this pull request as draft December 6, 2023 19:36
@wegank wegank added 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 2.status: merge conflict labels Mar 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.status: merge conflict 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 6.topic: stdenv Standard environment 10.rebuild-darwin: 0 10.rebuild-linux: 0 significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet