gvisor: init at 2018-11-10 #50218

Closed
@andrew-d
Contributor

@andrew-d andrew-d commented Nov 11, 2018

Motivation for this change

Add a package for the gvisor container runtime sandbox. This was requested in #39889, but there were some problems with Bazel at the time. I've managed to get this working, but I'd appreciate feedback on how I've done so. In short, there are two derivations here: a fixed-output derivation produced by running bazel sync to download all dependencies and then making the result deterministic, and a second derivation that uses the first along with the source to build the actual output binary. At the end of the whole process, gvisor is runnable:

$ /nix/store/iag5vgl51alqmirabvz5ij9yfp6kwmby-gvisor-2018-11-10/bin/runsc --help
Usage: runsc <flags> <subcommand> <subcommand args>

Subcommands:
	checkpoint       checkpoint current state of container (experimental)
	create           create a secure container
	delete           delete resources held by a container
	events           display container events such as OOM notifications, cpu, memory, and IO usage statistics

I haven't yet tested this with Docker, so I'll try to do that shortly.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Fits CONTRIBUTING.md.

cc @dtzWill and @q3k (on the original issue)
cc @mboes (Bazel maintainer - feedback appreciated!)

@andrew-d
Contributor Author

@andrew-d andrew-d commented Nov 11, 2018

I tried testing this with Docker, but it looks like the CONFIG_CGROUP_PERF kernel option isn't enabled in the default Nixpkgs kernel, leading to some form of incompatibility between gvisor and Docker. I initially get the following error:

error creating container: error configuring cgroup: mkdir /sys/fs/cgroup/perf_event: read-only file system

I can remount the cgroup filesystem as rw (sudo mount -o remount,rw /sys/fs/cgroup), but when doing that or patching gvisor to remove that cgroup from the controller set, I get the error:

unable to find "perf_event" in controller set: unknown.

I don't have time to rebuild my kernel with that cgroup option right now, but I'll try to get to it soon, unless someone else wants to have a try!

@orivej orivej mentioned this pull request Nov 11, 2018
3 of 9 tasks complete
@andrew-d
Contributor Author

@andrew-d andrew-d commented Nov 12, 2018

@orivej - After applying #50225, and setting virtualisation.docker.extraOptions = "--add-runtime=runsc=/nix/store/[...]-gvisor-2018-11-10/bin/runsc";, I'm able to run docker run --runtime=runsc -it ubuntu /bin/bash, apt-get install things from within the container, and generally do things. I also confirmed via ps that gvisor was running! 🎉

pkgs/applications/virtualization/gvisor/default.nix Outdated
find "$out" -name '*.sh' -exec \
sed -i 's|#!/bin/bash|#!${bash}/bin/bash|g' {} \;

find "$out" -name '*.go' -exec \

@nlewo

nlewo Nov 18, 2018
Member

It seems this is only required by tests. Could you try with '*_test.go'?
Also, it would be nice to patch this upstream :/

@andrew-d

andrew-d Nov 18, 2018
Author Contributor

I'll look into patching this upstream at some point, sure. For now, fixed this, the merge conflict, and force-pushed.

@nlewo
Member

@nlewo nlewo commented Nov 18, 2018

@Profpatsch Do you know if there is a simpler way to prefetch dependencies for Bazel builds in nixpkgs? The goal is to not have to download dependencies with Bazel (bazel sync) at application build time.

@andrew-d
Contributor Author

@andrew-d andrew-d commented Nov 18, 2018

The goal is to not have to download dependencies with Bazel (bazel sync) at application build time.

I tried a couple of ways to do this, but wasn't successful. You can use native.existing_rules() to iterate over all repository rules in a Bazel workspace, but turning those into things that Nix can fetch is pretty tricky, especially because you can't just search for a standard set of rules that access the network: rules_go, for example, has some repository rules that run custom commands to fetch dependencies. I suspect that bazel sync is probably the best we're going to get, honestly.

@andrew-d andrew-d force-pushed the andrew-d:andrew/gvisor branch Nov 18, 2018
@Profpatsch
Member

@Profpatsch Profpatsch commented Nov 19, 2018

Do you know if there is a simpler way to prefetch dependencies for Bazel builds in nixpkgs?

bazel sync --experimental_repository_resolved_file <filename> is able to produce some kind of lock file, but it’s kinda verbose and written in Skylark rather than a well-known format. It might be possible to eval it with a Python interpreter and spew out some JSON.
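
A minimal sketch of the "eval it with Python and spew out JSON" idea, assuming the resolved file is a single top-level `resolved = [...]` assignment made up only of literals (lists, dicts, strings, numbers, True/False/None). The function names and the trimmed sample input are hypothetical; a real file is much longer and may use constructs this does not handle.

```python
import ast
import json


def parse_resolved(text):
    """Return the value assigned to `resolved`, without executing the file.

    Assumption: the right-hand side contains only literals, so the Skylark
    source is also valid Python and ast.literal_eval can read it safely.
    """
    module = ast.parse(text)
    for node in module.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "resolved":
                    return ast.literal_eval(node.value)
    raise ValueError("no top-level `resolved = ...` assignment found")


def resolved_to_json(text):
    """Serialize the parsed value as stable, diff-friendly JSON."""
    return json.dumps(parse_resolved(text), indent=2, sort_keys=True)


# Hypothetical, heavily trimmed input; the hash below is a placeholder.
SAMPLE = '''
resolved = [
    {
        "original_rule_class": "http_archive",
        "original_attributes": {
            "name": "example_dep",
            "urls": ["https://example.invalid/dep.tar.gz"],
            "sha256": "abc123",
        },
    },
]
'''

if __name__ == "__main__":
    print(resolved_to_json(SAMPLE))
```

Reading the file through `ast` rather than `eval` means nothing in it is executed, at the cost of rejecting anything that is not a plain literal.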

@nlewo
Member

@nlewo nlewo commented Nov 24, 2018

This looks good to me.
But it's really tricky to build a Bazel project in nixpkgs. It would be nice to have a bazel2nix tool! Moreover, I don't know how robust this build will be across Bazel upgrades.

This is not required, but it would be nice to have a NixOS test that uses this container runtime engine. I could help with that.

@GrahamcOfBorg build gvisor

@GrahamcOfBorg

@GrahamcOfBorg GrahamcOfBorg commented Nov 24, 2018

No attempt on aarch64-linux (full log)

The following builds were skipped because they don't evaluate on aarch64-linux: gvisor

Partial log:


a) For `nixos-rebuild` you can set
  { nixpkgs.config.allowUnsupportedSystem = true; }
in configuration.nix to override this.

b) For `nix-env`, `nix-build`, `nix-shell` or any other Nix command you can add
  { allowUnsupportedSystem = true; }
to ~/.config/nixpkgs/config.nix.


@GrahamcOfBorg

@GrahamcOfBorg GrahamcOfBorg commented Nov 24, 2018

No attempt on x86_64-darwin (full log)

The following builds were skipped because they don't evaluate on x86_64-darwin: gvisor

Partial log:


a) For `nixos-rebuild` you can set
  { nixpkgs.config.allowUnsupportedSystem = true; }
in configuration.nix to override this.

b) For `nix-env`, `nix-build`, `nix-shell` or any other Nix command you can add
  { allowUnsupportedSystem = true; }
to ~/.config/nixpkgs/config.nix.


@GrahamcOfBorg

@GrahamcOfBorg GrahamcOfBorg commented Nov 24, 2018

Unexpected error: command failed with exit code 1 on x86_64-linux (full log)

Attempted: gvisor

Partial log:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   155    0   155    0     0    734      0 --:--:-- --:--:-- --:--:--   734
100 1735k    0 1735k    0     0  1521k      0 --:--:--  0:00:01 --:--:-- 3916k
unpacking source archive /build/d97ccfa346d23d99dcbe634a10fa5d81b089100d.tar.gz
cannot link '/nix/store/.links/1cnjyagqg3s6b6v6j675ryhzk9q9f6fhd88v0dyqw8w6s5b07r7x' to '/nix/store/zwzcdh5x9wr3pq0n6vdzcpgrjcnixr7f-source/pkg/sentry/kernel/g3doc/run_states.dot': No space left on device
cannot link '/nix/store/.links/1s37c4s9a74nv7j6xxydif20a7ljydlj8y4z3p69n1dahnn6m7gq' to '/nix/store/zwzcdh5x9wr3pq0n6vdzcpgrjcnixr7f-source/pkg/sentry/mm/mm.go': No space left on device
cannot link '/nix/store/.links/0wymhb94n99vs4yf3vjb4gmlva43hcfnbrvac9wldjd85aaznhxp' to '/nix/store/zwzcdh5x9wr3pq0n6vdzcpgrjcnixr7f-source/pkg/sentry/fs/proc/uptime.go': No space left on device
warning: path '/nix/store/zwzcdh5x9wr3pq0n6vdzcpgrjcnixr7f-source' claims to be content-addressed but isn't
error: unexpected end-of-file

@andrew-d
Contributor Author

@andrew-d andrew-d commented Nov 24, 2018

@nlewo - This should be pretty reliable through Bazel upgrades; the bazel sync command is the newly-recommended way of doing reproducible builds, and the only other thing that could change is the $TEST_TMPDIR variable, which is currently documented here. Just about everything else is essentially independent of the way Bazel works. I suspect this approach is substantially more reliable than using the Bazel --experimental_repository_resolved_file flag, which could change at any point.

I'll try to get a NixOS test added today or tomorrow; it'll be my first one, so I'll happily take any advice!

In the meantime, do you mind kicking off another build? Looks like the Linux build failed due to disk space, which doesn't look related to this PR.

pkgs/applications/virtualization/gvisor/default.nix Outdated
# NOTE: this is the output of the whole fixed-output derivation, so
# `nix-prefetch-git` won't work to obtain this. The easiest way is to just
# change it and see what breaks :)
sha256 = "1bcnq7kazbf6l5j0g82x2lvg1nbp7z70klk139dxi0jkw0j8dh3r";

@nlewo

nlewo Nov 25, 2018
Member

This is not the expected hash. Maybe you forgot to update it when you changed line 63.

@andrew-d

andrew-d Nov 26, 2018
Author Contributor

Ahhhhhhhh, I know what this is. The fixed-output derivation has the paths to bash and coreutils as part of the derivation, so any change to those results in a new hash here too. That's annoying 😒

I think I can fix that by changing the bash calls to /bin/sh, and just dropping the test fixes since we're not running them anyway.

@nlewo

nlewo Nov 26, 2018
Member

If possible, it would be better to patch shebangs in the patch phase of the bazelDependencies derivation.

pkgs/applications/virtualization/gvisor/default.nix Outdated

outputHashMode = "recursive";
outputHashAlgo = "sha256";
outputHash = "0430pn3q71r6pyxq32k2n1zhnp9hvs5mizvw3zy6zwrsv3fchdb6";

@nlewo

nlewo Nov 25, 2018
Member

This hash is also not the expected one when I locally build it. But this could be related to the update of the hash of patchedSource.

@andrew-d

andrew-d Nov 26, 2018
Author Contributor

I force-pushed the fix for the comment above. If this hash still doesn't work, mind running this command for me (with the correct store path) and uploading the results to Gist / Pastebin / something?

(cd /nix/store/35j7izc656kyppz5nqci9c6rivp2zi9s-gvisor-build-dependencies-2018-11-10 && find . -type f | xargs shasum | sort -k 2,2)

@nlewo

nlewo Nov 26, 2018
Member

@andrew-d The hashes are not correct :(
Let me know if https://gist.github.com/nlewo/bbd43f6a7c985e6d70402ac55a439116 helps you: there are hashes of the resulting build temporary directory (nix-store -rK ...).
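
Comparing two such listings by hand gets tedious once the dependency tree is large. A small sketch of how the comparison could be automated, assuming each listing is shasum-style output with one `<digest>  <path>` pair per line; the function names and the sample data are hypothetical.

```python
def parse_listing(text):
    """Map each path to its digest from shasum-style output."""
    hashes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, path = line.split(None, 1)
        # shasum marks files read in binary mode with a leading '*'.
        if path.startswith("*"):
            path = path[1:]
        hashes[path] = digest
    return hashes


def diff_listings(ours, theirs):
    """Report paths that exist on one side only or whose digests differ."""
    a, b = parse_listing(ours), parse_listing(theirs)
    return {
        "only_ours": sorted(a.keys() - b.keys()),
        "only_theirs": sorted(b.keys() - a.keys()),
        "changed": sorted(p for p in a.keys() & b.keys() if a[p] != b[p]),
    }


if __name__ == "__main__":
    # Made-up digests and paths, only to show the shape of the report.
    ours = "aaa  ./dep/tool\nbbb  ./dep/README\n"
    theirs = "ccc  ./dep/tool\nbbb  ./dep/README\nddd  ./dep/extra\n"
    print(diff_listings(ours, theirs))
```

The `changed` list is the interesting one here: it narrows the search to the files that make the fixed-output hash differ between machines.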

@andrew-d

andrew-d Nov 27, 2018
Author Contributor

Okay, this is going to be substantially more annoying than I'd expected. After a bunch of digging, here's what I've found:

  • rules_go uses some helper tools, which it installs with go install into a synthetic repository.
  • Bazel creates .marker files to track whether repositories are up-to-date; these marker files appear to include the hash of the underlying files in the working tree, and they account for the vast majority of what differs between our two systems.
  • Since the helper tools are built using the regular go tooling and with inconsistent paths, they don't have a consistent output hash.
  • Bazel verifies the marker files as part of the build process, so we can't patch these tools post-hoc (the hashes in the marker files would no longer match, and Bazel would try to re-download everything).

I'm honestly at a bit of a loss; here are my thoughts:

  1. Try to get these tools building in a reproducible fashion (requires an upstream patch in rules_go)
  2. Do something to fix these specific files; the inconsistency comes from a specific debug section in the output binaries (.note.go.buildid), so we could try to zero out that section.

Of the two, I'm going to try to do #1, since the second feels fragile to me. But overall, yeah, this is pretty annoying 😒

(also, I thought about trying to build things with Nix itself, but unlike the *tonix utilities that other languages' package managers use, Bazel repository rules allow running arbitrary shell scripts, so I think we'll always have to run Bazel itself to fetch dependencies)

@andrew-d andrew-d force-pushed the andrew-d:andrew/gvisor branch to f1a1545 Nov 26, 2018
@andrew-d
Contributor Author

@andrew-d andrew-d commented Nov 28, 2018

Current state: I've submitted a patch to bazel-gazelle to make the helper build tools deterministic (bazelbuild/bazel-gazelle#382) which has been merged, but that's not sufficient; I'm currently chasing down some Nix paths in the dependency output. Most of them are local configuration from the environment, and we can just remove them (rm -rf $out/local_config*), but there's one particular problem that I'm running into:

Our Go compiler has patches[1][2] that replace the absolute /etc/services, /etc/protocols, and /usr/share/zoneinfo paths with Nix store paths. This, however, means that we cannot use a Go binary in a fixed-output derivation, since the binary will contain paths from the Nix store and thus the fixed-output hash will change if those paths ever do. Anyone have any idea what we normally do in cases like this? Or should we just assume that this particular problem is a lost cause, and find some other way of building these binaries?
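
To find which files drag store paths into the fixed-output tree, one option is to scan the output for anything shaped like a store path. The sketch below is an illustration rather than part of the PR: the function name is hypothetical, and the regular expression is an assumption based on the store paths quoted in this thread (32 characters from Nix's base-32 alphabet, then a dash and a name).

```python
import os
import re

# Assumed shape of a store path: /nix/store/<32 base-32 chars>-<name>.
# Nix's base-32 alphabet omits the letters e, o, t and u.
STORE_REF = re.compile(rb"/nix/store/[0-9a-df-np-sv-z]{32}-[0-9A-Za-z+._?=-]+")


def find_store_references(root):
    """Map each regular file under `root` to the store paths it embeds."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # Symlink targets are not inspected in this sketch.
                continue
            with open(path, "rb") as handle:
                data = handle.read()
            refs = sorted({m.decode("ascii") for m in STORE_REF.findall(data)})
            if refs:
                found[os.path.relpath(path, root)] = refs
    return found


if __name__ == "__main__":
    import sys

    for rel, refs in sorted(find_store_references(sys.argv[1]).items()):
        print(rel)
        for ref in refs:
            print("   ", ref)
```

Any file this reports will change whenever the referenced store path changes, which is exactly what breaks the fixed-output hash.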

(also, holy hell is this rapidly turning into something more complicated than I'd originally expected 😛)

@Profpatsch
Member

@Profpatsch Profpatsch commented Nov 29, 2018

Thanks for putting in the work to research bazel builds inside of nix.

This, however, means that we cannot use a Go binary in a fixed-output derivation, since the binary will contain paths from the Nix store and thus the fixed-output hash will change if those paths ever do. Anyone have any idea what we normally do in cases like this? Or should we just assume that this particular problem is a lost cause, and find some other way of building these binaries?

I haven’t seen fixed-output hashes for anything but implementing fetchers, since they require absolute determinism. Especially with a semi-hermetic build tool like bazel which uses build rules written by imperative programmers (cough rules_go cough) that’s tough to achieve.

Best strategy I can see right now is using their lock file to parse out all hashes and check those hashes into nixpkgs (plus an update script that can update the hashes). Since the output format is skylark, you should be able to parse it as valid python syntax (or eval with all symbols stubbed out).

@andrew-d andrew-d force-pushed the andrew-d:andrew/gvisor branch from f1a1545 Dec 10, 2018
@andrew-d
Contributor Author

@andrew-d andrew-d commented Dec 10, 2018

This was super annoying, but: I've just force-pushed an update that successfully builds gvisor by manually fetching all dependencies with Nix. It's especially annoying since rules_go applies patches to some third-party libraries, so we have to manually apply those ourselves too, or the build will fail. However: this builds properly for me, now.

@Profpatsch and @nlewo - thoughts on this new approach?

@andrew-d andrew-d force-pushed the andrew-d:andrew/gvisor branch to e38bc1c Dec 10, 2018
@andrew-d
Contributor Author

@andrew-d andrew-d commented Dec 10, 2018

Just pushed an alternate version; this is now generated by a very WIP script that will parse the Bazel resolved-dependencies file and attempt to convert it to a Nix file. It's pretty hacky, but I'm heading to bed and figured I'd drop it here for now!
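
For readers curious what such a conversion could look like, here is a rough sketch. It is not the script from this PR. It assumes the dependencies have already been reduced to flat records with `name`, `urls`, and `sha256` keys (a hypothetical shape), and it only covers plain archive downloads rendered as nixpkgs `fetchurl` calls.

```python
def nix_string(value):
    """Quote a Python string as a double-quoted Nix string literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("${", "\\${")  # keep Nix from treating ${...} as interpolation
    )
    return '"' + escaped + '"'


def render_fetchers(repos):
    """Render flat dependency records as a Nix attribute set of fetchurl calls.

    Each record is assumed to look like:
        {"name": "...", "urls": ["..."], "sha256": "..."}
    """
    lines = ["{ fetchurl }:", "{"]
    for repo in sorted(repos, key=lambda r: r["name"]):
        urls = " ".join(nix_string(u) for u in repo["urls"])
        lines.append(f"  {nix_string(repo['name'])} = fetchurl {{")
        lines.append(f"    urls = [ {urls} ];")
        lines.append(f"    sha256 = {nix_string(repo['sha256'])};")
        lines.append("  };")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder record; the URL and hash are not real.
    print(render_fetchers([
        {
            "name": "example_dep",
            "urls": ["https://example.invalid/dep.tar.gz"],
            "sha256": "abc123",
        },
    ]))
```

Repository rules that run custom commands, and the patches rules_go applies to third-party libraries, are outside what this sketch handles and would still need manual treatment.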

@Profpatsch
Member

@Profpatsch Profpatsch commented Dec 10, 2018

thoughts on this new approach?

I really like it. It would be nice to split out the parser to get a generic transformation from a Bazel lockfile to a nixpkgs package. Of course, the generated code must then be overridable; I can help with that if you want. See https://github.com/Profpatsch/yarn2nix/tree/master/nix-lib for an example of how that can be done (there might be some code reuse possible).

@benpye
Contributor

@benpye benpye commented Feb 12, 2019

@andrew-d Wondered if you ever continued with this? I really like the idea of running services on my NixOS machine within gVisor KVM containers, especially for things like the Unifi controller I run, a big Java behemoth.

@andrew-d
Contributor Author

@andrew-d andrew-d commented Feb 13, 2019

@benpye - The short answer is "not yet"; I'm currently trying my hand at a slightly more generic "Bazel to Nix" translator, and once I get that working will update this PR with the generated code. This branch does work, though, if you want to apply it to a local fork!

@ghuntley
Member

@ghuntley ghuntley commented Jun 22, 2019

@andrew-d anything you need help with? This is rad.

@andrew-d andrew-d mentioned this pull request Nov 9, 2019
9 of 10 tasks complete
@flokli flokli closed this in #73097 Dec 5, 2019