-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Breaking] Generalize LRE to arbitrary toolchains #728
[Breaking] Generalize LRE to arbitrary toolchains #728
Conversation
98990b0
to
633e427
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 35 files at r1, 1 of 12 files at r2.
Reviewable status: 0 of 1 LGTMs obtained
flake.nix
line 141 at r2 (raw file):
name = "nativelink"; config = { Entrypoint = [(pkgs.lib.getExe' nativelink "nativelink")];
til: function names can have '
https://github.com/NixOS/nixpkgs/blob/master/lib/meta.nix#L174
deployment-examples/kubernetes/worker-lre-cc.json.template
line 27 at r2 (raw file):
"fast": { "filesystem": { "content_path": "/tmp/.cache/nativelink/data-worker-test/content_path-cas",
Is tmp the intended place for nativelink data? If possible we should be uniformed in examples of where to expect the nativelink directory to be rooted at or maybe not hidden in a dot dir.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 LGTMs obtained
deployment-examples/kubernetes/worker-lre-cc.json.template
line 27 at r2 (raw file):
Previously, adam-singer (Adam Singer) wrote…
Is tmp the intended place for nativelink data? If possible we should be uniformed in examples of where to expect the nativelink directory to be rooted at or maybe not hidden in a dot dir.
I'm not sure what the optimal approach is.
We probably want to move away from /root
to better support rootless setups with our recommended defaults. I used /tmp
here because it seemed like a fairly standard choice and is usually given permissions 1777
, i.e. this works without any additional user (shadow
, passwd
etc) setup.
An alternative would be to explicitly configure a user and a home directory.
633e427
to
3f1d704
Compare
d909587
to
7dc5765
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+@allada +@adam-singer +@zbirenbaum +@MarcusSorealheis +@blakehatch cc @kubevalet
FYI Parts of the new docs here might seem a bit out of place. I'll change this with to the upcoming Ubuntu Remote exec example which I'll add in a separate commit.
Reviewable status: 0 of 5 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Local / ubuntu-22.04, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, pre-commit-checks, publish-image, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, vale, windows-2022 / stable, zig-cc ubuntu-20.04, zig-cc ubuntu-22.04 (waiting on @adam-singer, @allada, @blakehatch, @MarcusSorealheis, and @zbirenbaum)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 22 of 35 files at r1, 9 of 12 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: 0 of 5 LGTMs obtained (waiting on @allada, @blakehatch, @MarcusSorealheis, and @zbirenbaum)
local-remote-execution/lre-java.nix
line 18 at r3 (raw file):
"${pkgs.gnutar}/bin" ])) "JAVA_HOME=${pkgs.jdk17_headless}/lib/openjdk"
Do we also need openjdk/bin within the PATH?
local-remote-execution/rbe_configs_gen_adjustments.diff
line 14 at r3 (raw file):
+ return "/tmp/workdir" case OSWindows: return "C:/workdir"
Isn't windows C:\
or has /
and \
been universally understood by windows now as the same thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 LGTMs obtained (waiting on @adam-singer, @allada, @blakehatch, @MarcusSorealheis, and @zbirenbaum)
local-remote-execution/rbe_configs_gen_adjustments.diff
line 14 at r3 (raw file):
Previously, adam-singer (Adam Singer) wrote…
Isn't windows
C:\
or has/
and\
been universally understood by windows now as the same thing?
This indeed looks wrong. But the entire thing falls apart on windows anyways as we can't even build the rbe-configs-gen
tool on windows in the first place lol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 LGTMs obtained (waiting on @adam-singer, @allada, @blakehatch, @MarcusSorealheis, and @zbirenbaum)
local-remote-execution/lre-java.nix
line 18 at r3 (raw file):
Previously, adam-singer (Adam Singer) wrote…
Do we also need openjdk/bin within the PATH?
It doesn't affect toolchain generation. However, this seems like a bug in rbe-configs-gen
. It "should" affect toolchain generation.
I believe we should get a generated java_toolchain
. Only having a local_java_runtime
setup generated looks wrong: https://github.com/TraceMachina/nativelink/blob/main/local-remote-execution/generated/java/BUILD
Another reason to rewrite rbe-configs-gen
😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 12 files at r2, all commit messages.
Reviewable status: 0 of 5 LGTMs obtained (waiting on @adam-singer, @allada, @blakehatch, and @MarcusSorealheis)
local-remote-execution/rbe_configs_gen_adjustments.diff
line 14 at r3 (raw file):
Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…
This indeed looks wrong. But the entire thing falls apart on windows anyways as we can't even build the
rbe-configs-gen
tool on windows in the first place lol.
I think that it should still be set to the correct string for windows since it looks weird to anyone exploring the code. They might think the reason it doesn't work is because of this. I think a note should be added stating windows is unsupported and why in the commit message, as well as here or in a README since having this check gives the impression it is supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 LGTMs obtained (waiting on @adam-singer, @blakehatch, @MarcusSorealheis, and @zbirenbaum)
-- commits
line 20 at r3:
nit: side effect
Code quote:
sideeffect
deployment-examples/kubernetes/worker-lre-cc.json.template
line 27 at r2 (raw file):
Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…
I'm not sure what the optimal approach is.
We probably want to move away from
/root
to better support rootless setups with our recommended defaults. I used/tmp
here because it seemed like a fairly standard choice and is usually given permissions1777
, i.e. this works without any additional user (shadow
,passwd
etc) setup.An alternative would be to explicitly configure a user and a home directory.
I'm personally not a fan of /root
or /tmp
. Maybe:
${HOME}/.cache/
instead? I believe this works on windows, but not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 LGTMs obtained (waiting on @adam-singer, @blakehatch, @MarcusSorealheis, and @zbirenbaum)
deployment-examples/kubernetes/worker-lre-java.yaml
line 21 at r3 (raw file):
env: - name: RUST_LOG value: info
nit: This is super verbose, maybe warn
instead?
deployment-examples/kubernetes/worker-lre-java.json.template
line 31 at r3 (raw file):
"eviction_policy": { // 10gb. "max_bytes": 10000000000,
nit: Maybe make this a variable that is set in the yaml
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 5 LGTMs obtained (waiting on @adam-singer, @blakehatch, @MarcusSorealheis, and @zbirenbaum)
86107ad
to
02e0fc3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dismissed @adam-singer from 3 discussions.
Reviewable status: 1 of 5 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Vercel, pre-commit-checks, publish-image, ubuntu-20.04 / stable (waiting on @adam-singer, @blakehatch, @MarcusSorealheis, and @zbirenbaum)
deployment-examples/kubernetes/worker-lre-java.json.template
line 31 at r3 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
nit: Maybe make this a variable that is set in the
yaml
?
Agreed. I'll defer this to a second pass where I go over all values again to see whether there are other things we want to set as environment variables such as paths.
deployment-examples/kubernetes/worker-lre-cc.json.template
line 27 at r2 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
I'm personally not a fan of
/root
or/tmp
. Maybe:
${HOME}/.cache/
instead? I believe this works on windows, but not sure.
'HOME/.cache`seems like the most intuitive approach. It's a dotdir, but this is also the path that other caches use.
However, I'll hold off on this for now since it's unclear to me how to implement the user properly. The current setup doesn't set up users at all which makes it a bit simpler and more generic. Once we add a user we potentially need to add templating etc for the username as well. This is probably something we want to do but it seems better to defer such efforts to when we have Helm charts to handle such templating properly. We could also investigate approaches where we add the user accounts via K8s directly rather than baking them into the container.
local-remote-execution/rbe_configs_gen_adjustments.diff
line 14 at r3 (raw file):
Previously, zbirenbaum (Zach Birenbaum) wrote…
I think that it should still be set to the correct string for windows since it looks weird to anyone exploring the code. They might think the reason it doesn't work is because of this. I think a note should be added stating windows is unsupported and why in the commit message, as well as here or in a README since having this check gives the impression it is supported.
This isn't visible on Reviewable, but this code is actually just a patch. It seems better to not change anything that doesn't have any relevant influence on how rbe-configs-gen
behaved before this patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 5 LGTMs obtained, and pending CI: Remote / large-ubuntu-22.04 (waiting on @adam-singer, @blakehatch, @MarcusSorealheis, and @zbirenbaum)
deployment-examples/kubernetes/worker-lre-cc.json.template
line 27 at r2 (raw file):
Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…
'HOME/.cache`seems like the most intuitive approach. It's a dotdir, but this is also the path that other caches use.
However, I'll hold off on this for now since it's unclear to me how to implement the user properly. The current setup doesn't set up users at all which makes it a bit simpler and more generic. Once we add a user we potentially need to add templating etc for the username as well. This is probably something we want to do but it seems better to defer such efforts to when we have Helm charts to handle such templating properly. We could also investigate approaches where we add the user accounts via K8s directly rather than baking them into the container.
In that case lets make a /nativelink
directory. /tmp
is certainly not the right approach.
02e0fc3
to
1ca2a24
Compare
This refactors the entire remote execution setup. We now use "base" images to supply toolchains and have wrappers to create nativelink workers from those base images. This allows us to "enrich" arbitrary toolchain containers to turn them into nativelink workers. In other words, we now have a framework to import non-Nix containers into our Nix infrastructure, such as "classic" Ubuntu-based toolchain containers. Toolchain generation is now arbitrarily fine-grained. In practice, this means that for instance the Java and C++ toolchains are now separate entities. This has a large impact on the efficiency of multi-toolchain deployments. The Kubernetes example has been updated accordingly. As a side effect of the new container structures the K8s deployment now works without root permissions in the nativelink containers. The LRE infrastructure is now treated as a special case of the new toolchain setup process. The `rbe-configs-gen` logic is now an implementation detail and the generator logic is no longer carried over into the final worker images. This brings down the image size for the LRE containers from ~2.5GB to ~1.7GB for C++ and ~600MB for Java. The slight overall reduction in container sizes is due to the omission of the Bazel executable. Bazel is required to generate the Starlark toolchain configurations but doesn't have to be present in the final worker images.
1ca2a24
to
81295ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 5 LGTMs obtained, and pending CI: docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), windows-2022 / stable (waiting on @adam-singer, @blakehatch, @MarcusSorealheis, and @zbirenbaum)
deployment-examples/kubernetes/worker-lre-cc.json.template
line 27 at r2 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
In that case lets make a
/nativelink
directory./tmp
is certainly not the right approach.
Found that not creating a user runs the worker with root permissions. Added a user to the worker wrapper to support rootless usage by default and using the ~/.cache
path by default. We might want to change this to HOME/.cache
when implement this in a more templated fashion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 12 files at r2, 10 of 10 files at r4, 2 of 2 files at r5, 3 of 3 files at r6, all commit messages.
Reviewable status: 2 of 5 LGTMs obtained (waiting on @blakehatch, @MarcusSorealheis, and @zbirenbaum)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 4 LGTMs obtained (waiting on @blakehatch and @zbirenbaum)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 2 of 2 LGTMs obtained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 2 LGTMs obtained, and 1 discussions need to be resolved
flake.nix
line 142 at r6 (raw file):
contents = [ nativelink pkgs.dockerTools.caCertificates
here
This refactors the entire remote execution setup.
We now use "base" images to supply toolchains and have wrappers to create nativelink workers from those base images. This allows us to "enrich" arbitrary toolchain containers to turn them into nativelink workers.
In other words, we now have a framework to import non-Nix containers into our Nix infrastructure, such as "classic" Ubuntu-based toolchain containers.
Toolchain generation is now arbitrarily fine-grained. In practice, this means that for instance the Java and C++ toolchains are now separate entities. This has a large impact on the efficiency of multi-toolchain deployments. The Kubernetes example has been updated accordingly.
The LRE infrastructure is now treated as a special case of the new toolchain setup process. The
rbe-configs-gen
logic is now an implementation detail and the generator logic is no longer carried over into the final worker images. This brings down the image size for the LRE containers from ~2.5GB to ~1.7GB for C++ and ~600MB for Java. The slight overall reduction in container sizes is due to the omission of the Bazel executable. Bazel is required to generate the Starlark toolchain configurations but doesn't have to be present in the final worker images.This change is