Add Kubernetes example #479
Conversation
+@adam-singer +@allada +@MarcusSorealheis +@blakehatch
FYI the first revision here can be skipped as it's identical to #471
@allada This is working in my setup and it seems like the cluster also starts up fine in CI. But the build invocation in CI triggers an "UNIMPLEMENTED" error. It seems like the cluster is reachable, but something goes wrong during communication. Maybe I'm doing something wrong in the .json configs or platform properties? Is there a way to get more info out of the UNIMPLEMENTED message?
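For what it's worth, UNIMPLEMENTED is gRPC status code 12, and in my experience it usually means the endpoint that was reached doesn't serve the requested service/method at all (e.g. wrong port, or a `remote_instance_name` the server doesn't know), rather than a mid-stream failure. A tiny self-contained lookup table for the status codes involved here (the mapping is from the gRPC spec; the function name is made up):

```shell
# Maps a numeric gRPC status code to its name, for reading client logs.
# Codes per the gRPC spec: 0=OK, 5=NOT_FOUND, 12=UNIMPLEMENTED, 14=UNAVAILABLE.
grpc_status_name() {
  case "$1" in
    0)  echo "OK" ;;
    5)  echo "NOT_FOUND" ;;
    12) echo "UNIMPLEMENTED" ;;
    14) echo "UNAVAILABLE" ;;
    *)  echo "UNKNOWN_CODE_$1" ;;
  esac
}

grpc_status_name 12  # prints "UNIMPLEMENTED"
```

So a first thing to rule out might be that the CI cluster exposes the Capabilities/Execution services on the address the `.bazelrc` points at, e.g. by probing with `grpcurl -plaintext <addr> list` if it's available.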
Reviewable status: 0 of 39 files reviewed, all discussions resolved (waiting on @adam-singer, @allada, @blakehatch, and @MarcusSorealheis)
Reviewed 17 of 19 files at r2, all commit messages.
Reviewable status: 17 of 39 files reviewed, 13 unresolved discussions (waiting on @aaronmondal, @adam-singer, @blakehatch, and @MarcusSorealheis)
deployment-examples/kubernetes/00_infra.sh
line 41 at r2 (raw file):
for node in $(kind get nodes); do
  docker exec "${node}" mkdir -p "${REGISTRY_DIR}"
  cat <<EOF | docker exec -i "${node}" cp /dev/stdin "${REGISTRY_DIR}/hosts.toml"
nit: This seems overly fancy.
deployment-examples/kubernetes/cas.json
line 4 at r2 (raw file):
// `~/.cache/native-link`. It will store all data on disk and
// allows for restarts of the underlying service. It is optimized
// so objects are compressed, deduplicated and uses some in-memory
nit: There are no in-memory operations.
deployment-examples/kubernetes/cas.json
line 13 at r2 (raw file):
"compression": {
  "compression_algorithm": {
    "LZ4": {}
nit: Now lower case in main.
deployment-examples/kubernetes/cas.json
line 27 at r2 (raw file):
} }, "verify_size": true,
fyi: very slow,
deployment-examples/kubernetes/cas.json
line 32 at r2 (raw file):
}, "AC_MAIN_STORE": { "filesystem": {
nit: From now on, if we ever evict in the CAS, we should use completeness_checking_store (on ac) w/ existence_cache_store (on cas).
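To make sure I understand the suggested layering, here is a hedged sketch of how I'd read it, written as a heredoc in the style of 00_infra.sh. The store and field names (`completeness_checking`, `existence_cache`, `ref_store`, the `*_FILESYSTEM_STORE` backends) are my guesses and would need to be checked against the actual NativeLink config schema:

```shell
# Hypothetical sketch only: AC wrapped in a completeness check that
# consults the CAS, and the CAS wrapped in an existence cache.
# Field names are assumptions, not the verified schema.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
{
  "stores": {
    "AC_MAIN_STORE": {
      "completeness_checking": {
        "backend": { "ref_store": { "name": "AC_FILESYSTEM_STORE" } },
        "cas_store": { "ref_store": { "name": "CAS_MAIN_STORE" } }
      }
    },
    "CAS_MAIN_STORE": {
      "existence_cache": {
        "backend": { "ref_store": { "name": "CAS_FILESYSTEM_STORE" } }
      }
    }
  }
}
EOF
grep -q completeness_checking "$cfg" && echo "sketch written to $cfg"
```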
deployment-examples/kubernetes/cas.json
line 61 at r2 (raw file):
},
// According to https://github.com/grpc/grpc.github.io/issues/371 16KiB - 64KiB is optimal.
"max_bytes_per_stream": 64000, // 64kb.
nit: No longer needed in main.
deployment-examples/kubernetes/cas.json
line 96 at r2 (raw file):
},
// According to https://github.com/grpc/grpc.github.io/issues/371 16KiB - 64KiB is optimal.
"max_bytes_per_stream": 64000, // 64kb.
nit: ditto.
deployment-examples/kubernetes/example-do-not-use-in-prod-key.pem
line 1 at r2 (raw file):
-----BEGIN PRIVATE KEY-----
nit: Symlink this file. Git does support symlinks.
deployment-examples/kubernetes/example-do-not-use-in-prod-rootca.crt
line 1 at r2 (raw file):
-----BEGIN CERTIFICATE-----
nit: ditto.
deployment-examples/kubernetes/worker.json
line 3 at r2 (raw file):
{ "stores": { "GRPC_LOCAL_STORE": {
nit: inline this?
deployment-examples/kubernetes/worker.json
line 4 at r2 (raw file):
"stores": { "GRPC_LOCAL_STORE": { // Note: This file is used to test GRPC store.
nit: remove?
deployment-examples/kubernetes/worker.json
line 12 at r2 (raw file):
}, "GRPC_LOCAL_AC_STORE": { // Note: This file is used to test GRPC store.
nit: ditto.
deployment-examples/kubernetes/worker.json
line 57 at r2 (raw file):
}, "container-image": { "values": ["docker://native-link-toolchain:li84k8gzw2qvmzc9qrx6s2vc690lfy2b"]
nit: Can we sniff this out somehow with a command and then use an ENV here to set it?
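One possible shape for that (everything here is illustrative: `IMAGE_REF` would come from whatever command actually produces the tag, and `__WORKER_IMAGE__` is a made-up placeholder, not something in the current manifests): resolve the tag once, export it, and substitute it into the config instead of hard-coding it.

```shell
# Illustrative only: in practice IMAGE_REF would be computed, e.g. by
# `nix eval ...` or `skopeo inspect ...`; hard-coded here so the sketch
# is self-contained. __WORKER_IMAGE__ is a hypothetical placeholder.
IMAGE_REF="docker://native-link-toolchain:some-tag"
tmpl="$(mktemp)"
out="$(mktemp)"
printf '"values": ["__WORKER_IMAGE__"]\n' > "$tmpl"
sed "s|__WORKER_IMAGE__|${IMAGE_REF}|" "$tmpl" > "$out"
cat "$out"  # prints: "values": ["docker://native-link-toolchain:some-tag"]
```

`envsubst` would work just as well as `sed` here; the point is only that the tag lives in one environment variable.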
deployment-examples/kubernetes/worker.yaml
line 21 at r2 (raw file):
env:
  - name: RUST_LOG
    value: debug
nit: Very verbose. Maybe info instead?
Reviewed 18 of 25 files at r1, 5 of 21 files at r3, 18 of 18 files at r4, all commit messages.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @aaronmondal, @blakehatch, and @MarcusSorealheis)
.bazelrc
line 76 at r4 (raw file):
build:k8s --config=lre
build:k8s --remote_instance_name=main
build:k8s --remote_cache=grpc://172.20.255.200:50051
What controls the IP configuration, is it always ensured to be these two addresses? Are there additional steps that can make it a hostname from fancy docker commands/configs?
Reviewable status: 6 of 43 files reviewed, 4 unresolved discussions (waiting on @adam-singer, @allada, @blakehatch, and @MarcusSorealheis)
.bazelrc
line 76 at r4 (raw file):
Previously, adam-singer (Adam Singer) wrote…
What controls the IP configuration, is it always ensured to be these two addresses? Are there additional steps that can make it a hostname from fancy docker commands/configs?
Yeah, turns out these can slightly differ. On my machines it's these IPs, but on the GHA runners it's 172.18.xxx. I've changed the k8s readme to show how to find out these IPs.
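A sketch of how the range could be derived instead of hard-coded, assuming the kind Docker network is the source of truth (the subnet value is hard-coded below so the example stands alone; in practice it would come from `docker network inspect kind`):

```shell
# In a real setup, fetch the subnet from Docker first, e.g.:
#   subnet="$(docker network inspect kind \
#     -f '{{(index .IPAM.Config 0).Subnet}}')"
# Hard-coded here so the sketch is self-contained:
subnet="172.18.0.0/16"

# Strip the last two octets and the mask to get the /16 prefix.
prefix="${subnet%.*.*}"   # -> 172.18

# Matches the .255.200-.255.250 convention used in this example.
echo "MetalLB range: ${prefix}.255.200-${prefix}.255.250"
```

The computed range would then feed the MetalLB address pool and the `--remote_cache` line in `.bazelrc`, so both stay in sync on machines where the kind network gets a different subnet.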
deployment-examples/kubernetes/00_infra.sh
line 41 at r2 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
nit: This seems overly fancy.
This pattern was masterfully copied from https://kind.sigs.k8s.io/docs/user/local-registry/ :D But it's a fairly common pattern in K8s scripts, where we often have to pipe some file into a command while leaving no temporary files behind on the machine.
It even has its own Wikipedia article 😆 https://en.wikipedia.org/wiki/Here_document
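As a self-contained illustration of the pattern: the heredoc is streamed through the pipe straight into the consuming command, so the config never exists as a separate temp file. In 00_infra.sh the consumer is `docker exec -i ... cp /dev/stdin ...`; plain `cp /dev/stdin` behaves the same locally (the hosts.toml content below is just an example, not the exact file from the script):

```shell
# Write a file via a here-document piped into `cp /dev/stdin`:
# no intermediate temp file is ever created for the content itself.
dir="$(mktemp -d)"
cat <<EOF | cp /dev/stdin "${dir}/hosts.toml"
[host."http://localhost:5001"]
  capabilities = ["pull", "resolve"]
EOF
grep -q capabilities "${dir}/hosts.toml" && echo "written"
```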
deployment-examples/kubernetes/cas.json
line 27 at r2 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
fyi: very slow,
Is it ok to remove this when using the existence caching store?
deployment-examples/kubernetes/cas.json
line 32 at r2 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
nit: From now on, if we ever evict in the CAS, we should use completeness_checking_store (on ac) w/ existence_cache_store (on cas).
Not sure if this is correct. The order is a bit confusing 😅
deployment-examples/kubernetes/example-do-not-use-in-prod-key.pem
line 1 at r2 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
nit: Symlink this file. Git does support symlinks.
Agreed that it would be nicer. However, Kustomize doesn't allow referencing files outside the directory where the kustomization.yaml file is located.
I plan to change this setup to a "better practice" one using cert-manager, but defer this to another PR as a cert-manager setup seems out of scope at the moment.
LGTM
Reviewed 18 of 37 files at r5, all commit messages.
Reviewable status: 24 of 43 files reviewed, 3 unresolved discussions (waiting on @allada and @blakehatch)
This example starts a fairly complete Kubernetes cluster and showcases perfectly reproducible remote execution via the local remote execution toolchain containers.

This example uses a three-layer setup process:

1. The infra layer is a kind cluster with Cilium and MetalLB. This layer is built to be easily swappable with more "production grade" clusters.
2. The operations layer deploys a few standard applications that are not inherently required for NativeLink, but are solid deployments that one would likely want running in a cluster. This includes monitoring and handling image availability.
3. The application layer is a straightforward `kubectl apply -k .` which deploys a NativeLink CAS, Worker and Scheduler. This deployment differs from the Docker Compose setup in that it does not make use of any system paths and doesn't allow visibility "outside" of the node itself. That is, it's a hard requirement that the worker image is self-contained. Storage is fully ephemeral in this example and a `kubectl delete -k .` will destroy the cache for quick iterations and cache testing.
Thanks @aaronmondal !