How to install an nvkind cluster on a node that can only access the external network by setting http_proxy #51

@markluofd

I have a node with 8x A800 NVIDIA GPUs, but this node can only access the external network when http_proxy and https_proxy are set.

When I deploy an nvkind cluster, I get the following error:

nvkind cluster create --name gpu-cluster --config-template=one-worker-per-gpu.yaml
Creating cluster "gpu-cluster" ...
✓ Ensuring node image (kindest/node:v1.34.0) 🖼
✓ Preparing nodes 📦 📦 📦 📦 📦 📦 📦 📦 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜
Set kubectl context to "kind-gpu-cluster"
You can now use your cluster with:

kubectl cluster-info --context kind-gpu-cluster

Not sure what to do next? 😅 Check out https://kind.sigs.k8s.io/docs/user/quick-start/
Ign:1 http://deb.debian.org/debian bookworm InRelease
Ign:2 http://deb.debian.org/debian bookworm-updates InRelease
Ign:3 http://deb.debian.org/debian-security bookworm-security InRelease
Ign:1 http://deb.debian.org/debian bookworm InRelease
Ign:2 http://deb.debian.org/debian bookworm-updates InRelease
Ign:3 http://deb.debian.org/debian-security bookworm-security InRelease
Ign:1 http://deb.debian.org/debian bookworm InRelease
Ign:2 http://deb.debian.org/debian bookworm-updates InRelease
Ign:3 http://deb.debian.org/debian-security bookworm-security InRelease
Err:1 http://deb.debian.org/debian bookworm InRelease
Temporary failure resolving 'agent.baidu.com'
Err:2 http://deb.debian.org/debian bookworm-updates InRelease
Temporary failure resolving 'agent.baidu.com'
Err:3 http://deb.debian.org/debian-security bookworm-security InRelease
Temporary failure resolving 'agent.baidu.com'
Reading package lists...
W: Failed to fetch http://deb.debian.org/debian/dists/bookworm/InRelease Temporary failure resolving 'agent.baidu.com'
W: Failed to fetch http://deb.debian.org/debian/dists/bookworm-updates/InRelease Temporary failure resolving 'agent.baidu.com'
W: Failed to fetch http://deb.debian.org/debian-security/dists/bookworm-security/InRelease Temporary failure resolving 'agent.baidu.com'
W: Some index files failed to download. They have been ignored, or old ones used instead.
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package gpg
bash: line 4: gpg: command not found

Is there a way to fix the gpu-cluster-worker container being unable to access the external network?
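
The `Temporary failure resolving 'agent.baidu.com'` lines suggest that apt inside the node container is already pointed at the proxy but cannot resolve the proxy's hostname (this assumes agent.baidu.com is the proxy host set in http_proxy). Below is a minimal diagnostic sketch for checking that from the host. It assumes Docker is the container runtime and that `getent` is available in the node image. `proxy_host` and `check_node_dns` are hypothetical helper names for illustration, not part of nvkind or kind.

```shell
# Extract the host part of a proxy URL such as http://user:pass@host:port/
# (sketch only: does not handle bracketed IPv6 literals).
proxy_host() {
  local url="$1"
  url="${url#*://}"   # drop the scheme, if any
  url="${url%%/*}"    # drop any path
  url="${url##*@}"    # drop credentials, if any
  url="${url%%:*}"    # drop the port, if any
  printf '%s\n' "$url"
}

# Ask a kind node container to resolve a hostname.
# A non-zero exit status means the name did not resolve inside the container.
check_node_dns() {
  local node="$1" host="$2"
  docker exec "$node" getent hosts "$host"
}
```

For example, `check_node_dns gpu-cluster-worker "$(proxy_host "$http_proxy")"` would show whether the worker container can resolve the proxy host, and `docker exec gpu-cluster-worker env | grep -i proxy` would show which proxy variables it actually received.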
