Description
When dstack creates a jump pod, it picks an external IP of a random node:
dstack/src/dstack/_internal/core/backends/kubernetes/utils.py, lines 57 to 64 in fb4a4da:

```python
def get_cluster_public_ip(api: CoreV1Api) -> Optional[str]:
    """
    Returns public IP of any cluster node.
    """
    public_ips = get_cluster_public_ips(api)
    if len(public_ips) == 0:
        return None
    return public_ips[0]
```
This works in most cases, since every node listens on the assigned NodePort and routes traffic:

> Every node in the cluster configures itself to listen on that assigned port and to forward traffic to one of the ready endpoints associated with that Service. You'll be able to contact the type: NodePort Service, from outside the cluster, by connecting to any node using the appropriate protocol (for example: TCP), and the appropriate port (as assigned to that Service).
https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport
But on some managed solutions, namely Nebius, the NodePort is only accessible via the external IP of the node where the pod is running. Private IPs behave as expected: the service is reachable from any node using any node's private IP. With public IPs, however, traffic forwarding does not work for some reason.
To mitigate the issue, dstack should use the external IP of the pod's node when possible.
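As a rough sketch of that mitigation, dstack could resolve the node the jump pod was scheduled to via the pod's `spec.nodeName` and prefer that node's `ExternalIP` address. The helper name `get_pod_node_public_ip` below is hypothetical and not part of dstack; the `api` argument is assumed to behave like `kubernetes.client.CoreV1Api`:

```python
from typing import Optional


def get_pod_node_public_ip(api, namespace: str, pod_name: str) -> Optional[str]:
    """Return the external IP of the node the given pod runs on, if any.

    `api` is assumed to behave like kubernetes.client.CoreV1Api
    (hypothetical helper, for illustration only).
    """
    pod = api.read_namespaced_pod(pod_name, namespace)
    node_name = pod.spec.node_name
    if node_name is None:
        # The pod has not been scheduled to a node yet.
        return None
    node = api.read_node(node_name)
    # Node addresses carry a type such as "InternalIP" or "ExternalIP".
    for address in node.status.addresses or []:
        if address.type == "ExternalIP":
            return address.address
    return None
```

If the pod's node has no external IP (or the pod is unscheduled), the caller could fall back to the existing `get_cluster_public_ip` behavior of picking any node's IP.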