support for getting node name from env var instead of hostname #50

Closed
rptaylor opened this issue Aug 18, 2022 · 2 comments
Comments

@rptaylor

Hi @PalNilsson
Running jobs on Kubernetes, we face the issue that processes running in pods see the pod name as the hostname. For example, in a pod named grid-job-16703192-mp47k (which is effectively the batch job ID):

bash-4.2$ hostname
grid-job-16703192-mp47k

This makes sense for k8s, and is appropriate because each pod has its own unique IP address, but it doesn't fit well with Panda. The result is: https://bigpanda.cern.ch/wns/CA-VICTORIA-K8S-T2/?hours=12
Every "node" is a random unique ID, so it is very difficult to correlate jobs with real nodes and identify problematic nodes.

We can easily expose the real node name as an env var:

bash-4.2$ echo $MY_NODE_NAME
cluster-dev-k8s-node-a-2
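
For readers wondering how such a variable can be populated: the usual Kubernetes mechanism is the Downward API, where a container env entry takes its value from the pod field `spec.nodeName`. The thread does not say how MY_NODE_NAME was set on this cluster, so the sketch below is an assumption about the approach, and the helper name, container name, and image are hypothetical. It builds the env entry as a Python dict and prints it as JSON, which Kubernetes accepts for manifests alongside YAML.

```python
import json


def node_name_env_entry(var_name="MY_NODE_NAME"):
    """Build a container env entry that exposes the node name (hypothetical helper)."""
    # Downward API: the kubelet fills the variable from the pod field spec.nodeName.
    return {
        "name": var_name,
        "valueFrom": {"fieldRef": {"fieldPath": "spec.nodeName"}},
    }


# Hypothetical container definition; the name and image are placeholders.
container = {
    "name": "grid-job",
    "image": "example/image:latest",
    "env": [node_name_env_entry()],
}

if __name__ == "__main__":
    print(json.dumps(container, indent=2))
```

The same entry can be written directly in a YAML pod template under the container's `env` list.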

So I would like to propose that the pilot look for a specific env var (maybe PANDA_NODE_NAME or PILOT_NODE_NAME?) and, if it is set, use that instead of the result of hostname when reporting details to the Panda server. Would that be reasonable?
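
The proposed lookup could be sketched roughly as follows. This is only an illustration of the idea, not the pilot's actual code: the function name `get_node_name` is hypothetical, and the variable name is left as a parameter because the thread has not settled on one at this point (PILOT_NODE_NAME is used as the default purely for the example).

```python
import os
import socket


def get_node_name(env_var="PILOT_NODE_NAME"):
    """Return the node name to report, preferring an env var override (hypothetical helper)."""
    # Use the override only if it is set and non-empty after stripping whitespace.
    override = os.environ.get(env_var, "").strip()
    if override:
        return override
    # Otherwise fall back to the system hostname (the pod name when running in a k8s pod).
    return socket.gethostname()


if __name__ == "__main__":
    print(get_node_name())
```

With MY_NODE_NAME exposed as above, a pod would only need to export the chosen variable with the same value for the real node name to be picked up; when the variable is unset, behavior is unchanged.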

@PalNilsson
Collaborator

Hi (back from vacation). Using PILOT_NODE_NAME sounds good. I can look for it and use it instead of hostname if set.

@PalNilsson
Collaborator

Moved the discussion to https://its.cern.ch/jira/browse/ATLASPANDA-641 (I'm suggesting we use the existing env var PANDA_HOSTNAME instead of a new one).

Implemented in the dev pilot.
