Failed to query password file entry for "user" #124

Closed
nbacking opened this issue Aug 27, 2022 · 2 comments

@nbacking

I rebooted a cluster head node yesterday. The cluster runs Slurm, which uses MUNGE to authenticate. Last night I started getting error messages when submitting jobs. I am very new to all of this, but after reading a lot last night I was able to verify that munged looks like it is running (output below): systemctl status -l munge says it's running, and munge -n | unmunge works.

But when I submit a job I still get the errors below. Any help would be appreciated.

If munged is up, restart with --num-threads=10
Munge encode failed: Unable to access "/var/run/munge/munge.socket.2": No such file or directory
authentication: invalid authentication credential
batch job submission failed: protocol authentication error

[root@master01 munge]# systemctl status -l munge
● munge.service - MUNGE authentication service
Loaded: loaded (/usr/lib/systemd/system/munge.service; static; vendor preset: disabled)
Active: active (running) since Fri 2022-08-26 23:11:05 EDT; 9h ago
Docs: man:munged(8)
Process: 41542 ExecStart=/usr/sbin/munged --force (code=exited, status=0/SUCCESS)
Main PID: 41544 (munged)
Tasks: 4
CGroup: /system.slice/munge.service
└─41544 /usr/sbin/munged --force

Aug 26 23:11:05 master01 systemd[1]: Starting MUNGE authentication service...
Aug 26 23:11:05 master01 systemd[1]: Started MUNGE authentication service.
[root@master01 munge]# munge -n | unmunge
STATUS: Success (0)
ENCODE_HOST: master01.cm.cluster (10.141.255.254)
ENCODE_TIME: 2022-08-27 08:50:40 -0400 (1661604640)
DECODE_TIME: 2022-08-27 08:50:40 -0400 (1661604640)
TTL: 300
CIPHER: aes128 (4)
MAC: sha256 (5)
ZIP: none (0)
UID: root (0)
GID: root (0)
LENGTH: 0

@dun
Owner

dun commented Aug 28, 2022

munged appears to be running. From your output above, you are able to encode and decode credentials on host "master01".

The munged socket is created when the daemon starts, and removed when the daemon gracefully terminates. The default location of the socket is listed in the munged --help message for the --socket option (shown in brackets). For example:

$ /usr/sbin/munged --help | grep socket=
  -S, --socket=PATH         Specify local socket [/run/munge/munge.socket.2]

$ /usr/sbin/munged --help | sed -ne '/socket=/ s/.*\[\(.*\)\]/\1/p'
/run/munge/munge.socket.2

The above errors for sbatch and squeue (Failed to access "/var/run/munge/munge.socket.2": No such file or directory) appear to show that munged is not running on the host that invoked sbatch and squeue. Check if munged is running on that host as well. If it is running, you should see the socket /var/run/munge/munge.socket.2.
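A quick way to run that check on the submitting host could look like the sketch below. The function name check_munge_socket is made up for illustration, and the default path is simply the one from the error message; pass a different path if munged was started with another --socket value.

```shell
# Hypothetical helper: report whether a MUNGE socket exists at a given path.
# The default is the path from the error message in this issue (an assumption
# about your setup); pass another path as the first argument if needed.
check_munge_socket() {
    munge_sock="${1:-/var/run/munge/munge.socket.2}"
    if [ -S "$munge_sock" ]; then
        # -S is true only if the path exists and is a socket
        echo "ok: $munge_sock is a socket"
    else
        echo "missing: $munge_sock (is munged running on this host?)"
        return 1
    fi
}
```

Run check_munge_socket on the node where sbatch or squeue fails. If it reports "missing", systemctl status munge on that same node is the next thing to look at.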

munged needs to be running on all nodes in the cluster, and its key file will need to be securely copied to all nodes as well.
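One possible way to confirm that two nodes hold the same key is to compare digests rather than the key itself. This is only a sketch: key_digest is a made-up helper name, node001 is a placeholder host, and /etc/munge/munge.key is assumed to be the key path on your systems.

```shell
# Print only the SHA-256 digest of a file (no file name), so digests taken
# on different hosts can be compared as plain strings.
key_digest() {
    sha256sum "$1" | awk '{print $1}'
}

# Example comparison (run as a user allowed to read the key; node001 and the
# key path are assumptions, adjust them for your cluster):
#   local_sum=$(key_digest /etc/munge/munge.key)
#   remote_sum=$(ssh node001 sha256sum /etc/munge/munge.key | awk '{print $1}')
#   [ "$local_sum" = "$remote_sum" ] && echo "keys match" || echo "keys differ"
#
# End-to-end check: encode a credential here, decode it on the other node.
#   munge -n | ssh node001 unmunge
```

If the remote unmunge reports success, munged is running on both ends and the keys agree.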

Regarding this issue's title (Failed to query password file entry for "user"), the following message can be generated by munged:

Info: Failed to query passwd file for "foo": User not found

This is an informational message that occurs when the /etc/group file contains a group to which user "foo" belongs, but user "foo" is not listed in the /etc/passwd file.
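To see which names would trigger that message, something like the following sketch can list group members that have no passwd entry. The function name orphan_group_members is invented for illustration, and it only reads the local files; users that come from LDAP, NIS, or another name service would need getent passwd output instead.

```shell
# List user names that appear as members in a group file but have no entry
# in a passwd file. Arguments default to the local system files.
orphan_group_members() {
    ogm_group="${1:-/etc/group}"
    ogm_passwd="${2:-/etc/passwd}"
    awk -F: '
        NR == FNR { known[$1] = 1; next }   # first file: collect passwd names
        $4 != "" {                          # second file: groups with members
            n = split($4, members, ",")
            for (i = 1; i <= n; i++)
                if (!(members[i] in known)) print members[i]
        }
    ' "$ogm_passwd" "$ogm_group" | sort -u
}
```

Calling orphan_group_members with no arguments checks /etc/group against /etc/passwd. Each name it prints can either be removed from the group file or added back as a user.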

@dun dun added the question label Aug 28, 2022
@nbacking
Author

nbacking commented Aug 28, 2022

Thanks, that worked. The issue was not on the head node but on a cluster node. It was just a coincidence that I rebooted the cluster at the same time this happened, which was confusing. I found the issue on the graphical node, and as soon as I restarted the service there I was good as gold. Thanks for the help.
