slurmd crashes when there is more than one gpu #27

Closed
cmd-ntrf opened this issue Feb 21, 2020 · 1 comment

cmd-ntrf commented Feb 21, 2020

Jacob Boschee (@jboschee) reported the following error on a node with two GPUs:

Feb 20 18:59:13 node2.int.rubberducky.calculquebec.cloud slurmd[57977]: fatal: gres.conf duplicate records for gpu

The corresponding gres.conf:

###########################################################
# Slurm's Generic Resource (GRES) configuration file
# Use NVML to gather GPU configuration information
# Information about all other GRES gathered from slurm.conf
###########################################################
AutoDetect=nvml
Name=gpu
Name=gpu
cmd-ntrf added the bug label Feb 21, 2020
cmd-ntrf self-assigned this Feb 21, 2020
cmd-ntrf commented

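AutoDetect=nvml is sufficient on its own: with NVML autodetection enabled, the GPUs do not need to be listed explicitly, and the repeated Name=gpu lines are what slurmd rejects as duplicate records. The solution is to remove the part of the template that generates Name=gpu.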
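
For illustration, assuming the rest of the template stays unchanged, the generated gres.conf on the two-GPU node would then reduce to something like the following. This is a sketch derived from the file quoted above, not the actual output of the fixed template:

```
###########################################################
# Slurm's Generic Resource (GRES) configuration file
# Use NVML to gather GPU configuration information
# Information about all other GRES gathered from slurm.conf
###########################################################
AutoDetect=nvml
```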

cmd-ntrf added this to the 6.0 milestone Feb 26, 2020