New slurm read plugins (do not necessarily merge) #1198

Closed
wants to merge 5 commits into from


@rezib rezib commented Aug 4, 2015

Slurm is a job scheduler and workload manager that is very popular in the HPC community.

This pull request proposes 2 new read plugins to get metrics out of Slurm:

  • The slurmctld plugin, which sends RPCs to the slurmctld daemon to get the number of CPUs and nodes allocated to jobs. It relies on the libslurm library. Please note that this plugin requires the Globals true parameter, since libslurm itself loads a second layer of plugins with dlopen(). This plugin is designed to run on the Slurm job scheduler server of the supercomputer.
  • The slurmd plugin, which browses the Slurm job-specific (cpuset) cgroups to get the jiffies of the CPUs allocated to jobs and sums the memory PSS of all PIDs of each job. This plugin requires TaskPlugin=task/cgroup to be enabled in the Slurm configuration file. This plugin is designed to run on all compute nodes of a supercomputer.
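
To illustrate the PSS part of the slurmd plugin, here is a minimal sketch of summing the `Pss:` fields of a smaps-formatted stream. It assumes the per-PID values are read from `/proc/<pid>/smaps`; the function names `sum_pss_kb` and `pid_pss_kb` are hypothetical and are not taken from the plugin code in this pull request.

```c
#include <stdio.h>

/* Hypothetical helper: sum every "Pss:" field (in kB) found in a
 * smaps-formatted stream. Returns the total in kB, or -1 if fh is NULL.
 * Lines such as "SwapPss:" or "Pss_Anon:" do not match the format and
 * are ignored. */
long long sum_pss_kb(FILE *fh)
{
    char line[256];
    long long total = 0;
    long long value;

    if (fh == NULL)
        return -1;

    while (fgets(line, sizeof(line), fh) != NULL) {
        /* The literal "Pss:" must match at the very start of the line. */
        if (sscanf(line, "Pss: %lld kB", &value) == 1)
            total += value;
    }
    return total;
}

/* Hypothetical helper: PSS of one PID in kB, assuming the values come
 * from /proc/<pid>/smaps. Returns -1 if the file cannot be opened. */
long long pid_pss_kb(int pid)
{
    char path[64];
    FILE *fh;
    long long total;

    snprintf(path, sizeof(path), "/proc/%d/smaps", pid);
    fh = fopen(path, "r");
    if (fh == NULL)
        return -1;

    total = sum_pss_kb(fh);
    fclose(fh);
    return total;
}
```

A per-job figure would then be the sum of `pid_pss_kb()` over the PIDs listed in the job's cgroup, skipping PIDs that return -1 because the process exited in the meantime.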

My intent is not necessarily to merge these plugins into the upstream collectd repository (unless you would be happy to do so), but I would be pleased to get a code review from peers and people used to developing such collectd plugins :)

I've tried to follow the development guidelines and re-use existing functions as much as possible. As I am not an experienced C developer very familiar with portability considerations (only tested on Linux and amd64), I would be pleased to get feedback on this matter too!

@rezib
Author

rezib commented Sep 10, 2015

I just fixed my commit for x86_32 and rebased it on top of the current master branch.

hmlth and others added 3 commits February 16, 2018 17:41
When slurmd is built with support for multiple slurmd per node, the
name of the subdirectory where the cgroups are created changes to
include the hostname (cpuset/slurm_NODENAME instead of
cpuset/slurm). This patch supports this by trying both, and adds an
option to select a node name if the node name is not the local
hostname.
This is the default in Slurm starting with version 16.05.
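
The "try both" lookup described in this commit message could be sketched as follows. This is only an illustration under stated assumptions: the function name `find_slurm_cpuset_dir`, its parameters, and the order of the two attempts (node-specific directory first) are hypothetical and are not taken from the actual patch.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Return 1 if path exists and is a directory, 0 otherwise. */
static int is_dir(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* Hypothetical helper: find the Slurm cpuset cgroup directory under
 * cgroup_root. Tries <root>/cpuset/slurm_<nodename> first, then falls
 * back to <root>/cpuset/slurm (the order is an assumption).
 * On success the path is written to out and 0 is returned; if neither
 * directory exists, or a path does not fit in out, -1 is returned and
 * out is set to the empty string. */
int find_slurm_cpuset_dir(const char *cgroup_root, const char *nodename,
                          char *out, size_t out_len)
{
    int n;

    n = snprintf(out, out_len, "%s/cpuset/slurm_%s", cgroup_root, nodename);
    if (n > 0 && (size_t)n < out_len && is_dir(out))
        return 0;

    n = snprintf(out, out_len, "%s/cpuset/slurm", cgroup_root);
    if (n > 0 && (size_t)n < out_len && is_dir(out))
        return 0;

    if (out_len > 0)
        out[0] = '\0';
    return -1;
}
```

The `nodename` argument would come from the local hostname by default, or from the new configuration option when the Slurm node name differs from the hostname.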
@lukeyeager
Contributor

Can probably close this since #3037 was merged.

@mrunge
Member

mrunge commented Jul 30, 2020

Yes, thank you all involved here. I'm closing this, since #3037 was merged.

@mrunge mrunge closed this Jul 30, 2020
6 participants