New slurm read plugins (do not necessarily merge) #1198
Closed
Slurm is a job scheduler and workload manager that is very popular in the HPC community.
This pull request proposes two new read plugins to get metrics out of Slurm:
- `slurmctld` plugin, which sends RPCs to the `slurmctld` daemon to get the number of CPUs and nodes allocated to jobs. It relies on the `libslurm` library. Please note that this plugin requires the `Globals true` parameter, since `libslurm` itself loads a second layer of plugins with `dlopen()`. This plugin is designed to run on the Slurm job scheduler server of the supercomputer.
- `slurmd` plugin, which browses the Slurm job-specific (cpuset) cgroups to get the jiffies of the CPUs allocated to jobs and sums the memory PSS of all PIDs of each job. This plugin requires `TaskPlugin=task/cgroup` to be enabled in the Slurm configuration file. This plugin is designed to run on all compute nodes of a supercomputer.

My intent is not necessarily to merge these plugins into the upstream collectd repository (unless you would be happy to do so), but I would be pleased to get a code review from peers and people used to developing such collectd plugins :)
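
For reference, the two configuration requirements mentioned for the plugins could look roughly like the fragment below. This is only a sketch: it assumes the first plugin is loaded under the name `slurmctld`, and the rest of both files is left out.

```
# collectd.conf (assumed plugin name: slurmctld)
<LoadPlugin slurmctld>
  Globals true
</LoadPlugin>

# slurm.conf
TaskPlugin=task/cgroup
```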
I've tried to follow the development guidelines and reuse existing functions as much as possible. I am not an experienced C developer and am not very familiar with portability considerations (the plugins were only tested on Linux on amd64), so I would be pleased to get feedback on this matter too!
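
To make the PSS part of the `slurmd` plugin easier to discuss, here is a minimal sketch of how the PSS of one process could be summed from `/proc/<pid>/smaps` on Linux. It is not the code from this pull request: the helper names `parse_pss_line`, `sum_pss_kb` and `pid_pss_kb` are hypothetical, and the 256-byte line buffer is an arbitrary choice for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one line of /proc/<pid>/smaps.
 * Returns 1 and stores the value (in kB) if the line is a "Pss:" entry,
 * 0 otherwise. Fields such as "Pss_Anon:" or "SwapPss:" do not match. */
static int parse_pss_line(const char *line, unsigned long long *kb) {
  if (strncmp(line, "Pss:", 4) != 0)
    return 0;
  return sscanf(line + 4, "%llu", kb) == 1;
}

/* Sum every "Pss:" entry of an smaps-formatted stream, in kB. */
static unsigned long long sum_pss_kb(FILE *fp) {
  char line[256];
  unsigned long long total = 0;
  unsigned long long kb = 0;

  while (fgets(line, sizeof(line), fp) != NULL) {
    if (parse_pss_line(line, &kb))
      total += kb;
  }
  return total;
}

/* PSS of one process in kB, or 0 if its smaps file cannot be opened
 * (for example because the process has already exited). */
static unsigned long long pid_pss_kb(long pid) {
  char path[64];
  FILE *fp;
  unsigned long long total;

  snprintf(path, sizeof(path), "/proc/%ld/smaps", pid);
  fp = fopen(path, "r");
  if (fp == NULL)
    return 0;
  total = sum_pss_kb(fp);
  fclose(fp);
  return total;
}
```

A per-job value would then be the sum of `pid_pss_kb()` over the PIDs found in the job's cgroup. Returning 0 for an unreadable process is a simplification. A real plugin would probably want to distinguish "process gone" from a genuine error.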