New slurm read plugins (do not necessarily merge) #1198

Open, wants to merge 5 commits into base: master

rezib commented Aug 4, 2015

Slurm is a job scheduler and workload manager that is very popular among the HPC community.

This pull request proposes 2 new read plugins to get metrics out of Slurm:

  • The slurmctld plugin, which sends RPCs to the slurmctld daemon to get the number of CPUs and nodes allocated to jobs. It relies on the libslurm library. Please note that this plugin requires the Globals true parameter, since the libslurm library itself loads a second layer of plugins with dlopen(). This plugin is designed to run on the Slurm job scheduler server of the supercomputer.
  • The slurmd plugin, which browses the Slurm job-specific (cpuset) cgroups to get the jiffies of the CPUs allocated to jobs and sums the memory PSS of all PIDs of each job. This plugin requires TaskPlugin=task/cgroup to be enabled in the Slurm configuration file. This plugin is designed to run on all compute nodes of a supercomputer.
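
To make the "sum of memory PSS" step concrete, here is a minimal sketch of how the PSS of one process could be totalled from a file in the /proc/<pid>/smaps format. It is not the code from this pull request: the function name sum_pss_kb is hypothetical, and the sketch assumes the usual Linux layout where each mapping reports a line such as "Pss:  40 kB".

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not taken from the PR: add up every "Pss:" line
 * (values are in kB) of an smaps-formatted file.
 * Returns the total in kB, or -1 if the file cannot be opened. */
long long sum_pss_kb(const char *smaps_path)
{
    assert(smaps_path != NULL);

    FILE *fh = fopen(smaps_path, "r");
    if (fh == NULL)
        return -1;

    char line[512];
    long long total = 0;
    while (fgets(line, sizeof(line), fh) != NULL) {
        long long kb;
        /* Matches only lines that start with "Pss:"; lines such as
         * "SwapPss:" or "Pss_Anon:" fail the literal match and are skipped. */
        if (sscanf(line, "Pss: %lld", &kb) == 1)
            total += kb;
    }

    fclose(fh);
    return total;
}
```

A per-job figure would then be the sum of this value over every PID listed in the job's cgroup, for example by calling sum_pss_kb on "/proc/<pid>/smaps" for each PID.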

My intent is not necessarily to have these plugins merged into the upstream collectd repository (unless you would be happy to do so), but I would be pleased to get a code review from peers and from people experienced in developing such collectd plugins :)

I've tried to follow the development guidelines and re-use existing functions as much as possible. As I am not an experienced C developer very familiar with portability considerations (only tested on Linux and amd64), I would be pleased to get feedback on this matter too!

@mfournier mfournier added the New plugin label Aug 4, 2015

rezib added some commits Jul 29, 2015

@rezib rezib force-pushed the edf-hpc:slurm branch from b7727e4 to c4d1c27 Sep 10, 2015

rezib commented Sep 10, 2015

I just fixed my commit for x86_32 and rebased it on top of the current master branch.

hmlth and others added some commits Feb 16, 2018

slurmd: add support for cgroups in slurm_NODENAME
When slurmd is built with support for multiple slurmd per node, the
name of the subdirectory where the cgroups are created changes to
include the hostname (cpuset/slurm_NODENAME instead of
cpuset/slurm). This patch supports this by trying both and adding an
option to select a node name if the nodename is not the local
hostname.
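
The "trying both" lookup described in this commit message could look roughly like the sketch below. It is an illustration only, not the patch itself: the function name find_slurm_cpuset_dir and its parameters are hypothetical, and checking the node-specific directory first is an assumption, since the message does not state the order.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Return non-zero if path exists and is a directory. */
static int is_dir(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* Hypothetical helper, not taken from the patch: look for the Slurm cpuset
 * cgroup directory below cgroup_root. Tries cpuset/slurm_<nodename> first
 * (assumed order), then cpuset/slurm. On success the path is written to buf
 * and 0 is returned; -1 means neither directory exists or buf is too small. */
int find_slurm_cpuset_dir(const char *cgroup_root, const char *nodename,
                          char *buf, size_t buflen)
{
    assert(cgroup_root != NULL && nodename != NULL && buf != NULL);

    int n;
    if (strlen(nodename) > 0) {
        n = snprintf(buf, buflen, "%s/cpuset/slurm_%s", cgroup_root, nodename);
        if (n >= 0 && (size_t)n < buflen && is_dir(buf))
            return 0;
    }

    n = snprintf(buf, buflen, "%s/cpuset/slurm", cgroup_root);
    if (n >= 0 && (size_t)n < buflen && is_dir(buf))
        return 0;

    return -1;
}
```

Here nodename would be either the local hostname or the value of the new node name option, and cgroup_root the configured cgroup mount point.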
slurmd: set default cgroup path to /sys/fs/cgroup
This is the default in Slurm starting with version 16.05.