
LIKWID and SLURM

Thomas Gruber edited this page Jun 15, 2023 · 7 revisions


Jobs on HPC systems are nowadays commonly managed through the SLURM job scheduler. If you want access to the performance monitoring features of your system, some cluster-specific flags might be required when submitting a job. A running job is commonly restricted to the requested resources, which might interfere with the execution of LIKWID (and other tools). This page contains helpful hints for users as well as configuration ideas for administrators.

Using LIKWID in SLURM jobs

likwid-perfctr

Check with your compute center whether they run some sort of job-specific monitoring that might interfere with your reading of hardware performance counters. Usually, there will be a possibility to disable the job-specific monitoring for individual jobs with additional parameters during job submission.

likwid-mpirun

Enabling CPU performance monitoring

Using the accessdaemon mode

Using the perf_event mode

To avoid possible security and privacy concerns it is advisable to set the paranoid value (see likwid-perfctr) to 0 if a compute job has allocated a compute node exclusively. An exclusive usage also avoids contention issues in local shared resources during benchmarking.
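The current setting can be inspected without root privileges by reading the sysctl file directly:

```shell
# Print the current perf_event paranoid level; 0 or lower is needed for
# unprivileged system-wide measurements with perf_event-based tools.
cat /proc/sys/kernel/perf_event_paranoid
```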

A suitable way for HPC clusters with Slurm is to configure a prolog that detects if a job is running exclusively on a node and then sets /proc/sys/kernel/perf_event_paranoid to 0. Correspondingly, an epilog is needed that sets it back to the default value of 2.
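A minimal sketch of such a prolog/epilog pair might look as follows. Assumptions: the scripts run as root on each allocated node, and a site-specific hwperf constraint (used later on this page) marks jobs whose exclusivity the submit filter has already verified; the PARANOID_FILE variable is parameterized only so the logic can be exercised without root.

```shell
#!/bin/bash
# Sketch of a SLURM prolog/epilog pair that toggles perf_event_paranoid.
# On a real node PARANOID_FILE is the /proc file below.
PARANOID_FILE="${PARANOID_FILE:-/proc/sys/kernel/perf_event_paranoid}"

set_paranoid() {
    # $1: paranoid level (0 = allow unprivileged measurement, 2 = kernel default)
    echo "$1" > "$PARANOID_FILE"
}

prolog() {
    # Open up perf_event only for jobs that requested the hwperf constraint;
    # the submit filter guarantees such jobs are node-exclusive.
    if [[ "${SLURM_JOB_CONSTRAINTS:-}" == *hwperf* ]]; then
        set_paranoid 0
    fi
}

epilog() {
    # Restore the kernel default after the job has finished.
    if [[ "${SLURM_JOB_CONSTRAINTS:-}" == *hwperf* ]]; then
        set_paranoid 2
    fi
}
```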

Enabling Nvidia GPU performance monitoring

Reading GPU performance counters as a non-admin user is often disabled on HPC clusters due to security concerns. When trying to read them, you will get a message referring you to the Nvidia documentation about ERR_NVGPUCTRPERM.
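On a node with the driver loaded, the active restriction can be checked in the driver's module parameters under /proc/driver/nvidia/params, where a RmProfilingAdminOnly value of 1 corresponds to restricted profiling. The helper below takes the params file as an argument only so it can be tried on a copy:

```shell
#!/bin/bash
# Returns success (0) if the NVIDIA driver restricts GPU profiling to
# admin users, i.e. normal users would hit ERR_NVGPUCTRPERM.
profiling_restricted() {
    # $1: params file, defaults to the live /proc entry of the driver
    grep -q '^RmProfilingAdminOnly: 1' "${1:-/proc/driver/nvidia/params}"
}
```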

A suitable way for HPC clusters with Slurm is to configure a prolog that detects if a job is running exclusively on a node and then

  1. stops all systemd services accessing the GPU devices, for example nvidia-persistenced,
  2. then unloads all relevant nvidia kernel modules, for example modprobe -r nvidia_uvm nvidia_drm nvidia_modeset nv_peer_mem nvidia,
  3. then reloads the nvidia kernel module with profiling enabled for all users, modprobe nvidia NVreg_RestrictProfilingToAdminUsers=0,
  4. and finally starts the services again.

A corresponding epilog also needs to be created where modprobe nvidia NVreg_RestrictProfilingToAdminUsers=1 is used instead. Be warned that such a prolog and epilog increase the job start/end duration, because especially the restart of the nvidia systemd services can take some time, likely up to one minute. A workaround is to add a SPANK plugin that makes enabling access to the performance counters optional via a job submission parameter.
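The four steps above could be sketched like this. Assumptions: the script runs as root, and the service and kernel-module names are only examples that differ between sites and driver versions; the run() wrapper exists solely so the sequence can be printed with DRYRUN=1 instead of touching a live system.

```shell
#!/bin/bash
# Sketch of the GPU part of a SLURM prolog enabling performance counter
# access for non-admin users.
run() { echo "+ $*"; [ -n "${DRYRUN:-}" ] || "$@"; }

gpu_prolog() {
    run systemctl stop nvidia-persistenced                        # 1. stop services using the GPUs
    run modprobe -r nvidia_uvm nvidia_drm nvidia_modeset nvidia   # 2. unload the nvidia modules
    run modprobe nvidia NVreg_RestrictProfilingToAdminUsers=0     # 3. reload with profiling allowed
    run systemctl start nvidia-persistenced                       # 4. restart the services
}
```

The epilog would be identical except that step 3 passes NVreg_RestrictProfilingToAdminUsers=1.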

SLURM integration

Integrating hardware performance counting into SLURM is a two-step process. If there is some system monitoring in place that also requires hardware performance counter access, you can still allow user access for distinct jobs as long as they use their nodes exclusively. We at NHR@FAU use the SLURM constraint hwperf for jobs that require access and otherwise let the system monitoring collect the counts. First, at job submission, a SLURM submit filter checks whether the nodes are requested exclusively in case hwperf is set. Afterwards, the prologue/epilogue scripts set the permissions and ownerships to allow LIKWID and other perf_event-based tools.

SLURM submit filter

Accessing hardware performance counters should only be allowed in user-exclusive environments due to security reasons.

-- all jobs with constraint hwperf need to allocate nodes exclusively
for feature in string.gmatch(job_desc.features or "", "[^,]*") do
    -- job_desc.shared == 0 means the job was submitted with --exclusive
    if ( feature == "hwperf" and job_desc.shared ~= 0 ) then
        slurm.log_info("slurm_job_submit: job from uid %u with constraint hwperf but not exclusive", job_desc.user_id)
        slurm.user_msg("--constraint=hwperf only available for node-exclusive jobs with --exclusive")
        return 2029 -- ESLURM_INVALID_FEATURE
    end
end

Epilogue/Prologue

Here is a combined view of the prologue and epilogue for CPU profiling. For Nvidia GPU profiling, you have to add the relevant calls. It is important that a SLURM submit filter ensures that only jobs with exclusive node usage can set the hwperf flag used here:

# Prologue: hand the counters over to the job user
if [[ "$SLURM_JOB_CONSTRAINTS" =~ "hwperf" ]] ; then
    chown $SLURM_JOB_USER /var/run/likwid.lock
    # Also grant permission to use performance counters via the perf interface (e.g. with VTune)
    echo 0 > /proc/sys/kernel/perf_event_paranoid
fi

# Epilogue: hand the counters back to the system monitoring
if [[ "$SLURM_JOB_CONSTRAINTS" =~ "hwperf" ]] ; then
    chown $MONITORING_USER /var/run/likwid.lock
    # Also disable permission to use performance counters via the perf interface
    echo 2 > /proc/sys/kernel/perf_event_paranoid
fi

You could also use the stricter paranoid setting of 4, but so far it is only provided by some Linux distributions.
