Simple build/install on Linux:
./configure --enable-debug \
--prefix=<install-dir> --sysconfdir=<config-dir>
make install
If you make changes to files, then on _MCR_, run
then check-in the new and files
Here is a step-by-step HOWTO for creating a new release of SLURM on a
Linux cluster (see the BlueGene- and AIX-specific notes below for some differences).
0. svn co slurm
svn co buildfarm
put the buildfarm directory in your search path
1. Update NEWS and META files for the new release. In the META file,
the API, Major, Minor, Micro, Version, and Release fields must all
be up to date. The Release field should always be 1 unless one of
the following is true:
- Changes were made to the spec file, documentation, or example
files, but not to code.
- This is a prerelease (Release = 0.preX)
2. Tag the repository with the appropriate name for the new version.
svn copy \ \
-m "description"
3. Use the rpm make target to create the new RPMs. This requires a .rpmmacros (.rpmrc for newer versions of rpmbuild) file containing:
%_slurm_sysconfdir /etc/slurm
%_enable_debug "--enable-debug"
I usually build using the following syntax:
build -s
NOTE: For v1.0 and earlier add: --pre-exec='./'
4. Move the RPMs to
/usr/local/admin/rpms/llnl/RPMS-RHEL4/x86_64 (odevi, or gauss)
/usr/local/admin/rpms/llnl/RPMS-RHEL4/i386/ (mdevi)
/usr/local/admin/rpms/llnl/RPMS-RHEL4/ia64/ (tdevi)
send an announcement email (with the latest entry from the NEWS
file) out to
5. Copy tagged bzip file (e.g. slurm-0.6.0-0.pre3.bz2) to FTP server
for external SLURM users.
6. Copy bzip file and rpms (including src.rpm) to
cd upload
put filename
Use SourceForge admin tool to add new release, including changelog.
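The naming rules in steps 1, 4 and 5 above can be sketched as a few shell
helpers. This is only an illustration: the function names are hypothetical
and are not part of SLURM or buildfarm, and the case of spec/documentation-only
changes is left out because these notes do not say which Release value to use.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative, not part of SLURM or buildfarm.

# Print the META "Release" value: 0.preX for prerelease X, otherwise 1.
release_field() {
    if [ -n "$1" ]; then
        echo "0.pre$1"
    else
        echo "1"
    fi
}

# Print the tagged bzip file name for a version and release, following
# the slurm-0.6.0-0.pre3.bz2 example in step 5.
tarball_name() {
    echo "slurm-$1-$2.bz2"
}

# Print the RPM destination directory from step 4 for an architecture.
rpm_dest_dir() {
    case "$1" in
        x86_64|i386|ia64)
            echo "/usr/local/admin/rpms/llnl/RPMS-RHEL4/$1" ;;
        *)
            echo "unknown architecture: $1" >&2
            return 1 ;;
    esac
}
```

For example, "tarball_name 0.6.0 $(release_field 3)" prints the file name
used as the example in step 5.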
BlueGene build notes:
3. Use the rpm make target to create the new RPMs. This requires a .rpmmacros
(.rpmrc for newer versions of rpmbuild) file containing:
%_slurm_sysconfdir /etc/slurm
%_enable_debug "--enable-debug"
%with_cflags CFLAGS=-m64
Build on the Service Node using the following syntax:
build -s
4. Copy RPMs to /usr/admin/sles/llnl/RPMS-SLES9
Do _not_ copy the switch-elan, authd-authd,
aix-federation or auth-none RPMs
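One way to apply the exclusion in step 4 is a small filter over the RPM file
names. This is a sketch only: the function name is hypothetical, and it
assumes the excluded package names appear literally in the file names. The
pattern "auth-authd" is added as an assumption in case "authd-authd" above is
a typo.

```shell
#!/bin/sh
# Hypothetical filter: read RPM file names on stdin (one per line) and
# print only those that should be copied to the BlueGene RPM directory.
bgl_rpms_to_copy() {
    while IFS= read -r rpm_file; do
        case "$rpm_file" in
            # Packages the notes say not to copy ("auth-authd" is an
            # assumed spelling, see the lead-in).
            *switch-elan*|*authd-authd*|*auth-authd*|*aix-federation*|*auth-none*)
                ;;
            *)
                printf '%s\n' "$rpm_file" ;;
        esac
    done
}
```

The output could then be passed to cp, for example with
"ls *.rpm | bgl_rpms_to_copy".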
To build and run on AIX:
0. svn co slurm
svn co buildfarm
put the buildfarm directory in your search path
Also, you will need two commands to appear FIRST in your PATH:
I do this by making symlinks to those commands in the buildfarm directory,
then making the buildfarm directory the first one in my PATH.
Also, make certain that the "proctrack" rpm is installed.
1. export OBJECT_MODE=32
2. Build with:
./configure --enable-debug --prefix=/opt/freeware \
--with-ssl=/opt/freeware --with-munge=/opt/freeware
make uninstall # remove old shared libraries, aix caches them
make install
3. To build RPMs (NOTE: Many GNU tools are required):
Create a file specifying system specific files:
# RPM Macros for use with SLURM on AIX
# The system-wide macros for RPM are in /usr/lib/rpm/macros
# and this overrides a few of them
%_prefix /opt/freeware
%_slurm_sysconfdir %{_prefix}/etc/slurm
%_defaultdocdir %{_prefix}/doc
%_enable_debug "--enable-debug"
%with_ssl "--with-ssl=/opt/freeware"
%with_munge "--with-munge=/opt/freeware"
CC=/usr/bin/gcc build -s
4. export MP_RMLIB=./
export CHECKPOINT=yes
5. poe hostname -rmpool debug
6. To debug, set SLURM_LL_API_DEBUG=3 before running poe - will create a file
It can also be helpful to use the poe options "-ilevel 6 -pmdlog yes".
This creates a log file named /tmp/mplog.<jobid>.<taskid>
7. If you update proctrack, be sure to run "slibclean" to clear cached
8. Install the rpms slurm-*.ppc.rpm, slurm-aix-federation-*.ppc.rpm,
slurm-auth-munge-*.ppc.rpm, slurm-devel-*.ppc.rpm, and
slurm-sched-wiki-*.ppc.rpm in /usr/admin/inst.image/slurm/aix5.3 on an
OCF AIX machine (pdev is a good choice).
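The debugging settings from steps 5 and 6 can be wrapped in a small helper so
they do not leak into the login shell. This is a sketch; the function names
are hypothetical, and it assumes the poe options may simply be appended after
the program arguments, as in the "poe hostname -rmpool debug" example.

```shell
#!/bin/sh
# Hypothetical wrapper: run poe with the debug settings from step 6.
# The subshell keeps SLURM_LL_API_DEBUG out of the calling shell.
poe_debug() {
    (
        SLURM_LL_API_DEBUG=3
        export SLURM_LL_API_DEBUG
        poe "$@" -ilevel 6 -pmdlog yes
    )
}

# Print the poe log file name for a job id and task id.
mplog_path() {
    echo "/tmp/mplog.$1.$2"
}
```

Usage would look like "poe_debug hostname -rmpool debug", followed by reading
the file printed by "mplog_path <jobid> <taskid>".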
AIX/Federation switch window problems
To clean switch windows: ntblclean -w 8 -a sni0
To get switch window status: ntblstatus
BlueGene bglblock boot problem diagnosis
- Logon to the Service Node (bglsn, ubglsn)
- Execute /admin/bglscripts/fatalras
This will produce a list of failures including Rack and Midplane number
<date> R<rack> M<midplane> <failure details>
- Translate the Rack and Midplane to SLURM node id: smap -R r<rack><midplane>
- Drain only the bad SLURM node, return others to service using scontrol
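The translation from fatalras output to smap arguments can be scripted. The
sketch below is an assumption-heavy illustration: the function name is
hypothetical, and it assumes the rack and midplane appear as separate
whitespace-delimited fields of the form R<digits> and M<digits>, as in the
line format shown above. It only prints the smap commands and does not run
them.

```shell
#!/bin/sh
# Hypothetical helper: read fatalras output on stdin and print one
# "smap -R r<rack><midplane>" command per distinct rack/midplane pair.
fatalras_to_smap() {
    awk '{
        rack = ""; mid = ""
        # Take the first R<digits> and M<digits> fields on the line.
        for (i = 1; i <= NF; i++) {
            if (rack == "" && $i ~ /^R[0-9]+$/)
                rack = substr($i, 2)
            else if (mid == "" && $i ~ /^M[0-9]+$/)
                mid = substr($i, 2)
        }
        # Lines without both fields are ignored.
        if (rack != "" && mid != "")
            print "smap -R r" rack mid
    }' | sort -u
}
```

A typical use would be "/admin/bglscripts/fatalras | fatalras_to_smap" on the
Service Node, then running the printed commands by hand.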
Configuration file update procedures:
- cd /usr/bgl/dist/slurm (on bgli)
- co -l <filename>
- vi <filename>
- ci -u <filename>
- make install
- then run "dist_local slurm" on SN and FENs to update /etc/slurm
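The RCS check-out, edit, check-in sequence above can be sketched as one
function. The function name and the DRY_RUN variable are hypothetical; with
DRY_RUN set, the commands are printed instead of executed, which is a cheap
way to review the sequence first.

```shell
#!/bin/sh
# Hypothetical helper for the procedure above. Runs in a subshell so the
# "cd" does not change the caller's working directory.
update_slurm_config() {
    (
        run=${DRY_RUN:+echo}    # with DRY_RUN set, print instead of run
        $run cd /usr/bgl/dist/slurm || exit 1
        $run co -l "$1" || exit 1   # check out and lock the file
        $run vi "$1" || exit 1
        $run ci -u "$1" || exit 1   # check in, keep an unlocked copy
        $run make install
    )
    # Afterwards, "dist_local slurm" still has to be run on the SN and FENs.
}
```

For example, "DRY_RUN=1; export DRY_RUN; update_slurm_config slurm.conf"
prints the five commands without touching any file.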
Some RPM commands:
rpm -qa | grep slurm (determine what is installed)
rpm -qpl slurm-1.1.9-1.rpm (check contents of an rpm)
rpm -e slurm-1.1.8-1 (erase an rpm)
rpm -i --ignoresize slurm-1.1.9-1.rpm (install a new rpm)
For main SLURM plugin installation on BGL service node:
rpm -i --force --nodeps --ignoresize slurm-1.1.9-1.rpm
To clear a wedged job:
> delete bgljob ####
> free RMP###
Starting and stopping daemons on Linux:
/etc/init.d/slurm stop
/etc/init.d/slurm start
Applying a patch:
- cd to the top level src directory
- Run the patch command with epilog_complete.patch as stdin:
patch -p[path_level_to_filter] [--dry-run] < epilog_complete.patch
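Since the patch command accepts --dry-run, the two invocations can be chained
so the patch is only applied when the dry run succeeds. This is a sketch; the
function name is hypothetical and the strip level is passed in by the caller.

```shell
#!/bin/sh
# Hypothetical helper: apply_patch_checked <strip_level> <patch_file>
# Run from the top level src directory.
apply_patch_checked() {
    plevel=$1
    pfile=$2
    if [ ! -r "$pfile" ]; then
        echo "cannot read patch file: $pfile" >&2
        return 1
    fi
    # First pass changes nothing; stop here if any hunk would fail.
    patch -p"$plevel" --dry-run < "$pfile" || return 1
    # Second pass actually modifies the source files.
    patch -p"$plevel" < "$pfile"
}
```

For example, "apply_patch_checked 1 epilog_complete.patch", where the strip
level of 1 is only an assumed value.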
To get the process and job IDs with proctrack/sgi_job:
- jstat -p
CVS and gnats:
Include "gnats:<id>", e.g. "(gnats:123)", as part of the cvs commit
message to automatically record that update in the gnats database.
NOTE: This does not change the gnats bug state, but records the source
files associated with the bug.
For memory leaks (for AIX use zerofault, zf; for Linux use valgrind):
valgrind --tool=memcheck --leak-check=yes --num-callers=6 --leak-resolution=med ./slurmctld
Before new major release:
- Test on ia64, i386, x86_64, BGL, AIX, OSX, XCPU
- Test on Elan and IB switches
- Test fail-over of slurmctld
- Test for memory leaks in slurmctld and slurmd
- Change API version number
- Review and release web pages
- Review and release code
- Run "make check"
- Test that the prolog and epilog run
- Run the test suite with SlurmUser NOT being self