Simple build/install on Linux:
  ./autogen.sh
  ./configure --enable-debug \
              --prefix=<install-dir> --sysconfdir=<config-dir>
  make
  make install

If you change any Makefile.am files, then on _MCR_, run
  ./autogen.sh
and check in the new Makefile.am and Makefile.in files.

Here is a step-by-step HOWTO for creating a new release of SLURM on a
Linux cluster (See BlueGene and AIX specific notes below for some differences).
0. svn co https://eris.llnl.gov/svn/slurm/trunk slurm
   svn co https://eris.llnl.gov/svn/chaos/private/buildfarm/trunk buildfarm
   put the buildfarm directory in your search path
1. Update NEWS and META files for the new release. In the META file,
   the API, Major, Minor, Micro, Version, and Release fields must all
   be up-to-date. **** DON'T UPDATE META UNTIL RIGHT BEFORE THE TAG ****
   The Release field should always be 1 unless one of
   the following is true:
   - Changes were made to the spec file, documentation, or example
     files, but not to code.
   - This is a prerelease (Release = 0.preX).
2. Tag the repository with the appropriate name for the new version.
   svn copy https://eris.llnl.gov/svn/slurm/trunk \
     https://eris.llnl.gov/svn/slurm/tags/slurm-1-2-0-0-pre3 \
     -m "description"
3. Use the rpm make target to create the new RPMs. This requires a .rpmmacros
   (.rpmrc for newer versions of rpmbuild) file containing:
%_slurm_sysconfdir /etc/slurm
%_enable_debug "--enable-debug"
   I usually build using the following syntax:
   build -s https://eris.llnl.gov/svn/slurm/tags/slurm-1-2-0-0-pre3
   NOTE: For v1.0 and earlier add: --pre-exec='./autogen.sh'
4. Move the RPMs to
   /usr/local/admin/rpms/llnl/RPMS-RHEL4/x86_64 (odevi, or gauss)
   /usr/local/admin/rpms/llnl/RPMS-RHEL4/i386/ (mdevi)
   /usr/local/admin/rpms/llnl/RPMS-RHEL4/ia64/ (tdevi)
   Then send an announcement email (with the latest entry from the NEWS
   file) to linux-admin@lists.llnl.gov.
5. Copy tagged bzip file (e.g. slurm-0.6.0-0.pre3.bz2) to FTP server
   for external SLURM users.
6. Copy bzip file and rpms (including src.rpm) to sourceforge.net:
   ncftp upload.sf.net
   cd upload
   put filename
   Use the SourceForge admin tool to add the new release, including the changelog.
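
The tag name used in step 2 appears to be the version and release joined
with dashes, with every "." replaced by "-" (1.2.0 and 0.pre3 give
slurm-1-2-0-0-pre3). That rule is inferred from the examples above, not
stated anywhere, so treat the following as a minimal sketch. The function
names are hypothetical, and the svn command is only printed, never run.

```shell
# Hypothetical helper: build the tag name from the META Version ($1) and
# Release ($2) fields. Assumes the only change is "." becoming "-".
slurm_tag_name() {
    printf 'slurm-%s-%s\n' "$1" "$2" | tr '.' '-'
}

# Print (do not run) the svn copy command from step 2.
# $1 = Version, $2 = Release, $3 = commit message.
print_tag_command() (
    tag=$(slurm_tag_name "$1" "$2")
    printf 'svn copy https://eris.llnl.gov/svn/slurm/trunk \\\n'
    printf '  https://eris.llnl.gov/svn/slurm/tags/%s \\\n' "$tag"
    printf '  -m "%s"\n' "$3"
)

print_tag_command 1.2.0 0.pre3 "description"
```

Printing the command first makes it easy to check the tag name before
anything is written to the repository.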

BlueGene build notes:
3. Use the rpm make target to create the new RPMs. This requires a .rpmmacros
   (.rpmrc for newer versions of rpmbuild) file containing:
%_slurm_sysconfdir /etc/slurm
%_enable_debug "--enable-debug"
%with_cflags CFLAGS=-m64 CXX="g++ -m64"
   Build on the Service Node using the following syntax:
   build -s https://eris.llnl.gov/svn/slurm/tags/slurm-1-2-0-0-pre3
4. Copy RPMs to /usr/admin/sles/llnl/RPMS-SLES9
   Do _not_ copy the switch-elan, auth-authd,
   aix-federation or auth-none RPMs
To build and run on AIX:
0. svn co https://eris.llnl.gov/svn/slurm/trunk slurm
   svn co https://eris.llnl.gov/svn/slurm/private/proctrack-aix/trunk proctrack
   svn co https://eris.llnl.gov/svn/buildfarm/trunk buildfarm
   put the buildfarm directory in your search path
   Also, you will need two commands to appear FIRST in your PATH:

      /usr/local/tools/gnu/aix_5_64_fed/bin/install
      /usr/local/gnu/bin/tar

   I do this by making symlinks to those commands in the buildfarm directory,
   then making the buildfarm directory the first one in my PATH.
1. export OBJECT_MODE=32
2. Build with:
   ./configure --enable-debug --prefix=/opt/freeware \
       --sysconfdir=/opt/freeware/etc/slurm \
       --with-proctrack=<your directory>/proctrack \
       --with-ssl=/opt/freeware --with-munge=/opt/freeware
   make
   make uninstall # remove old shared libraries, aix caches them
   make install
3. To build RPMs (NOTE: Many GNU tools are required):
   Create a file specifying system specific files:
#
# RPM Macros for use with SLURM on AIX
# The system-wide macros for RPM are in /usr/lib/rpm/macros
# and this overrides a few of them
#
%_prefix /opt/freeware
%_slurm_sysconfdir %{_prefix}/etc/slurm
%_defaultdocdir %{_prefix}/doc

%_enable_debug "--enable-debug"
%with_proctrack "--with-proctrack=<your directory>/proctrack"
%with_ssl "--with-ssl=/opt/freeware"
%with_munge "--with-munge=/opt/freeware"
   build -s https://eris.llnl.gov/svn/slurm/tags/slurm-1-2-0-0-pre3
4. export MP_RMLIB=./slurm_ll_api.so
   export CHECKPOINT=yes
5. poe hostname -rmpool debug
6. To debug, set SLURM_LL_API_DEBUG=3 before running poe. This creates a
   file named /tmp/slurm.*
   It can also be helpful to use the poe options "-ilevel 6 -pmdlog yes".
   A log file named /tmp/mplog.<jobid>.<taskid> will be created.
7. If you update proctrack, be sure to run "slibclean" to clear cached
   version.
8. Install the rpms slurm-*.ppc.rpm, slurm-aix-federation-*.ppc.rpm,
   slurm-auth-munge-*.ppc.rpm, slurm-devel-*.ppc.rpm, and
   slurm-sched-wiki-*.ppc.rpm in /usr/admin/inst.image/slurm/aix5.3 on an
   OCF AIX machine (pdev is a good choice).
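
The environment from steps 4 and 6 can be collected in one place before
running poe. This is a sketch only: the function name is hypothetical,
and combining the step 5 command with the step 6 options on one poe
command line is an assumption. The poe command is printed, not run.

```shell
# Set the variables named in steps 4 and 6 for the current shell.
slurm_poe_debug_env() {
    export MP_RMLIB=./slurm_ll_api.so
    export CHECKPOINT=yes
    export SLURM_LL_API_DEBUG=3    # step 6: poe then writes /tmp/slurm.*
}

slurm_poe_debug_env

# Show the test command from step 5 with the logging options from step 6.
echo "poe hostname -rmpool debug -ilevel 6 -pmdlog yes"
```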

AIX/Federation switch window problems
To clean switch windows: ntblclean -w 8 -a sni0
To get switch window status: ntblstatus

BlueGene bglblock boot problem diagnosis
  - Log on to the Service Node (bglsn, ubglsn)
  - Execute /admin/bglscripts/fatalras
    This will produce a list of failures including Rack and Midplane number
    <date> R<rack> M<midplane> <failure details>
  - Translate the Rack and Midplane to SLURM node id: smap -R r<rack><midplane>
  - Drain only the bad SLURM node, return others to service using scontrol

Configuration file update procedures:
  - cd /usr/bgl/dist/slurm (on bgli)
  - co -l <filename>
  - vi <filename>
  - ci -u <filename>
  - make install
  - then run "dist_local slurm" on SN and FENs to update /etc/slurm

Some RPM commands:
  - rpm --query --all | grep slurm
  - rpm --erase package_name
  - rpm --install --ignoresize file_name
For main SLURM plugin installation on BGL service node:
  - rpm --install --force --nodeps --ignoresize slurm-#.rpm


To clear a wedged job:
  /bgl/startMMCSconsole
  > delete bgljob ####
  > free RMP###

Starting and stopping daemons on Linux:
  /etc/init.d/slurm stop
  /etc/init.d/slurm start

Patches:
  - cd to the top level src directory
  - Run the patch command with epilog_complete.patch as stdin:
    patch -p[path_level_to_filter] [--dry-run] < epilog_complete.patch
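
One way to use the optional --dry-run flag is to test the patch first
and apply it only if the test succeeds. The following is a minimal
sketch: the function name is hypothetical, and the default path level
of 1 is an assumption, so pass the level that fits the patch at hand.

```shell
# Apply a patch only if a dry run succeeds first.
# $1 = patch file, $2 = path level for -p (assumed to be 1 if omitted).
# Run from the top level src directory.
apply_patch_checked() {
    if patch -p"${2:-1}" --dry-run < "$1" > /dev/null; then
        patch -p"${2:-1}" < "$1"
    else
        echo "dry run failed; nothing was changed" >&2
        return 1
    fi
}

# Example use (not run here):
#   apply_patch_checked epilog_complete.patch 1
```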

CVS and gnats:
Include "gnats:<id>", e.g. "(gnats:123)", in the cvs commit message to
automatically record that update in the gnats database. NOTE: This does
not change the gnats bug state, but records the source files associated
with the bug.

For memory leaks (for AIX use zerofault, zf; for Linux use valgrind):
 valgrind --tool=memcheck --leak-check=yes --num-callers=6 --leak-resolution=med ./slurmctld

Remember to test on ia64, i386, BGL, and AIX.