
TBG

Axel Huebl

René Widera

Our tool template batch generator (tbg) abstracts program runtime options from the technical details of supercomputers. On a desktop PC, one can simply execute a command interactively and get results immediately. On a supercomputer, in contrast, resources need to be shared efficiently between users via job scheduling. Scheduling on today's supercomputers is usually handled by batch systems that define various queues of resources.

An unfortunate aspect of batch systems, from a user's perspective, is that their usage varies a lot between machines. Naturally, different systems also provide different resources in their queues, all of which need to be described.

PIConGPU runtime options are described in configuration files (.cfg). We abstract the description of queues, resource acquisition and job submission via template files (.tpl). For example, a .cfg file defines how many devices shall be used for computation, while a .tpl file calculates how many physical nodes need to be requested. The .tpl files also take care of how to spawn a process once the job is scheduled, e.g. via mpiexec, and which flags for networking details need to be passed. After combining the machine-independent (portable) .cfg file provided by the user with the machine-dependent .tpl file, tbg can submit the requested job to the batch system.
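To illustrate the split, here is a minimal sketch. The TBG_devices_* variables follow the usual naming in shipped .cfg files; the remaining placeholder names and the batch directives are only assumptions for illustration, not copied from a shipped file:

    # excerpt of a .cfg file (machine independent): user-facing run parameters
    TBG_devices_x=2
    TBG_devices_y=4
    TBG_devices_z=1

    # excerpt of a .tpl file (machine dependent), illustrative Slurm-style request;
    # tbg replaces !TBG_* placeholders with values derived from the .cfg file
    #SBATCH --nodes=!TBG_nodes
    mpiexec -n !TBG_tasks ./picongpu !TBG_programParams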

Last but not least, one usually wants to store the input of a simulation with its output. tbg conveniently automates this task before submission. The .tpl and the .cfg files that were used to start the simulation can be found in <tbg destination dir>/tbg/ and can be used together with the .param files from <tbg destination dir>/input/.../param/ to recreate the simulation setup.

In summary, PIConGPU runtime options in .cfg files are portable to any machine. When accessing a machine for the first time, one needs to write template .tpl files that abstractly describe how to run PIConGPU on the specific queue(s) of its batch system. We already ship such template files for a set of supercomputers, for interactive execution and for many common batch systems. See $PICSRC/etc/picongpu/ and our list of systems with .profile files for details.
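A quick way to see which machines and batch systems are already covered is to look into that directory (assuming $PICSRC points to your PIConGPU source tree, as set by a .profile file):

    # list the systems for which .tpl templates and example .cfg files are shipped
    ls $PICSRC/etc/picongpu/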

Usage

../../bin/tbg --help
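A typical invocation combines the three ingredients described above: a portable .cfg file, a machine-specific .tpl file and a destination directory. The submit command, template name and paths in this sketch are assumptions for the sake of the example:

    # combine a portable .cfg with a machine-specific .tpl and submit the job
    # via the batch system's submit command (here: Slurm's sbatch)
    tbg -s sbatch \
        -c etc/picongpu/8.cfg \
        -t etc/picongpu/hemera-hzdr/k80.tpl \
        $SCRATCH/runs/lwfa_001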

.cfg File Macros

Feel free to copy & paste sections of the file below into your .cfg, e.g. to configure complex plugins:

../../TBG_macros.cfg
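The usual pattern is to assign a plugin's command line flags to a macro variable and then reference it from TBG_plugins. The flags below are only an illustrative sketch, not copied from the shipped file:

    # define a reusable macro holding a plugin's command line flags
    TBG_energyHistogram="--e_energyHistogram.period 100 --e_energyHistogram.binCount 1024"

    # enable it (possibly together with further macros) for the run
    TBG_plugins="!TBG_energyHistogram"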

Batch System Examples

Axel Huebl, Richard Pausch

Slurm

Slurm is a modern batch system, installed, e.g., on the Taurus cluster at TU Dresden, Hemera at HZDR and Cori at NERSC, among others.
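On a Slurm machine, sbatch is passed to tbg as the submit command, and the job can afterwards be monitored with Slurm's own tools. Template name and paths below are assumptions for illustration:

    # submit via Slurm
    tbg -s sbatch -c etc/picongpu/8.cfg -t etc/picongpu/taurus-tud/V100.tpl $SCRATCH/runs/slurm_001
    # check the state of your queued and running jobs
    squeue -u $USER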

LSF

LSF (for Load Sharing Facility) is an IBM batch system (bsub/BSUB). It is used, e.g., on Summit at ORNL.
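On an LSF machine, bsub is passed to tbg as the submit command. Template name and paths below are assumptions for illustration:

    # submit via LSF (bsub)
    tbg -s bsub -c etc/picongpu/8.cfg -t etc/picongpu/summit-ornl/gpu_batch.tpl $HOME/runs/lsf_001
    # check the state of your queued and running jobs
    bjobs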