PBS on Charlie

Ben Tupper edited this page Jun 18, 2024 · 75 revisions

How to use PBS on Charlie

Table of Contents

CFE vs C1

Charlie is Bigelow’s High Performance Computer. cfe is the front end of Charlie, and c1 is one of the machines that make up Charlie. The front end is where a user logs in to request time and resources for a particular job. PBS, the job scheduling software, then directs the job to the appropriate machine to be run. In this way we submit jobs from cfe to be run on c1, and we never log on to c1 directly.

Not logging on to c1 directly is a good thing, because submitting jobs from cfe through PBS lets the scheduler allocate computing resources much more efficiently. In short, we use cfe only as a login and file-management platform; all jobs must be sent to c1 through the scheduler to be run.

Submitting a job from CFE with qsub shellscript.sh

In order to submit a job to the PBS scheduler from cfe, first we will write a submission script. To write your first submission script, create a new file called submission.sh and follow the steps outlined in the Submission Script Guidelines section.

Once you have completed your submission.sh script, and saved it in your home directory on cfe, then simply type the command qsub submission.sh on the command line.

If everything has worked correctly, PBS will print the jobID tag to your terminal screen. It is important to remember the jobID, since without it you will not be able to modify the job's parameters or delete it.

For example, when we type qsub submission.sh on the command line, PBS prints 787.cfe1, so our jobID is 787.cfe1.
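Since later commands such as qalter and qdel take the jobID, it can help to capture it in a shell variable at submission time. The sketch below substitutes a fixed string for the qsub output so it runs standalone; in practice you would use jobid=$(qsub submission.sh):

```shell
# Hypothetical sketch: capture the jobID that qsub prints so that
# qalter/qdel can reuse it later. We simulate qsub's output here;
# in practice: jobid=$(qsub submission.sh)
jobid="787.cfe1"
jobnum=${jobid%%.*}   # strip the ".cfe1" suffix, leaving the numeric part
echo "$jobnum"
```

The parameter expansion ${jobid%%.*} removes everything from the first dot onward, leaving just the numeric portion of the jobID.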

The script below is an example format of a basic submission script. Note that lines that set PBS parameters begin with #PBS.

#!/bin/bash                                                                     

## set name of script                                                           
#PBS -N script_name                                                             

## send the environment variables with job 
#PBS -V

## set the queue                                                                          
#PBS -q route                                                                   

## give job 10 minutes                        
#PBS -l walltime=00:10:00 

## use one compute node and one cpu (this will default to use 2gb of memory)
#PBS -l select=1:ncpus=1
                                                              
## output files placed in output directory in the user vcc’s home directory                                     
#PBS -e /home/vcc/output                                                           
#PBS -o /home/vcc/output                                                          


## jobs to submit                                                               
echo start
/bin/sleep 10
echo finished

The script creates a new job named script_name that has access to 1 cpu for 10 minutes and can use up to 2 GB of memory, the default memory allocation on a single compute node. The directive #PBS -l select=1:ncpus=1 could be changed to, for instance, #PBS -l select=1:ncpus=1:mem=400mb if a user wanted to specify more or less memory. The script writes the job output and errors to a directory called output in the user's home directory. Finally, the script submits a very simple job that writes start to the output file, waits 10 seconds, and then writes finished.

Managing output and error files will be important for the user. Before submitting this script you should change vcc to your username so that output and error files end up in your home directory. Then create an output directory in your home directory by typing the command mkdir output on the command line.

View Job Status with qstat

We can monitor how our submitted jobs are progressing by typing qstat on the command line. This will generate a table similar to the one below. Here the user vcc is running the submission script script_name four times. In the status (S) column, two jobs are marked E, meaning the job is exiting after having finished. The third job, 792.cfe1, is marked R, indicating that it is running, while the fourth job, marked Q, is waiting in the normal queue.

Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
790.cfe1          script_name      vcc               00:00:00 E normal          
791.cfe1          script_name      vcc               00:00:00 E normal          
792.cfe1          script_name      vcc               00:00:00 R normal 
793.cfe1          script_name      vcc               00:00:00 Q normal 
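When many jobs are queued, it can be handy to filter this table by status. A minimal sketch, keying on the fifth (status) column with awk and using a saved copy of the sample table above in place of a live qstat call:

```shell
# Sketch: list the IDs of running jobs (status column S == "R").
# In practice you would pipe qstat directly: qstat | awk '$5 == "R" {print $1}'
qstat_sample='790.cfe1 script_name vcc 00:00:00 E normal
791.cfe1 script_name vcc 00:00:00 E normal
792.cfe1 script_name vcc 00:00:00 R normal
793.cfe1 script_name vcc 00:00:00 Q normal'
running=$(printf '%s\n' "$qstat_sample" | awk '$5 == "R" {print $1}')
echo "$running"
```

Changing "R" to "Q" would list the waiting jobs instead.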

The simple qstat command above provides useful basic information about the status of, and time used by, each submitted job. More detailed information is provided by the command qstat -a :

cfe1: 
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname            SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ------------       ------ --- --- ------ ----- - -----
790.cfe1        vcc      normal    script_name        6022   1   1    2gb 00:10  E 00:00
791.cfe1        vcc      normal    script_name        6023   1   1    2gb 00:10  E 00:00
792.cfe1        vcc      normal    script_name        6024   1   1    2gb 00:10  R 00:00
793.cfe1        vcc      normal    script_name        6025   1   1    2gb 00:10  Q 00:00

Modify job Parameters with qalter

Most job parameters can be modified while the job is waiting in the queue to be run. Once the job is running, however, we will only be able to decrease the amount of wall time for the job.

For example, we could increase the wall time of job 793.cfe1 while it is still in the queue: qalter -l walltime=20:00 793. The basic format of the command is qalter <PBS option> <jobID>. A PBS option on the command line is the same as a PBS directive in a script, minus the #PBS prefix. In this way we can change any of the job parameters that we set in the script with PBS directives. For example, the example submission script names the job script_name using the directive #PBS -N script_name, so to change the name of job 793.cfe1 to new_name we would use the command qalter -N new_name 793. Any parameter set in the submission script can be changed this way, as long as the job has not begun to run.

In the event that our job status is R, we may only decrease the wall time of our job. For example, we could decrease the wall time of 792.cfe1 from the 10 minutes it was originally assigned to 5 minutes using the command qalter -l walltime=05:00 792.

Delete a Job with qdel

Deleting a job using PBS is easy. Simply type the command qdel <jobID> on the command line while logged in on cfe. For instance, to delete the submission.sh job submitted above, the command would be qdel 787.cfe1.

Submitting the Same Job Multiple Times

Often it is necessary to run the same job multiple times with different input files or parameters. If each run will require the same maximum wall time and compute resources, then an efficient way to submit your jobs is to use a job array. It is easy to convert a basic job submission script into a job array script: simply add the PBS directive #PBS -J <start>-<end>. If you want to run the script as 10 jobs, substitute 1-10, and PBS will create jobs with index values 1,2,3,...,10; alternatively, to produce 10 jobs you could substitute 0-9, and PBS would create jobs with index values 0,1,2,...,9.

Keep in mind that the shell indexes arrays starting from 0 (not 1, for you R folks!). For example...

#PBS -J 0-4
groups=(invertebrate plant vertebrate mammalian rodent)
module use /mod/bigelow
module load R 
Rscript /my/nice/script.R ${groups[${PBS_ARRAY_INDEX}]}

Job arrays group the set of submitted jobs under the same jobID. For example, if I submit a job array script, PBS returns a jobID in the format 925[].cfe1. To access a particular job in the array, I include its index value between the brackets: to access the job with index value 2, I would use the jobID 925[2].cfe1. The output for the job with index 2 is written to a file called 925[2].cfe1.OU. To manage your output files it is recommended to create an output directory in your home directory. To do this you can use the command mkdir output on the command line.

Under the Submission Script Guidelines section, the script linked as Submit Multiple Jobs Using Arrays gives an example of a job script that uses a job array. The script performs a very simple task: it determines whether the job's index is even or odd and then prints a message to that job's output file. The job index is available inside the script through the variable $PBS_ARRAY_INDEX, which makes it easy to assign different parameters or input files to the jobs in the array based on their index values.
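The even/odd check described above can be sketched in a few lines of shell. Inside a real array job, PBS sets PBS_ARRAY_INDEX automatically; here we assign it by hand so the snippet runs standalone:

```shell
# Simulate the index that PBS sets automatically inside an array job
PBS_ARRAY_INDEX=2
if [ $((PBS_ARRAY_INDEX % 2)) -eq 0 ]; then
  parity="even"
else
  parity="odd"
fi
echo "job $PBS_ARRAY_INDEX is $parity"
```

Replacing the echo with a call to your own program, parameterized by the index, turns this skeleton into a working array job.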

Interactive Jobs (for development and debugging)

Interactive jobs allow the user to log on to c1 in a scheduled session for debugging or new feature development. An interactive job can be created by typing qsub -I submission.sh on the command line. This submission.sh script differs from the ones used above in that it contains only PBS directives, leaving out any code for the jobs you would like to test. A PBS directive starts with #PBS, followed by options that tell PBS how much time and how many resources to allow you to use on Charlie. An example submission script for an interactive session is shown below:

#!/bin/sh
#PBS -l walltime=1:00:00 
#PBS -l select=1:ncpus=1
#PBS -q devel

The above script requests a session on c1 lasting 1 hour and using 1 cpu; after 1 hour, the session will close automatically. For most jobs a single cpu will be sufficient, unless you are working with parallel programs. The final line of the script, #PBS -q devel, tells PBS which queue we would like to use. For an interactive session the devel queue is the best choice because it ensures the job runs right away, so we get our test results as fast as possible.

Additionally, we do not need a submission script to start an interactive session. We may type the necessary options directly on the command line as follows: qsub -I -l walltime=1:00:00 -l select=1:ncpus=1 -q devel. This command produces the same results as described above.

Once an interactive job has been submitted, PBS will print:

qsub: waiting for job <Job ID> to start
qsub: job <Job ID> ready

Once PBS prints that the job is ready, its status is R, so it is only possible to decrease the wall time of the session. To end an interactive session, simply type exit on the command line.

Setting Email Notifications

You can add additional PBS directives to your script to have an email sent to you when the job has begun running, finished running, or been aborted (due to errors or having been killed). By adding the PBS directive #PBS -m bea to your submission script you will receive an email when execution has begun (b), when execution has ended (e), and when execution has been aborted (a). Then specify the address the message should be sent to with the following directive: #PBS -M username@bigelow.org.
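Putting the two directives together, the top of a submission script with email notifications might look like the fragment below (the address is a placeholder; substitute your own):

```shell
#!/bin/bash
## email at begin (b), end (e), and abort (a)
#PBS -m bea
## address to notify (replace with your own)
#PBS -M username@bigelow.org
```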

Submission Script Guidelines: