# pbsmgr notebook

We'll walk through a small-scale example of the _pbsmgr_ tools.

TODO: add a multi-job example with different priorities.

In [1]:
import os
import pbsmgr as pbs
pbs.QFile = 'example/job_queue.pkl'
pbs.JobDir = 'example/'

## generate a job

We'll start from the default job template and update it with settings specific to our example. Note that these functions can be run offline, i.e. without access to a PBS server.

In [2]:
jobinfo = pbs.get_job_template(SetID=True)
jobinfo['name'] = ['pbsmgr example']
jobinfo['CodeDir'] = os.path.abspath('.')
jobinfo['JobIndex'] = 0
jobinfo['script'] = 'example/template.sh'
jobinfo

{'BatchID': 1524642184,
 'CodeDir': '/tamir1/dalon/github-test/code',
 'JobIndex': 0,
 'data': None,
 'jobfile': None,
 'name': ['pbsmgr example'],
 'priority': 1,
 'queue': 'tamirs3',
 'resources': {'cput': '04:59:00',
  'mem': '1gb',
  'pmem': '1gb',
  'pvmem': '3gb',
  'vmem': '3gb'},
 'script': 'example/template.sh',
 'status': 'init'}
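
The TODO above asks for a multi-job example. As a minimal sketch, assuming each job in a batch gets its own `jobinfo` dict that differs only in `JobIndex` and `priority` (an assumption; this notebook only shows a single job), here is one way to derive several jobs from the template printed above. The helper `make_jobs` is hypothetical plain Python, not part of pbsmgr. It uses `copy.deepcopy` because the template nests mutable values (`resources`, `name`); a shallow copy would make every job share the same `resources` dict.

```python
import copy

# Values copied from the template printed above.
template = {
    'BatchID': 1524642184,
    'CodeDir': '/tamir1/dalon/github-test/code',
    'JobIndex': 0,
    'data': None,
    'jobfile': None,
    'name': ['pbsmgr example'],
    'priority': 1,
    'queue': 'tamirs3',
    'resources': {'cput': '04:59:00', 'mem': '1gb', 'pmem': '1gb',
                  'pvmem': '3gb', 'vmem': '3gb'},
    'script': 'example/template.sh',
    'status': 'init',
}

def make_jobs(template, priorities):
    """Hypothetical helper: one jobinfo dict per priority, indexed 0, 1, 2, ...

    deepcopy keeps nested values ('resources', 'name') independent per job.
    """
    jobs = []
    for index, priority in enumerate(priorities):
        job = copy.deepcopy(template)
        job['JobIndex'] = index
        job['priority'] = priority
        jobs.append(job)
    return jobs

jobs = make_jobs(template, [1, 2, 1])
jobs[1]['resources']['mem'] = '4gb'  # changes job 1 only, not the template
```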

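Now let's generate some random data for the job to operate on: a 1-by-10,000 array of integers from 1 to 99. `generate_data` saves it to a file under the batch's directory and records the file's path in `jobinfo['data']`.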

In [3]:
from numpy.random import randint
data = randint(1, 100, (1, 10**4))
pbs.generate_data(jobinfo, data)
jobinfo['data']

'example//1524642184/data_0.pkl'
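
Note the doubled slash in the returned path. On POSIX systems consecutive slashes are treated as one, so it is harmless; it presumably comes from appending `'/'` to a `JobDir` that already ends with one (we haven't checked pbsmgr's source). The sketch below uses hypothetical helpers (`data_path`, `write_data`, `read_data`, not pbsmgr functions) to build the same `<JobDir>/<BatchID>/data_<JobIndex>.pkl` layout with `os.path.join`, and shows the pickle round trip a job script could use to load its input. That pbsmgr uses `pickle` at all is inferred only from the `.pkl` extension.

```python
import os
import pickle
import tempfile

def data_path(job_dir, batch_id, job_index):
    # Hypothetical: <JobDir>/<BatchID>/data_<JobIndex>.pkl, joined with
    # os.path.join so a trailing '/' on job_dir doesn't double the slash.
    return os.path.join(job_dir, str(batch_id), 'data_%d.pkl' % job_index)

def write_data(job_dir, batch_id, job_index, data):
    path = data_path(job_dir, batch_id, job_index)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'wb') as f:
        pickle.dump(data, f)
    return path

def read_data(path):
    # What a job script could do on the compute node to load its input.
    with open(path, 'rb') as f:
        return pickle.load(f)

naive = 'example/' + '/' + '1524642184/data_0.pkl'  # reproduces 'example//...'
joined = data_path('example/', 1524642184, 0)       # single slash

with tempfile.TemporaryDirectory() as tmp:
    path = write_data(tmp + '/', 1524642184, 0, [[3, 1, 4]])
    loaded = read_data(path)
```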

In [4]:
pbs.generate_script(jobinfo)
jobinfo['script']

'example/1524642184/template_1524642184_0.sh'
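
`generate_script` turned `example/template.sh` into a per-job copy under the batch directory. The sketch below reproduces that naming pattern with a hypothetical helper, `job_script_path`, and shows one way to fill per-job values into a shell template with `string.Template`. The `$CodeDir`/`$data` placeholders and the `run.py` command are made up for illustration: the contents of `template.sh` aren't shown in this notebook, so pbsmgr's actual substitution mechanism is unknown.

```python
import os
from string import Template

def job_script_path(template_path, batch_id, job_index):
    # Hypothetical: example/template.sh -> example/<BatchID>/template_<BatchID>_<JobIndex>.sh
    folder, filename = os.path.split(template_path)
    stem, ext = os.path.splitext(filename)
    return os.path.join(folder, str(batch_id),
                        '%s_%s_%d%s' % (stem, batch_id, job_index, ext))

def render_script(template_text, jobinfo):
    # Assumed placeholder syntax ($name). safe_substitute leaves any
    # $name without a matching key untouched instead of raising.
    return Template(template_text).safe_substitute(jobinfo)

path = job_script_path('example/template.sh', 1524642184, 0)
text = render_script('cd $CodeDir\npython run.py $data\n',
                     {'CodeDir': '/tamir1/dalon/github-test/code',
                      'data': 'example/1524642184/data_0.pkl'})
```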

In [5]:
pbs.add_job_to_queue(jobinfo)
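
`add_job_to_queue` records the job in the queue file set as `pbs.QFile`. The file's layout isn't shown here; as a minimal sketch, assume it is a pickled dict mapping `BatchID` to `{JobIndex: jobinfo}`. The hypothetical helpers below add a job to such a file and save it by writing a temporary file and swapping it in with `os.replace`, so an interrupted write never leaves a truncated queue behind.

```python
import os
import pickle
import tempfile

def load_queue(qfile):
    # Hypothetical layout: {BatchID: {JobIndex: jobinfo}}; empty if no file yet.
    if not os.path.exists(qfile):
        return {}
    with open(qfile, 'rb') as f:
        return pickle.load(f)

def save_queue(queue, qfile):
    # Write to a temp file in the same directory, then atomically replace
    # the old queue file (os.replace is atomic on the same filesystem).
    folder = os.path.dirname(os.path.abspath(qfile))
    fd, tmp_path = tempfile.mkstemp(dir=folder, suffix='.tmp')
    with os.fdopen(fd, 'wb') as f:
        pickle.dump(queue, f)
    os.replace(tmp_path, qfile)

def add_job(jobinfo, qfile):
    queue = load_queue(qfile)
    queue.setdefault(jobinfo['BatchID'], {})[jobinfo['JobIndex']] = jobinfo
    save_queue(queue, qfile)
    return queue

with tempfile.TemporaryDirectory() as tmp:
    qfile = os.path.join(tmp, 'job_queue.pkl')
    add_job({'BatchID': 1524642184, 'JobIndex': 0, 'status': 'init'}, qfile)
    queue = load_queue(qfile)
```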

Now let's check that the new job was added to our queue:

In [6]:
pbs.get_queue()


1524642184: pbsmgr example
{'init': [0]}

missing jobs: {}

total jobs on PBS queue: 1
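
The `get_queue()` printout groups each batch's job indices by status. Assuming a hypothetical in-memory layout of `{BatchID: {JobIndex: jobinfo}}`, a summary of that shape can be computed as follows; jobs known to be running are marked with a `*`, matching the `'0*'` entry that appears once the job is submitted. This illustrates the output format only and is not pbsmgr's implementation.

```python
from collections import defaultdict

def summarize(queue, running=()):
    """Group each batch's job indices by status.

    queue:   {BatchID: {JobIndex: jobinfo}}  (hypothetical layout)
    running: iterable of (BatchID, JobIndex) pairs believed to be running;
             those indices are shown as strings with a '*' suffix.
    """
    running = set(running)
    summary = {}
    for batch_id, jobs in queue.items():
        by_status = defaultdict(list)
        for index in sorted(jobs):
            label = '%d*' % index if (batch_id, index) in running else index
            by_status[jobs[index]['status']].append(label)
        summary[batch_id] = dict(by_status)
    return summary

queue = {1524642184: {0: {'status': 'init', 'name': ['pbsmgr example']}}}
# summarize(queue) -> {1524642184: {'init': [0]}}
```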


Let's also verify that the job's metadata has been stored properly:

In [8]:
pbs.get_job_info(1524642184, 0)

{'BatchID': 1524642184,
 'CodeDir': '/tamir1/dalon/github-test/code',
 'JobIndex': 0,
 'data': 'example//1524642184/data_0.pkl',
 'jobfile': 'example/1524642184/meta_0.pkl',
 'name': ['pbsmgr example'],
 'priority': 1,
 'queue': 'tamirs3',
 'resources': {'cput': '04:59:00',
  'mem': '1gb',
  'pmem': '1gb',
  'pvmem': '3gb',
  'vmem': '3gb'},
 'script': 'example/1524642184/template_1524642184_0.sh',
 'status': 'init',
 'updating_info': False}

## submit jobs
`submit_jobs()` is the only job-control function that must run on a PBS server.

In [9]:
pbs.submit_jobs()

submiting:	example/1524642184/template_1524642184_0.sh
max jobs: 1000
in queue: 1
submitted: 1
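
The printout suggests that `submit_jobs()` caps the number of queued jobs (`max jobs: 1000`) and counts the jobs already on the PBS queue (`in queue: 1`). Below is a minimal sketch of such throttling; ordering by descending `priority` is an assumption. `qsub_command` builds (but doesn't run) a `qsub` call from a `jobinfo` dict using the standard `-q` (queue), `-l` (resource list), `-o` and `-e` (stdout/stderr paths) options. The `Submit_arguments` in the qstat report further down confirm only `-q` and `-e`, so the rest is a guess at what pbsmgr passes. Both helpers, and the log directory used, are hypothetical.

```python
def plan_submission(pending, max_jobs, in_queue):
    """Pick pending jobs to submit without exceeding max_jobs in total.

    Sorting by descending 'priority' is an assumption; Python's sort is
    stable, so jobs with equal priority keep their original order.
    """
    slots = max(0, max_jobs - in_queue)
    ordered = sorted(pending, key=lambda job: job['priority'], reverse=True)
    return ordered[:slots]

def qsub_command(jobinfo, log_dir):
    # -l takes a comma-separated list of name=value resource requests.
    resources = ','.join('%s=%s' % (name, value)
                         for name, value in sorted(jobinfo['resources'].items()))
    return ['qsub', '-q', jobinfo['queue'], '-l', resources,
            '-o', log_dir, '-e', log_dir, jobinfo['script']]

pending = [
    {'script': 'a.sh', 'priority': 1},
    {'script': 'b.sh', 'priority': 2},
    {'script': 'c.sh', 'priority': 1},
]
chosen = plan_submission(pending, max_jobs=1000, in_queue=1)
cmd = qsub_command({'queue': 'tamirs3',
                    'script': 'example/1524642184/template_1524642184_0.sh',
                    'resources': {'mem': '1gb', 'cput': '04:59:00'}},
                   'example/1524642184/logs')
```

On a PBS server, the command list could be passed to `subprocess.run(cmd, capture_output=True, text=True)`; `qsub` prints the new job's ID on stdout.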


## monitor jobs

Let's check whether the job is indeed on the PBS queue and running. The `*` next to job 0 in the batch indicates that it is running:

In [12]:
pbs.get_queue()


1524642184: pbsmgr example
{'submit': ['0*']}

missing jobs: {}

total jobs on PBS queue: 2
running/complete/total: 1/0/1
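
How pbsmgr decides that a job is running isn't shown. One plausible approach (an assumption, not confirmed by the source) is to query `qstat -f <job id>` and look at the `job_state` attribute, where `R` means running. The hypothetical parser below turns `qstat -f`-style text into a dict like the `qstat` entry in the job metadata later on, treating tab-indented lines as continuations of a wrapped value. The sample text is abbreviated and made up.

```python
def parse_qstat_full(text):
    """Parse `qstat -f`-style text into {attribute: value}.

    '    key = value' lines start a new attribute; lines starting with a
    tab are assumed to continue the previous (wrapped) value.
    """
    info = {}
    key = None
    for line in text.splitlines():
        if line.startswith('Job Id:'):
            info['Job Id'] = line.split(':', 1)[1].strip()
        elif line.startswith('\t') and key is not None:
            info[key] += line.strip()
        elif ' = ' in line:
            key, value = line.strip().split(' = ', 1)
            info[key] = value
    return info

def is_running(info):
    return info.get('job_state') == 'R'

sample = (
    'Job Id: 8084220.power8.tau.ac.il\n'
    '    Job_Name = template_1524642184_0\n'
    '    job_state = R\n'
    '    queue = tamirs3\n'
    '    Variable_List = PBS_O_HOME=/home/dalon,\n'
    '\tPBS_O_LANG=C\n'
)
info = parse_qstat_full(sample)
```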


This is what the output looks like once the job has completed:

In [13]:
pbs.get_queue()


1524642184: pbsmgr example
{'complete': [0]}

missing jobs: {}

total jobs on PBS queue: 1


We can now inspect the logs the job created (stdout and stderr) and its post-run metadata, which includes, for example, a PBS status report under the `qstat` key. If the job writes its results to a designated file, that file should be available by now as well.

In [15]:
pbs.print_log(1524642184, 0, 'stdout')
pbs.get_job_info(1524642184, 0)



[[[stdout log for 1524642184/pbsmgr example/part_0:]]]

{'BatchID': 1524642184, 'JobIndex': 0, 'priority': 1, 'name': ['pbsmgr example'], 'data': 'example//1524642184/data_0.pkl', 'script': 'example/1524642184/template_1524642184_0.sh', 'queue': 'tamirs3', 'jobfile': 'example/1524642184/meta_0.pkl', 'resources': {'mem': '1gb', 'pmem': '1gb', 'vmem': '3gb', 'pvmem': '3gb', 'cput': '04:59:00'}, 'status': 'submit', 'CodeDir': '/tamir1/dalon/github-test/code', 'updating_info': False, 'subtime': 1524642202.8677227, 'PBS_ID': '8084220', 'hostname': 'compute-0-85.power5', 'stdout': ['example//1524642184/logs/8084220.power8.tau.ac.il.OU'], 'stderr': ['example//1524642184/logs/8084220.power8.tau.ac.il.ER']}
50.0932


{'BatchID': 1524642184,
 'CodeDir': '/tamir1/dalon/github-test/code',
 'JobIndex': 0,
 'PBS_ID': '8084220',
 'data': 'example//1524642184/data_0.pkl',
 'hostname': 'compute-0-85.power5',
 'jobfile': 'example/1524642184/meta_0.pkl',
 'name': ['pbsmgr example'],
 'priority': 1,
 'qstat': {'Checkpoint': 'u',
  'Error_Path': 'compute-0-83',
  'Hold_Types': 'n',
  'Job_Name': 'template_1524642184_0',
  'Job_Owner': 'dalon',
  'Join_Path': 'n',
  'Keep_Files': 'n',
  'Mail_Points': 'a',
  'Output_Path': 'compute-0-83',
  'Priority': '0',
  'Rerunable': 'True',
  'Resource_List.cput': '04:59:00',
  'Resource_List.mem': '1gb',
  'Resource_List.ncpus': '1',
  'Resource_List.nodect': '1',
  'Resource_List.place': 'pack',
  'Resource_List.pmem': '1gb',
  'Resource_List.pvmem': '3gb',
  'Resource_List.select': '1:mem',
  'Resource_List.vmem': '3gb',
  'Resource_List.walltime': '2400:00:00',
  'Submit_arguments': '-q tamirs3 -e /tamir1/dalon/github-test/code/example/152',
  'Variable_List': 'PBS_O_
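
The `stdout` and `stderr` entries in the metadata are lists of log-file paths. As a minimal sketch of what `print_log` might do with them, the hypothetical `format_log` below reads each listed file under a header styled like the one printed above. Rendering the list-valued `name` by its first element is a guess, and missing files are reported instead of raising, since a log file may not exist yet.

```python
import os
import tempfile

def format_log(jobinfo, stream='stdout'):
    # Header mimics the print_log() output above; name[0] is an assumption.
    header = '[[[%s log for %s/%s/part_%d:]]]' % (
        stream, jobinfo['BatchID'], jobinfo['name'][0], jobinfo['JobIndex'])
    parts = [header]
    for path in jobinfo.get(stream, []):
        if os.path.exists(path):
            with open(path) as f:
                parts.append(f.read())
        else:
            parts.append('(missing: %s)' % path)
    return '\n'.join(parts)

with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, '8084220.OU')
    with open(log_path, 'w') as f:
        f.write('50.0932\n')
    job = {'BatchID': 1524642184, 'JobIndex': 0,
           'name': ['pbsmgr example'], 'stdout': [log_path]}
    report = format_log(job)
```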

Finally, we can clear every batch whose jobs have all completed, and then check the queue again:

In [16]:
pbs.remove_batch_by_state('complete')
pbs.get_queue()


missing jobs: {}

total jobs on PBS queue: 1
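
Assuming, hypothetically, that the queue is a dict mapping `BatchID` to `{JobIndex: jobinfo}`, removing the batches in which every job has reached a given state reduces to a dict comprehension with `all()`; batches with a mix of states are kept. This sketches the behavior described above and is not pbsmgr's code.

```python
def remove_batches_by_state(queue, state):
    """Return a new queue without batches whose jobs are all in `state`.

    queue: {BatchID: {JobIndex: jobinfo}} (hypothetical layout).
    Note: all() is True for an empty batch, so empty batches are removed too.
    """
    return {batch_id: jobs for batch_id, jobs in queue.items()
            if not all(job['status'] == state for job in jobs.values())}

queue = {
    1: {0: {'status': 'complete'}, 1: {'status': 'complete'}},
    2: {0: {'status': 'complete'}, 1: {'status': 'submit'}},
}
remaining = remove_batches_by_state(queue, 'complete')
```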


May the power of PBS be with you!