qtools has helper functions to submit jobs to compute clusters (PBS on TSCC, SGE on oolite) from within Python
- Free software: BSD license
To install this code, clone this github repository and use pip
to install
git clone git@github.com:YeoLab/qtools
cd qtools
pip install . # The "." means "install *this*, the folder where I am now"
Here's an example of a single job where I want to use hmmscan
to find domains in protein sequences, specifying the walltime and number of processors.
import qtools
command = 'bedtools intersect exons.bed placental_conserved_elements.bed'
sub = qtools.Submitter(command, 'intersect')
And this will create a submitter script with the default options:
walltime="00:30:00"
nodes=1
ppn=1
(processors per node - increase this one first, instead of the numbers of nodes. Max is 16)group="yeo-group"
queue="home-scrm"
(could also behome-yeo
This writes a file called intersect.sh
which looks like this:
#!/bin/bash
#PBS -N intersect
#PBS -o intersect.out
#PBS -e intersect.err
#PBS -V
#PBS -l walltime=00:30:00
#PBS -l nodes=1:ppn=1
#PBS -A yeo-group
#PBS -q home
# Go to the directory from which the script was called
cd $PBS_O_WORKDIR
bedtools intersect exons.bed placental_conserved_elements.bed
The output is:
job ID: 3610818
If you have a bunch of independent jobs you want to run, then you can submit
them with one command using array=True
. Here's an example of calculating average conservation of both constitutive and alternative exons.
import os
import glob
import qtools
folder = '/projects/ps-yeolab/obotvinnik/singlecell_pnms'
alt_exons_bedfile = '{}/exon2.bed'.format(folder)
constitutive_bedfile = '{}/constitutive_exons.bed'.format(folder)
bedfiles = alt_exons_bedfile, constitutive_bedfile
commands = []
bw = '/projects/ps-yeolab/genomes/hg19/hg19_phastcons_placental_mammal.bw'
for bedfile in bedfiles:
basename = os.path.basename(bedfile)
prefix = basename.split('.bed')[0]
prefix += '_phastcons_placental_mammal'
bedout = '{}/{}'.format(folder, prefix + '.bed')
outtab = '{}/{}'.format(folder, prefix + '.txt')
command = 'bigWigAverageOverBed {} {} {} -bedOut={}'.format(bw, bedfile, outtab, bedout)
print command
commands.append(command)
jobname = 'exonbody_conservation'
qtools.Submitter(commands, jobname, array=True, walltime='2:00:00')
Output:
running 2 tasks as an array-job.
job ID: 3614584
This creates the file exonbody_conservation.sh
which looks like this:
#!/bin/bash
#PBS -N exonbody_conservation
#PBS -o /projects/ps-yeolab/obotvinnik/singlecell_pnms/exonbody_conservation.out
#PBS -e /projects/ps-yeolab/obotvinnik/singlecell_pnms/exonbody_conservation.err
#PBS -V
#PBS -l walltime=2:00:00
#PBS -l nodes=1:ppn=1
#PBS -A yeo-group
#PBS -q home
#PBS -t 1-2
# Go to the directory from which the script was called
cd $PBS_O_WORKDIR
cmd[1]="bigWigAverageOverBed /projects/ps-yeolab/genomes/hg19/hg19_phastcons_placental_mammal.bw /projects/ps-yeolab/obotvinnik/singlecell_pnms/exon2.bed /projects/ps-yeolab/obotvinnik/singlecell_pnms/exon2_phastcons_placental_mammal.txt -bedOut=/projects/ps-yeolab/obotvinnik/singlecell_pnms/exon2_phastcons_placental_mammal.bed"
cmd[2]="bigWigAverageOverBed /projects/ps-yeolab/genomes/hg19/hg19_phastcons_placental_mammal.bw /projects/ps-yeolab/obotvinnik/singlecell_pnms/constitutive_exons.bed /projects/ps-yeolab/obotvinnik/singlecell_pnms/constitutive_exons_phastcons_placental_mammal.txt -bedOut=/projects/ps-yeolab/obotvinnik/singlecell_pnms/constitutive_exons_phastcons_placental_mammal.bed"
eval ${cmd[$PBS_ARRAYID]}
If you want your sh
/stdout
/stderr
to be sent to a specific location, instead
of to the folder you're currently in by default, then specify them with sh
,
out
, and err
. You can also specify the queue (home-yeo
vs home-scrm
) with queue="home-scrm"
. The default is home-yeo
.
import qtools
jobname = 'run_outrigger_py'
sh = jobname + '.sh'
out = sh + '.out'
err = sh + '.err'
command = 'python /projects/ps-yeolab/obotvinnik/singlecell_pnms/outrigger/outrigger.py'
n_processors = 16
sub = qtools.Submitter([command], 'run_outrigger_py', queue='home-yeo',
out=out, err=err, sh=sh, walltime='100:00:00', nodes=1,
ppn=n_processors)
Output:
job ID: 3884631