# anthill 

anthill is a cluster that uses the Sun Grid Engine (SGE) scheduler.



To submit a job to the cluster, you need to write a bash script:

In [1]:
cat ./sleep_test.sh

date
sleep 60
date
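The script above is the bare minimum. Here is a hedged sketch of a job script that also records which job, node and queue it ran on. The extra lines are an illustration, not something anthill requires. SGE sets JOB_ID, JOB_NAME and QUEUE inside a running job. The ":-" defaults let the same file run outside the scheduler, and job_banner is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical variant of sleep_test.sh that logs where it ran.
set -euo pipefail   # stop at the first failing command or unset variable

# SGE provides JOB_ID, JOB_NAME and QUEUE inside a job; the values after
# ":-" are used when the script runs outside the scheduler.
job_banner() {
  echo "job ${JOB_ID:-none} (${JOB_NAME:-interactive}) on ${HOSTNAME:-unknown}, queue ${QUEUE:-none}"
}

job_banner
date
# ... real work goes here (sleep_test.sh uses "sleep 60") ...
date
```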


Submit sleep_test.sh with qsub:

In [2]:
rm -rf sge sgo
mkdir -p sge
mkdir -p sgo
qsub -cwd -N test -e sge -o sgo -q important ./sleep_test.sh

Your job 6039626 ("test") has been submitted
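When you script submissions, it helps to keep the job ID. Here is a minimal sketch that assumes the message format shown above. parse_job_id, a hypothetical helper, extracts the number from qsub's confirmation line for both single jobs and job arrays. Grid Engine's qsub also has a -terse option that prints only the job ID. If your version supports it, that may be simpler.

```shell
#!/bin/bash
# Hypothetical helper: read qsub's "Your job ..." message on stdin and print
# the numeric job ID. Handles single jobs ("Your job 6039626 ...") and job
# arrays ("Your job-array 6039628.1-5:1 ...").
parse_job_id() {
  sed -nE 's/^Your job(-array)? ([0-9]+).*/\2/p'
}

# On the cluster:
#   id=$(qsub -cwd -N test -e sge -o sgo -q important ./sleep_test.sh | parse_job_id)
#   echo "submitted job $id"
```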


You can check the job status with qstat (qw means queued and waiting, r means running):

In [3]:
qstat

job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
6039626 0.00000 test       acantu       qw    12/17/2020 12:04:09                                    1        
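qstat prints one row per job, or one row per task for array jobs. Here is a minimal sketch that assumes the column layout shown above. job_state, a hypothetical helper, reads qstat's table and prints the state of one job, which is handy when a script has to wait for a job to finish.

```shell
#!/bin/bash
# Hypothetical helper: given qstat's table on stdin, print the state column
# (e.g. "qw" = queued and waiting, "r" = running) for one job ID.
# Array jobs print one line per task.
job_state() {
  # Rows after the two header lines; column 1 is job-ID, column 5 is state.
  awk -v id="$1" 'NR > 2 && $1 == id { print $5 }'
}

# On the cluster, wait until a job has left the queue:
#   while [ -n "$(qstat | job_state 6039626)" ]; do sleep 10; done
```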


If needed, the job can be killed with qdel:

In [5]:
#qdel [job-ID]
qdel 6039626

job 6039626 is already in deletion


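STDOUT is redirected to sgo and STDERR to sge. Each file is named after the job. For example, the STDOUT of job 6039626, named test, ends up in sgo/test.o6039626, and its STDERR would go to sge/test.e6039626.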

In [6]:
cat sgo/test.o6039626

Thu Dec 17 12:04:18 PST 2020
Thu Dec 17 12:05:18 PST 2020
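When -o and -e point to directories, the file names in this notebook follow one pattern. Each name is the job name, then .o (STDOUT) or .e (STDERR), then the job ID, plus .task-ID for array tasks. Below is a small sketch that builds those paths. sge_log_file is a hypothetical helper, and the pattern is inferred from the outputs shown here rather than from SGE documentation.

```shell
#!/bin/bash
# Hypothetical helper: path of a job's log file when -o/-e name a directory.
#   dir  - directory given to -o or -e (e.g. sgo)
#   name - job name given to -N
#   kind - "o" for STDOUT, "e" for STDERR
#   id   - job ID; task - optional array task ID
sge_log_file() {
  local dir=$1 name=$2 kind=$3 id=$4 task=${5:-}
  local path="${dir}/${name}.${kind}${id}"
  if [ -n "$task" ]; then
    path="${path}.${task}"
  fi
  echo "$path"
}

# Usage: cat "$(sge_log_file sgo test o 6039626)"   # same as: cat sgo/test.o6039626
```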


## cwd
-cwd tells SGE to run the job from the current working directory, so relative paths such as sge and sgo are resolved from there.
## N
-N names the job
## e
-e redirects STDERR
## o
-o redirects STDOUT
## q
-q selects the queue. There are three queues:

### default
The default queue has 35 nodes, each with 16 processors and 128 GB RAM. Each processor runs an independent job, so you can run 560 jobs simultaneously on these machines. This queue is in eternally friendly mode, and all jobs are run on a first-in first-out basis.
### important
The important queue has 4 nodes with 16 processors and 128 GB RAM each. It is meant for testing and for running individual programs, so it accepts single jobs only. Do not run array jobs on this queue or they will be terminated!
### smallmem
This queue has nine nodes, each with 8 processors (72 cores in total). Each node has 14 GB RAM, except node1, which has 24 GB. People often forget about this queue, so it is sometimes worth checking!

You can also list the queues with qconf -sql:

In [7]:
qconf -sql

default
important
smallmem


Arguments can also be set inside the script itself, on lines starting with #$:

In [8]:
cat sleep_test_infile.sh

#!/usr/bin/bash

#$ -cwd
#$ -N test_infile
#$ -e sge
#$ -o sgo
#$ -q important


date
sleep 60
date


In [9]:
qsub ./sleep_test_infile.sh

Your job 6039627 ("test_infile") has been submitted
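Embedded options also work for array jobs. Here is a sketch that assumes a hypothetical script named array_infile.sh. -t 1-3 requests tasks 1 to 3, and -S /bin/bash asks SGE to run the script with bash. The queue is default because the important queue does not accept array jobs.

```shell
#!/bin/bash
# Hypothetical array_infile.sh: every qsub option lives in the file, so it
# can be submitted with a bare "qsub ./array_infile.sh".
#$ -cwd
#$ -N array_infile
#$ -e sge
#$ -o sgo
#$ -q default
#$ -t 1-3
#$ -S /bin/bash

# To bash, "#$" lines are ordinary comments, so the file also runs locally.
main() {
  echo "task ${SGE_TASK_ID:-1} of job ${JOB_ID:-local}"
}

main
```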


In an array job (-t), each task of sleep_test_job_array.sh receives a special variable, $SGE_TASK_ID, holding the number of the task it is running:

In [10]:
cat ./sleep_test_job_array.sh

date
sleep 60
echo "this is job number $SGE_TASK_ID"
date


In [11]:
qsub -cwd -N array -e sge -o sgo -t 1:5:1 -q default ./sleep_test_job_array.sh

Your job-array 6039628.1-5:1 ("array") has been submitted


In [16]:
qstat

In [17]:
ls sgo/array*

sgo/array.o6039628.1  sgo/array.o6039628.3  sgo/array.o6039628.5
sgo/array.o6039628.2  sgo/array.o6039628.4


In [18]:
cat sgo/array*

Thu Dec 17 12:06:18 PST 2020
this is job number 1
Thu Dec 17 12:07:18 PST 2020
Thu Dec 17 12:06:18 PST 2020
this is job number 2
Thu Dec 17 12:07:18 PST 2020
Thu Dec 17 12:06:18 PST 2020
this is job number 3
Thu Dec 17 12:07:18 PST 2020
Thu Dec 17 12:06:18 PST 2020
this is job number 4
Thu Dec 17 12:07:18 PST 2020
Thu Dec 17 12:06:18 PST 2020
this is job number 5
Thu Dec 17 12:07:18 PST 2020
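A common use of $SGE_TASK_ID is to give each task its own input. Here is a minimal sketch. The input file names are placeholders rather than files from this tutorial, and pick_input is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical array-job script: each task processes a different input.
# The file names are placeholders.
inputs=(sample_a.txt sample_b.txt sample_c.txt sample_d.txt sample_e.txt)

pick_input() {
  # -t 1:5:1 numbers tasks from 1, bash arrays from 0.
  local task=${SGE_TASK_ID:-1}
  echo "${inputs[task-1]}"
}

echo "task ${SGE_TASK_ID:-1} would process $(pick_input)"
```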


A classic trick is to list all the commands to run in a file, one per line. An array job then reads the line matching its $SGE_TASK_ID and runs it:

In [19]:
cat ./list_of_commands.txt

echo "this is the first command"
echo "this is the second command"
echo "this is the third command"


In [20]:
cat ./run_list.sh

#!/usr/bin/bash

sleep 40
run=$(cat list_of_commands.txt | head -n $SGE_TASK_ID | tail -n 1)
$run > $SGE_TASK_ID\_command.txt


In [21]:
qsub -cwd -N list -e sge -o sgo -t 1:3:1 -q default ./run_list.sh

Your job-array 6039629.1-3:1 ("list") has been submitted


In [22]:
qstat

job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
6039629 0.55500 list       acantu       r     12/17/2020 12:07:33 default@node30                     1 1
6039629 0.55500 list       acantu       r     12/17/2020 12:07:33 default@node30                     1 2
6039629 0.55500 list       acantu       r     12/17/2020 12:07:33 default@node30                     1 3


In [23]:
ls

1_command.txt    full_path.sh          path.sh      sleep_test_infile.sh
2_command.txt    intro.ipynb           run_list.sh  sleep_test_job_array.sh
3_command.txt    intro.py              sge          sleep_test.sh
environment.yml  list_of_commands.txt  sgo


In [24]:
grep command *command.txt

1_command.txt:"this is the first command"
2_command.txt:"this is the second command"
3_command.txt:"this is the third command"

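Note the quotation marks in these files. run_list.sh executes $run unquoted, so bash splits the line on spaces without re-reading the quotes. echo therefore receives words such as "this and command" with the quote characters still attached. Below is a sketch of a variant that avoids this, assuming every line of the command file is a complete bash command. sed -n "Np" prints only line N, and bash -c re-parses that line so quoting in the file is honoured. run_line is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical variant of run_list.sh's core: run line N of a command file.
run_line() {
  local list=$1 task=$2 cmd
  cmd=$(sed -n "${task}p" "$list")   # print only line number $task
  if [ -z "$cmd" ]; then
    echo "no command on line $task of $list" >&2
    return 1
  fi
  bash -c "$cmd"                     # let bash parse the line, quotes included
}

# In the job script (still submitted with -t 1:3:1):
#   run_line list_of_commands.txt "$SGE_TASK_ID" > "${SGE_TASK_ID}_command.txt"
```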

Finally, it is wise to use full paths when possible. Don't assume the job runs with the same environment (for example, the same PATH) as your interactive shell.

In [25]:
cat path.sh

#!/usr/bin/bash

python -V



In [26]:
qsub -cwd -N ppath -e sge -o sgo -q default ./path.sh

Your job 6039630 ("ppath") has been submitted


In [28]:
cat sgo/ppath* sge/ppath*

bash: python: command not found


In [29]:
cat full_path.sh

#!/usr/bin/bash

/home1/acantu/anaconda3/envs/bash/bin/python -V



In [30]:
qsub -cwd -N fpath -e sge -o sgo -q default ./full_path.sh

Your job 6039631 ("fpath") has been submitted


In [34]:
cat sgo/fpath* sge/fpath*

Python 3.8.6


To find the full path of a program, run which in your interactive shell:

In [35]:
which -a python

~/anaconda3/envs/bash/bin/python
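Here is a hedged sketch of a slightly more robust full_path.sh. The interpreter's full path is set once and can be overridden through a PYTHON environment variable, which is an assumption added here. If the path is missing on the compute node, the job stops with a clear message instead of "command not found". check_exe is a hypothetical helper name. Alternatively, qsub -V exports your current environment variables to the job, but full paths keep the script independent of where it was submitted from.

```shell
#!/bin/bash
# Hypothetical variant of full_path.sh. The default path is the one used above;
# overriding it with PYTHON=... is an assumption for illustration.
PYTHON=${PYTHON:-/home1/acantu/anaconda3/envs/bash/bin/python}

# Fail early if the given path is not an executable file on this node.
check_exe() {
  if [ ! -x "$1" ]; then
    echo "not found or not executable: $1" >&2
    return 1
  fi
}

# In the job script itself:
#   check_exe "$PYTHON" || exit 1
#   "$PYTHON" -V
```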


# extra tips
Use -pe make 16 to request 16 slots in the make parallel environment. On a 16-processor node, this takes the full node for yourself.
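With a parallel environment, the job is granted several slots, and SGE reports how many in the NSLOTS environment variable. Here is a minimal sketch that sizes a program's thread count to the granted slots instead of hard-coding 16. slots is a hypothetical helper, and OMP_NUM_THREADS only affects OpenMP programs; other tools take their own thread flag.

```shell
#!/bin/bash
# Hypothetical helper for a job submitted with "-pe make 16".
# SGE sets NSLOTS inside the job; the default of 1 covers local runs.
slots() {
  echo "${NSLOTS:-1}"
}

export OMP_NUM_THREADS
OMP_NUM_THREADS=$(slots)   # read by OpenMP programs
echo "using $OMP_NUM_THREADS threads"
```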


qhost shows how many CPUs and how much memory each node has:

In [36]:
qhost -F | grep -e HOSTNAME -e node

HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
node1                   lx-amd64        8    2    8    8  0.00   23.5G  539.4M  100.0G  116.3M
node11                  lx-amd64        8    2    8    8  0.00   13.7G  507.7M  100.0G  114.0M
node12                  lx-amd64       16    2   16   16  0.00  125.9G  890.7M  100.0G  138.0M
node13                  lx-amd64       16    2   16   16  0.00  125.9G    1.1G  100.0G  123.4M
node14                  lx-amd64       16    2   16   16  0.00  125.9G    1.1G  100.0G   97.4M
node15                  lx-amd64       16    2   16   16  0.00  125.9G  941.7M  100.0G  107.7M
node16                  lx-amd64       16    2   16   16  0.00  125.9G  929.2M  100.0G  115.6M
node17                  lx-amd64       16    2   16   16  0.00  125.9G  892.9M  100.0G