How to run Eidos on the Arizona HPC cluster
This document contains a sequence of step-by-step instructions describing how to run Eidos batch jobs on the UA HPC cluster (ocelote). Note that these instructions apply only to UA users.
For all these instructions the following variables are used:
- NETID: your UA NetID, e.g., msurdeanu for Mihai
- SCALAVER: the Scala version used by Eidos, e.g., 2.12.4
- EIDOSVER: the Eidos version used, e.g., 0.2.2-SNAPSHOT
- HDIR: your home directory on ocelote, e.g., /home/u13/msurdeanu for Mihai
In case you do not have an HPC account, create one using the instructions listed here: https://docs.hpc.arizona.edu/display/UAHPC/Account+Creation
If you already have one, skip to the next step.
Log in to HPC:
ssh NETID@hpc.arizona.edu
then, at the prompt, type the following to connect to the ocelote cluster:
ocelote
Note: this step is not needed to run Eidos, since the fat jar includes Scala as well. However, in case somebody needs Scala on ocelote, we are leaving these instructions here.
Download scala from here: https://www.scala-lang.org/download/all.html
Copy the tar file to your home directory on ocelote:
scp scala-SCALAVER.tar NETID@filexfer.hpc.arizona.edu:
Login to ocelote (see above) and then:
tar xvf scala-SCALAVER.tar
Open your .bash_profile and add the Scala bin/ directory to your path:
vi .bash_profile
then add the following line before the export PATH statement:
PATH=$PATH:HDIR/scala-SCALAVER/bin
Make sure it works:
source ~/.bash_profile
scala -version
On your work machine, in the eidos directory:
sbt assembly
scp target/scala-2.12/eidos-assembly-EIDOSVER.jar NETID@filexfer.hpc.arizona.edu:.
Let's assume, for simplicity, that you have two batches to be processed (in a real use case, the number of batches will be larger). Create input and output directories for each batch in the /extra partition, which gives you up to 200 GB:
cd /extra/NETID
mkdir eidos
cd eidos
mkdir in.1 in.2 out.1 out.2
In the above, batch i corresponds to the directory in.i, and the output JSON files created for batch i will be saved in the directory out.i. Then, of course, copy all the files to be processed into the corresponding in.* directory.
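The directory setup above can be generalized to N batches with a small loop. This is a minimal sketch: the BASE variable is an assumption introduced here so the snippet runs anywhere (on ocelote it would be /extra/NETID/eidos), and N is the batch count.

```shell
# Sketch: create paired in.i/out.i directories for N batches.
# BASE defaults to a temp dir so the sketch runs anywhere; on ocelote
# it would be /extra/NETID/eidos.
BASE="${BASE:-$(mktemp -d)}"
N=2                                     # number of batches
mkdir -p "$BASE"
for i in $(seq 1 "$N"); do
  mkdir -p "$BASE/in.$i" "$BASE/out.$i"
done
```

Raising N and rerunning the loop adds more batch pairs without disturbing existing ones, since mkdir -p is idempotent.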
Create an eidos.pbs file with the following content (make sure to replace the variables with actual values!):
#!/bin/bash
### Script to run Eidos on files contained in one directory and
### save the JSON outputs in another directory
### Job name
#PBS -N eidos
### Specify email address to use for notification (optional).
#PBS -M NETID@email.arizona.edu
### Request email when job ends
#PBS -m bea
### Specify the PI group found with va command
#PBS -W group_list=msurdeanu
### Set the queue to submit this job. (Use windfall for non critical jobs, or msurdeanu for critical ones)
#PBS -q windfall
### Set the number of nodes and cpus that will be used
#PBS -l select=1:ncpus=4:mem=16gb:pcmem=6gb
### Per process virtual memory. pvmem = mem + pcmem
#PBS -l pvmem=24gb
### Not using all processors in node, so allow others to share it
#PBS -l place=pack:shared
### Specify "wallclock time" required for this job, hhh:mm:ss
#PBS -l walltime=00:10:0
### Specify total cpu time required for this job, hhh:mm:ss
### total cputime = walltime * ncpus
#PBS -l cput=00:40:0
### Set directory for execution to the directory where you ran qsub.
### This way, the script will work for multiple projects in different
### directories as long as the files are named the same way.
cd $PBS_O_WORKDIR
### Profile job
echo Running on host `hostname`
echo Working directory is `pwd`
echo Worker id is $PBS_ARRAY_INDEX
### The actual job
echo Time before starting the job is `date`
export WID=$PBS_ARRAY_INDEX
java -Xmx16g -cp HDIR/eidos-assembly-EIDOSVER.jar org.clulab.wm.eidos.apps.ExtractFromDirectory /extra/NETID/eidos/in.$WID /extra/NETID/eidos/out.$WID
echo Time after ending the job is `date`
In the above, you may want to adjust ncpus to increase/decrease the amount of parallelism within an individual worker; walltime (and cput) if a worker is expected to take more than 10 minutes; and the RAM allocated to Eidos, both in the mem variable and in the java command line.
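For instance, a sketch of adjusted resource directives for a longer, more parallel run (the specific values here are illustrative assumptions, not recommendations; keep cput = walltime * ncpus and pvmem = mem + pcmem as explained in the script comments):

```
#PBS -l select=1:ncpus=8:mem=32gb:pcmem=6gb
#PBS -l pvmem=38gb
#PBS -l walltime=02:00:0
#PBS -l cput=16:00:0
```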
Just FYI, for more help on PBS configuration, see this page: https://docs.hpc.arizona.edu/display/UAHPC/Script+Examples
Also, this page generates custom .pbs
files: https://jobbuilder.hpc.arizona.edu/
Continuing with the assumption that you want to start two workers, use:
cd ~
qsub -J 1-2 eidos.pbs
If you have N workers, use qsub -J 1-N instead.
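Rather than hard-coding N, it can be derived from the number of in.* directories you created. The following is a dry-run sketch: the echo stands in for the real qsub call, and BASE is the same illustrative variable as above (on ocelote it would be /extra/NETID/eidos, and you would run qsub from your home directory).

```shell
# Dry-run sketch: derive the array size N from the number of in.* batch
# directories, then print the qsub command that would be submitted.
BASE="${BASE:-$(mktemp -d)}"
mkdir -p "$BASE/in.1" "$BASE/in.2"      # stand-ins for the real batch dirs
N=$(( $(ls -d "$BASE"/in.* | wc -l) ))  # arithmetic strips wc's padding
echo "qsub -J 1-$N eidos.pbs"
```

On the cluster, replacing the echo with the actual qsub invocation submits one array task per batch directory.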
Note that after each worker starts, you will receive an email with content similar to this:
PBS Job Id: 1723032[].head1.cm.cluster
Job Name: eidos
Begun execution
After a worker completes execution, you will receive another email with content similar to this:
PBS Job Id: 1723032[].head1.cm.cluster
Job Name: eidos
Execution terminated
Exit_status=0
The log files for each worker are saved in your home directory. For example, worker 1 in the above job will generate two files: eidos.e1723032.1 and eidos.o1723032.1, where the former is stderr and the latter is stdout.
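As a quick sanity check after a run, something like the following flags workers whose stderr log is non-empty. This is a sketch: the check_logs function name and the synthetic temp-dir usage example are assumptions introduced here; on ocelote you would point it at your home directory, where the logs land.

```shell
# Sketch: report workers whose stderr log (eidos.e<jobid>.<worker>) is non-empty.
check_logs() {
  dir="$1"
  for f in "$dir"/eidos.e*; do
    [ -e "$f" ] || continue             # glob matched nothing: no logs yet
    if [ -s "$f" ]; then
      echo "stderr output in $(basename "$f")"
    fi
  done
}

# Usage example with synthetic logs in a temp directory:
tmp=$(mktemp -d)
: > "$tmp/eidos.e1723032.1"                                  # worker 1: clean
echo "java.lang.OutOfMemoryError" > "$tmp/eidos.e1723032.2"  # worker 2: failed
check_logs "$tmp"
```

Note that an empty stderr log is only a rough signal of success; the Exit_status field in the completion email remains the authoritative check.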