How to run Eidos on the Arizona HPC cluster

This document contains step-by-step instructions for running Eidos batch jobs on the UA HPC cluster (ocelote). Obviously, this is valid only for UA people.

For all these instructions the following variables are used:

  • NETID: your UA NetID, e.g., msurdeanu for Mihai
  • SCALAVER: scala version used by Eidos, e.g., 2.12.4
  • EIDOSVER: Eidos version used, e.g., 0.2.2-SNAPSHOT
  • HDIR: your home directory on ocelote, e.g., /home/u13/msurdeanu for Mihai
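
If you prefer not to substitute these placeholders by hand, one option is to define them once as shell variables, as in the sketch below (shown with Mihai's example values; adjust them to your own account). If you do this, write them as $NETID, $SCALAVER, etc. in the commands that follow.

# optional: define the placeholders once (example values; adjust to your account)
export NETID=msurdeanu
export SCALAVER=2.12.4
export EIDOSVER=0.2.2-SNAPSHOT
export HDIR=/home/u13/msurdeanu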

Create an account

In case you do not have an HPC account, create one using the instructions listed here: https://docs.hpc.arizona.edu/display/UAHPC/Account+Creation

If you have one, skip to the next step.

Login to ocelote

ssh NETID@hpc.arizona.edu
ocelote

OPTIONAL: Install scala in your home directory on ocelote

Note: this step is not needed to run Eidos, since the fat jar includes scala as well. However, just in case somebody needs scala on ocelote, we are leaving these instructions here.

Download scala from here: https://www.scala-lang.org/download/all.html

Copy the tar file to your home directory on ocelote:

scp scala-SCALAVER.tar NETID@filexfer.hpc.arizona.edu:

Login to ocelote (see above) and then:

tar xvf scala-SCALAVER.tar

Open your .bash_profile and add the bin/ directory to your path:

vi .bash_profile

then add the following line before the export PATH line:

PATH=$PATH:HDIR/scala-SCALAVER/bin

Make sure it works:

source ~/.bash_profile
scala -version

Build and transfer the Eidos fat jar to ocelote

On your work machine, in the eidos directory:

sbt assembly
scp target/scala-2.12/eidos-assembly-EIDOSVER.jar NETID@filexfer.hpc.arizona.edu:.
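
Optionally, before transferring the (rather large) fat jar, you can sanity-check it locally on a handful of small text files. This is just a sketch: sample_in and sample_out are hypothetical directories that you create yourself, with sample_in holding a few plain-text documents.

# optional local sanity check, run from the eidos directory on your work machine
mkdir -p sample_out
java -Xmx16g -cp target/scala-2.12/eidos-assembly-EIDOSVER.jar org.clulab.wm.eidos.apps.ExtractFromDirectory sample_in sample_out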

Organize the files to be processed into batches on ocelote

For simplicity, let's assume that you have two batches to be processed (of course, in a real use case the number of batches will be larger). Create input and output directories for each batch in the /extra partition, which gives you up to 200GB:

cd /extra/NETID
mkdir eidos
cd eidos
mkdir in.1 in.2 out.1 out.2

In the above, batch i corresponds to the directory in.i, and the output JSON files created for batch i will be saved in the directory out.i. Then, of course, copy all the files to be processed into the corresponding in.* directories.
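
If your documents start out in a single directory, one way to split them across the batch directories is a simple round-robin loop. This is only a sketch: it assumes plain-text .txt files sitting in a hypothetical all_docs directory, and it is run from /extra/NETID/eidos.

# distribute the files from all_docs/ (hypothetical) round-robin over the batch directories
N=2                    # number of batches
i=1
for f in all_docs/*.txt; do
  cp "$f" in.$i/
  i=$(( i % N + 1 ))   # cycle 1, 2, ..., N, 1, 2, ...
done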

Prepare the PBS script

Create an eidos.pbs file that has the following content (make sure to replace the variables with actual values!):

#!/bin/bash
### Script to run Eidos on files contained in one directory and
###   save the JSON outputs in another directory
### Job name
#PBS -N eidos
### Specify email address to use for notification (optional).
#PBS -M NETID@email.arizona.edu
### Request email when job ends
#PBS -m bea
### Specify the PI group found with va command
#PBS -W group_list=msurdeanu
### Set the queue to submit this job. (Use windfall for non critical jobs, or msurdeanu for critical ones)
#PBS -q windfall
### Set the number of nodes and cpus that will be used
#PBS -l select=1:ncpus=4:mem=16gb:pcmem=6gb
### Per process virtual memory. pvmem = mem + pcmem
#PBS -l pvmem=24gb
### Not using all processors in node, so allow others to share it
#PBS -l place=pack:shared
### Specify "wallclock time" required for this job, hhh:mm:ss
#PBS -l walltime=00:10:0
### Specify total cpu time required for this job, hhh:mm:ss
### total cputime = walltime * ncpus
#PBS -l cput=00:40:0

### Set directory for execution to the directory where you ran qsub.
### This way, the script will work for multiple projects in different
### directories as long as the files are named the same way.
cd $PBS_O_WORKDIR
### Profile job
echo Running on host `hostname`
echo Working directory is `pwd`
echo Worker id is $PBS_ARRAY_INDEX

### The actual job
echo Time before starting the job is `date`
export WID=$PBS_ARRAY_INDEX
java -Xmx16g -cp HDIR/eidos-assembly-EIDOSVER.jar org.clulab.wm.eidos.apps.ExtractFromDirectory /extra/NETID/eidos/in.$WID /extra/NETID/eidos/out.$WID
echo Time after ending the job is `date`

In the above, you may want to adjust ncpus to increase or decrease the amount of parallelism within an individual worker, walltime (and cput) if a worker is expected to take more than 10 minutes, and the RAM allocated to Eidos, both in the mem directive and in the -Xmx option on the java command line.
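
For example, if each worker is expected to run for up to two hours on 4 cpus, the time-related directives might look as follows. Treat the numbers as an illustrative sketch, not a recommendation; the key point is that cput is walltime multiplied by ncpus.

### example: a worker that needs up to 2 hours of wallclock time on 4 cpus
#PBS -l select=1:ncpus=4:mem=16gb:pcmem=6gb
#PBS -l walltime=02:00:00
### total cputime = walltime * ncpus = 2h * 4 = 8h
#PBS -l cput=08:00:00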

Just FYI, for more help on PBS configuration, see this page: https://docs.hpc.arizona.edu/display/UAHPC/Script+Examples

Also, this page generates custom .pbs files: https://jobbuilder.hpc.arizona.edu/

Run and monitor the jobs

Continuing with the assumption that you want to start two workers, use:

cd ~
qsub -J 1-2 eidos.pbs

If you have N workers, use qsub -J 1-N instead.
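
While the array job is running, you can monitor it with the standard PBS commands, sketched below using the array job id printed by qsub (e.g., 1723032[] in the emails shown next).

# list all of your jobs
qstat -u NETID
# list the individual array subjobs of one array job
qstat -t 1723032[]
# delete the whole array job if something went wrong
qdel 1723032[]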

Note that after each worker starts, you will receive an email with content similar to this:

PBS Job Id: 1723032[].head1.cm.cluster
Job Name:   eidos
Begun execution

After a worker completes execution, you will receive another email with content similar to this:

PBS Job Id: 1723032[].head1.cm.cluster
Job Name:   eidos
Execution terminated
Exit_status=0

The log files for each worker are saved in your home directory. For example, worker 1 in the above job will generate two files: eidos.e1723032.1 and eidos.o1723032.1, where the former is stderr and the latter is stdout.
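
Once all workers have finished, a quick sanity check is to count the JSON files produced in each output directory and to scan the stderr logs for problems. This is only a sketch; it assumes the directory layout and log names used above, and grepping for "exception" is just a heuristic.

# count the outputs produced by each worker
ls /extra/NETID/eidos/out.1 | wc -l
ls /extra/NETID/eidos/out.2 | wc -l
# look for obvious errors in the stderr logs (saved in your home directory)
grep -il exception ~/eidos.e*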