## Lesson 6: Running a simple HPC Job


List of CLI commands
This page lists the basic CLI commands needed to work with directories and files on an ISU HPC cluster. These commands are described in more detail at the end of the guide.

man command - Show manual for command

pwd - Prints the full name (the full path) of current/working directory

ls - List directory contents
ls -a - List all the content, including hidden files
ls -l - List the content and its information

mkdir foldername - Create a new directory foldername

cd foldername - Change the working directory to foldername
cd - Return to $HOME directory
cd .. - Go up a directory
cd - - Return to the previous directory

emacs, nano, vi - File editors

cp source destination - Copy source to destination
cp -r source destination - Copy a directory recursively from source to destination

mv source destination - Move (or rename) a file from source to destination

rm file1 - Remove file1
rm -r folder - Remove a directory and its contents recursively

cat file - Print contents of file on the screen
less file - View and paginate file
head file - Show first 10 lines of file
tail file - Show last 10 lines of file
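
A short practice session tying several of these commands together might look like the sketch below. The directory and file names (demo, notes.txt, renamed.txt, demo-backup) are made up for illustration, and everything happens in a temporary directory so none of your real files are touched.

```shell
#!/bin/bash
# Hypothetical practice session; all names below are illustrative.
set -e
workdir=$(mktemp -d)         # scratch area so real files are untouched
cd "$workdir"
pwd                          # print the full path of the working directory

mkdir demo                   # create a directory
cd demo
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > notes.txt
ls -l                        # list contents with details
head notes.txt               # show the first 10 lines
tail notes.txt               # show the last 10 lines

cp notes.txt copy.txt        # copy a file
mv copy.txt renamed.txt      # rename the copy
cd ..                        # go up one directory
cp -r demo demo-backup       # copy the directory recursively
rm -r demo-backup            # remove the copy and its contents
ls -a demo                   # list everything, including hidden entries
```

Take extra care with rm -r: it deletes a directory and everything in it without asking for confirmation.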

SLURM
Since many users may be logged into the cluster headnode simultaneously, it is important not to run intensive tasks on the headnode. Such tasks should be run on compute nodes.

The headnode should be used to edit files and to submit jobs to a workload manager, which schedules the jobs to run on compute nodes. At ISU we use the Slurm Workload Manager.

Jobs can be run in interactive and batch modes. In interactive mode you are logged (via a Slurm command) onto a compute node, and Slurm allocates the requested resources exclusively to your interactive job. You can request as little as 1 core, or multiple nodes, depending on what your job needs. The more resources (cores, memory, CPU time, etc.) you request, the longer you may need to wait for those resources to become available. Since other jobs won't have access to the resources allocated to your job, the session will feel like a personal computer/cluster. Your environment, such as loaded environment modules, is copied to the interactive session. Program/command output is printed on the screen.

Interactive mode should be used mostly to debug jobs. It has two downsides:
1. Requested resources may not be available right away, so you may have to wait before you can run interactive commands.
2. If the connection to the cluster is lost, the interactive job will be killed by Slurm.
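
As a rough illustration, an interactive session can be requested with Slurm's srun command and its --pty option. The resource values below (1 node, 4 tasks, 30 minutes) are assumptions chosen for the example, not site recommendations, and some sites prefer salloc with the same resource options; check your cluster's User Guide for the recommended form.

```shell
#!/bin/bash
# Sketch: request a 30-minute interactive session on 1 node with 4 tasks.
# The resource values are illustrative; adjust them to what your job needs.
interactive_cmd=(srun --nodes=1 --ntasks=4 --time=00:30:00 --pty bash)

if command -v srun >/dev/null 2>&1; then
    # Opens a shell on a compute node once the resources are granted.
    "${interactive_cmd[@]}"
else
    # Not on a Slurm cluster: only show what would be run.
    echo "would run: ${interactive_cmd[*]}"
fi
```

Type exit in the interactive shell to end the session and release the resources.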

For production jobs, batch mode should be used. In batch mode the resource requests and the list of commands to be executed on compute nodes are placed in a job script, which is submitted to Slurm with the sbatch command. Program/command output is redirected to a file. To make it easier for users to create job scripts, we provide job script generators for each cluster; see the User Guide for the appropriate cluster.
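
A minimal batch job might look like the sketch below. The job name, resource values, file names, and the hostname command are placeholders chosen for illustration; a real job script would come from the job script generator for your cluster.

```shell
#!/bin/bash
# Sketch: write a minimal job script, then submit it if sbatch is available.
# The quoted 'EOF' keeps the script text literal (nothing is expanded here).
cat > hello-job.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello-job
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --output=hello-job.%j.output
#SBATCH --error=hello-job.%j.error

# Commands below run on the allocated compute node.
hostname
EOF

if command -v sbatch >/dev/null 2>&1; then
    sbatch hello-job.sbatch      # prints the job ID on success
else
    echo "sbatch not found; job script written to hello-job.sbatch"
fi
```

In the --output and --error file names, Slurm replaces %j with the job ID, so each run writes to its own files.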

When an interactive or batch job is submitted to Slurm, the workload manager places the job in the queue. Each job is given a priority, which may change while the job waits in the queue. We use Slurm's fair-share scheduling on clusters at ISU. Job priority depends on how many resources the user or the user's group has already used, the group's contribution to the cluster, and how long the job has been waiting for resources. Based on job priority and the amount of resources requested versus available, Slurm decides which resources to allocate to which jobs.
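
To see where your jobs stand, the standard Slurm client commands squeue, sprio, and sshare can be combined as in the sketch below. The helper function name show_my_queue is made up for this example, and the exact columns and factors shown depend on how the cluster is configured.

```shell
#!/bin/bash
# Sketch: inspect your jobs and the factors behind their priority.
# show_my_queue is a hypothetical helper name used only in this example.
show_my_queue() {
    local me="${USER:-$(id -un)}"
    if ! command -v squeue >/dev/null 2>&1; then
        echo "Slurm commands are not available on this machine"
        return 0
    fi
    squeue -u "$me"              # your queued and running jobs
    squeue -u "$me" --start      # estimated start times for pending jobs
    sprio -u "$me"               # priority factors of your pending jobs
    sshare -u "$me"              # fair-share usage for your account(s)
}

show_my_queue
```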

On research clusters you can use the slurm-usage.py command to see your group's usage. To see the available options, run "slurm-usage.py -h".


As an example, copy the sample Abaqus 2017 job script to the directory that holds your input file:

cp /shared/hpc/sample-job-scripts/abaqus/abaqus-2017.sh /work/<your group/path>

Once this file has been copied to the same location as your input file, edit the PARAMETERS section of the script (marked by # comments).

When the changes are saved, make the file executable:

chmod +x abaqus-2017.sh

Then submit your job by running the script:

./abaqus-2017.sh

Sample script:

#!/bin/bash
# 
# Sample script that creates an sbatch script that submits an Abaqus 2017 job to
# SLURM.
# Instructions:
#   1.  Modify the items under PARAMETERS below. Save the file. 
#   2.  Make this file executable (e.g. chmod +x abaqus-2017.sh)
#   3.  Run this file  (e.g.  ./abaqus-2017.sh )

# PARAMETERS:  (modify as needed)
JOBNAME=abaqus-job1
WORKDIR=/work/some-group/some-project
INPUTFILE=input.inp
# The USERFILE value is optional.  It is used to provide the name of a user-supplied routine.
# If you need it, uncomment the line below.

#USERFILE=my-subroutine.for

PARTITION=compute
NUM_NODES=2
PROCS_PER_NODE=16
MAX_TIME=3:00:00
EMAIL=some-netid@iastate.edu
MAIL_TYPES=BEGIN,FAIL,END
# end of PARAMETERS   

TOTAL_PROCS=$((NUM_NODES*PROCS_PER_NODE))
INPUTFILEPATH=${INPUTFILE}
ERROR_FILE=${JOBNAME}.%j.error
OUTPUT_FILE=${JOBNAME}.%j.output

# if a USERFILE is specified, set USERSTRING and call abaqus with it.
if [[ ${USERFILE} != "" ]]; then
   USERSTRING="user=${USERFILE}"
fi

# Everything below from 'cat ..' until END_OF_SCRIPT is written to ${JOBNAME}.sbatch, which is passed to sbatch.  Edit carefully.
# Note that the regular shell variables (e.g.  $var,  ${var} ) are
# filled in by bash when you run this script.
# Escaped variables (e.g.  \$var ) are written out literally and are expanded only when the job runs.
cat <<END_OF_SCRIPT > ${JOBNAME}.sbatch
#!/bin/bash
#SBATCH -J $JOBNAME
#SBATCH -D $WORKDIR
#SBATCH -N $NUM_NODES
#SBATCH -n $TOTAL_PROCS
#SBATCH --partition=$PARTITION
#SBATCH --ntasks-per-node=$PROCS_PER_NODE
# it's a good idea to tell SLURM to use a large amount of memory. 120000 = 120GB.
#SBATCH --mem=120000
#SBATCH --time=$MAX_TIME
####SBATCH -C compute
#SBATCH --error=$ERROR_FILE
#SBATCH --output=$OUTPUT_FILE
#SBATCH --mail-type=$MAIL_TYPES
#SBATCH --mail-user=$EMAIL
cd $WORKDIR

# Load the Intel compiler and Abaqus software.
module load intel/17.4
module load abaqus/2017

# Abaqus doesn't support SLURM natively.  So, the script below gets the list of
# allocated hosts from SLURM and uses it to construct the mp_host_list[] variable.  
# It copies the global custom_v6.env file from the global Abaqus "site" directory and 
# adds the mp_host_list[] line to the bottom of the abaqus_v6.env file in the current folder.

create_abaqus_mp_host_list.sh

unset SLURM_GTIDS

export I_MPI_HYDRA_BOOTSTRAP=ssh

abaqus interactive analysis job=${JOBNAME} input=${INPUTFILEPATH} cpus=${TOTAL_PROCS} mp_mode=mpi memory="80 %" ${USERSTRING} scratch=${WORKDIR}

END_OF_SCRIPT

# Now submit the sbatch script created above.
echo "running: sbatch ./${JOBNAME}.sbatch"
sbatch "./${JOBNAME}.sbatch"
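
The difference between regular and escaped variables inside the here-document can be checked with a small experiment. The sketch below uses made-up names (demo-job, demo.sbatch) and writes a two-command script without submitting anything.

```shell
#!/bin/bash
# Sketch: how an unquoted here-document treats $var versus \$var.
# JOBNAME and demo.sbatch are illustrative names.
JOBNAME=demo-job

cat <<END_OF_SCRIPT > demo.sbatch
#!/bin/bash
#SBATCH -J $JOBNAME
echo "generated for job: $JOBNAME"
echo "running as job ID: \$SLURM_JOB_ID"
END_OF_SCRIPT

# Show the generated file: $JOBNAME has been replaced by its value,
# while \$SLURM_JOB_ID was written out as a literal $SLURM_JOB_ID.
cat demo.sbatch
```

The literal $SLURM_JOB_ID in the generated file is expanded later, on the compute node, where Slurm sets that variable in the job's environment.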