This task processor schedules each step of the GeoFlood workflow on a SLURM cluster. Key features include:
- Logging of:
  - the GeoFlood workflow step being executed
  - elapsed time per step (including task-processor overhead)
  - the SLURM queue used for each step
  - whether a step takes longer than the longest available queue
  - success or failure of each step
  - the exit code of each step
- Automatic rescheduling of the workflow in a new SLURM job if the workflow runs out of time in its queue:
  - automatically bumps the workflow up to the next-longest available queue
  - restarts the workflow from the step that was cut short
- Step-by-step output detection to skip already-completed steps of the workflow
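The queue-bumping behavior described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the actual logic in `slurm_task_processor.py`; the queue names and time limits below are hypothetical examples:

```python
# Hypothetical queue table: (name, limit in minutes), ordered by limit.
# Real queue names and limits depend on the cluster's SLURM configuration.
QUEUES = [
    ("development", 120),
    ("normal", 2880),
    ("long", 7200),
]

def next_queue(current_name):
    """Return the next-longest queue after `current_name`, or None if the
    workflow already ran on the longest available queue (in which case the
    log's error_long_queue_timeout flag would be set)."""
    names = [name for name, _ in QUEUES]
    idx = names.index(current_name)
    if idx + 1 < len(QUEUES):
        return QUEUES[idx + 1][0]
    return None
```

When a job is cut short, the workflow is resubmitted to `next_queue(...)` and resumes from the interrupted step rather than from the beginning.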
Future features include:
- Remote file system download and upload to minimize local disk usage
- User-defined queue limit
While this task processor has been designed with the needs of the GeoFlood workflow in mind, other workflows can be substituted.
Other workflow examples may be found in `misc/workflow_examples`.
For another application of this task processor to an extensive workflow, see my HAND-TauDEM GitHub repository.
This task processor is designed to be used in conjunction with GeoFlood. Currently, it is necessary to use my fork of GeoFlood. Differences between this fork and the main GeoFlood repository will be merged soon.
All of the main scripts may be found in `geoflood_task_processor/`.
Once the environment is set up (see below), the task processor may be initiated by executing the following command on a scheduler node of a SLURM cluster:
```sh
initiate_slurm_task_processor.sh \
    --path_taskproc slurm_task_processor.py \
    -j 1 \
    --path_sbatch node_task_processor.sbatch.sh \
    --path_cmds workflow_commands-geoflood-singularity.sh \
    --path_log geoflood_singularity.log \
    --path_rc workflow_configuration-geoflood_singularity.sh \
    --path_img ../geoflood_docker_tacc.sif \
    --path_sh node_task_processor.sh \
    --minutes 15 \
    $(echo $(cat tasks.txt))
```
Download GeoFlood:

```sh
git clone https://github.com/dhardestylewis/GeoFlood.git
```
Install the GeoFlood Conda environment:

```sh
conda env create -f environments/environment-geoflood.yml
```
If necessary, prepare the DEMs, catchment and flowline vector data, and roughness tables. The DEM2basin preprocessing script is available for this step, if needed:
```sh
python3 geoflood-preprocessing-1m-mp.py \
    --shapefile study_area_polygon.shp \
    --huc12 WBD-HUC12s.shp \
    --nhd NHD_catchments_and_flowlines.gdb/ \
    --raster TNRIS-LIDAR-Datasets/ \
    --availability TNRIS-LIDAR-Dataset_availability.shp \
    --directory HUC12-DEM_outputs/ \
    --restart geoflood-preprocessing-study_area.pickle
```
Create a `stage.txt` file:

```sh
for i in $(seq 0.0 0.1 20.0); do
    echo $i >> stage.txt;
done
```
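The same file can be generated from Python; a minimal sketch that uses integer arithmetic to avoid the floating-point formatting quirks `seq` can exhibit on some systems:

```python
# Write stage heights 0.0 through 20.0 (inclusive) in 0.1 increments,
# one value per line -- 201 values in total, matching the seq loop above.
with open("stage.txt", "w") as f:
    for i in range(0, 201):
        f.write(f"{i / 10:.1f}\n")
```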
Download the correct NWM NetCDF file for your flood event.
Organize the DEMs, catchment GIS, flowline GIS, roughness tables, stage tables, and NWM NetCDF files into the GeoFlood file hierarchy as described in the GeoFlood GitHub repository.
Download the GeoFlood HPC Singularity image:

```sh
singularity pull docker://geoflood_docker:tacc
```
Modify `geoflood_task_processor/workflow_configuration-geoflood_singularity.sh` to reflect your particular file locations.
Draft a `tasks.txt` file that contains the name of one GeoFlood project on each line, for example:

```
HUC1
HUC2
HUC3
```
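If each GeoFlood project corresponds to a directory, a `tasks.txt` like the one above can also be generated programmatically. A minimal sketch, assuming a hypothetical layout in which every project is a subdirectory of a single parent directory:

```python
import os

def write_tasks(projects_dir, out_path="tasks.txt"):
    """Write one project (subdirectory) name per line, sorted, to out_path.
    The directory layout assumed here is hypothetical."""
    names = sorted(
        d for d in os.listdir(projects_dir)
        if os.path.isdir(os.path.join(projects_dir, d))
    )
    with open(out_path, "w") as f:
        f.write("\n".join(names) + "\n")
```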
Now run `initiate_slurm_task_processor.sh`:
```sh
initiate_slurm_task_processor.sh \
    --path_taskproc slurm_task_processor.py \
    -j 1 \
    --path_sbatch node_task_processor.sbatch.sh \
    --path_cmds workflow_commands-geoflood-singularity.sh \
    --path_log geoflood_singularity.log \
    --path_rc workflow_configuration-geoflood_singularity.sh \
    --path_img ../geoflood_docker_tacc.sif \
    --path_sh node_task_processor.sh \
    --minutes 15 \
    $(echo $(cat tasks.txt))
```
The log files are flat CSV tables with the following columns:
- `index` - unique index generated by concatenating the `start_time` and the `pid`
- `pid` - the process ID of the executed step of the workflow
- `start_time` - the start time of the executed step of the workflow, in seconds since the 1970 epoch
- `job_id` - the SLURM job ID
- `queue` - the selected SLURM queue
- `elapsed_time` - time elapsed between the task processor's initiation and the start time of the executed step of the workflow
- `error_long_queue_timeout` - error flag set if the step fails because not enough time is available on the longest queue
- `complete` - flag set if the step finishes successfully
- `last_cmd` - the workflow step that was executed
- `exit_code` - the exit code of the workflow step
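A log in this format can be summarized with the standard library alone. A minimal sketch; how the `complete` flag is serialized (`True`/`False` vs. `1`/`0`) is an assumption here, so the sketch accepts several common spellings:

```python
import csv

def summarize_log(path):
    """Return (completed, failed) step counts from a flat CSV log whose
    columns match the list above."""
    completed, failed = 0, 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row.get("complete") in ("True", "true", "1"):
                completed += 1
            else:
                failed += 1
    return completed, failed
```

The same `csv.DictReader` loop extends naturally to other questions, e.g. grouping `elapsed_time` by `last_cmd` to see which workflow step dominates the runtime.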
Overview of this task processor

Example workflow used with this task processor: this plot depicts cumulative timings for each step of the workflow. All units are in seconds.
- All of the DEMs are reprojected to WGS 84 / UTM 14N, even if the HUC12 lies outside UTM zone 14.

The DEMs are located on Stampede2 at `/scratch/projects/tnris/dhl-flood-modelling/GeoFlood/GeoFlood-Task_processor`.

Please submit a ticket if you have trouble accessing this data. You may also contact me directly at @dhardestylewis or dhl@tacc.utexas.edu.