This task processor schedules each step of the GeoFlood workflow on a SLURM cluster. Key features include:
- Logging of:
  - the GeoFlood workflow step being executed
  - elapsed time per step (including task-processor overhead)
  - the SLURM queue used for each step
  - whether a step takes longer than the longest available queue
  - success or failure of each step
  - the exit code of each step
- Automatic rescheduling of the workflow in a new SLURM job if the workflow runs out of time in its queue:
  - automatically bumps the workflow up to the next-longest available queue
  - restarts the workflow from the step that was cut short
- Step-by-step output detection to skip already-completed steps of the workflow
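The queue-bumping behavior described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the actual logic in `slurm_task_processor.py`; the queue names and time limits below are hypothetical examples:

```python
# Hypothetical queue table: (name, limit in minutes), ordered by limit.
# Real queue names and limits depend on the cluster's SLURM configuration.
QUEUES = [
    ("development", 120),
    ("normal", 2880),
    ("long", 7200),
]

def next_queue(current_name):
    """Return the next-longest queue after `current_name`, or None if the
    workflow already ran on the longest available queue (in which case the
    log's error_long_queue_timeout flag would be set)."""
    names = [name for name, _ in QUEUES]
    idx = names.index(current_name)
    if idx + 1 < len(QUEUES):
        return QUEUES[idx + 1][0]
    return None
```

When a job is cut short, the workflow is resubmitted to `next_queue(...)` and resumes from the interrupted step rather than from the beginning.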
Future features include:
- Remote file system download and upload to minimize local disk usage
- User-defined queue limit
While this task processor has been designed with the needs of the GeoFlood workflow in mind, other workflows can be substituted.
Other workflow examples may be found in `misc/workflow_examples`.
For another application of this task processor to an extensive workflow, see my HAND-TauDEM GitHub repository.
This task processor is designed to be used in conjunction with GeoFlood. Currently, it is necessary to use my fork of GeoFlood. Differences between this fork and the main GeoFlood repository will be merged soon.
All of the main scripts may be found in `geoflood_task_processor/`.
Once the environment is set up (see below), the task processor may be initiated by executing the following command on a scheduler node of a SLURM cluster:
```sh
initiate_slurm_task_processor.sh \
    --path_taskproc slurm_task_processor.py \
    -j 1 \
    --path_sbatch node_task_processor.sbatch.sh \
    --path_cmds workflow_commands-geoflood-singularity.sh \
    --path_log geoflood_singularity.log \
    --path_rc workflow_configuration-geoflood_singularity.sh \
    --path_img ../geoflood_docker_tacc.sif \
    --path_sh node_task_processor.sh \
    --minutes 15 \
    $(echo $(cat tasks.txt))
```
Download GeoFlood:

```sh
git clone https://github.com/dhardestylewis/GeoFlood.git
```
Install the GeoFlood Conda environment:

```sh
conda env create -f environments/environment-geoflood.yml
```
If necessary, prepare the DEMs, catchment and flowline vector data, and roughness tables. The DEM2basin preprocessing script is available for this step, if needed:
```sh
python3 geoflood-preprocessing-1m-mp.py \
    --shapefile study_area_polygon.shp \
    --huc12 WBD-HUC12s.shp \
    --nhd NHD_catchments_and_flowlines.gdb/ \
    --raster TNRIS-LIDAR-Datasets/ \
    --availability TNRIS-LIDAR-Dataset_availability.shp \
    --directory HUC12-DEM_outputs/ \
    --restart geoflood-preprocessing-study_area.pickle
```
Create a `stage.txt` file:

```sh
for i in $(seq 0.0 0.1 20.0); do
    echo $i >> stage.txt;
done
```
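The same file can be generated from Python; a minimal sketch that uses integer arithmetic to avoid the floating-point formatting quirks `seq` can exhibit on some systems:

```python
# Write stage heights 0.0 through 20.0 (inclusive) in 0.1 increments,
# one value per line -- 201 values in total, matching the seq loop above.
with open("stage.txt", "w") as f:
    for i in range(0, 201):
        f.write(f"{i / 10:.1f}\n")
```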
Download the correct NWM NetCDF file for your flood event.
Organize the DEMs, catchment GIS, flowline GIS, roughness tables, stage tables, and NWM NetCDF files into the GeoFlood file hierarchy as described in the GeoFlood GitHub repository.
Download the GeoFlood HPC Singularity image:

```sh
singularity pull docker://geoflood_docker:tacc
```
Modify `geoflood_task_processor/workflow_configuration-geoflood_singularity.sh` to reflect your particular file locations.
Draft a `tasks.txt` file that contains the name of one GeoFlood project on each line, for example:

```
HUC1
HUC2
HUC3
```
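If each GeoFlood project corresponds to a directory, a `tasks.txt` like the one above can also be generated programmatically. A minimal sketch, assuming a hypothetical layout in which every project is a subdirectory of a single parent directory:

```python
import os

def write_tasks(projects_dir, out_path="tasks.txt"):
    """Write one project (subdirectory) name per line, sorted, to out_path.
    The directory layout assumed here is hypothetical."""
    names = sorted(
        d for d in os.listdir(projects_dir)
        if os.path.isdir(os.path.join(projects_dir, d))
    )
    with open(out_path, "w") as f:
        f.write("\n".join(names) + "\n")
```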
Now run `initiate_slurm_task_processor.sh`:
```sh
initiate_slurm_task_processor.sh \
    --path_taskproc slurm_task_processor.py \
    -j 1 \
    --path_sbatch node_task_processor.sbatch.sh \
    --path_cmds workflow_commands-geoflood-singularity.sh \
    --path_log geoflood_singularity.log \
    --path_rc workflow_configuration-geoflood_singularity.sh \
    --path_img ../geoflood_docker_tacc.sif \
    --path_sh node_task_processor.sh \
    --minutes 15 \
    $(echo $(cat tasks.txt))
```
The log files are flat CSV tables with the following columns:
- `index` - unique index generated by concatenating the `start_time` and the `pid`
- `pid` - the process ID of the executed step of the workflow
- `start_time` - the start time of the executed step of the workflow, in seconds since the 1970 epoch
- `job_id` - the SLURM job ID
- `queue` - the selected SLURM queue
- `elapsed_time` - time elapsed between the task processor's initiation and the start time of the executed step of the workflow
- `error_long_queue_timeout` - error flag set if the step fails because not enough time is available on the longest queue
- `complete` - flag set if the step finishes successfully
- `last_cmd` - the workflow step that was executed
- `exit_code` - the exit code of the workflow step
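A log in this format can be summarized with the standard library alone. A minimal sketch; how the `complete` flag is serialized (`True`/`False` vs. `1`/`0`) is an assumption here, so the sketch accepts several common spellings:

```python
import csv

def summarize_log(path):
    """Return (completed, failed) step counts from a flat CSV log whose
    columns match the list above."""
    completed, failed = 0, 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row.get("complete") in ("True", "true", "1"):
                completed += 1
            else:
                failed += 1
    return completed, failed
```

The same `csv.DictReader` loop extends naturally to other questions, e.g. grouping `elapsed_time` by `last_cmd` to see which workflow step dominates the runtime.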
Overview of this task processor

Example workflow used with this task processor: this plot depicts cumulative timings for each step of the workflow. All units are in seconds.
- All of the DEMs are reprojected to WGS 84 / UTM 14N, even if the HUC12 lies outside UTM zone 14.

The DEMs are located on Stampede2 at `/scratch/projects/tnris/dhl-flood-modelling/GeoFlood/GeoFlood-Task_processor`.

Please submit a ticket if you have trouble accessing this data. You may also contact me directly at @dhardestylewis or dhl@tacc.utexas.edu.