Monitors a folder and automatically processes new subdirectories on an HPC cluster.
- Copy your data to the folder on the fileserver that is monitored by Fileserver to HPC, for example `//fileserver/process_on_hpc/my_data`
- Copy the entire workflow folder into the same folder, for example `//fileserver/process_on_hpc/my_data/workflow`. The workflow folder must contain a file called `start.slurm`.
- Once the data and the workflow are ready for processing, compress the workflow folder into a zip file, for example `//fileserver/process_on_hpc/my_data/workflow.zip` (see the sketch after this list).
- After about a minute, the zip file will disappear. This means that processing on the cluster has started.
- Once processing is done, the workflow will make sure that the results are transferred back to the fileserver (see "Creating your own workflow" below for how this can be implemented).
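As a concrete illustration of these steps, here is a minimal sketch, assuming the monitored share is mounted at `/mnt/fileserver` on your workstation; the mount point and the folder names are hypothetical:

```bash
# Hypothetical mount point of the monitored fileserver share
MONITORED=/mnt/fileserver/process_on_hpc

# 1. Copy the data into the monitored folder
cp -r my_data "$MONITORED"/

# 2. Copy the workflow folder (it must contain start.slurm) next to the data
cp -r workflow "$MONITORED"/my_data/

# 3. Zip the workflow folder; the appearance of workflow.zip triggers processing
cd "$MONITORED"/my_data
zip -r workflow.zip workflow
```

Creating `workflow.zip` is deliberately the last step: its appearance signals that the data is complete and processing may start.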
The `start.slurm` file should handle processing of your data and make sure that the results are synchronized back to the fileserver. To make this possible, several environment variables are set:

- `SOURCE_DIR` contains the path to the data on the fileserver from the perspective of the HPC cluster, for example `/grp/g_biapol/process_on_cluster/`
- `TARGET_DIR` contains the path to the workspace on the cluster where the processing happens.
- On the TUD cluster, only the `datamover` partition is able to access `SOURCE_DIR`. Therefore, our workflows start a separate job that runs a small `cleanup.slurm` file after the `start.slurm` job is done, like this:

  ```bash
  # Submit cleanup.slurm so that it runs once the current job has finished
  CLEANUP_JOB_ID=$(sbatch -o "${TARGET_DIR}/log/cleanup.out" --dependency=afterany:"$SLURM_JOB_ID" --export=ALL "$TARGET_DIR"/workflow/cleanup.slurm)
  ```
- `cleanup.slurm` looks like this:

  ```bash
  #!/bin/bash
  #SBATCH --partition=datamover

  # Copy the results back to the fileserver; if this fails, keep them on the
  # cluster (the braces make exit 1 abort the script before the rm below).
  echo "copying results from $TARGET_DIR to $SOURCE_DIR"
  rsync -rv "$TARGET_DIR" "$SOURCE_DIR" || { echo "failed to copy results to $SOURCE_DIR, leaving them in $TARGET_DIR"; exit 1; }

  # Remove the workspace on the cluster once the copy succeeded
  echo "removing $TARGET_DIR"
  rm -rf "$TARGET_DIR"

  echo "done"
  ```
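For orientation, the following is a minimal sketch of what a `start.slurm` could look like. The `#SBATCH` header values and the `process.py` call are hypothetical placeholders; only the use of `SOURCE_DIR`, `TARGET_DIR` and the cleanup-job submission follow the description above:

```bash
#!/bin/bash
#SBATCH --job-name=example_workflow
#SBATCH --time=01:00:00

# SOURCE_DIR and TARGET_DIR are provided as environment variables (see above).
mkdir -p "$TARGET_DIR"/log

# Submit the cleanup job right away; --dependency=afterany lets it start only
# after this job has terminated, whether it succeeded or failed.
CLEANUP_JOB_ID=$(sbatch -o "${TARGET_DIR}/log/cleanup.out" --dependency=afterany:"$SLURM_JOB_ID" --export=ALL "$TARGET_DIR"/workflow/cleanup.slurm)
echo "submitted cleanup job: $CLEANUP_JOB_ID"

# Hypothetical processing step; replace with your own commands.
echo "processing data in $TARGET_DIR (synchronized from $SOURCE_DIR)"
python "$TARGET_DIR"/workflow/process.py --input "$TARGET_DIR" --output "$TARGET_DIR"/results
```

Submitting the cleanup job first and using `--dependency=afterany` means the results are synchronized back to the fileserver even if the processing step fails.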
Check out the example workflows for templates.