
Add HPC launcher and Slurm integration #22

Merged
simon-donike merged 6 commits into ESAOpenSR:main from mak8427:main
Mar 12, 2026

Conversation

@mak8427
Contributor

@mak8427 mak8427 commented Mar 6, 2026

This pull request introduces a new, installable HPC/Slurm launcher for the opensr-model. It adds a new CLI (opensr-hpc) with commands for configuration validation, job submission (patch/grid), status checks, and output collection, along with structured runtime configuration files and supporting modules. The documentation is updated to reflect these new capabilities, and example configs and usage are provided.

The most important changes are:

HPC/Slurm Launcher and CLI:

  • Added a modular, installable Slurm/HPC launcher under deployment/opensr_hpc, including a user-facing CLI (opensr-hpc) for validating configs, submitting jobs (patch/grid), running tasks, collecting outputs, and checking run status. [1] [2] [3] [4]
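
The command surface described above could be sketched with argparse subcommands; this is a minimal illustration, not the actual `opensr-hpc` implementation, and the exact subcommand names and arguments are assumptions based on the list above.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Top-level parser for a hypothetical `opensr-hpc` entry point.
    parser = argparse.ArgumentParser(prog="opensr-hpc")
    sub = parser.add_subparsers(dest="command", required=True)

    validate = sub.add_parser("validate", help="validate a runtime config")
    validate.add_argument("config", help="path to runtime YAML")

    submit = sub.add_parser("submit", help="submit a Slurm job")
    submit.add_argument("config")
    submit.add_argument("--mode", choices=["patch", "grid"], default="patch")

    sub.add_parser("status", help="check run status").add_argument("run_dir")
    sub.add_parser("collect", help="collect outputs").add_argument("run_dir")
    return parser

args = build_parser().parse_args(["submit", "runtime.yaml", "--mode", "grid"])
print(args.command, args.mode)
```

Each subcommand would dispatch to its own handler; the real CLI also wires these to Slurm submission and run directories.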

Configuration and Example Files:

  • Introduced structured runtime configuration files (runtime.default.yaml, runtime.a100.example.yaml) for specifying environment, model, staging, inference, and Slurm parameters. [1] [2]
  • Added a config loader and validator (config.py) for robust runtime config management.
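
A config validator along those lines could look like the following sketch; the section names mirror the categories listed above (environment, model, staging, inference, Slurm), but the actual schema in `config.py` may differ.

```python
REQUIRED_SECTIONS = ("environment", "model", "staging", "inference", "slurm")

def validate_runtime_config(cfg: dict) -> list:
    """Return a list of validation errors; an empty list means the config is valid."""
    errors = [f"missing section: {name}" for name in REQUIRED_SECTIONS
              if name not in cfg]
    staging = cfg.get("staging", {})
    # Overlap must be non-negative so neighbouring cutouts can share pixels.
    if staging.get("overlap_meters", 0) < 0:
        errors.append("staging.overlap_meters must be >= 0")
    return errors

cfg = {"environment": {}, "model": {}, "staging": {"overlap_meters": 40},
       "inference": {}, "slurm": {}}
print(validate_runtime_config(cfg))
```

In practice the dict would come from `yaml.safe_load` on `runtime.default.yaml`, with user overrides merged on top.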

Documentation Updates:

  • Updated the main README.md with a new section on HPC/Slurm usage, installation instructions, and CLI usage examples. [1] [2]
  • Added a new deployment/README.md explaining the launcher, run layout, configs, and Slurm entrypoint.

Supporting Utilities:

  • Added helpers for checkpoint resolution and hashing, output collection, and inference orchestration. [1] [2] [3]
  • Included a module docstring for deployment/__init__.py.
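
The checkpoint-hashing helper mentioned above is typically a streamed digest so large checkpoint files never need to fit in memory; a generic sketch (the function name and chunk size are assumptions, not the actual helper's API):

```python
import hashlib
import os
import tempfile
from pathlib import Path

def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large checkpoints don't load into RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demo on a small temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"checkpoint-bytes")
print(sha256_of_file(Path(tmp.name)))
os.unlink(tmp.name)
```

Storing such a digest next to the run directory lets the launcher verify that a resolved checkpoint matches what a previous run used.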

I need to proof-check it a few times more before marking it ready.

@mak8427
Contributor Author

mak8427 commented Mar 6, 2026

this comes from #21

@simon-donike
Member

Thanks a lot for putting in the work; I've added a small comment on the patching file. Other than that I see no issues now, but it might take me a moment to validate the whole workflow in practice. Great that you use opensr-utils; the script is a great improvement!

@mak8427 mak8427 marked this pull request as ready for review March 10, 2026 14:39
@mak8427
Contributor Author

mak8427 commented Mar 10, 2026

> Thanks a lot for putting in the work; I've added a small comment on the patching file. Other than that I see no issues now, but it might take me a moment to validate the whole workflow in practice. Great that you use opensr-utils; the script is a great improvement!

Thank you for making these cool models open source! I am using them quite a lot lately, running them on the university HPC we have here in Göttingen; hopefully it won't cause problems on other systems.

@simon-donike
Member

Just another quick conceptual question: I see you use opensr_utils.large_file_processing in inference.py L66, which performs the overlapped patching for a single cubo cutout. What about grid-level overlap between separate cubo cutouts, which could result in patching artefacts? Or did I misunderstand that case?

@mak8427
Contributor Author

mak8427 commented Mar 10, 2026

Good point. The overlap is at two levels:
opensr_utils.large_file_processing handles overlapping inference windows within a single cutout, while
build_patches adds overlap between neighbouring cubo cutouts via staging.overlap_meters.
What isn't currently implemented is reconciling those overlapping SR outputs afterwards,
so a naive downstream mosaic could still show seams at cutout boundaries.
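
For illustration, the grid-level overlap can be sketched in one dimension; this is a toy tiling function, not the actual build_patches code, and the argument names are assumptions:

```python
def grid_cutouts(extent_m: float, cutout_m: float, overlap_m: float) -> list:
    """Return (start, end) positions in metres for 1-D cutouts sharing overlap."""
    step = cutout_m - overlap_m  # stride between neighbouring cutout origins
    starts = []
    pos = 0.0
    while pos + cutout_m < extent_m:
        starts.append(pos)
        pos += step
    starts.append(max(extent_m - cutout_m, 0.0))  # snap the last cutout to the edge
    return [(s, s + cutout_m) for s in starts]

# A 250 m extent tiled into 100 m cutouts with 20 m of nominal overlap.
print(grid_cutouts(250.0, 100.0, 20.0))
```

Each neighbouring pair of cutouts shares at least the nominal overlap, which is what makes seams recoverable in principle; the missing piece is blending the duplicated SR pixels.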

@simon-donike
Member

Okay, that's fine I guess. For this case, can you please print an [info] statement when multiple cubo cutouts are used? I don't think this is important enough to warrant the effort of fixing it, but it would still be good to inform users that, contrary to the artefact-free claims, artefacts might exist for this edge case. Other than that, looks good to me. I haven't run it yet, but as soon as that's done I'll merge! Thanks again!

@mak8427
Contributor Author

mak8427 commented Mar 12, 2026

> Okay, that's fine I guess. For this case, can you please print an [info] statement when multiple cubo cutouts are used? I don't think this is important enough to warrant the effort of fixing it, but it would still be good to inform users that, contrary to the artefact-free claims, artefacts might exist for this edge case. Other than that, looks good to me. I haven't run it yet, but as soon as that's done I'll merge! Thanks again!

I've included the info statement :)
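
A minimal version of such a notice could look like this; the function name and exact wording are assumptions, not the merged code:

```python
def cutout_seam_notice(n_cutouts: int):
    """Return an [info] message when more than one cubo cutout is used, else None."""
    if n_cutouts > 1:
        return (f"[info] {n_cutouts} cubo cutouts requested; overlapping SR outputs "
                "are not reconciled, so a mosaic may show seams at cutout boundaries.")
    return None

print(cutout_seam_notice(4))
```

Returning the message (rather than printing inside the helper) keeps it easy to route through whatever logger the launcher uses.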

@simon-donike simon-donike merged commit 47c96bd into ESAOpenSR:main Mar 12, 2026
@simon-donike
Member

Thanks a lot @mak8427, that was a great contribution. I managed to reproduce the results. FYI: in release 1.1.0, which includes your changes, I also modernized to Python 3.12 and updated the dependencies accordingly. I will update the docs to an mkdocs version soon and will also give you credit there. I know it's a big ask, but the workflow you implemented is really valuable, and it would be great if it worked on SRGAN too. Would it be too much trouble to implement the same there? Thanks in advance.

@mak8427
Contributor Author

mak8427 commented Mar 12, 2026

> Thanks a lot @mak8427, that was a great contribution. I managed to reproduce the results. FYI: in release 1.1.0, which includes your changes, I also modernized to Python 3.12 and updated the dependencies accordingly. I will update the docs to an mkdocs version soon and will also give you credit there. I know it's a big ask, but the workflow you implemented is really valuable, and it would be great if it worked on SRGAN too. Would it be too much trouble to implement the same there? Thanks in advance.

Sure, I can do that; it won't be too hard. On another note, I will start my PhD in May here in Göttingen, and I was wondering if you might be interested in working on something together in the future :)

@simon-donike
Member

great, I'll contact you!
