Add HPC launcher and Slurm integration #22
|
this comes from #21 |
|
Thanks a lot for putting in the work, I've added a small comment on the patching file. Other than that I see no issues now, but it might take me a moment to validate the whole workflow in practice. Great that you use opensr-utils; the script is a great improvement! |
Thank you for making these cool models open source! I am using them quite a lot lately. I am running them on the university HPC we have here in Göttingen; hopefully it will not cause problems on other systems. |
|
Just another quick conceptual question: I see you use |
|
Good point. The overlap is at two levels: |
|
Okay, that's fine I guess. For this case, can you please print an [info] statement when multiple cubo cutouts are used? I don't think this is important enough to warrant the effort to fix it, but it would still be good to inform users that, contrary to the artefact-free claims, artefacts might exist for this edge case. Other than that, looks good to me. I haven't run it yet, but as soon as that's done I'll merge! Thanks again! |
…potential seams in mosaics
I've included the info statement :) |
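For readers following along, the kind of info statement discussed above might look roughly like the sketch below. It is only an illustration: the function name `multi_cutout_notice` and the message wording are hypothetical and are not taken from the merged code.

```python
from typing import Optional


def multi_cutout_notice(num_cutouts: int) -> Optional[str]:
    """Return an [info] message when more than one cubo cutout is used.

    Hypothetical helper: with a single cutout there is no boundary between
    cutouts, so no notice is needed and None is returned.
    """
    if num_cutouts <= 1:
        return None
    return (
        f"[info] {num_cutouts} cubo cutouts are used for this run; "
        "seams may be visible where neighbouring cutouts meet in the mosaic."
    )


# Example: print the notice only when it applies.
message = multi_cutout_notice(4)
if message is not None:
    print(message)
```

Returning the message instead of printing it directly keeps the check easy to test; the caller decides whether to print or log it.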
|
Thanks a lot @mak8427, that was a great contribution. I managed to reproduce the results. FYI: in release 1.1.0, which includes your changes, I also modernized to Python 3.12 and updated the corresponding deps. I will update the docs to an mkdocs version soon and will also give you credit there. I know it's a big ask, but the workflow you implemented is really valuable, and it would be great if it worked on SRGAN too. Would it be too much trouble to implement the same there? Thanks in advance. |
Sure, I can do that, it won't be too hard. On another note, I will start my PhD in May here in Göttingen, and I was wondering if you might be interested in working on something together in the future :) |
|
great, I'll contact you! |
This pull request introduces a new, installable HPC/Slurm launcher for the opensr-model. It adds a new CLI (opensr-hpc) with commands for configuration validation, job submission (patch/grid), status, and output collection, along with structured runtime configuration files and supporting modules. The documentation is updated to reflect these new capabilities, and example configs and usage are provided.
The most important changes are:
HPC/Slurm Launcher and CLI:
- deployment/opensr_hpc, including a user-facing CLI (opensr-hpc) for validating configs, submitting jobs (patch/grid), running tasks, collecting outputs, and checking run status.
Configuration and Example Files:
- Runtime configuration files (runtime.default.yaml, runtime.a100.example.yaml) for specifying environment, model, staging, inference, and Slurm parameters.
- A configuration module (config.py) for robust runtime config management.
Documentation Updates:
- README.md with a new section on HPC/Slurm usage, installation instructions, and CLI usage examples.
- deployment/README.md explaining the launcher, run layout, configs, and Slurm entrypoint.
Supporting Utilities:
- deployment/__init__.py.
I need to proofread it a few more times before making it ready.
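To make the CLI surface described above more concrete, here is a minimal sketch of how such a launcher could be wired up with argparse. Everything beyond the program name opensr-hpc is an assumption for illustration: the subcommand names (`validate`, `submit`, `status`, `collect`), the `--mode` flag, and the helper `missing_sections` are hypothetical, and the required section names are simply the config areas listed in this description.

```python
import argparse

# Config areas named in the PR description; treating them as required
# top-level sections is an assumption made for this sketch.
REQUIRED_SECTIONS = ("environment", "model", "staging", "inference", "slurm")


def missing_sections(config: dict) -> list[str]:
    """Return the required top-level sections absent from a loaded config."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with hypothetical subcommands mirroring the description."""
    parser = argparse.ArgumentParser(prog="opensr-hpc")
    sub = parser.add_subparsers(dest="command", required=True)

    validate = sub.add_parser("validate", help="check a runtime config file")
    validate.add_argument("config", help="path to a runtime YAML config")

    submit = sub.add_parser("submit", help="submit a job to Slurm")
    submit.add_argument("config", help="path to a runtime YAML config")
    submit.add_argument("--mode", choices=("patch", "grid"), default="patch")

    for name, text in (("status", "show run status"), ("collect", "gather outputs")):
        cmd = sub.add_parser(name, help=text)
        cmd.add_argument("run_dir", help="directory of a previous run")

    return parser


# Example: parse a grid submission and check an incomplete config dict.
args = build_parser().parse_args(["submit", "runtime.default.yaml", "--mode", "grid"])
print(args.command, args.mode)
print(missing_sections({"model": {}, "slurm": {}}))
```

Loading the YAML file itself is left out so the sketch stays within the standard library; the validation step only inspects an already-loaded dictionary.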