
Parallel cmorisation of lpjg output? #800

Open
klauswyser opened this issue Feb 13, 2024 · 7 comments

Comments

@klauswyser
Collaborator

The number of LPJG variables has grown substantially, and processing the LPJG output now takes longer than processing the IFS output (without model levels). When processing IFS there is the option to run tasks in parallel, but this option has no effect for LPJG (unless I'm mistaken). Would it be possible to allow parallel processing of LPJG output?

Pinging @treerink @goord @nierad

@treerink
Collaborator

This surprises me, because I thought the parallelization takes place at the ece2cmor task level and not (that much) within the component-specific part (for IFS model levels I am not sure). @goord, is this perception largely true?

@klauswyser
Collaborator Author

ece2cmor.py -h:
...
  --npp N                     Number of parallel tasks (only relevant for IFS cmorization) (default: 8)
...

@treerink
Collaborator

Ok, I never realized that. NEMO is fast, and I only recently did the LPJG cmorisation myself. But it also explains the slow TM5 cmorisation, which I also recently did for the first time for a larger set.

So the question might actually be relevant for both the LPJG and TM5 cmorisation.

@klauswyser
Collaborator Author

...and possibly even for NEMO. For CMIP6 we had mainly monthly means, which was no big deal, but in OptimESM we have a number of daily fields, plus all the ocean biogeochemistry and many new sea-ice fields, so parallel processing would help even when cmorising NEMO.

@treerink
Collaborator

Well, I believe my test-all test covers all those NEMO (LIM) variables, so I think it is not a very big deal there, though faster is always nicer. But with the TM5 FOCI example I recently had, it was over 4 hours per leg if I remember correctly.

@goord
Collaborator

goord commented Feb 14, 2024

In my experience, this is not a trivial task, mainly because the cmor library has all variables structured in cmor tables that need to be initialized first. So I estimate at least 2 weeks of work @treerink.
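
For illustration, a minimal sketch of the obstacle described here, assuming the standard cmor Python bindings: the library keeps its loaded tables and open dataset as global state, so each worker process would have to repeat the initialization before it can write anything. This is not ece2cmor's actual code; the table path, metadata file and task tuples are hypothetical placeholders.

```python
# Sketch only: per-process CMOR initialization with multiprocessing.
# TABLE_DIR, METADATA and the task list are hypothetical, not ece2cmor's.
import multiprocessing as mp

import cmor  # the CMOR Python bindings

TABLE_DIR = "cmip6-cmor-tables/Tables"  # hypothetical path to the CMOR tables
METADATA = "metadata.json"              # hypothetical CMOR dataset description


def init_worker():
    # Runs once in each worker: cmor state is per-process and cannot be
    # shared, which is the initialization cost referred to above.
    cmor.setup(inpath=TABLE_DIR, netcdf_file_action=cmor.CMOR_REPLACE)
    cmor.dataset_json(METADATA)


def cmorize(task):
    table, variable = task
    cmor.load_table(table)
    # ... define axes, define the variable, cmor.write(...) the data ...
    return variable


if __name__ == "__main__":
    # Hypothetical LPJG tasks; ece2cmor's real task objects are richer.
    tasks = [("CMIP6_Lmon.json", "gpp"), ("CMIP6_Lmon.json", "lai")]
    with mp.Pool(processes=2, initializer=init_worker) as pool:
        print(pool.map(cmorize, tasks))
```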

@plesager
Contributor

There is a quick and easy way to parallelize ece2cmor: split your varlist (json data request) and run several ece2cmor instances concurrently with the various json files. This is not perfect, but for NEMO and TM5 (and maybe LPJ-Guess, but I do not have enough insight to conclude), we should be able to reduce the runtime by a factor of 3 or 4 if not more, don't you think? Worth testing anyway, which I will do for the TM5 output in FOCI.

The annoying thing is that this is manual work, and each time your data request is modified by genecec you have to redo the varlist splitting - but that should not happen too often anymore for the two projects we are considering here.
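
A hedged sketch of this splitting approach, assuming the varlist is a json object mapping components to tables to variable lists (the real layout may differ), with a placeholder ece2cmor command line:

```python
# Sketch only: partition the json data request round-robin and launch one
# ece2cmor instance per part. The varlist layout and the ece2cmor flags
# used below are assumptions, not verified against ece2cmor itself.
import json
import subprocess
import sys


def split_varlist(path, nchunks):
    with open(path) as f:
        request = json.load(f)
    chunks = [{} for _ in range(nchunks)]
    i = 0
    for component, tables in request.items():
        for table, variables in tables.items():
            for var in variables:
                part = chunks[i % nchunks]
                part.setdefault(component, {}).setdefault(table, []).append(var)
                i += 1
    paths = []
    for n, part in enumerate(chunks):
        out = f"varlist.part{n}.json"
        with open(out, "w") as f:
            json.dump(part, f, indent=2)
        paths.append(out)
    return paths


if __name__ == "__main__":
    parts = split_varlist(sys.argv[1], nchunks=4)
    # Placeholder invocation: a real ece2cmor call also needs the data
    # directory, experiment name, component selection, etc.
    procs = [subprocess.Popen(["ece2cmor", "--varlist", p]) for p in parts]
    for proc in procs:
        proc.wait()
```

Splitting round-robin over variables keeps the parts roughly equal in size; splitting by table instead would keep related variables together in one instance, at the risk of unbalanced parts.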
