
Parallel cmorisation of lpjg output? #800

Open
klauswyser opened this issue Feb 13, 2024 · 7 comments

Comments

@klauswyser
Collaborator

The number of LPJG variables has grown substantially, and processing the LPJG output now takes longer than processing the IFS output (without model levels). When processing IFS there is the option to run tasks in parallel, but this option has no effect for LPJG (unless I'm mistaken). Would it be possible to allow parallel processing of LPJG output?

Pinging @treerink @goord @nierad

@treerink
Collaborator

This surprises me, because I thought the parallelization takes place at the ece2cmor task level and not (that much) within the component-specific part (for IFS model levels I am not sure). @goord, is this perception largely true?

@klauswyser
Collaborator Author

ece2cmor.py -h:
...
  --npp N                     Number of parallel tasks (only relevant for IFS cmorization) (default: 8)
...

@treerink
Collaborator

Ok, I never realized that. NEMO is fast, and I only recently did the LPJG cmorisation myself. But it also explains the slow TM5 cmorisation, which I also recently did for the first time for a larger set.

So the question might actually be relevant for both the LPJG and TM5 cmorisation.

@klauswyser
Collaborator Author

...and possibly even for NEMO. For CMIP6 we had mainly monthly means, which was no big deal, but in OptimESM we have a number of daily fields, plus all the ocean biogeochemistry and many new sea-ice fields, so parallel processing would help even when cmorising NEMO.

@treerink
Collaborator

Well, I believe my test-all test covers all those NEMO (LIM) variables, so I think it is not a very big deal there, though faster is always nicer. But with the TM5 FOCI example I recently had, it was over 4 hours per leg if I remember correctly.

@goord
Collaborator

goord commented Feb 14, 2024

In my experience, this is not a trivial task, mainly because the cmor library has all variables structured in cmor tables that need to be initialized first. So I estimate at least 2 weeks of work @treerink.
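
For illustration, a minimal sketch of the obstacle described here, assuming the standard cmor Python bindings: the library keeps its loaded tables and open dataset as global state, so each worker process would have to repeat the initialization before it can write anything. This is not ece2cmor's actual code; the table path, metadata file and task tuples are hypothetical placeholders.

```python
# Sketch only: per-process CMOR initialization with multiprocessing.
# TABLE_DIR, METADATA and the task list are hypothetical, not ece2cmor's.
import multiprocessing as mp

import cmor  # the CMOR Python bindings

TABLE_DIR = "cmip6-cmor-tables/Tables"  # hypothetical path to the CMOR tables
METADATA = "metadata.json"              # hypothetical CMOR dataset description


def init_worker():
    # Runs once in each worker: cmor state is per-process and cannot be
    # shared, which is the initialization cost referred to above.
    cmor.setup(inpath=TABLE_DIR, netcdf_file_action=cmor.CMOR_REPLACE)
    cmor.dataset_json(METADATA)


def cmorize(task):
    table, variable = task
    cmor.load_table(table)
    # ... define axes, define the variable, cmor.write(...) the data ...
    return variable


if __name__ == "__main__":
    # Hypothetical LPJG tasks; ece2cmor's real task objects are richer.
    tasks = [("CMIP6_Lmon.json", "gpp"), ("CMIP6_Lmon.json", "lai")]
    with mp.Pool(processes=2, initializer=init_worker) as pool:
        print(pool.map(cmorize, tasks))
```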

@plesager
Contributor

There is a quick and easy way to parallelize ece2cmor: split your varlist (json data request) and run several ece2cmor instances concurrently with the various json files. This is not perfect, but for NEMO and TM5 (and maybe LPJ-Guess, but I do not have enough insight to conclude), we should be able to reduce the runtime by a factor of 3 or 4 if not more, don't you think? Worth testing anyway, which I will do for the TM5 output in FOCI.

The annoying thing is that this is manual work, and each time your data request is modified by genecec you have to redo the varlist splitting - but that should not happen too often anymore for the two projects we are considering here.
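
A hedged sketch of this splitting approach, assuming the varlist is a json object mapping components to tables to variable lists (the real layout may differ), with a placeholder ece2cmor command line:

```python
# Sketch only: partition the json data request round-robin and launch one
# ece2cmor instance per part. The varlist layout and the ece2cmor flags
# used below are assumptions, not verified against ece2cmor itself.
import json
import subprocess
import sys


def split_varlist(path, nchunks):
    with open(path) as f:
        request = json.load(f)
    chunks = [{} for _ in range(nchunks)]
    i = 0
    for component, tables in request.items():
        for table, variables in tables.items():
            for var in variables:
                part = chunks[i % nchunks]
                part.setdefault(component, {}).setdefault(table, []).append(var)
                i += 1
    paths = []
    for n, part in enumerate(chunks):
        out = f"varlist.part{n}.json"
        with open(out, "w") as f:
            json.dump(part, f, indent=2)
        paths.append(out)
    return paths


if __name__ == "__main__":
    parts = split_varlist(sys.argv[1], nchunks=4)
    # Placeholder invocation: a real ece2cmor call also needs the data
    # directory, experiment name, component selection, etc.
    procs = [subprocess.Popen(["ece2cmor", "--varlist", p]) for p in parts]
    for proc in procs:
        proc.wait()
```

Splitting round-robin over variables keeps the parts roughly equal in size; splitting by table instead would keep related variables together in one instance, at the risk of unbalanced parts.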
