Describe the bug
Ensemble trial data typically has the following directory structure for a particular run:
20240609T0600Z/enuk_um_003, 20240609T0600Z/enuk_um_004, 20240609T0600Z/enuk_um_005. The number in each directory name is the ensemble member number. Every member directory contains exactly the same filenames, e.g. enukaa_pd000, enukaa_pd003, ... , enukaa_pd123, where the number is the forecast lead time.
When file globs are passed into the workflow (/enuk_um_0*/enukaa_pd*), it fails to load every member because it sets up one directory and creates symbolic links to each of the matched files in that directory, using only the final part of the path (i.e. the file name) as the link name. The pattern is expanded correctly and all the files are found, but only the first set of enukaa_pd* files is loaded. Files for subsequent ensemble members are treated as duplicates, so none of the other members are loaded.
As a result, the generated plots show only one realization, even though the glob matches multiple members.
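The collision described above can be illustrated with a minimal Python sketch. This is not the workflow's actual fetch code; the function names `link_names_by_basename` and `link_names_with_member` are hypothetical and only model the naming logic on the example paths from this report. The second function shows one possible naming scheme (prefixing the member directory name), offered purely as a starting point for discussion.

```python
from pathlib import PurePosixPath


def link_names_by_basename(paths):
    """Model the reported behaviour: link name is the file name only.

    The first path seen for each file name wins; later paths with the
    same file name are recorded as duplicates and not linked.
    """
    links, duplicates = {}, []
    for path in paths:
        name = PurePosixPath(path).name
        if name in links:
            duplicates.append(path)
        else:
            links[name] = path
    return links, duplicates


def link_names_with_member(paths):
    """Hypothetical alternative: include the parent directory in the link name."""
    links = {}
    for path in paths:
        p = PurePosixPath(path)
        links[f"{p.parent.name}_{p.name}"] = path
    return links


# Example paths taken from the directory layout described in this report.
paths = [
    "20240609T0600Z/enuk_um_003/enukaa_pd000",
    "20240609T0600Z/enuk_um_003/enukaa_pd003",
    "20240609T0600Z/enuk_um_004/enukaa_pd000",
    "20240609T0600Z/enuk_um_004/enukaa_pd003",
]

links, duplicates = link_names_by_basename(paths)
unique_links = link_names_with_member(paths)

print(sorted(links))         # only two link names survive
print(duplicates)            # both enuk_um_004 files are rejected
print(sorted(unique_links))  # four distinct link names
```

With file-name-only link names, every file from enuk_um_004 collides with its enuk_um_003 counterpart. Including the member directory in the link name keeps all four files distinct, although whether that suits the downstream loading step is an open question.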
How to reproduce
Steps to reproduce the behaviour:
- Choose one trial model and point the filepath in rose-suite.conf to a path ending /enuk_um*/enukaa_pd*
- Run the workflow
- Look at the job.err log for the fetch_m1 task. It should show that the first set of enukaa_pd* files is loaded, and that a duplicate error is printed whenever another path ending in the same file name is found.
Expected behaviour
This probably requires a discussion. Stephen Gallagher (@SGallagherMet) has suggested this could be a common issue for other meteorological centres producing trial output, but the current behaviour may be intentional.