put date in MOM output filenames #185
Comments
I'm not sure if this would require changes to the cosima cookbook, but if so they'd be small - the cookbook already supports dates in CICE output filenames. Probably only scripts would need to change, not the cookbook itself? |
If we decide to go ahead with this, should we apply it to all files already in |
Rather than moving the files to new filenames, we could hard- or sym- link. This would preserve existing workflows while also supporting BoM and CSIRO. |
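As a rough illustration of the linking idea, here is a minimal Python sketch that adds a date-stamped link beside an existing output file and leaves the original name in place. The function name `link_with_date`, the `ocean_YYYY_MM_DD.nc` pattern, and the idea that the caller already knows the run start date are all assumptions made for this example; this is not the script mentioned elsewhere in this thread.

```python
import os
from datetime import date
from pathlib import Path


def link_with_date(output_file: Path, run_start: date, symbolic: bool = True) -> Path:
    """Create a date-stamped link next to output_file and return the link path.

    Hypothetical helper: the naming pattern is an assumption for illustration.
    """
    stamp = f"{run_start.year:04d}_{run_start.month:02d}_{run_start.day:02d}"
    link = output_file.with_name(f"{output_file.stem}_{stamp}{output_file.suffix}")
    if link.exists() or link.is_symlink():
        return link  # a link with this name is already there; leave it alone
    if symbolic:
        # Relative target, so the link still resolves if the directory is moved.
        link.symlink_to(output_file.name)
    else:
        os.link(output_file, link)  # hard link: both names share the same data
    return link
```

Calling `link_with_date(Path("output000/ocean/ocean.nc"), date(1985, 1, 1))` (the path is illustrative only) would add `ocean_1985_01_01.nc` alongside `ocean.nc`, so existing workflows that open `ocean.nc` keep working.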
I did this when creating ensembles of the IAF runs. I'll see if the script survived the transition to Gadi. |
Looks like they went kaput 22 hours ago. The surviving json files I have indicate that I was using names like |
Note that the dates can be done automatically via the |
Ah, good to know. It would be good to include both starting and ending date, e.g. something like In any case, we would need to use something else if we want to process the existing outputs. |
You can dump files periodically, say monthly. This keeps file sizes under control and you can even exploit the parallelism when postprocessing if you want. You can run with variable numbers of months and across years seamlessly. We use entries like
to dump monthly files full of daily averages. |
Thanks @russfiedler, that's a great tip |
@angus-g am I right in thinking no code changes would be needed in the cookbook? |
@aekiss that's right, it doesn't matter what the filename is. You can still use |
I think this is a good idea. |
I think @russfiedler is on the right track. Save each month's data in a separate file, uniquely named, using the This is part of the configuration, so test it and once happy roll it out to the published config. I'd be using I would not support changing existing runs. Simply not worth the time/effort IMO. |
Each month? |
For the tenth, monthly, as you never run for a year. For the quarter and one degree, probably yearly. This has the benefit that whatever the run length the duration of output files would be consistent. |
I think you want consistent sizes throughout the run. It makes checking things much simpler. I'd suggest monthly output and yearly for the others. @aidanheerdegen Great minds think alike! |
True, but we sometimes use 3-monthly output ... |
Now you're just being difficult. |
Three monthly averaged output for the 0.1 or do you mean the others? You'd be putting those outputs in a separate file anyway. |
The entry for the 1 and 0.25 models would be something like "ocean_3mon%4yr",3,"months",1,"days","Time",1,"years" so you would have 4 entries per file. |
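To make the "4 entries per file" arithmetic concrete, here is a small Python sketch that reads a diag_table file line like the one above and works out how many time records end up in each file. The field order used here (name, output frequency and units, format, time units, time axis name, new-file frequency and units) is my understanding of the FMS diag_table file line and should be checked against the diag_manager documentation; `records_per_file` is a hypothetical helper and only handles month and year units.

```python
import csv

# Only month- and year-based units are handled in this sketch.
MONTHS_PER_UNIT = {"months": 1, "years": 12}


def records_per_file(file_line: str) -> int:
    """Return how many time records land in each file for a diag_table file line."""
    fields = next(csv.reader([file_line], skipinitialspace=True))
    # Assumed field order: name, output_freq, output_freq_units, file_format,
    # time_axis_units, time_axis_name, new_file_freq, new_file_freq_units
    output_freq, output_units = int(fields[1]), fields[2].strip()
    new_file_freq, new_file_units = int(fields[6]), fields[7].strip()
    output_months = output_freq * MONTHS_PER_UNIT[output_units]
    file_months = new_file_freq * MONTHS_PER_UNIT[new_file_units]
    if file_months % output_months:
        raise ValueError("file duration is not a whole number of output intervals")
    return file_months // output_months


entry = '"ocean_3mon%4yr",3,"months",1,"days","Time",1,"years"'
print(records_per_file(entry))  # 3-monthly output in yearly files
```

For the entry quoted above this prints 4: a new file every year, each holding four 3-monthly records.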
Yep, I have only ever used 3-monthly for the 01deg case... |
Ah, so that means you have to run for 3 or 6 month segments at the moment, right? You would have an entry like |
Yes, still running with 3-month segments... A 12-hour wall-time limit (or linear scaling up to 12,000 cores) would allow us to do a year at a time. |
@russfiedler looks like |
@aidanheerdegen Ta. I've popped the scripts to do the linking in |
start from /home/157/amh157/payu/01deg_jra55v13_ryf9091/archive/restart371 using the same config, but use IAF forcing, copied as needed from https://github.com/COSIMA/01deg_jra55_iaf/tree/3411eed79b5b55d8db7b5ddfcbfc111bc9e40abf for
- accessom2.nml
- atmosphere/forcing.json
- config.yaml

other changes:
- disable all cice output
- set up mom outputs
  - output scalars and 2d surface_temp and eta_t only
  - 4 hourly
  - use snapshots
  - include model date in file - see COSIMA/access-om2#185
- use openMPI4.0.2 executables
  /g/data/ik11/inputs/access-om2/bin/yatm_575fb04.exe
  /g/data/ik11/inputs/access-om2/bin/fms_ACCESS-OM_4a2f211_libaccessom2_575fb04.x
  /g/data/ik11/inputs/access-om2/bin/cice_auscom_3600x2700_722p_365bdc1_libaccessom2_575fb04.exe
  instead of the openMPI4.0.1 versions
  /g/data/ik11/inputs/access-om2/bin/yatm_1bb8904.exe
  /g/data/ik11/inputs/access-om2/bin/fms_ACCESS-OM_97e3429_libaccessom2_1bb8904.x
  /g/data/ik11/inputs/access-om2/bin/cice_auscom_3600x2700_722p_d3e8bdf_libaccessom2_1bb8904.exe

The code differences shouldn't make any scientific difference
https://github.com/COSIMA/libaccessom2/compare/1bb8904..575fb04
https://github.com/mom-ocean/MOM5/compare/97e3429..4a2f211
https://github.com/COSIMA/cice5/compare/d3e8bdf..365bdc1
https://github.com/COSIMA/oasis3-mct/compare/d02cc8d896..87a873aa7
I'm trying to come up with a consistent file naming convention for all MOM output at all resolutions in the new configurations. A key objective is to improve data accessibility by making it possible to determine what data is available (variables, temporal sampling, dates) by simply using Here's a proposed convention:
This order of components is designed to sort alphabetically in a helpful way. Examples:
achieved by these
Does that seem OK to people? (ping @AndyHoggANU, @aidanheerdegen, @russfiedler) I'm not sure whether we should do something like this for CICE output too. Each file includes lots of static grid data so that's an argument to retain our current approach of saving many CICE variables per file. |
@angus-g would a large increase in the number output files cause problems for the COSIMA Cookbook? And will this file naming convention suit the way the cookbook concatenates files on the time axis (e.g. if the final filename component varies during a run)? |
The database part of the cookbook shouldn't have any issues with more files. There would probably be a small increase in the size of the database itself, but I can't see queries getting noticeably slower. For concatenation, the filenames themselves don't matter: the files are sorted by the start time obtained from the time dimension data. The only thing I could see causing a change from the current behaviour is that we can't quite rely on the same form of filename-based disambiguation. It would be harder for the cookbook to suggest that a query is erroneous, but we can still pass patterns (like |
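The point about ordering can be shown with a toy Python example: if files are ordered by the first value on their time axis, the filename plays no part. This is only a sketch of the idea and not the cookbook's actual code; the `OutputFile` class, the `concatenation_order` function and the filenames are made up for illustration, and reading the start time from each netCDF file is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class OutputFile:
    path: str
    start: datetime  # first value on the file's time axis (read elsewhere)


def concatenation_order(files):
    """Order files by the start of their time axis, ignoring their names."""
    return sorted(files, key=lambda f: f.start)


# Hypothetical names whose alphabetical order differs from their time order.
files = [
    OutputFile("b_first_quarter.nc", datetime(1985, 1, 1)),
    OutputFile("a_second_quarter.nc", datetime(1985, 4, 1)),
]
print([f.path for f in concatenation_order(files)])
```

Here the file starting in January comes first even though its name sorts second.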
Wow, OK, I think I like it.
|
I've written a script to automatically generate With this, users will only need to modify a very clean and non-repetitive |
Unfortunately MOM insists on putting The Eliminating the leading underscore would be neater: Also notice that I've decided to retain the |
Example output files are here (this is a 3-month test run with monthly outputs):
|
You could add a prefix to the date field. |
That first form looks atrocious. |
maybe |
Yeah all I could come up with was |
I think judicious use of |
Also we may be unable to put anything after the date part of the filename.
produces files like So we may need to include the reduction method before the date, ie |
We could kill 2 birds with 1 stone by omitting the dash between reduction method and date, e.g.
ie consider the reduction method to be part of the date "field" |
So you could have the reduction method and date in the same field separated by an underscore ... or you could have all the date related stuff in a single field, like It is all completely arbitrary |
snap |
I prefer the latter, including all the date stuff in a single field. Seems consistent and neater. |
These approaches don't work for the scalar files which lack a lot of the date stuff, so I think the Examples: 1 file per field for 2d and 3d
all scalars in one file: edit: see below
static grid data in one file per field, with no date info:
|
Having the day in those monthly files seems completely unintuitive, unnecessary and ugly to me. You've got a 16 there for every file except Feb and that doesn't change in leap years so it serves no purpose. |
I agree - I was thinking the same thing. For snapshots that would mean the month will be at the end of the sampling period (ie the month after the sampling period), and other reduction methods will be in the middle (rounded down). So monthly sampling over 3 months (Jan-March) looks like this
and 3-monthly is like this
I guess that's not too confusing. |
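The labelling rule described above (end of the sampling period for snapshots, middle of the period rounded down for other reduction methods) can be sketched in Python as follows. The `month_label` function and the `YYYY_MM` label format are assumptions for illustration, not MOM's own code.

```python
from datetime import datetime


def month_label(start: datetime, end: datetime, snapshot: bool) -> str:
    """Return a YYYY_MM label for output covering the period from start to end."""
    # Snapshots are stamped at the end of the period; other reductions at the
    # midpoint. Keeping only year and month rounds the midpoint down.
    stamp = end if snapshot else start + (end - start) / 2
    return f"{stamp.year:04d}_{stamp.month:02d}"


jan = (datetime(1985, 1, 1), datetime(1985, 2, 1))      # one-month period
jan_mar = (datetime(1985, 1, 1), datetime(1985, 4, 1))  # three-month period

print(month_label(*jan, snapshot=False))      # mean over January
print(month_label(*jan, snapshot=True))       # snapshot at end of January
print(month_label(*jan_mar, snapshot=False))  # mean over Jan-March
print(month_label(*jan_mar, snapshot=True))   # snapshot at end of March
```

Under these assumptions a January mean is labelled `1985_01` and its snapshot `1985_02`, while a Jan-March mean is labelled `1985_02` (the midpoint is 15 February) and its snapshot `1985_04`.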
At the risk of pedantry, should it be |
Yes, Maybe, Deprecated. I think the starting month is preferable for multimonth files since you have a far simpler relationship with the beginning and end of the time period. Besides, what if you have a 2 or 4 month file? Edit: Ah, this is for a 3 monthly mean not the individual months. The middle does make a lot of sense in that case as it coincides with the time in the file. |
@aidanheerdegen agreed - 1 char saved! @russfiedler this is just the standard behaviour of MOM with |
@aekiss Yes, before my edit I was thinking your example was the case |
with
|
I think shared scalars file should also include the output frequency (as it's a per-file setting), but omit the reduction method (as it's per-field), eg
|
closing - this has now been implemented in the |
At present all the MOM outputs have the same name for every run, e.g. ocean.nc. I propose we include the run date in the filenames, e.g. ocean_1985_01_01, which is what we already have in the CICE outputs. This gives many advantages:
This could be implemented as a post-processing step.
I'm not sure if the filename should include just the starting date or both starting and ending dates (the latter being confusing, as it is midnight of the day after).