Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Develop Aether lon-lat interface #559

Closed
kdraeder opened this issue Oct 12, 2023 · 35 comments · Fixed by #635
Closed

Develop Aether lon-lat interface #559

kdraeder opened this issue Oct 12, 2023 · 35 comments · Fixed by #635
Assignees
Labels
Enhancement New feature or request

Comments

@kdraeder
Copy link
Contributor

Use case

Collaboration with Aaron Ridley (U Mich) requires development of an interface to his newly developed Aether upper atmosphere model. The first interface will be to the logically rectangular lon-lat grid version.
Later we will develop an interface to the cubed-sphere version.

Describe your preferred solution

We plan to use the tiegcm model_mod as a template representing DART's best practices.
We'll import pieces from gitm, which very closely match the Aether lon-lat restart file structure.
Checkout the "aether" branch to contribute work or see its status.
There are also rst files describing parts of the model and interface, written by @johnsonbk,
which have not been committed yet.

Describe any alternatives you have considered

Developing a model_mod, etc., from scratch. That seems inefficient, given the existence of a gitm interface,
which is closely related to Aether, and the tiegcm interface, which is a recently updated interface
to another upper atmosphere model.

@kdraeder kdraeder added the Enhancement New feature or request label Oct 12, 2023
@kdraeder
Copy link
Contributor Author

@kdraeder is merging the code which reads and writes restart files from gitm into aether.
These can be included and tested using aether_to_dart.f90 and dart_to_aether.f90
without most of the rest of the model_mod being adapted to aether.
(I renamed those programs to use 'dart' instead of gitm's 'netcdf', to be more consistent with other models.)

I believe @johnsonbk will work on translating the interpolation code from the existing Aether C++
and merging it into the model_mod.

@kdraeder
Copy link
Contributor Author

I've merged GITM routines into the aether_lon-lat model_mod, but need to update the model_mod header
to accomodate them. Some of it is grid information. While deciding what the names should be,
I found that "restart" and "output files have different names for dimensions and variables,
(and different variables in some cases).

Contents
  outputs:
               Bulk Ion Velocity ({Zonal,Meridional,Vertical})
  restarts have them for each species
    {Parallel,Perp} Ion Velocity ({Zonal,Meridional,Vertical}) ({O+,...})(x, y, z) ;
Names
-  Don't comply with the CF Metadata Conventions: 
          float Parallel\ Ion\ Velocity\ \(Zonal\)\ \(O+\)
           ->  "Parallel Ion Velocity (Zonal) (O+)"
-  Are different in "restart" and "output" files  (except for "z", I don't know why it's not "alt" in output)
     restart:  float Temperature\ \(O+\)(              x, y, z) ;
     output:  float O+\ Temperature     (block, lon, lat, z) ;

My leaning is to implement the restart dimension names in the aether_to_dart and dart_to_aether code trees
and the output names in the rest of the model_mod (interpolation, get_close, ...).
But I see arguments for adopting just one set of names.
Thoughts?

@kdraeder
Copy link
Contributor Author

@johnsonbk @hkershaw-brown
The Aether restart fields are divided among 2 types of files: neutral and ions.
They are further divided among the "blocks", which are subdomains of the globe.
All of these need to be combined to make a single state vector,
and there's a unique set of these files for each member.
Program aether_to_dart will read all the restart files and repackage them into an ensemble state vector.
Is this a situation that's well-suited for defining 2 domains; one for neutrals and one for ions?
The template model_mod we're using already has multiple domains defined,
so it seems that it would be straightforward to adapt them.

@nancycollins
Copy link
Collaborator

the aether_to_dart program is going to create N netcdf files (one per ensemble member) that filter will read. are there any fields that have the same name in the neutral and ion files? do they cover different parts of space? if not, it seems like it would be better to read in both the neutral and ion data blocks and create a single netcdf output file per ensemble member and have only a single domain. then filter only has to read one file per member instead of 2, the namelist input for reading fields can be simpler, and the interpolate code can be streamlined.

@kdraeder
Copy link
Contributor Author

Thanks for the quick response!
Clearly I was working too late; part of my brain knew that there will be an ensemble of filter input files.

There's no overlap of field names in the 2 files, so they can be combined in aether_to_dart.
But then there needs to be a mechanism in dart_to_aether for reading the filter output (analysis) files
and separating the fields into the neutrals and ions files. Domains look like a good candidate,
but I'll go ahead with the combined strategy.

@nancycollins
Copy link
Collaborator

If the model itself used multiple netcdf files as input, then domains would help since each domain specification includes the input filename and filter would read them directly. But since these netcdf files are transient, you can structure them any way you want. Simpler seems better. You're right - the dart_to_aether program will need to know which netcdf variables go into which files, but that seems not too bad. You have to create the netcdf variable names in aether_to_dart since I'm assuming the block files don't have any metadata, so both parts of the packing and unpacking are under your control.

@kdraeder
Copy link
Contributor Author

The model does expect multiple restart files: {ions,neutrals}x #blocks x #members.

The block files have units metadata, but no field names.

@kdraeder
Copy link
Contributor Author

@johnsonbk @nancycollins
Aether_to_dart needs the time of the restart files.
The time in the sample data files is 0, but there's a time.json file with time 1458347400.0
That's apparently in seconds from some calendar start time.
The sample output time.json has the same time, and a date in the filename of 20110320_000000.
So time = 0 seems to be in Sep. 1964. Does anyone recognize that date?
The closest thing I can think of is the start of the UNIX clock, but that's Jan 1 1970.

It looks like aether will need a custom read_model_time to get the time.
Is there a "best" way to do that using fortran, other than reading it as a text file?

Does DART have code to translate a time from a system with an arbitrary "time=0"
to the gregorian calendar? I don't see that capability in advance_time.
The UNIX date command may be useful.

@jlaucar
Copy link
Contributor

jlaucar commented Oct 20, 2023 via email

@nancycollins
Copy link
Collaborator

Re: multiple files

The model does expect multiple restart files: {ions,neutrals}x #blocks x #members.

Sure, that makes sense. But if the model directly read and wrote multiple netcdf format files, then domains are a way filter can read data from more than one file to build the state. Since you have to make a translator program to stitch the blocks together, it isn't necessary to use multiple domains.

You are going to read in the data once, and write it once. For domains, you would look at interpolate and get close and see if there is any advantage to having multiple domains. If not and if you have multiple domains, each call needs to look up which domain a variable is in before it can operate, and all calls need to pass around a domain number besides a variable name/number/offset. That argues against using them without a good cause.

Re: time
If the file is text, i'd think just calling file_open() (a DART utility routine) and reading it as text and then using the time manager operators in the read_time routine would be good. I don't think you need to use any command line programs for it. Inside the DART time routines there are calls to set a time using Y/M/D/H/m/S for the base time, and then add and subtract with time structures to get the intervals.

@kdraeder
Copy link
Contributor Author

The base date is the hangup.
I've found some probably helpful code in Aether's include/constants.h
const int cREFYEAR = 1965;
const double cREFJULIAN = 2438762.0;
const double cJULIAN2000 = 2451545.0;
If the units are days, then the time=0 is ~6700 years ago.
There's no REF for gregorian.

And in planets.cpp:
nEarthDaysSince2000 = (time.get_julian_day() - cJULIAN2000)
which implies that "julian_day" is not the day of the year,
but is the number of days since the start of the calendar.
Elsewhere in the code it refers to "true" julian day as the number of days since the start of the calendar,
and plain julian day as the day of the year.

The difference between the 2 julian reference dates is 35.0 years,
so cREFJULIAN represents cREFYEAR.

If assume a few things, I can line up this calendar with the gregorian,
but it's probably better to ask Aaron. @johnsonbk can you do that soon, or should I do it?

@kdraeder
Copy link
Contributor Author

In the short term we only need to know the meaning of the times found in the time.json files
(and maybe in the output files. The restart files have time=0).
Using the number of seconds in the time.json file and the date stamp on an output file yields
$ echo 2011032000 -1458347400s | ./advance_time
196412312330
This is nominally the start of 1965, but maybe has some kind of time step adjustment to make Aether happy.
I've defined an aether_ref_date with this value in the model_mod,
so times read from time.json can be translated into gregorian (days,seconds).

@kdraeder
Copy link
Contributor Author

@hkershaw-brown @nancycollins
I'm at the stage where I need to decide how aether_to_dart will handle the ensemble.
I don't know all of the possibilities, so please point me in the right direction.
It seems that all members should be done in parallel, since they are completely independent.
Is there a template for a {model}_to_dart that's an MPI program?
Or a template script that runs multiple {model}_to_darts in parallel?
GITM doesn't seem to do either of those, even though it has the same subdomains.

Then it seems that each member should be able to read all its restart files
in parallel on different processors (hopefully on the same node)
and write to its section of the filter_input.nc file.

@hkershaw-brown
Copy link
Member

@hkershaw-brown @nancycollins I'm at the stage where I need to decide how aether_to_dart will handle the ensemble. I don't know all of the possibilities, so please point me in the right direction. It seems that all members should be done in parallel, since they are completely independent. Is there a template for a {model}_to_dart that's an MPI program? Or a template script that runs multiple {model}_to_darts in parallel? GITM doesn't seem to do either of those, even though it has the same subdomains.

Do the aether files have multiple ensemble members in them?
If not, aether_to_dart does not need to know about the ensemble.
read in aether files for a single ensemble member X, write out filter_inputX.nc

Do you have this already? Is so run ens_size of this program to go parallel. Or just do it in loop for now.

Then it seems that each member should be able to read all its restart files in parallel on different processors (hopefully on the same node) and write to its section of the filter_input.nc file.

Not quite sure what you are getting at here, but
filter_nml 'filter_input{output}_list.txt' is a list of filter_input{output}X.nc (X = 1, ens_size). filter takes care of reading and writing these.

@kdraeder
Copy link
Contributor Author

There are separate restarts for each member, so running ens_size of aether_to_dart seems to be the best.
I seem to remember that it's challenging to put each of those on a separate node.

Yes, filter handles the ensemble of filter_inputX files just fine.
I'm still at the stage of creating those files.
Aether has 2 restart files (ions and neutrals) for each subdomain of the globe.
So there are 2 x #subdomains files to be read for each member's filter_input. #subdomains ~ O(100).
There will be lots of processors available, so I'm hoping that all those files can be read in parallel.
It seems best if they can all be on the same node to minimize communication.
And having each member on a different node would speed things up.

@hkershaw-brown
Copy link
Member

I'd recommend writing the serial version first.

@nancycollins
Copy link
Collaborator

i think at this point i'd get the converter programs running and worry about optimizing i/o at a later time once you can run them and measure the performance. if it's a problem, things can be done with scripting changes once you have the running programs. it seems like getting the blocks into a netcdf file and getting variables from the netcdf file back into blocks is a good first goal.

@nancycollins
Copy link
Collaborator

I'd recommend writing the serial version first.

i don't think we've got any MPI-enabled converters in the system. you write a serial one, you run it serially N times to test and time it. if it takes a significant fraction of the run time, you change the batch scripting to run N copies of it, one for each ensemble member, in parallel. cheap 80-way parallelism without having to write an MPI program! :)

@kdraeder
Copy link
Contributor Author

Perfect! I didn't want to head off in the wrong direction.

@kdraeder
Copy link
Contributor Author

@nancycollins @hkershaw-brown
I'm trying to read a float field from a netcdf file using nc_get_variable.
The module procedure that is being chosen is nc_get_double_3d.
Then the call fails:
get values for Longitude: NetCDF: Start+count exceeds dimension bound.

Why is the double version chosen?
Is there a way to specify the nc_get_real_3d?

If you think that the error message is not related to the procedure choice,
I'll send the call and the argument definitions that I've tried.

Thanks.

@hkershaw-brown
Copy link
Member

temp is r8 that is why nc_get_double_3d is being called

real(r8), allocatable :: temp(:,:,:)

! Read 3D array and extract the longitudes of the non-halo data of this block.
call nc_get_variable(ncid, 'Longitudes', temp, routine)

The Start+count error means this size of what you are reading does not match the variable you've given the read call

@kdraeder
Copy link
Contributor Author

Thanks for digging into it! My fortran is rather rusty.

I changed temp to r4 and it still failed, so I explicitly set the index ranges, starts, etc.:

allocate(temp(1-nGhost:nxPerBlock+nGhost, &
              1-nGhost:nyPerBlock+nGhost, &
              1:nzPerBlock))

starts(1) = 1-nGhost
starts(2) = 1-nGhost
starts(3) = 1
ends(1)   = nxPerBlock+nGhost
ends(2)   = nyPerBlock+nGhost
ends(3)   = nzPerBlock
xcount = nxPerBlock + 2*nGhost
ycount = nyPerBlock + 2*nGhost
zcount = nzPerBlock
    call nc_get_variable(ncid, 'Longitude', &
         temp(starts(1):ends(1),starts(2):ends(2),starts(3):ends(3)), &
         routine, &
         nc_start=(/starts(1),starts(2),starts(3)/), &
         nc_count=(/xcount,ycount,zcount/))

but it's still failing in the same way.

@kdraeder
Copy link
Contributor Author

I think that I found the problem. The Longitude array has 44 altitudes,
but I dimensioned temp to be 40, the number the model uses with no vertical subdivisions,
which is used elsewhere.
So don't spend any more time on this bit.

@kdraeder
Copy link
Contributor Author

I was wrong. There's something more subtle going wrong (or so obvious I can't see it).
I've simplified the situation to the following.
Here's the dump from the file I'm reading

dimensions:
        x = 22 ;
        y = 22 ;
        z = 44 ;
variables:
   float Longitude(x, y, z) ;

Here are the relevant lines from the reading routine.

real(r4) :: simple (22,22,44)
ncid = open_block_file(dirname, 'grid', -1, nb-1, 'read', filename)
call nc_get_variable(ncid, 'Longitude', simple, routine)

open_block_file seems to work:
open_block_file Opening file /Users/raeder/DAI/Manhattan/models/aether_lon-lat/testdata1/restartOut.Sphere.1member/grid_g0000.nc for read
But the nc_get doesn't:

ERROR FROM:
 source : netcdf_utilities_mod.f90
 routine: nc_get_real_3d
 message: get values for Longitude: NetCDF: Start+count exceeds dimension bound
 message: ... get_grid_from_blocks

@nancycollins
Copy link
Collaborator

if reversing the order of the dimensions works, then i only know this from previous horrible experiences with it.

@kdraeder
Copy link
Contributor Author

@nancycollins @mgharamti
Yes, the order of the dimensions was the problem.
And ncdump -h -f [f,c] $file.nc lists the same order of dimensions for f and c.
Thanks for the tip!

@kdraeder
Copy link
Contributor Author

Using the number of seconds in the time.json file and the date stamp on an output file yields $ echo 2011032000 -1458347400s | ./advance_time 196412312330 This is nominally the start of 1965, but maybe has some kind of time step adjustment to make Aether happy.

On second thought, it's more likely that the date stamp in the file name is off by 30 minutes.
Something like that happens in CESM history files which are named according to
the time of the first data (or something else that gives them a misleading name).
Aaron says that the reference time is intended to be 1965-01-01 0 UTC, so I'll change to that
and the model may need to make a correction to the file names.

@kdraeder
Copy link
Contributor Author

@nancycollins @mgharamti Yes, the order of the dimensions was the problem. And ncdump -h -f [f,c] $file.nc lists the same order of dimensions for f and c. Thanks for the tip!

Since I'm reading in the data in (z,y,x) order, would it break any rules or efficiencies
to write it out that way to filter_input.nc (with the proper dimensioning)?
That would avoid having to do the transpose to (x,y,z) (and back again, in dart_to_aether).

@kdraeder
Copy link
Contributor Author

@nancycollins confirmed that I should not transpose the data; just read the block NetCDF file
with its C dimension order and write the data to filter_input.nc in the same order.

We also found that ncdump on Macs is broken; -f f and -f c show the same dimension order.
-F is unrelated to fortran and C.
Variables listed by ncdump as (x,y,z) must be read into a fortran array dimensioned (z,y,x).

@kdraeder
Copy link
Contributor Author

kdraeder commented Jan 24, 2024

@hkershaw-brown @nancycollins

I'm confused by the selection of QTYs that can be associated
with the Aether variables. I made some choices in the early rush
to get something working, but now I'd like to figure out if they were good choices.
I don't even know how to decide whether it matters (much).

My first problem is interpretting what the available QTY's represent.
I haven't found a key to decipher the parts of, e.g. QTY_DENSITY_ION_O2DP .
O2 could mean 'oxygen molecule', D could mean an extra or missing electron
from the D orbital, P could be similar, or mean 'positive'.
Or O could mean 'oxygen atom' with 2D or 2DP meaning something.
Comments in space_quantities_mod.f90 or in a docs.dart page would be helpful.

The Aether variable 'velocity_parallel_up\ (O+)' could potentially have these existing QTYs
associated with it:
QTY_VELOCITY_W
QTY_VELOCITY_W_ION
QTY_VERTICAL_VELOCITY
(There might be more; these were harvested from the list that came out of preprocessor for this case)
or maybe it should have a new QTY like the existing QTY_VELOCITY_VERTICAL_O2:
QTY_VELOCITY_PARALLEL_VERTICAL_OP
This last seems safest, since each ion has its own 2 vertical velocities.
But I don't know how they'll be used, so maybe a simple, generic QTY for all the ions is fine.

@johnsonbk
Maybe the underlying question is "what obs should we prepare for people to use?"
If there are obs of individual ion species, then we need different QTY(s) for each.
If not, is it OK to use the same QTY for multiple ion species?
Will the forward operator use contributions from all of those species?

I've attached the file issues_QTY.txt which gives more details.
I'll remove it from models/aether_lon-lat.

@kdraeder
Copy link
Contributor Author

kdraeder commented Feb 1, 2024

The existing aether_to_dart derives the names of the Aether restart files
from the member number read from standard input, either passed to it from a script or entered by the user.
One suggestion from the code review is to replace that.
(At the moment I've just added a prompt, so that user's will know what to do.)

One alternative is, one member at a time, for the script (or user) to link the restart files
to names that don't have the member number in them, so that aether_to_dart would always read
from the same file names. There are 2 x #blocks files in each member's restart set
(e.g. 400 for a 10x20 block domain), which isn't too hard to code into a script,
but is a pain for a user to link when they need a filter_input.nc file outside of the assimilation work flow.
Is there a better alternative?

@hkershaw-brown
Copy link
Member

Let's focus on getting the aether_to_dart dart_to_aether subroutines out of the model_mod. Once that is done we can sort out what the scripting.

@kdraeder
Copy link
Contributor Author

kdraeder commented Feb 7, 2024

Here's the issues_QTYs.txt file mentioned in the comment ~2024-1-25.
The contents have some of that comment, and additional context.

issue_QTYs.txt

@kdraeder
Copy link
Contributor Author

kdraeder commented Feb 8, 2024

@hkershaw-brown
This might be ready for a PR. @johnsonbk and @kdraeder have made the changes
recommended in code review 2 and they're pushed to NCAR/DART:aether.
Is there a list of things to do before issuing a PR?

@hkershaw-brown
Copy link
Member

@kdraeder you can go ahead and make the PR if you're ready.

Best practice would be to update aether with the main branch, before you make the PR. But we can just do that in the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants