This repository has been archived by the owner on Apr 18, 2018. It is now read-only.

New Package NF90IO: NetCDF writing for diagnostics package #15

Closed
wants to merge 6 commits

Conversation

jklymak

@jklymak jklymak commented Jul 22, 2017

Preface

Developers: first, feel free to ignore this if you are not ready for "real" pull requests yet. I thought that, in the worst case, you would have a full pull request to look at. I'm more than happy to resubmit this at a more appropriate time.

Description

This is a rudimentary implementation of NetCDF-4 parallel writing, so that each tile in an MPI run writes to the same file, as described at: https://www.unidata.ucar.edu/software/netcdf/netcdf-4/newdocs/netcdf-f90/

Currently it only supports writes using pkg/diagnostics. Set diag_nf90io=.TRUE. in data.diagnostics.

Modifies genmake2 to test for NF90. Also changes the .F.o: Makefile rule to pass INCLUDES to the compiler, to accommodate `use netcdf` statements in the *.f files.

See verification/testNF90io/ for a basic example.

See the README.rst in pkg/nf90io for a possible manual entry.

Todo:

  • Add a test to genmake2 that tests if hdf5 and netcdf have been compiled with parallel support: This may be beyond my expertise. I note that autoconf has such a test, and maybe their macro could be used.
  • Write an appropriate test for automated testing: ditto the above. I've tested this, but I'm not quite sure how one does parallel tests on Travis CI etc. See verification/testNF90io
  • Add to the documentation. (Not sure I should do this now given the state of the new docs)
  • Come up with a way to extend to other packages. How does mnc do the time alignment between the different packages?
  • Redo the test case to use diagnostics
  • Remove some debugging print statements.
  • Remove or add flag to toggle the statevar dump. Probably just add a flag and have it default to .FALSE.
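
The core idea above — each MPI tile writing its own hyperslab of a single shared file — hinges on computing per-tile offsets into the global grid. A minimal sketch of that index arithmetic (pure Python, with a hypothetical row-major tile-numbering convention for illustration; the actual package would derive the equivalent from MITgcm's tile variables such as myXGlobalLo/myYGlobalLo):

```python
def tile_offsets(tile_index, snx, sny, npx):
    """Return the 0-based (x, y) start indices of a tile's hyperslab
    in the global grid, assuming tiles of size snx-by-sny numbered
    row-major across an npx-wide process grid.
    (Hypothetical convention, for illustration only.)"""
    ix = tile_index % npx          # column of this tile in the process grid
    iy = tile_index // npx         # row of this tile
    return ix * snx, iy * sny

# Each rank then writes its local (sny, snx) array into the global
# (Ny, Nx) netCDF variable at [y0:y0+sny, x0:x0+snx].
```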

@jklymak
Author

jklymak commented Jul 23, 2017

Tested on a local mac (see instructions here)

Tested on conrad.navydsrc.hpc.mil using the cray compiler and module load cray-netcdf-hdf5parallel. This run used 64 cores just to test that multiple login nodes can write to the same file.

@mjlosch
Contributor

mjlosch commented Jul 24, 2017

Hi Jody,
this looks great. I personally think that the only package where this should be implemented (to save time and code) is the diagnostics package. The diagnostics package is intended to do all output, for every package; the generic output is there for historical reasons.
Within the diagnostics package it is possible to define different output streams, each with a different third dimension (which does not have to be the vertical) and a different "dumpFreq". Streams can be either averages or snapshots. Currently each stream has its own output file. The simplest solution would be to keep it that way, especially in order to easily differentiate between averages and snapshots.

PS: All code changes to this test repository will be lost. As far as I understand, this is just a testbed before the real migration starts.

@edoddridge
Contributor

This looks fabulous! I agree with @mjlosch - the diagnostics package is the natural target.

As Martin mentioned, we won't migrate any code changes in this repository when the real switchover happens - a new git repo will be generated from CVS at that point. This repo is a testbed for experimenting with git and setting up new things like the testing on Travis. Not preserving the changes gives us the freedom to make mistakes and not worry about it.

A quick comment on commit messages: ideally commit messages should have a short first line (<70 characters), followed by a blank line and a more detailed description if required. If the first line is longer then it gets truncated when viewing on GitHub and using git log in the terminal.

@jklymak
Author

jklymak commented Jul 24, 2017

@edoddridge , @mjlosch Glad it will be of use.

First, no problem with this all being wiped when you move to the real version of the git repository. I still find it easier to keep track of what changed here than with CVS, and I somewhat hope I can just merge the new master branch when it comes along.

I'll take another look at diagnostics. I've always found it quite hard to understand, and have never convinced myself to use it seriously. This has also likely been because I've historically (i.e. 10 years or so ago) had trouble getting netcdf to compile, and just gotten in the habit of using the mds files.

@christophernhill
Contributor

@jklymak this looks like a great start - thanks. Aside from the fact that the repo is a temporary one, I think it is good to merge this, even before thinking about diagnostics. Package diagnostics internally supports several different time frequencies, a few more sorts of fields (scalar, 1d-vector etc...), parameterized names, units etc... So that would be a bit more work.

Might be good to get something simpler, like what you have now, in first. Is there an existing (or new) simple test case that this could go well with? If so we can have a go at travis, genmake etc..

@jklymak
Author

jklymak commented Jul 24, 2017

@christophernhill

I'm happy to do the work to get it into diagnostics, so if you guys decide you prefer that it just reside in diagnostics, I'm happy w/ that. My quick look at diagnostics this AM makes it clear that it shouldn't be too hard conceptually. I'd forgotten that each filename has its own frequency, so I think it'll be pretty easy to lay out the netcdf files. In fact, I think it's clear that diagnostics was designed to be compatible w/ netcdf files. Coding-wise, I'll have to see how hard it is ;-)

If we merge the stuff I already did, but then also have an option to write from diagnostics, I probably need to add a flag in write_state.F to only write the simple file if the flag is .TRUE.; right now, all I check is that the package is being used.

A small test case for what I've already done is easy, and I'll minimize the one I have already so it's suitable.

Conversely, getting genmake2 to test for parallel netcdf sounds hard, and I may need help. autoconf has a macro for this check; I don't understand autoconf, but I imagine the macro could be used for genmake2.

@christophernhill
Contributor

@jklymak the autoconf macros link was helpful - thanks. It looks like it mostly uses the nc-config program to query stuff (see below for an example where netcdf wasn't built with parallel etc..). We can do something similar in genmake2 here.

-bash-4.1$ nc-config --all

This netCDF 4.4.1 has been built with the following features: 

  --cc        -> gcc
  --cflags    ->  -I/cm/shared/engaging/netcdf/4.4.1/include -I/cm/shared/engaging/hdf5/1.8.17//include
  --libs      -> 

  --has-c++   -> no
  --cxx       -> 
  --has-c++4  -> no
  --cxx4      -> 

  --fc        -> 
  --fflags    -> 
  --flibs     -> 
  --has-f90   -> no
  --has-f03   -> no

  --has-dap   -> yes
  --has-nc2   -> yes
  --has-nc4   -> yes
  --has-hdf5  -> yes
  --has-hdf4  -> no
  --has-logging-> no
  --has-pnetcdf-> no
  --has-szlib -> 

  --prefix    -> /cm/shared/engaging/netcdf/4.4.1
  --includedir-> /cm/shared/engaging/netcdf/4.4.1/include
  --version   -> netCDF 4.4.1

@jklymak
Author

jklymak commented Jul 24, 2017

@christophernhill Hmmmm. Except /usr/local/bin/nc-config --has-pnetcdf yields no for me. So maybe autoconf's parallel test is out of date?

When compiling netcdf, there is a test: nc_test4_run_par_test, but it takes 27 s to run.

However, maybe a check of hdf5 is all that's needed - netcdf checks that hdf5 is parallel when it compiles (ummm, I think), and /usr/local/bin/h5pcc -showconfig yields

Features:
---------
                  Parallel HDF5: yes
             High-level library: yes
                   Threadsafety: no

(see https://www.gnu.org/software/autoconf-archive/ax_lib_hdf5.html#ax_lib_hdf5)
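
If checking HDF5 really is sufficient, the genmake2 test could boil down to grepping the `h5pcc -showconfig` output above for the parallel flag. A minimal sketch of that parse (pure Python for clarity; genmake2 itself would do the equivalent in shell with grep):

```python
def hdf5_is_parallel(showconfig_output):
    """Return True if `h5pcc -showconfig` output reports
    'Parallel HDF5: yes' among its feature lines."""
    for line in showconfig_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Parallel HDF5":
            return value.strip().lower() == "yes"
    return False
```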

@jklymak
Author

jklymak commented Jul 24, 2017

Note that I'm not using the parallel-netcdf package, which is apparently a separate thing. I'm just using vanilla NF90.

@jklymak
Author

jklymak commented Jul 24, 2017

@edoddridge , @mjlosch, @christophernhill

Looked over pkg/diagnostics again.

It seems a shame that (1, Ny, Nx) fields can't be mixed with (Nr, Ny, Nx) fields. I think if I write this, I may allow two filenames to be the same so long as the absolute frequency is the same, so that 2-D variables and 3-D state variables could be saved in the same file. So one could have variables in the netcdf file that are (1, Ny, Nx), (Nr, Ny, Nx), or (1) for their non-unlimited dimensions.

I'll be honest: I think the (nlevel, Ny, Nx) diagnostics, where nlevel is a user-chosen subset of levels, were a bit of over-design. I can see how having a few slices at different depths would be great, but I am not convinced that couldn't be just as well served by nlevel separate diagnostics that the user defines. It certainly doesn't fit well into a netcdf paradigm. My tendency would be to ignore such variables, but I'm willing to be overruled on that if I'm told the community makes massive use of them.

So if I had a data.diagnostics:

&diagnostics_list
 frequency(1)  = 30.0,
 fields(1:3,1) = 'UVEL    ',
                 'VVEL    ',
                 'WVEL    ',
 filename(1)   = 'statevars',
# 2D vars
 frequency(2)  = 30.0,
 fields(1:2,2) = 'ETAN    ',
                 'PHIBOT  ',
 filename(2)   = 'statevars',
# some slices
 frequency(3)  = 300.0,
 levels(1:5,3) = 1.,3.,5.,7.,9.,
 fields(1:2,3) = 'VVEL    ',
                 'UVEL    ',
 filename(3)   = 'slices',
/

Then the file statevars.nc would have all the grid data and the UVEL, VVEL, and WVEL velocities in it, as well as ETAN and PHIBOT, for each timestep. If frequency(2) were different from frequency(1), then an error would be thrown, since the user had chosen the same filename for both diagnostics lists.

The third diagnostics list would refuse to output as an nf90io diagnostic because there is no clean way to save it in a netcdf file.

What other issues? I guess if two diagnostics lists reference the same diagnostic field, e.g. lists 1 and 2 both reference UVEL, and one has frequency(1)=3600 and the other has frequency(2)=-3600. I'd propose to store both of these in the same file, but maybe that is getting too fancy. I'd sure hate to have names like UVELD001, UVELD002, etc. to differentiate which diagnostics list the field came from. My suggestion to a user who wanted two different versions of UVEL would be to save them as two different netcdf files.
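
The rule proposed above — shared filenames allowed only when the absolute output frequencies match — can be sketched as a small validation pass over the parsed diagnostics lists. A minimal sketch in Python (hypothetical dict layout for each list; the real check would live in the diagnostics/nf90io initialization, and abs() is used so that averaged (+freq) and snapshot (-freq) lists could share a file per the discussion):

```python
def check_shared_filenames(diag_lists):
    """diag_lists: list of dicts with 'filename' and 'frequency' keys.
    Raise ValueError if two lists share a filename but differ in
    absolute output frequency (the proposed merge rule)."""
    seen = {}
    for i, d in enumerate(diag_lists, start=1):
        freq = abs(d["frequency"])
        fname = d["filename"]
        if fname in seen and seen[fname] != freq:
            raise ValueError(
                "diagnostics list %d reuses filename %r with a "
                "different |frequency|" % (i, fname))
        seen.setdefault(fname, freq)
```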

@christophernhill
Contributor

@jklymak

My comments are below; it would be good to get comments from at least @jm-c too, though. I am not a big user of netCDF, so I am certainly missing some considerations.

  • mixing 2D and 3D sounds OK.

  • definitely UVEL with frequency(1)=3600 and frequency(2)=-3600 is a pattern that is used. Supporting these by requiring different netCDF file names seems fine to me.

  • I am not clear why sub-sampled level slices (or lateral slices) are a problem. For a first pass, leaving it out for a later contribution is fine. However, it looks to me like it could have a grid with gaps, e.g. z=rc([1 3 5 7 9]), delr=delr([1 3 5 7 9]) etc. for your example? I think netCDF would accept that. Plotting and analysis tools would have to do the right thing, but that is always true.

Finally: we were thinking we might try and create an experimental/pr-15 branch for this and then accept pull requests against that (e.g. we could do something like

 $ git clone git@github.com:altMITgcm/MITgcm66h.git
 $ cd MITgcm66h
 $ git checkout -b experimental/pr-15
 $ git pull https://github.com/jklymak/MITgcm66h.git addNF90IO
 $ git push

). This would be an experiment in how to absorb and help coordinate work-in-progress stuff. The thinking was that others could then also contribute or modify, in a slightly coordinated fashion. If it works well, it would get merged/rebased into master when ready. Otherwise, if the code turned out to be too hard, or flawed in some way, we would all move on! We are not sure if this makes sense, so we are waiting to see what the git literati have to say - this may not be a strictly orthodox approach. What do you think? The more git-like way is probably to delegate development, have people make pull requests to @jklymak, and then wait for a PR here until collation, testing, etc. are complete at https://github.com/jklymak/MITgcm66h/tree/addNF90IO. But that seems like it could prove to be a bit of an unwieldy and unrealistic way for folks to chip in.

@jklymak
Author

jklymak commented Jul 25, 2017

@christophernhill

My experience with different projects on GitHub is pretty minimal; again, mostly just matplotlib.

What matplotlib does is "assign" a reviewer, who provides feedback on the pull request until it is "mergeable" with the main code, and then it is merged by the reviewer. If the reviewer needs help from other maintainers, they flag them in the PR comments. I think the reviewers are actually self-assigned, so it's up to the contributor to attract a reviewer in their comments.

Note that one can go line by line and add comments to the PR using the GitHub GUI. It's super useful, particularly for new contributors, to see how the existing maintainers would handle the same code.

If contributors want to keep a PR open but aren't done with it, they can always flag it as WIP (Work in Progress), and then the maintainers know not to worry about merging it.

I'm not sure what folks do for two contributors working on the same PR. In that case, I expect there would be a branch "expt-feature" that pull requests were made against, and then the branch would be rebased when it's ready to go.

@christophernhill christophernhill changed the title New Package NF90IO: Rudimentary Netcdf90 implementation for writing parallel netcdf files. WIP: New Package NF90IO: Rudimentary Netcdf90 implementation for writing parallel netcdf files. Jul 25, 2017
@christophernhill
Contributor

@jklymak that's cool. I edited
New Package NF90IO: Rudimentary Netcdf90 implementation for writing parallel netcdf files.
to
WIP: New Package NF90IO: Rudimentary Netcdf90 implementation for writing parallel netcdf files.
I hope that is OK. I want to see if it has any weird side-effects for you too!

I think it's useful to have the pieces in a PR as they are; for example, we can try running genmake2 and testreport against it. But it looks like things might change a bit, so maybe this would be a WIP: type thing - albeit in our play repo for now!

@jklymak
Author

jklymak commented Jul 25, 2017

@christophernhill

This is fine. OTOH, the bulk of the pull request is complete. What's needed is items 1 and 2 in my TODO above: a proper Travis test and a proper genmake2 test. But those are conveniences for end users rather than fundamental, so the pull request could be merged (into the play repository). Whether you as a reviewer were willing to accept it w/o the conveniences depends on how strict you want to be ;-)

What I mean here is that new features and code need not be user-ready when they are accepted into the main repository. They just need to work, not break other things, and go in a direction that the maintainers are going to be happy with. Future pull requests can add documentation and new user-facing features. If someone else trips across the package and notices there is no documentation, or they want the interface tweaked, they can open an "Issue". Of course it's in my interest to provide documentation etc. so that users make use of my work. I think the only possible drawback of this is having a lot of orphaned code in your code base. Someone would have to go through every once in a while with some pruning shears and tidy up.

The test is in verification/testNF90io (branch = addNF90io). No idea how to add it to any of your automated tests, though I could spend time figuring it out. @edoddridge, did you have an example I should look at for adding a test?

@jm-c
Contributor

jm-c commented Jul 25, 2017 via email

@christophernhill
Contributor

@jklymak what you say is correct. We are still figuring out the right workflow. At present it looks like doing things in diagnostics is where this will end up, so this PR is somewhat on track to be redone/iterated on quite a bit - hence WIP:? We do have a plan to use it for genmake2 and travis/testreport testing of PRs while it is in the WIP: PR state. So it is definitely useful as a test of workflows etc., and we do want to accept things!

BTW - some small things in test,

  • in runs/ you have a .DS_store file
  • runs/ is typically called run/
  • we think we are going to add run/ and build/ to .gitignore, so the python
    and README in runs/ would be better in input/, I think

@jklymak
Author

jklymak commented Jul 25, 2017

@christophernhill

That organization structure makes sense to me. I didn't mean to upload anything in runs/ anyway.

One could also argue about whether it's a good idea for gendata.py to put things in run/. I think it is a good idea, so that the model setup files are saved with the model output. But of course other folks may have other organization strategies, and I'd be happy making sure that this test is consistent.

You could probably add .DS_store to the global .gitignore for us silly mac folk ;-)

@edoddridge
Contributor

I second @jm-c's point about keeping the PR thematically clean - let's deal with making parallel netCDF files, and think about reusing filenames later.

@jklymak - regarding tests, we're still at the "get tests to run" stage, and will progress to the "make tests tell us when they fail" stage at some point in the future.

In the meantime, you might find this interesting (CAUTION, this isn't something that has been discussed, so I wouldn't start investing energy in developing tests using this framework). It's a set of python based tests for a very simple isopycnal model. Essentially, the test functions run different configurations of the model, compare the output with stored output and fail if the difference is above a chosen threshold. Using the python unit testing infrastructure means it's easy to tell Travis-CI (or users) when tests fail.

@christophernhill
Contributor

@edoddridge and @jklymak there is an issue on testing here #13.

@jklymak
Author

jklymak commented Jul 26, 2017

OK, so almost done - still WIP. diagnostics now outputs global netcdf files. It respects nLevOutp by creating a new netcdf dimension k_level if nLevOutp is not 1 or Nr.
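
The nLevOutp handling described here amounts to picking which vertical dimension each diagnostics list hangs off. A sketch of that selection logic (illustrative names only; the package works with the NF90 dimension IDs directly):

```python
def vertical_dim_name(n_lev_outp, nr):
    """Choose the netCDF vertical dimension for a diagnostics list:
    none for 2-D fields, the full depth axis when all Nr levels are
    output, and a separate k_level axis otherwise."""
    if n_lev_outp == 1:
        return None          # 2-D field: no vertical dimension
    if n_lev_outp == nr:
        return "Z"           # full-depth 3-D field
    return "k_level"         # sub-sampled levels get their own axis
```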

I didn't monkey with whether different diagnostic lists can write to the same file.

This version does not clobber an old file, and there is (currently) no flag to control this behavior. It appends new records if the file exists. If the file exists and is obviously incompatible in dimensions with the new output, NF90 will throw an error. I suppose this error might be mysterious to users, and some effort could be put into checking for file compatibility in the program. The reason for the default is that if someone restarts from a pickup, they can write to the same file that the rest of the data was written to. Happy to discuss/rejigger any or all of this behavior if folks prefer something else.
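
The "mysterious NF90 error" on appending to an incompatible file could be pre-empted by comparing the dimension sizes the run expects against what the existing file declares. A minimal sketch of such a check (plain dicts stand in for the NF90 inquiry calls; names are illustrative):

```python
def check_file_compatible(file_dims, expected_dims):
    """file_dims / expected_dims: mappings of dimension name -> size
    (the unlimited record dimension excluded).  Raise a readable error
    instead of letting the NF90 define/write calls fail obscurely."""
    for name, size in expected_dims.items():
        have = file_dims.get(name)
        if have is None:
            raise ValueError("existing file lacks dimension %r" % name)
        if have != size:
            raise ValueError("dimension %r is %d in file, expected %d"
                             % (name, have, size))
```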

Todo:

  • Redo the test case to use diagnostics
  • Remove some debugging print statements.
  • Add some proper debugging output.
  • Remove or add flag to toggle the statevar dump. Probably just add a flag and have it default to .FALSE.

@jklymak
Author

jklymak commented Jul 26, 2017

@edoddridge , @christophernhill , @mjlosch

Design question: if we just make nf90io available to diagnostics, then we don't need a separate data.nf90io - a flag in data.diagnostics is all that's needed. OTOH, if I do away with data.nf90io, then we can't dump the raw state variables (from write_state.F). Should we keep the write_state.F capability, or remove it so that there is one fewer data file, particularly if the hope is to deprecate non-diagnostics I/O?

@jklymak jklymak force-pushed the addNF90IO branch 2 times, most recently from bebd176 to a76bd95 on July 26, 2017 at 18:41
@jklymak jklymak changed the title WIP: New Package NF90IO: Rudimentary Netcdf90 implementation for writing parallel netcdf files. New Package NF90IO: NetCDF writing for diagnostics package Jul 26, 2017
@jklymak
Author

jklymak commented Jul 26, 2017

OK, this is basically feature complete, I think.

I took out the write_state.F call, so it only outputs with pkg/diagnostics.

I screwed up my .git repo and somehow included one of @efiring's commits. It's a very small commit, so I doubt it matters for evaluating what I did here.

Please see modified comments above for remaining issues.

@jklymak
Author

jklymak commented Jul 27, 2017

@christophernhill , @mjlosch, @jm-c

Added a test to genmake2 to see if NF90 files will compile. To do this, instead of using the cat approach genmake2 uses elsewhere to create source files, I simply created a subdirectory tools/maketests, put the file there (tools/maketests/f90tst_parallel.f90), and reference that file from genmake2. That seems cleaner to me than having a bunch of source code sitting around inside genmake2.

I also modified how .F.o: works from:

.F.o:
	\$(FC) \$(FFLAGS) \$(FOPTIM) -c \$<

to

.F.o:
	\$(FC) \$(FFLAGS) \$(INCLUDES) \$(FOPTIM) -c \$<

This allows `use modulename` to be supported.

The only small annoyance I still have is that there are some f90mkdepend warnings:

WARNING: f90mkdepend: no source file found for module netcdf
WARNING: f90mkdepend: no source file found for module netcdf

but they don't seem to hurt anything, so...

@christophernhill
Contributor

christophernhill commented Jul 27, 2017

@jklymak I will try and test this on some local systems today. We still don't have all the bits to test automatically on cluster platforms and with various licensed software (Intel compilers, PGI, TAF). We have to do some of those bits by hand for now, so there is no nice integration. We are looking at whether https://enterprise.travis-ci.com/Travis.CI.Enterprise.Information.Sheet.pdf can help with integrating this properly so PR submitters can get feedback.

I took a look at the new tools/maketests/ stuff; it looks OK to me and makes sense.

Note - there are a couple of things that could still be evolved, I suspect, though not necessarily in this iteration.

  • it would be great to use standard names for coordinates. This is a little subtle since we allow
    X, Y and Z to represent different coordinates depending on the configuration. MNC does not do
    a great job of this. It may make sense to start with MNC names though, since possibly
    some tools out there recognize those?

  • the approach will do weird things with cube-sphere/lat-lon-cap topologies. They will
    go in one file, but the notion of two axes in the horizontal does not make exact
    sense - there are actually 3, used in different combinations! For now the
    parallel files will probably need to be redistributed a bit in post-processing etc.

@jklymak
Author

jklymak commented Jul 27, 2017

@rabernat OK, works now ;-) I had an error in the diagnostics if they have multiple levels. I think it's fixed now, and I changed my test to check that this works. You can check out the output here:

https://www.dropbox.com/s/a0x4smpzer6s7nv/DiagOcnLAYERS.nc?dl=0

There is an error using python xarray with the time field, which gives an overflow. If I do
data = xr.open_dataset('input/DiagOcnLAYERS.nc', decode_times=False) then there is no problem. Not sure if this is a problem on my end or xarrays'. I note that the timestart and timeend variables are fine, so its just an xarray issue with the contents of time when it tries to make them into timedelta64[ns]

@rabernat
Contributor

it's just an xarray issue with the contents of time when it tries to make them into timedelta64[ns]

This limitation is because xarray uses a pandas.DatetimeIndex to represent times, which forces the time units to be nanoseconds. This is unlikely to change (see pandas-dev/pandas#7307). This is a big problem for the climate community in general (e.g. pydata/xarray#789) which people are trying to decide how to fix.
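
The overflow follows directly from pandas storing times as signed 64-bit nanosecond counts, which caps the representable span at roughly ±292 years around the epoch. The arithmetic:

```python
# A signed 64-bit integer holds at most 2**63 - 1 nanoseconds.
max_ns = 2**63 - 1
seconds = max_ns / 1e9
years = seconds / (365.25 * 24 * 3600)   # ~292.3 years either side of 1970
```

So any model time axis spanning more than a few centuries (easy to hit with long spin-ups, or with time units counted from year 1) overflows the nanosecond representation.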

@rabernat
Contributor

Your LAYERS netcdf file looks great! 👍

I am worried it will break with the newer LAYERS_THERMODYNAMICS output, which is located at different vertical coordinates (e.g. k_level +/- 0.5). But I don't think there is any verification test for that yet, so it is not your problem here.

@jklymak
Author

jklymak commented Jul 27, 2017

@rabernat OK about the calendar issue. Given that it's a known problem, we can let the xarray people deal with it. It seems we could make the netcdf files a bit better if we used units like "seconds from year x", but that would still be a problem for really long runs.

WRT LAYERS_THERMODYNAMICS: if that output goes to a different file than the other diagnostic file with another k_level, then there is only a problem on the user's side when they have two files with the same coordinate name. If you are saying that the diagnostic is somehow written with some variables on N levels and other variables on N+1, then that would take some work. I'm assuming all the diagnostics in a list have the same number of levels.


C-----------------------------------------------------------------------
C
C Constants that can be set in data.nf90io
Contributor


If there is not going to be a data.nf90io file, this comment will need changing.

Author


Ah, OK - a lot of these files were obsolete. I moved them into a directory .oldcode/ since I don't want to have to reproduce this stuff later. genmake2 seems smart enough to ignore code in subdirectories, so I hope this is a good way to stash old code. OTOH, if there is another way, let me know.


## Options

Right now the only option is set in `data.nf90io` and is `NF90ioFileName`, the filename for the netcdf file that is to be written.
Contributor


Another reference to data.nf90io.

saltStepping=.FALSE.,
# minimum cell fraction. This reduces steppiness.
hFacMin=0.1,
# implicSurfPress=0.5,
Contributor


Would it be worth removing some of these unused options?

@edoddridge
Contributor

@rabernat the other tests just haven't been implemented yet - see #13.

Modified genmake2 to check for NF90
--------------------------------------------------

Check for NF90io ability.  Added a new subdirectory in `tools/maketests` so that I could just compile a file from there rather than having a long file inside the `genmake2` script.

Cleaned up verification/testNF90io/input/gendata
--------------------------------------------------

As per @rabernat cleaned this script up to minimize python dependencies.

Fixed error with k_levels
---------------------------
Didn’t have a proper check for the existence of k_level.  This new code fixes that.

fixed testNF90io to test multi level
--------------------------------------------------

Stashed old files and updated README.rst
--------------------------------------------------

Some of the files are "old", so I put them into `pkg/nf90io/.oldfiles` for posterity, but they shouldn't link.

Expanded a bit in the README.rst (changed it to an RST)

Updated testNF90io/README.rst
--------------------------------------------------

Added NF90IO to diagnostics package

Modified README.rst


Changed README.rst  to be more complete.

This should be the manual entry eventually when the new manual comes online.

Added checkit.py
@christophernhill
Contributor

@jklymak I haven't forgotten about this. I made a little container for build testing, to try to get something into Travis. I noticed two things:

  1. In nf90io_init_file.F, would it make sense to add a non-MPI alternative to

        mode_flag = IOR(mode_flag, nf90_mpiio)
        err = nf90_create(ncfilename, mode_flag, ncid, comm =
     $       MPI_COMM_WORLD,info = MPI_INFO_NULL)
    

    which would be wrapped in something like

     IF (usingMPI) THEN
      #ifdef ALLOW_USE_MPI
      ..current form...
     #endif
     ELSE
       ...non MPI Init form...
       (i.e. no IOR() for MPI and no named args - I think?)
      ENDIF
    

    I think that would allow code to be used in non-MPI mode too, which is nicer if possible.

  2. In genmake2 it looks like it would be more precise to use $F90C instead of $FC for f90tst_parallel.f90, e.g.

      diff --git a/tools/genmake2 b/tools/genmake2
      index 005c19f..9b026b8 100755
      --- a/tools/genmake2
      +++ b/tools/genmake2
      @@ -1097,5 +1097,5 @@ check_nf90io_libs()  {
           echo "<<<  f90tst_parallel.f90 ===" >> f90tst_parallel.log
      -    echo "$FC $FFLAGS $FOPTIM -c f90tst_parallel.f90 \ " >> f90tst_parallel.log
      +    echo "$F90C $FFLAGS $FOPTIM -c f90tst_parallel.f90 \ " >> f90tst_parallel.log
           echo "  &&  $LINK $FFLAGS $FOPTIM -o f90tst_parallel.o $LIBS" >> f90tst_parallel.log
      -    $FC $FFLAGS $FOPTIM  $INCLUDES -c ${TOOLSDIR}/maketests/f90tst_parallel.f90 >>      f90tst_parallel.log 2>&1  \
      +    $F90C $FFLAGS $FOPTIM  $INCLUDES -c ${TOOLSDIR}/maketests/f90tst_parallel.f90 >>      f90tst_parallel.log 2>&1  \
              &&  $LINK $FFLAGS $FOPTIM -o  f90tst_parallel f90tst_parallel.o $LIBS >> f90tst_parallel.log 2>&1
      @@ -1110,5 +1110,5 @@ check_nf90io_libs()  {
              echo "==> try again with added '-lnetcdf'" > f90tst_parallel.log
      -       echo "$FC $FFLAGS $FOPTIM -c f90tst_parallel.f90 \ " >> f90tst_parallel.log
      +       echo "$F90C $FFLAGS $FOPTIM -c f90tst_parallel.f90 \ " >> f90tst_parallel.log
              echo "  &&  $LINK $FFLAGS $FOPTIM -o f90tst_parallel.o $LIBS -lnetcdf" >>      f90tst_parallel.log
      -       $FC $FFLAGS $FOPTIM  $INCLUDES -c  ${TOOLSDIR}/maketests/f90tst_parallel.f90 >>      f90tst_parallel.log 2>&1  \
      +       $F90C $FFLAGS $FOPTIM  $INCLUDES -c  ${TOOLSDIR}/maketests/f90tst_parallel.f90 >>      f90tst_parallel.log 2>&1  \
                  &&  $LINK $FFLAGS $FOPTIM -o  f90tst_parallel f90tst_parallel.o $LIBS -lnetcdf >>      f90tst_parallel.log 2>&1
      @@ -1125,5 +1125,5 @@ check_nf90io_libs()  {
           
      -           echo "$FC $FFLAGS $FOPTIM -c f90tst_parallel.f90 \ " >> f90tst_parallel.log
      +           echo "$F90C $FFLAGS $FOPTIM -c f90tst_parallel.f90 \ " >> f90tst_parallel.log
                  echo "  &&  $LINK $FFLAGS $FOPTIM -o f90tst_parallel.o $LIBS -lnetcdf" >>      f90tst_parallel.log
      -           $FC $FFLAGS $FOPTIM  $INCLUDES -c  ${TOOLSDIR}/maketests/f90tst_parallel.f90 >>      f90tst_parallel.log 2>&1  \
      +           $F90C $FFLAGS $FOPTIM  $INCLUDES -c  ${TOOLSDIR}/maketests/f90tst_parallel.f90      >> f90tst_parallel.log 2>&1  \
                      &&  $LINK $FFLAGS $FOPTIM -o f90tst_parallel f90tst_parallel.o $LIBS -lnetcdf -lnetcdff >> f90tst_parallel.log 2>&1
    

@jklymak
Author

jklymak commented Aug 8, 2017

@christophernhill

  1. That seems good to me. I'll need to play around with it a bit to see what the non-MPI options are, and test them. I didn't even think of that use case.

  2. OK, I tried making all the nf90io files .F90, but they didn't compile, likely because I didn't have my FC90 flags set correctly. However, my setup isn't unusual, and I've not seen many build_options files that have FC90 set up at all. So I kind of assumed MITgcm never explicitly calls FC90 on a regular basis, and that I'd better not do so here. Of course the test could, but again the user may not have FC90 set. Maybe I'm completely wrong about this and just have a bad build_options file.

@christophernhill
Contributor

@jklymak - great. For your FC90 item, I think we may need to add

if test "x$FC90" = x ; then
    FC90=${FC}
fi

somewhere in genmake2 startup, then it will be set for future. I will check with @jm-c and @jahn tomorrow.

@christophernhill
Contributor

@jm-c this did not make the thread?

Hi Chris,
In many optfiles (including all that are tested except g77),
F90C is set. And it is used in few experiments.
So I don't know which one Jody is using.
Jean-Michel


@christophernhill
Contributor

@jm-c after

. ${OPTFILE}

bits we can do

if test "x$FC90" = x ; then
    FC90=${FC}
fi

I think that would deal with the optfiles that lack an explicit FC90= (56 of the ~101 already have one).

@jklymak
Author

jklymak commented Aug 8, 2017

@christophernhill @jm-c

I made the files plain .F Fortran files, but they definitely need an F90-compatible compiler (mostly because I call use netcdf to get the NF90 interface information).

Are there many Fortran compilers out there that don't allow F90 syntax, or that need special F90 flags? It looks like right now only pkg/atm_phys, OAD_support and verification/flt_example/extra/cvfloat.F90 use the F90 suffix. My extremely limited compiler experience (ifort, gfortran, Cray compiler) says that modern Fortran compilers don't care which flavor is being used. I'm suggesting the distinction may no longer be necessary, but I don't have any idea what your user base may be constrained by.

I'll work under the assumption that you want these to be F90 files and test that setup and let you know of any problems.

@jklymak
Author

jklymak commented Aug 8, 2017

@christophernhill

OK, number 1 above, allowing for non-MPI nf90io, was a bit more work than I thought. *.nc files get parallel access enabled when opened, and variables need to be put into par_access mode before they are written. So I wrote two utility functions, now in nf90io_utils.F, that wrap those calls with the appropriate usingMPI checks. All seems to work in both modes now.

genmake2 also needed a new check for when MPI isn't enabled, because the existing check used MPI... I removed check_nf90io_libs() and replaced it with check_nf90io_nompi_libs() and check_nf90io_mpi_libs(). I was a bit ignorant about how to check for MPI inside genmake2. I did the following, but feel free to suggest a better way:

printf "  Can we create NF90-enabled binaries...  "
if  [[ $DEFINES ==  *"-DALLOW_USE_MPI"* ]]; then
    printf "with mpi...  "
    check_nf90io_mpi_libs
else
    printf "without mpi...  "
    check_nf90io_nompi_libs
fi
if test "x$HAVE_NF90IO" != x ; then
    DEFINES="$DEFINES -DHAVE_NF90IO"
    echo "yes"
else
    echo "no"
fi
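As a side note, the `[[ ... ]]` substring test above is a bashism; since genmake2 typically runs under plain /bin/sh, a more portable sketch of the same check could use a POSIX `case` statement. (This is an illustrative sketch, not part of the PR; the example `DEFINES` value is made up.)

```shell
# Hedged sketch: substring-match DEFINES with POSIX `case`
# instead of the bash-only [[ ... ]] construct.
DEFINES="-DSOME_FLAG -DALLOW_USE_MPI"   # example value for illustration only
case "$DEFINES" in
    *-DALLOW_USE_MPI*) NF90IO_WITH_MPI=yes ;;
    *)                 NF90IO_WITH_MPI=no  ;;
esac
echo "NF90IO_WITH_MPI=$NF90IO_WITH_MPI"
```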

I'll work on changing over to F90 files at another juncture, but I did want you and @jm-c to weigh in on whether the distinction between ye-olde Fortran and F90 is really necessary any more.

@jklymak
Author

jklymak commented Aug 8, 2017

@christophernhill @jm-c

WRT 2: genmake2 pre-processes *.F90 to *.fr9. Unfortunately gfortran, at least, will not compile *.fr9 files into *.o. It seems to want only a limited set of suffixes (.f90, .for). More to the point, it assumes the files are "free-form" if you use .f90 and "fixed-form" if you use .for.

So I am super-confused about what the right thing to do here is.

I wrote my routines in "fixed-form", so cpp->*.for works great, and hence using *.F as the suffix works well. This was completely out of ignorance that there was such a thing as "free-form".

What does a *.F90 suffix mean to genmake2? I'd assumed it just meant using a different version of the compiler, but it appears to also mean the source files will be "free-form" instead of "fixed-form". Is that the intent, or is that just a gfortran idiosyncrasy?
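For reference, gfortran can be pushed past the unknown-suffix problem with its `-x` language option, which tells it to treat any input file as Fortran source regardless of suffix. A hedged sketch (not part of this PR, and assuming gfortran is available):

```shell
# Hedged sketch: gfortran does not recognize the .fr9 suffix on its own,
# but `-x f95` forces it to treat the input as Fortran source, and
# -ffree-form selects free-form layout explicitly.
cat > demo.fr9 <<'EOF'
program demo
  print *, 'hello from a .fr9 file'
end program demo
EOF
if command -v gfortran >/dev/null 2>&1 ; then
    gfortran -x f95 -ffree-form -c demo.fr9 -o demo.o && echo "compiled"
else
    echo "gfortran not found; skipping"
fi
```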

I'm not even sure my files are really F90 files: they have use netcdf in them, which I understand is a more modern construct, and they now contain functions (is that modern?).

Sorry for my ignorance here; my Fortran knowledge is quite thin.

@christophernhill
Contributor

@jklymak sticking with .F is fine. The F90C fix isn't something
that should require any other changes. It may be useful going forward, and
in the nf90 test piece of genmake2 on some platform/compiler combinations - once
we get there!

@jklymak
Author

jklymak commented Aug 9, 2017

OK, I think sticking with fixed format is the safest way to make sure the most people can compile this.

I'll open an "issue" re: gfortran and the .fr9 suffix when you have the GitHub page running. There is very little free-format code in the code base, but as set up now, gfortran may not be able to compile it (or I'm doing something wrong).

@jklymak
Author

jklymak commented Aug 14, 2017

Grr. OK, I was trying this out on an IBM machine (haise.navo.hpc.mil): the genmake2 test passed, but the application couldn't open netcdf files in parallel mode. It turns out HDF5 was present but not compiled with parallel support, so the parallel test compiled fine but returned the same failure to open in parallel mode when I tried to run it.

The fundamental problem here for genmake2 is that in order to test if a parallel program can execute, you need to be able to run it in parallel. But often login nodes don't let you run test programs in parallel.

Some installs have hdf5/lib/libhdf5.settings available, and that will tell us if hdf5 has been compiled with MPI. But it seems fragile to depend on that - for instance, my macOS brew-installed hdf5 doesn't have that file anywhere.

So, this remains a todo. We need to find a robust, non-runtime way to test whether hdf5 has been compiled with MPI support.

@jklymak
Author

jklymak commented Aug 15, 2017

https://github.com/Novartis/hdf5r/blob/master/inst/m4/ax_lib_hdf5.m4

Check to see if h5pcc exists; if it does, then parallel hdf5 is installed. This should be done before checking netcdf. We can also use nc-config to see if netcdf was compiled with hdf5.

However, some systems seem to have Fortran netcdf compiled separately from C netcdf. That makes checking and compatibility harder. One hopes such systems are in decline.
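A compile-time detection sketch along those lines might look like the following. (Hedged: `nc-config --has-parallel` is assumed here and is not present in all nc-config versions, so its absence should be treated as "unknown" rather than "no".)

```shell
# Hedged sketch of a non-runtime parallel-I/O check: prefer tool presence
# and nc-config metadata over executing an MPI test program, which often
# cannot be run on a login node.
detect_parallel_io() {
    # h5pcc is normally installed only when hdf5 was built with MPI support
    if command -v h5pcc >/dev/null 2>&1 ; then
        echo yes ; return
    fi
    # nc-config --has-parallel is an assumption; not all versions have it
    if command -v nc-config >/dev/null 2>&1 &&
       test "`nc-config --has-parallel 2>/dev/null`" = yes ; then
        echo yes ; return
    fi
    echo no
}
echo "parallel hdf5/netcdf: `detect_parallel_io`"
```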

@jklymak
Author

jklymak commented Dec 21, 2017

Pinging about this PR. Is the other branch accepting PRs now? I think I can figure out how to move this over.

I think this still needs some genmake2 help, in that it's pretty difficult to reliably test for hdf5 and parallel netcdf on different machines.

@jklymak
Author

jklymak commented Jan 17, 2018

Closing in favour of the main repository.

@jklymak jklymak closed this Jan 17, 2018