
more tests, more internal documentation, minor bug fixes #711

Merged

merged 44 commits into Unidata:master from ejh_more_tests on Dec 19, 2017

Conversation

edhartnett
Contributor

@edhartnett edhartnett commented Dec 3, 2017

More tests for nc4var.c code. Also some minor fixes and a bunch of (internal) documentation.

This gets test coverage of nc4var.c from 70% to ~85%, but there is more to go.

Added or fixed doxygen documentation for all functions in libsrc4.

Also added documentation for a few missing public functions.

Part of #702.
Fixes #704.
Fixes #709.
Fixes #707.
Fixes #706.
Fixes #714.
Fixes #713.
Fixes #716.

From PR #700:

Fixes #699.
Fixes #698.
Fixes #697.
Fixes #696.
Fixes #694.
Fixes #662.

From PR #658:

Fixes #392.

@WardF
Member

WardF commented Dec 6, 2017

I merged in some changes I made from master and turned on parallel tests for CI. Let me explore that and figure out what's going on, I don't believe this is a 'true' failure.

@edhartnett
Contributor Author

I see this branch is passing travis again. Hopefully that means it can now be merged. ;-)

@WardF
Member

WardF commented Dec 8, 2017

Compiler errors on Windows, but I don't expect them to be difficult to handle. Will follow up once they're fixed.

@WardF
Member

WardF commented Dec 9, 2017

Got caught up in another project, and then had to update Visual Studio (which has taken quite a while so far). Still working on this.

@WardF
Member

WardF commented Dec 11, 2017

At AGU, but pursuing this now.

@WardF
Member

WardF commented Dec 11, 2017

tst_nccopy4 is failing. I will examine why and follow up.

@edhartnett
Contributor Author

edhartnett commented Dec 12, 2017 via email

@edhartnett
Contributor Author

edhartnett commented Dec 12, 2017 via email

@WardF
Member

WardF commented Dec 12, 2017

When you say built in, do you mean hooks to link against, like pnetcdf? Looking forward to seeing it! Take your time, the WiFi at the convention center is terrible.

@WardF
Member

WardF commented Dec 12, 2017

Oh, it is failing on Windows and I went "too clever" trying to fix it last night. Still working on it this morning.

@edhartnett
Contributor Author

edhartnett commented Dec 12, 2017 via email

@edhartnett
Contributor Author

edhartnett commented Dec 12, 2017 via email

@DennisHeimbigner
Collaborator

Have you had to make significant changes outside of the libpio and pio_test directories? As per our previous discussion about per-variable PIO information, we may be able to simplify your changes with simple extensions to the netCDF API.

@edhartnett
Contributor Author

Some changes in libdispatch/dfile.c in order to invoke things properly for opens/creates. I am isolating most of the changes in functions that will only be built and called if the library is built with PIO.

Otherwise no changes outside the libpio and pio_test directories. Basically the libpio code sits on top of the libsrc/libsrc4 code and calls it as needed.

I added several new functions, which have no analogs elsewhere in the library. This is the complete list:

   /* Initialize a decomposition with a 0-based compmap array. */
   extern int
   nc_init_decomp(int iosysid, int pio_type, int ndims, const int *gdimlen, int maplen,
                  const PIO_Offset *compmap, int *ioidp, int rearranger,
                  const PIO_Offset *iostart, const PIO_Offset *iocount);

   /* Read a decomposition file. */
   extern int
   nc_read_decomp(int iosysid, const char *filename, int *ioidp, MPI_Comm comm,
                  int pio_type, char *title, char *history, int *fortran_order);

   /* Write a decomposition file. */
   extern int
   nc_write_decomp(int iosysid, const char *filename, int cmode, int ioid,
                   char *title, char *history, int fortran_order);

   /* Initialize an IO system, with IO and computational tasks sharing
    * one intracommunicator. */
   extern int
   nc_init_intracomm(MPI_Comm comp_comm, int num_iotasks, int stride, int base,
                     int rearr, int *iosysidp);

   /* Creates an MPI intracommunicator between a set of IO tasks and
    * one or more sets of computational tasks. */
   extern int
   nc_init_async(MPI_Comm world, int num_io_procs, int *io_proc_list,
                 int component_count, int *num_procs_per_comp, int **proc_list,
                 MPI_Comm *user_io_comm, MPI_Comm *user_comp_comm, int rearranger,
                 int *iosysidp);

   /* Set the error handling method for an IO system. */
   extern int
   nc_set_iosystem_error_handling(int iosysid, int method, int *old_method);

   /* Advance the record (frame) number of a variable. */
   extern int
   nc_advanceframe(int ncid, int varid);

   /* Set the record (frame) number of a variable. */
   extern int
   nc_setframe(int ncid, int varid, int frame);

   /* Write a distributed array of data. */
   extern int
   nc_write_darray(int ncid, int varid, int ioid, PIO_Offset arraylen, void *array,
                   void *fillvalue);

   /* Read a distributed array of data. */
   extern int
   nc_read_darray(int ncid, int varid, int ioid, PIO_Offset arraylen, void *array);

   /* Free a decomposition map. */
   extern int
   nc_free_decomp(int iosysid, int ioid);

   /* Finalize an IO system for PIO. */
   extern int
   nc_free_iosystem(int iosysid);
I may combine the setframe function into the read/write_darray functions. So there will be some minor churn...

@DennisHeimbigner
Collaborator

That is a lot of changes to the netcdf-c API, which makes me unhappy :-(
The less disruptive alternative, IMO, is to create data structures that hold, in effect, the arguments to those functions. Then pass that data structure in using the existing API and have the necessary code be invoked inside libpio. This is somewhat similar to the way in-memory and MPI info are passed.

@edhartnett
Contributor Author

edhartnett commented Dec 12, 2017

This is a maximum. A bunch of functions will go away. I need to get all tests working first. ;-)

PIO introduces two new items to the netCDF data model, IO Systems and decompositions.

IO systems describe the HPC environment, how many cores, how many to do I/O, and which ones do which. They allow the program to adjust at start-up to the number of available cores and how the user wants to divide the computation/IO workload.

Decompositions map netCDF file data space across the available computational cores. For example, if a var is 10x10x10 and there are 100 cores, then you could put 10 values per core. The decomposition specifies this.

With IO systems I can say that if my program is running on 10K cores, that 1000 of them should be for I/O, and 5K for an atmospheric model and 4K for an ocean model.

With decompositions I say how the 3D and 2D data arrays (each get their own decomposition) are scattered across the 5K atmospheric cores.

Now each of the 5K cores can call nc_write_darray() with their local array, and the 5000 sub-arrays will be automatically transferred to the 1000 I/O cores. Those I/O cores will (later) write that data to disk in the netCDF file. The decomposition(s) and IO system allow the PIO code to do all the donkey-work to make it happen.

So the decomposition and IO system exist independently of any netCDF file. Both might easily be used to create and/or read many netCDF files. In fact, that would be typical. The user will set up the IOSystem and decompositions once for the supercomputer, then all subsequent netCDF file access, reading or writing, will use those.

It's neat, so it's worth a few extra functions. However, after I take my next pass, if you can suggest any further reductions, that would be good.

@DennisHeimbigner
Collaborator

If by extending the data model you mean defining structures in an include/netcdf_pio.h file, then fine. Also, frankly, I doubt that you need all those extra functions as such. Why do you reject my data-structure-oriented approach? As I said, I am unhappy with this.

@DennisHeimbigner
Collaborator

Never mind. I will wait until you have a functioning system and then Ward and I
will decide what changes need to be made.

@edhartnett
Contributor Author

edhartnett commented Dec 12, 2017 via email

@DennisHeimbigner
Collaborator

The IO system must be set up before the first nc_create()/nc_open(). We handle this like we do e.g. HDF5 initialization: nc_create and nc_open examine internal global variables to see if the necessary PIO initialization has been performed, and if not, do it.

The code that splits up the work between different sets of cores is, by necessity, user code. I am certainly not opposed to user-provided callback functions, but again, those can be passed as data to the existing netcdf-c API functions.

@DennisHeimbigner
Collaborator

In any case, as I said. The important thing is to get a working version in a branch
on our github repo. Then we can consider tweaking the API.

@edhartnett
Contributor Author

edhartnett commented Dec 12, 2017 via email

@DennisHeimbigner
Collaborator

I don't follow. You can pass all the info you want about the other cores as the parameter
argument to nc_create.

@edhartnett
Contributor Author

edhartnett commented Dec 13, 2017 via email

@edhartnett
Contributor Author

edhartnett commented Dec 13, 2017 via email

@wkliao
Contributor

wkliao commented Dec 13, 2017

Hi, @edhartnett
In one of your earlier posts, you mentioned

And the usefulness of the features is already well proven by the CESM model.

This sounds interesting. Could you please provide references or publications for this work? Thanks.

@edhartnett
Contributor Author

@wkliao I don't have any references or publications. I'm a programmer dude, not a scientist. :-)

However, take a look at CESM here: http://www.cesm.ucar.edu/. Probably there are lots of papers about it, but probably none of them mention the "minor" implementation detail of how they do IO. ;-)

Anyway, further discussion of PIO merges should take place on its newly created PR #720.

Meanwhile, this PR contains minor cleanup and a bunch of documentation. Hopefully we can get it merged this week.

@WardF WardF merged commit 77cca75 into Unidata:master Dec 19, 2017
@edhartnett edhartnett deleted the ejh_more_tests branch December 19, 2017 17:46