parallel netcdf writes #23

Closed
jswhit2 opened this issue Dec 10, 2019 · 126 comments

@jswhit2
Collaborator

jswhit2 commented Dec 10, 2019

Parallel write capability is needed in module_write_netcdf.F90. The current version of the netcdf library does not support parallel writing of compressed files, but uncompressed parallel writes should work. Here are some steps needed to implement this:

  1. add a flag to model_configure to indicate parallel IO is desired (for now make sure this flag is set to false if compression is enabled).
  2. if parallel IO is enabled, open the file using nf90_create on all tasks and pass the optional mpi_comm and mpi_info arguments.
  3. the nf90_put_var calls need to be modified to write independent slices (defined by istart,jstart,iend,jend,kstart,kend). The ESMF_Gather call should be skipped.
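
A minimal sketch of step 2, assuming the optional arguments are named comm and info in this netcdf-fortran version; the communicator variable wrt_mpi_comm is illustrative:

    use mpi
    use netcdf
    ! open the file for parallel access on every write task (step 2); older
    ! library versions may also need NF90_MPIIO OR'd into cmode
    ncerr = nf90_create(trim(filename), &
            cmode=IOR(NF90_CLOBBER, NF90_NETCDF4), &
            ncid=ncid, comm=wrt_mpi_comm, info=MPI_INFO_NULL); NC_ERR_STOP(ncerr)
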
@junwang-noaa
Collaborator

Jeff,

Just a quick update: I am working on this following the steps you listed here. The coding is mostly done. One thing not listed in your steps and not clear to me is writing attributes; I assume it is OK to call nf90_put_att on all the MPI tasks?

@jswhit
Contributor

jswhit commented Dec 24, 2019

Yes, creating the attributes, dimensions and variables can be done on all tasks. The only thing that needs to change is the nf90_put_var (writing data to the variable).
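
A rough sketch of that pattern for a 2d field decomposed across write tasks; the dimension, variable, and array names here are illustrative, not necessarily the ones used in module_write_netcdf.F90:

    ! metadata: identical calls on every task
    ncerr = nf90_def_dim(ncid, 'grid_xt', imo, im_dimid); NC_ERR_STOP(ncerr)
    ncerr = nf90_def_dim(ncid, 'grid_yt', jmo, jm_dimid); NC_ERR_STOP(ncerr)
    ncerr = nf90_def_var(ncid, 'tmp2m', NF90_FLOAT, (/im_dimid,jm_dimid/), varid); NC_ERR_STOP(ncerr)
    ncerr = nf90_put_att(ncid, varid, 'units', 'K'); NC_ERR_STOP(ncerr)
    ncerr = nf90_enddef(ncid); NC_ERR_STOP(ncerr)
    ! data: each task writes only its own slice (no ESMF_Gather)
    ncerr = nf90_put_var(ncid, varid, local_arr, &
            start=(/istart,jstart/), &
            count=(/iend-istart+1,jend-jstart+1/)); NC_ERR_STOP(ncerr)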

@junwang-noaa
Collaborator

junwang-noaa commented Dec 24, 2019 via email

@jswhit
Contributor

jswhit commented Dec 24, 2019

Jun - one thing to be aware of. There are two parallel IO implementations with the same API - one that works with classic (non-hdf5) files and one that works with hdf5 files. To enable both, you have to build with --enable-parallel4 and --enable-pnetcdf. If Cory did not build with --enable-pnetcdf you will have to make this change

-    if (ideflate == 0) then
-        ncerr = nf90_create(trim(filename), &
-        cmode=IOR(IOR(NF90_CLOBBER,NF90_64BIT_OFFSET),NF90_SHARE), &
-        ncid=ncid); NC_ERR_STOP(ncerr)
-        ncerr = nf90_set_fill(ncid, NF90_NOFILL, oldMode); NC_ERR_STOP(ncerr)
-    else
-        ncerr = nf90_create(trim(filename), cmode=IOR(IOR(NF90_CLOBBER,NF90_NETCDF4),NF90_CLASSIC_MODEL), &
-        ncid=ncid); NC_ERR_STOP(ncerr)
-        ncerr = nf90_set_fill(ncid, NF90_NOFILL, oldMode); NC_ERR_STOP(ncerr)
-    endif
+   ncerr = nf90_create(trim(filename), cmode=IOR(IOR(NF90_CLOBBER,NF90_NETCDF4),NF90_CLASSIC_MODEL), &
+   ncid=ncid); NC_ERR_STOP(ncerr) ! modify if parallel IO needed
+   ncerr = nf90_set_fill(ncid, NF90_NOFILL, oldMode); NC_ERR_STOP(ncerr)

@junwang-noaa
Collaborator

junwang-noaa commented Dec 24, 2019 via email

@climbfuji
Collaborator

For your information, the pnetcdf (parallel-netcdf) library developed at Argonne writes data in the netCDF CDF-2 format (called 64-bit offset in pnetcdf). It is essentially the same as the netCDF 3 format and can be read and written with any Unidata netCDF library from 3.x upwards, but parallel reads/writes require the pnetcdf support in netCDF 4.x that Jeff described above. Note that early versions of netCDF 4 (4.5.x) had some serious bugs when writing CDF-5 files, which in some cases corrupted the data.

For larger files (individual variables larger than 2 GB), the format must be switched to the 64-bit data format (aka CDF-5; NF90_64BIT_DATA instead of NF90_64BIT_OFFSET in the code snippet above), which is incompatible with standard netCDF installations and also requires building newer Unidata netCDF 4.x versions with the flags that Jeff described above. The drawback of this format is that many downstream tools don't recognize it (you will see an error message along the lines of "MPI routine called before MPI init"), unless they were built against the same pnetcdf-enabled version of netCDF 4.x.
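
For reference, a minimal sketch of requesting CDF-5 through the Fortran API, assuming a netCDF build with CDF-5 support (error handling follows the snippet quoted earlier in this thread):

    ! request the 64-bit data (CDF-5) format instead of 64-bit offset (CDF-2)
    ncerr = nf90_create(trim(filename), &
            cmode=IOR(NF90_CLOBBER, NF90_64BIT_DATA), &
            ncid=ncid); NC_ERR_STOP(ncerr)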

The other option uses parallel HDF5 as the backend for so-called netCDF4 or netCDF4-classic files. This doesn't require the parallel-netcdf backend and seems to be easier. BUT: parallel reading and writing of netCDF4 files through HDF5 is several times slower than through parallel-netcdf!

Long story short: for the given "small" global meshes of 13 km resolution, we can use the CDF-2 parallel-netcdf version, which can be read or written with any existing software that can read netCDF3 classic files. We should not use the netCDF4-phdf5 backend, because it is very, very slow. I have some HPC reports (unpublished work) from my previous job, where I was using pnetcdf, netcdf4-phdf5 and SIONlib in MPAS extreme scaling experiments, if you are interested.

@jswhit
Contributor

jswhit commented Dec 24, 2019

We have to use the hdf5 backend to get compression. My experience is that parallel-hdf5 performance has improved quite a bit in recent versions.

@climbfuji
Collaborator

climbfuji commented Dec 24, 2019 via email

@jswhit
Contributor

jswhit commented Dec 24, 2019

We're using a compression level of 1.

@edwardhartnett
Contributor

Howdy!

To clarify a few points:

  • The CDF5 format originally developed by Argonne for pnetcdf is now part of the Unidata netCDF library as well, and is a canonical and supported binary format of netCDF. Older versions of netCDF will not understand it, but all recent and future versions do. So it is safe to use and distribute; however, it offers no compression.

  • I have a PR open on the netcdf-c project that will allow parallel writes with zlib, and that should help.

  • I am going to re-instate the szip filter in the netCDF C library as well. This will allow szip to be easily used. Since szip was supported in netCDF in read-only mode, all existing netCDF installs will be able to read these files, as long as HDF5 was installed with szip capability.

  • I am exploring some new filters, specifically LZ4. It should offer much better read and write performance, at the cost of slightly worse compression. The challenge is making it available to everyone, but this is something we are working out.

  • HDF5 parallel I/O should be reasonably fast, when settings are correct. If you build netcdf-c with the --enable-benchmarks option, you get a program nc_perf/bm_file, which will allow you to test your file read and write time with a variety of chunksize settings, so you can get a feel for how it changes performance. Let me know if you want help.

@climbfuji
Collaborator

Thanks for the update, Ed. Did you ever get to play with the SIONlib backend and test its performance? We had a telecon with the developers in Juelich some time ago, and I didn't have any time to follow up on this, sorry.

@edwardhartnett
Contributor

No, I have not played with SIONlib but would be interested in learning more about it. There was an idea of writing a SIONlib read/write module for netcdf...

@edwardhartnett
Contributor

OK, some more news: it turns out that szip is already enabled in netcdf-c for writes! It's a bit of an undocumented feature.

For this to work, HDF5 must be built with szip (as well as the usual zlib).

Once netcdf-c is built against an HDF5 that supports szip, szip compression may be turned on for a variable like this:

#define HDF5_FILTER_SZIP 4
    /*
     * Set parameters for SZIP compression; check the description of
     * the H5Pset_szip function in the HDF5 Reference Manual for more
     * information.
     */
    szip_params[0] = H5_SZIP_NN_OPTION_MASK;
    szip_params[1] = H5_SZIP_MAX_PIXELS_PER_BLOCK_IN;
    stat = nc_def_var_filter(ncid, varid, HDF5_FILTER_SZIP, 2, szip_params);

Currently this will only work for sequential access. I am working on getting it working for parallel access next. ;-)

@junwang-noaa
Collaborator

junwang-noaa commented Jan 3, 2020 via email

@jswhit2
Collaborator Author

jswhit2 commented Jan 7, 2020

@junwang-noaa - do you have a branch with the parallel-io mods in it that I can play with?

@junwang-noaa
Collaborator

Yes. The code gets compiled on mars, but I haven't tested the parallel netcdf case yet.

[submodule "FV3"]
path = FV3
url = https://github.com/junwang-noaa/fv3atm
branch = netcdf_parallel

The branch also has the code changes for real(8) lon/lat in the netcdf file and a bug fix for post.

@jswhit2
Collaborator Author

jswhit2 commented Jan 8, 2020

Thanks @junwang-noaa, I will give it a spin on hera.

@jswhit2
Collaborator Author

jswhit2 commented Jan 8, 2020

I've made a few changes to @junwang-noaa's netcdf_parallel branch and almost have it running on hera (without compression). The files are created, but the data isn't actually written. Jun - if you give me access to your fork I can push my changes there, or if you prefer I can create my own fork.

@junwang-noaa
Collaborator

junwang-noaa commented Jan 9, 2020 via email

@jswhit2
Collaborator Author

jswhit2 commented Jan 9, 2020

I've updated https://github.com/junwang-noaa/fv3atm/tree/netcdf_parallel, to include bug fixes and changes from PR #18. It now runs on hera, and for uncompressed data shows some significant speedups. Next step is to build the netcdf library with Unidata/netcdf-c#1582 so we can test parallel compressed writes.

@jswhit2
Collaborator Author

jswhit2 commented Jan 9, 2020

BTW - the code is now using independent (not collective) parallel access. Collective access seems to be quite a bit slower in my tests - I don't understand why. Using independent access required changing the time dimension from unlimited to fixed length.

@edwardhartnett
Contributor

Collective access is required for any filters in HDF5, including all compression filters. :-(

@jswhit2
Collaborator Author

jswhit2 commented Jan 9, 2020

OK - I've updated the code to turn on collective access for variables that are compressed.
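
A sketch of what that can look like with the Fortran API, assuming ideflate is the existing compression flag in module_write_netcdf.F90:

    if (ideflate > 0) then
       ! HDF5 requires collective access for any filtered (compressed) variable
       ncerr = nf90_var_par_access(ncid, varid, NF90_COLLECTIVE); NC_ERR_STOP(ncerr)
    else
       ! independent access was faster in these tests, but needs a fixed-size
       ! (not unlimited) time dimension
       ncerr = nf90_var_par_access(ncid, varid, NF90_INDEPENDENT); NC_ERR_STOP(ncerr)
    endif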

@junwang-noaa
Collaborator

junwang-noaa commented Jan 9, 2020 via email

@jswhit2
Collaborator Author

jswhit2 commented Jan 9, 2020

Jun - it's not merged into master yet but we can check out https://github.com/NOAA-GSD/netcdf-c/tree/ejh_parallel_zlib. I'll build this on hera.

@junwang-noaa
Collaborator

junwang-noaa commented Jan 9, 2020 via email

@jswhit2
Collaborator Author

jswhit2 commented Jan 9, 2020

Jun: Here's what I did on hera (I haven't tested it yet):

  • create a directory ${parlibpath}
  • cd ${parlibpath}
  • download hdf5-1.10.6.tar.gz and netcdf-fortran-4.5.2.tar.gz to that directory

To build HDF5:

  • tar -xvzf hdf5-1.10.6.tar.gz
  • cd hdf5-1.10.6
  • ./configure --prefix=${parlibpath} --enable-hl --enable-parallel
  • make
  • make install

To build netcdf-c:

  • git clone https://github.com/NOAA-GSD/netcdf-c
  • cd netcdf-c; git checkout ejh_parallel_zlib
  • autoreconf -i
  • setenv LDFLAGS -L${parlibpath}/lib
  • setenv CPPFLAGS -I${parlibpath}/include
  • ./configure --prefix=${parlibpath} --enable-netcdf-4 --enable-shared --disable-dap --enable-parallel4
  • make
  • make install

To build netcdf-fortran:

  • tar -xvzf netcdf-fortran-4.5.2.tar.gz
  • cd netcdf-fortran-4.5.2
  • setenv FC mpif90
  • setenv CC mpicc
  • setenv LDFLAGS -L${parlibpath}/lib
  • setenv CPPFLAGS -I${parlibpath}/include
  • ./configure --prefix=${parlibpath}
  • make
  • make install

@climbfuji
Collaborator

climbfuji commented Jan 9, 2020 via email

@jswhit2
Collaborator Author

jswhit2 commented Jan 9, 2020

Dom: We need a bleeding-edge version of netcdf-c from https://github.com/NOAA-GSD/netcdf-c/tree/ejh_parallel_zlib compiled with parallel HDF5 support.

@jswhit2
Collaborator Author

jswhit2 commented Jan 9, 2020

@edwardhartnett - with your branch I'm still getting "NetCDF: Invalid argument" from nf90_def_var when I enable compression filters in parallel mode. Is there something that needs to be updated in the Fortran interface?

@jswhit2
Collaborator Author

jswhit2 commented Jan 26, 2020

I ran a C768 test on hera, and I still see a benefit to parallel IO for the 2D files. Using 12 write tasks, with parallel IO for both 2d and 3d files I get

 parallel netcdf      Write Time is   36.91967 at Fcst   03:00
 parallel netcdf      Write Time is   18.39263 at Fcst   03:00
 total                Write Time is   55.56771 at Fcst   03:00

whereas turning off parallel IO for 2d files I see

 parallel netcdf      Write Time is   37.76248 at Fcst   03:00
 netcdf               Write Time is   33.08457 at Fcst   03:00
 total                Write Time is   71.07401 at Fcst   03:00

and without parallel IO for either 2d or 3d files

 netcdf            Write Time is  206.89221 at Fcst   03:00
 netcdf            Write Time is   29.70651 at Fcst   03:00
 total             Write Time is  236.85511 at Fcst   03:00

@jswhit2
Collaborator Author

jswhit2 commented Jan 26, 2020

Seems like the 2D writes are very sensitive to chunksize. If I set the chunksize to be the same as the size of the array on each write task, I get

 parallel netcdf      Write Time is   35.68702 at Fcst   03:03
 parallel netcdf      Write Time is    9.80075 at Fcst   03:03
 total                Write Time is   45.72166 at Fcst   03:03

The optimal chunksize may be platform dependent. I'll look at adding the chunksize as a runtime parameter in model_configure.

@jswhit2
Collaborator Author

jswhit2 commented Jan 26, 2020

Added ichunk2d, jchunk2d parameters in model_configure (can be used to tune the chunksize for best parallel IO performance). Default is the size of the array on each write task.

@jswhit2
Collaborator Author

jswhit2 commented Jan 27, 2020

Also added ichunk3d,jchunk3d,kchunk3d to set 3d variable chunksize. Default is ichunk3d=ichunk2d, jchunk3d=jchunk2d, kchunk3d=nlevs. This results in the fastest writes for me on hera:

 parallel netcdf      Write Time is   24.97020 at Fcst   03:03
 parallel netcdf      Write Time is    9.98413 at Fcst   03:03
 total                Write Time is   35.23754 at Fcst   03:03

To restore the previous behavior, set ichunk3d=imo,jchunk3d=jmo,kchunk3d=1.

Writes are slower for this setting, but reading 2d horizontal slices is much faster.

@junwang-noaa - could you run your test again on WCOSS with these new default chunksizes?

@junwang-noaa
Collaborator

junwang-noaa commented Jan 27, 2020 via email

@jswhit2
Collaborator Author

jswhit2 commented Jan 27, 2020

Did you mean (imo,jmo,nlevs,1) for the 3d chunk size?

@junwang-noaa
Collaborator

junwang-noaa commented Jan 27, 2020 via email

@jswhit2
Collaborator Author

jswhit2 commented Jan 27, 2020

Hmm. I wonder why I'm getting ~4x faster writes on hera with the new default chunksizes and 12 write tasks (total write time of 35 secs vs 140 secs on WCOSS).

@junwang-noaa
Collaborator

junwang-noaa commented Jan 27, 2020 via email

@junwang-noaa
Collaborator

junwang-noaa commented Jan 27, 2020 via email

@jswhit
Contributor

jswhit commented Jan 27, 2020

To recap, netcdf chunksizes (for parallel and serial compressed IO) can be set at runtime by specifying parameters ichunk2d,jchunk2d,ichunk3d,jchunk3d,kchunk3d in model_configure. The default values (if these parameters are not given) are the MPI decomposition size. If the parameters are set to a negative value, then the netcdf C library will choose the chunksize. If compression is turned off, the netcdf C library chooses the chunksize (the chunking parameters are not used).

My tests on hera show that the default values produced the fastest parallel IO throughput. However, this might not be true on other platforms (and Jun's tests suggest it may not be true on WCOSS).

Note that for parallel IO, the chunksize that produces the fastest write speed may not be optimal for read speed (depending on the access pattern). For example, my tests indicate that setting the chunksize for 3d variables to imo,jmo,1,1 greatly speeds up reads for 2d slices (a very common access pattern), but slows down the writes by 25% or so on hera.

In general, the write speeds for parallel IO with compression appear to be quite sensitive to chunksize (much less so for serial IO).

The nccopy utility can be used to change the chunking in an existing netcdf file.
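
For reference, a sketch of how the chunksizes and deflate level are passed when a compressed 3d variable is defined with the Fortran API; the variable and dimension names are illustrative, and the actual call in module_write_netcdf.F90 may differ:

    ! ichunk3d/jchunk3d/kchunk3d come from model_configure; the defaults are the
    ! per-write-task decomposition size and nlevs
    ncerr = nf90_def_var(ncid, 'ugrd', NF90_FLOAT, &
            (/im_dimid, jm_dimid, pfull_dimid, time_dimid/), varid, &
            chunksizes=(/ichunk3d, jchunk3d, kchunk3d, 1/), &
            deflate_level=1); NC_ERR_STOP(ncerr)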

@jswhit
Contributor

jswhit commented Jan 28, 2020

To compile on hera, I made the following change to modulefiles/hera.intel/fv3:

[ufs-weather-model-parnc]$ git diff modulefiles/hera.intel/fv3
diff --git a/modulefiles/hera.intel/fv3 b/modulefiles/hera.intel/fv3
index a369558..362dea3 100644
--- a/modulefiles/hera.intel/fv3
+++ b/modulefiles/hera.intel/fv3
@@ -50,6 +50,8 @@ module load post/8.0.1
 ##
 module use -a /scratch1/NCEPDEV/nems/emc.nemspara/soft/modulefiles
 module load esmf/8.0.0
+module use -a /scratch2/BMC/gsienkf/whitaker/modulefiles/intel
+module load netcdf-parallel/4.7.3

 ##
 ## load cmake

This, and corresponding changes to the wcoss modulefiles, plus the addition of PARALLEL_NETCDF to the cmake files, need to be included in a companion pull request to https://github.com/ufs-community/ufs-weather-model.

@DusanJovic-NOAA
Collaborator

The netcdf-c branch named "ejh_parallel_zlib" described in this comment:

#23 (comment)

does not exist anymore. Which version of netcdf-c should we use?

@edwardhartnett
Contributor

Use the netcdf-c master branch. All my changes have been merged to master.

@DusanJovic-NOAA
Collaborator

https://github.com/NOAA-GSD/netcdf-c
or
https://github.com/Unidata/netcdf-c

I see NOAA-GSD:master is 6 commits behind Unidata:master

@edwardhartnett
Contributor

Use the Unidata master. The NOAA-GSD one is my fork that I use for working on the Unidata one.

@junwang-noaa
Collaborator

Ed,

just to confirm, I am building netcdf-c from Unidata on hera; the revision is 2a34eb2a.

.../nems/emc.nemspara/soft/netcdf_parallel/netcdf-c> git clone https://github.com/Unidata/netcdf-c
.../nems/emc.nemspara/soft/netcdf_parallel/netcdf-c> git branch
* master
.../nems/emc.nemspara/soft/netcdf_parallel/netcdf-c> git log | more

commit 2a34eb2ac5996dc23339bdb72918eb5503393d77
Merge: 2e7234ff 0f5bdafe
Author: Ward Fisher WardF@users.noreply.github.com
Date: Mon Jan 27 17:44:48 2020 -0700

Merge pull request #1603 from NOAA-GSD/ejh_release_notes

updated RELEASE_NOTES to include results of recent PR merges

@junwang-noaa
Collaborator

I have a problem building hdf5/1.10.6 on Cray. I have:

module load PrgEnv-intel
module rm intel
module rm NetCDF-intel-sandybridge
module load intel/16.3.210

export FC=ftn
export CC=cc
export CXX=CC
export LD=ftn
export LDFLAGS=-L${parlibpath}/lib
export CPPFLAGS=-I${parlibpath}/include
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/gpfs/hps3/emc/nems/noscrub/emc.nemspara/soft/netcdf_parallel/lib

but I got this error:
configure:4721: ./conftest

Please verify that both the operating system and the processor support Intel(R) MOVBE, FMA, BMI, LZCNT and AVX2 instructions.

configure:4725: $? = 1
configure:4732: error: in `/gpfs/hps3/emc/nems/noscrub/emc.nemspara/soft/netcdf_parallel/hdf5-1.10.6':
configure:4734: error: cannot run C compiled programs.

Does anybody have any idea?

@edwardhartnett
Contributor

@junwang-noaa you do have the correct netcdf.

However, I don't know what the HDF5 build problem is. Have you asked HDF5 support?

@jswhit
Contributor

jswhit commented Jan 31, 2020

@junwang-noaa - have a look at the config.log file in the build directory after configure fails to find more information as to the cause of the failure. Search for "cannot" and examine the lines preceding it.

@edwardhartnett
Contributor

@jswhit BTW I'm still working on szip and netcdf-fortran. There are some complexities but I hope to have a working branch for you soon...

@junwang-noaa
Collaborator

A detailed configuration testing on wcoss dell is at:

https://docs.google.com/document/d/18vqajgOv3flbS35eNPMnYpFyNiprkJpHkVZpP--dx5o/edit

@junwang-noaa
Collaborator

@jswhit2 The error message I listed above is from config.log. Not sure why the executable compiled from the C test program can't run on Cray.

@junwang-noaa
Collaborator

@edwardhartnett Can you create a tag from the master?

@edwardhartnett
Contributor

@jswhit2 there is now a branch ejh_szip on netcdf-fortran which has the szip code in the fortran APIs.

@junwang-noaa I cannot create a tag. I am not part of Unidata any longer. ;-) However, I believe they will be doing a release in the next few weeks. (No guarantee though.)

@junwang-noaa
Collaborator

Code is committed. I will open a new issue for installing the HDF5/netcdf libraries on Cray.

@edwardhartnett
Contributor

@jswhit2 the szip changes have been merged into master branch on Unidata's netcdf-fortran project. So just grab that, and build it against the netcdf-c master you have already built.

Then you can call nf90_def_var_szip() on a variable. Make sure you also turn off deflate. You can't use both deflate and szip.
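
A sketch of the call, assuming the signature nf90_def_var_szip(ncid, varid, options_mask, pixels_per_block); the values mirror the earlier C example (32 is the nearest-neighbor option mask, with 32 pixels per block):

    ! enable szip on a variable; deflate must be off for this variable
    ncerr = nf90_def_var_szip(ncid, varid, 32, 32); NC_ERR_STOP(ncerr)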
