A modern interface to the NetCDF library #3
Since I first wrote this, the rhdf5 package has appeared on Bioconductor. It supports both groups and compound types (which neither ncdf4 nor RNetCDF does): http://bioconductor.org/packages/devel/bioc/html/rhdf5.html I will revisit this topic with that new package.
Michael, more than half of my wishlist matches yours. Over the past few years I have needed to write R packages with domain-specific abstractions on top of netCDF. Some spatial abstractions are usually available out there, but others (notably for time series) are not as readily available. I am currently working in a domain with ensemble forecast time series (time series of ensembles of time series). I've built a package (not yet open source) with abstractions on top of ncdf4, using 'xts' for handling series in R. I think your ncload package has some overlap. Independently of this R package, I have a C++ library with time series abstractions and netCDF I/O. The rationale is to make it accessible with a consistent user experience from R, Python, Matlab and so on. I believe there is a business case for open-sourcing most of what I have outlined, and this is a process I am initiating. I'd welcome a collaboration with you and other interested parties to support this case. Background code I can point to for information:
Several of the ideas here overlap with those in the stars proposal.
An update to my original post, as quite a bit has happened that is relevant here.
http://bioconductor.org/packages/devel/bioc/vignettes/Rhdf5lib/inst/doc/Rhdf5lib.html#motivation A foundation like that for classic NetCDF 3 is what I'd like to see. It's likely that Since starting this, I've rewritten an approach for extracting file metadata here: https://github.com/hypertidy/ncmeta - using both Item 1) and 2) above can be done at the R level and I'm interested to see if that can work (but would still benefit from a modernized core wrapper). Item 3) is now covered by Item 4) is still pending, it's pretty much not possible on Windows without compiling yourself or getting the Unidata binaries in
A new package for the NetCDF library would be helpful, using modern techniques and Rcpp ideally.
NetCDF files store variables (arrays) built on dimensions (the axes used by the variables, with their own metadata), plus attributes (per-variable metadata and global metadata).
http://www.unidata.ucar.edu/software/netcdf/
Existing R packages on CRAN:
https://cloud.r-project.org/web/packages/ncdf4/index.html
https://cloud.r-project.org/web/packages/RNetCDF/index.html
There is an enormous volume of data available in NetCDF. It's the predominant format for global climate studies, ocean modelling, and many remote sensing streams. Its use in R is relatively limited (IMO), restricted to domain experts already familiar with the API model or to users of the higher-level wrappers. Much of the data is available via Thredds (or OpenDAP) servers, and this is easily leveraged using the ncdf4 or rgdal R packages (though that support is not turned on in the Windows binaries on CRAN).
Typically, you create a connection to a file and use that connection to read in a variable (or a slice of one) after interrogating the connection for the variable's dimensions and attributes. Sometimes these are mapped grids with a simple 1D axis variable for each dimension (affine or rectilinear referencing); others have "coordinate arrays" where the position of every cell is stored explicitly (curvilinear referencing).
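The connect, interrogate, read-a-slice workflow described above can be sketched with ncdf4 roughly as follows. This is only a minimal illustration, not a recommended pattern: the file is a throwaway built on the spot so the example is self-contained, and the names "lon", "lat" and "temp" are made up for the purpose.

```r
library(ncdf4)

# Build a tiny example file so the sketch is self-contained.
# The variable/dimension names and values are illustrative assumptions.
path <- tempfile(fileext = ".nc")
lon  <- ncdim_def("lon", "degrees_east",  seq(100, 130, by = 10))  # 4 values
lat  <- ncdim_def("lat", "degrees_north", seq(-40, -20, by = 10))  # 3 values
temp <- ncvar_def("temp", "degC", list(lon, lat), missval = -9999)
nc <- nc_create(path, temp)
ncvar_put(nc, temp, matrix(1:12, nrow = 4, ncol = 3))
nc_close(nc)

# 1. Create a connection to the file.
nc <- nc_open(path)

# 2. Interrogate the connection for the variable's dimensions and attributes.
dim_names <- sapply(nc$var$temp$dim, function(d) d$name)
dim_sizes <- nc$var$temp$varsize
units     <- ncatt_get(nc, "temp", "units")$value

# 3. Read a slice: 2 cells along lon starting at index 2, all 3 along lat.
slice <- ncvar_get(nc, "temp", start = c(2, 1), count = c(2, 3))

# The matching 1D axis values (rectilinear referencing) for that slice.
lon_vals <- nc$dim$lon$vals[2:3]

nc_close(nc)
```

Note that `start` and `count` are 1-based and given in the variable's own dimension order, which is exactly the kind of API detail a user has to know before getting any data out.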
Currently there is good support in R for NetCDF via the ncdf4 and RNetCDF packages, and indirectly via the raster package (leveraging ncdf4) and rgdal (leveraging GDAL). It's impressive how much abstraction raster and GDAL provide, but it covers only a relatively small range of the possible file configurations. From what I've seen, this level of abstraction is rare for NetCDF use; another example is Ferret: http://ferret.pmel.noaa.gov/Ferret
Apart from the domain-specific higher-level functions in rgdal and raster for dealing with 2D or 3D grids with affine georeferencing, there is little abstraction over the standard API use.
A modern I/O package for the format would allow domain-specific packages to be written more easily for specific sources. This is possible now, but it's limited and quite challenging for many users. It would be awesome (and, I believe, readily achievable with a new wrapper):
to be able to have a virtual R array that pages in data from a NetCDF file collection only as it is needed
to write a DBI front-end on top of a new modern wrapper to NetCDF.
to have support for composite types, which are not provided by any R package at the moment
to provide consistent support for OpenDAP/Thredds sources across operating systems.
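To make the first wish a little more concrete, here is a rough sketch of what a "virtual array" could look like at the R level using ncdf4 as it exists today. Everything named `lazy_nc_var` is hypothetical and not part of any package. It handles one 2D variable in a single file rather than a file collection, and only contiguous index ranges, so it is an illustration of the idea and not a design.

```r
library(ncdf4)

# Hypothetical constructor: records where the data lives, reads nothing yet.
lazy_nc_var <- function(path, varname) {
  nc <- nc_open(path)
  on.exit(nc_close(nc))
  structure(
    list(path = path, varname = varname, size = nc$var[[varname]]$varsize),
    class = "lazy_nc_var"
  )
}

# Indexing triggers the read. Only contiguous ranges on a 2D variable
# are supported in this sketch.
`[.lazy_nc_var` <- function(x, i, j) {
  if (missing(i)) i <- seq_len(x$size[1])
  if (missing(j)) j <- seq_len(x$size[2])
  stopifnot(all(diff(i) == 1), all(diff(j) == 1))
  nc <- nc_open(x$path)
  on.exit(nc_close(nc))
  ncvar_get(nc, x$varname,
            start = c(min(i), min(j)),
            count = c(length(i), length(j)),
            collapse_degen = FALSE)  # keep length-1 dimensions
}

# Throwaway file for demonstration; names and values are illustrative.
path <- tempfile(fileext = ".nc")
d1 <- ncdim_def("x", "count", 1:4)
d2 <- ncdim_def("y", "count", 1:3)
v  <- ncvar_def("temp", "degC", list(d1, d2), missval = -9999)
nc <- nc_create(path, v)
ncvar_put(nc, v, matrix(1:12, nrow = 4, ncol = 3))
nc_close(nc)

lazy <- lazy_nc_var(path, "temp")
win  <- lazy[2:3, 1:2]  # only this 2 x 2 window is read from disk
col3 <- lazy[, 3]       # a missing index means the full extent
```

Reopening the file on every access keeps the sketch simple; a real wrapper would presumably hold the connection open and map indexes across several files.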