netcdf string variables unable to be read #78

Open
durack1 opened this issue Jan 19, 2017 · 16 comments

@durack1
Member

durack1 commented Jan 19, 2017

@dnadeau4 FYI, there's another issue that is occurring with the rewritten data:
stringVariableNoRead.zip

ncdump -h ../CMIP6/input4MIPs/UColorado/radiation/RFMIP/fx/atmos/UColorado-RFMIP-0-4/multiple/none/v20170118/multiple_input4MIPs_radiation_RFMIP_UColorado-RFMIP-0-4_none.nc | grep expt_label
        string expt_label(expt) ;
                expt_label:long_name = "experiment description" ;

The string variables are unable to be read:

Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:42:40)
Type "copyright", "credits" or "license" for more information.

IPython 5.1.0 -- An enhanced Interactive Python.

infile = '/work/duro/Shared/160427_CMIP6_Forcing/CMIP6/input4MIPs/UColorado/radiation/RFMIP/fx/atmos/UColorado-RFMIP-0-4/multiple/none/v20170118/multiple_input4MIPs_radiation_RFMIP_UColorado-RFMIP-0-4_none.nc'

import cdms2 as cdm

f = cdm.open(infile)

f.variables
Out[4]: 
{'c2f6_GM': <cdms2.fvariable.FileVariable at 0x7f599d9c02d0>,
...
 'expt_label': <cdms2.fvariable.FileVariable at 0x7f599de3ca90>,
...
 'water_vapor': <cdms2.fvariable.FileVariable at 0x7f599d9c0790>}

new = f('expt_label')
Traceback (most recent call last):
  File "<ipython-input-5-dc67af7a8c6f>", line 1, in <module>
    new = f('expt_label')
  File "/export/duro/anaconda2/envs/uvcdatNightly/lib/python2.7/site-packages/cdms2/cudsinterface.py", line 33, in __call__
    return v(*args, **kwargs)
  File "/export/duro/anaconda2/envs/uvcdatNightly/lib/python2.7/site-packages/cdms2/avariable.py", line 159, in __call__
    grid=grid)
  File "/export/duro/anaconda2/envs/uvcdatNightly/lib/python2.7/site-packages/cdms2/selectors.py", line 195, in unmodified_select
    raw=raw)
  File "/export/duro/anaconda2/envs/uvcdatNightly/lib/python2.7/site-packages/cdms2/avariable.py", line 776, in subRegion
    return self.subSlice(*slicelist, **d)
  File "/export/duro/anaconda2/envs/uvcdatNightly/lib/python2.7/site-packages/cdms2/avariable.py", line 566, in subSlice
    d = self.expertSlice (slicelist)
  File "/export/duro/anaconda2/envs/uvcdatNightly/lib/python2.7/site-packages/cdms2/fvariable.py", line 86, in expertSlice
    result = apply(self._obj_.getitem,slist)
ValueError: data type must provide an itemsize

import cdat_info
cdat_info.version()
Out[8]: ['2', '6', '42', 'g910814b']
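
Until cdms2 can read NC_STRING variables, one possible workaround is to read them with netCDF4-python, the Unidata-supported interface mentioned later in this thread. The following is only a minimal sketch under stated assumptions and is not part of cdms2: it targets Python 3, the helper names `read_string_variable` and `write_demo_file` are made up for illustration, and only the variable name `expt_label`, its `expt` dimension and its `long_name` come from the ncdump output above.

```python
def read_string_variable(path, name):
    """Return the values of an NC_STRING variable as a list of Python strings."""
    # Imported lazily so netCDF4-python is only required when this is called.
    from netCDF4 import Dataset

    with Dataset(path, "r") as ds:
        # A variable created with datatype `str` is returned as a NumPy
        # object array holding one Python string per element.
        return [str(value) for value in ds.variables[name][:]]


def write_demo_file(path, labels):
    """Write a tiny NETCDF4 file that mimics `string expt_label(expt)`.

    Hypothetical helper, used only to produce a file to test against.
    """
    from netCDF4 import Dataset

    with Dataset(path, "w", format="NETCDF4") as ds:
        ds.createDimension("expt", len(labels))
        var = ds.createVariable("expt_label", str, ("expt",))
        var.long_name = "experiment description"
        # String variables are filled one element at a time.
        for index, label in enumerate(labels):
            var[index] = label
```

Calling `write_demo_file(path, labels)` and then `read_string_variable(path, "expt_label")` should return the same labels. Whether this also works on the attached input4MIPs file has not been checked here.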
@durack1
Member Author

durack1 commented Jan 19, 2017

@doutriaux1 @sashakames @dnadeau4 FYI, as the publisher starts receiving more and more data to publish for the new projects (obs4MIPs, input4MIPs, etc.), support for ALL netcdf4 types is going to be a major requirement. I really do think #63 and #65, along with their dupes CDAT/cdat#537 and CDAT/cdat#481, should be way up on top of the priority list of cdms work.

@dnadeau4
Contributor

I did it for attributes, but Sasha now has a file with an NC_STRING variable...

@durack1
Member Author

durack1 commented Jan 21, 2017

@doutriaux1 this issue is preventing the ESGF publisher from working with one of the contributed input4MIPs files. @sashakames was planning to "hack" the publisher for this file, but I really think addressing the underlying problem should be a very high priority. As noted in CDAT/cdat#537 (#63) and CDAT/cdat#481 (#65), cdms hasn't kept pace with all the netcdf4 data types, which means that valid netcdf4 data written using more modern packages currently cannot be read using cdms. Not an ideal situation at all.

@dnadeau4
Contributor

dnadeau4 commented Jan 24, 2017

@durack1 Why are your input4MIPs files so "messy"? NC_STRING is not part of the netCDF4 "classic" format, which means that it is not compatible with the netCDF3 format.
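
For context on the format distinction raised here, the storage flavour of a file can be guessed from its first bytes: classic-family files start with `CDF` followed by a version byte, while netCDF-4 files are HDF5 files and normally start with the HDF5 signature. The sketch below uses only the standard library; the function name `sniff_netcdf_format` and the returned labels are made up for illustration. It cannot tell a netCDF-4 file from a netCDF-4 "classic model" file, and any HDF5 file will match the netCDF-4 branch.

```python
# Leading bytes of the on-disk formats.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # netCDF-4 files are stored as HDF5
CLASSIC_VERSIONS = {
    1: "netCDF classic",
    2: "netCDF 64-bit offset",
    5: "netCDF 64-bit data (CDF-5)",
}


def sniff_netcdf_format(path):
    """Guess the on-disk netCDF flavour from the first bytes of a file."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head == HDF5_SIGNATURE:
        return "netCDF-4 (HDF5)"
    if len(head) >= 4 and head[:3] == b"CDF":
        # The fourth byte is the classic-family version number.
        return CLASSIC_VERSIONS.get(head[3], "unknown")
    return "unknown"
```

A file that carries an NC_STRING variable, like the one in this issue, would be expected to fall in the `netCDF-4 (HDF5)` branch.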

@durack1
Member Author

durack1 commented Jan 24, 2017

@dnadeau4 many of the input4MIPs datasets are. This file (with the string variable) was written using netcdf4 for python. I've rewritten many of the files, but this cdms2 bug prevented me from doing so in this case.
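
Rewriting such a file for classic-format readers generally means storing each label as one row of a fixed-width character array instead of an NC_STRING. A rough, library-free sketch of that packing step follows. The helper names and the `strlen` dimension name are hypothetical, and this is not a description of how the input4MIPs files were actually rewritten.

```python
def to_fixed_width(labels, encoding="utf-8", pad=b"\x00"):
    """Pack variable-length strings into equal-length byte rows.

    Returns (rows, width). Each row could fill one record of a 2-D char
    variable such as `char expt_label(expt, strlen)` (hypothetical layout).
    """
    encoded = [label.encode(encoding) for label in labels]
    width = max((len(item) for item in encoded), default=0)
    # Right-pad every row with the pad byte up to the longest label.
    rows = [item.ljust(width, pad) for item in encoded]
    return rows, width


def from_fixed_width(rows, encoding="utf-8", pad=b"\x00"):
    """Inverse of to_fixed_width: strip trailing padding and decode each row."""
    return [row.rstrip(pad).decode(encoding) for row in rows]
```

The width is measured in encoded bytes, not characters, so non-ASCII labels take more room than their character count suggests. Labels that legitimately end in the pad byte would not survive the round trip.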

@durack1
Member Author

durack1 commented Jan 24, 2017

@dnadeau4 while I do think maintaining backward compatibility would be a nice feature, in my opinion netcdf3 is "dead" as the last bug-fix release (3.6.3) was in 2008 (almost 10 years ago!), and netcdf4.0 was released in 2009. This means that full netcdf4 support is the priority, and only if it comes for free (and doesn't bog down progress in full netcdf4 support) should the netcdf3 support be maintained - if it gets in the way, then surge ahead to netcdf4.

@doutriaux1 do you want to chime in here?

@doutriaux1
Contributor

@dnadeau4 is the incompatibility preventing us from reading via cdunif? One thing we should look into is removing cdunif altogether when it comes to netcdf and using pynetcdf4 instead. That would save us a lot of trouble; cdunif, especially for writing, is causing us grief. I still suspect it's the reason my MPI implementation failed: too many mode switches.

@durack1
Member Author

durack1 commented Jan 24, 2017

@doutriaux1 @dnadeau4 https://github.com/Unidata/netcdf4-python is the Unidata supported interface, so if we're using their C libraries may as well also use their python libraries

@durack1
Member Author

durack1 commented Jan 24, 2017

@doutriaux1 @dnadeau4 do either of you have an idea what the difference is between https://anaconda.org/anaconda/netcdf4/ (latest version 1.2.4) and https://github.com/Unidata/netcdf4-python (latest version 1.2.7rel)?

@doutriaux1
Contributor

My guess is that the official anaconda distribution didn't bother to update their version to the very latest from Unidata.
That's part of the reason we are using the conda-forge channel:

     conda-forge/netcdf4       |    1.2.7 | conda           | linux-64, win-32, win-64, osx-64

@dnadeau4
Contributor

@durack1 there is no way to actually use MPI with netcdf4-python. This is a strength of CDMS: we can use mpi4py and enable MPI I/O, something that we cannot find in other Python packages.

@durack1
Member Author

durack1 commented Jan 30, 2017

@dnadeau4 ok, dangling the carrot of MPI I/O to be included in cdms is a good enough reason to not go down the netcdf4 path for me..

@doutriaux1
Contributor

@dnadeau4 why not mpi with pynetcdf4? As long as your netcdf is compiled against MPI it shouldn't matter.

@dnadeau4
Contributor

dnadeau4 commented Feb 7, 2017

@doutriaux1 pynetcdf4 does not exist and I am sure it does not support MPI I/O.

@doutriaux1 doutriaux1 modified the milestone: 2.10 May 5, 2017
@doutriaux1 doutriaux1 added the bug label May 8, 2017
@durack1
Member Author

durack1 commented Jan 10, 2018

@dnadeau4 this issue doesn't appear to be fixed, so reopening. Did you need me to upload the file here again? It's 1.8Mb

@sashakames

@durack1 durack1 reopened this Jan 10, 2018
@durack1
Member Author

durack1 commented Jan 18, 2018

@dnadeau4 has a solution for this issue been implemented? I'd be keen to test

@doutriaux1 doutriaux1 modified the milestones: 2.10, 3.1 Mar 29, 2018