set_auto_maskandscale on variables without _FillValue attribute #209

Closed
dopplershift opened this issue Feb 26, 2014 · 9 comments

@dopplershift
Member

From DeM...@gmail.com on November 27, 2013 01:49:14

Hi,

I am trying to use set_auto_maskandscale on a scaled uint8 variable that has no _FillValue attribute (the full uint8 range is used for the scaling). The problem is that the library forces a _FillValue of 255 when I use set_auto_maskandscale, which is not what I want.

Is this a bug? Or is there a way to handle this kind of variable?

Thanks in advance

PS: I am using netCDF4-python v1.0.5.

Original issue: http://code.google.com/p/netcdf4-python/issues/detail?id=209

@dopplershift
Member Author

From whitaker.jeffrey@gmail.com on November 27, 2013 08:47:56

The netcdf C library uses a default fill value of 255 for uint8 (set by NC_FILL_UBYTE in netcdf.h). If you don't want a fill value, you must set the fill_value keyword to False when you create the variable with the createVariable Dataset method. If you don't do this, the netcdf C library will use the default fill value, and you should only use the range 0-254 for scaling.
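If the writer keeps filling on (i.e. does not pass fill_value=False to createVariable), the packing has to leave 255 unused. A minimal sketch of that arithmetic, assuming simple linear packing with unpacked = packed * scale_factor + add_offset; the helper `pack_uint8` is hypothetical and not part of netCDF4-python:

```python
import numpy as np

# Default fill value the netCDF C library uses for unsigned bytes (NC_FILL_UBYTE).
UBYTE_FILL = 255


def pack_uint8(values, vmin, vmax, reserve_fill=True):
    """Linearly pack floats in [vmin, vmax] into uint8 (hypothetical helper).

    With reserve_fill=True the packed values span 0-254, so 255 stays free for
    the default fill value. Returns (packed, scale_factor, add_offset) such that
    unpacked ~= packed * scale_factor + add_offset.
    """
    top = UBYTE_FILL - 1 if reserve_fill else UBYTE_FILL  # 254 or 255
    scale_factor = (vmax - vmin) / top
    add_offset = vmin
    scaled = (np.asarray(values, dtype=np.float64) - add_offset) / scale_factor
    packed = np.clip(np.round(scaled), 0, top).astype(np.uint8)
    return packed, scale_factor, add_offset


packed, sf, ao = pack_uint8([0.0, 50.0, 100.0], vmin=0.0, vmax=100.0)
# packed -> [0, 127, 254]; 255 is never produced, so it cannot collide with the fill value.
```

The cost of reserving the fill value is one quantization step: the range is divided into 254 steps instead of 255.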

@dopplershift
Member Author

From whitaker.jeffrey@gmail.com on November 27, 2013 08:52:49

If you didn't create this dataset, a workaround would be to call var.set_auto_maskandscale(False) and then do the scaling manually.
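A minimal sketch of the "scale manually" half of that workaround, assuming the variable carries the usual scale_factor and add_offset attributes. The function `unpack` and the variable name in the comments are hypothetical; only set_auto_maskandscale comes from the thread:

```python
import numpy as np


def unpack(raw, scale_factor=1.0, add_offset=0.0):
    """Linear unpacking: unpacked = raw * scale_factor + add_offset.

    Nothing is masked, so every uint8 value 0-255 is treated as valid data.
    Casting to float64 first avoids uint8 overflow during the multiplication.
    """
    return np.asarray(raw, dtype=np.float64) * scale_factor + add_offset


# With netCDF4-python the raw values would come from something like:
#   var = nc.variables["my_var"]        # "my_var" is a placeholder name
#   var.set_auto_maskandscale(False)    # get packed integers, no mask, no scaling
#   data = unpack(var[:], getattr(var, "scale_factor", 1.0),
#                 getattr(var, "add_offset", 0.0))
```

The result is a plain numpy.ndarray rather than a masked array.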

@dopplershift
Member Author

From DeM...@gmail.com on November 28, 2013 01:01:48

Yes, I have the problem when reading the variable; I didn't create it.
For the moment I have implemented your workaround, but would it be possible to change set_auto_maskandscale so that it returns a masked array only if the variable has a fill value or missing value attribute? To me, a variable that is only scaled, without a fill value, should just be automatically scaled into a numpy.ndarray, not into a masked array.
What do you think?
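The behavior proposed here (not what the library did at the time) can be sketched as follows, assuming the variable's attributes are available as a plain dict. `read_scaled` is a hypothetical helper:

```python
import numpy as np


def read_scaled(raw, attrs):
    """Scale raw data; mask only if an explicit _FillValue or missing_value exists.

    `attrs` is a plain dict standing in for the variable's attributes.
    Returns a numpy.ndarray when no fill/missing attribute is present,
    otherwise a numpy.ma.MaskedArray.
    """
    raw = np.asarray(raw)
    data = (raw.astype(np.float64) * attrs.get("scale_factor", 1.0)
            + attrs.get("add_offset", 0.0))
    keys = [k for k in ("_FillValue", "missing_value") if k in attrs]
    if not keys:
        return data  # scaled only: plain ndarray, nothing masked
    # missing_value may be a scalar or a list of values, so flatten everything.
    missing = np.concatenate([np.ravel(attrs[k]) for k in keys])
    return np.ma.masked_array(data, mask=np.isin(raw, missing))
```

The masking decision compares the raw (packed) values, before scaling, which is how _FillValue and missing_value are normally interpreted for packed data.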

@dopplershift
Member Author

From whitaker.jeffrey@gmail.com on November 28, 2013 07:47:14

Sounds reasonable, but... Technically speaking, every netcdf variable has a _FillValue, since the library sets one by default. That is, unless nc_def_var_fill was used to explicitly disable filling.

Ultimately, I think your data provider was wrong to provide a dataset with valid data equal to the fill value. They should have disabled filling.

@dopplershift
Member Author

From whitaker.jeffrey@gmail.com on November 28, 2013 07:50:54

I just realized that I was not checking to see if filling was disabled before masking data equal to the default _FillValue. This is now fixed. Can you try updating from SVN? It's possible your data provider did disable filling, in which case you should get the desired result now.
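A minimal sketch of the corrected check: an explicit _FillValue always applies, a variable with filling disabled gets no default mask, and otherwise the C library default is used. The values in `DEFAULT_FILLVALS` are the netcdf.h defaults (netCDF4-python exposes a comparable dictionary as `netCDF4.default_fillvals`); `fill_mask` and its parameters are hypothetical, and the sketch ignores the byte-type exception raised further down the thread:

```python
import numpy as np

# Default fill values from the netCDF C library (netcdf.h), keyed by NumPy dtype code.
DEFAULT_FILLVALS = {
    "i1": -127, "u1": 255,
    "i2": -32767, "u2": 65535,
    "i4": -2147483647, "u4": 4294967295,
    "f4": 9.9692099683868690e36, "f8": 9.9692099683868690e36,
}


def fill_mask(raw, fill_attr=None, no_fill=False):
    """Boolean mask of elements equal to the fill value that applies to `raw`.

    fill_attr: value of an explicit _FillValue attribute, or None if absent.
    no_fill:   True if filling was disabled for the variable (nc_def_var_fill).
    """
    raw = np.asarray(raw)
    if fill_attr is not None:
        fill = fill_attr  # an explicit attribute always wins
    elif no_fill:
        return np.zeros(raw.shape, dtype=bool)  # filling off: nothing to mask
    else:
        fill = DEFAULT_FILLVALS[raw.dtype.str[1:]]  # e.g. "|u1" -> "u1"
    return raw == np.array(fill, dtype=raw.dtype)
```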

@dopplershift
Member Author

From DeM...@gmail.com on November 29, 2013 07:50:38

OK, thanks for the explanations, which match what I found in the NetCDF4 documentation; it is clearer to me now. I have been reading and writing NetCDF for a long time but never understood this point about the default fill value. So a variable has a fill value even if it doesn't have an explicit _FillValue attribute (and so, by default, you cannot use the full range of the variable unless you change the fill mode). This does not seem very intuitive to me, and I think a lot of files out there don't follow this rule... but anyway this is not your problem, because this is the NetCDF specification :-)

Note that it seems there is an exception for byte variables: http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-c/Fill-Values.html#Fill-Values "If you need a fill value for a byte variable, it is recommended that you explicitly define an appropriate _FillValue attribute, as generic utilities such as ncdump will not assume a default fill value for byte variables."
Explained here too: http://www.unidata.ucar.edu/software/netcdf/docs/known_problems.html#ncdump_ubyte_fill "There should be no default fill values when reading any byte type, signed or unsigned, because the byte ranges are too small to assume one of the values should appear as a missing value unless a _FillValue attribute is set explicitly."

I suppose you didn't implement this exception, given that my test was on a byte variable?

Unfortunately, we can't get all our data providers to change, and we have tons of existing files that assume there is no fill value when the attribute is missing, so for the moment I will stick with the workaround.

Thanks a lot for your great NetCDF4-Python library; it's very useful and very well designed!

@dopplershift
Member Author

From DeM...@gmail.com on November 29, 2013 07:57:56

I forgot to ask: is there an easy way to dump the fill mode of each variable with ncdump or with your ncinfo?

@dopplershift
Member Author

From whitaker.jeffrey@gmail.com on November 29, 2013 08:33:51

I had not seen that exception for byte variables in the docs - thank you for pointing that out. I have now implemented that exception in svn, so no default fill_value is assumed for signed or unsigned byte data dtypes.
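The rule as now described can be summarized as a single predicate. This is a sketch of the decision only, assuming NumPy-style dtype codes; `assumes_default_fill` is hypothetical and not the library's internal code:

```python
def assumes_default_fill(dtype_code, has_fill_attr, no_fill):
    """Return True if a reader should treat the C library default fill value as missing.

    dtype_code:    NumPy-style code such as "u1", "i1", "i2", "f4".
    has_fill_attr: True if the variable has an explicit _FillValue attribute.
    no_fill:       True if filling was disabled for the variable.
    """
    if has_fill_attr or no_fill:
        return False  # the explicit attribute is used instead, or filling is off
    # Byte types, signed or unsigned, never get an implied default fill value.
    return dtype_code not in ("i1", "u1")
```

Under this rule the uint8 variable from the original report (no _FillValue attribute) is never masked, whatever its fill mode.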

ncdump does not print fill mode information. I just modified ncinfo so it will print fill mode info when you do 'ncinfo -v '.

@dopplershift
Member Author

From whitaker.jeffrey@gmail.com on February 25, 2014 18:04:33

Status: Fixed
