New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CDMS2 2.12 wrongly scaling variable #174

Closed
JimBiardCics opened this Issue Oct 2, 2017 · 8 comments

Comments

Projects
None yet
3 participants
@JimBiardCics

JimBiardCics commented Oct 2, 2017

CDMS2 version 2.12 is wrongly scaling a variable. The CDL for the variable in question is

 short data(time, lat, lon) ;
	data :long_name = "data" ;
	data :missing_value = -27300s ;
	data :scale_factor = 0.01 ;
	data :add_offset = 273. ;

When the value in the variable in the input data file is -27300, the scaled value should be 0.0. However, when you open the file with CDMS2 and look at the value returned, it is 0.01. The value is masked in the variable data array, but the value is wrong. It can then be propagated into files that are written using data from the variable if the values are unmasked.

All other values are scaled correctly. Only the values corresponding to the missing value have a bad value.

The attached cdl file can be used to create a netCDF file with ncgen. The command is
ncgen -k 3 scaledShort.txt
The python script will open the file scaledShort.nc that was generated by ncgen and write out the unmasked first data value, which CDMS2 auto-converted from int16 to float.

readScaledShort.py.txt
scaledShort.txt

(I uploaded new files and corrected the CDL snippet from my original post.)

@dnadeau4

This comment has been minimized.

Contributor

dnadeau4 commented Oct 11, 2017

The unpacking is done by libnetcdf automatically, so you should have the same problem with NCO.

Your missing value types is Long Long which is wrong, it should be short like your scaledShort.txt output.

@doutriaux1 doutriaux1 added this to the 3.0 milestone Oct 11, 2017

@JimBiardCics

This comment has been minimized.

JimBiardCics commented Oct 11, 2017

@dnadeau4 when does libnetcdf do automatic unpacking? I'm not aware of that one. Also, if you will check the CDL and the output from ncgen, the missing_value attribute type is short, not long long. I'm not sure where the LL came from in the snippet in my original post. I have also discovered that I skipped a beat when I made the readScaledShort.py file that I posted before. I've attached a corrected script file. I'll also edit and correct the original post.

The problem stands.

readScaledShort.py.txt

@dnadeau4

This comment has been minimized.

Contributor

dnadeau4 commented Oct 11, 2017

It does automatic unpacking with OpenDAP and netcdfJava

@dnadeau4

This comment has been minimized.

Contributor

dnadeau4 commented Oct 11, 2017

This look fine to me. If you can improve it you can create a pull request.
https://github.com/UV-CDAT/cdms/blob/master/Lib/avariable.py#L1451-L1475

@JimBiardCics

This comment has been minimized.

JimBiardCics commented Oct 11, 2017

@dnadeau4 @doutriaux1 Have you tried the test I posted? I don't know where the problem is occurring, but it the test shows it clearly. The first value in the array dumped by the little python script should be 0.0. It's 0.01. The value is masked, but the value is wrong. The particular function you referenced looks OK to me too, but the final result from CDMS2 is wrong.

@JimBiardCics

This comment has been minimized.

JimBiardCics commented Oct 11, 2017

So, I decided to chase this a bit. I found that the error occurs in this line

result = scale_factor * ar + add_offset

https://github.com/UV-CDAT/cdms/blob/master/Lib/avariable.py#L1465

The masked value (for reasons unclear to me) is set to the value of scale_factor instead of being multiplied by it. I examined the result of scale_factor * ar and verified that the masked value was set to scale_factor rather than multiplied by it. The add_offset value is not added to the masked value, and that leaves us in the spot where the masked value is now wrong. This smells like a numpy bug, but it's easy enough to work around.

The output from the internal CMDS2 decode method should either properly scale the values masked by the missing_value or leave them untouched. The netCDF4 packages leaves them untouched, so the original -27300 short becomes a -27300.0 float.

@dnadeau4

This comment has been minimized.

Contributor

dnadeau4 commented Oct 11, 2017

This is a numpy problem. The arithmetic with masked array is different than ndarray.

(scale_factor*ar+add_offset)[0,0,0]
masked
(scale_factor*ar+add_offset).data[0,0,0]
0.01
type(ar) 
<class 'numpy.ma.core.MaskedArray'>
(scale_factor*ar.data[:]+add_offset)[0,0,0]
0.0
type(ar.data) 
<type 'numpy.ndarray'>

Since the result is masked, CDMS2 is returning the right answer.

ar.mask[0,0,0]=False
(scale_factor*ar+add_offset)
masked_array(data =
 [[[0.0 3.5 7.0 10.5 14.0 17.5 21.0 24.5 28.0 31.5]
  [35.0 38.5 42.0 45.5 49.0 52.5 56.0 59.5 63.0 66.5]
  [70.0 73.5 77.0 80.5 84.0 87.5 91.0 94.5 98.0 101.5]
  [105.0 108.5 112.0 115.5 119.0 122.5 126.0 129.5 133.0 136.5]
  [140.0 143.5 147.0 150.5 154.0 157.5 161.0 164.5 168.0 171.5]
  [175.0 178.5 182.0 185.5 189.0 192.5 196.0 199.5 203.0 206.5]
  [210.0 213.5 217.0 220.5 224.0 227.5 231.0 234.5 238.0 241.5]
  [245.0 248.5 252.0 255.5 259.0 262.5 266.0 269.5 273.0 276.5]
  [280.0 283.5 287.0 290.5 294.0 297.5 301.0 304.5 308.0 311.5]
  [315.0 318.5 322.0 325.5 329.0 332.5 336.0 339.5 343.0 346.5]]],
             mask =
 [[[False False False False False False False False False False]
  [False False False False False False False False False False]
  [False False False False False False False False False False]
  [False False False False False False False False False False]
  [False False False False False False False False False False]
  [False False False False False False False False False False]
  [False False False False False False False False False False]
  [False False False False False False False False False False]
  [False False False False False False False False False False]
  [False False False False False False False False False False]]],
       fill_value = -27300)

I fixed it for 3.0.

@JimBiardCics

This comment has been minimized.

JimBiardCics commented Oct 12, 2017

Thanks!

@dnadeau4 dnadeau4 closed this Oct 12, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment