This repository has been archived by the owner on Sep 1, 2022. It is now read-only.

Enumerations become strings when setting the ncML attribute enhance="all" #1042

Open
risquez opened this issue Feb 28, 2018 · 1 comment

Comments


risquez commented Feb 28, 2018

Please, I would like to confirm that the behavior that I observe is intended and not a bug.

I am interested in how enumerations are handled when they are defined in an ncML file and a netCDF file is then created from it.

In short:

When I define an enumeration and use enhance="all", a variable of that enumeration type becomes a string (no longer an enumeration), and I can write any value into it, not only the values defined in the enumeration. Is this intended or a bug?

This behavior looks different from the netCDF-C library approach. The netCDF-C library always works with the integer value and leaves the conversion to strings to the user/programmer. From the netCDF-C documentation:

Enums are based on any integer types.
The underlying integer type is what is stored in the file.

Long explanation:

Following the NetcdfDataset Tutorial (enhance):

When using ConvertEnums enhance mode, Variables of type enum are promoted to String types and data is automatically converted using the EnumTypedef objects, which are maps of the stored integer values to String values.
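
As a rough illustration of that mapping (a plain-Python sketch, not the netCDF-Java implementation; the names SSD_TYPE and convert_enums are hypothetical), the conversion amounts to looking up each stored integer in a map of integers to labels:

```python
# Hypothetical stand-in for the EnumTypedef used in this issue:
# stored integer value -> string label.
SSD_TYPE = {0: "500m", 1: "1km", 2: "2km"}


def convert_enums(stored_values, typedef):
    """Replace each stored integer with its label from the typedef map."""
    return [typedef[v] for v in stored_values]


labels = convert_enums([0, 2, 1], SSD_TYPE)
print(labels)
```

Once the data is exposed as strings, nothing in the string type itself remembers this map, which is the point raised in the rest of this issue.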

OK, I understand, but I was expecting that, although the variable becomes a netCDF string, the set of values for the variable would be limited to the enumeration mapping. This is not the case. Let me give an example:

I created the following ncML:

<?xml version="1.0" encoding="UTF-8"?>
<ncml:netcdf xmlns:ncml="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" enhance="all">

  <ncml:enumTypedef name="ssd_type" type="enum1">
      <ncml:enum key="0">500m</ncml:enum>
      <ncml:enum key="1">1km</ncml:enum>
      <ncml:enum key="2">2km</ncml:enum>
  </ncml:enumTypedef>

  <ncml:variable name="ssd" shape="" type="enum1" typedef="ssd_type">
  </ncml:variable>

</ncml:netcdf>

Note that I use the enhance="all" ncML attribute, and that ssd is a variable of the ssd_type enumeration data type.

I create a netCDF file using the netCDF-Java library (note: a preview of v5):

java -Xmx1g -classpath netcdfAll-5.0.0-20180211.132322-244.jar ucar.nc2.write.Nccopy --input test.ncml --output test.nc --format netcdf4

The output netCDF file that I get is this (using ncdump):

netcdf test {
types:
    byte enum ssd_type {\500m = 0, \1km = 1, \2km = 2} ;
variables:
    string ssd ;
// global attributes:
    :_CoordSysBuilder = "ucar.nc2.dataset.conv.DefaultConvention" ;
data:
    ssd = _ ;
}

Note that ssd is now a variable of string data type, as described in the tutorial.

But I was surprised to find that I can now populate the ssd variable with any data. For example, using Python I could write "wrong" into the variable (!), although that is not one of the allowed values in the enumeration (500m, 1km, 2km).

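from netCDF4 import Dataset

# Open the existing file in append mode so the variable can be modified.
ncid = Dataset('test.nc', 'a')
# ssd is now a plain string variable, so any string is accepted.
ncid.variables['ssd'][0] = 'wrong'
ncid.close()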

And now the output from ncdump is:

netcdf test {
types:
    byte enum ssd_type {\500m = 0, \1km = 1, \2km = 2} ;
variables:
    string ssd ;
// global attributes:
    :_CoordSysBuilder = "ucar.nc2.dataset.conv.DefaultConvention" ;
data:
    ssd = "wrong" ;
}

An enumerated variable becomes a string and therefore it accepts any value. Please, is this the intended behavior or a bug?

DennisHeimbigner (Contributor) commented

Yes, this is intended behavior. Restricting the value set would in effect require us to implement a type that is equivalent to enum.
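
Since the string variable does not restrict its values, a caller who wants that guarantee would have to check values before writing. A minimal sketch of such a check, assuming the labels from the ncML above (ALLOWED_SSD and checked_label are hypothetical names, not part of any netCDF library):

```python
# Labels allowed by the ssd_type enumeration defined in the ncML above.
ALLOWED_SSD = {"500m", "1km", "2km"}


def checked_label(value, allowed=ALLOWED_SSD):
    """Return value unchanged if it is an allowed label, else raise ValueError."""
    if value not in allowed:
        raise ValueError(f"{value!r} is not one of {sorted(allowed)}")
    return value


print(checked_label("1km"))
try:
    checked_label("wrong")
except ValueError as err:
    print("rejected:", err)
```

With the Python snippet from the report, the assignment could then be written as ncid.variables['ssd'][0] = checked_label('1km'), so that an invalid label fails before it reaches the file.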


2 participants