Improved round tripping for attrs #10275

Open
hmaarrfk opened this issue Apr 30, 2025 · 2 comments

@hmaarrfk
Contributor

Is your feature request related to a problem?

Consider the following dataset:

import xarray as xr
dataset = xr.Dataset()
dataset.attrs = {
    'empty_attribute': [],
    'attribute_of_length_1': ['one_item_only'],
    'attribute_of_length_2': ['one_item', 'two_items'],
}
dataset.to_netcdf('foo.nc')

loaded = xr.open_dataset('foo.nc')

from pprint import pprint
pprint(loaded.attrs)
{'attribute_of_length_1': 'one_item_only',
 'attribute_of_length_2': ['one_item', 'two_items'],
 'empty_attribute': array([], dtype=float64)}

  1. The attribute containing an empty list has become an empty array of dtype float64.
  2. The attribute containing a one-element list has been collapsed to a scalar string.
  3. The attribute containing a two-element list has been kept as a list of strings.

Describe the solution you'd like

A way to ensure that lists stay lists.

Does this option already exist?

Describe alternatives you've considered

Writing nice, error-prone if statements to normalize each attribute after loading.
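
For illustration, the kind of if statements I mean might look like the sketch below. `attr_as_list` is a hypothetical helper, not part of xarray, and it assumes the only shapes that come back are the three shown above (a NumPy array, a bare scalar, or a list). It cannot tell whether a scalar was originally written as a scalar or as a one-element list, which is exactly why this approach is fragile.

```python
def attr_as_list(value):
    """Coerce an attribute value read back from a file into a plain list.

    Hypothetical helper for illustration only; not an xarray API.
    """
    # NumPy arrays and NumPy scalars expose .tolist(); an empty array
    # becomes [], and a NumPy scalar becomes a plain Python scalar.
    if hasattr(value, "tolist"):
        value = value.tolist()
    # Sequences that are already list-like are copied into a list.
    if isinstance(value, (list, tuple)):
        return list(value)
    # Anything else (for example a str that used to be a one-element list)
    # is wrapped in a list. A str is deliberately not split into characters.
    return [value]
```

Applied to the loaded attributes, this would be something like `{k: attr_as_list(v) for k, v in loaded.attrs.items()}`, with the caveat that attributes that were genuinely scalar get wrapped as well.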

Additional context

I tried to look through https://github.com/pydata/xarray/blob/main/xarray/tests/test_dataset.py and didn't find anything that was testing this special behavior.

@hmaarrfk
Contributor Author

With the default engine:

$ h5dump foo.nc
HDF5 "foo.nc" {
GROUP "/" {
   ATTRIBUTE "_NCProperties" {
      DATATYPE  H5T_STRING {
         STRSIZE 34;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "version=2,netcdf=4.9.2,hdf5=1.14.3"
      }
   }
   ATTRIBUTE "attribute_of_length_1" {
      DATATYPE  H5T_STRING {
         STRSIZE 13;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "one_item_only"
      }
   }
   ATTRIBUTE "attribute_of_length_2" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): "one_item", "two_items"
      }
   }
   ATTRIBUTE "empty_attribute" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  NULL
      DATA {
      }
   }
}
}

With the engine specified as h5netcdf:

$ h5dump foo_h5netcdf.nc
HDF5 "foo_h5netcdf.nc" {
GROUP "/" {
   ATTRIBUTE "_NCProperties" {
      DATATYPE  H5T_STRING {
         STRSIZE 48;
         STRPAD H5T_STR_NULLPAD;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "version=2,h5netcdf=1.6.1,hdf5=1.14.3,h5py=3.13.0"
      }
   }
   ATTRIBUTE "attribute_of_length_1" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
      DATA {
      (0): "one_item_only"
      }
   }
   ATTRIBUTE "attribute_of_length_2" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): "one_item", "two_items"
      }
   }
   ATTRIBUTE "empty_attribute" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SIMPLE { ( 0 ) / ( 0 ) }
      DATA {
      }
   }
}
}

However, even though the two files differ on disk (DATASPACE SCALAR vs. DATASPACE SIMPLE for the one-element attribute), that difference doesn't affect how the data is read in.

@hmaarrfk
Contributor Author

hmaarrfk commented Jun 2, 2025

Not sure if anybody has thoughts on this.
