RuntimeError when writing to a file with to_netcdf #2079

Closed
jwnki opened this issue Jul 28, 2022 · 4 comments · Fixed by #2129
jwnki commented Jul 28, 2022

Describe the bug

Trying to write an InferenceData object to a file using to_netcdf results in an error:
RuntimeError: NetCDF: Filter error: bad id or parameters or duplicate filter

To Reproduce

import arviz as az
data = az.load_arviz_data("centered_eight")
data.to_netcdf("test.nc")
Stacktrace
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [5], in <cell line: 1>()
----> 1 data.to_netcdf("test.nc")

File ~/.local/lib/python3.9/site-packages/arviz/data/inference_data.py:427, in InferenceData.to_netcdf(self, filename, compress, groups)
    425 if compress:
    426     kwargs["encoding"] = {var_name: {"zlib": True} for var_name in data.variables}
--> 427 data.to_netcdf(filename, mode=mode, group=group, **kwargs)
    428 data.close()
    429 mode = "a"

File ~/.local/lib/python3.9/site-packages/xarray/core/dataset.py:1901, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1898     encoding = {}
   1899 from ..backends.api import to_netcdf
-> 1901 return to_netcdf(
   1902     self,
   1903     path,
   1904     mode,
   1905     format=format,
   1906     group=group,
   1907     engine=engine,
   1908     encoding=encoding,
   1909     unlimited_dims=unlimited_dims,
   1910     compute=compute,
   1911     invalid_netcdf=invalid_netcdf,
   1912 )

File ~/.local/lib/python3.9/site-packages/xarray/backends/api.py:1072, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1067 # TODO: figure out how to refactor this logic (here and in save_mfdataset)
   1068 # to avoid this mess of conditionals
   1069 try:
   1070     # TODO: allow this work (setting up the file for writing array data)
   1071     # to be parallelized with dask
-> 1072     dump_to_store(
   1073         dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1074     )
   1075     if autoclose:
   1076         store.close()

File ~/.local/lib/python3.9/site-packages/xarray/backends/api.py:1119, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1116 if encoder:
   1117     variables, attrs = encoder(variables, attrs)
-> 1119 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File ~/.local/lib/python3.9/site-packages/xarray/backends/common.py:265, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    263 self.set_attributes(attributes)
    264 self.set_dimensions(variables, unlimited_dims=unlimited_dims)
--> 265 self.set_variables(
    266     variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
    267 )

File ~/.local/lib/python3.9/site-packages/xarray/backends/common.py:303, in AbstractWritableDataStore.set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
    301 name = _encode_variable_name(vn)
    302 check = vn in check_encoding_set
--> 303 target, source = self.prepare_variable(
    304     name, v, check, unlimited_dims=unlimited_dims
    305 )
    307 writer.add(source, target)

File ~/.local/lib/python3.9/site-packages/xarray/backends/netCDF4_.py:488, in NetCDF4DataStore.prepare_variable(self, name, variable, check_encoding, unlimited_dims)
    486     nc4_var = self.ds.variables[name]
    487 else:
--> 488     nc4_var = self.ds.createVariable(
    489         varname=name,
    490         datatype=datatype,
    491         dimensions=variable.dims,
    492         zlib=encoding.get("zlib", False),
    493         complevel=encoding.get("complevel", 4),
    494         shuffle=encoding.get("shuffle", True),
    495         fletcher32=encoding.get("fletcher32", False),
    496         contiguous=encoding.get("contiguous", False),
    497         chunksizes=encoding.get("chunksizes"),
    498         endian="native",
    499         least_significant_digit=encoding.get("least_significant_digit"),
    500         fill_value=fill_value,
    501     )
    503 nc4_var.setncatts(attrs)
    505 target = NetCDF4ArrayWrapper(name, self)

File src/netCDF4/_netCDF4.pyx:2838, in netCDF4._netCDF4.Dataset.createVariable()

File src/netCDF4/_netCDF4.pyx:4003, in netCDF4._netCDF4.Variable.__init__()

File src/netCDF4/_netCDF4.pyx:1965, in netCDF4._netCDF4._ensure_nc_success()

RuntimeError: NetCDF: Filter error: bad id or parameters or duplicate filter

Expected behavior
The data set gets written to the file.

Additional context
arviz version: 0.12.1
xarray version: 2022.3.0
netCDF4 version: 1.6.0

This was on a computer cluster which runs Debian GNU/Linux 10 (buster).
The file actually gets created, but reading it back shows that it is corrupted:

In [2]: aa = az.from_netcdf("test.nc")

In [3]: aa
Out[3]: 
Inference data with groups:
	> 
OriolAbril (Member) commented

We will need to look into this. It is somehow related to netcdf 1.6, but I have no idea why all the tests that have already run on netcdf 1.6 continue to pass (one of the things they check is an idata-netcdf-idata round trip). For now it looks like you will need to downgrade netcdf, or figure out what exactly the centered_eight example has that triggers this error.

Note: I have done some quick tests. Out of the netcdf examples provided with az.load_arviz_data, the glycan, regression, and classification examples do not trigger the error, while both schools examples, radon, and rugby do (this is from memory; there may be more files to check). A loop like the sketch below makes it easy to re-check.
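
For reference, a sketch of that quick check as a loop (the dataset names here are assumed registry keys for az.load_arviz_data; az.list_datasets() shows what is available locally):

import arviz as az

# Try writing each bundled example and report which ones hit the filter error.
for name in ("centered_eight", "non_centered_eight", "radon", "rugby"):
    data = az.load_arviz_data(name)
    try:
        data.to_netcdf(f"{name}.nc")
        print(f"{name}: ok")
    except RuntimeError as err:
        print(f"{name}: {err}")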

OriolAbril added this to the v0.13 milestone Jul 30, 2022

djmannion commented Jul 31, 2022

I have run into this error as well. It seems to occur when there are data or coordinates of an object or str dtype and compression is enabled (encoding["zlib"] is True). I think the "centered_eight" example above should work with data.to_netcdf("test.nc", compress=False).

This issue in netcdf4-python looks to be related and suggests that compression isn't supported for this sort of data.
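
A minimal sketch of that workaround, assuming the same centered_eight example from the report (compress=False skips the zlib encoding that triggers the filter error):

import arviz as az

data = az.load_arviz_data("centered_eight")
# Disable compression so object/str variables are written without the
# unsupported zlib filter.
data.to_netcdf("test.nc", compress=False)

# Read the file back to confirm the groups survived the round trip.
print(az.from_netcdf("test.nc"))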

ahartikainen (Contributor) commented

I wonder if we could add checks for these?
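
One possible shape for such a check, as a hypothetical sketch (compressible_encoding is an illustrative helper, not necessarily the fix that landed in #2129): request zlib only for dtypes that netCDF filters can handle, skipping object/str variables.

import xarray as xr

def compressible_encoding(dataset: xr.Dataset) -> dict:
    # Hypothetical helper: build a to_netcdf() encoding dict that enables
    # zlib only for fixed-width dtypes, skipping object ("O") and str ("U")
    # variables, whose variable-length netCDF storage cannot be filtered.
    return {
        name: {"zlib": var.dtype.kind not in ("O", "U")}
        for name, var in dataset.variables.items()
    }

This would replace the unconditional {"zlib": True} encoding that InferenceData.to_netcdf currently builds (visible in the stack trace above).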

twiecki (Contributor) commented Oct 5, 2022

Anyone able to take this on?
