Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unpinning netCDF4 version causes failure #988

Open
lsetiawan opened this issue Mar 15, 2023 · 2 comments
Open

Unpinning netCDF4 version causes failure #988

lsetiawan opened this issue Mar 15, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@lsetiawan
Copy link
Member

lsetiawan commented Mar 15, 2023

Overview

This issue brings back #801. Currently we have a pin on netCDF4, which doesn't bring in the latest netCDF4 version 1.6.3. After digging into this some more it seems like this is an issue with us using zlib as mentioned in the other PR. However, now netCDF4 actually allows for compression attribute to be used, see https://unidata.github.io/netcdf4-python/#Dataset.createVariable. But, the package we depend on, xarray doesn't currently have this option as mentioned in pydata/xarray#7388. Someone has put in a PR for this fix 3 weeks ago (pydata/xarray#7551), but it's still active, so at some point, we can potentially unpin and use the zstd compression like we are using with zarr as seen in

"zarr": {
"float": {"compressor": zarr.Blosc(cname="zstd", clevel=3, shuffle=2)},
"int": {"compressor": zarr.Blosc(cname="lz4", clevel=5, shuffle=1, blocksize=0)},
"string": {"compressor": zarr.Blosc(cname="lz4", clevel=5, shuffle=1, blocksize=0)},
"time": {"compressor": zarr.Blosc(cname="lz4", clevel=5, shuffle=1, blocksize=0)},
},
.

@lsetiawan lsetiawan added the bug Something isn't working label Mar 15, 2023
@lsetiawan lsetiawan changed the title Unpin netCDF4 version causes failure Unpinning netCDF4 version causes failure Mar 16, 2023
@emiliom
Copy link
Collaborator

emiliom commented Aug 2, 2023

Currently, tests/convert/test_convert_source_target_locs.py::test_convert_ek fails when running with the "netcdf4" engine and on a conda environment where netCDF4 was installed from conda-forge. It leads to a segmentation fault error. The error doesn't occur when netCDF4 is install via pip install. In both cases, that's with netCDF4 pinned to <1.6.

The segmentation doesn't occur writing to netcdf, though. It happens when reading the netcdf file created by the test. More specifically, it occurs when reading two of the netcdf groups: "Sonar/Beam_group1" and "Vendor_specific":

ds = xr.open_dataset('Summer2017-D20170615-T190214.nc', engine='netcdf4', group='Sonar/Beam_group1')
Segmentation fault (core dumped)

In both of these groups, channel is a coordinate variable, and it's set as a vlen string. It appears that the xarray vlen issue happens specifically with compressed vlen string variables that are coordinate variables; see NCAS-CMS/cfdm#199.

In utils/coding.py, we set the default zlib to True for netcdf4. For zarr we're able to handle data-type-specific encodings, but netcdf4 doesn't support that approach. So, if we set zlib to False, all variables, all types, will be uncompressed.

I think a possible solution could be setting channel to be a fixed-length string type using an arbitrarily large string length.

@emiliom
Copy link
Collaborator

emiliom commented Aug 12, 2023

After digging into this some more it seems like this is an issue with us using zlib as mentioned in the other PR.

This has been addressed in #1112. Now zlib is set to False for string variables.

However, now netCDF4 actually allows for compression attribute to be used, see https://unidata.github.io/netcdf4-python/#Dataset.createVariable. But, the package we depend on, xarray doesn't currently have this option as mentioned in pydata/xarray#7388. Someone has put in a PR for this fix 3 weeks ago (pydata/xarray#7551), but it's still active, so at some point, we can potentially unpin and use the zstd compression like we are using with zarr

PR #1112 doesn't introduce the use of a compression attribute for netCDF4. The core issue was the failure with variable-length string variables when compression via zlib was set to True. That has been fixed by setting zlib to False for all string variables. The use of a compression attribute per se is left for future consideration as an enhancement (not a bug fix -- that's already done!), once it's implemented in xarray.

@emiliom emiliom modified the milestone: 0.8.0 Sep 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: In Progress
Development

No branches or pull requests

2 participants