pygmt.which: Errors if downloading multiple tiled grids #3170

seisman · 2024-04-15T09:14:56Z

The issue was originally reported in #3148 (comment).

Here is a minimal example to reproduce the issue.

!rm -r ~/.gmt/server/earth/earth_relief/earth_relief_15s_p  # Make sure files are not downloaded
from pygmt import which
which(fname=["@N30W120.earth_relief_15s_p.nc", "@N00E000.earth_relief_15s_p.nc"], download="a")

The errors are:

File ~/OSS/gmt/pygmt/pygmt/src/which.py:67, in which(fname, **kwargs)
     62     with lib.virtualfile_out(kind="dataset") as vouttbl:
     63         lib.call_module(
     64             module="which",
     65             args=build_arg_string(kwargs, infile=fname, outfile=vouttbl),
     66         )
---> 67         paths = lib.virtualfile_to_dataset(vfname=vouttbl, output_type="strings")
     69 match paths.size:
     70     case 0:

File ~/OSS/gmt/pygmt/pygmt/clib/session.py:1940, in Session.virtualfile_to_dataset(self, vfname, output_type, column_names, dtype, index_col)
   1937 result = self.read_virtualfile(vfname, kind="dataset").contents
   1939 if output_type == "strings":  # strings output
-> 1940     return result.to_strings()
   1942 result = result.to_dataframe(
   1943     column_names=column_names, dtype=dtype, index_col=index_col
   1944 )
   1945 if output_type == "numpy":  # numpy.ndarray output

File ~/OSS/gmt/pygmt/pygmt/datatypes/dataset.py:156, in _GMT_DATASET.to_strings(self)
    154         if segment.contents.text:
    155             textvector.extend(segment.contents.text[: segment.contents.n_rows])
--> 156 return np.char.decode(textvector) if textvector else np.array([], dtype=str)

File ~/opt/miniconda/envs/pygmt/lib/python3.12/site-packages/numpy/core/defchararray.py:615, in decode(a, encoding, errors)
    572 @array_function_dispatch(_code_dispatcher)
    573 def decode(a, encoding=None, errors=None):
    574     r"""
    575     Calls ``bytes.decode`` element-wise.
    576
   (...)
    612
    613     """
    614     return _to_bytes_or_str_array(
--> 615         _vec_string(a, object_, 'decode', _clean_args(encoding, errors)))

TypeError: string operation on non-string array

The bug is most likely caused by an upstream API bug (for example, the memory that holds the text is mistakenly freed before writing), so it's not trivial to fix it.


weiji14 · 2024-04-15T21:24:25Z

What if we replaced None with an empty string '' in the to_strings() method? We could also raise a warning of the None -> '' conversion just in case.
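A minimal sketch of that idea, assuming the text arrives as a plain list of bytes objects in which some entries may be None. The helper name `decode_text` is hypothetical and used only for illustration; it is not part of PyGMT and is not necessarily how the fix was implemented.

```python
import warnings

import numpy as np


def decode_text(textvector):
    """Decode a list of bytes, converting None entries to empty strings.

    Hypothetical helper for illustration only; not a PyGMT function.
    """
    if not textvector:
        return np.array([], dtype=str)
    if any(item is None for item in textvector):
        # Warn so that the silent None -> '' conversion is visible to the user.
        warnings.warn(
            "Found None values in the text; replacing them with empty strings.",
            RuntimeWarning,
            stacklevel=2,
        )
        textvector = [item if item is not None else b"" for item in textvector]
    return np.char.decode(textvector)


result = decode_text([b"abc", b"def", None])
print(result)
```

Replacing None with `b""` before the call keeps the input a homogeneous bytes array, so `np.char.decode` never sees a non-string element.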

seisman · 2024-04-15T23:09:50Z

That sounds like a clever workaround.

weiji14 · 2024-04-15T23:16:38Z

Looks like np.char.decode has an 'errors' parameter that is passed to bytes.decode. Haven't tested it yet, but maybe setting np.char.decode(..., errors="replace") would work? See also https://docs.python.org/3/library/codecs.html#error-handlers

seisman · 2024-04-16T00:11:39Z

It doesn't work as I expected:

In [1]: import numpy as np

In [2]: x = np.array([b'abc', b'def', None])

In [3]: np.char.decode(x, errors="replace")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 np.char.decode(x, errors="replace")

File ~/opt/miniconda/envs/pygmt/lib/python3.12/site-packages/numpy/core/defchararray.py:615, in decode(a, encoding, errors)
    572 @array_function_dispatch(_code_dispatcher)
    573 def decode(a, encoding=None, errors=None):
    574     r"""
    575     Calls ``bytes.decode`` element-wise.
    576 
   (...)
    612 
    613     """
    614     return _to_bytes_or_str_array(
--> 615         _vec_string(a, object_, 'decode', _clean_args(encoding, errors)))

TypeError: string operation on non-string array

The Python documentation describes the "replace" error handler as follows: "On decoding, use � (U+FFFD, the official REPLACEMENT CHARACTER)."

Even if it worked, None would be replaced with �, which is not as good as an empty string.
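A short sketch of the likely reason, going by the "non-string array" message in the traceback: the `errors` handler only affects how undecodable bytes are treated, whereas a None entry turns the whole array into a generic object array before any decoding starts. The variable names below are illustrative.

```python
import numpy as np

# errors="replace" only changes how undecodable *bytes* are handled.
decoded = b"abc\xff".decode("utf-8", errors="replace")

# A None entry makes NumPy fall back to the generic object dtype,
# so the array is no longer a bytes (string) array.
mixed = np.array([b"abc", b"def", None])

# With an empty bytes object instead of None, the array keeps a bytes dtype.
clean = np.array([b"abc", b"def", b""])

print(repr(decoded), mixed.dtype, clean.dtype)
```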

seisman added the "bug" (Something isn't working) and "upstream" (Bug or missing feature of upstream core GMT) labels on Apr 15, 2024

seisman added a commit that referenced this issue Apr 15, 2024

CI: Reorganized the list of data files for caching as a workaround for issue #3170

e23d438

seisman mentioned this issue Apr 15, 2024

Reorganize the list of data files for caching #3171

Merged

seisman added a commit that referenced this issue Apr 15, 2024

CI: Reorganized the list of data files for caching as a workaround for issue #3170

06caf28

seisman mentioned this issue Apr 15, 2024

pygmt.which: Refactor to get rid of temporary files #3148

Merged

seisman mentioned this issue Apr 16, 2024

GMT_DATASET: Add workaround for None values in the trailing text #3174

Merged

seisman mentioned this issue Apr 24, 2024

Use unique earth_age tiles in test_dataset_to_strings_with_none_values #3200

Merged
