pygmt.which: Errors if downloading multiple tiled grids #3170

Open
seisman opened this issue Apr 15, 2024 · 4 comments
Labels
bug (Something isn't working), upstream (Bug or missing feature of upstream core GMT)

Comments

@seisman
Member

seisman commented Apr 15, 2024

The issue was originally reported in #3148 (comment).

Here is a minimal example to reproduce the issue.

!rm -r ~/.gmt/server/earth/earth_relief/earth_relief_15s_p  # Make sure files are not downloaded
from pygmt import which
which(fname=["@N30W120.earth_relief_15s_p.nc", "@N00E000.earth_relief_15s_p.nc"], download="a")

The errors are:

File ~/OSS/gmt/pygmt/pygmt/src/which.py:67, in which(fname, **kwargs)
     62     with lib.virtualfile_out(kind="dataset") as vouttbl:
     63         lib.call_module(
     64             module="which",
     65             args=build_arg_string(kwargs, infile=fname, outfile=vouttbl),
     66         )
---> 67         paths = lib.virtualfile_to_dataset(vfname=vouttbl, output_type="strings")
     69 match paths.size:
     70     case 0:

File ~/OSS/gmt/pygmt/pygmt/clib/session.py:1940, in Session.virtualfile_to_dataset(self, vfname, output_type, column_names, dtype, index_col)
   1937 result = self.read_virtualfile(vfname, kind="dataset").contents
   1939 if output_type == "strings":  # strings output
-> 1940     return result.to_strings()
   1942 result = result.to_dataframe(
   1943     column_names=column_names, dtype=dtype, index_col=index_col
   1944 )
   1945 if output_type == "numpy":  # numpy.ndarray output

File ~/OSS/gmt/pygmt/pygmt/datatypes/dataset.py:156, in _GMT_DATASET.to_strings(self)
    154         if segment.contents.text:
    155             textvector.extend(segment.contents.text[: segment.contents.n_rows])
--> 156 return np.char.decode(textvector) if textvector else np.array([], dtype=str)

File ~/opt/miniconda/envs/pygmt/lib/python3.12/site-packages/numpy/core/defchararray.py:615, in decode(a, encoding, errors)
    572 @array_function_dispatch(_code_dispatcher)
    573 def decode(a, encoding=None, errors=None):
    574     r"""
    575     Calls ``bytes.decode`` element-wise.
    576
   (...)
    612
    613     """
    614     return _to_bytes_or_str_array(
--> 615         _vec_string(a, object_, 'decode', _clean_args(encoding, errors)))

TypeError: string operation on non-string array

The bug is most likely caused by an upstream API bug (for example, the memory that holds the text is mistakenly freed before writing), so it's not trivial to fix it.

@weiji14
Member

weiji14 commented Apr 15, 2024

What if we replaced None with an empty string '' in the to_strings() method? We could also raise a warning about the None -> '' conversion, just in case.
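A minimal sketch of that idea, written as a standalone function rather than as the actual to_strings() method. The function name decode_text_rows and the warning text are hypothetical, and it assumes the text entries arrive as a list of bytes objects in which a missing entry shows up as None:

```python
import warnings

import numpy as np


def decode_text_rows(textvector):
    """Decode a list of bytes to a string array, mapping None to ''.

    Hypothetical helper for illustration only; it is not part of PyGMT.
    """
    if not textvector:
        return np.array([], dtype=str)

    # Count the missing entries so the caller is told about the conversion.
    n_missing = sum(item is None for item in textvector)
    if n_missing:
        warnings.warn(
            f"{n_missing} text row(s) were None and were replaced with ''.",
            RuntimeWarning,
            stacklevel=2,
        )

    # Substitute empty bytes for None so that np.char.decode sees only bytes.
    cleaned = [b"" if item is None else item for item in textvector]
    return np.char.decode(cleaned)
```

Calling decode_text_rows([b"abc", None, b"def"]) would emit one RuntimeWarning and return a string array with an empty string in the middle position. Whether a warning or a silent conversion is preferable is a design choice left open here.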

@seisman
Member Author

seisman commented Apr 15, 2024

That sounds like a clever workaround.

@weiji14
Member

weiji14 commented Apr 15, 2024

Looks like np.char.decode has an 'errors' parameter that is passed to bytes.decode. Haven't tested it yet, but maybe setting np.char.decode(..., errors="replace") would work? See also https://docs.python.org/3/library/codecs.html#error-handlers

@seisman
Member Author

seisman commented Apr 16, 2024

It doesn't work as I expected:

In [1]: import numpy as np

In [2]: x = np.array([b'abc', b'def', None])

In [3]: np.char.decode(x, errors="replace")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 np.char.decode(x, errors="replace")

File ~/opt/miniconda/envs/pygmt/lib/python3.12/site-packages/numpy/core/defchararray.py:615, in decode(a, encoding, errors)
    572 @array_function_dispatch(_code_dispatcher)
    573 def decode(a, encoding=None, errors=None):
    574     r"""
    575     Calls ``bytes.decode`` element-wise.
    576 
   (...)
    612 
    613     """
    614     return _to_bytes_or_str_array(
--> 615         _vec_string(a, object_, 'decode', _clean_args(encoding, errors)))

TypeError: string operation on non-string array

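For reference, the Python documentation describes the "replace" error handler as follows: "On decoding, use � (U+FFFD, the official REPLACEMENT CHARACTER)."

Even if it worked, None would be replaced with � (U+FFFD), which is not as good as an empty string.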
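One likely reason the errors argument has no effect here is that error handlers only act on bytes that fail to decode, whereas None is not a bytes object at all. The plain-Python sketch below illustrates the difference; the helper names are hypothetical and the example does not touch NumPy or PyGMT:

```python
def decode_with_replace(raw):
    """Decode bytes as UTF-8, substituting U+FFFD for undecodable bytes."""
    return raw.decode("utf-8", errors="replace")


def has_decode_method(value):
    """Report whether a value can be decoded at all."""
    return hasattr(value, "decode")


# An invalid byte is handled by the "replace" error handler.
print(repr(decode_with_replace(b"abc\xff")))

# None has no decode method, so no error handler is ever consulted.
print(has_decode_method(b"abc"), has_decode_method(None))
```

So a None entry has to be filtered out or substituted before decoding, regardless of which error handler is chosen.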
