[C++] Importing an extension type without ARROW:extension:metadata crashes #41741
Comments
A reproducer just with pyarrow as well (using your registered ext type example):

    import pyarrow as pa

    class DummyArray:
        def __init__(self):
            # Field carries only the extension name, no ARROW:extension:metadata
            self._field = pa.field("", pa.null(), metadata={"ARROW:extension:name": "arrow.test"})
            self._array = pa.array([None, None], pa.null())

        def __arrow_c_array__(self, requested_schema=None):
            return self._field.__arrow_c_schema__(), self._array.__arrow_c_array__()[1]

    pa.array(DummyArray())

which crashes, and when adding
Now, apart from your fix (thanks for that!), I realize from the example that it is currently impossible to register an extension type without metadata (at least from Python). I don't know if we should fix that (e.g. allow |
We should probably also check that this works in other places that import extension types, like reading IPC. |
For IPC it seems to work:

    table = pa.Table.from_arrays(
        [pa.array([None, None], pa.null())],
        schema=pa.schema([pa.field("a", pa.null(), metadata={"ARROW:extension:name": "arrow.test"})])
    )
    with pa.ipc.new_file("/tmp/test_ext_no_meta.arrow", table.schema) as f:
        f.write(table)

    with pa.ipc.open_file("/tmp/test_ext_no_meta.arrow") as f:
        result = f.read_all()
    print(result.schema)

    pa.register_extension_type(DummyExtType())
    with pa.ipc.open_file("/tmp/test_ext_no_meta.arrow") as f:
        result = f.read_all()
    print(result.schema)

And the same for write/read Parquet (where we get the extension type from the ARROW:schema) |
Good call! It looks like the version of this that IPC and Parquet (which I believe uses the IPC Schema encoding) use is here and has more or less the same logic. arrow/cpp/src/arrow/ipc/metadata_internal.cc Lines 871 to 892 in 065a6da
(Also good call on the registration not mattering!) |
…ttempting to delete it (#41763)

### Rationale for this change

Neither Schema.fbs nor the Arrow C Data interface nor the columnar specification indicates that the ARROW:extension:metadata key must be present; however, the `ImportType()` implementation assumes that `ARROW:extension:name` and `ARROW:extension:metadata` are both present and throws an exception if `ARROW:extension:metadata` is missing. This causes pyarrow to crash (see issue for reproducer).

### What changes are included in this PR?

This PR checks that the extension metadata is present before attempting to delete it.

### Are these changes tested?

Yes (test added).

### Are there any user-facing changes?

No.

* GitHub Issue: #41741

Authored-by: Dewey Dunnington <dewey@voltrondata.com>
Signed-off-by: Dewey Dunnington <dewey@voltrondata.com>
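To illustrate the idea behind the fix, here is a minimal pure-Python sketch of "check that the key is present before removing it". It is not the actual C++ code in `ImportType()`; the function name `split_extension_metadata` is hypothetical, and treating a missing `ARROW:extension:metadata` as an empty serialization is an assumption made for this example.

```python
EXT_NAME_KEY = "ARROW:extension:name"
EXT_METADATA_KEY = "ARROW:extension:metadata"


def split_extension_metadata(field_metadata):
    """Hypothetical helper: separate extension keys from field metadata.

    Returns (ext_name, ext_serialized, remaining_metadata).
    """
    remaining = dict(field_metadata)  # copy so the caller's dict is untouched

    ext_name = remaining.pop(EXT_NAME_KEY, None)
    if ext_name is None:
        # Not an extension field: leave all metadata as-is.
        return None, None, remaining

    # Only remove the metadata key if it is present. An unconditional
    # `del remaining[EXT_METADATA_KEY]` would raise KeyError here.
    # Assumption: a missing key is treated as an empty serialization.
    ext_serialized = remaining.pop(EXT_METADATA_KEY, "")
    return ext_name, ext_serialized, remaining


# Field with a name but no ARROW:extension:metadata, as in the reproducer.
name_only = {EXT_NAME_KEY: "arrow.test", "other": "kept"}
print(split_extension_metadata(name_only))
```

With this shape, a field that carries only `ARROW:extension:name` is handled the same way as one that carries both keys, and unrelated metadata is passed through unchanged.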
Issue resolved by pull request 41763 |
Describe the bug, including details regarding any error messages, version, and platform.
I ran into this when running tests in geoarrow-c, which includes an implementation of the GeoArrow extension types for Arrow C++ ( geoarrow/geoarrow-c#94 ). The sequence of events that triggers this is:
- ARROW:extension:name and ARROW:extension:metadata
- ImportType() from the C data interface with an extension type that only contains ARROW:extension:name.

The backtrace from C++ that throws the exception is:
Reproducer in Python:
Component(s)
C++