[Python] segfault when calling nbytes on empty table with dictionary field #33971

Closed
0x26res opened this issue Feb 1, 2023 · 0 comments · Fixed by #33994

Comments

0x26res (Contributor) commented Feb 1, 2023

Describe the bug, including details regarding any error messages, version, and platform.

Using pyarrow.Table.nbytes on an empty table with a dictionary field causes a segmentation fault.

```python
import pyarrow as pa

schema = pa.schema(
    [
        pa.field("foo", pa.dictionary(pa.int32(), pa.string())),
    ]
)
table = pa.table({"foo": []}, schema=schema)

print(table.nbytes)  # segmentation fault
```

A few notes:

  • `get_total_buffer_size()` works (a good workaround for now)
  • It works if the table has one or more rows (including nulls)
  • It works if the table is empty but has no dictionary fields
  • I'm using pyarrow==11.0.0 and Python 3.9.16, but it also happens in 10.0.1, so it is not a new bug

Component(s)

Python

westonpace self-assigned this Feb 2, 2023
westonpace added a commit that referenced this issue Feb 7, 2023
…#33994)

If the AdaptiveIntBuilder was empty, it would yield a null values buffer. The byte_size.h utilities did not expect this, which led to the error reported in the issue.

Example:

```
>>> pa.array([], type=pa.dictionary(pa.int32(), pa.string())).buffers()
[None, None]
```
* Closes: #33971

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
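The commit message describes the failure mode: an empty AdaptiveIntBuilder produces a null values buffer, so code that sums buffer sizes has to treat a missing buffer as zero bytes rather than assume it exists. The snippet is a minimal sketch of that defensive pattern in Python, not the actual byte_size.h code (which is C++). `total_buffer_bytes` is a hypothetical helper, and buffers are modeled as `bytes` or `None`.

```python
from typing import Optional, Sequence


def total_buffer_bytes(buffers: Sequence[Optional[bytes]]) -> int:
    """Sum buffer lengths, treating an absent (None) buffer as zero bytes.

    Hypothetical helper modeling the defensive check; not Arrow's C++ code.
    """
    total = 0
    for buf in buffers:
        if buf is None:
            # An empty AdaptiveIntBuilder may leave the values buffer null, and a
            # validity buffer is often omitted; skip instead of assuming it exists.
            continue
        total += len(buf)
    return total


# Mirrors the commit's example, where an empty dictionary array reports [None, None].
print(total_buffer_bytes([None, None]))         # 0
print(total_buffer_bytes([None, b"\x00" * 8]))  # 8
```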
westonpace added this to the 12.0.0 milestone Feb 7, 2023
sjperkins pushed a commit to sjperkins/arrow that referenced this issue Feb 10, 2023
gringasalpastor pushed a commit to gringasalpastor/arrow that referenced this issue Feb 17, 2023
fatemehp pushed a commit to fatemehp/arrow that referenced this issue Feb 24, 2023