[Python] pyarrow.dataset.FileSystemDataset.take method causes Segmentation Fault #30777
Comments
Joris Van den Bossche / @jorisvandenbossche: I can reproduce this, and running it under gdb gives me the following traceback:
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fb5d0f2b8dc in std::__shared_ptr<arrow::ArrayData, (__gnu_cxx::_Lock_policy)2>::get (this=0x8)
at /home/joris/miniconda3/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/9.3.0/bits/shared_ptr_base.h:1310
1310 { return _M_ptr; }
(gdb) bt
#0 0x00007fb5d0f2b8dc in std::__shared_ptr<arrow::ArrayData, (__gnu_cxx::_Lock_policy)2>::get (this=0x8)
at /home/joris/miniconda3/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/9.3.0/bits/shared_ptr_base.h:1310
#1 0x00007fb5d0f2864b in std::__shared_ptr_access<arrow::ArrayData, (__gnu_cxx::_Lock_policy)2, false, false>::_M_get (this=0x8)
at /home/joris/miniconda3/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/9.3.0/bits/shared_ptr_base.h:1021
#2 0x00007fb5d0f25495 in std::__shared_ptr_access<arrow::ArrayData, (__gnu_cxx::_Lock_policy)2, false, false>::operator-> (this=0x8)
at /home/joris/miniconda3/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/9.3.0/bits/shared_ptr_base.h:1015
#3 0x00007fb5d0f22387 in arrow::Array::null_count (this=0x0) at ../src/arrow/array/array_base.cc:52
#4 0x00007fb5cd0cd9ea in arrow::dataset::Scanner::TakeRows (this=0x5584436ebbd0, indices=...) at ../src/arrow/dataset/scanner.cc:1036
#5 0x00007fb5cd32cfcc in __pyx_pw_7pyarrow_8_dataset_7Scanner_15take(_object*, _object*) () from /home/joris/scipy/repos/arrow/python/pyarrow/_dataset.cpython-38-x86_64-linux-gnu.so
Joris Van den Bossche / @jorisvandenbossche: This happens because we simply pass the user-passed |
Dustin Zubke: And thank you for the observation to use a pyarrow array. I have confirmed that this approach works for me without raising a Segmentation Fault. It may be helpful for future users to specify that the |
Dustin Zubke: Thanks!
Joris Van den Bossche / @jorisvandenbossche: |
Dustin Zubke: Happy New Year!
Antoine Pitrou / @pitrou: |
Whenever I try calling the pyarrow.dataset.FileSystemDataset.take method, I get a segmentation fault. I first encountered this using a proprietary dataset and recreated it with the UC Davis data below. I can successfully run the pyarrow.dataset.FileSystemDataset.to_batches method but not the take() method.
Steps to recreate:
Creating a dataset from a directory as below also results in a segfault.
Environments tried:
Environment: Ubuntu 18.04, Python 3.8.6; macOS 11.6.2, Python 3.7.5
Reporter: Dustin Zubke
Assignee: Joris Van den Bossche / @jorisvandenbossche
PRs and other links:
Note: This issue was originally created as ARROW-15286. Please see the migration documentation for further details.