[C++] Comparison kernels crashing for string array with null string scalar #18369
Joris Van den Bossche / @jorisvandenbossche: >>> pc.not_equal(pa.array(["a", None, "b"]), pa.scalar(None, type="string", from_pandas=True))
[Thread 0x7fbdb87fc700 (LWP 11192) exited]
[Thread 0x7fbdb7ffb700 (LWP 11193) exited]
[Thread 0x7fbdbaffd700 (LWP 11191) exited]
[Thread 0x7fbdbf7fe700 (LWP 11190) exited]
[Thread 0x7fbdbffff700 (LWP 11189) exited]
[Thread 0x7fbdc8fbc700 (LWP 11188) exited]
[Thread 0x7fbdc97bd700 (LWP 11187) exited]
[Detaching after fork from child process 11201]
[Detaching after fork from child process 11206]
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fbdce798223 in arrow::Buffer::operator nonstd::sv_lite::basic_string_view<char, std::char_traits<char> > (this=0x0) at ../src/arrow/buffer.h:175
175 return util::string_view(reinterpret_cast<const char*>(data_), size_);
(gdb) bt
#0 0x00007fbdce798223 in arrow::Buffer::operator nonstd::sv_lite::basic_string_view<char, std::char_traits<char> > (this=0x0) at ../src/arrow/buffer.h:175
#1 0x00007fbdcebc9511 in arrow::compute::internal::UnboxScalar<arrow::BinaryType, void>::Unbox (val=...) at ../src/arrow/compute/kernels/codegen_internal.h:275
#2 0x00007fbdcec625e6 in arrow::compute::internal::applicator::ScalarBinary<arrow::BooleanType, arrow::BinaryType, arrow::BinaryType, arrow::compute::internal::(anonymous namespace)::NotEqual>::ArrayScalar (
ctx=0x7fff74cc7620, arg0=..., arg1=..., out=0x7fff74cc7420) at ../src/arrow/compute/kernels/codegen_internal.h:697
#3 0x00007fbdcec58c5e in arrow::compute::internal::applicator::ScalarBinary<arrow::BooleanType, arrow::BinaryType, arrow::BinaryType, arrow::compute::internal::(anonymous namespace)::NotEqual>::Exec (
ctx=0x7fff74cc7620, batch=..., out=0x7fff74cc7420) at ../src/arrow/compute/kernels/codegen_internal.h:727
#4 0x00007fbdceb66e31 in std::_Function_handler<void (arrow::compute::KernelContext*, arrow::compute::ExecBatch const&, arrow::Datum*), void (*)(arrow::compute::KernelContext*, arrow::compute::ExecBatch const&, arrow::Datum*)>::_M_invoke(std::_Any_data const&, arrow::compute::KernelContext*&&, arrow::compute::ExecBatch const&, arrow::Datum*&&) (__functor=..., __args#0=@0x7fff74cc73a0: 0x7fff74cc7620, __args#1=...,
__args#2=@0x7fff74cc7390: 0x7fff74cc7420) at /home/joris/miniconda3/envs/arrow-dev/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/std_function.h:316
#5 0x00007fbdceabda82 in std::function<void (arrow::compute::KernelContext*, arrow::compute::ExecBatch const&, arrow::Datum*)>::operator()(arrow::compute::KernelContext*, arrow::compute::ExecBatch const&, arrow::Datum*) const (this=0x55dacf413718, __args#0=0x7fff74cc7620, __args#1=..., __args#2=0x7fff74cc7420)
at /home/joris/miniconda3/envs/arrow-dev/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/std_function.h:706
#6 0x00007fbdceab9458 in arrow::compute::detail::(anonymous namespace)::ScalarExecutor::ExecuteBatch (this=0x55dacf712020, batch=..., listener=0x55dacf772a30) at ../src/arrow/compute/exec.cc:578
#7 0x00007fbdceab8afa in arrow::compute::detail::(anonymous namespace)::ScalarExecutor::Execute (this=0x55dacf712020, args=..., listener=0x55dacf772a30) at ../src/arrow/compute/exec.cc:516
#8 0x00007fbdceac8b7f in arrow::compute::Function::Execute (this=0x55dacf411640, args=..., options=0x0, ctx=0x7fff74cc7850) at ../src/arrow/compute/function.cc:146
#9 0x00007fbdb360749c in __pyx_pf_7pyarrow_8_compute_8Function_6call(__pyx_obj_7pyarrow_8_compute_Function*, _object*, __pyx_obj_7pyarrow_8_compute_FunctionOptions*, __pyx_obj_7pyarrow_3lib_MemoryPool*) [clone .isra.501] () from /home/joris/scipy/repos/arrow/python/pyarrow/_compute.cpython-37m-x86_64-linux-gnu.so
Kirill Lykov / @KirillLykov: I don't see C++ tests for this use case.
Let me know if I'm looking in the wrong place. I also think it makes sense to add a test in pyarrow, something similar to arrow/python/pyarrow/tests/test_compute.py Line 769 in 64f9b3f
The problem is that the scalar is invalid. To fix the bug, I guess some additional checks should be added to https://github.com/apache/arrow/blame/ca685a0c08bb41f43a80e5605e4cc8f9efb77cca/cpp/src/arrow/compute/kernels/codegen_internal.h#L273
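The idea of checking validity before unboxing can be sketched in plain Python. This is only a toy model, not Arrow's code: `StringScalar`, `unbox`, and `not_equal_array_scalar` are hypothetical names made up for illustration, and the real change would presumably live in the C++ kernels.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StringScalar:
    """Hypothetical stand-in for a string scalar: a value plus a validity flag."""
    value: Optional[str]  # None models the missing (null) value buffer
    is_valid: bool


def unbox(scalar: StringScalar) -> str:
    # Models unboxing: reading the buffer of a null scalar is the failure mode,
    # so this toy version raises instead of dereferencing a null pointer.
    if scalar.value is None:
        raise ValueError("cannot unbox a scalar without a value buffer")
    return scalar.value


def not_equal_array_scalar(
    values: List[Optional[str]], scalar: StringScalar
) -> List[Optional[bool]]:
    # Guard: a null scalar makes every output slot null, so it is never unboxed.
    if not scalar.is_valid:
        return [None] * len(values)
    rhs = unbox(scalar)
    # Null array slots stay null; valid slots are compared with the scalar.
    return [None if v is None else v != rhs for v in values]


null_result = not_equal_array_scalar(["a", None, "b"], StringScalar(None, False))
valid_result = not_equal_array_scalar(["a", None, "b"], StringScalar("a", True))
```

With the guard in place, the null-scalar call returns an all-null result and `unbox` is only reached for valid scalars.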
Comparing a string array with a string scalar works, but if the scalar is a null (of the proper string type), it crashes, without even any debug messages.
Reporter: Joris Van den Bossche / @jorisvandenbossche
Assignee: Kirill Lykov / @KirillLykov
PRs and other links:
Note: This issue was originally created as ARROW-10578. Please see the migration documentation for further details.