Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++][Python] Crashes and incorrect results when converting large integers to dates #27842

Closed
asfimport opened this issue Mar 18, 2021 · 5 comments

Comments

@asfimport
Copy link
Collaborator

asfimport commented Mar 18, 2021

Running this code snippet will cause a crash. This happens for a range of numbers around this one as well:

 

import pyarrow
date = pyarrow.array([-1448879500], pyarrow.date32())
print(date)

I don't know where this crash is coming from, so it might be in the C++ code rather than the Python bindings.

For other extreme numbers you get the wrong result. It looks like something is overflowing. Here is the input and result for a few different examples:

  • -2000000000 -> 31179-12-27

  • -1000000000 -> 16574-12-29

  • 2000000000 -> -27240-01-06

  • 1000000000 -> -12635-01-03

    I would prefer if these gave errors rather than silently overflowing.

     

Environment: OS: Windows 10 Pro (Version 20H2)
CPU: AMD Ryzen 5 1600 Six-Core Processor 3.20 GHz
Python: 3.8.8 AMD64
pyarrow is latest version installed with pip
Reporter: Tim Evans
Assignee: Antoine Pitrou / @pitrou

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-12011. Please see the migration documentation for further details.

@asfimport
Copy link
Collaborator Author

David Li / @lidavidm:
Thanks for the report! I can confirm this happens on the main branch (commit 43d00e9629fe34dc40c78ea96c008de186726a39).

In all cases, it's because the given date either overflows or is an invalid value for the underlying C++ date type. I'm not sure if we should disallow these values entirely, since the format (as far as I can see) says nothing about the range of valid values, and the underlying value is valid, if extreme - but at least you'd expect it to not crash when printing. I see @bkietz  and @jorisvandenbossche  have looked at similar issues before - what do you think?

Trimmed backtrace for the crash. The main issue is that the date::year_month_day value is invalid (in particular, the year is invalid, it's -32768).


(gdb) bt
#0  0x00007ffff6e54fb7 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff6e56921 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff40b3892 in __gnu_cxx::__verbose_terminate_handler () at /home/conda/feedstock_root/build_artifacts/ctng-compilers_1610729750655/work/.build/x86_64-conda-linux-gnu/src/gcc/libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007ffff40b1f69 in __cxxabiv1::__terminate (handler=<optimized out>) at /home/conda/feedstock_root/build_artifacts/ctng-compilers_1610729750655/work/.build/x86_64-conda-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:48
#4  0x00007ffff40b1fab in std::terminate () at /home/conda/feedstock_root/build_artifacts/ctng-compilers_1610729750655/work/.build/x86_64-conda-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:58
#5  0x00007ffff40b2194 in __cxxabiv1::__cxa_throw (obj=obj@entry=0x555555de36d0, tinfo=tinfo@entry=0x7ffff416d1a8 <typeinfo for std::__ios_failure>, dest=dest@entry=0x7ffff40d11d4 <std::__ios_failure::~__ios_failure()>)
    at /home/conda/feedstock_root/build_artifacts/ctng-compilers_1610729750655/work/.build/x86_64-conda-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:95
#6  0x00007ffff40af3a2 in std::__throw_ios_failure (__s=__s@entry=0x7ffff412e067 "basic_ios::clear")
    at /home/conda/feedstock_root/build_artifacts/ctng-compilers_1610729750655/work/.build/x86_64-conda-linux-gnu/src/gcc/libstdc++-v3/src/c++11/cxx11-ios_failure.cc:115
#7  0x00007ffff40eb0aa in std::basic_ios<char, std::char_traits<char> >::clear (this=<optimized out>, __state=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/ctng-compilers_1610729750655/work/.build/x86_64-conda-linux-gnu/build/build-cc-gcc-final/x86_64-conda-linux-gnu/libstdc++-v3/include/bits/ios_base.h:166
#8  0x00007ffff5289af3 in arrow_vendored::date::to_stream<char, std::char_traits<char>, std::chrono::duration<long, std::ratio<1l, 1l> > > (os=..., fmt=0x7ffff5cf063d "F", fds=..., abbrev=0x7fffffffb170, offset_sec=0x7fffffffb168)
    at /home/lidavidm/Code/upstream/arrow-12011/cpp/src/arrow/vendored/datetime/date.h:5078
#9  0x00007ffff527678c in arrow_vendored::date::to_stream<char, std::char_traits<char>, std::chrono::duration<int, std::ratio<86400l, 1l> > > (os=..., fmt=0x7ffff5cf063c "%F", tp=...)
    at /home/lidavidm/Code/upstream/arrow-12011/cpp/src/arrow/vendored/datetime/date.h:5995
#10 0x00007ffff52718f4 in arrow_vendored::date::format<char, std::chrono::time_point<std::chrono::_V2::system_clock, std::chrono::duration<int, std::ratio<86400l, 1l> > > > (fmt=0x7ffff5cf063c "%F", tp=...)
    at /home/lidavidm/Code/upstream/arrow-12011/cpp/src/arrow/vendored/datetime/date.h:6021
#11 0x00007ffff5353770 in arrow::ArrayPrinter::FormatDateTime<std::chrono::duration<int, std::ratio<86400l, 1l> > > (this=0x7fffffffb610, fmt=0x7ffff5cf063c "%F", value=-1448879500, add_epoch=true)
    at /home/lidavidm/Code/upstream/arrow-12011/cpp/src/arrow/pretty_print.cc:398
#12 0x00007ffff5350d86 in std::enable_if<std::is_base_of<arrow::DateType, arrow::NumericArray<arrow::Date32Type>::TypeClass>::value, arrow::Status>::type arrow::ArrayPrinter::WriteDataValues<arrow::NumericArray<arrow::Date32Type> >(arrow::NumericArray<arrow::Date32Type> const&)::{lambda(long)#1}::operator()(long) const (this=0x7fffffffb610, i=0) at /home/lidavidm/Code/upstream/arrow-12011/cpp/src/arrow/pretty_print.cc:170
#13 0x00007ffff535395b in arrow::ArrayPrinter::WriteValues<std::enable_if<std::is_base_of<arrow::DateType, arrow::NumericArray<arrow::Date32Type>::TypeClass>::value, arrow::Status>::type arrow::ArrayPrinter::WriteDataValues<arrow::NumericArray<arrow::Date32Type> >(arrow::NumericArray<arrow::Date32Type> const&)::{lambda(long)#1}>(arrow::Array const&, std::enable_if<std::is_base_of<arrow::DateType, arrow::NumericArray<arrow::Date32Type>::TypeClass>::value, arrow::Status>::type arrow::ArrayPrinter::WriteDataValues<arrow::NumericArray<arrow::Date32Type> >(arrow::NumericArray<arrow::Date32Type> const&)::{lambda(long)#1}&&) (this=0x7fffffffb610, array=..., func=...)
    at /home/lidavidm/Code/upstream/arrow-12011/cpp/src/arrow/pretty_print.cc:137
#14 0x00007ffff5350dd5 in arrow::ArrayPrinter::WriteDataValues<arrow::NumericArray<arrow::Date32Type> > (this=0x7fffffffb610, array=...) at /home/lidavidm/Code/upstream/arrow-12011/cpp/src/arrow/pretty_print.cc:170
#15 0x00007ffff534ee4f in arrow::ArrayPrinter::Visit<arrow::NumericArray<arrow::Date32Type> > (this=0x7fffffffb610, array=...) at /home/lidavidm/Code/upstream/arrow-12011/cpp/src/arrow/pretty_print.cc:314
#16 0x00007ffff534ccd1 in arrow::VisitArrayInline<arrow::ArrayPrinter> (array=..., visitor=0x7fffffffb610) at /home/lidavidm/Code/upstream/arrow-12011/cpp/src/arrow/visitor_inline.h:126
#17 0x00007ffff534b352 in arrow::ArrayPrinter::Print (this=0x7fffffffb610, array=...) at /home/lidavidm/Code/upstream/arrow-12011/cpp/src/arrow/pretty_print.cc:389
 

@asfimport
Copy link
Collaborator Author

David Li / @lidavidm:
There is a related issue for validating temporal data, so perhaps these out-of-bounds values should be rejected as suggested.

@asfimport
Copy link
Collaborator Author

Joris Van den Bossche / @jorisvandenbossche:

I'm not sure if we should disallow these values entirely, since the format (as far as I can see) says nothing about the range of valid values, and the underlying value is valid, if extreme - but at least you'd expect it to not crash when printing.

I agree that if the format doesn't mention anything about a valid range, we should allow that but of course ensure this doesn't crash when printing.

There is a related issue for validating temporal data, so perhaps these out-of-bounds values should be rejected as suggested.

I think that might be about the fact that for date64 type, the values should be multiples of 86400000. Are there other inherent restrictions for any of the other temporal types?

The date library that we use (vendor) for this formatting is https://github.com/HowardHinnant/date, it might possibly also be worth updating our vendored version.

@asfimport
Copy link
Collaborator Author

David Li / @lidavidm:
Ah, good point about the linked issue. I don't see any other restrictions.

It's worth seeing if an updated date helps, though I saw issues like HowardHinnant/date#436 suggesting that overflow is really the user's responsibility. (It is a little odd in that while the documentation states year::ok is always true, that's not actually the case as implemented.)

@asfimport
Copy link
Collaborator Author

Antoine Pitrou / @pitrou:
Issue resolved by pull request 10988
#10988

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants