Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-12011: [C++] Fix crashes and incorrect results when printing extreme date values #10988

Closed
wants to merge 4 commits into from

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Aug 24, 2021

The arrow_vendored::date library represents year numbers as a C short
and may silently wraparound its value
(but also throw an exception if the year has the value -32768).

@github-actions
Copy link

@pitrou
Copy link
Member Author

pitrou commented Aug 30, 2021

@jorisvandenbossche Would you like to review some C++ code?

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits, otherwise this looks good

[&](int64_t i) { FormatDateTime(type->unit(), "%T", data[i], false); });
WriteValues(array, [&](int64_t i) {
const int32_t v = data[i];
if (v >= -12687428 && v <= 11248737) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: please extract named constants like

Suggested change
if (v >= -12687428 && v <= 11248737) {
if (v >= kMinimumDay && v <= kMaximumDay) {

(*sink_) << arrow_vendored::date::format(fmt, Unit{value});
try {
if (add_epoch) {
(*sink_) << arrow_vendored::date::format(fmt, epoch_ + Unit{value});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The range check could possibly even just be here:

Suggested change
(*sink_) << arrow_vendored::date::format(fmt, epoch_ + Unit{value});
constexpr Unit kMin = duration_cast<Unit>(kMinDay);
constexpr Unit kMax = duration_cast<Unit>(kMaxDay);
if (Unit{value} >= kMin && Unit{value} <= kMax) {
(*sink_) << arrow_vendored::date::format(fmt, epoch_ + Unit{value});
} else {
WriteOutOfRange(value);
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... nice idea, but it unfortunately overflows for nanoseconds.

/home/antoine/arrow/dev/cpp/src/arrow/pretty_print.cc: In instantiation of 'void arrow::{anonymous}::ArrayPrinter::FormatDateTime(const char*, int64_t, bool) [with Unit = std::chrono::duration<long int, std::ratio<1, 1000000000> >; int64_t = long int]':
/home/antoine/arrow/dev/cpp/src/arrow/pretty_print.cc:505:71:   required from here
/home/antoine/arrow/dev/cpp/src/arrow/pretty_print.cc:483:59:   in 'constexpr' expansion of 'std::chrono::duration_cast<std::chrono::duration<long int, std::ratio<1, 1000000000> >, int, std::ratio<86400, 1> >(std::chrono::duration<int, std::ratio<86400, 1> >(-12687428))'
/usr/include/c++/9/chrono:200:21:   in 'constexpr' expansion of 'std::chrono::__duration_cast_impl<std::chrono::duration<long int, std::ratio<1, 1000000000> >, std::ratio<86400000000000, 1>, long int, false, true>::__cast<int, std::ratio<86400, 1> >((* & __d))'
/home/antoine/arrow/dev/cpp/src/arrow/pretty_print.cc:483:20: error: overflow in constant expression [-fpermissive]
  483 |     constexpr Unit kMin = std::chrono::duration_cast<Unit>(arrow_vendored::date::days{-12687428});
      |                    ^~~~

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it works by templating a bit differently.

@pitrou pitrou force-pushed the ARROW-12011-date-formatting branch from 4e6020b to 3917ed1 Compare August 31, 2021 09:11
@jorisvandenbossche
Copy link
Member

I didn't yet look at the code, but one note: the strftime compute kernel has the same problem (at least for timestamp type, since it doesn't yet support date32/64 formatting. Checking the limits encoded in this PR for timestamp, using one out of range crashes with strftime).

That can certainly be for another JIRA, but wondering a bit if it might be possible to share some of those checks (the printing should also work if compute is not available I suppose, so cannot directly rely on it)

@pitrou
Copy link
Member Author

pitrou commented Aug 31, 2021

…treme date values

The `arrow_vendored::date` library represents year numbers as a C short
and may silently wraparound its value
(but also throw an exception if the year has the value -32768).
@pitrou pitrou force-pushed the ARROW-12011-date-formatting branch from f7b8f97 to 4c6b981 Compare August 31, 2021 17:03
@pitrou
Copy link
Member Author

pitrou commented Sep 1, 2021

I will merge this, feel free to post any comments afterwards, though.

@pitrou pitrou closed this in d5e534a Sep 1, 2021
@pitrou pitrou deleted the ARROW-12011-date-formatting branch September 1, 2021 14:40
zeroshade pushed a commit to zeroshade/arrow that referenced this pull request Sep 12, 2021
…treme date values

The `arrow_vendored::date` library represents year numbers as a C short
and may silently wraparound its value
(but also throw an exception if the year has the value -32768).

Closes apache#10988 from pitrou/ARROW-12011-date-formatting

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
ViniciusSouzaRoque pushed a commit to s1mbi0se/arrow that referenced this pull request Oct 20, 2021
…treme date values

The `arrow_vendored::date` library represents year numbers as a C short
and may silently wraparound its value
(but also throw an exception if the year has the value -32768).

Closes apache#10988 from pitrou/ARROW-12011-date-formatting

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants