ARROW-7350: [Python] Decode parquet statistics as scalars #12902
Benchmark runs are scheduled for baseline = 5b2c0a0 and contender = aa641d5. aa641d5 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
And thanks for the PR @wjones127! Unfortunately, it seems this one caused some failures in one of the nightly integration builds: https://github.com/ursacomputing/crossbow/runs/6071471966?check_suite_focus=true From a quick look at the error message, this might be because we now return a datetime.date object instead of an integer for the date type?
Hmm, it sure seems like it. Should we revert this? Or maybe we expose the 'old' value under
We already have It's certainly also an option to see this as a "bug fix" (we shouldn't have used integers for the date type before) and ask kartothek to update their code. It also only affects the less commonly used date type (not timestamp), and quite likely the behaviour has now also changed for other less commonly used types that were not explicitly handled before (e.g. I don't know how the "interval" type would be handled?)
Hmm, fair. In that case we should test all the data types, just to make the behavior explicit.
FWIW I considered it a bug that date statistics returned an integer. That's why I added tests for dates and decimals. Would you like me to add tests for all other types in a new PR?
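To make the behaviour change discussed above concrete, here is a minimal standard-library sketch, not code from this PR. It relies on the fact that the Parquet DATE logical type stores days since the Unix epoch as an integer, which is what the statistics used to expose. The helper names `date_from_days` and `days_from_stat` are hypothetical, and the second one only illustrates how downstream code might tolerate both the old integer and the new `datetime.date` values.

```python
import datetime

# Parquet's DATE logical type counts days since the Unix epoch.
EPOCH = datetime.date(1970, 1, 1)


def date_from_days(days: int) -> datetime.date:
    """Convert a raw day count (old-style statistic) to a date (new-style)."""
    return EPOCH + datetime.timedelta(days=days)


def days_from_stat(stat) -> int:
    """Hypothetical downstream helper: accept either representation.

    Returns the day count whether `stat` is a plain integer (old behaviour)
    or a datetime.date (behaviour after this change).
    """
    if isinstance(stat, datetime.date):
        return (stat - EPOCH).days
    return int(stat)


old_style_min = 18262                       # integer as previously returned
new_style_min = date_from_days(old_style_min)  # date as now returned

# Both representations normalise to the same day count.
same = days_from_stat(old_style_min) == days_from_stat(new_style_min)
```

Code that compared the statistic directly against an integer would break on the new value, which is consistent with the failure suspected in the nightly build.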
Ah, I missed that you added @wjones127 would you want to open an issue with kartothek to warn them about the upcoming change?
Thanks for opening that issue! In the short term, we should also fix our nightly builds (either by temporarily disabling them altogether, or ideally by only skipping the failing tests). Opened https://issues.apache.org/jira/browse/ARROW-16262 for tracking that.
…gration
As discussed on #12902 (comment) there is an issue on the Kartothek integration: JDASoftwareGroup/kartothek#515. This PR aims to skip the failing tests.
Closes #12947 from raulcd/ARROW-16262
Authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Thanks to Joris for pointing out we had this function in C++.