Extract `Date32` parquet statistics as `Date32Array` rather than `Int32Array` #10593
}
Some(DataType::Date64) => {
    Some(ScalarValue::Date64(Some(i64::from(*s.$func()) * 24 * 60 * 60 * 1000)))
}
_ => Some(ScalarValue::Int32(Some(*s.$func()))),
This seems to be the root cause of the bug. Is it something that needs to be addressed in a different PR, @alamb? "Catch-all" pattern-matching branches are evil.
I agree it is likely what is masking all the bugs.
I am not sure it is worth explicitly listing all the types out here, to be honest 🤔 I am hoping in some future PR to revamp how these statistics are done entirely (basically match on the arrow target type in the outer loop rather than the inner loop).
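To make the "match in the outer loop" idea concrete, here is a minimal, self-contained sketch. It does not use DataFusion's or Arrow's real types: `TargetType`, `Scalar`, and `convert_int32_stats` are hypothetical names invented for illustration. The point is only that matching once on the target type, with no `_` arm, lets the compiler flag any type that has not been handled.

```rust
/// Hypothetical stand-in for the Arrow target type (not the real `DataType`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum TargetType {
    Int32,
    Date32,
    Date64,
}

/// Hypothetical stand-in for a typed statistic value (not the real `ScalarValue`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Scalar {
    Int32(i32),
    Date32(i32),
    Date64(i64),
}

const MILLIS_PER_DAY: i64 = 24 * 60 * 60 * 1000;

/// Convert raw Int32 statistics into the requested target type.
/// The match happens once, outside the per-value loop, and has no
/// catch-all arm: adding a new `TargetType` variant without handling
/// it here becomes a compile error instead of a silent fallback.
fn convert_int32_stats(target: TargetType, raw: &[i32]) -> Vec<Scalar> {
    match target {
        TargetType::Int32 => raw.iter().map(|v| Scalar::Int32(*v)).collect(),
        // Date32 keeps the same "days since epoch" number, only the type changes.
        TargetType::Date32 => raw.iter().map(|v| Scalar::Date32(*v)).collect(),
        // Date64 is milliseconds since epoch, so days are scaled up.
        TargetType::Date64 => raw
            .iter()
            .map(|v| Scalar::Date64(i64::from(*v) * MILLIS_PER_DAY))
            .collect(),
    }
}
```

With this shape, `convert_int32_stats(TargetType::Date32, &[1, 2])` yields `Date32` values rather than falling through to `Int32`.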
Not a PMC or a full-time committer, my review is not very valuable, but looks good to me.
Added a comment to the PR outside the review to track whether we are OK with a catch-all pattern-matching statement or whether we want to track somewhere making the patterns explicit.
Your review is valuable in my opinion -- thank you @edmondop
Thank you @xinlifoobar -- this looks really nice 👌 I had only one question related to Date64, but otherwise this PR looks good to me.
This file had some merge conflicts, so I took the liberty of merging up from main and addressing the comment #10593 (comment) in 43cf435. Thank you again so much @xinlifoobar -- very much appreciated.
Thanks again @xinlifoobar
…32Array` (apache#10593)
* Fixes bug expect `Date32Array` but returns Int32Array
* Add round trip ut
* Update arrow_statistics.rs
* remove unreachable code
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Which issue does this PR close?
Closes #10587
Rationale for this change
This fixes a bug: when reading statistics for a Date32 or Date64 column from a parquet file, DataFusion currently returns an Int32 array.
What changes are included in this PR?
Adds conversions in the `get_statistic` macro from Int32 to Date32 and Date64, respectively.
Are these changes tested?
Yes
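As a rough illustration of the conversion described above and of the kind of values a test might check, here is a standalone sketch. It mirrors the shape of the match arms shown in the review (Date32 keeps the day count, Date64 multiplies by `24 * 60 * 60 * 1000`), but `ArrowTarget`, `StatValue`, and `int32_stat_to_value` are hypothetical names, not the actual `get_statistic` macro or DataFusion's `ScalarValue`.

```rust
/// Hypothetical stand-in for the requested Arrow type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArrowTarget {
    Date32,
    Date64,
    Other,
}

/// Hypothetical stand-in for a statistic value.
#[derive(Debug, Clone, Copy, PartialEq)]
enum StatValue {
    Int32(Option<i32>),
    Date32(Option<i32>),
    Date64(Option<i64>),
}

const MILLIS_PER_DAY: i64 = 24 * 60 * 60 * 1000;

/// Convert one parquet Int32 statistic (days since the Unix epoch for
/// date columns) into a value of the requested target type.
fn int32_stat_to_value(target: Option<ArrowTarget>, days: i32) -> StatValue {
    match target {
        // Same number of days, but typed as a date instead of a plain integer.
        Some(ArrowTarget::Date32) => StatValue::Date32(Some(days)),
        // Widen to i64 before multiplying so the product cannot overflow i32.
        Some(ArrowTarget::Date64) => {
            StatValue::Date64(Some(i64::from(days) * MILLIS_PER_DAY))
        }
        // Fallback kept here only to mirror the catch-all discussed in review.
        _ => StatValue::Int32(Some(days)),
    }
}
```

For example, a minimum of 19,000 days requested as Date64 becomes 1,641,600,000,000 milliseconds, while the same statistic requested as Date32 stays 19,000 but carries the date type.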
Are there any user-facing changes?