fix: Avoid panicing when stats are not available for a file group split#23277
fix: Avoid panicing when stats are not available for a file group split#23277mkleen wants to merge 6 commits into
Conversation
f8c0e85 to
cba5503
Compare
56c9cef to
3490ba9
Compare
notfilippo
left a comment
There was a problem hiding this comment.
Thanks! We are seeing this issue as well in our DF54 upgrade. The fix matches our patch downstream.
3490ba9 to
32f88d6
Compare
|
@comphead @geoffreyclaude @notfilippo Thanks for the review! I added all the feedback from you but i am having troubles to reproduce it in a plain slt. I will give it another try. The problem is the data creation. |
| { | ||
| Ok((partition_value.clone(), partition_value.clone())) | ||
| } else { | ||
| Err(plan_datafusion_err!("statistics not found")) |
There was a problem hiding this comment.
should this be an internal error (rather than a planning error)?
Also it might be nice to print out the requested statistics value (partition_value( and the toal number of columns in the error message to help debug issues
Err(plan_datafusion_err!("statistics not found for partition {partition_value}, expected at most {}", s.column_statistics.len() - 1))
Added as candidate for backport to 54.1.0 @notfilippo do you have other issues you have seen during upgrade that you think we should backport? |
@alamb thanks for the question! As of now nothing else surfaced. If something else comes up I'll update the backport issue thread or I'll create a new issue. |
e576273 to
6083050
Compare
6083050 to
279342b
Compare
Which issue does this PR close?
Rationale for this change
The query from the issue:
panics:
The underlying issue is that the current code panics when files are split by statistics and there are no statistics available for the column where the sort order is defined in this case
computed_bucket.What changes are included in this PR?
MinMaxStatisticsto check if there are stats available for a given columnAre these changes tested?
Yes
Are there any user-facing changes?
No