Skip to content

Refactor min/max value update in Parquet statistics#9120

Merged
alamb merged 2 commits intoapache:mainfrom
Weijun-H:refactor-stat
Feb 5, 2024
Merged

Refactor min/max value update in Parquet statistics#9120
alamb merged 2 commits intoapache:mainfrom
Weijun-H:refactor-stat

Conversation

@Weijun-H
Copy link
Copy Markdown
Member

@Weijun-H Weijun-H commented Feb 4, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions Bot added the core Core DataFusion crate label Feb 4, 2024
Copy link
Copy Markdown
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactoring 👍

Left some tips to see if it's possible to simplify it even further

{
if let Some(max_value) = &mut max_values[i] {
max_value
.update_batch(&[Arc::new(Float64Array::from(vec![Some(*s.max())]))])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.update_batch(&[Arc::new(Float64Array::from(vec![Some(*s.max())]))])
.update_batch(&[Arc::new(Float64Array::from(vec![*s.max()]))])

I think this also works? Goes for the other branches too

return;
}
ParquetStatistics::Boolean(s)
if DataType::Boolean == *fields[i].data_type() && s.has_min_max_set() =>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to pull out the s.has_min_max_set() check from the branches to a once off check at the start of the function? I think ParquetStatistics has an equivalent has_min_max_set() 🤔

i: usize,
stat: &ParquetStatistics,
) {
if !stat.has_min_max_set() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@alamb alamb merged commit 35f481d into apache:main Feb 5, 2024
@alamb
Copy link
Copy Markdown
Contributor

alamb commented Feb 5, 2024

That is a really nice cleanup -- thank you @Weijun-H and @Jefffrey

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants