@alamb alamb commented May 29, 2024

Which issue does this PR close?

re #10626

Rationale for this change

I wanted to prototype another approach that @xinlifoobar and I were discussing on #10711, and the best way to do this is via code.

What changes are included in this PR?

  1. Add iterators that convert iterators of ParquetStatistics into iterators of ValueStatistics<T>
  2. Update a few conversion routines to use the new iterators

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label May 29, 2024
/// * `$iterator_type` is the name of the iterator type (e.g. `BoolStatsIterator`)
/// * `$parquet_statistics_type` is the type of the statistics (e.g. `ParquetStatistics::Boolean`)
/// * `$stat_value_type` is the type of the statistics value (e.g. `bool`)
macro_rules! make_stats_iterator {

Here is an idea: create structs like BoolStatsIterator that return an iterator of Option<ValueStatistics<bool>> and do the validity / type checking.

let scalars = iterator
.map(|x| x.and_then(|s| get_statistic!(s, min, min_bytes, Some(data_type))));
collect_scalars(data_type, scalars)
match data_type {

Here is an example of how they work: you construct BoolStatsIterator::new(iterator), which then returns a sequence of Option<&bool>, and then the min/max can be extracted.
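
To make the idea concrete, here is a rough, self-contained sketch of the pattern, not the code in this PR. `ParquetStats` and `ValueStats` are hypothetical stand-ins for the parquet crate's statistics types, the macro is a simplified variant of `make_stats_iterator!`, and `bool_mins` is an illustrative helper. It assumes the iterator yields `Option<&ValueStats<bool>>`, with `None` for missing statistics or a mismatched type.

```rust
// Stand-in types for illustration only; the real code works with the
// statistics types from the parquet crate.
#[derive(Debug)]
pub struct ValueStats<T> {
    pub min: Option<T>,
    pub max: Option<T>,
}

#[derive(Debug)]
pub enum ParquetStats {
    Boolean(ValueStats<bool>),
    Int32(ValueStats<i32>),
}

/// Hypothetical, simplified variant of `make_stats_iterator!`.
///
/// * `$iterator_type` is the name of the generated iterator struct
/// * `$parquet_statistics_type` is the enum variant to accept
/// * `$stat_value_type` is the Rust type of the statistics value
macro_rules! make_stats_iterator {
    ($iterator_type:ident, $parquet_statistics_type:path, $stat_value_type:ty) => {
        pub struct $iterator_type<'a, I>
        where
            I: Iterator<Item = Option<&'a ParquetStats>>,
        {
            iter: I,
        }

        impl<'a, I> $iterator_type<'a, I>
        where
            I: Iterator<Item = Option<&'a ParquetStats>>,
        {
            pub fn new(iter: I) -> Self {
                Self { iter }
            }
        }

        impl<'a, I> Iterator for $iterator_type<'a, I>
        where
            I: Iterator<Item = Option<&'a ParquetStats>>,
        {
            type Item = Option<&'a ValueStats<$stat_value_type>>;

            fn next(&mut self) -> Option<Self::Item> {
                // Yield one item per input item: the typed statistics when
                // the variant matches, `None` when statistics are missing
                // or belong to a different type.
                self.iter.next().map(|stats| match stats {
                    Some($parquet_statistics_type(v)) => Some(v),
                    _ => None,
                })
            }
        }
    };
}

make_stats_iterator!(BoolStatsIterator, ParquetStats::Boolean, bool);

/// Illustrative helper: extract the minimum from each row group's statistics.
pub fn bool_mins(row_groups: &[Option<ParquetStats>]) -> Vec<Option<bool>> {
    BoolStatsIterator::new(row_groups.iter().map(|s| s.as_ref()))
        .map(|stats| stats.and_then(|v| v.min))
        .collect()
}
```

Because the type check lives inside the iterator, callers such as `bool_mins` only deal with `Option` values and never need to match on the statistics enum themselves.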
