-
Notifications
You must be signed in to change notification settings - Fork 1.8k
feat(parquet): Implement scan_efficiency_ratio metric for parquet reading
#18577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
80e79ea
212b124
9819401
9375ad8
d4231f8
62935c8
25ae3c1
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,7 +20,7 @@ | |
| use std::{borrow::Cow, sync::Arc}; | ||
|
|
||
| use crate::metrics::{ | ||
| value::{PruningMetrics, RatioMetrics}, | ||
| value::{PruningMetrics, RatioMergeStrategy, RatioMetrics}, | ||
| MetricType, | ||
| }; | ||
|
|
||
|
|
@@ -275,7 +275,17 @@ impl<'a> MetricBuilder<'a> { | |
| name: impl Into<Cow<'static, str>>, | ||
| partition: usize, | ||
| ) -> RatioMetrics { | ||
| let ratio_metrics = RatioMetrics::new(); | ||
| self.ratio_metrics_with_strategy(name, partition, RatioMergeStrategy::default()) | ||
| } | ||
|
|
||
| /// Consumes self and creates a new [`RatioMetrics`] with a specific merge strategy | ||
| pub fn ratio_metrics_with_strategy( | ||
| self, | ||
| name: impl Into<Cow<'static, str>>, | ||
| partition: usize, | ||
| merge_strategy: RatioMergeStrategy, | ||
| ) -> RatioMetrics { | ||
| let ratio_metrics = RatioMetrics::new().with_merge_strategy(merge_strategy); | ||
| self.with_partition(partition).build(MetricValue::Ratio { | ||
| name: name.into(), | ||
| ratio_metrics: ratio_metrics.clone(), | ||
|
Comment on lines
+282
to
291
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Originally, I tried but found the merge strategy wasn't working since the |
||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure, would this technically be considered a breaking change? This struct is only constructed manually throughout the crate (no constructor).
ParquetFileMetricsis in thisParquetFileReaderstruct, butParquetFileReaderdidn't have a way to access the total file size, since thefile_sizefield inParquetObjectReaderwas private (and inside ofarrow-rs. This is the change that made the most sense to me, sinceCachedParquetFileReaderalready has this as a field.datafusion/datafusion/datasource-parquet/src/reader.rs
Lines 225 to 229 in f162fd3
Alternatively, maybe I could add a
file_size()getter to ParquetObjectReader inarrow-rs?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, but here I think a breaking change is acceptable