Add metric category filtering for EXPLAIN ANALYZE#21160
Add metric category filtering for EXPLAIN ANALYZE#21160adriangb wants to merge 8 commits intoapache:mainfrom
Conversation
| /// | ||
| /// **Non-deterministic** — varies across runs even on the same hardware. | ||
| Timing, | ||
| } |
There was a problem hiding this comment.
Perhaps we can add another variant for all uncategorized metrics, so we can do
set datafusion.explain.analyze_category = 'rows, bytes, uncategorized' -- Only exclude `Timing` metrics categoryIntroduces `MetricCategory` (Rows, Bytes, Timing) so that EXPLAIN ANALYZE output can be narrowed to only deterministic metrics, which is especially useful in sqllogictest (.slt) files where timing values would otherwise require `<slt:ignore>` markers everywhere. Each `Metric` now optionally declares a category via `MetricBuilder::with_category()`. Well-known builder methods (`output_rows`, `elapsed_compute`, …) set the category automatically; custom counters/gauges default to "always included". A new session config `datafusion.explain.analyze_categories` accepts `all` (default), `none`, or a comma-separated list of `rows`, `bytes`, `timing` to control which categories appear. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use a parquet table with multiple row groups and a TopK ORDER BY LIMIT query that triggers DynamicFilter pushdown. This makes the slt examples much more realistic — they show pruning metrics, row group statistics, and the resolved DynamicFilter predicate. Add a 'timing' category example that shows only elapsed_compute and metadata_load_time (with <slt:ignore> since they are non-deterministic). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Opt in every custom counter/gauge created by DataFusion's own operators (parquet, file_stream, joins, aggregates, topk, unnest, buffer) so that category filtering works cleanly out of the box. For example `bytes_scanned` → Bytes, `pushdown_rows_pruned` → Rows, `peak_mem_used` → Bytes, `row_replacements` → Rows, etc. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
90963d4 to
b13a270
Compare
dc72f7a to
5cf7348
Compare
| #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)] | ||
| pub enum ExplainAnalyzeLevel { | ||
| /// Show a compact view containing high-level metrics | ||
| pub enum MetricType { |
There was a problem hiding this comment.
@2010YOUY01 this is now a bigger diff because I refactored MetricType into here. It removes the redundancy with ExplainAnalyzeLevel (they were the same enum). I really wanted to have MetricCategory in this module (or inside datafusion-common so that it's available to this module) so that we can parse the strings directly into the enum variants and avoid carrying around Vec<String>s. I think this also applies to MetricType. Now both code paths are parallel, there is less duplicate code and validation happens earlier.
One thing to confirm is that we are okay with having MetricType and MetricCategory. They sound somewhat redundant but I think are actually orthogonal, unless we can say something like "all timing metrics are Dev" in which case one is a superset of the other. I'd like your input here since you added MetricType
There was a problem hiding this comment.
I agree the new category tags are necessary
2010YOUY01
left a comment
There was a problem hiding this comment.
Thanks, this feature LGTM overall. I have one suggestion in the above comment: consider allowing a metric to have multiple associated categories. Curious to hear your thoughts.
| #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)] | ||
| pub enum ExplainAnalyzeLevel { | ||
| /// Show a compact view containing high-level metrics | ||
| pub enum MetricType { |
There was a problem hiding this comment.
I agree the new category tags are necessary
| /// | ||
| /// See [`MetricCategory`] for details on the determinism properties | ||
| /// of each category. | ||
| pub fn with_category(mut self, category: MetricCategory) -> Self { |
There was a problem hiding this comment.
I'm wondering if some metrics may need multiple categories.
Currently Timing/Bytes/Rows classify metrics by output type, but we could also group them by functional aspects like Pruning or Spilling. That would enable more flexible filtering, e.g.:
set datafusion.explain.analyze_categories = ['Timing', 'Spilling']to show metrics matching both tags.
If this direction is plausible, should we consider storing multiple categories (e.g. Vec<MetricCategory>) instead? To be more flexible for the future change.
There was a problem hiding this comment.
Could we add that as a followup feature? It wouldn't be breaking I think.
There was a problem hiding this comment.
That makes sense, if it won't need breaking changes.
Summary
MetricCategoryenum (Rows,Bytes,Timing) classifying metrics by what they measure and, critically, their determinism: rows/bytes are deterministic given the same plan+data; timing varies across runs.Metriccan now declare its category viaMetricBuilder::with_category(). Well-known builder methods (output_rows,elapsed_compute,output_bytes, etc.) set the category automatically. Custom counters/gauges default to "always included".datafusion.explain.analyze_categoriesacceptsall(default),none, or comma-separatedrows,bytes,timing.analyze_level(summary/dev) which controls verbosity.Motivation
Running
EXPLAIN ANALYZEin.slttests currently requires liberal use of<slt:ignore>for every non-deterministic timing metric. With this change, a test can simply:In particular, for dynamic filters we have relatively complex integration tests that exist mostly to assert the plan shapes and state of the dynamic filters after the plan has been executed. For example #21059. With this change I think most of those can be moved to SLT tests. I've also wanted to e.g. make assertions about pruning effectiveness without having timing information included.
Test plan
explain_analyze_categoriescovering all combos (rows, none, all, rows+bytes).slttests inexplain_analyze.sltforrows,none,rows,bytes, androwswith dev levelexplain_analyzeintegration tests pass (24/24)information_schemaslt updated for new config entrycore_integrationsuite passes (918 tests)🤖 Generated with Claude Code