DataFusion benchmarks should show executed plan with metrics after query completes #396

@andygrove

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I would like to be able to see metrics for a query plan after it is executed in the benchmarks. This is a convenient way to see where performance bottlenecks are in a query.

The following example shows metrics only for SortExec, because that is the only operator we have implemented metrics for so far.

SortExec: [revenue DESC] metrics=[sortTime=56686,outputRows=5]
  MergeExec metrics=[]
    ProjectionExec: expr=[n_name, SUM(l_extendedprice Multiply Int64(1) Minus l_discount) as revenue] metrics=[]

Describe the solution you'd like

To produce the above example, I simply hacked the existing IndentVisitor as shown below, but this is not a good solution. It wasn't immediately clear to me how to implement this in a way that fits the current design. Should there be a MetricsVisitor that we can somehow combine with the IndentVisitor? I also looked at adding a new variant to the DisplayFormatType enum, but that required code changes in specific operators, so that didn't seem ideal.

fn pre_visit(
    &mut self,
    plan: &dyn ExecutionPlan,
) -> std::result::Result<bool, Self::Error> {
    write!(self.f, "{:indent$}", "", indent = self.indent * 2)?;
    plan.fmt_as(self.t, self.f)?;
    // BEGIN METRICS HACK
    let metrics_str = plan.metrics().iter()
        .map(|(k, v)| format!("{}={}", k, v.value()))
        .collect::<Vec<String>>();
    write!(self.f, " metrics=[{}]", metrics_str.join(","))?;
    // END METRICS HACK
    writeln!(self.f)?;
    self.indent += 1;
    Ok(true)
}
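
One possible shape for the "combine a MetricsVisitor with the IndentVisitor" idea is to keep the indenting walk generic and pass it an optional per-node annotator. The sketch below is only an illustration under stated assumptions: `PlanNode`, `MockExec`, `Annotator`, `MetricsAnnotator`, `render`, `render_to_string`, and `example_plan` are all hypothetical stand-ins, not DataFusion types, and the metrics are simplified to a sorted map of `u64` values.

```rust
use std::collections::BTreeMap;
use std::fmt::{self, Write};

/// Hypothetical stand-in for `ExecutionPlan`; it exposes only what this sketch needs.
trait PlanNode {
    fn describe(&self) -> String;
    fn metrics(&self) -> BTreeMap<String, u64>;
    fn children(&self) -> Vec<&dyn PlanNode>;
}

/// Hypothetical mock operator used to exercise the walk.
struct MockExec {
    label: String,
    metrics: BTreeMap<String, u64>,
    children: Vec<MockExec>,
}

impl PlanNode for MockExec {
    fn describe(&self) -> String {
        self.label.clone()
    }
    fn metrics(&self) -> BTreeMap<String, u64> {
        self.metrics.clone()
    }
    fn children(&self) -> Vec<&dyn PlanNode> {
        self.children.iter().map(|c| c as &dyn PlanNode).collect()
    }
}

/// Hypothetical hook: appends extra text after a node's one-line description.
trait Annotator {
    fn annotate(&self, plan: &dyn PlanNode, out: &mut String) -> fmt::Result;
}

/// Appends ` metrics=[k=v,...]`, mirroring the hack in `pre_visit`.
struct MetricsAnnotator;

impl Annotator for MetricsAnnotator {
    fn annotate(&self, plan: &dyn PlanNode, out: &mut String) -> fmt::Result {
        let parts: Vec<String> = plan
            .metrics()
            .iter()
            .map(|(k, v)| format!("{}={}", k, v))
            .collect();
        write!(out, " metrics=[{}]", parts.join(","))
    }
}

/// Indenting pre-order walk; the annotator is optional, so the plain
/// indented output needs no changes in the operators themselves.
fn render(
    plan: &dyn PlanNode,
    depth: usize,
    annotator: Option<&dyn Annotator>,
    out: &mut String,
) -> fmt::Result {
    write!(out, "{:indent$}{}", "", plan.describe(), indent = depth * 2)?;
    if let Some(a) = annotator {
        a.annotate(plan, out)?;
    }
    writeln!(out)?;
    for child in plan.children() {
        render(child, depth + 1, annotator, out)?;
    }
    Ok(())
}

/// Convenience wrapper that renders a whole plan into a `String`.
fn render_to_string(plan: &dyn PlanNode, annotator: Option<&dyn Annotator>) -> String {
    let mut out = String::new();
    render(plan, 0, annotator, &mut out).expect("writing to a String does not fail");
    out
}

/// Builds a three-node mock plan loosely shaped like the example output.
/// Note: the BTreeMap orders metrics by key, so `outputRows` precedes `sortTime`.
fn example_plan() -> MockExec {
    let projection = MockExec {
        label: "ProjectionExec".to_string(),
        metrics: BTreeMap::new(),
        children: vec![],
    };
    let merge = MockExec {
        label: "MergeExec".to_string(),
        metrics: BTreeMap::new(),
        children: vec![projection],
    };
    let mut sort_metrics = BTreeMap::new();
    sort_metrics.insert("sortTime".to_string(), 56686);
    sort_metrics.insert("outputRows".to_string(), 5);
    MockExec {
        label: "SortExec: [revenue DESC]".to_string(),
        metrics: sort_metrics,
        children: vec![merge],
    }
}
```

Calling `render_to_string(&example_plan(), None)` gives the plain indented plan, while passing `Some(&MetricsAnnotator)` adds the metrics suffix to every line. Whether a hook like this fits the existing visitor and `fmt_as` design is an open question; the sketch only shows that the metrics formatting can live outside both the indent logic and the individual operators.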

Describe alternatives you've considered
None

Additional context
None
