ARROW-12432: [Rust] [DataFusion] Add metrics to SortExec #10078

andygrove · 2021-04-17T15:48:19Z

Add outputRows and sortTime metrics to SortExec.

Example output from Ballista:

SortExec { input: ProjectionExec { expr: [(Column { name: "l_shipmode" }, "l_shipmode"), (Column { name: "SUM(CASE WHEN 
  Metrics: sortTime=44444, outputRows=2

github-actions · 2021-04-17T15:48:39Z

https://issues.apache.org/jira/browse/ARROW-12432

andygrove · 2021-04-17T16:07:34Z

@returnString I'd like to hear more about your idea to use atomics here. Would you be interested in creating a follow-up PR to switch over?

codecov-commenter · 2021-04-17T17:14:54Z

Codecov Report

Merging #10078 (bf49dda) into master (7e3deb5) will increase coverage by 0.01%.
The diff coverage is 97.36%.

@@            Coverage Diff             @@
##           master   #10078      +/-   ##
==========================================
+ Coverage   78.92%   78.93%   +0.01%     
==========================================
  Files         286      286              
  Lines       64728    64758      +30     
==========================================
+ Hits        51088    51119      +31     
+ Misses      13640    13639       -1

Impacted Files                                          Coverage Δ
rust/datafusion/src/physical_plan/sort.rs               92.19% <96.96%> (+0.66%) ⬆️
...ust/datafusion/src/physical_plan/hash_aggregate.rs   84.62% <100.00%> (+0.31%) ⬆️
rust/datafusion/src/physical_plan/mod.rs                87.09% <100.00%> (+0.88%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7e3deb5...bf49dda.

alamb

I think the code looks OK except for the flatbuffer pin.

As we develop the counter system I think it is worth considering whether we can avoid the runtime string lookups, but I don't see that as a deal breaker.

Also, I agree that if someone (like @returnString) has time to switch the Counters from using Mutex to using AtomicUsize or something similar, the code will likely look much nicer.
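
For illustration, a lock-free counter along the lines suggested above might look like the sketch below. `AtomicCounter` and `count_in_parallel` are hypothetical names, not the `SQLMetric` type from this PR; the sketch only assumes that a counter needs `add` and `value` operations.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical lock-free counter (not the `SQLMetric` type from this PR).
#[derive(Debug, Default)]
pub struct AtomicCounter {
    value: AtomicUsize,
}

impl AtomicCounter {
    pub fn new() -> Self {
        Self::default()
    }

    /// Adds `n` to the counter without taking a lock.
    pub fn add(&self, n: usize) {
        // Relaxed ordering is enough here: only the running total matters,
        // not its ordering relative to other memory operations.
        self.value.fetch_add(n, Ordering::Relaxed);
    }

    /// Returns the current total.
    pub fn value(&self) -> usize {
        self.value.load(Ordering::Relaxed)
    }
}

/// Spawns `threads` workers that each add one row `per_thread` times,
/// waits for all of them, and returns the total.
pub fn count_in_parallel(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(AtomicCounter::new());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    counter.add(1);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    counter.value()
}
```

Because `add` takes `&self`, callers can share the counter behind an `Arc` without wrapping it in a `Mutex`.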

rust/arrow/Cargo.toml

alamb · 2021-04-18T10:26:49Z

rust/datafusion/src/physical_plan/mod.rs

 pub enum MetricType {
    /// Simple counter
    Counter,
+    /// Time in nanoseconds


It would probably help to explicitly mention whether this is wall-clock time or CPU time. It looks like this PR records wall-clock time for the sort.

Both are probably interesting counters / metrics to eventually have
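
As a rough sketch of the distinction, wall-clock time can be measured with `std::time::Instant` from the standard library. It includes any time the thread spends blocked or descheduled, which CPU time would not. `time_wall_clock` is a hypothetical helper, not part of this PR; measuring CPU time generally needs a platform-specific API and is not shown.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `f` and returns its result together with the
/// elapsed wall-clock time, including any time spent sleeping or blocked.
pub fn time_wall_clock<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}
```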

alamb · 2021-04-18T10:28:59Z

rust/datafusion/src/physical_plan/sort.rs

-        let result: Vec<RecordBatch> = collect(sort_exec).await?;
+        let result: Vec<RecordBatch> = collect(sort_exec.clone()).await?;
+        assert_eq!(sort_exec.metrics().get("outputRows").unwrap().value, 8);
        assert_eq!(result.len(), 1);


Maybe also assert that the time counter was greater than zero?

alamb · 2021-04-18T10:30:51Z

rust/datafusion/src/physical_plan/sort.rs

+            output_rows: SQLMetric::counter("outputRows"),
+            sort_time_nanos: SQLMetric::time_nanos("sortTime"),


Given Rust's focus on compile time type checking, what would you think about using typed counters rather than String keys?

So make the code look something like:

Suggested change

-            output_rows: SQLMetric::counter("outputRows"),
-            sort_time_nanos: SQLMetric::time_nanos("sortTime"),
+            output_rows: SQLMetric::<OutputRows>::new(),
+            sort_time_nanos: SQLMetric::<SortTime>::new(),

andygrove · 2021-04-18

Does this imply adding an enum for the metrics? This might limit extensibility for users that want to add custom metrics.
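
One way to get typed metrics without a closed enum is to use marker types and a trait, so that users can declare their own metric kinds. The sketch below is only an assumption about how that could look; `MetricKind`, `TypedMetric`, and `SpilledBytes` are hypothetical names and not part of DataFusion.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical trait implemented by every metric kind.
pub trait MetricKind {
    /// Display name, fixed at compile time instead of looked up at runtime.
    const NAME: &'static str;
}

pub struct OutputRows;
impl MetricKind for OutputRows {
    const NAME: &'static str = "outputRows";
}

pub struct SortTime;
impl MetricKind for SortTime {
    const NAME: &'static str = "sortTime";
}

/// Hypothetical metric whose kind is part of its type.
pub struct TypedMetric<K: MetricKind> {
    value: AtomicUsize,
    _kind: PhantomData<K>,
}

impl<K: MetricKind> TypedMetric<K> {
    pub fn new() -> Self {
        Self {
            value: AtomicUsize::new(0),
            _kind: PhantomData,
        }
    }

    pub fn add(&self, n: usize) {
        self.value.fetch_add(n, Ordering::Relaxed);
    }

    pub fn value(&self) -> usize {
        self.value.load(Ordering::Relaxed)
    }

    pub fn name(&self) -> &'static str {
        K::NAME
    }
}

/// A user-defined kind: adding it requires no change to any central enum.
pub struct SpilledBytes;
impl MetricKind for SpilledBytes {
    const NAME: &'static str = "spilledBytes";
}
```

With this shape, a custom metric is just another `impl MetricKind`, which speaks to the extensibility concern while keeping field types checked at compile time.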

returnString · 2021-04-18T10:41:25Z

I've got a few ideas for follow-up PRs:

- mutex => atomics as mentioned
- adding higher-level wrappers like an RAII method for recording time spent on actions
- seeing how we might integrate these metrics with something like Prometheus

Will try and carve out some time either today or tomorrow to write these up in more depth.
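
The RAII idea could be sketched roughly as follows: a guard records the start time when created and adds the elapsed nanoseconds to the metric when it is dropped. `TimeNanos` and `TimerGuard` are hypothetical names chosen for this illustration, not types from the PR.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Instant;

/// Hypothetical accumulator of elapsed wall-clock nanoseconds.
#[derive(Debug, Default)]
pub struct TimeNanos {
    total: AtomicU64,
}

impl TimeNanos {
    /// Returns the total nanoseconds recorded so far.
    pub fn value(&self) -> u64 {
        self.total.load(Ordering::Relaxed)
    }

    /// Starts a timer; the elapsed time is added when the guard is dropped.
    pub fn start(&self) -> TimerGuard<'_> {
        TimerGuard {
            metric: self,
            start: Instant::now(),
        }
    }
}

/// Guard that records elapsed time into its metric on drop.
pub struct TimerGuard<'a> {
    metric: &'a TimeNanos,
    start: Instant,
}

impl Drop for TimerGuard<'_> {
    fn drop(&mut self) {
        // Truncating to u64 is fine for this sketch (about 584 years of nanoseconds).
        let nanos = self.start.elapsed().as_nanos() as u64;
        self.metric.total.fetch_add(nanos, Ordering::Relaxed);
    }
}
```

The guard also records the time on early returns and `?` propagation, which is the main ergonomic gain over pairing start and stop calls by hand.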

All in all, really excited about getting decent observability 😀

alamb · 2021-04-18T10:51:49Z

Thank you @returnString !

BTW we (well, really @jacobmarble) have spent a non-trivial amount of time getting Prometheus metrics generated in IOx. From that experience (and the substantial dependency chain it brings) I would suggest DataFusion focus on a self-contained way to get the metrics from planning and executing, and then leave it up to users of DataFusion to connect that to Prometheus (or whatever other metric provider they want).
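
A dependency-free split of that kind might be sketched as below: the library exposes plain metric snapshots plus a small sink trait, and the application decides where the values go. All names here (`MetricsSink`, `OperatorMetrics`, `export`, `TextSink`) are hypothetical, and `TextSink` only formats lines in a Prometheus-like text form rather than calling any real client library.

```rust
use std::collections::BTreeMap;

/// Hypothetical hook an application implements to forward metrics
/// to its own backend.
pub trait MetricsSink {
    fn record(&mut self, operator: &str, name: &str, value: u64);
}

/// Hypothetical snapshot of one operator's metrics.
pub struct OperatorMetrics {
    pub operator: String,
    pub values: BTreeMap<String, u64>,
}

/// Pushes every metric of every operator into the given sink.
pub fn export(metrics: &[OperatorMetrics], sink: &mut dyn MetricsSink) {
    for m in metrics {
        for (name, value) in &m.values {
            sink.record(&m.operator, name, *value);
        }
    }
}

/// Example sink that collects lines such as `outputRows{operator="SortExec"} 2`.
#[derive(Default)]
pub struct TextSink {
    pub lines: Vec<String>,
}

impl MetricsSink for TextSink {
    fn record(&mut self, operator: &str, name: &str, value: u64) {
        self.lines
            .push(format!("{}{{operator=\"{}\"}} {}", name, operator, value));
    }
}
```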

Dandandan

Looking really cool!

One thing I am wondering, for the sake of keeping statistics for things like adaptive/dynamic query optimization, is whether we should make some metrics / stats more static. The current "flexibility" of keying them by strings is good for logging / debugging the metrics, but it would be nice if we could reuse them once we start changing the plan dynamically based on runtime statistics.
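
To make the "more static" idea concrete, here is a minimal sketch in which a few well-known statistics are plain struct fields that planning code can read directly, with no string lookup. `OperatorStats`, `BuildSide`, and `choose_build_side` are hypothetical, and the join build-side decision is only an invented example of a plan choice driven by runtime statistics.

```rust
/// Hypothetical statically typed runtime statistics for one operator.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct OperatorStats {
    pub output_rows: u64,
    pub elapsed_nanos: u64,
}

/// Hypothetical plan decision driven by runtime statistics.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BuildSide {
    Left,
    Right,
}

/// Picks the input with fewer observed rows as the build side.
pub fn choose_build_side(left: &OperatorStats, right: &OperatorStats) -> BuildSide {
    if left.output_rows <= right.output_rows {
        BuildSide::Left
    } else {
        BuildSide::Right
    }
}
```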

andygrove · 2021-04-18T13:05:40Z

Thanks for the feedback @alamb @Dandandan @returnString. This code was enough to help me track down an issue and demonstrate the metrics capability, but it would be good to collaborate on a better design for this. We also need a way to accumulate values for these metrics across a distributed query so that we see totals per operator when looking at query plans in the UI. I'll address the smaller points here and merge this, since we're about to move to the new repo, and will follow up this week with a new issue and design doc for metrics.
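
Accumulating per-operator totals could, under simple assumptions, be a sum by metric name over the per-partition values. The sketch below assumes each partition reports a map of metric name to value; `merge_metrics` is a hypothetical helper, not an API from this PR.

```rust
use std::collections::HashMap;

/// Hypothetical helper: sums metric values by name across the partitions
/// of a single operator.
pub fn merge_metrics(partitions: &[HashMap<String, u64>]) -> HashMap<String, u64> {
    let mut totals = HashMap::new();
    for partition in partitions {
        for (name, value) in partition {
            // A metric missing from a partition simply contributes nothing.
            *totals.entry(name.clone()).or_insert(0) += *value;
        }
    }
    totals
}
```

Note that summing a time metric this way yields total time across partitions, not the elapsed time of the query, so a real design may want a different aggregation per metric type.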

Add metrics to SortExec

db18f79

github-actions bot added Component: Rust - DataFusion Component: Rust labels Apr 17, 2021

Improve ergonomics and add test for SortExec

0e375f2

andygrove requested review from alamb and jorgecarleitao and removed request for jorgecarleitao April 17, 2021 16:07

andygrove force-pushed the sortexec-metrics branch from 39106cf to 0e375f2 Compare April 17, 2021 16:31

pin flatbuffers

fb05d02

alamb approved these changes Apr 18, 2021

Dandandan approved these changes Apr 18, 2021

andygrove added 3 commits April 18, 2021 07:14

Merge remote-tracking branch 'apache/master' into sortexec-metrics

d5dd2bb

removed pinned flatbuffer version

471d07f

Address feedback

bf49dda

andygrove closed this in 9a4ef46 Apr 18, 2021

asfimport mentioned this pull request Apr 18, 2021

[Rust] [DataFusion] Add metrics for SortExec #28221

Closed
