Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-11052: [Rust] [DataFusion] Implement metrics for HashJoinExec #9035

Closed
wants to merge 2 commits into from

Conversation

andygrove
Copy link
Member

@andygrove andygrove commented Dec 28, 2020

Implement metrics for HashJoinExec to help debug performance issues.

Example output:

[2020-12-28T20:46:28Z DEBUG datafusion::physical_plan::hash_join] Built build-side of hash join containing 3115341 rows in 9926 ms
[2020-12-28T20:46:36Z DEBUG datafusion::physical_plan::hash_join] Processed 144 probe-side input batches containing 18750000 rows and produced 144 output batches containing 388922 rows in 7994 ms
[2020-12-28T20:46:36Z DEBUG datafusion::physical_plan::hash_join] Processed 144 probe-side input batches containing 18750000 rows and produced 144 output batches containing 389294 rows in 7986 ms
[2020-12-28T20:46:37Z DEBUG datafusion::physical_plan::hash_join] Processed 144 probe-side input batches containing 18750000 rows and produced 144 output batches containing 389558 rows in 8326 ms
[2020-12-28T20:46:37Z DEBUG datafusion::physical_plan::hash_join] Processed 144 probe-side input batches containing 18750000 rows and produced 144 output batches containing 388841 rows in 8499 ms
[2020-12-28T20:46:37Z DEBUG datafusion::physical_plan::hash_join] Processed 144 probe-side input batches containing 18750000 rows and produced 144 output batches containing 390056 rows in 8643 ms
[2020-12-28T20:46:37Z DEBUG datafusion::physical_plan::hash_join] Processed 144 probe-side input batches containing 18749999 rows and produced 144 output batches containing 389143 rows in 8711 ms
[2020-12-28T20:46:37Z DEBUG datafusion::physical_plan::hash_join] Processed 144 probe-side input batches containing 18750000 rows and produced 144 output batches containing 389605 rows in 8712 ms
[2020-12-28T20:46:37Z DEBUG datafusion::physical_plan::hash_join] Processed 144 probe-side input batches containing 18750000 rows and produced 144 output batches containing 389922 rows in 8877 ms

@andygrove
Copy link
Member Author

@jorgecarleitao fyi

@github-actions
Copy link

}
other => {
debug!(
"Processed {} probe-side input batches containing {} rows and \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can make this a bit less verbose? Processed {} rows probe-side and {} output rows in {} ms or something like that?

@@ -186,8 +190,18 @@ impl ExecutionPlan for HashJoinExec {
})
.await?;

let num_rows: usize =
left_data.1.iter().map(|batch| batch.num_rows()).sum();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we want to put this behind a flag so we don't do this in the macro so we don't compute the num_rows when not being logged?

self.num_input_rows += batch.num_rows();
match result {
Ok(ref batch) => {
self.join_time += start.elapsed().as_millis() as usize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could avoid rounding errors by using start.elapsed().as_secs_f64() store join_time var as f64 and format with 0 digits afterwards.

@andygrove
Copy link
Member Author

I will rebase this and address feedback once #9048 is merged

@alamb
Copy link
Contributor

alamb commented Dec 31, 2020

The full set of Rust CI tests did not run on this PR :(

Can you please rebase this PR against apache/master to pick up the changes in #9056 so that they do?

I apologize for the inconvenience.

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @andygrove

@jorgecarleitao
Copy link
Member

@Dandandan , Thanks a lot for the review. I am approving because even though I consider most of your comments relevant, I do not consider them blocking the merge.

For example, summing the number of rows of all batches is O(number_of_batches) and negligible when compared to the time it takes to execute the join, thus being too minor to block this.

@Dandandan
Copy link
Contributor

I agree, feel free to address the remaining feedback or to merge it @andygrove👍

@codecov-io
Copy link

codecov-io commented Jan 2, 2021

Codecov Report

Merging #9035 (36b0ebd) into master (4b7cdcb) will decrease coverage by 0.02%.
The diff coverage is 55.55%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9035      +/-   ##
==========================================
- Coverage   82.61%   82.59%   -0.03%     
==========================================
  Files         203      204       +1     
  Lines       50140    50169      +29     
==========================================
+ Hits        41422    41435      +13     
- Misses       8718     8734      +16     
Impacted Files Coverage Δ
rust/datafusion/src/physical_plan/hash_join.rs 86.32% <55.55%> (-3.21%) ⬇️
rust/arrow/src/csv/writer.rs 78.82% <0.00%> (-0.56%) ⬇️
rust/parquet/src/encodings/encoding.rs 95.24% <0.00%> (-0.20%) ⬇️
rust/arrow/src/util/serialization.rs 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4b7cdcb...36b0ebd. Read the comment docs.

@andygrove andygrove closed this in fc43489 Jan 2, 2021
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
Implement metrics for `HashJoinExec` to help debug performance issues.

Example output:

```
[2020-12-28T20:46:28Z DEBUG datafusion::physical_plan::hash_join] Built build-side of hash join containing 3115341 rows in 9926 ms
[2020-12-28T20:46:36Z DEBUG datafusion::physical_plan::hash_join] Processed 144 probe-side input batches containing 18750000 rows and produced 144 output batches containing 388922 rows in 7994 ms
[2020-12-28T20:46:36Z DEBUG datafusion::physical_plan::hash_join] Processed 144 probe-side input batches containing 18750000 rows and produced 144 output batches containing 389294 rows in 7986 ms
[2020-12-28T20:46:37Z DEBUG datafusion::physical_plan::hash_join] Processed 144 probe-side input batches containing 18750000 rows and produced 144 output batches containing 389558 rows in 8326 ms
[2020-12-28T20:46:37Z DEBUG datafusion::physical_plan::hash_join] Processed 144 probe-side input batches containing 18750000 rows and produced 144 output batches containing 388841 rows in 8499 ms
[2020-12-28T20:46:37Z DEBUG datafusion::physical_plan::hash_join] Processed 144 probe-side input batches containing 18750000 rows and produced 144 output batches containing 390056 rows in 8643 ms
[2020-12-28T20:46:37Z DEBUG datafusion::physical_plan::hash_join] Processed 144 probe-side input batches containing 18749999 rows and produced 144 output batches containing 389143 rows in 8711 ms
[2020-12-28T20:46:37Z DEBUG datafusion::physical_plan::hash_join] Processed 144 probe-side input batches containing 18750000 rows and produced 144 output batches containing 389605 rows in 8712 ms
[2020-12-28T20:46:37Z DEBUG datafusion::physical_plan::hash_join] Processed 144 probe-side input batches containing 18750000 rows and produced 144 output batches containing 389922 rows in 8877 ms
```

Closes apache#9035 from andygrove/join-metrics

Authored-by: Andy Grove <andygrove73@gmail.com>
Signed-off-by: Andy Grove <andygrove73@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants