
Conversation

@andygrove (Member) commented May 23, 2021

Which issue does this PR close?

Adds metrics to RepartitionExec. Example output (with local hack to display metrics in query plan):

RepartitionExec: partitioning=Hash([Column { name: "o_custkey" }], 24) metrics=[repartitionTime=19390949,fetchTime=1354115582,sendTime=1642636]
  CoalesceBatchesExec: target_batch_size=4096 metrics=[]
    FilterExec: o_orderdate >= CAST(1994-01-01 AS Date32) AND o_orderdate < CAST(1995-01-01 AS Date32) metrics=[]
      RepartitionExec: partitioning=RoundRobinBatch(24) metrics=[fetchTime=56873223,sendTime=125282,repartitionTime=0]
        ParquetExec: batch_size=8192, limit=None, partitions=[/mnt/tpch/parquet-sf1//orders/part-0.parquet] metrics=[]
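The nanosecond counters in the plan above (`repartitionTime`, `fetchTime`, `sendTime`) can be accumulated with a simple atomic timer. A minimal sketch, assuming a hypothetical `TimeMetric` type for illustration (not the actual DataFusion `SQLMetric` API):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Instant;

/// Hypothetical nanosecond counter, illustrative of how a metric such as
/// `repartitionTime` could be accumulated across batches and threads.
#[derive(Default)]
struct TimeMetric {
    nanos: AtomicU64,
}

impl TimeMetric {
    /// Add the wall-clock time elapsed since `start` to the counter.
    fn add_elapsed(&self, start: Instant) {
        self.nanos
            .fetch_add(start.elapsed().as_nanos() as u64, Ordering::Relaxed);
    }

    fn value(&self) -> u64 {
        self.nanos.load(Ordering::Relaxed)
    }
}

fn main() {
    let repart_time = TimeMetric::default();

    // Time a stand-in for the hash-repartition work on one batch.
    let start = Instant::now();
    let _hashes: Vec<u64> = (0..10_000u64).map(|k| k.wrapping_mul(31)).collect();
    repart_time.add_elapsed(start);

    println!("repartitionTime={}", repart_time.value());
}
```

Wrapping each phase (hashing, fetching input, sending output) in its own counter is what makes the per-phase breakdown in the plan output possible.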

Closes #397.

Rationale for this change

Help debug performance issues in queries.

What changes are included in this PR?

Adds metrics to RepartitionExec.

Are there any user-facing changes?

No. The metrics are not shown by default.

@andygrove andygrove requested a review from Dandandan May 23, 2021 15:53
@Dandandan (Contributor) left a comment

Looks good. I think the time calculation for round robin repartition is missing.

@andygrove (Member, Author)

> Looks good. I think the time calculation for round robin repartition is missing.

The new metrics don't include the time for sending the resulting batches to the channels, so the only thing to measure for round-robin would be the time to execute `let output_partition = counter % num_output_partitions`, and I figured that was not worth measuring.

I am now wondering if we should also measure time to send the results to the channel because if this is high it could indicate that upstream operators are not fetching data as fast as they could be. I will take a look at that next.
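For context, the round-robin work per batch really is just the one modulo mentioned above. A minimal sketch (variable names follow the comment; the loop and partition count are illustrative):

```rust
fn main() {
    let num_output_partitions = 4usize;
    let mut counter = 0usize;

    // Assign six batches round-robin; the only per-batch computation
    // is a single modulo, which is why it is cheap to the point of
    // not being worth timing on its own.
    let mut assigned = Vec::new();
    for _batch in 0..6 {
        let output_partition = counter % num_output_partitions;
        counter += 1;
        assigned.push(output_partition);
    }

    println!("{:?}", assigned); // batches cycle through partitions 0..3
}
```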

@codecov-commenter commented May 23, 2021

Codecov Report

Merging #398 (affa192) into master (174226c) will decrease coverage by 0.00%.
The diff coverage is 70.96%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #398      +/-   ##
==========================================
- Coverage   74.94%   74.94%   -0.01%     
==========================================
  Files         146      146              
  Lines       24314    24344      +30     
==========================================
+ Hits        18223    18244      +21     
- Misses       6091     6100       +9     
Impacted Files                              | Coverage         | Δ
datafusion/src/physical_plan/repartition.rs | 82.45% <70.96%>  | -1.89% ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@andygrove (Member, Author)

I added the sendTime metric and this includes the cost of round-robin.
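Measuring time spent blocked on the channel send can be sketched with a `std::sync::mpsc` channel (stand-ins only; DataFusion's actual channels and `RecordBatch` type differ):

```rust
use std::sync::mpsc;
use std::time::Instant;

fn main() {
    // Unbounded channel as a stand-in for the per-partition output channel.
    let (tx, rx) = mpsc::channel::<Vec<u64>>();
    let mut send_time_nanos = 0u64;

    for i in 0..3u64 {
        let batch = vec![i; 8]; // stand-in for a RecordBatch
        let start = Instant::now();
        tx.send(batch).unwrap();
        // A consistently large sendTime would suggest downstream operators
        // are not draining the channel as fast as they could.
        send_time_nanos += start.elapsed().as_nanos() as u64;
    }
    drop(tx);

    let received = rx.iter().count();
    println!("batches={} sendTime={}ns", received, send_time_nanos);
}
```

Because the round-robin partition selection happens on the same code path as the send, timing the send this way folds the (negligible) round-robin cost into `sendTime`.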

@Dandandan (Contributor)

> The new metrics don't include the time for sending the resulting batches to the channels, so the only thing to measure for round-robin would be the time to execute `let output_partition = counter % num_output_partitions`, and I figured that was not worth measuring.
>
> I am now wondering if we should also measure time to send the results to the channel because if this is high it could indicate that upstream operators are not fetching data as fast as they could be. I will take a look at that next.

Thanks, makes sense 👍

@Dandandan (Contributor) left a comment

Nice addition!

@Dandandan Dandandan merged commit aeed776 into apache:master May 23, 2021
@houqp added the "api change" (Changes the API exposed to users of the crate) and "datafusion" labels Jul 29, 2021
@andygrove andygrove deleted the repart-metrics branch February 6, 2022 17:42
H0TB0X420 pushed a commit to H0TB0X420/datafusion that referenced this pull request Oct 7, 2025
…e location (apache#398)

* checkpoint commit

* Introduce BaseSessionContext abstract class

* Introduce abstract methods for CRUD schema operations

* Clean up schema.rs file

* Introduce CRUD methods for table instances

* Add function to drop_table

* Add schema_name to drop_table function

* remove unused parameter in SqlTable new

* Update function to allow for modifying existing tables

* Add functionality for generating SqlTable information from input sources

* Add functionality for generating SqlTable information from input sources

* Adding a utility method to convert arrow type strings to DataType instances

* Add method to DataTypeMap for getting the DataType from an Arrow type string instance

* Adjust pytests

* Add back deprecated int96 parquet datatype


Development

Successfully merging this pull request may close these issues.

Implement SQLMetrics for RepartitionExec

4 participants