Add metrics to RepartitionExec #398
Dandandan
left a comment
Looks good. I think the time calculation for round robin repartition is missing.
The new metrics don't include the time for sending the resulting batches to the channels, so the only thing to measure for round-robin would be the time to execute [...]. I am now wondering whether we should also measure the time to send the results to the channel, because if this is high it could indicate that upstream operators are not fetching data as fast as they could be. I will take a look at that next.
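To make the distinction between the phases concrete, here is a minimal sketch of timing a round-robin repartition in three parts: pulling a batch from the input, choosing the output partition, and sending the batch to its channel. This is not the code from this PR. The struct, its field names, and the function are hypothetical, batches are plain `Vec<i32>` rather than Arrow record batches, and standard-library bounded channels stand in for whatever channels the operator actually uses.

```rust
use std::sync::mpsc::SyncSender;
use std::time::{Duration, Instant};

/// Hypothetical timing metrics for one repartition run.
/// The field names are illustrative, not the ones added in the PR.
#[derive(Debug, Default)]
struct RepartitionTimings {
    /// Time spent pulling batches from the input.
    fetch_time: Duration,
    /// Time spent deciding which output partition gets the batch.
    repart_time: Duration,
    /// Time spent sending batches to the output channels.
    send_time: Duration,
}

/// Distributes `input` batches round-robin over `outputs`,
/// accumulating the time spent in each phase.
fn round_robin_with_timings<I>(
    input: I,
    outputs: &[SyncSender<Vec<i32>>],
) -> RepartitionTimings
where
    I: IntoIterator<Item = Vec<i32>>,
{
    assert!(!outputs.is_empty(), "need at least one output partition");

    let mut timings = RepartitionTimings::default();
    let mut iter = input.into_iter();
    let mut next = 0usize;

    loop {
        // Phase 1: fetch the next batch from the input.
        let start = Instant::now();
        let maybe_batch = iter.next();
        timings.fetch_time += start.elapsed();

        let batch = match maybe_batch {
            Some(batch) => batch,
            None => break,
        };

        // Phase 2: pick the target partition (trivial for round-robin).
        let start = Instant::now();
        let target = next % outputs.len();
        next += 1;
        timings.repart_time += start.elapsed();

        // Phase 3: send. On a bounded channel this blocks while the
        // channel is full, so slow consumers show up as send time.
        let start = Instant::now();
        let sent = outputs[target].send(batch);
        timings.send_time += start.elapsed();

        if sent.is_err() {
            // The receiver was dropped; stop producing.
            break;
        }
    }

    timings
}
```

With bounded channels created by `std::sync::mpsc::sync_channel`, a large `send_time` relative to `fetch_time` would suggest the consumers of the output partitions are the bottleneck, which is the signal discussed above. For round-robin the `repart_time` phase is close to zero by construction, so the fetch and send phases carry nearly all of the information.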
Codecov Report
@@            Coverage Diff             @@
##           master     #398      +/-   ##
==========================================
- Coverage   74.94%   74.94%   -0.01%
==========================================
  Files         146      146
  Lines       24314    24344      +30
==========================================
+ Hits        18223    18244      +21
- Misses       6091     6100       +9
I added the [...]
Thanks, makes sense 👍
Dandandan
left a comment
Nice addition!
Which issue does this PR close?
Closes #397.
Adds metrics to RepartitionExec. Example output (with a local hack to display metrics in the query plan): [...]
Rationale for this change
Help debug performance issues in queries.
What changes are included in this PR?
Adds metrics to RepartitionExec.
Are there any user-facing changes?
No. The metrics are not shown by default.