-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-54272][SQL] Add aggTime for SortAggregateExec #52968
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@HyukjinKwon Could you help review metrics related. |
|
cc @cloud-fan |
| outputIter | ||
| } | ||
| } | ||
| aggTime += NANOSECONDS.toMillis(System.nanoTime() - beforeAgg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it the right level to trace the agg time? I think the iterator is lazy, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also find. it's a little strange...If so HashAggregateExec also incorrect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think the difference is the TungstenAggregationIterator is not as lazy -- during it's init step it does the aggregation and whereas sortbasedaggregationiterator does the compute mostly inside of next
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, missing TungstenAggregationIterator will call processInputs during construction. So how about my current change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @cloud-fan @holdenk @dongjoon-hyun Could you take a look
|
This looks reasonable to me, I'd love @cloud-fan to sign-off though :) |
| test("SortAggregate metrics") { | ||
| // Force use SortAggregateExec instead of HashAggregateExec | ||
| withSQLConf("spark.sql.test.forceApplySortAggregate" -> "true", | ||
| SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: other tests in this suite do not turn off whole stage codegen, why it's necessary here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since SortAggregateExec not support codegen then write this, remove it is ok too. remove this line.
|
thanks, merging to master! |
What changes were proposed in this pull request?
Add
aggTimemetrics forSortAggregateExecWhy are the changes needed?
Add more metrics
Does this PR introduce any user-facing change?
Yes the SQL metrics "time in aggregation build" itself on Spark UI.
How was this patch tested?
UT
Was this patch authored or co-authored using generative AI tooling?
No