@AngersZhuuuu
Contributor

AngersZhuuuu commented Nov 10, 2025

What changes were proposed in this pull request?

Add an aggTime metric to SortAggregateExec.

Why are the changes needed?

SortAggregateExec did not report its aggregation time; this change adds that metric.

Does this PR introduce any user-facing change?

Yes. The SQL metric "time in aggregation build" is now shown for SortAggregateExec in the Spark UI.

How was this patch tested?

Unit test (UT).

Was this patch authored or co-authored using generative AI tooling?

No

github-actions bot added the SQL label Nov 10, 2025
AngersZhuuuu changed the title from "SPARK-54272 Add aggTime for SortAggregateExec" to "[SPARK-54272][SQL] Add aggTime for SortAggregateExec" Nov 10, 2025
@AngersZhuuuu
Contributor Author

@HyukjinKwon Could you help review this metrics-related change?

@HyukjinKwon
Member

cc @cloud-fan

outputIter
}
}
aggTime += NANOSECONDS.toMillis(System.nanoTime() - beforeAgg)
Contributor

Is this the right level at which to measure the agg time? I think the iterator is lazy, no?

Contributor Author

I noticed that too, and it's a little strange. If so, is HashAggregateExec also incorrect?

Contributor

So I think the difference is that TungstenAggregationIterator is not as lazy: it does the aggregation during its init step, whereas SortBasedAggregationIterator does most of the compute inside next.
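
To make the eager/lazy distinction concrete, here is a minimal plain-Scala sketch. It is not Spark code: `Work`, `eagerIterator`, and `lazyIterator` are hypothetical names invented for illustration, and the "eager" variant only mimics the behavior described above (all work done while the iterator is being built). Any timer that stops right after construction would see the work in the eager case and none of it in the lazy case.

```scala
object LazyVsEagerSketch {
  // Hypothetical stand-in for per-row aggregation work; counts processed rows.
  final class Work {
    var rowsProcessed: Int = 0
    def process(x: Int): Int = { rowsProcessed += 1; x * 2 }
  }

  // Eager: consumes the whole input while the iterator is being built.
  def eagerIterator(input: Iterator[Int], work: Work): Iterator[Int] = {
    val results = input.map(work.process).toVector // all work happens here
    results.iterator
  }

  // Lazy: Iterator.map defers the work until next() is called downstream.
  def lazyIterator(input: Iterator[Int], work: Work): Iterator[Int] =
    input.map(work.process)

  def main(args: Array[String]): Unit = {
    val eagerWork = new Work
    eagerIterator(Iterator(1, 2, 3), eagerWork)
    println(s"eager, rows processed after construction: ${eagerWork.rowsProcessed}")

    val lazyWork = new Work
    val it = lazyIterator(Iterator(1, 2, 3), lazyWork)
    println(s"lazy, rows processed after construction: ${lazyWork.rowsProcessed}")

    it.foreach(_ => ()) // drain the iterator, as a downstream operator would
    println(s"lazy, rows processed after consumption: ${lazyWork.rowsProcessed}")
  }
}
```

In the lazy case, a measurement taken around construction alone would stop before any row has been processed.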

Contributor Author

Got it, I missed that TungstenAggregationIterator calls processInputs during construction. So how about my current change?
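
One generic way to time a lazy iterator is to accumulate the time spent inside `hasNext` and `next`. The sketch below is plain Scala and is not claimed to be what this PR does: `TimedIterator`, `totalNanos`, and `elapsedMillis` are hypothetical names, and the `Thread.sleep` call only simulates per-row work.

```scala
import java.util.concurrent.TimeUnit.NANOSECONDS

object TimedIteratorSketch {
  // Hypothetical wrapper: sums the nanoseconds spent in the underlying
  // iterator's hasNext/next calls, wherever those calls happen.
  final class TimedIterator[A](underlying: Iterator[A]) extends Iterator[A] {
    private var elapsedNanos: Long = 0L

    def totalNanos: Long = elapsedNanos
    def elapsedMillis: Long = NANOSECONDS.toMillis(elapsedNanos)

    override def hasNext: Boolean = {
      val start = System.nanoTime()
      try underlying.hasNext
      finally elapsedNanos += System.nanoTime() - start
    }

    override def next(): A = {
      val start = System.nanoTime()
      try underlying.next()
      finally elapsedNanos += System.nanoTime() - start
    }
  }

  def main(args: Array[String]): Unit = {
    // Simulated lazy per-row work: each element costs a few milliseconds.
    val slow = Iterator(1, 2, 3).map { x => Thread.sleep(5); x }
    val timed = new TimedIterator(slow)

    println(s"after construction: ${timed.totalNanos} ns") // no row processed yet
    timed.foreach(_ => ())
    println(s"after consumption: ${timed.elapsedMillis} ms")
  }
}
```

The trade-off is that this calls `System.nanoTime()` twice per `hasNext` and per `next`, which adds overhead on hot paths, so whether per-row timing is acceptable depends on the operator.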

Contributor Author

ping @cloud-fan @holdenk @dongjoon-hyun Could you take a look?

@holdenk
Contributor

holdenk commented Nov 27, 2025

This looks reasonable to me, I'd love @cloud-fan to sign off though :)

test("SortAggregate metrics") {
  // Force use SortAggregateExec instead of HashAggregateExec
  withSQLConf("spark.sql.test.forceApplySortAggregate" -> "true",
    SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
Contributor

nit: other tests in this suite do not turn off whole-stage codegen, why is it necessary here?

Contributor Author

I wrote this because SortAggregateExec does not support codegen, but removing it is OK too. I'll remove this line.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in a8482ad Nov 27, 2025