Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VL] Use partial companion functions for distinct aggregation #4112

Merged
merged 3 commits into from
Dec 22, 2023

Conversation

rui-mo
Copy link
Contributor

@rui-mo rui-mo commented Dec 19, 2023

What changes were proposed in this pull request?

Flush of intermediate aggregation needs to be disabled if it is used for distinct computing. But disabling flushing is tricky because doing so will require enabling spilling. In this PR, we use partial companion functions instead to make sure that final aggregation returns "intermediate" results.

oap-project/velox#465

How was this patch tested?

Verfied on Jenkins.

Copy link

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

@rui-mo rui-mo changed the title [VL] Fix distinct aggregation [VL] Use partial companion functions for distinct aggregation Dec 19, 2023
Comment on lines +177 to +184
super.modeToKeyWord(if (mixedPartialAndMerge) {
Partial
} else {
aggregateMode match {
case PartialMerge => Final
case _ => aggregateMode
}
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use Final for all aggregate modes?

Copy link
Member

@zhztheplayer zhztheplayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@rui-mo rui-mo merged commit ed84367 into apache:main Dec 22, 2023
10 of 15 checks passed
@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_4112_time.csv log/native_master_12_21_2023_a24ddedc4_time.csv difference percentage
q1 33.45 33.45 0.003 100.01%
q2 25.09 25.89 0.798 103.18%
q3 38.68 38.17 -0.519 98.66%
q4 39.23 40.48 1.248 103.18%
q5 72.30 70.31 -1.987 97.25%
q6 7.11 7.04 -0.069 99.02%
q7 83.98 87.08 3.091 103.68%
q8 86.54 86.10 -0.436 99.50%
q9 127.81 126.84 -0.974 99.24%
q10 45.89 45.01 -0.884 98.07%
q11 20.47 19.76 -0.709 96.54%
q12 26.44 25.94 -0.497 98.12%
q13 45.87 45.62 -0.251 99.45%
q14 17.81 17.88 0.066 100.37%
q15 28.90 28.77 -0.125 99.57%
q16 15.34 15.88 0.532 103.47%
q17 104.17 102.51 -1.661 98.41%
q18 150.66 150.60 -0.051 99.97%
q19 13.90 13.64 -0.260 98.13%
q20 28.21 29.17 0.957 103.39%
q21 229.21 228.00 -1.218 99.47%
q22 14.20 14.05 -0.141 99.00%
total 1255.26 1252.17 -3.088 99.75%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants