Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-28347: Make a UDAF 'collect_set' work with complex types, even when map-side aggregation is disabled. #5323

Merged
merged 6 commits into from
Jul 22, 2024

Conversation

JeongDaeKim
Copy link
Contributor

@JeongDaeKim JeongDaeKim commented Jun 26, 2024

What changes were proposed in this pull request?

UDAFs, collect_set and collect_list, won't work with complex types, if map-side aggregation is disabled. This PR will fix this bug.

Why are the changes needed?

When users disable the map-side aggregation feature("hive.map.aggr), collect_set doesn't work with complex types, such as struct.

Does this PR introduce any user-facing change?

No.

Is the change a dependency upgrade?

No.

How was this patch tested?

Tested with qtest (TestMiniTezCliDriver)

@JeongDaeKim JeongDaeKim changed the title HIVE-28347 Make a UDAF 'collect_set' work with complex types, even when map-side aggregation is disabled. HIVE-28347: Make a UDAF 'collect_set' work with complex types, even when map-side aggregation is disabled. Jun 26, 2024
//no map aggregation.
inputOI = parameters[0];
return ObjectInspectorFactory.getStandardListObjectInspector(
ObjectInspectorUtils.getStandardObjectInspector(inputOI));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we merge this case with PARTIAL1?

if (m == Mode.PARTIAL1 || m == Mode.COMPLETE) {
  ...

@okumin
Copy link
Contributor

okumin commented Jun 26, 2024

@JeongDaeKim Taking a glance, your finding and fix seems to be very reasonable. Could you please regenerate the result file of udaf_collect_set_2.q?

@JeongDaeKim
Copy link
Contributor Author

@okumin Thank you for a quick review! 👍

Copy link
Contributor

@okumin okumin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done three things.
First, I checked that the evaluator is likely correctly implemented now.
Second, I checked that we have new regrettion tests. Nice work.
Third, I ran the original udaf_collect_set_2.q with set hive.map.aggr = true or false. Both generated the same result set.

Then, the changes here look to me. Note that you also need +1 from a committer.

Copy link
Member

@deniskuzZ deniskuzZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1

@deniskuzZ
Copy link
Member

thanks @JeongDaeKim! I'll merge it with the next green build

Copy link

sonarcloud bot commented Jul 22, 2024

@deniskuzZ deniskuzZ merged commit 1969eda into apache:master Jul 22, 2024
5 checks passed
@JeongDaeKim JeongDaeKim deleted the HIVE-28347-4.0 branch July 22, 2024 22:07
dengzhhu653 pushed a commit to dengzhhu653/hive that referenced this pull request Aug 7, 2024
…hen map-side aggregation is disabled (Jeongdae Kim, reviewed by Shohei Okumiya, Denys Kuzmenko)

Closes apache#5323
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants