Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support max on single-level struct in aggregation context #4434

Merged

Conversation

res-life
Copy link
Collaborator

@res-life res-life commented Dec 28, 2021

[FEA] Support max on single-level struct in aggregation context #3541

This fixes #3541

Signed-off-by: Chong Gao res_life@163.com

Signed-off-by: Chong Gao <res_life@163.com>
@res-life
Copy link
Collaborator Author

build

Copy link
Collaborator

@firestarman firestarman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks good to me, but a NIT.

@@ -1695,3 +1695,22 @@ def test_groupby_std_variance_partial_replace_fallback(data_gen,
exist_classes=','.join(exist_clz),
non_exist_classes=','.join(non_exist_clz),
conf=local_conf)

@ignore_order
@pytest.mark.parametrize('data_type', all_gen + [NullGen()], ids=idfn)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@pytest.mark.parametrize('data_type', all_gen + [NullGen()], ids=idfn)
@pytest.mark.parametrize('data_gen', all_gen + [NullGen()], ids=idfn)

@sameerz sameerz added the feature request New feature or request label Dec 29, 2021
Signed-off-by: Chong Gao <res_life@163.com>
@res-life
Copy link
Collaborator Author

res-life commented Jan 4, 2022

build

1 similar comment
@res-life
Copy link
Collaborator Author

res-life commented Jan 4, 2022

build

@@ -15227,7 +15227,7 @@ are limited.
<td><b>NS</b></td>
<td><b>NS</b></td>
<td> </td>
<td><b>NS</b></td>
<td><em>PS<br/>UTC is only supported TZ for child TIMESTAMP;<br/>unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT</em></td>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This says that it works for window operations too, but I see no tests for window operations.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, removed the window support.

('a', StructGen([
('aa', data_gen),
('ab', data_gen)])),
('b', IntegerGen())]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make it likely that b will have repeated data in it? That way the groupings are likely to have more than one thing in them to compare?

('b', RepeatSeqGen(IntegerGen(), length=20))]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, updated.

Copy link
Collaborator Author

@res-life res-life Jan 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suported both Min and Max.
Removed the window support.
Updated test case.
Seems CUDF has a bug: rapidsai/cudf#8974 (comment). After confired, I'll file an issue.

@res-life
Copy link
Collaborator Author

res-life commented Jan 6, 2022

build

@res-life
Copy link
Collaborator Author

res-life commented Jan 7, 2022

Suggest wait, @ttnghia is working on the fix of rapidsai/cudf#8974 (comment)

Signed-off-by: Chong Gao <res_life@163.com>
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this pull request Jan 13, 2022
…tion/groupby (#10026)

This is another fix for NVIDIA/spark-rapids#4434, when the null order is wrongly handled if the input structs column does not have nulls at the top level but only has null at the children levels.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Mike Wilson (https://github.com/hyperbolic2346)
  - MithunR (https://github.com/mythrocks)

URL: #10026
@sameerz sameerz added this to the Jan 10 - Jan 28 milestone Jan 13, 2022
Signed-off-by: Chong Gao <res_life@163.com>
@res-life
Copy link
Collaborator Author

build

@res-life
Copy link
Collaborator Author

@firestarman @revans2 help to review, is not blocked now.

@res-life res-life merged commit a63b483 into NVIDIA:branch-22.02 Jan 15, 2022
@res-life res-life deleted the support-max-single-level-struct branch January 15, 2022 00:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
No open projects
Release 22.02
Awaiting triage
Development

Successfully merging this pull request may close these issues.

[FEA] Support max on single-level struct in aggregation context
4 participants