-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: eliminate cartesian product columns in pivot operator #15975
Conversation
2aebd16
to
cd2056e
Compare
Codecov Report
@@ Coverage Diff @@
## master #15975 +/- ##
==========================================
- Coverage 77.06% 76.83% -0.23%
==========================================
Files 988 988
Lines 52387 52397 +10
Branches 6626 6626
==========================================
- Hits 40370 40259 -111
- Misses 11793 11914 +121
Partials 224 224
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great solution! A few small comments, other than that LGTM!
493457c
to
2dc41f8
Compare
/testenv up |
@junlincc Ephemeral environment spinning up at http://34.219.46.208:8080. Credentials are |
# https://github.com/apache/superset/issues/15956 | ||
# https://github.com/pandas-dev/pandas/issues/18030 | ||
series_set = set() | ||
to_string_list: Callable[[List[Any]], List[str]] = lambda lst: [str(_) for _ in lst] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we even wrap the "_".join
inside the lambda?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
338761c
to
424b047
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing this! ❤️
One minor comment. Also just wondering - would dropna(how="all", ...)
on the final result work as a simpler solution?
# https://github.com/apache/superset/issues/15956 | ||
# https://github.com/pandas-dev/pandas/issues/18030 | ||
series_set = set() | ||
lst_to_str: Callable[[List[Any]], str] = lambda lst: "_".join(str(_) for _ in lst) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit - I think you could just replace calls of this this with str()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value of columns might integer or float, so needs to be converted. for instance:
>>> "".join([1,0])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected str instance, int found
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but the join is not necessary - you can just call str()
on a list to get a (deterministic) string representation of the list.
In [1]: str([0,"x"])
Out[1]: "[0, 'x']"
tuple()
might be better (I assume the intention is to make the list hashable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice tips! changes done.
Ephemeral environment shutdown and build artifacts deleted. |
sorry, missing this review.
|
test cases added |
SUMMARY
closes: #15956
eliminate cartesian product columns in pivot operator
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
After
TESTING INSTRUCTIONS
added test in integrated test
ADDITIONAL INFORMATION