-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-12018] Initial implementation for melt #14689
Conversation
R: @TheNeuralBit could you review it, please? |
Run Python PreCommit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks this looks great! Just a minor request
@@ -208,10 +208,10 @@ def test_dataframe_tests(self): | |||
], | |||
'pandas.core.frame.DataFrame.sort_index': ['*'], | |||
'pandas.core.frame.DataFrame.sort_values': ['*'], | |||
'pandas.core.frame.DataFrame.melt': ['*'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you modify this so it just skips the calls that don't work? Unfortunately that will be most of them, since the defiault is ignore_index=True
, but based on https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.melt.html it looks like we can at least run df.melt(id_vars=['A'], value_vars=['B', 'C'], ignore_index=False)
@@ -711,6 +711,7 @@ def test_top_level(self): | |||
wont_implement_ok={ | |||
'to_datetime': ['s.head()'], | |||
'to_pickle': ['*'], | |||
'melt': ['*'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similarly here, but based on these examples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, coming right up 👍
Run Python PreCommit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I just noticed one last thing. Otherwise this LGTM. Thank you!
lambda df: df.melt(ignore_index=False, **kwargs), [self._expr], | ||
requires_partition_by=partitionings.Arbitrary(), | ||
preserves_partition_by=partitionings.Singleton())) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry I just noticed that this method is on DeferredDataFrameOrSeries
. But we don't want to define this on DeferredSeries
, just on DeferredDataFrame
. (Since pandas only supports it on DataFrames)
Could you move the method there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, sorry I didn't notice it before 😅
Adds implementation for
melt
toDeferredDataFrame
.ValidatesRunner
compliance status (on master branch)Examples testing status on various runners
Post-Commit SDK/Transform Integration Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.