
[BEAM-12018] Initial implementation for melt #14689

Merged: 3 commits merged into apache:master on Apr 30, 2021


roger-mike (Contributor)

Adds an implementation of melt to DeferredDataFrame.


(CI status badge tables, all on master branch: ValidatesRunner compliance, examples testing on various runners, post-commit SDK/transform integration tests, pre-commit tests, and GitHub Actions tests; badge images not preserved.)

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

See CI.md for more information about GitHub Actions CI.

roger-mike (Contributor, Author)

R: @TheNeuralBit could you review it, please?

codecov bot commented Apr 29, 2021

Codecov Report

Merging #14689 (50a26e2) into master (fd33f16) will decrease coverage by 0.00%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master   #14689      +/-   ##
==========================================
- Coverage   83.63%   83.63%   -0.01%     
==========================================
  Files         440      440              
  Lines       58969    58975       +6     
==========================================
+ Hits        49320    49323       +3     
- Misses       9649     9652       +3     
Impacted Files Coverage Δ
.../srcs/sdks/python/apache_beam/io/gcp/bigtableio.py
...am/testing/benchmarks/chicago_taxi/process_tfma.py
...n/apache_beam/runners/dataflow/native_io/iobase.py
...beam/portability/api/beam_artifact_api_pb2_urns.py
.../srcs/sdks/python/apache_beam/coders/observable.py
...dks/python/apache_beam/examples/wordcount_xlang.py
...he_beam/testing/benchmarks/nexmark/nexmark_perf.py
...d/srcs/sdks/python/apache_beam/io/filebasedsink.py
...e_beam/portability/api/beam_runner_api_pb2_grpc.py
...hon/apache_beam/runners/direct/test_stream_impl.py
... and 870 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fd33f16...50a26e2. Read the comment docs.

TheNeuralBit (Member)

Run Python PreCommit

TheNeuralBit (Member) left a comment

Thanks, this looks great! Just a minor request.

@@ -208,10 +208,10 @@ def test_dataframe_tests(self):
],
'pandas.core.frame.DataFrame.sort_index': ['*'],
'pandas.core.frame.DataFrame.sort_values': ['*'],
'pandas.core.frame.DataFrame.melt': ['*']
TheNeuralBit (Member)

Could you modify this so it just skips the calls that don't work? Unfortunately that will be most of them, since the default is ignore_index=True, but based on https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.melt.html it looks like we can at least run df.melt(id_vars=['A'], value_vars=['B', 'C'], ignore_index=False).
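To illustrate the difference the reviewer is pointing at, here is a pandas-only sketch (no Beam involved), assuming a pandas version that supports `ignore_index` (1.1 or later) and using the frame from the pandas `DataFrame.melt` docstring. With the default `ignore_index=True`, the result gets a fresh 0..n-1 index, which presumably can't be produced without a global row order; with `ignore_index=False`, each output row keeps the label of the input row it came from.

```python
import pandas as pd

# The frame used in the pandas DataFrame.melt docstring examples.
df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
                   'B': {0: 1, 1: 3, 2: 5},
                   'C': {0: 2, 1: 4, 2: 6}})

# Default ignore_index=True: original labels are discarded and the result
# is renumbered 0..n-1, which depends on the global order of all rows.
reset = df.melt(id_vars=['A'], value_vars=['B', 'C'])

# ignore_index=False: every output row keeps its input row's label, so each
# label appears once per melted column.
kept = df.melt(id_vars=['A'], value_vars=['B', 'C'], ignore_index=False)

print(reset.index.tolist())  # [0, 1, 2, 3, 4, 5]
print(kept.index.tolist())   # [0, 1, 2, 0, 1, 2]
```

Only the second form keeps information that is local to each input row, which is why it is the call the reviewer expects to work.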

@@ -711,6 +711,7 @@ def test_top_level(self):
wont_implement_ok={
'to_datetime': ['s.head()'],
'to_pickle': ['*'],
'melt': ['*'],
TheNeuralBit (Member)

Similarly here, but based on these examples.
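The examples referred to here are presumably the ones in the top-level `pandas.melt` docstring. A pandas-only sketch, assuming pandas 1.1 or later: the function form takes the frame as its first argument and, with `ignore_index=False`, should agree with the method form, including the `var_name`/`value_name` options shown in the pandas docs.

```python
import pandas as pd

df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
                   'B': {0: 1, 1: 3, 2: 5},
                   'C': {0: 2, 1: 4, 2: 6}})

# Top-level function form: the frame is the first positional argument.
via_function = pd.melt(df, id_vars=['A'], value_vars=['B', 'C'],
                       var_name='myVarname', value_name='myValname',
                       ignore_index=False)

# Method form with the same keyword arguments.
via_method = df.melt(id_vars=['A'], value_vars=['B', 'C'],
                     var_name='myVarname', value_name='myValname',
                     ignore_index=False)

print(via_function.columns.tolist())    # ['A', 'myVarname', 'myValname']
print(via_function.equals(via_method))  # True
```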

roger-mike (Contributor, Author)

Sure, coming right up 👍

roger-mike (Contributor, Author)

Run Python PreCommit

TheNeuralBit (Member) left a comment

Sorry, I just noticed one last thing. Otherwise this LGTM. Thank you!

lambda df: df.melt(ignore_index=False, **kwargs), [self._expr],
requires_partition_by=partitionings.Arbitrary(),
preserves_partition_by=partitionings.Singleton()))
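Declaring `requires_partition_by=partitionings.Arbitrary()` appears to mean the lambda may be run on any split of the rows, with the per-partition results combined afterwards. The sketch below models that with pandas alone; it does not use Beam, and `melt_partition`, `canonical`, and the particular split are hypothetical. It checks that melting partitions with `ignore_index=False` yields the same rows as one global melt (up to row order), and shows that with the default `ignore_index=True` each partition restarts its index at 0.

```python
import pandas as pd


def melt_partition(part, **kwargs):
    # Hypothetical helper mirroring the lambda in the snippet above:
    # melt one partition while keeping each row's original index label.
    return part.melt(ignore_index=False, **kwargs)


def canonical(frame):
    # Row order across partitions is not meaningful, so sort by the
    # preserved index label and the melted column name before comparing.
    return (frame.rename_axis('label').reset_index()
            .sort_values(['label', 'variable'])
            .reset_index(drop=True))


df = pd.DataFrame({'A': ['a', 'b', 'c', 'd'],
                   'B': [1, 3, 5, 7],
                   'C': [2, 4, 6, 8]})

# An arbitrary split of the rows, standing in for however a runner
# might partition the data.
partitions = [df.iloc[[0, 3]], df.iloc[[1]], df.iloc[[2]]]
kwargs = dict(id_vars=['A'], value_vars=['B', 'C'])

distributed = pd.concat([melt_partition(p, **kwargs) for p in partitions])
single = df.melt(ignore_index=False, **kwargs)
print(canonical(distributed).equals(canonical(single)))  # True

# With the default ignore_index=True, labels from different partitions
# collide instead of forming a single 0..n-1 range.
naive = pd.concat([p.melt(**kwargs) for p in partitions])
print(naive.index.tolist())  # [0, 1, 2, 3, 0, 1, 0, 1]
```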

TheNeuralBit (Member)

Ah, sorry, I just noticed that this method is on DeferredDataFrameOrSeries. We don't want to define it on DeferredSeries, just on DeferredDataFrame, since pandas only supports it on DataFrames.

Could you move the method there?
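A quick pandas-only check of the reviewer's point, assuming pandas 1.1 or later: `melt` is defined on `DataFrame` but not on `Series`, so defining it on the shared deferred base class would expose a method that the corresponding pandas type lacks. A Series can still be melted by converting it to a one-column frame first.

```python
import pandas as pd

# melt exists as a DataFrame method but not as a Series method.
print(hasattr(pd.DataFrame, 'melt'))  # True
print(hasattr(pd.Series, 'melt'))     # False

# Workaround for Series data: convert to a one-column frame, then melt.
s = pd.Series([1, 2], name='x')
long_form = s.to_frame().melt(ignore_index=False)
print(long_form)
```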

roger-mike (Contributor, Author)

Done, sorry I didn't notice it before 😅

@TheNeuralBit TheNeuralBit merged commit f5f9898 into apache:master Apr 30, 2021
Labels
None yet
Projects
None yet

2 participants