[BEAM-12018] Initial implementation for melt #14689

roger-mike · 2021-04-29T18:56:27Z

Adds implementation for melt to DeferredDataFrame.

roger-mike · 2021-04-29T18:58:30Z

R: @TheNeuralBit could you review it, please?

codecov · 2021-04-29T19:14:44Z

Codecov Report

Merging #14689 (50a26e2) into master (fd33f16) will decrease coverage by 0.00%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #14689      +/-   ##
==========================================
- Coverage   83.63%   83.63%   -0.01%     
==========================================
  Files         440      440              
  Lines       58969    58975       +6     
==========================================
+ Hits        49320    49323       +3     
- Misses       9649     9652       +3

Impacted Files	Coverage Δ
.../srcs/sdks/python/apache_beam/io/gcp/bigtableio.py
...am/testing/benchmarks/chicago_taxi/process_tfma.py
...n/apache_beam/runners/dataflow/native_io/iobase.py
...beam/portability/api/beam_artifact_api_pb2_urns.py
.../srcs/sdks/python/apache_beam/coders/observable.py
...dks/python/apache_beam/examples/wordcount_xlang.py
...he_beam/testing/benchmarks/nexmark/nexmark_perf.py
...d/srcs/sdks/python/apache_beam/io/filebasedsink.py
...e_beam/portability/api/beam_runner_api_pb2_grpc.py
...hon/apache_beam/runners/direct/test_stream_impl.py
... and 870 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fd33f16...50a26e2.

TheNeuralBit · 2021-04-29T20:39:51Z

Run Python PreCommit

TheNeuralBit

Thanks, this looks great! Just a minor request.

TheNeuralBit · 2021-04-29T21:00:09Z

sdks/python/apache_beam/dataframe/pandas_doctests_test.py

@@ -208,10 +208,10 @@ def test_dataframe_tests(self):
            ],
            'pandas.core.frame.DataFrame.sort_index': ['*'],
            'pandas.core.frame.DataFrame.sort_values': ['*'],
+            'pandas.core.frame.DataFrame.melt': ['*']


Could you modify this so it skips only the calls that don't work? Unfortunately, that will be most of them, since the default is `ignore_index=True`, but based on https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.melt.html it looks like we can at least run `df.melt(id_vars=['A'], value_vars=['B', 'C'], ignore_index=False)`.
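For context, here is a minimal pandas-only sketch of why `ignore_index` matters, assuming pandas 1.1 or later (where `melt` accepts `ignore_index`) and using the sample frame from the linked pandas docs. With `ignore_index=False`, each output row keeps the label of the input row it came from, repeated once per value column. With the default `ignore_index=True`, pandas builds a fresh 0..n-1 index, so each label depends on the global order of the rows.

```python
import pandas as pd

# Sample frame from the pandas DataFrame.melt documentation.
df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
                   'B': {0: 1, 1: 3, 2: 5},
                   'C': {0: 2, 1: 4, 2: 6}})

# ignore_index=False: original row labels are kept and tiled once per value column.
kept = df.melt(id_vars=['A'], value_vars=['B', 'C'], ignore_index=False)
print(kept.index.tolist())     # [0, 1, 2, 0, 1, 2]
print(kept['value'].tolist())  # [1, 3, 5, 2, 4, 6]

# Default ignore_index=True: labels are discarded and a new RangeIndex is
# generated, which requires knowing each row's position in the whole frame.
fresh = df.melt(id_vars=['A'], value_vars=['B', 'C'])
print(fresh.index.tolist())    # [0, 1, 2, 3, 4, 5]
```

The kept labels can be computed row by row, while the fresh index cannot, which is why only the `ignore_index=False` call is a realistic candidate to run against a deferred implementation.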

TheNeuralBit · 2021-04-29T21:00:19Z

sdks/python/apache_beam/dataframe/pandas_doctests_test.py

@@ -711,6 +711,7 @@ def test_top_level(self):
        wont_implement_ok={
            'to_datetime': ['s.head()'],
            'to_pickle': ['*'],
+            'melt': ['*'],


similarly here, but based on these examples
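The top-level function takes the frame as its first argument but otherwise mirrors the method, so the same `ignore_index=False` restriction applies to its examples. A minimal pandas-only sketch, assuming pandas 1.1 or later; the toy frame is hypothetical and only for illustration:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical toy frame, for illustration only.
df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [1, 3, 5], 'C': [2, 4, 6]})

# pandas.melt(frame, ...) is the function form of DataFrame.melt(...).
top_level = pd.melt(df, id_vars=['A'], value_vars=['B'], ignore_index=False)
method = df.melt(id_vars=['A'], value_vars=['B'], ignore_index=False)

# Both forms produce identical results, including the preserved index.
assert_frame_equal(top_level, method)
print(top_level.index.tolist())  # [0, 1, 2]
```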

Sure, coming right up 👍

roger-mike · 2021-04-29T22:46:48Z

Run Python PreCommit

TheNeuralBit

Sorry, I just noticed one last thing. Otherwise, this LGTM. Thank you!

TheNeuralBit · 2021-04-29T23:43:03Z

sdks/python/apache_beam/dataframe/frames.py

+            lambda df: df.melt(ignore_index=False, **kwargs), [self._expr],
+            requires_partition_by=partitionings.Arbitrary(),
+            preserves_partition_by=partitionings.Singleton()))
+


Ah, sorry, I just noticed that this method is on `DeferredDataFrameOrSeries`, but we don't want to define it on `DeferredSeries`, only on `DeferredDataFrame` (since pandas only supports it on DataFrames).

Could you move the method there?
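The snippet above applies `df.melt(ignore_index=False, **kwargs)` to each partition with `requires_partition_by=partitionings.Arbitrary()`. A minimal pandas-only sketch of the property that choice relies on, assuming a Beam partition can be modeled as an arbitrary chunk of rows: melting chunks independently and concatenating gives the same rows as melting the whole frame, with only the row order differing. The chunk boundaries and the `canonical` helper are hypothetical and exist only for this illustration. The last line checks the point about pandas only offering `melt` on DataFrames.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c', 'd'],
                   'B': [1, 3, 5, 7],
                   'C': [2, 4, 6, 8]})
kwargs = dict(id_vars=['A'], value_vars=['B', 'C'])

# Melt the whole frame at once, keeping the original index.
whole = df.melt(ignore_index=False, **kwargs)

# Hypothetical stand-in for arbitrary partitioning: split the rows into
# chunks, melt each chunk independently, then concatenate the results.
chunks = [df.iloc[:1], df.iloc[1:3], df.iloc[3:]]
pieces = pd.concat([chunk.melt(ignore_index=False, **kwargs) for chunk in chunks])

def canonical(frame):
    # Row order differs between the two results, so sort before comparing.
    return (frame.reset_index()
                 .sort_values(['index', 'variable'])
                 .reset_index(drop=True))

pd.testing.assert_frame_equal(canonical(whole), canonical(pieces))

# melt is defined on DataFrame; pandas Series has no melt method.
print(hasattr(pd.DataFrame, 'melt'), hasattr(pd.Series, 'melt'))  # True False
```

With `ignore_index=True`, the per-chunk results would each restart their index at 0, so the concatenated labels would no longer match the whole-frame result.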

Done, sorry I didn't notice it before 😅

[BEAM-12018] Initial implementation for melt

058d67b

TheNeuralBit reviewed Apr 29, 2021

[BEAM-12018] Specified skipped tests

fffdb2c

roger-mike requested a review from TheNeuralBit April 29, 2021 23:31

TheNeuralBit approved these changes Apr 29, 2021

[BEAM-12018] Moved melt function to DeferredDataFrame

50a26e2

roger-mike requested a review from TheNeuralBit April 30, 2021 01:28

TheNeuralBit merged commit f5f9898 into apache:master Apr 30, 2021
