[BEAM-5319] Partially port runners #6451

RobbeSneyders · 2018-09-20T20:16:46Z

This is part of a series of PRs with the goal of making Apache Beam Python 3 compatible. The proposal with the outlined approach has been documented here.

Fixing the tests for Python 3 requires a lot of cross-package fixes. I'm therefore submitting this PR with the failing tests skipped during Python 3 testing. This will allow us to build on these fixes while porting the other packages. Since the remaining errors might be caused by other packages, porting those packages might already resolve some of them.

At a later stage, these tests should be un-skipped and fixed if they are still failing.

@tvalentyn @Fematich @charlesccychen @aaltay

RobbeSneyders · 2018-09-20T20:17:53Z

sdks/python/apache_beam/runners/common.py

@@ -716,7 +717,7 @@ def _reraise_augmented(self, exn):
          traceback.format_exception_only(type(exn), exn)[-1].strip()
          + step_annotation)
      new_exn._tagged_with_step = True
-    raise_(type(new_exn), new_exn, original_traceback)
+    raise_with_traceback(new_exn)


This was changed since raise_ creates a new exception object, which removes the assigned attributes.
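A minimal sketch of the difference, using only the standard library on Python 3. The two helpers are hypothetical stand-ins: the first assumes that raise_ builds a new exception by calling the exception type again, the second re-raises the same object the way raise_with_traceback(new_exn) is expected to.

```python
import sys


def reraise_by_reconstructing(exn, tb):
    # Hypothetical stand-in for the old call: a *new* exception object is
    # built from the old one, so attributes set on the old one are lost.
    rebuilt = type(exn)(exn)
    raise rebuilt.with_traceback(tb)


def reraise_same_object(exn, tb):
    # Hypothetical stand-in for the new call: the same object is re-raised,
    # so attributes such as _tagged_with_step survive.
    raise exn.with_traceback(tb)


def tagged_after(reraise):
    """Returns whether the tag is still present on the exception we catch."""
    try:
        try:
            raise ValueError('boom')
        except ValueError as exn:
            exn._tagged_with_step = True
            reraise(exn, sys.exc_info()[2])
    except ValueError as caught:
        return getattr(caught, '_tagged_with_step', False)


lost = tagged_after(reraise_by_reconstructing)
kept = tagged_after(reraise_same_object)
```

Here lost ends up False and kept ends up True, which is the behaviour described above.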

RobbeSneyders · 2018-09-20T20:22:45Z

sdks/python/apache_beam/runners/direct/consumer_tracking_pipeline_visitor_test.py

@@ -60,15 +60,13 @@ class DummySource(iobase.BoundedSource):

    self.pipeline.visit(self.visitor)

-    root_transforms = sorted(
-        [t.transform for t in self.visitor.root_transforms])
+    root_transforms = [t.transform for t in self.visitor.root_transforms]


In Python 3, objects no longer get a default ordering (there is no builtin __cmp__), so a list of arbitrary objects can't be sorted.

The sorting here only seems to be there so that the lists can be compared for equality. The same effect should be achieved by using assertCountEqual, which checks that each object occurs the same number of times in both lists, regardless of order.
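A small illustration of both points on Python 3. The Transform class is a hypothetical placeholder for the transforms in the test, not the Beam class.

```python
import unittest


class Transform(object):
    """Hypothetical placeholder: defines no ordering methods."""

    def __init__(self, label):
        self.label = label


class OrderInsensitiveTest(unittest.TestCase):

    def test_same_items_any_order(self):
        read, create = Transform('Read'), Transform('Create')
        # Sorting plain objects fails on Python 3.
        with self.assertRaises(TypeError):
            sorted([read, create])
        # assertCountEqual compares contents without needing an ordering.
        self.assertCountEqual([read, create], [create, read])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrderInsensitiveTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```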

RobbeSneyders · 2018-09-20T20:31:56Z

12 tests were skipped, but the errors all seem related to each other and to the skipped tests in the unpackaged modules. Hopefully the origin of the errors will become clearer while porting other packages.

Rebasing on top of master also introduced some new errors, which I still need to take a look at, but I wanted to submit these changes for review already.

RobbeSneyders · 2018-09-21T22:34:10Z

Rebased and all tests except for the 12 skipped ones are passing again.

RobbeSneyders · 2018-09-23T17:26:18Z

sdks/python/apache_beam/runners/worker/opcounters_test.py

@@ -160,6 +161,8 @@ def test_update_multiple(self):
    total_size += coder.estimate_size(value)
    self.verify_counters(opcounts, 3, (float(total_size) / 3))

+  @unittest.skipIf(sys.version_info[0] == 3, 'This test still needs to be '


This test fails with seed 1717, but not with 1718, so the results don't seem consistent.

However, the random generation is consistent for the same integer seed between Python 2 and 3, so Python 3 does seem to have an effect on the tested code path.

RobbeSneyders · 2018-09-24T18:32:31Z

Rebased on top of #6471 and skipped last failing test so this can be merged.

RobbeSneyders · 2018-09-24T18:50:39Z

sdks/python/apache_beam/runners/portability/stager_test.py

@@ -222,6 +223,8 @@ def test_with_requirements_file_and_cache(self):
    self.assertTrue(os.path.isfile(os.path.join(staging_dir, 'abc.txt')))
    self.assertTrue(os.path.isfile(os.path.join(staging_dir, 'def.txt')))

+  @unittest.skipIf(sys.version_info[0] == 3, 'This test still needs to be '
+                                             'fixed on Python 3')


This test does not fail on its own or when only running the runners tests. It only fails when running the full test suite, so it seems like there is a conflict with another test.

Filed https://issues.apache.org/jira/browse/BEAM-5502 for this one. Please add a TODO with the Jira in the comment.
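One possible shape for the requested TODO, as a sketch. The test class and method names are hypothetical, and the exact wording of the comment and skip message is an assumption.

```python
import sys
import unittest


class StagerLikeTest(unittest.TestCase):
    """Hypothetical test class used only to show the skip pattern."""

    # TODO(BEAM-5502): Un-skip once the conflict with other tests is resolved.
    @unittest.skipIf(sys.version_info[0] == 3,
                     'This test still needs to be fixed on Python 3. '
                     'See BEAM-5502.')
    def test_skipped_on_python_3(self):
        self.fail('Not reached on Python 3.')

    def test_always_runs(self):
        self.assertTrue(True)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(StagerLikeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

On Python 3 the runner reports one skipped test and no failures, and the Jira key shows up in the skip reason.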

tvalentyn · 2018-09-20T20:32:40Z

sdks/python/apache_beam/io/textio.py

@@ -147,7 +147,7 @@ def display_data(self):

  def read_records(self, file_name, range_tracker):
    start_offset = range_tracker.start_position()
-    read_buffer = _TextSource.ReadBuffer('', 0)
+    read_buffer = _TextSource.ReadBuffer(b'', 0)


Should we change line 86 as well?

You're right.
I haven't focused much on the io files yet, since that package still needs to be ported. However, since line 86 is related to this change, I'll update it in this PR as well.
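A sketch of why the initial buffer has to be b'' on Python 3, assuming the source reads its files in binary mode. The ReadBuffer class and fill helper below are simplified, hypothetical stand-ins, not the code in textio.py.

```python
import io


class ReadBuffer(object):
    """Hypothetical, simplified stand-in for _TextSource.ReadBuffer."""

    def __init__(self, data, position):
        self.data = data
        self.position = position


def fill(read_buffer, file_obj, chunk_size=8):
    """Appends the next chunk to the buffer and returns its length."""
    chunk = file_obj.read(chunk_size)  # bytes, since the file is binary
    read_buffer.data += chunk
    return len(chunk)


file_obj = io.BytesIO(b'first\nsecond\n')

# Starting from b'' works: bytes are appended to bytes.
bytes_buffer = ReadBuffer(b'', 0)
while fill(bytes_buffer, file_obj):
    pass

# Starting from '' fails on Python 3: str and bytes can't be concatenated.
file_obj.seek(0)
text_buffer = ReadBuffer('', 0)
try:
    fill(text_buffer, file_obj)
    mixing_failed = False
except TypeError:
    mixing_failed = True
```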

tvalentyn · 2018-09-25T18:55:23Z

sdks/python/apache_beam/runners/interactive/interactive_runner_test.py

@@ -42,6 +43,8 @@ def printer(elem):

 class InteractiveRunnerTest(unittest.TestCase):

+  @unittest.skipIf(sys.version_info[0] == 3, 'This test still needs to be '


If the issues in the interactive runner tests are specific to the interactive runner itself and are non-trivial to fix, we can also try to get the original author involved and track them in a separate Jira.

All of the skipped tests except for the one in stager_test.py seem to be failing due to the same underlying bug, so it should not be specific to the interactive runner.

tvalentyn · 2018-09-25T19:31:37Z

sdks/python/apache_beam/runners/interactive/cache_manager.py

@@ -30,6 +30,14 @@
 from apache_beam.io import filesystems
 from apache_beam.transforms import combiners

+try:                    # Python 3


Is there a reason not to use

from future.moves.urllib.parse import quote
from future.moves.urllib.parse import unquote

for consistency with dataflow_runner.py?

The input of the unquote() call here is a bytes object, while it is a str in the dataflow_runner.

urllib.parse.unquote requires a str on Python 3, so this will still not work if we give it a bytes object?

Ah, never mind, urllib.parse.unquote_to_bytes can accept a str or bytes.

I also couldn't just use urllib.parse.unquote_to_bytes on both Python 2 and 3, since it returns a future.builtins.newbytes instance on Python 2.

I see. I didn't find unquote_to_bytes in future.moves.urllib, so I thought it wasn't even implemented.
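For reference, a sketch of the kind of import fallback discussed in this thread. The hunk above is truncated, so the except branch and the alias name are assumptions rather than a copy of the change.

```python
try:  # Python 3
    from urllib.parse import quote
    from urllib.parse import unquote_to_bytes
except ImportError:  # Python 2 (assumed fallback)
    from urllib import quote
    from urllib import unquote as unquote_to_bytes

encoded = quote('full/path with spaces')

# On Python 3, unquote_to_bytes accepts either str or bytes input
# and always returns bytes.
from_str = unquote_to_bytes(encoded)
from_bytes = unquote_to_bytes(encoded.encode('ascii'))
```

Both calls produce the same bytes value on Python 3, which is why the input type doesn't matter here.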

tvalentyn

Thanks, @RobbeSneyders!

aaltay · 2018-09-25T21:53:50Z

sdks/python/apache_beam/runners/direct/consumer_tracking_pipeline_visitor_test.py

@@ -45,6 +45,10 @@ class ConsumerTrackingPipelineVisitorTest(unittest.TestCase):
  def setUp(self):
    self.pipeline = Pipeline(DirectRunner())
    self.visitor = ConsumerTrackingPipelineVisitor()
+    try:                    # Python 2
+      self.assertCountEqual = self.assertItemsEqual


Wouldn't this raise an AttributeError in any case?
Python 2 does not have the first method, and Python 3 does not have the second one.

Nevermind, on py2, this will define assertCountEqual.
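A sketch of how the alias resolves on each version. The hunk above is truncated, so the except clause is an assumption. The right-hand side is evaluated first: on Python 2 it finds assertItemsEqual and the assignment creates assertCountEqual on the instance, while on Python 3 the lookup raises AttributeError and the builtin assertCountEqual is used as-is.

```python
import unittest


class CountEqualCompatTest(unittest.TestCase):

    def setUp(self):
        try:  # Python 2: alias the old name to the new one.
            self.assertCountEqual = self.assertItemsEqual
        except AttributeError:  # Python 3: assertCountEqual already exists.
            pass

    def test_order_is_ignored(self):
        self.assertCountEqual(['read', 'create'], ['create', 'read'])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(CountEqualCompatTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```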

RobbeSneyders force-pushed the runners branch from a353c8a to b823144 Compare September 20, 2018 20:29

RobbeSneyders force-pushed the runners branch from b823144 to 3877d21 Compare September 21, 2018 22:33

RobbeSneyders changed the title ~~Partially port runners~~ [BEAM-5319] Partially port runners Sep 21, 2018

RobbeSneyders added 2 commits September 22, 2018 21:16

Remove standard_library.import_aliases() calls

c3b2af6

Partially port runners

74804c7

RobbeSneyders force-pushed the runners branch from 3877d21 to 74804c7 Compare September 23, 2018 17:05

tvalentyn mentioned this pull request Sep 24, 2018

Remove standard_library.import_aliases() calls #6471

Merged

Skip failing test_with_setup_file from stagertest

c8149bc

RobbeSneyders force-pushed the runners branch from 514b26d to c8149bc Compare September 24, 2018 19:04

RobbeSneyders mentioned this pull request Sep 25, 2018

[BEAM-1251] Upgrade pylint version for py27-lint3 #6489

Merged

Add pylint ignore to be compatible with changes in apache#6489

59529cf

tvalentyn reviewed Sep 25, 2018

Address PR comments

fbcd755

tvalentyn approved these changes Sep 25, 2018

aaltay reviewed Sep 25, 2018

aaltay approved these changes Sep 25, 2018

aaltay merged commit 50111d5 into apache:master Sep 25, 2018

splovyt pushed a commit to splovyt/beam that referenced this pull request Oct 1, 2018

[BEAM-5319] Partially port runners (apache#6451)

3a85a21

pl04351820 pushed a commit to pl04351820/beam that referenced this pull request Dec 20, 2023

Import stdlib ABCs from 'collections.abc' rather than 'collections'. (a…

8f6f2ae

…pache#6451) On Python 2.7, fall back to 'collections'. Closes apache#6450.
