Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-5319] Partially port runners #6451

Merged
merged 5 commits into from
Sep 25, 2018
Merged

Conversation

RobbeSneyders
Copy link
Contributor

This is is part of a series of PRs with goal to make Apache Beam PY3 compatible. The proposal with the outlined approach has been documented here.

Fixing the tests for Python 3 requires a lot of cross-package fixes. I'm therefore submitting this PR with the failing tests skipped during Python 3 testing. This will allow us to build on these fixes while porting the other packages. Since the remaining errors might be caused by other packages, porting of these packages might already solve some of these errors.

In a later stage, these errors should be un-skipped and fixed if they are still failing.

@tvalentyn @Fematich @charlesccychen @aaltay

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- --- --- --- ---
Java Build Status Build Status Build Status Build Status Build Status Build Status Build Status
Python Build Status --- Build Status
Build Status
--- --- --- ---

@@ -716,7 +717,7 @@ def _reraise_augmented(self, exn):
traceback.format_exception_only(type(exn), exn)[-1].strip()
+ step_annotation)
new_exn._tagged_with_step = True
raise_(type(new_exn), new_exn, original_traceback)
raise_with_traceback(new_exn)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was changed since raise_ creates a new exception object, which removes the assigned attributes.

@@ -60,15 +60,13 @@ class DummySource(iobase.BoundedSource):

self.pipeline.visit(self.visitor)

root_transforms = sorted(
[t.transform for t in self.visitor.root_transforms])
root_transforms = [t.transform for t in self.visitor.root_transforms]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Objects don't have a builtin __cmp__ method in Python 3 and can't be sorted.

The sorting here only seems to happen to be able to check equality of tests. The same effect should be achieved by using assertCountEqual which checks the number of times each object occurs in a list.

@RobbeSneyders
Copy link
Contributor Author

12 tests were skipped, but the errors all seem related to each other and to the skipped tests in the unpackaged modules. Hopefully the origin of the errors will be more clear while porting other packages.

Rebasing on top of master also introduced some new errors, which I still need to take a look at, but I wanted to submit these changes for review already.

@RobbeSneyders
Copy link
Contributor Author

Rebased and all tests except for the 12 skipped ones are passing again.

@RobbeSneyders RobbeSneyders changed the title Partially port runners [BEAM-5319] Partially port runners Sep 21, 2018
@@ -160,6 +161,8 @@ def test_update_multiple(self):
total_size += coder.estimate_size(value)
self.verify_counters(opcounts, 3, (float(total_size) / 3))

@unittest.skipIf(sys.version_info[0] == 3, 'This test still needs to be '
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test fails with seed 1717, but not with 1718, so the results don't seem consistent.

However, the random generation is consistent for the same integer seed between Python 2 and 3. So python 3 does seem to have an effect on the tested path.

@RobbeSneyders
Copy link
Contributor Author

RobbeSneyders commented Sep 24, 2018

Rebased on top of #6471 and skipped last failing test so this can be merged.

@@ -222,6 +223,8 @@ def test_with_requirements_file_and_cache(self):
self.assertTrue(os.path.isfile(os.path.join(staging_dir, 'abc.txt')))
self.assertTrue(os.path.isfile(os.path.join(staging_dir, 'def.txt')))

@unittest.skipIf(sys.version_info[0] == 3, 'This test still needs to be '
'fixed on Python 3')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test does not fail on its own or when only running the runners tests. It only fails when running the full test suite, so it seems like there is a conflict with another test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed https://issues.apache.org/jira/browse/BEAM-5502 for this one. Please add a TODO with the Jira in the comment.

@@ -147,7 +147,7 @@ def display_data(self):

def read_records(self, file_name, range_tracker):
start_offset = range_tracker.start_position()
read_buffer = _TextSource.ReadBuffer('', 0)
read_buffer = _TextSource.ReadBuffer(b'', 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change line 86 as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right.
I haven't focused too much on the io files yet, since this package still needs to be ported. However, since it is related to this change, I will already add it.

@@ -42,6 +43,8 @@ def printer(elem):

class InteractiveRunnerTest(unittest.TestCase):

@unittest.skipIf(sys.version_info[0] == 3, 'This test still needs to be '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the issues in interactive runner tests are specific to the interactive runner itself and are non-trivial to fix, we can also try to get original author involved, and track in a separate Jira.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the skipped tests except for the one in stager_test.py seem to be failing due to the same underlying bug, so it should not be specific to the interactive runner.

@@ -222,6 +223,8 @@ def test_with_requirements_file_and_cache(self):
self.assertTrue(os.path.isfile(os.path.join(staging_dir, 'abc.txt')))
self.assertTrue(os.path.isfile(os.path.join(staging_dir, 'def.txt')))

@unittest.skipIf(sys.version_info[0] == 3, 'This test still needs to be '
'fixed on Python 3')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed https://issues.apache.org/jira/browse/BEAM-5502 for this one. Please add a TODO with the Jira in the comment.

@@ -30,6 +30,14 @@
from apache_beam.io import filesystems
from apache_beam.transforms import combiners

try: # Python 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to use

from future.moves.urllib.parse import quote
from future.moves.urllib.parse import unquote

for consistency with dataflow_runner.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input of the unquote() call here is a bytes object, while it is a str in the dataflow_runner.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

urllib.parse.unquote requires a str on Python 3, so this will still not work if we give it a bytes object?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nevermind, urllib.parse.unquote_to_bytes can accept string or bytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also couldn't just use urllib.parse.unquote_to_bytes on both Python 2 and 3, since it returnes a future.builtins.newbytes instance on Python 2.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I didn't find unquote_to_bytes in future.moves.urllib, so I thought it wasn't even implemented.

Copy link
Contributor

@tvalentyn tvalentyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @RobbeSneyders!

@@ -45,6 +45,10 @@ class ConsumerTrackingPipelineVisitorTest(unittest.TestCase):
def setUp(self):
self.pipeline = Pipeline(DirectRunner())
self.visitor = ConsumerTrackingPipelineVisitor()
try: # Python 2
self.assertCountEqual = self.assertItemsEqual
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would not this raise an AttributeError in any case?
python 2 does not have the first method, python 3 does not have the second one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, on py2, this will define assertCountEqual.

@aaltay aaltay merged commit 50111d5 into apache:master Sep 25, 2018
splovyt pushed a commit to splovyt/beam that referenced this pull request Oct 1, 2018
pl04351820 pushed a commit to pl04351820/beam that referenced this pull request Dec 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants