[WIP][BEAM-2784] Run python 2 to 3 migration and fix resulting Python 2 errors #3772

holdenk · 2017-08-27T08:05:14Z

This is a WIP pull request that is the result of running the Python 2 to 3 migration and then manually fixing the errors that broke in Python 2.7 support after the auto migration. This does not mean the code is ready for Python 3 after this -- although I am working on a follow up for that in BEAM-1373.

Most of the 2/3 compatibility is handled with the future library, although in some places where we interact with libraries that use six to achieve 2/3 compatibility I've used six as well so that we can inter-operate.

This pull request is rather large, but because it involves changing the types of many things it's would be challenging to isolate this into smaller chunks could prove to be difficult.

This PR is tagged as WIP since I still need to go through it again by hand and review the automatic changes (just because the tests pass with the fixes doesn't mean its all good), but if some of the other folks who care about BEAM on Py3 are around I'd appreciate some idea if this is going in the direction we want. There are also a large number of style issues to fix from automated tooling as well.

In addition to the automated futurization isort & limited autopep8 was also applied (and then manually modified since we need to do some silly things to work between 2/3 with keeping raw & compatible objects around)

…quash of a somewhat long process to fix all of these: Start manually fixing up the errors. Explicitly depend on the future package Attempt to fix encoder issues List was doing nothing Use has_attr func correctly More fixes Set default encoding to latin-1 in some places since otherwise we run into issues with using strings to hold codepoint above 128. Switch coder_impl to check for basestring Future holden says thanks for basestirng Use Bytes IO for avroio Try and fix assertItemsEqual py2 again be explicit about newint sys + b string sys import Explicit encode before putting into a pb2 Use BytesIO Add missing import Fix newint tuple magic force positions to int Add ignore unicode prefix annotation from Spark Try assertItemsEqual again Stream change More coder fixes Rewrite exception messages for newint to int so we can be consistent between Py2 and Py3 with the type inferance try converting to a string early idk Progress around type hints being finicky now that we have future Move _rewrite_typehint_string around, try and cleanup some builtin imports Tentative: use basestring for str in Py2 which is a bit sketchy Remove from builtins import str and minor fixes Fix some more coding issues Add an explicit test for the new coder, include the new method from the changed inheritance, add workaround for unicode. Second missing next Change some ptransform tests Fix the unicode workaround import (oops) Fix raising the error during retry Ok so it "works" for the one test we've been iterating on but has lots of print debugging cause ohgod Remove a bunch of cruft debugging added in and some things that took us down the wrong path as well with intermediate helper transform type annotations Keep repr as is Make SequenceTypeConstraint an instance of IndexableTypeConstraint. Add explicit tests in trivial_inference_test for indexing. Now at 1260 tests passing and 256 skipped :) Small gcp related fixes gcsio test fixes gcsio and pylint fixes Import order fix autopep8 W391,W293,W291,E306,E305,E304,E303 We don't support 3.4 yet Fix workaround for object magic Ok so the underlying transfer library is using six so lets use it too Ok make the tests more like normal Add in missing return and factor out format result since long line

aaltay · 2017-08-27T18:59:26Z

Thank you @holdenk! This is a great start. I will make a pass over the PR but given the size I would cc a few people who might be interested:

cc: @chamikaramj @robertwb - for any general comments about the direction
cc: @katsiapis - could you recommend a reviewer to generally look at the py2/3 conversions here.

holdenk · 2017-08-28T01:28:13Z

I'm also happy to try breaking it up some if its too big (although as mentioned I'm afraid this might introduce some problems as if we change only some of the types in each part).

aaltay · 2017-08-28T21:03:13Z

@holdenk What do you think about moving this to a branch? My reasoning is:

Any PR review will be hard, especially for a non-complete transformation. Similarly not-committing will be hard.
Even after full completion, we may not want to release changes as broad as this to the next available release, and allow some time to bake in a branch.
A branch will not be affected by changes in the master at random times, we can control when to pull from master.

/cc: @aaltay

robertwb

Thank you for attacking this!

This is a rather large change, and it's worth thinking about ways to split this up.

Changes such as autopep8 and input sorting and other cleanup could be done as a separate PR. It may make sense to submit these first, in which case you could run them after the other changes and the diff would be much more relevant to the Py3 conversion.
It seems most of the changes could be done file-by-file, or package-by-package, to ease reviewing. Of course with this kind of change if the PRs are too small the overhead becomes too large, but maybe aim for <1K lines per change. Another (preferable in by book) option is to split it up into the "no brainer" changes and then do a second pass for the more interesting/complicated ones.
I do have to say I'm wary of using a branch for this; these are the kinds of changes that tend to have a high probability of becoming unmergeable quickly. I don't see this as being a particular risky change (unless things prove differently in the trivial vs. non-trivial split suggested above, in which case I'd want to get at least the trivial changes in master.)
Short of __future__ imports, what can we do to avoid regressions (until we can actually run full, or at lest a subset of, tests in Python3)?

robertwb · 2017-08-28T21:14:27Z

sdks/python/apache_beam/coders/coder_impl.py

-    out.write_bigendian_uint64(self._from_normal_time(value.end.micros / 1000))
-    out.write_var_int64(span_micros / 1000)
+    out.write_bigendian_uint64(self._from_normal_time(
+        old_div(value.end.micros, 1000)))


Prefer // to old_div if the types are integral (as they are here), and plain old / for floats.

holdenk · 2017-08-29T19:42:52Z

@robertwb

Thanks! I'll start with an isort PR & autopep8 PRs and then we can do per-directory for the other changes. I agree that a long running branch is not a good solution, since this involves a large number of small changes to pretty much everything it probably only takes about a day for it to become unmergeable.

holdenk · 2017-08-30T07:41:12Z

The isort/autopep8 pre-staged changes are now live at #3785
Once that's in I'll break this up into per-component PRs.

holdenk · 2017-08-30T07:45:36Z

As far as avoiding regressions go, there isn't really a lot of options -- really I think the solution is to get to the point where Python three is running -- if not officially supported as soon as possible.

cclauss · 2018-02-06T13:19:27Z

Where did this PR get to? What is the right path to Python 3 compatibility?

python2 -m flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics
- Has a couple of undefined names but...
python3 -m flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics
- Has 15 syntax errors and 97 undefined names

aaltay · 2018-02-09T17:08:14Z

@cclauss I do not know the status of this PR. How about we first address issues with running flake8 with python2? Once those are resolved, we can target the python3 related issues.

cclauss · 2018-02-09T17:26:43Z

Once #4561 lands, the codebase will pass:

python2 -m flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

cclauss · 2018-06-01T14:35:28Z

My sense is that most of this has been fixed in other PRs that have already been merged and in these four open PRs:

Given the number of conflicts, does it make sense to close this PR or at least rebase it?

stale · 2018-07-31T14:56:43Z

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

holdenk · 2018-08-06T03:14:27Z

This PR isn't going anywhere since it's been forked out into a large number of smaller JIRAs/PRs :) Really excited to see Py3 support happen in this project :)

holdenk added 3 commits August 27, 2017 00:14

Initial first pass with futurize

897fbac

Minor cleanups with coders_tests and add a note about the tostringcoder

bd7e623

holdenk added 3 commits August 27, 2017 12:25

Add in missing test_utils

d1b11c8

Fix more pylint issues

7d71f58

Remove deprecated use of filter and replace with for

43ad4fa

holdenk added 3 commits August 27, 2017 22:38

Style fixes

ee53265

We expected future related things to have warnings/errors in pydoc

bdffa73

Skip future warnings in earlier sphinx build as well

b29aff0

robertwb reviewed Aug 28, 2017

View reviewed changes

kennknowles assigned kennknowles and unassigned kennknowles Jan 25, 2018

stale bot added the stale label Jul 31, 2018

holdenk closed this Aug 6, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP][BEAM-2784] Run python 2 to 3 migration and fix resulting Python 2 errors #3772

[WIP][BEAM-2784] Run python 2 to 3 migration and fix resulting Python 2 errors #3772

holdenk commented Aug 27, 2017

aaltay commented Aug 27, 2017

holdenk commented Aug 28, 2017

aaltay commented Aug 28, 2017

robertwb left a comment

robertwb Aug 28, 2017

holdenk commented Aug 29, 2017

holdenk commented Aug 30, 2017

holdenk commented Aug 30, 2017

cclauss commented Feb 6, 2018 •

edited

Loading

aaltay commented Feb 9, 2018

cclauss commented Feb 9, 2018 •

edited

Loading

cclauss commented Jun 1, 2018 •

edited

Loading

stale bot commented Jul 31, 2018

holdenk commented Aug 6, 2018

[WIP][BEAM-2784] Run python 2 to 3 migration and fix resulting Python 2 errors #3772

[WIP][BEAM-2784] Run python 2 to 3 migration and fix resulting Python 2 errors #3772

Conversation

holdenk commented Aug 27, 2017

aaltay commented Aug 27, 2017

holdenk commented Aug 28, 2017

aaltay commented Aug 28, 2017

robertwb left a comment

Choose a reason for hiding this comment

robertwb Aug 28, 2017

Choose a reason for hiding this comment

holdenk commented Aug 29, 2017

holdenk commented Aug 30, 2017

holdenk commented Aug 30, 2017

cclauss commented Feb 6, 2018 • edited Loading

aaltay commented Feb 9, 2018

cclauss commented Feb 9, 2018 • edited Loading

cclauss commented Jun 1, 2018 • edited Loading

stale bot commented Jul 31, 2018

holdenk commented Aug 6, 2018

cclauss commented Feb 6, 2018 •

edited

Loading

cclauss commented Feb 9, 2018 •

edited

Loading

cclauss commented Jun 1, 2018 •

edited

Loading