-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP][BEAM-2784] Run python 2 to 3 migration and fix resulting Python 2 errors #3772
[WIP][BEAM-2784] Run python 2 to 3 migration and fix resulting Python 2 errors #3772
Conversation
…quash of a somewhat long process to fix all of these: Start manually fixing up the errors. Explicitly depend on the future package Attempt to fix encoder issues List was doing nothing Use has_attr func correctly More fixes Set default encoding to latin-1 in some places since otherwise we run into issues with using strings to hold codepoint above 128. Switch coder_impl to check for basestring Future holden says thanks for basestirng Use Bytes IO for avroio Try and fix assertItemsEqual py2 again be explicit about newint sys + b string sys import Explicit encode before putting into a pb2 Use BytesIO Add missing import Fix newint tuple magic force positions to int Add ignore unicode prefix annotation from Spark Try assertItemsEqual again Stream change More coder fixes Rewrite exception messages for newint to int so we can be consistent between Py2 and Py3 with the type inferance try converting to a string early idk Progress around type hints being finicky now that we have future Move _rewrite_typehint_string around, try and cleanup some builtin imports Tentative: use basestring for str in Py2 which is a bit sketchy Remove from builtins import str and minor fixes Fix some more coding issues Add an explicit test for the new coder, include the new method from the changed inheritance, add workaround for unicode. Second missing next Change some ptransform tests Fix the unicode workaround import (oops) Fix raising the error during retry Ok so it "works" for the one test we've been iterating on but has lots of print debugging cause ohgod Remove a bunch of cruft debugging added in and some things that took us down the wrong path as well with intermediate helper transform type annotations Keep repr as is Make SequenceTypeConstraint an instance of IndexableTypeConstraint. Add explicit tests in trivial_inference_test for indexing. Now at 1260 tests passing and 256 skipped :) Small gcp related fixes gcsio test fixes gcsio and pylint fixes Import order fix autopep8 W391,W293,W291,E306,E305,E304,E303 We don't support 3.4 yet Fix workaround for object magic Ok so the underlying transfer library is using six so lets use it too Ok make the tests more like normal Add in missing return and factor out format result since long line
Thank you @holdenk! This is a great start. I will make a pass over the PR but given the size I would cc a few people who might be interested: cc: @chamikaramj @robertwb - for any general comments about the direction |
I'm also happy to try breaking it up some if its too big (although as mentioned I'm afraid this might introduce some problems as if we change only some of the types in each part). |
@holdenk What do you think about moving this to a branch? My reasoning is:
/cc: @aaltay |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for attacking this!
This is a rather large change, and it's worth thinking about ways to split this up.
-
Changes such as autopep8 and input sorting and other cleanup could be done as a separate PR. It may make sense to submit these first, in which case you could run them after the other changes and the diff would be much more relevant to the Py3 conversion.
-
It seems most of the changes could be done file-by-file, or package-by-package, to ease reviewing. Of course with this kind of change if the PRs are too small the overhead becomes too large, but maybe aim for <1K lines per change. Another (preferable in by book) option is to split it up into the "no brainer" changes and then do a second pass for the more interesting/complicated ones.
-
I do have to say I'm wary of using a branch for this; these are the kinds of changes that tend to have a high probability of becoming unmergeable quickly. I don't see this as being a particular risky change (unless things prove differently in the trivial vs. non-trivial split suggested above, in which case I'd want to get at least the trivial changes in master.)
-
Short of
__future__
imports, what can we do to avoid regressions (until we can actually run full, or at lest a subset of, tests in Python3)?
out.write_bigendian_uint64(self._from_normal_time(value.end.micros / 1000)) | ||
out.write_var_int64(span_micros / 1000) | ||
out.write_bigendian_uint64(self._from_normal_time( | ||
old_div(value.end.micros, 1000))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer //
to old_div
if the types are integral (as they are here), and plain old /
for floats.
Thanks! I'll start with an isort PR & autopep8 PRs and then we can do per-directory for the other changes. I agree that a long running branch is not a good solution, since this involves a large number of small changes to pretty much everything it probably only takes about a day for it to become unmergeable. |
The isort/autopep8 pre-staged changes are now live at #3785 |
As far as avoiding regressions go, there isn't really a lot of options -- really I think the solution is to get to the point where Python three is running -- if not officially supported as soon as possible. |
Where did this PR get to? What is the right path to Python 3 compatibility?
|
@cclauss I do not know the status of this PR. How about we first address issues with running flake8 with python2? Once those are resolved, we can target the python3 related issues. |
Once #4561 lands, the codebase will pass:
|
My sense is that most of this has been fixed in other PRs that have already been merged and in these four open PRs:
Given the number of conflicts, does it make sense to close this PR or at least rebase it? |
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions. |
This PR isn't going anywhere since it's been forked out into a large number of smaller JIRAs/PRs :) Really excited to see Py3 support happen in this project :) |
This is a WIP pull request that is the result of running the Python 2 to 3 migration and then manually fixing the errors that broke in Python 2.7 support after the auto migration. This does not mean the code is ready for Python 3 after this -- although I am working on a follow up for that in BEAM-1373.
Most of the 2/3 compatibility is handled with the future library, although in some places where we interact with libraries that use six to achieve 2/3 compatibility I've used six as well so that we can inter-operate.
This pull request is rather large, but because it involves changing the types of many things it's would be challenging to isolate this into smaller chunks could prove to be difficult.
This PR is tagged as WIP since I still need to go through it again by hand and review the automatic changes (just because the tests pass with the fixes doesn't mean its all good), but if some of the other folks who care about BEAM on Py3 are around I'd appreciate some idea if this is going in the direction we want. There are also a large number of style issues to fix from automated tooling as well.
In addition to the automated futurization isort & limited autopep8 was also applied (and then manually modified since we need to do some silly things to work between 2/3 with keeping raw & compatible objects around)