Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP][BEAM-2784] Run python 2 to 3 migration and fix resulting Python 2 errors #3772

Conversation

holdenk
Copy link
Contributor

@holdenk holdenk commented Aug 27, 2017

This is a WIP pull request that is the result of running the Python 2 to 3 migration and then manually fixing the errors that broke in Python 2.7 support after the auto migration. This does not mean the code is ready for Python 3 after this -- although I am working on a follow up for that in BEAM-1373.

Most of the 2/3 compatibility is handled with the future library, although in some places where we interact with libraries that use six to achieve 2/3 compatibility I've used six as well so that we can inter-operate.

This pull request is rather large, but because it involves changing the types of many things it's would be challenging to isolate this into smaller chunks could prove to be difficult.

This PR is tagged as WIP since I still need to go through it again by hand and review the automatic changes (just because the tests pass with the fixes doesn't mean its all good), but if some of the other folks who care about BEAM on Py3 are around I'd appreciate some idea if this is going in the direction we want. There are also a large number of style issues to fix from automated tooling as well.

In addition to the automated futurization isort & limited autopep8 was also applied (and then manually modified since we need to do some silly things to work between 2/3 with keeping raw & compatible objects around)

…quash of a somewhat long process to fix all of these:

Start manually fixing up the errors.

Explicitly depend on the future package

Attempt to fix encoder issues

List was doing nothing

Use has_attr func correctly

More fixes

Set default encoding to latin-1 in some places since otherwise we run into issues with using strings to hold codepoint above 128. Switch coder_impl to check for basestring

Future holden says thanks for basestirng

Use Bytes IO for avroio

Try and fix assertItemsEqual py2 again

be explicit about newint

sys + b string

sys import

Explicit encode before putting into a pb2

Use BytesIO

Add missing import

Fix newint tuple magic

force positions to int

Add ignore unicode prefix annotation from Spark

Try assertItemsEqual again

Stream change

More coder fixes

Rewrite exception messages for newint to int so we can be consistent between Py2 and Py3 with the type inferance

try converting to a string early idk

Progress around type hints being finicky now that we have future

Move _rewrite_typehint_string around, try and cleanup some builtin imports

Tentative: use basestring for str in Py2 which is a bit sketchy

Remove from builtins import str and minor fixes

Fix some more coding issues

Add an explicit test for the new coder, include the new method from the changed inheritance, add workaround for unicode.

Second missing next

Change some ptransform tests

Fix the unicode workaround import (oops)

Fix raising the error during retry

Ok so it "works" for the one test we've been iterating on but has lots of print debugging cause ohgod

Remove a bunch of cruft debugging added in and some things that took us down the wrong path as well with intermediate helper transform type annotations

Keep repr as is

Make SequenceTypeConstraint an instance of IndexableTypeConstraint. Add explicit tests in trivial_inference_test for indexing. Now at 1260 tests passing and 256 skipped :)

Small gcp related fixes

gcsio test fixes

gcsio and pylint fixes

Import order fix

autopep8 W391,W293,W291,E306,E305,E304,E303

We don't support 3.4 yet

Fix workaround for object magic

Ok so the underlying transfer library is using six so lets use it too

Ok make the tests more like normal

Add in missing return and factor out format result since long line
@aaltay
Copy link
Member

aaltay commented Aug 27, 2017

Thank you @holdenk! This is a great start. I will make a pass over the PR but given the size I would cc a few people who might be interested:

cc: @chamikaramj @robertwb - for any general comments about the direction
cc: @katsiapis - could you recommend a reviewer to generally look at the py2/3 conversions here.

@holdenk
Copy link
Contributor Author

holdenk commented Aug 28, 2017

I'm also happy to try breaking it up some if its too big (although as mentioned I'm afraid this might introduce some problems as if we change only some of the types in each part).

@aaltay
Copy link
Member

aaltay commented Aug 28, 2017

@holdenk What do you think about moving this to a branch? My reasoning is:

  • Any PR review will be hard, especially for a non-complete transformation. Similarly not-committing will be hard.
  • Even after full completion, we may not want to release changes as broad as this to the next available release, and allow some time to bake in a branch.
  • A branch will not be affected by changes in the master at random times, we can control when to pull from master.

/cc: @aaltay

Copy link
Contributor

@robertwb robertwb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for attacking this!

This is a rather large change, and it's worth thinking about ways to split this up.

  • Changes such as autopep8 and input sorting and other cleanup could be done as a separate PR. It may make sense to submit these first, in which case you could run them after the other changes and the diff would be much more relevant to the Py3 conversion.

  • It seems most of the changes could be done file-by-file, or package-by-package, to ease reviewing. Of course with this kind of change if the PRs are too small the overhead becomes too large, but maybe aim for <1K lines per change. Another (preferable in by book) option is to split it up into the "no brainer" changes and then do a second pass for the more interesting/complicated ones.

  • I do have to say I'm wary of using a branch for this; these are the kinds of changes that tend to have a high probability of becoming unmergeable quickly. I don't see this as being a particular risky change (unless things prove differently in the trivial vs. non-trivial split suggested above, in which case I'd want to get at least the trivial changes in master.)

  • Short of __future__ imports, what can we do to avoid regressions (until we can actually run full, or at lest a subset of, tests in Python3)?

out.write_bigendian_uint64(self._from_normal_time(value.end.micros / 1000))
out.write_var_int64(span_micros / 1000)
out.write_bigendian_uint64(self._from_normal_time(
old_div(value.end.micros, 1000)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer // to old_div if the types are integral (as they are here), and plain old / for floats.

@holdenk
Copy link
Contributor Author

holdenk commented Aug 29, 2017

@robertwb

Thanks! I'll start with an isort PR & autopep8 PRs and then we can do per-directory for the other changes. I agree that a long running branch is not a good solution, since this involves a large number of small changes to pretty much everything it probably only takes about a day for it to become unmergeable.

@holdenk
Copy link
Contributor Author

holdenk commented Aug 30, 2017

The isort/autopep8 pre-staged changes are now live at #3785
Once that's in I'll break this up into per-component PRs.

@holdenk
Copy link
Contributor Author

holdenk commented Aug 30, 2017

As far as avoiding regressions go, there isn't really a lot of options -- really I think the solution is to get to the point where Python three is running -- if not officially supported as soon as possible.

@cclauss
Copy link

cclauss commented Feb 6, 2018

Where did this PR get to? What is the right path to Python 3 compatibility?

  • python2 -m flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics
    • Has a couple of undefined names but...
  • python3 -m flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics
    • Has 15 syntax errors and 97 undefined names

@aaltay
Copy link
Member

aaltay commented Feb 9, 2018

@cclauss I do not know the status of this PR. How about we first address issues with running flake8 with python2? Once those are resolved, we can target the python3 related issues.

@cclauss
Copy link

cclauss commented Feb 9, 2018

Once #4561 lands, the codebase will pass:

  • python2 -m flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

@cclauss
Copy link

cclauss commented Jun 1, 2018

My sense is that most of this has been fixed in other PRs that have already been merged and in these four open PRs:

Given the number of conflicts, does it make sense to close this PR or at least rebase it?

@stale
Copy link

stale bot commented Jul 31, 2018

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Jul 31, 2018
@holdenk
Copy link
Contributor Author

holdenk commented Aug 6, 2018

This PR isn't going anywhere since it's been forked out into a large number of smaller JIRAs/PRs :) Really excited to see Py3 support happen in this project :)

@holdenk holdenk closed this Aug 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants