This repository was archived by the owner on Mar 24, 2021. It is now read-only.

Conversation

@sontek (Contributor) commented Aug 20, 2015

This makes all but one of the tests pass so far. I think it's the closest to a workable version of py3 (based on looking at the other PRs). Can someone like @emmett9001, @mwhooker, or @kbourgoin review it?

@sontek (Contributor Author) commented Aug 20, 2015

This requires Parsely/testinstances#4 to be merged first, since testinstances wasn't Python 3 compatible either.

@sontek (Contributor Author) commented Aug 20, 2015

I also added py26 support to testinstances, because pykafka has py26 in its tox config even though the code currently isn't compatible with py26 (it uses dict comprehensions). I'm not really interested in 2.6 support, but I left it in there in case someone just didn't realize it was broken.

@sontek (Contributor Author) commented Aug 20, 2015

You can see the successful py27 run:

https://travis-ci.org/Parsely/pykafka/jobs/76518663

Once the testinstances change is merged and released, we can re-run the build for py34.

@emmettbutler (Contributor)

Hi @sontek, thanks for the pull request. I'll be able to take a deeper look at this tomorrow. We've already decided not to support 2.6, so the fact that it's still in tox is an error. We're very close to merging a complete overhaul of the Producer class, after which supporting python 3 will be my primary focus. I think it's important to get that PR merged first so we can update this branch to make it py3 compatible. We also need to figure out a reasonable workflow for running tests under 2.7 and 3 to make sure future code remains compatible with both, so if you have any thoughts on that, they're quite welcome.

@emmettbutler mentioned this pull request Aug 20, 2015
@emmettbutler (Contributor)

Now that #277 has been merged, this branch can be updated to make Producer compatible with python 3. @sontek, are you interested in helping out with that?

@sontek (Contributor Author) commented Aug 21, 2015

@emmett9001 Yeah, I'll fix it up

@vortec (Contributor) commented Aug 26, 2015

Thanks @sontek for your work!

@sontek (Contributor Author) commented Aug 27, 2015

@vortec @emmett9001 I just pushed a fix that addresses everything from that latest big merge. It's back down to just the same gzip test failure it was hitting before, so it's pretty close to being ready.

@sontek (Contributor Author) commented Aug 27, 2015

tests/pykafka/test_protocol.py::TestFetchAPI::test_gzip_decompression FAILED

That's the only failure I'm seeing.

@yungchin (Contributor)

Wow - that's a massive amount of work, kudos!!
I've only had a brief look at the failing test so far - I think it might be that msg needs to be bytes, not str?
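
For illustration, here's the bytes/str distinction in question on py3 (a minimal sketch, not the actual test code; the exact failing call may differ):

import gzip

gzip.compress(b"hello kafka")   # fine: compression wants a bytes-like object
# gzip.compress("hello kafka")  # raises TypeError on py3, since str isn't bytes-like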

@sontek (Contributor Author) commented Aug 27, 2015

@yungchin Yeah, but I built everything else to support not passing bytes and having it just figure it out; that's the one test that fails with those checks in place. Switching it to bytes fixes it, but I feel like we shouldn't have to =)
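
A rough sketch of that "figure it out" coercion, using the hasattr-style check discussed later in this thread (hypothetical helper body; the branch's actual get_bytes may differ):

def get_bytes(value):
    # Accept either bytes or a text string and always hand back bytes.
    if isinstance(value, bytes):
        return value
    if hasattr(value, 'encode'):  # py2 unicode / py3 str
        return value.encode('utf-8')
    raise TypeError("expected bytes or a text string, got %r" % type(value))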

@emmettbutler (Contributor)

@yungchin @sontek How are you running the tests under multiple Python interpreters? tox fails for me once it tries to install python-snappy under 3.4. Have you ever run into this? https://gist.github.com/emmett9001/ac5dae34fc6239f4dafd

@sontek (Contributor Author) commented Aug 27, 2015

@emmett9001 I'm just running tox -e py34. What's the error you're getting with snappy?

$ mktmpenv -p /usr/bin/python3.4 
Running virtualenv with interpreter /usr/bin/python3.4
Using base prefix '/usr'
New python executable in tmp-5da056596403ca4a/bin/python3.4
Also creating executable in tmp-5da056596403ca4a/bin/python
Installing setuptools, pip...done.
This is a temporary environment. It will be deleted when you run 'deactivate'.
(tmp-5da056596403ca4a)~/venvs/tmp-5da056596403ca4a

$ pip install python-snappy
You are using pip version 6.0.8, however version 7.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Collecting python-snappy
  Using cached python-snappy-0.5.tar.gz
Installing collected packages: python-snappy
  Running setup.py install for python-snappy
    building '_snappy' extension
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.4m -I/home/sontek/venvs/tmp-5da056596403ca4a/include/python3.4m -c snappymodule.cc -o build/temp.linux-x86_64-3.4/snappymodule.o
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.4m -I/home/sontek/venvs/tmp-5da056596403ca4a/include/python3.4m -c crc32c.c -o build/temp.linux-x86_64-3.4/crc32c.o
    x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.4/snappymodule.o build/temp.linux-x86_64-3.4/crc32c.o -lsnappy -o build/lib.linux-x86_64-3.4/_snappy.cpython-34m.so
Successfully installed python-snappy-0.5

@emmettbutler (Contributor)

@sontek This looks great to me. I've left a few comments, but generally I think this work is really sorely needed and you've done a nice thorough job implementing it. Thanks!

I've managed to get all of the tests passing under 2.7, 3.4 and pypy on my machine. I think I'll need to add a note in the readme or in CONTRIBUTING about the preferred way to run tests - merging this will mean that all pull requests in the future will need to be tested against all three interpreters. The same goes for testinstances, which I see you've also taken into account.

@kbourgoin take a look at this if you'd like. I think this is at worst neutral for users who don't care about versions other than 2.7, and it actively helps out other users. Importantly, it doesn't make anyone's user experience worse.

@emmettbutler added this to the 2.1.0 milestone Aug 28, 2015
@emmettbutler self-assigned this Aug 28, 2015
@yungchin (Contributor)

> @yungchin Yeah, the problem is backwards compatibility. All these
> functions are exposed and probably used in clients already and they already
> allow strings... so we should probably keep the compatibility and announce
> it will be going away

I don't think I share the compatibility concern: we previously didn't
support py3, so these methods were only ever promised to handle py2 strs,
ie bytes, and we wouldn't be breaking any promise if we explicitly
changed that to mean bytes. A user who's porting to py3 will expect to
have to do some work when they were previously passing in str, so they
won't be surprised to have to swap to bytes (and that is presuming anyone
was calling protocol methods directly at all).

In my opinion, pykafka shouldn't be concerned with what kind of payloads it
is moving around. The kafka protocol treats messages as opaques. Pretty
much the only API point where it would make sense to me to optionally
accept unicode is Producer.produce().

If we convert protocol methods and the like to optionally accept unicode
(which if they're used for their intended purposes, they will never get to
see any of), I'm afraid I don't see much upside to that, and a lot of extra
if hasattr(value, 'encode') checks going on inside potentially tight
loops. More importantly though, sticking unicode handling under the covers
just makes me worry that we'd need to do a lot more testing, to make sure
we still handle binary payloads correctly.

I do very much apologise for showing up nagging about stuff at such a late
stage in this pullreq cycle, I should have stayed in the loop on this and
started whining sooner. I can see and very much appreciate how much careful
work already went into converting all those methods to swing both ways.

Contributor

This might be more readable if the lambda just returns a plain tuple, without any bytes or str coercions.

@yungchin (Contributor)

OK, I have finally made it to the end of that massive diff. Major kudos for all the work.

Running through it in more detail, I realised I was very ignorant when I said only produce() should accept unicode: it would of course make a lot of sense to also let users specify topic names and consumer groups as unicode, and not slam them for forgetting a b prefix all the time. But still only in the consumer and producer interfaces, imho. If those classes make sure the names are valid for kafka and convert them to bytes before saving them on self, we could avoid having all the other coercions while packing message sets, I'd hope.

Thanks!!
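
That boundary-only conversion could look roughly like this (an illustrative sketch; the constructor signature and attribute names here are made up, not pykafka's actual API):

class Producer(object):
    def __init__(self, topic_name):
        # Accept str or bytes at the public boundary, but store bytes,
        # so the protocol-packing code never needs to coerce again.
        if not isinstance(topic_name, bytes):
            topic_name = topic_name.encode('utf-8')
        self._topic_name = topic_name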

Contributor

Right, so re "topic names as unicode": these tests should probably be required to work without this b here.

@sontek (Contributor Author) commented Aug 28, 2015

@yungchin I 100% agree with you. I didn't want to support it everywhere, but I thought backwards compatibility within all the APIs would be important. I definitely prefer only doing it at the producer/consumer level, though, so I'll go through and change that up.

@sontek (Contributor Author) commented Aug 28, 2015

There is actually one backwards incompatibility I introduced in this PR, which is the CRC32 check: the original code wasn't doing the bit masking (& 0xffffffff) that's necessary to get consistent results across all the different runtimes:

https://github.com/Parsely/pykafka/pull/231/files#diff-d239c7fe52131e016d3cf2f1ef348427R207

This also means we get a long back instead of an int, so the struct unpacking now assumes an unsigned value.
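
In other words, the packing side now works conceptually like this (a simplified sketch, not the exact code from the diff):

import struct
from binascii import crc32

def packed_crc(data):
    # Mask to an unsigned 32-bit value so py2 and py3 agree on the result,
    # then pack it with '!I' (unsigned) instead of '!i' (signed).
    crc = crc32(data) & 0xffffffff
    return struct.pack('!I', crc)

assert struct.unpack('!I', packed_crc(b'hello bar'))[0] == 2557962855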

Contributor

Ah, thanks for the pointer back here. I missed the change from !i to !I (i.e. signed vs unsigned?) when I was going over this the first time. I really don't understand the finer points here, but I suppose if it still works it means Kafka is happy to accept the messages, so it's probably all good.

Just to be sure though, let's ping the author :) @kbourgoin

Contributor Author

I think this would only break during the upgrade, i.e. if you have data produced by the old client, I don't believe the new client would be able to process it properly, because the data types are different.

Contributor

Looking at the protocol spec, it defines this field as a signed int32. So I get the feeling that reverting this to !i would be safer?

(Btw, not really in relation to the project in this pullreq, but when I git grep for "crc32" I only find this line here - does that mean we never check the crc when unpacking messages for consume()?)

Contributor Author

I changed it because the test values were larger than a signed int; you can see it doing something like:

>>> from binascii import crc32
>>> crc32(b'hello') & 0xffffffff
907060870
>>> 2147483647 > crc32(b'hello bar') & 0xffffffff
False

but even if we were able to change it back, we'd still have the backwards compatibility issue, because the previous code wasn't doing the bit masking, so we'd have different results between Python 2 and Python 3:

Python2:

>>> crc32(b'hello bar')
-1737004441
>>> crc32(b'hello bar') & 0xffffffff
2557962855

Python3:

>>> from binascii import crc32
>>> crc32(b'hello bar')
2557962855
>>> crc32(b'hello bar') & 0xffffffff
2557962855

Contributor

Ah, now I get how it works. Thanks for explaining all this so patiently.

Yeah, so with those examples I understand that packing it into the struct as if it were an unsigned field is all fine - it's still the same bit pattern. Thanks!
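
For the record, that "same bit pattern" point is easy to check with the 'hello bar' numbers from the examples above:

import struct

# The signed py2 result and the masked unsigned result are the same four bytes on the wire.
assert struct.pack('!i', -1737004441) == struct.pack('!I', 2557962855)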

@emmettbutler (Contributor)

@sontek Looks like there are some more changes you'd like to make here? I'm interested in getting this merged as soon as possible, since it enables bugfixes on master to benefit from the testing setup and py3 support. The i/I backwards incompatibility is ok in my opinion, since folks who are upgrading can easily start consuming from the latest offset and skip all of the old incompatible data.

@sontek (Contributor Author) commented Sep 1, 2015

@emmett9001 I'd like to move it away from doing so many get_bytes calls and force bytes to be passed in to everything outside of producer/consumer, but I probably won't have time to do it until Thursday (or this weekend). I'm OK with it being merged as-is, or if you want to wait, I'll get to it eventually.

@emmettbutler (Contributor)

Closing this in favor of #246. With that pull request, @yungchin and I will be able to make progress on some of the last changes we've talked about here. @sontek, if you have more changes you'd like to make, you can open a new pull request against the sontek_port_to_pyhon34 branch.
