Port to python3. #231
This requires this PR to be merged: Parsely/testinstances#4 since testinstances wasn't python3 compatible either |
I also added py26 support to testinstances because |
You can see the successful py27 run: https://travis-ci.org/Parsely/pykafka/jobs/76518663 once the |
Hi @sontek, thanks for the pull request. I'll be able to take a deeper look at this tomorrow. We've already decided not to support 2.6, so the fact that it's still in |
@emmett9001 Yeah, I'll fix it up |
Thanks @sontek for your work! |
@vortec @emmett9001 Just pushed a fix that resolves everything broken by that latest big merge. It's back down to the same single test failure on gzip as before, so it's pretty close to being ready |
That's the only failure I'm seeing |
Wow - that's a massive amount of work, kudos!! |
@yungchin Yeah, but I built everything else to support not passing bytes and having it just figure it out; that's the one test that fails with those checks in place. Switching it to bytes fixes it, but I feel like we shouldn't have to =) |
@yungchin @sontek how are you running the tests under multiple python interpreters? |
@emmett9001 I'm just running |
@sontek This looks great to me. I've left a few comments, but generally I think this work is really sorely needed and you've done a nice thorough job implementing it. Thanks! I've managed to get all of the tests passing under 2.7, 3.4 and pypy on my machine. I think I'll need to add a note in the readme or in @kbourgoin take a look at this if you'd like. I think this is at worst neutral for users who don't care about versions other than 2.7, and it actively helps out other users. Importantly, it doesn't make anyone's user experience worse. |
In my opinion, pykafka shouldn't be concerned with what kind of payloads it If we convert protocol methods and the like to optionally accept unicode I do very much apologise for showing up nagging about stuff at such a late |
This might be more readable if the lambda just returns a plain tuple, without any bytes or str coercions.
Ok, have finally made it to the end of that massive diff. Major kudos for all the work. Running through it in more detail, I realised I was very ignorant when I said only Thanks!! |
Right, so re "topic names as unicode": these tests should probably be required to work without this `b` here.
@yungchin I 100% agree with you. I didn't want to support it everywhere, but I thought backwards compatibility within all the APIs would be important. I definitely prefer doing it only at the producer/consumer level, so I'll go through and change that. |
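To illustrate coercing only at the producer/consumer level, here is a minimal sketch. `ensure_bytes` and `SketchProducer` are hypothetical names made up for this example and are not pykafka's API. The idea is that text is encoded once at the public boundary and everything below it deals only in bytes.

```python
def ensure_bytes(value, encoding='utf-8'):
    """Return value as bytes: encode text, pass bytes and None through."""
    if value is None or isinstance(value, bytes):
        return value
    if isinstance(value, type(u'')):  # unicode on Python 2, str on Python 3
        return value.encode(encoding)
    raise TypeError('expected text or bytes, got %r' % type(value))


class SketchProducer(object):
    """Hypothetical stand-in for a producer; NOT pykafka's class."""

    def __init__(self, topic_name):
        self.topic_name = ensure_bytes(topic_name)
        self.sent = []

    def produce(self, message):
        # Coerce once at the public boundary; inner layers see only bytes.
        self.sent.append(ensure_bytes(message))


producer = SketchProducer(u'events')
producer.produce(u'caf\xe9')  # text is encoded as UTF-8
producer.produce(b'raw')      # bytes pass through untouched
```

With this shape, the protocol-level code and its tests never need to accept unicode at all.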
There is actually one backwards incompatibility I introduced in this PR, which is the CRC32 check. The original code wasn't applying the `& 0xffffffff` mask, which is necessary for it to work across all the different runtimes: https://github.com/Parsely/pykafka/pull/231/files#diff-d239c7fe52131e016d3cf2f1ef348427R207 This also means we are getting a long back and not an int, so the struct unpacking is assuming long now. |
Ah, thanks for the pointer back here. I missed the change from `!i` to `!I` (i.e. signed vs unsigned?) when I was going over this the first time. I really don't understand the finer points here, but I suppose if it still works it means Kafka is happy to accept the messages, so it's probably all good.
Just to be sure though, let's ping the author :)
@kbourgoin
I think this would only fail during the upgrade, i.e. if you have data produced by the old client, I don't believe the new client would be able to process it properly because the data types are different.
Looking at the protocol spec, it defines this field as a signed int32. So I get the feeling that reverting this to `!i` would be safer?
(Btw, not really related to the work in this pull request, but when I `git grep` for "crc32" I only find this line here. Does that mean we never check the CRC when unpacking messages for `consume()`?)
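For reference, a CRC check on the consume side could look roughly like the sketch below. It assumes the layout described in the Kafka protocol guide, where a 4-byte CRC field is followed by the bytes it covers. `pack_with_crc` and `crc_matches` are hypothetical helpers for illustration, not functions that exist in pykafka.

```python
import struct
from binascii import crc32


def pack_with_crc(body):
    """Prefix body with its CRC32 as a 4-byte big-endian field."""
    return struct.pack('!I', crc32(body) & 0xffffffff) + body


def crc_matches(message):
    """Return True if the leading 4-byte CRC matches the remaining bytes."""
    (expected,) = struct.unpack_from('!I', message, 0)
    actual = crc32(message[4:]) & 0xffffffff
    return expected == actual


# Assumed body: two header bytes followed by a payload (illustrative only).
good = pack_with_crc(b'\x00\x00' + b'payload')
corrupted = good[:-1] + b'X'  # flip the last byte to simulate corruption
```

Here `crc_matches(good)` is true and `crc_matches(corrupted)` is false, which is the check a decoder would make before handing the message to the caller.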
I changed it because the CRC values in the tests were larger than a signed int can hold. You can see it with something like:
>>> from binascii import crc32
>>> crc32(b'hello') & 0xffffffff
907060870
>>> 2147483647 > crc32(b'hello bar') & 0xffffffff
False
But even if we were able to change it back, we would still have the backwards compatibility issue, because the previous code wasn't applying the mask, so we'd get different results between python2 and python3:
Python2:
>>> crc32(b'hello bar')
-1737004441
>>> crc32(b'hello bar') & 0xffffffff
2557962855
Python3:
>>> from binascii import crc32
>>> crc32(b'hello bar')
2557962855
>>> crc32(b'hello bar') & 0xffffffff
2557962855
Ah, now I get how it works. Thanks for explaining all this so patiently.
Yeah, so with those examples I understand that packing it into the struct as if it were an unsigned field is all fine - it's still the same bit pattern. Thanks!
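The "same bit pattern" point can be checked with the standard library alone. This is only a sketch with hypothetical helper names (`crc_unsigned`, `to_signed32`), not code from the PR: it packs the masked CRC as `!I` and the equivalent signed value as `!i`, then compares the bytes.

```python
import struct
from binascii import crc32


def crc_unsigned(data):
    """CRC32 as an unsigned 32-bit value on both Python 2 and Python 3."""
    return crc32(data) & 0xffffffff


def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed int32."""
    return value - (1 << 32) if value >= (1 << 31) else value


payload = b'hello bar'
unsigned = crc_unsigned(payload)
signed = to_signed32(unsigned)

# Packing the unsigned value as !I and the signed value as !i
# produces the same four bytes on the wire.
packed_unsigned = struct.pack('!I', unsigned)
packed_signed = struct.pack('!i', signed)
same_bytes = packed_unsigned == packed_signed

# A reader that unpacks with !i recovers the signed view of the same bits.
roundtrip_signed = struct.unpack('!i', packed_unsigned)[0]
```

So a reader that treats the field as a signed int32, as the protocol spec describes it, sees the same four bytes either way.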
@sontek Looks like there are some more changes you'd like to make here? I'm interested in getting this merged as soon as possible, since it enables bugfixes on master to benefit from the testing setup and py3 support. The |
@emmett9001 I'd like to move it to not doing as many |
Closing this in favor of #246. With that pull request, @yungchin and I will be able to make progress on some of the last changes we've talked about here. @sontek, if you have more changes you'd like to make, you can open a new pull request against the sontek_port_to_pyhon34 branch. |
This makes all but one of the tests pass so far. I think it's the closest to a workable version of py3 (based on looking at the other PRs). Can someone like @emmett9001, @mwhooker or @kbourgoin review it?