This repository was archived by the owner on Mar 24, 2021. It is now read-only.

Conversation

@sontek (Contributor) commented Aug 20, 2015

This makes all but one of the tests pass so far. I think it's the closest to a workable version of py3 (based on looking at the other PRs). Can someone like @emmett9001, @mwhooker, or @kbourgoin review it?

@sontek (Contributor Author) commented Aug 20, 2015

This requires Parsely/testinstances#4 to be merged first, since testinstances wasn't Python 3 compatible either.

@sontek (Contributor Author) commented Aug 20, 2015

I also added py26 support to testinstances, because pykafka has py26 in its tox config even though the code currently isn't compatible with py26 (it uses dict comprehensions). I'm not really interested in 2.6 support, but I left it in there in case someone just didn't realize it was broken.

@sontek (Contributor Author) commented Aug 20, 2015

You can see the successful py27 run:

https://travis-ci.org/Parsely/pykafka/jobs/76518663

Once the testinstances change is merged and released, we can re-run the build for py34.

@emmettbutler (Contributor)

Hi @sontek, thanks for the pull request. I'll be able to take a deeper look at this tomorrow. We've already decided not to support 2.6, so the fact that it's still in tox is an error. We're very close to merging a complete overhaul of the Producer class, after which supporting python 3 will be my primary focus. I think it's important to get that PR merged first so we can update this branch to make it py3 compatible. We also need to figure out a reasonable workflow for running tests under 2.7 and 3 to make sure future code remains compatible with both, so if you have any thoughts on that, they're quite welcome.

@emmettbutler mentioned this pull request Aug 20, 2015
@emmettbutler (Contributor)

Now that #277 has been merged, this branch can be updated to make Producer compatible with python 3. @sontek, are you interested in helping out with that?

@sontek (Contributor Author) commented Aug 21, 2015

@emmett9001 Yeah, I'll fix it up

@vortec (Contributor) commented Aug 26, 2015

Thanks @sontek for your work!

@sontek (Contributor Author) commented Aug 27, 2015

@vortec @emmett9001 I just pushed a fix that addresses everything from that latest big merge. It's back down to just the same gzip test failure it was hitting before, so it's pretty close to being ready.

@sontek (Contributor Author) commented Aug 27, 2015

tests/pykafka/test_protocol.py::TestFetchAPI::test_gzip_decompression FAILED

That's the only failure I'm seeing.

@yungchin (Contributor)

Wow - that's a massive amount of work, kudos!!
I've only had a brief look at the failing test so far - I think it might be that msg needs to be bytes, not str?
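
For illustration, here's the bytes/str distinction in question on py3 (a minimal sketch, not the actual test code; the exact failing call may differ):

import gzip

gzip.compress(b"hello kafka")   # fine: compression wants a bytes-like object
# gzip.compress("hello kafka")  # raises TypeError on py3, since str isn't bytes-like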

@sontek (Contributor Author) commented Aug 27, 2015

@yungchin Yeah, but I built everything else to support not passing bytes and having it just figure it out; that's the one test that fails with those checks in place. Switching it to bytes fixes it, but I feel like we shouldn't have to =)
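
A rough sketch of that "figure it out" coercion, using the hasattr-style check discussed later in this thread (hypothetical helper body; the branch's actual get_bytes may differ):

def get_bytes(value):
    # Accept either bytes or a text string and always hand back bytes.
    if isinstance(value, bytes):
        return value
    if hasattr(value, 'encode'):  # py2 unicode / py3 str
        return value.encode('utf-8')
    raise TypeError("expected bytes or a text string, got %r" % type(value))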

@emmettbutler (Contributor)

@yungchin @sontek How are you running the tests under multiple Python interpreters? tox fails for me once it tries to install python-snappy under 3.4. Have you ever run into this? https://gist.github.com/emmett9001/ac5dae34fc6239f4dafd

@sontek (Contributor Author) commented Aug 27, 2015

@emmett9001 I'm just running tox -e py34. What's the error you're getting with snappy?

$ mktmpenv -p /usr/bin/python3.4 
Running virtualenv with interpreter /usr/bin/python3.4
Using base prefix '/usr'
New python executable in tmp-5da056596403ca4a/bin/python3.4
Also creating executable in tmp-5da056596403ca4a/bin/python
Installing setuptools, pip...done.
This is a temporary environment. It will be deleted when you run 'deactivate'.
(tmp-5da056596403ca4a)~/venvs/tmp-5da056596403ca4a

$ pip install python-snappy
You are using pip version 6.0.8, however version 7.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Collecting python-snappy
  Using cached python-snappy-0.5.tar.gz
Installing collected packages: python-snappy
  Running setup.py install for python-snappy
    building '_snappy' extension
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.4m -I/home/sontek/venvs/tmp-5da056596403ca4a/include/python3.4m -c snappymodule.cc -o build/temp.linux-x86_64-3.4/snappymodule.o
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.4m -I/home/sontek/venvs/tmp-5da056596403ca4a/include/python3.4m -c crc32c.c -o build/temp.linux-x86_64-3.4/crc32c.o
    x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.4/snappymodule.o build/temp.linux-x86_64-3.4/crc32c.o -lsnappy -o build/lib.linux-x86_64-3.4/_snappy.cpython-34m.so
Successfully installed python-snappy-0.5

@emmettbutler (Contributor)

@sontek This looks great to me. I've left a few comments, but generally I think this work is really sorely needed and you've done a nice thorough job implementing it. Thanks!

I've managed to get all of the tests passing under 2.7, 3.4 and pypy on my machine. I think I'll need to add a note in the readme or in CONTRIBUTING about the preferred way to run tests - merging this will mean that all pull requests in the future will need to be tested against all three interpreters. The same goes for testinstances, which I see you've also taken into account.

@kbourgoin take a look at this if you'd like. I think this is at worst neutral for users who don't care about versions other than 2.7, and it actively helps out other users. Importantly, it doesn't make anyone's user experience worse.

@emmettbutler added this to the 2.1.0 milestone Aug 28, 2015
@emmettbutler self-assigned this Aug 28, 2015
@yungchin (Contributor)

> @yungchin Yeah, the problem is backwards compatibility. All these
> functions are exposed and probably used in clients already and they already
> allow strings... so we should probably keep the compatibility and announce
> it will be going away

I don't think I share the compatibility concern: we previously didn't
support py3, so these methods were only ever promised to handle py2 strs,
ie bytes, and we wouldn't be breaking any promise if we explicitly
changed that to mean bytes. A user who's porting to py3 will expect to
have to do some work when they were previously passing in str, so they
won't be surprised to have to swap to bytes (and that is presuming anyone
was calling protocol methods directly at all).

In my opinion, pykafka shouldn't be concerned with what kind of payloads it
is moving around. The kafka protocol treats messages as opaques. Pretty
much the only API point where it would make sense to me to optionally
accept unicode is Producer.produce().

If we convert protocol methods and the like to optionally accept unicode
(which if they're used for their intended purposes, they will never get to
see any of), I'm afraid I don't see much upside to that, and a lot of extra
if hasattr(value, 'encode') checks going on inside potentially tight
loops. More importantly though, sticking unicode handling under the covers
just makes me worry that we'd need to do a lot more testing, to make sure
we still handle binary payloads correctly.

I do very much apologise for showing up nagging about stuff at such a late
stage in this pullreq cycle, I should have stayed in the loop on this and
started whining sooner. I can see and very much appreciate how much careful
work already went into converting all those methods to swing both ways.

Contributor

This might be more readable if the lambda just returns a plain tuple, without any bytes or str coercions.

@yungchin (Contributor)

OK, I have finally made it to the end of that massive diff. Major kudos for all the work.

Running through it in more detail, I realised I was very ignorant when I said only produce() should accept unicode: it would of course make a lot of sense to also let users specify topic names and consumer groups as unicode, and not slam them for forgetting a b prefix all the time. But still only in the consumer and producer interfaces, imho. If those classes make sure the names are valid for kafka and convert them to bytes before saving them on self, we could avoid having all the other coercions while packing message sets, I'd hope.

Thanks!!
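
That boundary-only conversion could look roughly like this (an illustrative sketch; the constructor signature and attribute names here are made up, not pykafka's actual API):

class Producer(object):
    def __init__(self, topic_name):
        # Accept str or bytes at the public boundary, but store bytes,
        # so the protocol-packing code never needs to coerce again.
        if not isinstance(topic_name, bytes):
            topic_name = topic_name.encode('utf-8')
        self._topic_name = topic_name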

Contributor

Right, so re "topic names as unicode": these tests should probably be required to work without this b here.

@sontek (Contributor Author) commented Aug 28, 2015

@yungchin I 100% agree with you. I didn't want to support it everywhere, but I thought backwards compatibility within all the APIs would be important. I definitely prefer only doing it at the producer/consumer level, though, so I'll go through and change that up.

@sontek (Contributor Author) commented Aug 28, 2015

There is actually one backwards incompatibility I introduced in this PR, which is the CRC32 check: the original code wasn't doing the bit masking (& 0xffffffff) that's necessary to get consistent results across all the different runtimes:

https://github.com/Parsely/pykafka/pull/231/files#diff-d239c7fe52131e016d3cf2f1ef348427R207

This also means we get a long back instead of an int, so the struct unpacking now assumes an unsigned value.
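
In other words, the packing side now works conceptually like this (a simplified sketch, not the exact code from the diff):

import struct
from binascii import crc32

def packed_crc(data):
    # Mask to an unsigned 32-bit value so py2 and py3 agree on the result,
    # then pack it with '!I' (unsigned) instead of '!i' (signed).
    crc = crc32(data) & 0xffffffff
    return struct.pack('!I', crc)

assert struct.unpack('!I', packed_crc(b'hello bar'))[0] == 2557962855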

Contributor

Ah, thanks for the pointer back here. I missed the change from !i to !I (i.e. signed vs unsigned?) when I was going over this the first time. I really don't understand the finer points here, but I suppose if it still works it means Kafka is happy to accept the messages, so it's probably all good.

Just to be sure though, let's ping the author :) @kbourgoin

Contributor Author

I think this would only break during the upgrade, i.e. if you have data produced by the old client, I don't believe the new client would be able to process it properly, because the data types are different.

Contributor

Looking at the protocol spec, it defines this field as a signed int32. So I get the feeling that reverting this to !i would be safer?

(Btw, not really in relation to the project in this pullreq, but when I git grep for "crc32" I only find this line here - does that mean we never check the crc when unpacking messages for consume()?)

Contributor Author

I changed it because the test values were larger than a signed int; you can see it doing something like:

>>> from binascii import crc32
>>> crc32(b'hello') & 0xffffffff
907060870
>>> 2147483647 > crc32(b'hello bar') & 0xffffffff
False

but even if we were able to change it back, we'd still have the backwards compatibility issue, because the previous code wasn't doing the bit masking, so we'd have different results between Python 2 and Python 3:

Python2:

>>> crc32(b'hello bar')
-1737004441
>>> crc32(b'hello bar') & 0xffffffff
2557962855

Python3:

>>> from binascii import crc32
>>> crc32(b'hello bar')
2557962855
>>> crc32(b'hello bar') & 0xffffffff
2557962855

Contributor

Ah, now I get how it works. Thanks for explaining all this so patiently.

Yeah, so with those examples I understand that packing it into the struct as if it were an unsigned field is all fine - it's still the same bit pattern. Thanks!
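
For the record, that "same bit pattern" point is easy to check with the 'hello bar' numbers from the examples above:

import struct

# The signed py2 result and the masked unsigned result are the same four bytes on the wire.
assert struct.pack('!i', -1737004441) == struct.pack('!I', 2557962855)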

@emmettbutler (Contributor)

@sontek Looks like there are some more changes you'd like to make here? I'm interested in getting this merged as soon as possible, since it enables bugfixes on master to benefit from the testing setup and py3 support. The i/I backwards incompatibility is ok in my opinion, since folks who are upgrading can easily start consuming from the latest offset and skip all of the old incompatible data.

@sontek (Contributor Author) commented Sep 1, 2015

@emmett9001 I'd like to move it away from doing so many get_bytes calls and force bytes to be passed in to everything outside of producer/consumer, but I probably won't have time to do it until Thursday (or this weekend). I'm OK with it being merged as-is, or if you want to wait, I'll get to it eventually.

@emmettbutler (Contributor)

Closing this in favor of #246. With that pull request, @yungchin and I will be able to make progress on some of the last changes we've talked about here. @sontek, if you have more changes you'd like to make, you can open a new pull request against the sontek_port_to_pyhon34 branch.
