New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cython-based message deserialization (or, how to speed up my python) #382
Conversation
@aholmberg @thobbs What you guys think about this? |
@sontek Looks pretty good to me. I've been looking at the work in progress so there's not too much surprising. Will be reviewing and testing here today. |
# checking. Only string objects are supported; no Unicode objects | ||
# should be passed. | ||
|
||
from cassandra.buffer cimport Buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this do when we're already in the module?
Thanks very much for this contribution. It looks good overall. I put a few minor questions and comments in the diff. Some other general stuff (I can handle on commit if you don't care to):
|
The default is to use objparser.ListParser | ||
""" | ||
# TODO: It may be cleaner to turn ProtocolHandler and ResultMessage into | ||
# TODO: instances and use methods instead of class methods |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably decide on this before introducing the API. I can do the refactor if you think it's worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably nice for dynamic instantiations of ProtocolHandler such as the Cython one, as the alternative is to use a closure which feels a little out of place. Are you planning to expose ProtocolHandler publicly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's exposed to the degree that it's specified in this config parameter. This pattern was chosen for simplicity there, and to match the connection_class parameter, which supports a similar pluggable approach using class types.
I could be convinced either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Either way works for me, do whatever feels best, as the closure
itself isn't a problem. I'll remove the leftover TODOs.
On Tue, Aug 11, 2015 at 9:32 PM Adam Holmberg notifications@github.com
wrote:
In cassandra/protocol.py
#382 (comment):
- There are three Cython-based protocol handlers (least to most performant):
1. objparser.ListParser
this parser decodes result messages into a list of tuples
2. objparser.LazyParser
this parser decodes result messages lazily by returning an iterator
3. numpyparser.NumPyParser
this parser decodes result messages into NumPy arrays
- The default is to use objparser.ListParser
- """
TODO: It may be cleaner to turn ProtocolHandler and ResultMessage into
TODO: instances and use methods instead of class methods
Well, it's exposed to the degree that it's specified in this config
parameter
https://github.com/markflorisson/python-driver/blob/9240c71c02c27a6f20c73870a92ccdafab238df7/cassandra/cluster.py#L1524.
This pattern was chosen for simplicity there, and to match the
connection_class parameter, which supports a similar pluggable approach
using class types.
I could be convinced either way.—
Reply to this email directly or view it on GitHub
https://github.com/datastax/python-driver/pull/382/files#r36794804.
I added some API documentation (maybe we want some more in the official docs?) and did some cleanups, thanks for all the suggestions @aholmberg. |
Is this intended to work with python 2.7? Building locally fails for me on this version, but succeeds with python3. |
@coldeasy yes, it's intended to work with Python 2.7. That's the primary runtime I've been using to test. What are the symptoms on failure? |
Ah looks like I had an old version (0.19.2) of cython installed. Installing latest (0.23) fixed the issue. For reference, this was the error: $ python2 setup.py build
Compiling cassandra/cython_marshal.pyx because it changed.
Compiling cassandra/cython_utils.pyx because it changed.
Compiling cassandra/deserializers.pyx because it changed.
Compiling cassandra/ioutils.pyx because it changed.
Compiling cassandra/numpy_parser.pyx because it changed.
Compiling cassandra/obj_parser.pyx because it changed.
Compiling cassandra/parsing.pyx because it changed.
Compiling cassandra/row_parser.pyx because it changed.
Cythonizing cassandra/cython_marshal.pyx
Error compiling Cython file:
------------------------------------------------------------
...
return varint_unpack_py3(to_bytes(term))
else:
return varint_unpack_py2(to_bytes(term))
# TODO: Optimize these two functions
cdef varint_unpack_py3(bytes term):
^
------------------------------------------------------------
cassandra/cython_marshal.pyx:107:5: closures inside cdef functions not yet supported |
Thanks for following up. On Tue, Aug 11, 2015 at 4:47 PM, Colin Deasy notifications@github.com
|
+1 here. Will merge pending input from our Test Eng stakeholders. I have a small tweak I would like to introduce to facilitate tox unit testing, but that does not need to be done on this PR. Thanks again for the contribution! |
This PR adds accelerated versions of ProtocolHandler for message deserialization using Cython. By default the CythonProtocolHandler will be used if python-driver is compiled with Cython.
This work has been generously supported by SurveyMonkey.