Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cython-based message deserialization (or, how to speed up my python) #382

Merged
merged 70 commits into from Aug 14, 2015

Conversation

markflorisson
Copy link
Contributor

This PR adds accelerated versions of ProtocolHandler for message deserialization using Cython. By default the CythonProtocolHandler will be used if python-driver is compiled with Cython.

This work has been generously supported by SurveyMonkey.

@sontek
Copy link
Contributor

sontek commented Aug 10, 2015

@aholmberg @thobbs What you guys think about this?

@aholmberg
Copy link
Contributor

@sontek Looks pretty good to me. I've been looking at the work in progress so there's not too much surprising. Will be reviewing and testing here today.

# checking. Only string objects are supported; no Unicode objects
# should be passed.

from cassandra.buffer cimport Buffer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do when we're already in the module?

@aholmberg
Copy link
Contributor

Thanks very much for this contribution. It looks good overall. I put a few minor questions and comments in the diff.

Some other general stuff (I can handle on commit if you don't care to):

  • Possibly rename modules to include "_" for two-word names.
  • Prepend copyright header in all files
  • Add API docs for ProtocolHandler classes

The default is to use objparser.ListParser
"""
# TODO: It may be cleaner to turn ProtocolHandler and ResultMessage into
# TODO: instances and use methods instead of class methods
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably decide on this before introducing the API. I can do the refactor if you think it's worth it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably nice for dynamic instantiations of ProtocolHandler such as the Cython one, as the alternative is to use a closure which feels a little out of place. Are you planning to expose ProtocolHandler publicly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's exposed to the degree that it's specified in this config parameter. This pattern was chosen for simplicity there, and to match the connection_class parameter, which supports a similar pluggable approach using class types.
I could be convinced either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Either way works for me, do whatever feels best, as the closure
itself isn't a problem. I'll remove the leftover TODOs.

On Tue, Aug 11, 2015 at 9:32 PM Adam Holmberg notifications@github.com
wrote:

In cassandra/protocol.py
#382 (comment):

  • There are three Cython-based protocol handlers (least to most performant):
  •    1. objparser.ListParser
    
  •        this parser decodes result messages into a list of tuples
    
  •    2. objparser.LazyParser
    
  •        this parser decodes result messages lazily by returning an iterator
    
  •    3. numpyparser.NumPyParser
    
  •        this parser decodes result messages into NumPy arrays
    
  • The default is to use objparser.ListParser
  • """
  • TODO: It may be cleaner to turn ProtocolHandler and ResultMessage into

  • TODO: instances and use methods instead of class methods

Well, it's exposed to the degree that it's specified in this config
parameter
https://github.com/markflorisson/python-driver/blob/9240c71c02c27a6f20c73870a92ccdafab238df7/cassandra/cluster.py#L1524.
This pattern was chosen for simplicity there, and to match the
connection_class parameter, which supports a similar pluggable approach
using class types.
I could be convinced either way.


Reply to this email directly or view it on GitHub
https://github.com/datastax/python-driver/pull/382/files#r36794804.

@markflorisson
Copy link
Contributor Author

I added some API documentation (maybe we want some more in the official docs?) and did some cleanups, thanks for all the suggestions @aholmberg.

@coldeasy
Copy link
Contributor

Is this intended to work with python 2.7? Building locally fails for me on this version, but succeeds with python3.

@aholmberg
Copy link
Contributor

@coldeasy yes, it's intended to work with Python 2.7. That's the primary runtime I've been using to test. What are the symptoms on failure?

@coldeasy
Copy link
Contributor

Ah looks like I had an old version (0.19.2) of cython installed. Installing latest (0.23) fixed the issue. For reference, this was the error:

$ python2 setup.py build
Compiling cassandra/cython_marshal.pyx because it changed.
Compiling cassandra/cython_utils.pyx because it changed.
Compiling cassandra/deserializers.pyx because it changed.
Compiling cassandra/ioutils.pyx because it changed.
Compiling cassandra/numpy_parser.pyx because it changed.
Compiling cassandra/obj_parser.pyx because it changed.
Compiling cassandra/parsing.pyx because it changed.
Compiling cassandra/row_parser.pyx because it changed.
Cythonizing cassandra/cython_marshal.pyx

Error compiling Cython file:
------------------------------------------------------------
...
        return varint_unpack_py3(to_bytes(term))
    else:
        return varint_unpack_py2(to_bytes(term))

# TODO: Optimize these two functions
cdef varint_unpack_py3(bytes term):
    ^
------------------------------------------------------------

cassandra/cython_marshal.pyx:107:5: closures inside cdef functions not yet supported

@aholmberg
Copy link
Contributor

Thanks for following up.

On Tue, Aug 11, 2015 at 4:47 PM, Colin Deasy notifications@github.com
wrote:

Ah looks like I had an old version (0.19.2) of cython installed.
Installing latest (0.23) fixed the issue. For reference, this was the error:

$ python2 setup.py build
Compiling cassandra/cython_marshal.pyx because it changed.
Compiling cassandra/cython_utils.pyx because it changed.
Compiling cassandra/deserializers.pyx because it changed.
Compiling cassandra/ioutils.pyx because it changed.
Compiling cassandra/numpy_parser.pyx because it changed.
Compiling cassandra/obj_parser.pyx because it changed.
Compiling cassandra/parsing.pyx because it changed.
Compiling cassandra/row_parser.pyx because it changed.
Cythonizing cassandra/cython_marshal.pyx

Error compiling Cython file:

...
return varint_unpack_py3(to_bytes(term))
else:
return varint_unpack_py2(to_bytes(term))

TODO: Optimize these two functions

cdef varint_unpack_py3(bytes term):

^

cassandra/cython_marshal.pyx:107:5: closures inside cdef functions not yet supported


Reply to this email directly or view it on GitHub
#382 (comment)
.

http://cassandrasummit-datastax.com/?utm_campaign=summit15&utm_medium=summiticon&utm_source=emailsignature

@aholmberg
Copy link
Contributor

+1 here. Will merge pending input from our Test Eng stakeholders.

I have a small tweak I would like to introduce to facilitate tox unit testing, but that does not need to be done on this PR.

Thanks again for the contribution!

@aholmberg aholmberg merged commit ccc7c8b into datastax:master Aug 14, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants