Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert code to Cython #24

Open
hltbra opened this issue May 24, 2019 · 2 comments
Open

Convert code to Cython #24

hltbra opened this issue May 24, 2019 · 2 comments

Comments

@hltbra
Copy link
Contributor

hltbra commented May 24, 2019

I ran a few experiments with Cython and saw a significant performance increase by mostly annotating a few files with static types.

I thought the asyncore code was the problem, but glued the Redis ae.c/anet.c libraries with Python but didn't see the performance improvement I was expecting. There's room for improvement with the way I am creating the parsers and buffers, but I think Cython would be a lot simpler. The C code can be found at https://gist.github.com/hltbra/1f2266b96c50700cc17a10115634e55f

@hltbra
Copy link
Contributor Author

hltbra commented May 29, 2019

Experiment I ran annotating parser.pyx off the experiment/lmdb branch:

# parser.pyx

cdef class Parser(object):

    cdef bytearray _buffer
    cdef int _buffer_pos
    cdef object _read_fn
    cdef int MAX_BUFSIZE
    cdef str CRLF
    cdef int CRLF_LEN

    def __cinit__(self, read_fn):
        self._buffer = bytearray()
        self._buffer_pos = 0
        self._read_fn = read_fn
        self.MAX_BUFSIZE = 1024 * 1024
        self.CRLF = '\r\n'
        self.CRLF_LEN = 2

    cdef bytearray _readline(self):
        cdef bytearray result
        cdef bytearray buffer
        cdef int crlf_position

        buffer = self._buffer[self._buffer_pos:]
        crlf_position = buffer.find(self.CRLF)
        if crlf_position == -1:
            raise StopIteration()
        result = buffer[:crlf_position]
        self._buffer_pos += crlf_position + self.CRLF_LEN
        return result

    cdef void _read_into_buffer(self):
        cdef bytes data

        # FIXME: implement a maximum size for the buffer to prevent a crash due to bad clients
        data = self._read_fn(self.MAX_BUFSIZE)
        self._buffer.extend(data)

    cdef bytearray _read(self, int n_bytes):
        cdef bytearray result
        cdef bytearray buffer

        buffer = self._buffer[self._buffer_pos:]
        if len(buffer) < n_bytes:
            raise StopIteration()
        result = buffer[:n_bytes]
        # FIXME: ensure self.CRLF is next
        self._buffer_pos += n_bytes + self.CRLF_LEN
        return result

    def get_instructions(self):
        cdef bytearray instructions
        cdef int array_length
        cdef list instruction_set
        cdef bytearray line
        cdef str instruction

        self._read_into_buffer()
        while self._buffer:
            self._buffer_pos = 0
            instructions = self._readline()
            # the Redis protocol says that all commands are arrays, however,
            # Redis's own tests have commands like PING being sent as a Simple String
            if instructions.startswith('+'):
                self._buffer = self._buffer[self._buffer_pos:]
                yield str(instructions[1:].strip()).split()
            elif instructions.startswith('*'):
                # array of instructions
                array_length = int(instructions[1:])  # skip '*' char
                instruction_set = []
                for _ in range(array_length):
                    line = self._readline()
                    str_len = int(line[1:])  # skip '$' char
                    instruction = str(self._read(str_len))
                    instruction_set.append(instruction)
                self._buffer = self._buffer[self._buffer_pos:]
                yield instruction_set
            else:
                # inline instructions, saw them in the Redis tests
                for line in instructions.split(self.CRLF):
                    self._buffer = self._buffer[self._buffer_pos:]
                    yield str(line.strip()).split()
# setup.py

from distutils.core import setup
from Cython.Build import cythonize

setup(
    ext_modules = cythonize(["dredis/parser.pyx"])
)

Before Cython:

$ ./redis-benchmark -r 10000 -n 10000 -q -t ping
PING_INLINE: 18903.59 requests per second
PING_BULK: 17035.78 requests per second

After Cython:

$ ./redis-benchmark -r 10000 -n 10000 -q -t ping
PING_INLINE: 20325.20 requests per second
PING_BULK: 19880.71 requests per second

Takeaway: ~7% improvement by annotating only the parser file.

@hltbra
Copy link
Contributor Author

hltbra commented May 29, 2019

Sorry, I had cProfile enabled and it messed the results. Here's it corrected (without cProfile):

# before cython

$ ./redis-benchmark -r 10000 -n 10000 -q -t ping
PING_INLINE: 26525.20 requests per second
PING_BULK: 23980.82 requests per second
# after cython
$ ./redis-benchmark -r 10000 -n 10000 -q -t ping
PING_INLINE: 27777.78 requests per second
PING_BULK: 27472.53 requests per second

~4-5% improvement

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant