Fast xoring in python #686

socketpair · 2015-12-17T09:23:27Z

instead of

def _websocket_mask_python(mask, data):
    return bytes(b ^ mask[i % 4] for i, b in enumerate(data))

please use this:

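import sys

native_byteorder = sys.byteorder

def _websocket_mask_python(mask, data):
    assert len(mask) == 4
    datalen = len(data)
    if datalen == 0:
        return b''  # everything works without this, but that may change in later Python versions.
    # Treat the whole payload as one big integer.
    data = int.from_bytes(data, native_byteorder)
    # Repeat the 4-byte key to exactly datalen bytes and convert it the same way.
    mask = int.from_bytes(mask * (datalen // 4) + mask[: datalen % 4], native_byteorder)
    # One XOR over the whole buffer, converted back to exactly datalen bytes.
    return (data ^ mask).to_bytes(datalen, native_byteorder)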
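For context, both functions implement client-to-server masking from RFC 6455 (section 5.3): payload byte i is XORed with byte i % 4 of a 4-byte masking key, and since XOR is its own inverse the same function also unmasks. A quick sketch for checking that the big-integer version agrees with the byte-by-byte one for every value of len(data) % 4, including empty input. The names mask_bytewise and mask_bigint are illustrative, and the os.urandom keys and payloads are just test data:

```python
import os
import sys

def mask_bytewise(mask: bytes, data: bytes) -> bytes:
    # Original approach: one Python-level XOR per byte.
    return bytes(b ^ mask[i % 4] for i, b in enumerate(data))

def mask_bigint(mask: bytes, data: bytes) -> bytes:
    # Proposed approach, without the datalen == 0 shortcut:
    # int.from_bytes(b"", ...) is 0 and (0).to_bytes(0, ...) is b"".
    assert len(mask) == 4
    n = len(data)
    full_mask = mask * (n // 4) + mask[: n % 4]
    x = int.from_bytes(data, sys.byteorder) ^ int.from_bytes(full_mask, sys.byteorder)
    return x.to_bytes(n, sys.byteorder)

key = os.urandom(4)  # RFC 6455 requires a fresh random key per client frame
for n in range(33):  # covers every value of n % 4, including n == 0
    payload = os.urandom(n)
    masked = mask_bigint(key, payload)
    assert masked == mask_bytewise(key, payload), n
    assert mask_bigint(key, masked) == payload  # masking twice restores the payload
print("lengths 0..32 agree")
```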


kxepal · 2015-12-17T09:43:33Z

Why?

socketpair · 2015-12-17T09:43:43Z

this is MUCH faster

kxepal · 2015-12-17T09:43:49Z

How much?

jettify · 2015-12-17T09:44:49Z

Could you elaborate on this? Could you post benchmark results or insights why this works faster?

socketpair · 2015-12-17T09:44:55Z

Depending on the data size, it is 1.1 to 31 times faster on my computer. The most "optimal" data size for benchmarking is 3 MB.

socketpair · 2015-12-17T09:45:17Z

#!/usr/bin/python3.5
import sys
import itertools
from time import monotonic

native_byteorder = sys.byteorder

def _websocket_mask_python(mask, data):
    assert len(mask) == 4
    datalen = len(data)
    if datalen == 0:
        return b''  # everything works without this, but that may change in later Python versions.
    data = int.from_bytes(data, native_byteorder)
    mask = int.from_bytes(mask * (datalen // 4) + mask[: datalen % 4], native_byteorder)
    #mask = int.from_bytes(itertools.islice(itertools.cycle(mask), 0, datalen), native_byteorder)
    return (data ^ mask).to_bytes(datalen, native_byteorder)

def _websocket_mask_python1(mask, data):
    return bytes(b ^ mask[i % 4] for i, b in enumerate(data))


data = b'f' * (1024*1024*3)

a = monotonic()
res = _websocket_mask_python(b'1234', data)
b = monotonic()

a1 = monotonic()
res1 = _websocket_mask_python1(b'1234', data)
b1 = monotonic()

print(res == res1, (b1 - a1) / (b - a))  # True, and how many times faster the big-int version is
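The script above times a single call of each function with time.monotonic, which gets noisy for small inputs. A sketch of the same comparison using the standard timeit module over several sizes; the sizes and repeat counts are arbitrary choices, and the absolute timings and ratios will vary by machine and Python version:

```python
import sys
import timeit

def mask_bytewise(mask, data):
    return bytes(b ^ mask[i % 4] for i, b in enumerate(data))

def mask_bigint(mask, data):
    n = len(data)
    full_mask = mask * (n // 4) + mask[: n % 4]
    x = int.from_bytes(data, sys.byteorder) ^ int.from_bytes(full_mask, sys.byteorder)
    return x.to_bytes(n, sys.byteorder)

def speedup(size, number):
    data = b"f" * size
    assert mask_bigint(b"1234", data) == mask_bytewise(b"1234", data)
    # Best of 3 repeats, each running the function `number` times.
    slow = min(timeit.repeat(lambda: mask_bytewise(b"1234", data), number=number, repeat=3))
    fast = min(timeit.repeat(lambda: mask_bigint(b"1234", data), number=number, repeat=3))
    return slow / fast

if __name__ == "__main__":
    for size, number in [(1, 10000), (125, 1000), (64 * 1024, 10), (3 * 1024 * 1024, 1)]:
        print(f"{size:>8} bytes: big-int version is {speedup(size, number):.1f}x faster")
```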

socketpair · 2015-12-17T09:46:47Z

It is faster even on a one-byte message :)

kxepal · 2015-12-17T09:56:26Z

It's not obvious why more operations (and more code) would be faster than the shorter version:

Python 3.4.3 (default, Nov 12 2015, 20:43:56)
...
In [14]: %timeit websocket_mask(b'1234', data)
1 loops, best of 3: 967 ms per loop

In [15]: %timeit websocket_mask2(b'1234', data)
10 loops, best of 3: 22.6 ms per loop

Do you have an explanation for this?

socketpair · 2015-12-17T10:01:23Z

Sure: the original implementation is executed in Python, while my implementation is executed in C.

asvetlov · 2015-12-17T10:03:29Z

I don't care too much, because the Cythonized version is 7 times faster than your optimized Python one.

You may make a pull request though.

kxepal · 2015-12-17T10:05:38Z

@socketpair But enumeration and bytes operations are not done in Python, right? The math is the same. What I see in yours is more function/method calls, which means more work to do. Still curious.

socketpair · 2015-12-17T10:05:41Z

@asvetlov About Cython: what are you comparing with what?

jashandeep-sohi · 2015-12-17T10:06:37Z

@kxepal b ^ mask[i % 4] for i, b in enumerate(data) creates a new int for every byte.
@socketpair's version just creates one long int from the bytes and xors them.
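To make the one-big-integer idea concrete, here is a small worked example with illustrative values. It uses little-endian order explicitly so the hex values are predictable; the technique works with either order as long as from_bytes and to_bytes agree:

```python
mask = b"\x01\x02\x03\x04"
data = b"abcde"                               # bytes 0x61 0x62 0x63 0x64 0x65
n = len(data)

full_mask = mask * (n // 4) + mask[: n % 4]   # b"\x01\x02\x03\x04\x01"
d = int.from_bytes(data, "little")            # 0x6564636261: all 5 bytes in one int
m = int.from_bytes(full_mask, "little")       # 0x0104030201
# One Python-level XOR; the per-digit loop over the integer runs in C.
masked = (d ^ m).to_bytes(n, "little")        # b"````d"

print(hex(d), hex(m), masked)
```

Each byte is still XORed with the right key byte (for example 0x61 ^ 0x01 == 0x60, the backtick), but the interpreter executes three conversions and one XOR instead of one generator step per byte.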

socketpair · 2015-12-17T10:08:03Z

This is a general rule: block operations are faster than byte-by-byte ones.

kxepal · 2015-12-17T10:09:55Z

True, thanks!

socketpair · 2015-12-17T10:10:54Z

Would this be merged, then?

socketpair · 2015-12-17T10:13:31Z

Also, constructing a big integer is a cheap operation, and XORing two big integers of the same size is cheap too.

I think this function spends most of its time constructing the mask... I tried 4 approaches; maybe you know a faster one?
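The thread does not say which four approaches were tried. The sketch below compares a few plausible ways to build the repeated mask: the repeat-and-slice form from the proposal, an over-allocate-and-truncate variant, and the itertools.cycle form from the commented-out line in the benchmark. It checks that they agree and times them with timeit. The function names and the over-allocate variant are illustrative assumptions:

```python
import itertools
import timeit

def mask_repeat_slice(mask: bytes, n: int) -> bytes:
    # Form used in the proposal: whole repetitions plus a partial tail.
    return mask * (n // 4) + mask[: n % 4]

def mask_overshoot(mask: bytes, n: int) -> bytes:
    # Over-allocate by up to one key, then truncate with a single slice.
    return (mask * (n // 4 + 1))[:n]

def mask_cycle(mask: bytes, n: int) -> bytes:
    # itertools variant: the iterator yields one int per byte, so it is
    # expected to be the slowest of the three.
    return bytes(itertools.islice(itertools.cycle(mask), n))

BUILDERS = [mask_repeat_slice, mask_overshoot, mask_cycle]

if __name__ == "__main__":
    n = 3 * 1024 * 1024
    expected = mask_repeat_slice(b"1234", n)
    for build in BUILDERS:
        assert build(b"1234", n) == expected
        t = min(timeit.repeat(lambda: build(b"1234", n), number=5, repeat=3))
        print(f"{build.__name__:>18}: {t / 5 * 1000:.2f} ms per call")
```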

asvetlov · 2015-12-17T10:15:28Z

@socketpair I've compared your _websocket_mask_python with aiohttp.websocket._websocket_mask_python and aiohttp.websocket._websocket_mask_cython.
The latter is 7 times faster than your code.

Sorry, I cannot merge still nonexistent pull request :)

socketpair · 2015-12-17T10:53:22Z

I will make a pull request. I have also improved the performance of the Cython-based implementation :)

socketpair · 2015-12-17T11:00:28Z

I have installed pytest from pip...

AttributeError: module 'pytest' has no attribute 'raises_regexp' :(

asvetlov · 2015-12-17T12:35:14Z

Please install pytest-raisesregexp manually.
I don't want to add it to requirements.txt because I'd like to drop raises_regexp usage entirely; the library is not as useful as I expected.

asvetlov · 2015-12-17T13:13:06Z

Fixed by 66c4234

Weboscket XOR performance improved. Fixes #686

socketpair · 2015-12-18T13:57:18Z

Oops :) A race condition :) :) I had some small improvements on top of that, but pushed them 15 seconds after you merged the PR.

socketpair · 2015-12-18T14:03:27Z

Also, you replaced size_t with uintmax_t; that is not right.

uintmax_t is the largest unsigned type provided by the implementation.
size_t is the type of the result of the sizeof operator, big enough to hold the size of any object.

Also, you did not import size_t :(. Please ask me to change something instead of silently fixing it in your own commits.

socketpair · 2015-12-18T14:07:27Z

Rebased and updated branch

asvetlov · 2015-12-18T14:19:12Z

sizeof(size_t) == 8 on 64-bit systems.
uintmax_t is an alias for unsigned long int on a 64-bit OS.
On a 32-bit OS it's still defined and is an alias for unsigned long long int.
In other words, it's always uint64 on both 32-bit and 64-bit systems.
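The size_t vs uintmax_t question is about how wide a word the Cython loop XORs at a time. The sketch below is a pure-Python model of that word-at-a-time technique, not the aiohttp Cython code: it widens the 4-byte key to one word, XORs whole words, then finishes the tail byte by byte. struct.calcsize("N") reports sizeof(size_t) for the running interpreter, so word_size=NATIVE_WORD follows the platform the way size_t does, while word_size=8 models a fixed 64-bit type. It is far slower than either Python version in this thread; it only illustrates the loop structure.

```python
import struct

def mask_wordwise(mask: bytes, data: bytes, word_size: int = 8) -> bytes:
    """Pure-Python model of word-at-a-time masking (word_size 4 or 8 bytes)."""
    assert len(mask) == 4 and word_size in (4, 8)
    # Native byte order with standard sizes: 'I' is 4 bytes, 'Q' is 8 bytes.
    fmt = "=I" if word_size == 4 else "=Q"
    # Widen the key to one word by repeating it; valid because word_size % 4 == 0.
    (word_mask,) = struct.unpack(fmt, mask * (word_size // 4))
    out = bytearray(data)
    full = len(out) - len(out) % word_size   # bytes covered by whole words
    for off in range(0, full, word_size):
        (w,) = struct.unpack_from(fmt, out, off)
        struct.pack_into(fmt, out, off, w ^ word_mask)
    # Tail bytes: 'full' is a multiple of 4, so mask[i % 4] stays aligned with the key.
    for i in range(full, len(out)):
        out[i] ^= mask[i % 4]
    return bytes(out)

# sizeof(size_t) for this interpreter's build: typically 8 on 64-bit, 4 on 32-bit.
NATIVE_WORD = struct.calcsize("N")

print(mask_wordwise(b"1234", b"hello, websocket", word_size=NATIVE_WORD))
```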

asvetlov · 2015-12-18T14:24:29Z

If your latest updates are missing from the master branch, please create a new PR.
But I'm not convinced by your latest changes.

I wrote about uintmax_t above.
Regarding forcing little-endianness: I believe keeping the native byte order makes sense. Keep in mind a hypothetical big-endian machine: forcing a byte-order reconversion there looks like a performance hit.

socketpair · 2015-12-18T14:37:56Z

I have checked the Python sources. The byte order in that function only changes the order in which bytes are read from the byte array, and nothing more. Since little-endian means reading from beginning to end, it is more optimal.

Since the bytes are never interpreted as actual numeric values (i.e. there is no reconversion), there will be no performance hit; instead, since memory is read left to right, performance will improve :)
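Whatever the performance argument, correctness does not depend on the byte order: XOR acts on each bit position independently, so as long as from_bytes and to_bytes use the same order, byte i of the result is data[i] XORed with the matching key byte. A small check (mask_bigint with an explicit byteorder parameter is an illustrative helper):

```python
import os

def mask_bigint(mask: bytes, data: bytes, byteorder: str) -> bytes:
    n = len(data)
    full_mask = mask * (n // 4) + mask[: n % 4]
    x = int.from_bytes(data, byteorder) ^ int.from_bytes(full_mask, byteorder)
    return x.to_bytes(n, byteorder)

key, data = os.urandom(4), os.urandom(1001)
# Same output for either byte order, provided both conversions agree.
assert mask_bigint(key, data, "little") == mask_bigint(key, data, "big")
```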

asvetlov · 2015-12-18T14:48:01Z

Memory is read by cache lines, not byte-by-byte.
I don't think it's worth committing anyway.

socketpair · 2015-12-18T14:56:14Z

OK, agreed, the changes are not that significant.

socketpair · 2016-01-08T16:39:49Z

Same issue: http://bugs.python.org/issue19251

cowlicks · 2016-02-18T15:46:59Z

FYI, I wrote a patch for the Python issue @socketpair linked to, allowing things like b'abc' ^ b'xyz'. But it needs more feedback before it can be included.
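Without such a patch, ^ is not defined between bytes objects (to our knowledge this is still true of current CPython releases), so code has to go through int or a generator. A minimal sketch of an equal-length XOR helper built on the big-integer trick from this thread; xor_bytes is a hypothetical name, not a standard-library function:

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("operands must have equal length")
    # Any byte order works here as long as both conversions use the same one.
    x = int.from_bytes(a, "little") ^ int.from_bytes(b, "little")
    return x.to_bytes(len(a), "little")

try:
    b"abc" ^ b"xyz"
except TypeError:
    pass  # bytes has no ^ operator, so this raises TypeError

print(xor_bytes(b"abc", b"xyz"))  # b'\x19\x1b\x19'
```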

lock · 2019-10-29T13:01:35Z

This thread has been automatically locked since there has not been
any recent activity after it was closed. Please open a new issue for
related bugs.

If you feel there are important points made in this discussion,
please include those excerpts in that new issue.

asvetlov closed this as completed in 8448545 Dec 18, 2015

asvetlov added a commit that referenced this issue Dec 18, 2015

Merge pull request #687 from socketpair/fastxor

88d006c

Weboscket XOR performance improved. Fixes #686

lock bot added the outdated label Oct 29, 2019

lock bot locked as resolved and limited conversation to collaborators Oct 29, 2019
