Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of socket code #535

Conversation

petedmarsh
Copy link
Contributor

@petedmarsh petedmarsh commented Aug 22, 2023

This fixes an admitedly rare-in-practice issue where complete messages get stuck in a buffer but also speeds up the receiving of data from the socket in all cases.

The current socket code waits until the last received data ends with a linefeed. It's not gauarenteed that each messsage sent will be read by a single call to socket.recv - the message could be larger than the buffer, or several small messages could be read together. It's possible for complete messages to be stuck in the buffer until a call to recv returns bytes that end with a linefeed.

For the sake of illustrating the issue imagine the buffer is 16 bytes. We can .recv a full 16 bytes of data containing one complete and one partial message. (Note it's possible for this to happen even if the amount of received data is smaller than the buffer).

part = self._socket.recv(16)
part == b'{"a":"b"}\r\n{"d":'
# now part will be appended to data and {"a":"b"} will not be
# pushed to self._data immediately

This issue is fixed by using socket.makefile as it will split the data received on each linefeed.

self._socket_file is read-only, it could be made writable too and the places where writes are made to the socket could be changed to use it too, this would work. However, there's no issue reading from the socket using self._socket_file only and writing using the raw socket, and there's nothing to be gained by writing via self._socket_file so that has not been changed.

Benchmark

https://gist.github.com/petedmarsh/0a2775ec706156b892d40a67cb017bef

Results (Python 3.11.3)
                                                   Benchmarks, repeat=5, number=5
┌─────────────────────────────────────────────────┬─────────┬─────────┬─────────┬─────────────────┬─────────────────┬─────────────────┐
│                                       Benchmark │ Min     │ Max     │ Mean    │ Min (+)         │ Max (+)         │ Mean (+)        │
├─────────────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────────────┼─────────────────┼─────────────────┤
│ Original Client Code vs socket.makefile version │ 5.195   │ 5.965   │ 5.641   │ 3.827 (1.4x)    │ 3.832 (1.6x)    │ 3.830 (1.5x)    │
└─────────────────────────────────────────────────┴─────────┴─────────┴─────────┴─────────────────┴─────────────────┴─────────────────┘

This fixes an admitedly rare-in-practice issue where complete messages
get stuck in a buffer but also speeds up the receiving of data from
the socket in all cases.

The current socket code waits until the last received data ends with a
linefeed. It's not gauarenteed that each messsage sent will be read by
a single call to `socket.recv` - the message could be larger than the
buffer, or several small messages could be read together. It's possible
for complete messages to be stuck in the buffer until a call to `recv`
returns bytes that end with a linefeed.

For the sake of illustrating the issue imagine the buffer is 16 bytes.
We can `.recv` a full 16 bytes of data containing one complete and one
partial message. (Note it's possible for this to happen even if the
amount of received data is smaller than the buffer).

    part = self._socket.recv(16)
    part == b'{"a":"b"}\r\n{"d":'
    # now part will be appended to data and {"a":"b"} will not be
    # pushed to self._data immediately

This issue is fixed by using socket.makefile as it will split the data
received on each linefeed.

`self._socket_file` is read-only, it could be made writable too and the
places where writes are made to the socket could be changed to use it
too, this would work. However, there's no issue reading from the socket
using `self._socket_file` only and writing using the raw socket, and
there's nothing to be gained by writing via `self._socket_file` so that
has not been changed.

Benchmark

https://gist.github.com/petedmarsh/0a2775ec706156b892d40a67cb017bef

    Results (Python 3.11.3)
                                                       Benchmarks, repeat=5, number=5
    ┌─────────────────────────────────────────────────┬─────────┬─────────┬─────────┬─────────────────┬─────────────────┬─────────────────┐
    │                                       Benchmark │ Min     │ Max     │ Mean    │ Min (+)         │ Max (+)         │ Mean (+)        │
    ├─────────────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────────────┼─────────────────┼─────────────────┤
    │ Original Client Code vs socket.makefile version │ 5.195   │ 5.965   │ 5.641   │ 3.827 (1.4x)    │ 3.832 (1.6x)    │ 3.830 (1.5x)    │
    └─────────────────────────────────────────────────┴─────────┴─────────┴─────────┴─────────────────┴─────────────────┴─────────────────┘
@petedmarsh petedmarsh force-pushed the improve-performance-of-socket-code branch from 765fce1 to c5a3ad4 Compare August 22, 2023 12:16
data += part.decode(self.__encoding)
return data
# strip off trailing \r\n without allocating new bytearray
view = memoryview(message)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking this is not required, both json and orjson will happily parse JSON messages with trailing whitespace:

import json
import orjson

print(json.loads(b'{"a":"b"}\r\n'))
print(orjson.loads(b'{"a":"b"}\r\n'))

However currently self._data(...) is being called with the trailing whitespace stripped due to the previous received_data_raw.split(...). So I put this in place to make it trivial to compare the data being sent to self._data(...) and to avoid cryptically slightly changing behaviour.

@liampauling
Copy link
Member

Oh god, this is a scary PR, you on the slack yet?

@petedmarsh
Copy link
Contributor Author

petedmarsh commented Aug 22, 2023 via email

@liampauling
Copy link
Member

@mberk any thoughts on this?

@mberk
Copy link
Contributor

mberk commented Aug 23, 2023

Looks neat

I agree, it's pretty terrifying to be modifying this code. However, the changes logically make sense and the tests - particularly the benchmark - give confidence that everything will continue to work as expected

Some live testing by some brave volunteers would further add to the confidence in the changes

@petedmarsh
Copy link
Contributor Author

If it helps I have been running this patch myself with no issues

@liampauling liampauling changed the base branch from master to release/2.19.0 September 7, 2023 12:27
@liampauling liampauling merged commit 7b6a493 into betcode-org:release/2.19.0 Sep 7, 2023
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants