-
Notifications
You must be signed in to change notification settings - Fork 147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of socket code #535
Improve performance of socket code #535
Conversation
This fixes an admitedly rare-in-practice issue where complete messages get stuck in a buffer but also speeds up the receiving of data from the socket in all cases. The current socket code waits until the last received data ends with a linefeed. It's not gauarenteed that each messsage sent will be read by a single call to `socket.recv` - the message could be larger than the buffer, or several small messages could be read together. It's possible for complete messages to be stuck in the buffer until a call to `recv` returns bytes that end with a linefeed. For the sake of illustrating the issue imagine the buffer is 16 bytes. We can `.recv` a full 16 bytes of data containing one complete and one partial message. (Note it's possible for this to happen even if the amount of received data is smaller than the buffer). part = self._socket.recv(16) part == b'{"a":"b"}\r\n{"d":' # now part will be appended to data and {"a":"b"} will not be # pushed to self._data immediately This issue is fixed by using socket.makefile as it will split the data received on each linefeed. `self._socket_file` is read-only, it could be made writable too and the places where writes are made to the socket could be changed to use it too, this would work. However, there's no issue reading from the socket using `self._socket_file` only and writing using the raw socket, and there's nothing to be gained by writing via `self._socket_file` so that has not been changed. Benchmark https://gist.github.com/petedmarsh/0a2775ec706156b892d40a67cb017bef Results (Python 3.11.3) Benchmarks, repeat=5, number=5 ┌─────────────────────────────────────────────────┬─────────┬─────────┬─────────┬─────────────────┬─────────────────┬─────────────────┐ │ Benchmark │ Min │ Max │ Mean │ Min (+) │ Max (+) │ Mean (+) │ ├─────────────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────────────┼─────────────────┼─────────────────┤ │ Original Client Code vs socket.makefile version │ 5.195 │ 5.965 │ 5.641 │ 3.827 (1.4x) │ 3.832 (1.6x) │ 3.830 (1.5x) │ └─────────────────────────────────────────────────┴─────────┴─────────┴─────────┴─────────────────┴─────────────────┴─────────────────┘
765fce1
to
c5a3ad4
Compare
data += part.decode(self.__encoding) | ||
return data | ||
# strip off trailing \r\n without allocating new bytearray | ||
view = memoryview(message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking this is not required, both json
and orjson
will happily parse JSON messages with trailing whitespace:
import json
import orjson
print(json.loads(b'{"a":"b"}\r\n'))
print(orjson.loads(b'{"a":"b"}\r\n'))
However currently self._data(...)
is being called with the trailing whitespace stripped due to the previous received_data_raw.split(...)
. So I put this in place to make it trivial to compare the data being sent to self._data(...)
and to avoid cryptically slightly changing behaviour.
Oh god, this is a scary PR, you on the slack yet? |
I'm not on slack atm, but I can reply here or via email until I get round
to it.
The benchmark I linked includes a check that what's sent to `_data` is the
same which is of some comfort.
It's possible there is some new error condition thst I have missed but I
haven't been able to find one from reading the docs.
…On Tue, 22 Aug 2023, 14:38 Liam, ***@***.***> wrote:
Oh god, this is a scary PR, you on the slack yet?
—
Reply to this email directly, view it on GitHub
<#535 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJDC6UX2RB6V5IKUXSFEP3XWSR35ANCNFSM6AAAAAA3Z2IK2Q>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
@mberk any thoughts on this? |
Looks neat I agree, it's pretty terrifying to be modifying this code. However, the changes logically make sense and the tests - particularly the benchmark - give confidence that everything will continue to work as expected Some live testing by some brave volunteers would further add to the confidence in the changes |
If it helps I have been running this patch myself with no issues |
This fixes an admitedly rare-in-practice issue where complete messages get stuck in a buffer but also speeds up the receiving of data from the socket in all cases.
The current socket code waits until the last received data ends with a linefeed. It's not gauarenteed that each messsage sent will be read by a single call to
socket.recv
- the message could be larger than the buffer, or several small messages could be read together. It's possible for complete messages to be stuck in the buffer until a call torecv
returns bytes that end with a linefeed.For the sake of illustrating the issue imagine the buffer is 16 bytes. We can
.recv
a full 16 bytes of data containing one complete and one partial message. (Note it's possible for this to happen even if the amount of received data is smaller than the buffer).This issue is fixed by using socket.makefile as it will split the data received on each linefeed.
self._socket_file
is read-only, it could be made writable too and the places where writes are made to the socket could be changed to use it too, this would work. However, there's no issue reading from the socket usingself._socket_file
only and writing using the raw socket, and there's nothing to be gained by writing viaself._socket_file
so that has not been changed.Benchmark
https://gist.github.com/petedmarsh/0a2775ec706156b892d40a67cb017bef