Improve performance of socket code #535

petedmarsh · 2023-08-22T12:15:18Z

This fixes an admittedly rare-in-practice issue where complete messages get stuck in a buffer, and it also speeds up receiving data from the socket in all cases.

The current socket code waits until the most recently received data ends with a linefeed. It's not guaranteed that each message sent will be read by a single call to socket.recv: the message could be larger than the buffer, or several small messages could be read together. Complete messages can therefore be stuck in the buffer until a call to recv returns bytes that end with a linefeed.

To illustrate the issue, imagine the buffer is 16 bytes. We can .recv a full 16 bytes of data containing one complete and one partial message. (Note that this can happen even if the amount of received data is smaller than the buffer.)

part = self._socket.recv(16)
part == b'{"a":"b"}\r\n{"d":'
# now part will be appended to data and {"a":"b"} will not be
# pushed to self._data immediately
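
As a rough illustration of the delay, here is a minimal sketch of a loop that only hands data on once the accumulated bytes end with a linefeed. The function name `drain_when_terminated` and the list of chunks are hypothetical stand-ins for the receive loop and for successive recv results; this is not the library's actual code.

```python
def drain_when_terminated(chunks):
    """Hypothetical stand-in for a receive loop that only hands data on
    once the accumulated bytes end with a linefeed."""
    released = []  # one entry per hand-off, each a list of complete messages
    data = b""
    for part in chunks:  # each item plays the role of one recv() result
        data += part
        if data.endswith(b"\n"):
            # Drop the empty element that follows the final terminator.
            released.append(data.split(b"\r\n")[:-1])
            data = b""
    return released


# The first chunk holds a complete message plus the start of the next one.
batches = drain_when_terminated([b'{"a":"b"}\r\n{"d":', b'"e"}\r\n'])
print(batches)
```

Both messages come out in a single hand-off, and only after the second chunk arrives, even though the first message was complete after the first chunk.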

This issue is fixed by using socket.makefile as it will split the data received on each linefeed.
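
The line-splitting behaviour can be sketched with a connected pair of local sockets. This is a self-contained illustration rather than the code in this PR; `server`, `client` and `reader` are names chosen for the example, and socket.socketpair stands in for the real connection.

```python
import socket

# socketpair() returns two connected sockets, standing in for server and client.
server, client = socket.socketpair()
reader = client.makefile("rb")  # read-only, binary, buffered file object

# Send one complete message plus the start of the next one.
server.sendall(b'{"a":"b"}\r\n{"d":')
first = reader.readline()  # returns once the first linefeed has been read

# Send the remainder of the second message.
server.sendall(b'"e"}\r\n')
second = reader.readline()

print(first)
print(second)

reader.close()
client.close()
server.close()
```

The first readline call returns the complete message straight away, while the partial second message stays in the file object's buffer until its linefeed arrives.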

self._socket_file is read-only. It could be made writable too, and the places that write to the socket could be changed to use it, and this would work. However, there's no issue with reading from the socket via self._socket_file and writing via the raw socket, and there's nothing to be gained by writing via self._socket_file, so that has not been changed.

Benchmark

https://gist.github.com/petedmarsh/0a2775ec706156b892d40a67cb017bef

Results (Python 3.11.3)
                                                   Benchmarks, repeat=5, number=5
┌─────────────────────────────────────────────────┬─────────┬─────────┬─────────┬─────────────────┬─────────────────┬─────────────────┐
│                                       Benchmark │ Min     │ Max     │ Mean    │ Min (+)         │ Max (+)         │ Mean (+)        │
├─────────────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────────────┼─────────────────┼─────────────────┤
│ Original Client Code vs socket.makefile version │ 5.195   │ 5.965   │ 5.641   │ 3.827 (1.4x)    │ 3.832 (1.6x)    │ 3.830 (1.5x)    │
└─────────────────────────────────────────────────┴─────────┴─────────┴─────────┴─────────────────┴─────────────────┴─────────────────┘


petedmarsh · 2023-08-22T12:37:32Z

betfairlightweight/streaming/betfairstream.py

-            data += part.decode(self.__encoding)
-        return data
+        # strip off trailing \r\n without allocating new bytearray
+        view = memoryview(message)


Strictly speaking this is not required, as both json and orjson will happily parse JSON messages with trailing whitespace:

import json
import orjson

print(json.loads(b'{"a":"b"}\r\n'))
print(orjson.loads(b'{"a":"b"}\r\n'))

However, self._data(...) is currently called with the trailing whitespace stripped, because of the previous received_data_raw.split(...). So I put this in place to make it trivial to compare the data being sent to self._data(...) and to avoid subtly changing behaviour.
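
For anyone unfamiliar with the memoryview approach, here is a minimal sketch of stripping the terminator without copying the payload. Only `view = memoryview(message)` appears in the diff above; the slicing step and the name `stripped` are assumptions made for illustration, and the stdlib json module is used so the example has no third-party dependency.

```python
import json

message = b'{"a":"b"}\r\n'  # one line, as readline() would return it

# Slicing a memoryview gives another view over the same buffer,
# so no bytes are copied here.
view = memoryview(message)
stripped = view[:-2] if message.endswith(b"\r\n") else view

# bytes(...) copies, and is done here only to display and parse the result.
print(bytes(stripped))
print(json.loads(bytes(stripped)) == json.loads(message))
```

The parsed result is the same with or without the trailing whitespace; the stripping only keeps the bytes passed on identical to what was passed before.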

liampauling · 2023-08-22T12:38:10Z

Oh god, this is a scary PR, you on the slack yet?

petedmarsh · 2023-08-22T12:47:55Z

I'm not on slack atm, but I can reply here or via email until I get round to it. The benchmark I linked includes a check that what's sent to `_data` is the same, which is of some comfort. It's possible there is some new error condition that I have missed, but I haven't been able to find one from reading the docs.

liampauling · 2023-08-23T07:29:16Z

@mberk any thoughts on this?

mberk · 2023-08-23T14:37:21Z

Looks neat

I agree, it's pretty terrifying to be modifying this code. However, the changes logically make sense and the tests - particularly the benchmark - give confidence that everything will continue to work as expected

Some live testing by some brave volunteers would further add to the confidence in the changes

petedmarsh · 2023-08-23T18:53:27Z

If it helps I have been running this patch myself with no issues

petedmarsh force-pushed the improve-performance-of-socket-code branch from 765fce1 to c5a3ad4 August 22, 2023 12:16

liampauling changed the base branch from master to release/2.19.0 September 7, 2023 12:27

liampauling merged commit 7b6a493 into betcode-org:release/2.19.0 Sep 7, 2023
9 checks passed
