Replace C code with Python. #347
I'm quite curious about this. I originally had a Python-only implementation, and then rewrote in C the parts I identified as hotspots, and the performance gap was significant, but maybe something has changed inside In any case thanks a lot for looking into this, it's quite an undertaking!
I'd be happy to gather more numbers about performance if it helps getting this merged. Do you have any particular benchmark in mind?
My go-to "high level" benchmark would be to run the HTTP/3 server and client against each other and repeatedly transfer 10-50MB files. You can use the

A more focused performance test would be to repeatedly run the AEAD / header protection code against some pre-determined 1500 byte payloads (possibly from the test suite). This will tell us what overhead we pay for the switch to Python.

Final thoughts: using cryptography's "Binding" object is probably going to break somewhere down the line, as there seems to be a focus on moving away from OpenSSL and re-implementing primitives in Rust. We might have to move to cryptography's more high-level interfaces, but this is going to mean additional cost.

PS: you're going to want to rebase on top of main,

import setuptools
setuptools.setup()
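The focused test suggested above (repeatedly running one operation against fixed 1500-byte payloads) could be sketched roughly as follows. This is only a harness outline using the standard library: `per_call_seconds` and `stand_in` are hypothetical names, and the SHA-256 call is a placeholder workload, not the AEAD or header protection code itself.

```python
import hashlib
import random
import timeit

PAYLOAD_SIZE = 1500  # bytes, as suggested in the comment above
# Seeded generator so every run (native or Python build) sees the same payloads.
_rng = random.Random(0)
PAYLOADS = [_rng.randbytes(PAYLOAD_SIZE) for _ in range(16)]


def per_call_seconds(func, payloads, number=1000, repeat=5):
    """Return the best observed average time (seconds) per func(payload) call."""

    def run_all():
        for payload in payloads:
            func(payload)

    # timeit.repeat returns one total time per repetition; the minimum is
    # usually the least disturbed by other activity on the machine.
    timings = timeit.repeat(run_all, number=number, repeat=repeat)
    return min(timings) / (number * len(payloads))


def stand_in(payload):
    # Placeholder workload (assumption): replace with the call to be measured.
    return hashlib.sha256(payload).digest()


if __name__ == "__main__":
    print(f"{per_call_seconds(stand_in, PAYLOADS):.3e} s per call")
```

Running the same harness once against the C build and once against the Python build would give directly comparable per-call numbers.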
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## main #347 +/- ##
==========================================
Coverage 100.00% 100.00%
==========================================
Files 22 24 +2
Lines 4553 4807 +254
==========================================
+ Hits 4553 4807 +254
It seems that [1] https://cryptography.io/en/latest/openssl/ |
Performance Measurements

Three speed comparisons are drawn: AEAD encrypt/decrypt operations, buffer pull/push operations, and data transfer.

AEAD operations
Averaged over 60000 repetitions, random data.
Native
Python

Buffer operations
Averaged over 100000 repetitions, random data and random position.
Native
Python

Data Transfer
Averaged over 10 repetitions, random data.
Native
Python
Conclusion

While native operations are faster, the Python-only code still performs comparably. Since the buffer and crypto operations differ only on the order of 10^-7 and 10^-6 seconds per call, transferring 10MB of data takes more or less the same time, as these operations aren't called millions of times in a row to really accumulate. And as the 1MB sample (where the native solution is slower) shows, other, more dominant factors (e.g. packet loss) come into play.

Test Programs

For reference and to reproduce results, definitely not the cleanest code ;)

AEAD

import random
import time

from aioquic.quic.crypto import CryptoContext, CIPHER_SUITES
from aioquic.quic.packet import QuicProtocolVersion
import pandas

REPETITIONS=60000
PACKET_SIZES=[1, 10, 100, 1000]
CID_LEN=8

result: list[tuple[str, int, float, float]] = []
for cipher in CIPHER_SUITES:
    for size in PACKET_SIZES:
        ctx = CryptoContext()
        ctx.setup(cipher, random.randbytes(32), QuicProtocolVersion.VERSION_1)
        cid = random.randbytes(CID_LEN)
        payload = random.randbytes(size)
        # accumulated wall-clock time spent in encrypt / decrypt only
        encrypt = 0
        decrypt = 0
        for n in range(1, REPETITIONS):
            header = b"\x41" + cid + int.to_bytes(n, 2, byteorder='big')
            encrypt -= time.perf_counter()
            data = ctx.encrypt_packet(header, payload, n)
            encrypt += time.perf_counter()
            decrypt -= time.perf_counter()
            test_header, test_payload, _, _ = ctx.decrypt_packet(data, 1 + CID_LEN, n)
            decrypt += time.perf_counter()
            assert header == test_header
            assert payload == test_payload
        result.append((CIPHER_SUITES[cipher][0].decode(), size, encrypt / REPETITIONS, decrypt / REPETITIONS))

df = pandas.DataFrame(result, columns=['cipher', 'payload_size', 'avg_encrypt_time', 'avg_decrypt_time'])
df.to_csv('result.csv', index=False)

Buffer

import random
import time

from aioquic.buffer import Buffer
import pandas

REPETITIONS = 100000
SIZE = 1000

orig_data = random.randbytes(SIZE)
buffer = Buffer(data=orig_data)

pull8 = 0
push8 = 0
pull16 = 0
push16 = 0
pull32 = 0
push32 = 0
pull64 = 0
push64 = 0
pull_var = 0
push_var = 0

for _ in range(REPETITIONS):
    pos = random.randint(0, SIZE - 1)
    buffer.seek(pos)
    pull8 -= time.perf_counter()
    val = buffer.pull_uint8()
    pull8 += time.perf_counter()
    buffer.seek(pos)
    push8 -= time.perf_counter()
    buffer.push_uint8(val)
    push8 += time.perf_counter()

    pos = random.randint(0, SIZE - 2)
    buffer.seek(pos)
    pull16 -= time.perf_counter()
    val = buffer.pull_uint16()
    pull16 += time.perf_counter()
    buffer.seek(pos)
    push16 -= time.perf_counter()
    buffer.push_uint16(val)
    push16 += time.perf_counter()

    pos = random.randint(0, SIZE - 4)
    buffer.seek(pos)
    pull32 -= time.perf_counter()
    val = buffer.pull_uint32()
    pull32 += time.perf_counter()
    buffer.seek(pos)
    push32 -= time.perf_counter()
    buffer.push_uint32(val)
    push32 += time.perf_counter()

    pos = random.randint(0, SIZE - 8)
    buffer.seek(pos)
    pull64 -= time.perf_counter()
    val = buffer.pull_uint64()
    pull64 += time.perf_counter()
    buffer.seek(pos)
    push64 -= time.perf_counter()
    buffer.push_uint64(val)
    push64 += time.perf_counter()

buffer.seek(SIZE)
assert buffer.data == orig_data

for _ in range(REPETITIONS):
    pos = random.randint(0, SIZE - 8)
    buffer.seek(pos)
    pull_var -= time.perf_counter()
    val = buffer.pull_uint_var()
    pull_var += time.perf_counter()
    buffer.seek(pos)
    push_var -= time.perf_counter()
    buffer.push_uint_var(val)
    push_var += time.perf_counter()

# here the data can actually be different

df = pandas.DataFrame(
    [
        ("8", push8 / REPETITIONS, pull8 / REPETITIONS),
        ("16", push16 / REPETITIONS, pull16 / REPETITIONS),
        ("32", push32 / REPETITIONS, pull32 / REPETITIONS),
        ("64", push64 / REPETITIONS, pull64 / REPETITIONS),
        ("var", push_var / REPETITIONS, pull_var / REPETITIONS),
    ],
    columns=["bits", "push", "pull"],
)
df.to_csv("result.csv", index=False)

Data Transfer

import asyncio
from collections import deque
import random
import ssl
import time
import os
from typing import Optional, cast

from aioquic.asyncio.protocol import QuicConnectionProtocol
from aioquic.asyncio.server import QuicServer, serve
from aioquic.asyncio.client import connect
from aioquic.h3.connection import H3_ALPN, H3Connection
from aioquic.h3.events import DataReceived, H3Event, Headers, HeadersReceived
from aioquic.quic.configuration import QuicConfiguration
from aioquic.quic.events import ConnectionTerminated, ProtocolNegotiated, QuicEvent
import pandas


class ServerProtocol(QuicConnectionProtocol):
    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self._http: Optional[H3Connection] = None

    def http_event_received(self, event: H3Event) -> None:
        if isinstance(event, HeadersReceived):
            assert self._http is not None
            assert event.stream_ended
            method = None
            path = None
            for name, value in event.headers:
                if name == b":method":
                    method = value.decode()
                elif name == b":path":
                    path = value.decode()
            assert method == "GET"
            assert path is not None
            # the requested path is the number of random bytes to send back
            data = random.randbytes(int(path))
            self._http.send_headers(
                stream_id=event.stream_id,
                headers=[
                    (b":status", b"200"),
                    (b"x-start", str(time.perf_counter()).encode()),
                ],
                end_stream=False,
            )
            self._http.send_data(
                stream_id=event.stream_id,
                data=data,
                end_stream=True,
            )

    def quic_event_received(self, event: QuicEvent) -> None:
        if isinstance(event, ProtocolNegotiated):
            assert event.alpn_protocol in H3_ALPN
            self._http = H3Connection(self._quic)
        if self._http is not None:
            for http_event in self._http.handle_event(event):
                self.http_event_received(http_event)


class ClientProtocol(QuicConnectionProtocol):
    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self._http = H3Connection(self._quic)
        self._requests: dict[
            int, tuple[Optional[float], bytearray, asyncio.Future[tuple[float, bytes]]]
        ] = {}

    def http_event_received(self, event: H3Event) -> None:
        if isinstance(event, (HeadersReceived, DataReceived)):
            assert event.stream_id in self._requests
            start, data, waiter = self._requests[event.stream_id]
            if isinstance(event, HeadersReceived):
                assert not event.stream_ended
                assert start is None
                status = None
                start = None
                for name, value in event.headers:
                    if name == b":status":
                        status = value
                    elif name == b"x-start":
                        start = float(value.decode())
                assert status == b"200"
                self._requests[event.stream_id] = start, data, waiter
            else:
                assert start is not None
                data.extend(event.data)
                if event.stream_ended:
                    del self._requests[event.stream_id]
                    # duration is measured from the server's x-start timestamp
                    waiter.set_result((time.perf_counter() - start, bytes(data)))

    def quic_event_received(self, event: QuicEvent) -> None:
        if isinstance(event, ConnectionTerminated):
            for _, _, waiter in self._requests.values():
                waiter.set_exception(Exception(event.reason_phrase))
            self._requests.clear()
        else:
            for http_event in self._http.handle_event(event):
                self.http_event_received(http_event)

    async def request(self, size: int) -> tuple[float, bytes]:
        stream_id = self._quic.get_next_available_stream_id()
        self._http.send_headers(
            stream_id=stream_id,
            headers=[
                (b":authority", AUTHORITY.encode()),
                (b":method", b"GET"),
                (b":path", str(size).encode()),
            ],
            end_stream=True,
        )
        waiter = self._loop.create_future()
        self._requests[stream_id] = None, bytearray(), waiter
        self.transmit()
        return await asyncio.shield(waiter)


HOST = "127.0.0.1"
AUTHORITY = "test"
PORT = 7733
REPETITIONS = 10


async def start_server() -> QuicServer:
    def abspath(path: str) -> os.PathLike:
        return os.path.join(os.path.dirname(__file__), path)  # type: ignore

    config = QuicConfiguration(
        is_client=False,
        alpn_protocols=H3_ALPN,
        server_name=AUTHORITY,
    )
    config.load_cert_chain(
        certfile=abspath("ssl_cert.pem"),
        keyfile=abspath("ssl_key.pem"),
    )
    return await serve(
        host=HOST,
        port=PORT,
        configuration=config,
        create_protocol=ServerProtocol,
    )


async def main():
    server = await start_server()
    async with connect(
        host=HOST,
        port=PORT,
        configuration=QuicConfiguration(
            is_client=True,
            alpn_protocols=H3_ALPN,
            verify_mode=ssl.CERT_NONE,
        ),
        create_protocol=ClientProtocol,
    ) as proto:
        client = cast(ClientProtocol, proto)
        times = []
        for size in (1, 1000, 1000000, 10000000):
            print(f"Measure size {size}...")
            total = 0
            for _ in range(REPETITIONS):
                duration, result = await client.request(size)
                total += duration
                assert len(result) == size
            times.append((size, total / REPETITIONS))
        print("Closing...")
        client.close()
    server.close()
    df = pandas.DataFrame(times, columns=["size", "duration"])
    df.to_csv("result.csv", index=False)


asyncio.run(main())
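As a side note on the Buffer script's remark that the data "can actually be different" after the variable-length round trip: QUIC variable-length integers (RFC 9000, section 16) allow the same value to be stored in 1, 2, 4 or 8 bytes, so reading a non-minimal encoding from random bytes and writing it back in the shortest form changes the underlying bytes. The sketch below illustrates this with standalone functions; `encode_uint_var` and `decode_uint_var` are hypothetical names and are not the aioquic implementation.

```python
def encode_uint_var(value: int) -> bytes:
    """Encode value in the shortest QUIC variable-length integer form."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value < 0x40:
        return value.to_bytes(1, "big")
    if value < 0x4000:
        return (value | 0x4000).to_bytes(2, "big")
    if value < 0x40000000:
        return (value | 0x80000000).to_bytes(4, "big")
    if value < 0x4000000000000000:
        return (value | 0xC000000000000000).to_bytes(8, "big")
    raise ValueError("value does not fit in 62 bits")


def decode_uint_var(data: bytes) -> tuple[int, int]:
    """Return (value, bytes consumed) for the integer at the start of data."""
    if not data:
        raise ValueError("empty input")
    # The two most significant bits of the first byte select the length.
    length = 1 << (data[0] >> 6)
    if len(data) < length:
        raise ValueError("truncated input")
    mask = (1 << (8 * length - 2)) - 1  # drop the two length bits
    return int.from_bytes(data[:length], "big") & mask, length


# The value 5 stored in the 2-byte form; the shortest form is a single byte.
original = bytes([0x40, 0x05])
value, consumed = decode_uint_var(original)
reencoded = encode_uint_var(value)
```

Here `value` is 5 and `consumed` is 2, while `reencoded` is one byte long, which is why the script cannot assert that the buffer contents are unchanged after that loop.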
While I agree in principle, that using an internal API is a bad idea, I'd say in this case - and with the set of bindings that are being used - it's okay-ish. These bindings are (and have to be) used by |
FWIW, I did draft a version of this that uses the higher-level I would love to see at least a python-only-option, because for my use-case (web-platform-test test runner), performance is effectively not a concern. |
Since we have had Py_LIMITED_API wheels for a while I'm going to close this. |
This PR is an alternative approach to #342 to achieve universal wheels.
All C code is replaced by Python, using memoryview for Buffer and cryptography.hazmat.bindings.openssl.binding for AEAD and HeaderProtection. Also, some additional checks for existing buffer overrun issues in the C code have been implemented.
As far as performance is concerned, running the full test suite is even a bit faster:
C code: 123.015s
Python code: 121.727s
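To give a rough idea of what a memoryview-backed buffer with explicit bounds checks can look like, here is a minimal sketch. It is not the code from this PR: `SketchBuffer` and `OutOfBoundsError` are hypothetical names, only 16-bit integers are covered, and the use of `struct` is an assumption about one reasonable way to do it.

```python
import struct


class OutOfBoundsError(ValueError):
    """Raised instead of reading or writing past the end of the buffer."""


class SketchBuffer:
    def __init__(self, capacity: int = 0, data: bytes = b"") -> None:
        # A bytearray owns the storage; the memoryview gives zero-copy access.
        self._data = bytearray(data) if data else bytearray(capacity)
        self._view = memoryview(self._data)
        self._pos = 0

    def seek(self, pos: int) -> None:
        if pos < 0 or pos > len(self._data):
            raise OutOfBoundsError("seek out of bounds")
        self._pos = pos

    def tell(self) -> int:
        return self._pos

    def pull_uint16(self) -> int:
        end = self._pos + 2
        if end > len(self._data):  # the explicit overrun check
            raise OutOfBoundsError("read out of bounds")
        (value,) = struct.unpack_from("!H", self._view, self._pos)
        self._pos = end
        return value

    def push_uint16(self, value: int) -> None:
        end = self._pos + 2
        if end > len(self._data):  # the explicit overrun check
            raise OutOfBoundsError("write out of bounds")
        struct.pack_into("!H", self._view, self._pos, value)
        self._pos = end

    @property
    def data(self) -> bytes:
        # Everything written or read up to the current position.
        return bytes(self._view[: self._pos])


buf = SketchBuffer(capacity=4)
buf.push_uint16(0x1234)
written = buf.data
buf.seek(0)
read_back = buf.pull_uint16()
```

In pure Python every access pays for the bounds check and the `struct` call, which is the kind of per-operation overhead the buffer benchmark in the conversation measures.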