Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace C code with Python. #347

Closed
wants to merge 15 commits into from
Closed

Conversation

meitinger
Copy link
Contributor

This PR is an alternative approach to #342 to achieve universal wheels.
All C code is replaced by Python, using memoryview for Buffer and cryptography.hazmat.bindings.openssl.binding for AEAD and HeaderProtection.
Also some additional checks for existing buffer overrun issues in the C code have been implemented.

As far as performance is concerned, running the full test suite is even a bit fast:
C code: 123.015s
Python code: 121.727s

@jlaine
Copy link
Contributor

jlaine commented Apr 2, 2023

I'm quite curious about this. I originally had a Python-only implementation, and then rewrote in C the parts I identified as hotspots, and the performance gap was significant, but maybe something has changed inside cryptography. I will need some actual numbers especially for AEAD part, as the test suite runtime is not a good proxy for real-world performance.

In any case thanks a lot for looking into this, it's quite an undertaking!

@meitinger
Copy link
Contributor Author

I'd be happy to gather more numbers about performance if it helps getting this merged. Do you have any particular benchmark in mind?
Otherwise I could just measure big file transfer times?

@jlaine
Copy link
Contributor

jlaine commented Apr 3, 2023

My go-to "high level" benchmark would be to run the HTTP/3 server and client against each other and repeatedly transfer 10-50MB files. You can use the /10000000 endpoint to get a 10MB file. This should give you an overview of some "real world" performance, and how much the crypto code contributes to actual times.

A more focused performance test would be to repeatedly run the AEAD / header protection code against some pre-determined 1500 byte payloads (possibly from the test suite). This will tell us what overhead we pay for the switch to Python.

Final thoughts: using cryptography's "Binding" object is probably going to break somewhere down the line, as there seems to be a focus on moving away from OpenSSL and re-implementing primitives in Rust. We might have to move to cryptography's more high-level interfaces, but this is going to mean additional cost.

PS: you're going to want to rebase on top of main, setup.py has shrunk a lot. So much so, that setup.py after your PR might be as simple as:

import setuptools

setuptools.setup()

@codecov
Copy link

codecov bot commented Apr 11, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (5757081) 100.00% compared to head (17971b7) 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##              main      #347    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           22        24     +2     
  Lines         4553      4807   +254     
==========================================
+ Hits          4553      4807   +254     
Impacted Files Coverage Δ
src/aioquic/_buffer.py 100.00% <100.00%> (ø)
src/aioquic/_crypto.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@GalaxySnail
Copy link
Contributor

It seems that cryptography.hazmat.bindings is considered as internal API of cryptography [1][2], is it a good idea to use this API?

[1] https://cryptography.io/en/latest/openssl/
[2] https://cryptography.io/en/latest/changelog/#v40-0-0

@meitinger
Copy link
Contributor Author

Performance Measurements

Three different speed comparison will be drawn: AEAD encrypt/decrypt operations, buffer pull/push operations and data transfer.

AEAD operations

Averaged over 60000 repetitions, random data.

Native

cipher payload_size encrypt_time decrypt_time
aes-128-ecb 1 2.93E-06 6.38E-06
aes-128-ecb 10 3.35E-06 6.81E-06
aes-128-ecb 100 3.32E-06 6.23E-06
aes-128-ecb 1000 5.70E-06 1.09E-05
aes-256-ecb 1 5.26E-06 8.95E-06
aes-256-ecb 10 5.07E-06 7.61E-06
aes-256-ecb 100 2.14E-06 4.14E-06
aes-256-ecb 1000 3.09E-06 5.09E-06
chacha20 1 1.77E-06 3.31E-06
chacha20 10 2.05E-06 3.60E-06
chacha20 100 2.60E-06 3.92E-06
chacha20 1000 2.71E-06 3.85E-06

Python

cipher payload_size encrypt_time decrypt_time
aes-128-ecb 1 1.19E-05 1.28E-05
aes-128-ecb 10 1.04E-05 1.10E-05
aes-128-ecb 100 1.06E-05 1.13E-05
aes-128-ecb 1000 1.15E-05 1.20E-05
aes-256-ecb 1 1.02E-05 1.09E-05
aes-256-ecb 10 1.04E-05 1.10E-05
aes-256-ecb 100 1.06E-05 1.11E-05
aes-256-ecb 1000 1.15E-05 1.22E-05
chacha20 1 1.14E-05 1.22E-05
chacha20 10 1.16E-05 1.22E-05
chacha20 100 1.22E-05 1.28E-05
chacha20 1000 1.32E-05 1.37E-05

Buffer operations

Averaged over 100000 repetitions, random data and random position.

Native

bits push pull
8 3.35E-07 2.98E-07
16 3.16E-07 2.96E-07
32 3.12E-07 2.97E-07
64 3.25E-07 3.14E-07
var 2.87E-07 2.74E-07

Python

bits push pull
8 4.93E-07 5.02E-07
16 5.97E-07 6.04E-07
32 5.93E-07 6.05E-07
64 6.22E-07 6.53E-07
var 7.43E-07 8.25E-07

Data Transfer

Averaged over 10 repetitions, random data.

Native

size duration
1 0.002004
1000 0.001790
1000000 0.840696
10000000 7.705281

Python

size duration
1 0.002544
1000 0.003650
1000000 0.786645
10000000 7.716463

Conclusion

While native operations are faster, the Python-only code is still performing comparably fast. Since the buffer and crypto operation only differ in the 10^-7 and 10^-6 range, transferring 10MB of data is more or less the same, as these operations aren't called millions of times in a row to really accumulate. And as the 1MB sample - where the native solution is slower - shows, other more dominating factors (e.g. packet loss) come into play.

Test Programs

For reference and to reproduce results, definitely not the cleanest code ;)

AEAD

import random
import time

from aioquic.quic.crypto import CryptoContext, CIPHER_SUITES
from aioquic.quic.packet import QuicProtocolVersion

import pandas

REPETITIONS=60000
PACKET_SIZES=[1, 10, 100, 1000]
CID_LEN=8

result: list[tuple[str, int, float, float]] = []
for cipher in CIPHER_SUITES:
    for size in PACKET_SIZES:
        ctx = CryptoContext()
        ctx.setup(cipher, random.randbytes(32), QuicProtocolVersion.VERSION_1)
        cid = random.randbytes(CID_LEN)
        payload = random.randbytes(size)
        encrypt = 0
        decrypt = 0
        for n in range(1, REPETITIONS):
            header = b"\x41" + cid + int.to_bytes(n, 2, byteorder='big')
            encrypt -= time.perf_counter()
            data = ctx.encrypt_packet(header, payload, n)
            encrypt += time.perf_counter()
            decrypt -= time.perf_counter()
            test_header, test_payload, _, _ = ctx.decrypt_packet(data, 1 + CID_LEN, n)
            decrypt += time.perf_counter()
            assert header == test_header
            assert payload == test_payload
        result.append((CIPHER_SUITES[cipher][0].decode(), size, encrypt / REPETITIONS, decrypt / REPETITIONS))
df = pandas.DataFrame(result, columns=['cipher', 'payload_size', 'avg_encrypt_time', 'avg_decrypt_time'])
df.to_csv('result.csv', index=False)

Buffer

import random
import time

from aioquic.buffer import Buffer

import pandas

REPETITIONS = 100000
SIZE = 1000

orig_data = random.randbytes(SIZE)
buffer = Buffer(data=orig_data)
pull8 = 0
push8 = 0
pull16 = 0
push16 = 0
pull32 = 0
push32 = 0
pull64 = 0
push64 = 0
pull_var = 0
push_var = 0
for _ in range(REPETITIONS):
    pos = random.randint(0, SIZE - 1)
    buffer.seek(pos)
    pull8 -= time.perf_counter()
    val = buffer.pull_uint8()
    pull8 += time.perf_counter()
    buffer.seek(pos)
    push8 -= time.perf_counter()
    buffer.push_uint8(val)
    push8 += time.perf_counter()
    pos = random.randint(0, SIZE - 2)
    buffer.seek(pos)
    pull16 -= time.perf_counter()
    val = buffer.pull_uint16()
    pull16 += time.perf_counter()
    buffer.seek(pos)
    push16 -= time.perf_counter()
    buffer.push_uint16(val)
    push16 += time.perf_counter()
    pos = random.randint(0, SIZE - 4)
    buffer.seek(pos)
    pull32 -= time.perf_counter()
    val = buffer.pull_uint32()
    pull32 += time.perf_counter()
    buffer.seek(pos)
    push32 -= time.perf_counter()
    buffer.push_uint32(val)
    push32 += time.perf_counter()
    pos = random.randint(0, SIZE - 8)
    buffer.seek(pos)
    pull64 -= time.perf_counter()
    val = buffer.pull_uint64()
    pull64 += time.perf_counter()
    buffer.seek(pos)
    push64 -= time.perf_counter()
    buffer.push_uint64(val)
    push64 += time.perf_counter()

buffer.seek(SIZE)
assert buffer.data == orig_data

for _ in range(REPETITIONS):
    pos = random.randint(0, SIZE - 8)
    buffer.seek(pos)
    pull_var -= time.perf_counter()
    val = buffer.pull_uint_var()
    pull_var += time.perf_counter()
    buffer.seek(pos)
    push_var -= time.perf_counter()
    buffer.push_uint_var(val)
    push_var += time.perf_counter()
# here the data can actually be different


df = pandas.DataFrame(
    [
        ("8", push8 / REPETITIONS, pull8 / REPETITIONS),
        ("16", push16 / REPETITIONS, pull16 / REPETITIONS),
        ("32", push32 / REPETITIONS, pull32 / REPETITIONS),
        ("64", push64 / REPETITIONS, pull64 / REPETITIONS),
        ("var", push_var / REPETITIONS, pull_var / REPETITIONS),
    ],
    columns=["bits", "push", "pull"],
)
df.to_csv("result.csv", index=False)

Data Transfer

import asyncio
from collections import deque
import random
import ssl
import time
import os
from typing import Optional, cast

from aioquic.asyncio.protocol import QuicConnectionProtocol
from aioquic.asyncio.server import QuicServer, serve
from aioquic.asyncio.client import connect
from aioquic.h3.connection import H3_ALPN, H3Connection
from aioquic.h3.events import DataReceived, H3Event, Headers, HeadersReceived
from aioquic.quic.configuration import QuicConfiguration
from aioquic.quic.events import ConnectionTerminated, ProtocolNegotiated, QuicEvent

import pandas


class ServerProtocol(QuicConnectionProtocol):
    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self._http: Optional[H3Connection] = None

    def http_event_received(self, event: H3Event) -> None:
        if isinstance(event, HeadersReceived):
            assert self._http is not None
            assert event.stream_ended
            method = None
            path = None
            for name, value in event.headers:
                if name == b":method":
                    method = value.decode()
                elif name == b":path":
                    path = value.decode()
            assert method == "GET"
            assert path is not None
            data = random.randbytes(int(path))
            self._http.send_headers(
                stream_id=event.stream_id,
                headers=[
                    (b":status", b"200"),
                    (b"x-start", str(time.perf_counter()).encode()),
                ],
                end_stream=False,
            )
            self._http.send_data(
                stream_id=event.stream_id,
                data=data,
                end_stream=True,
            )

    def quic_event_received(self, event: QuicEvent) -> None:
        if isinstance(event, ProtocolNegotiated):
            assert event.alpn_protocol in H3_ALPN
            self._http = H3Connection(self._quic)
        if self._http is not None:
            for http_event in self._http.handle_event(event):
                self.http_event_received(http_event)


class ClientProtocol(QuicConnectionProtocol):
    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self._http = H3Connection(self._quic)
        self._requests: dict[
            int, tuple[Optional[float], bytearray, asyncio.Future[tuple[float, bytes]]]
        ] = {}

    def http_event_received(self, event: H3Event) -> None:
        if isinstance(event, (HeadersReceived, DataReceived)):
            assert event.stream_id in self._requests
            start, data, waiter = self._requests[event.stream_id]
            if isinstance(event, HeadersReceived):
                assert not event.stream_ended
                assert start is None
                status = None
                start = None
                for name, value in event.headers:
                    if name == b":status":
                        status = value
                    elif name == b"x-start":
                        start = float(value.decode())
                assert status == b"200"
                self._requests[event.stream_id] = start, data, waiter
            else:
                assert start is not None
                data.extend(event.data)
                if event.stream_ended:
                    del self._requests[event.stream_id]
                    waiter.set_result((time.perf_counter() - start, bytes(data)))

    def quic_event_received(self, event: QuicEvent) -> None:
        if isinstance(event, ConnectionTerminated):
            for _, _, waiter in self._requests.values():
                waiter.set_exception(Exception(event.reason_phrase))
            self._requests.clear()
        else:
            for http_event in self._http.handle_event(event):
                self.http_event_received(http_event)

    async def request(self, size: int) -> tuple[float, bytes]:
        stream_id = self._quic.get_next_available_stream_id()
        self._http.send_headers(
            stream_id=stream_id,
            headers=[
                (b":authority", AUTHORITY.encode()),
                (b":method", b"GET"),
                (b":path", str(size).encode()),
            ],
            end_stream=True,
        )
        waiter = self._loop.create_future()
        self._requests[stream_id] = None, bytearray(), waiter
        self.transmit()
        return await asyncio.shield(waiter)


HOST = "127.0.0.1"
AUTHORITY = "test"
PORT = 7733
REPETITIONS = 10


async def start_server() -> QuicServer:
    def abspath(path: str) -> os.PathLike:
        return os.path.join(os.path.dirname(__file__), path)  # type: ignore

    config = QuicConfiguration(
        is_client=False,
        alpn_protocols=H3_ALPN,
        server_name=AUTHORITY,
    )
    config.load_cert_chain(
        certfile=abspath("ssl_cert.pem"),
        keyfile=abspath("ssl_key.pem"),
    )
    return await serve(
        host=HOST,
        port=PORT,
        configuration=config,
        create_protocol=ServerProtocol,
    )


async def main():
    server = await start_server()
    async with connect(
        host=HOST,
        port=PORT,
        configuration=QuicConfiguration(
            is_client=True,
            alpn_protocols=H3_ALPN,
            verify_mode=ssl.CERT_NONE,
        ),
        create_protocol=ClientProtocol,
    ) as proto:
        client = cast(ClientProtocol, proto)
        times = []
        for size in (1, 1000, 1000000, 10000000):
            print(f"Measure size {size}...")
            total = 0
            for _ in range(REPETITIONS):
                duration, result = await client.request(size)
                total += duration
                assert len(result) == size
            times.append((size, total / REPETITIONS))
        print("Closing...")
        client.close()
        server.close()
        df = pandas.DataFrame(times, columns=["size", "duration"])
        df.to_csv("result.csv", index=False)


asyncio.run(main())

@meitinger
Copy link
Contributor Author

It seems that cryptography.hazmat.bindings is considered as internal API of cryptography [1][2], is it a good idea to use this API?

While I agree in principle, that using an internal API is a bad idea, I'd say in this case - and with the set of bindings that are being used - it's okay-ish. These bindings are (and have to be) used by cryptography... for now.
But, as @jlaine said, it might break down the line if cryptography moves their Python code to Rust. At that point, we can also happily drop the OpenSSL bindings ourselves and switch to the (hopefully by then fast) Rust-backed high-level interfaces like Crypto, CipherContext, etc.

@kdashg
Copy link

kdashg commented Apr 24, 2023

FWIW, I did draft a version of this that uses the higher-level cryptography bindings myself, before I found this issue as I was about to submit a PR. :)
It wasn't awful to write, so I don't think it's technical-debt-expensive to rely on the technically-internal openssl bindings directly.

I would love to see at least a python-only-option, because for my use-case (web-platform-test test runner), performance is effectively not a concern.

@jlaine
Copy link
Contributor

jlaine commented Nov 4, 2023

Since we have had Py_LIMITED_API wheels for a while I'm going to close this.

@jlaine jlaine closed this Nov 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants