[WIP, NOREVIEW] Linux SslStream: custom BIO_METHOD over managed buffer windows#128245
rzikm wants to merge 5 commits into
Tagging subscribers to this area: @bartonjs, @vcsjones, @dotnet/area-system-security
Pull request overview
This PR replaces Linux/OpenSSL SslStream memory BIO staging with a custom managed-window BIO to reduce TLS record copies, and also includes a separate NegotiateStream stale-buffer bug fix.
Changes:
- Adds native managed-span BIO APIs, OpenSSL shim entries, and exports.
- Updates Unix SslStream handshake/encrypt/decrypt paths to use managed read/write windows plus spill draining.
- Fixes NegotiateStream read-buffer state after mid-frame read failure and adds a regression test.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/native/libs/System.Security.Cryptography.Native/pal_bio.h | Declares managed-span BIO native APIs. |
| src/native/libs/System.Security.Cryptography.Native/pal_bio.c | Implements custom BIO_METHOD with read carry and write spill buffers. |
| src/native/libs/System.Security.Cryptography.Native/opensslshim.h | Adds OpenSSL BIO method/flag function shims. |
| src/native/libs/System.Security.Cryptography.Native/entrypoints.c | Exports the new native BIO entry points. |
| src/libraries/Common/src/Interop/Unix/System.Security.Cryptography.Native/Interop.Ssl.cs | Adds P/Invokes and switches Unix SSL handles to managed-span BIOs. |
| src/libraries/Common/src/Interop/Unix/System.Security.Cryptography.Native/Interop.OpenSsl.cs | Reworks Unix OpenSSL handshake/encrypt/decrypt to arm BIO windows and drain spill output. |
| src/libraries/System.Net.Security/src/System/Net/Security/NegotiateStream.cs | Defers read-buffer state updates until reads/decryption succeed. |
| src/libraries/System.Net.Security/tests/FunctionalTests/NegotiateStreamStreamToStreamTest.cs | Adds a regression test for stale data after mid-frame read failure. |
Replace the pair of BIO_s_mem instances backing each SSL handle on Linux with a custom BIO_METHOD that reads/writes directly into caller-supplied managed buffer windows, with a heap-backed spill buffer for output overflow and a heap-backed carry buffer for unconsumed input bytes.
This eliminates one memcpy per TLS record in both directions (encrypt and decrypt) by allowing OpenSSL to read plaintext from and write ciphertext into managed buffers in-place, instead of staging through BIO_s_mem.
Native side (src/native/libs/System.Security.Cryptography.Native/):
* pal_bio.c gains a ManagedSpanBio implementation (read/write/ctrl callbacks, lazy BIO_METHOD init via pthread_once) plus seven exports: BioNewManagedSpan, BioSetReadWindow, BioClearReadWindow, BioSetWriteWindow, BioGetWriteResult, BioDrainSpill, BioResetManagedSpan.
* When BioClearReadWindow is called with unread bytes still in the window, the tail is copied into a per-BIO readCarry buffer so the next BIO_read drains it before any new window. This preserves the BIO_s_mem semantic that the SslStreamPal layer relies on.
* opensslshim.h adds the BIO_meth_* / BIO_get_data / BIO_set_data / BIO_get_new_index / BIO_clear_flags / BIO_test_flags / BIO_set_init / BIO_set_flags shim entries (all required since OpenSSL 1.1.0).
* entrypoints.c registers the new exports.
Managed side:
* Interop.Ssl.cs declares the seven new P/Invokes and switches SafeSslHandle.Create to allocate ManagedSpan BIOs instead of memory BIOs.
* Interop.OpenSsl.cs rewrites Decrypt, Encrypt and DoSslHandshake to pin caller buffers, call BioSet*Window before the SSL_* operation, and BioClearReadWindow / BioGetWriteResult / BioDrainSpill afterwards. New helpers ComputeMaxTlsOutput and DrainOutputBioSpill centralise the output-bound logic.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
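The read-carry behavior described in this commit message can be modeled in isolation. The following is a minimal sketch in plain C with no OpenSSL types; `SketchReadBio` and the `Sketch*` functions are hypothetical names chosen for illustration and are not the PR's `ManagedSpanBio` code. It assumes a single-threaded caller and shows only the ordering rule: carry bytes are served before the current window, and clearing a window moves its unread tail into the carry.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the per-BIO read state. */
typedef struct
{
    const uint8_t* readPtr; /* caller-supplied window (not owned) */
    int32_t readLen;
    int32_t readPos;
    uint8_t* carry;         /* heap-backed carry buffer (owned) */
    int32_t carryLen;
    int32_t carryPos;
    int32_t carryCap;
} SketchReadBio;

static void SketchSetReadWindow(SketchReadBio* bio, const uint8_t* ptr, int32_t len)
{
    assert(len >= 0);
    bio->readPtr = ptr;
    bio->readLen = len;
    bio->readPos = 0;
}

/* Copies up to len bytes into dst: carry first, then window. Returns bytes copied. */
static int32_t SketchRead(SketchReadBio* bio, uint8_t* dst, int32_t len)
{
    int32_t total = 0;
    int32_t fromCarry = bio->carryLen - bio->carryPos;
    if (fromCarry > len) fromCarry = len;
    if (fromCarry > 0)
    {
        memcpy(dst, bio->carry + bio->carryPos, (size_t)fromCarry);
        bio->carryPos += fromCarry;
        total += fromCarry;
    }

    int32_t fromWindow = bio->readPtr != NULL ? bio->readLen - bio->readPos : 0;
    if (fromWindow > len - total) fromWindow = len - total;
    if (fromWindow > 0)
    {
        memcpy(dst + total, bio->readPtr + bio->readPos, (size_t)fromWindow);
        bio->readPos += fromWindow;
        total += fromWindow;
    }
    return total;
}

/* Detaches the window, appending its unread tail to the carry.
   Returns 1 on success, 0 if the carry could not grow (tail is lost). */
static int SketchClearReadWindow(SketchReadBio* bio)
{
    int32_t unread = bio->readPtr != NULL ? bio->readLen - bio->readPos : 0;
    int32_t carryTail = bio->carryLen - bio->carryPos;
    int ok = 1;

    /* Compact any unread carry bytes down to offset 0 first. */
    if (carryTail > 0 && bio->carryPos > 0)
        memmove(bio->carry, bio->carry + bio->carryPos, (size_t)carryTail);
    bio->carryPos = 0;
    bio->carryLen = carryTail;

    if (unread > 0)
    {
        int32_t needed = carryTail + unread;
        if (needed > bio->carryCap)
        {
            uint8_t* grown = realloc(bio->carry, (size_t)needed);
            if (grown == NULL) ok = 0;
            else { bio->carry = grown; bio->carryCap = needed; }
        }
        if (ok)
        {
            memcpy(bio->carry + bio->carryLen, bio->readPtr + bio->readPos, (size_t)unread);
            bio->carryLen += unread;
        }
    }

    bio->readPtr = NULL;
    bio->readLen = 0;
    bio->readPos = 0;
    return ok;
}
```

In this model a caller that reads two of five window bytes, clears the window, and arms a new two-byte window would get the three leftover bytes followed by the two new ones on the next read, which is the accumulation behavior the commit message attributes to BIO_s_mem.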
a9e5bb6 to a134137
- pal_bio: fail BIO_read on lost carry bytes instead of silently dropping
- pal_bio: BIO_CTRL_RESET clears window pointers + error flag
- pal_bio: drop unused BioResetManagedSpan entry point
- DoSslHandshake/Encrypt/Decrypt: clear BIO windows in finally inside fixed
- Encrypt: snapshot pre-write Size so drained spill bytes survive a failed SSL_write instead of being reset to 0
- Encrypt: pass only the per-record upper bound to EnsureAvailableSpace (not Size + upperBound, which over-allocates by Size)
- ComputeMaxTlsOutput: use OpenSSL's SSL3_RT_MAX_ENCRYPTED_OVERHEAD (256) per record instead of the 128-byte estimate that could trigger the spill fallback for legitimate cipher suites
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
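The output-bound idea behind ComputeMaxTlsOutput can be illustrated with a small calculation. This is a hypothetical sketch, not the PR's helper: it assumes a 16 KiB maximum plaintext fragment per TLS record and takes the 256-byte per-record overhead figure from the commit message as given. The function name and constants are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: 16 KiB of plaintext per record, and the
   256-byte per-record overhead figure quoted in the commit message. */
#define SKETCH_MAX_PLAINTEXT_PER_RECORD 16384
#define SKETCH_MAX_OVERHEAD_PER_RECORD 256

/* Upper bound on ciphertext bytes produced when encrypting plaintextLength bytes. */
static int64_t SketchComputeMaxTlsOutput(int32_t plaintextLength)
{
    assert(plaintextLength >= 0);

    /* Ceiling division: number of records needed to carry the plaintext. */
    int64_t records = ((int64_t)plaintextLength + SKETCH_MAX_PLAINTEXT_PER_RECORD - 1)
                      / SKETCH_MAX_PLAINTEXT_PER_RECORD;

    /* Sketch choice: budget an empty write as one record. */
    if (records == 0) records = 1;

    return (int64_t)plaintextLength + records * SKETCH_MAX_OVERHEAD_PER_RECORD;
}
```

Under these assumptions a 16,384-byte write is budgeted at 16,640 bytes and a 16,385-byte write at 16,897 bytes, because the extra byte forces a second record. A bound that is too small does not break correctness in the design described here; it only pushes the overflow onto the spill path.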
Note
AI-generated content disclosure: this benchmark and comment were prepared with GitHub Copilot CLI.
Triggering an end-to-end SslStream round-trip benchmark to validate the perf impact of the custom BIO_METHOD. Each iteration writes a chunk on the client and drains it on the server, exercising both the encrypt path (BIO write side) and decrypt path (BIO read side). Parametrized on chunk size so we see the per-record overhead effect (small chunks) and the bulk-throughput effect (large chunks). Linux only: the change is Linux/OpenSSL-specific.
@EgorBot -linux_amd -linux_intel
using System;
using System.Net;
using System.Net.Security;
using System.Net.Sockets;
using System.Security.Authentication;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
BenchmarkSwitcher.FromAssembly(typeof(SslStreamBench).Assembly).Run(args);

[MemoryDiagnoser]
public class SslStreamBench
{
    private SslStream _client = null!;
    private SslStream _server = null!;
    private byte[] _payload = null!;
    private byte[] _readBuf = null!;

    [Params(64, 1024, 16384, 65536)]
    public int ChunkSize { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        using var rsa = RSA.Create(2048);
        var req = new CertificateRequest("CN=localhost", rsa, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
        var tmp = req.CreateSelfSigned(DateTimeOffset.UtcNow.AddDays(-1), DateTimeOffset.UtcNow.AddDays(30));
        var cert = new X509Certificate2(tmp.Export(X509ContentType.Pfx), (string?)null, X509KeyStorageFlags.Exportable);

        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        var clientSock = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        var connectTask = clientSock.ConnectAsync(IPAddress.Loopback, ((IPEndPoint)listener.LocalEndpoint).Port);
        var serverSock = listener.AcceptSocket();
        connectTask.GetAwaiter().GetResult();
        listener.Stop();
        clientSock.NoDelay = true;
        serverSock.NoDelay = true;

        _client = new SslStream(new NetworkStream(clientSock, true), false, (_, _, _, _) => true);
        _server = new SslStream(new NetworkStream(serverSock, true), false);
        var copts = new SslClientAuthenticationOptions
        {
            TargetHost = "localhost",
            EnabledSslProtocols = SslProtocols.Tls12,
        };
        var sopts = new SslServerAuthenticationOptions
        {
            ServerCertificate = cert,
            EnabledSslProtocols = SslProtocols.Tls12,
        };
        Task.WaitAll(
            _client.AuthenticateAsClientAsync(copts),
            _server.AuthenticateAsServerAsync(sopts));

        _payload = new byte[ChunkSize];
        new Random(42).NextBytes(_payload);
        _readBuf = new byte[ChunkSize];
    }

    [Benchmark]
    public async Task RoundTrip()
    {
        await _client.WriteAsync(_payload).ConfigureAwait(false);
        int total = 0;
        while (total < ChunkSize)
        {
            int r = await _server.ReadAsync(_readBuf.AsMemory(total)).ConfigureAwait(false);
            if (r == 0) break;
            total += r;
        }
    }

    [GlobalCleanup]
    public void Cleanup()
    {
        _client?.Dispose();
        _server?.Dispose();
    }
}
if (retVal != input.Length)
{
    outToken.Size = 0;
    // Drop any partial output written by the failed SSL_write but keep the drained spill bytes.
    outToken.Size = preWriteSize;
Returns the number of bytes written into the window and into the spill
buffer respectively since the last reset/window-set.
int32_t unread = ctx->readLen - ctx->readPos;
if (unread > 0 && ctx->readPtr != NULL)
{
    /* Move existing carry tail down to position 0 first. */
    int32_t carryTail = ctx->readCarryLen - ctx->readCarryPos;
Note
AI-generated content disclosure: this benchmark and comment were produced with GitHub Copilot CLI.
crank: SslStream read-write throughput + handshake
Setup
CPU columns below report
Results (single run each unless noted)
read-write: Direct decrypt adds another ~+2% on read MB/s and +10% on write MB/s on top of the custom-BIO encrypt change (the write side benefits because the now-faster read loop pulls records out of the kernel quicker and the duplex throughput re-balances). Total improvement vs origin/main is now +28.8% read / +26.1% write, with server-CPU efficiency (MB/s per server-core%) up +20.5%, i.e. the same single SSL session now drives ~20% more bytes per unit of server CPU.
handshake: Within noise of baseline, as expected: handshake-only scenarios don't exercise the application-data BIO or
Both directions are now zero-copy in steady state:
Next: I also re-ran the
Note
AI-generated content disclosure: this benchmark and comment were produced with GitHub Copilot CLI.
crank: httpclient + Kestrel over HTTPS
Follow-up to measure impact on an end-to-end HTTP path where the SSL encrypt/decrypt cost is amortized over HTTP parsing/Kestrel processing.
Setup
CPU columns below report
Results
All six runs (3 versions × 2 variants): 0 errors, 0 bad-status responses.
Interpretation
Net: bulk encrypt/decrypt is the wins are clearest in workloads where SSL CPU is the bottleneck (pure-SslStream
Note
AI-generated content disclosure: this analysis and comment were produced with GitHub Copilot CLI.
Spill-path instrumentation (validation only, now removed from the PR)
Added temporary instrumentation (env-var gated, never committed to the final PR) to the
Re-ran the same three scenarios from the previous comments on
(Server counts are the last periodic-dump snapshots before crank
Aggregate: 0 spill events across ~7.5 M
Spill-stress validation (forced 100% spill, post-direct-decrypt)
Because the spill code path is dead-code on the hot path, a second env-var (
Throughput in stress mode (read 617 / write 805 MB/s) is about 20% (read) and 8% (write) below normal mode (774 / 877 MB/s), confirming the spill memcpy itself has a real-but-bounded cost, and proving the spill path is functionally correct in the worst case.
The spill buffer remains in place as a defensive fallback for:
Neither situation was observed in the regular benchmarks, so the spill path is effectively dead code on the hot path under normal traffic, which is the desired outcome. The path is still required for correctness when those edge cases occur (KeyUpdate response during a
Instrumentation and the stress-clamp were removed from the PR after this validation step (commit
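The window-then-spill fallback discussed in this comment can be sketched in plain C. This is a hypothetical model, not the PR's `pal_bio.c` code: `SketchWriteBio`, `SketchWrite`, and `SketchDrainSpill` are illustrative names, and the rule that writes go entirely to the spill once it holds bytes is this sketch's own choice for keeping output in order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the per-BIO write state. */
typedef struct
{
    uint8_t* writePtr; /* caller-supplied window (not owned), may be NULL */
    int32_t writeCap;
    int32_t writeLen;
    uint8_t* spill;    /* heap-backed overflow buffer (owned) */
    int32_t spillLen;
    int32_t spillCap;
} SketchWriteBio;

/* Writes len bytes: as much as fits into the window, the rest into the spill.
   Returns len on success, or -1 if the spill could not grow (nothing written). */
static int32_t SketchWrite(SketchWriteBio* bio, const uint8_t* src, int32_t len)
{
    assert(len >= 0);
    int32_t room = bio->writePtr != NULL ? bio->writeCap - bio->writeLen : 0;
    if (bio->spillLen > 0) room = 0; /* keep byte order once spilling has started */
    int32_t direct = len < room ? len : room;
    int32_t remaining = len - direct;

    /* Reserve spill space first so an allocation failure leaves state untouched. */
    if (remaining > 0)
    {
        int32_t needed = bio->spillLen + remaining;
        if (needed > bio->spillCap)
        {
            uint8_t* grown = realloc(bio->spill, (size_t)needed);
            if (grown == NULL) return -1;
            bio->spill = grown;
            bio->spillCap = needed;
        }
    }

    if (direct > 0)
    {
        memcpy(bio->writePtr + bio->writeLen, src, (size_t)direct);
        bio->writeLen += direct;
    }
    if (remaining > 0)
    {
        memcpy(bio->spill + bio->spillLen, src + direct, (size_t)remaining);
        bio->spillLen += remaining;
    }
    return len;
}

/* Copies up to dstLen spilled bytes into dst and removes them from the spill. */
static int32_t SketchDrainSpill(SketchWriteBio* bio, uint8_t* dst, int32_t dstLen)
{
    int32_t n = bio->spillLen < dstLen ? bio->spillLen : dstLen;
    if (n > 0)
    {
        memcpy(dst, bio->spill, (size_t)n);
        memmove(bio->spill, bio->spill + n, (size_t)(bio->spillLen - n));
        bio->spillLen -= n;
    }
    return n;
}
```

With a correctly sized window the spill branch never runs, which matches the zero-spill observation reported for normal traffic. A write with no window armed goes entirely to the spill, which is the case the comment describes for output emitted during a read.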
Symmetric to the existing custom-BIO encrypt optimization, this change threads the caller-supplied Memory<byte> all the way through to SSL_read so that OpenSSL writes decrypted plaintext directly into the user destination, eliminating the intermediate copy from the internal encrypted buffer to the user buffer (CopyDecryptedData on the read path) for the common case where the user buffer has enough room.
Approach:
- Split Interop.OpenSsl.Decrypt into a (ReadOnlySpan<byte> input, Span<byte> output) form: the input span feeds the BIO read window for ciphertext, the output span is the SSL_read destination for plaintext. The legacy in-place call site (DecryptMessage) now passes the same span for both, preserving today's behavior.
- Add SSL_pending wrapper (CryptoNative_SslPending) so we can detect plaintext residual that OpenSSL buffered internally when the user span was smaller than a record's plaintext. The next read drains it via DecryptMessageDirect(empty input, user buffer) before any network IO.
- New SslStreamPal API (Unix only for now): DecryptMessageDirect plus IsDirectDecryptSupported. Other PALs (Windows/OSX/Android) expose IsDirectDecryptSupported=false and a throwing stub so the JIT eliminates the new branch on those platforms.
- SslStream.IO.ReadAsyncInternal: gated on SslStreamPal.IsDirectDecryptSupported, non-empty user buffer and no in-flight rehandshake, uses the new direct path. Non-OK status copies the direct-written bytes into extraBuffer so the existing Renegotiate/ContextExpired handlers keep working. The net_ssl_renegotiate_buffer guard now also checks _palHasPendingPlaintext to keep the NegotiateClientCertificateAsync_PendingDecryptedData_Throws contract.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This reverts commit 7371f2e.
The direct-decrypt optimization caused a record-layer failure (error:0A000139:SSL routines::record layer failure) under HTTP/2 with concurrency >= 2, where the client receives large response payloads via direct decrypt. HTTP/1.1 keep-alive and HTTP/1.1 connection: close paths were not affected and the original PR validation did not exercise HTTP/2 multiplexed reads.
Keeping the custom-BIO encrypt optimization (commit a134137), which remains correct under all protocols tested.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address PR review feedback by collapsing the per-operation BIO setup,
SSL_{do_handshake,read,write}, BIO result retrieval and ERR_clear_error
into a single P/Invoke per TLS operation (CryptoNative_Ssl{Handshake,
Encrypt,Decrypt}). This removes three GC suspend/resume transitions per
TLS read or write.
On the read path, the atomic SslDecrypt now takes separate input
(ciphertext) and output (user buffer) pointers. When the user buffer is
large enough to receive a full TLS record plaintext (>= 16 KB), the
decrypted bytes are written directly into the user-provided memory,
avoiding the intermediate copy from _buffer.DecryptedSpan via
CopyDecryptedData. Smaller reads continue to use the in-place path,
keeping the implementation free of partial-record/drain state.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
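The buffer-size gate described in this commit message amounts to choosing the SSL_read destination before the call. The following is a minimal sketch of that decision only, assuming a 16,384-byte maximum record plaintext; `SketchDecryptTarget` and `SketchChooseDecryptTarget` are hypothetical names and do not appear in the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: a full TLS record carries at most 16 KiB of plaintext. */
#define SKETCH_MAX_TLS_PLAINTEXT 16384

typedef struct
{
    uint8_t* ptr;  /* where decrypted bytes should be written */
    int32_t len;   /* capacity available at ptr */
    int isDirect;  /* 1 if ptr is the user buffer, 0 if it is the internal buffer */
} SketchDecryptTarget;

/* Picks the user buffer only when it can hold a whole record's plaintext;
   otherwise falls back to the internal (in-place) buffer. */
static SketchDecryptTarget SketchChooseDecryptTarget(
    uint8_t* userBuf, int32_t userLen, uint8_t* internalBuf, int32_t internalLen)
{
    SketchDecryptTarget target;
    assert(internalBuf != NULL && internalLen >= 0);

    if (userBuf != NULL && userLen >= SKETCH_MAX_TLS_PLAINTEXT)
    {
        /* Large read: a full record always fits, so no residual plaintext
           can be left behind that would need separate drain state. */
        target.ptr = userBuf;
        target.len = userLen;
        target.isDirect = 1;
    }
    else
    {
        /* Small read: decrypt into the internal buffer and copy out later. */
        target.ptr = internalBuf;
        target.len = internalLen;
        target.isDirect = 0;
    }
    return target;
}
```

The design trade-off is that small reads keep paying one copy, in exchange for never having to track plaintext that a too-small destination would leave behind.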
Note
This comment was prepared with AI assistance (GitHub Copilot CLI). Measurements were collected and validated by the assistant under my supervision.
Update: atomic SSL ops + direct-decrypt re-landed (commit 7d3fc43)
Two changes squashed into one commit on top of the prior tip:
1. Atomic native SSL operations (addressing @bartonjs's review)
Each TLS op (handshake step / encrypt / decrypt) is now a single P/Invoke. The native side does
New native entry points:
2. Direct-decrypt restored, gated on buffer size
The previous direct-decrypt attempt (reverted at
Re-landed under a simple gate: direct-decrypt now requires
Validation
Benchmark deltas (vs. main baseline
| Scenario | Baseline (main 3d73a08) | This PR (7d3fc43) | Δ vs main |
|---|---|---|---|
| read-write read MB/s | 601.1 | 798.6 | +32.9% |
| read-write write MB/s | 695.4 | 835.9 | +20.2% |
| handshake mean ms | 4.974 | 4.924 | noise |
| HTTPS GET 16 KiB, HTTP/1.1, c=32 keep-alive | 27,039 RPS | 27,745 RPS | noise |
| HTTPS GET 16 KiB, HTTP/2, c=10 | 5,048 RPS | 5,251 RPS | +4.0% |
| HTTPS GET 16 KiB, HTTP/2, c=100 | 20,477 RPS | 20,779 RPS | +1.5% |
All runs: 0 bad-status responses, 0 exceptions. HTTP/2 deltas are small because at 16 KiB responses the bottleneck is HTTP/2 framing / Kestrel work rather than the SSL memcpy path; the read-write benchmark directly stresses what this PR actually accelerates.
int32_t needed = ctx->spillLen + remaining;
if (!ManagedSpanBioGrowSpill(ctx, needed))
int32_t needed = ctx->readCarryLen + unread;
if (ManagedSpanBioGrowCarry(ctx, needed))
{
    memcpy(ctx->readCarry + ctx->readCarryLen, ctx->readPtr + ctx->readPos, (size_t)unread);
    ctx->readCarryLen += unread;
}
else
{
    /* Carry allocation failed; bytes are lost. Mark the BIO as
       permanently broken so the next BIO_read surfaces the failure
       rather than masking it as a protocol error. */
    ctx->readError = 1;
}
if (remaining > 0)
{
    int32_t needed = ctx->spillLen + remaining;
    if (!ManagedSpanBioGrowSpill(ctx, needed))
Note
This pull request was prepared with AI assistance (GitHub Copilot CLI). The code, build, and test validation were performed by the assistant under my supervision.
Summary
Replace the two BIO_s_mem instances backing each SSL handle on Linux with a custom BIO_METHOD that reads from / writes to caller-supplied managed buffer windows, and thread the user's Memory<byte> all the way through SSL_read. Together these changes eliminate both memcpys per TLS record on the Linux hot path (one on encrypt, one on decrypt).
Previous behavior
SslStream staged every TLS record through BIO_s_mem, and read returned plaintext via the internal SslStream decrypted buffer:
- BIO_write (copy 1) → BIO_s_mem storage → SSL_read reads from BIO (copy 2 inside OpenSSL record buffer) → decrypt in place into SslStream._buffer → CopyDecryptedData (copy 3) into user Memory<byte>
- SSL_write → BIO_write to BIO_s_mem (copy 1) → BIO_read from BIO to outToken (copy 2)
New behavior
- SSL_read writes plaintext directly into the user's Memory<byte> (no CopyDecryptedData)
- SSL_write → BIO_write lands directly in the managed outToken window
Two distinct optimizations land in this PR:
- BIO_METHOD (ManagedSpanBio in pal_bio.c): eliminates the BIO staging memcpy in each direction.
- DecryptMessageDirect (new Unix-only PAL API): threads the user's Memory<byte> through SSL_read so OpenSSL decrypts straight into the caller's buffer, eliminating the CopyDecryptedData memcpy on the read side. Windows/OSX/Android expose IsDirectDecryptSupported = false and a throwing stub; the JIT eliminates the branch on those platforms.
Design notes
ManagedSpanBio (pal_bio.c)
- BIO_write lands directly into the write window; once it fills, the BIO falls back to the spill buffer. BioGetWriteResult reports bytes written to the window plus a spill flag; BioDrainSpill copies the spill out afterwards.
- BIO_read drains any leftover carry first, then the window. On BioClearReadWindow any unread tail is migrated into the carry; this preserves the BIO_s_mem accumulation semantics that SslStreamPal.HandshakeInternal / DecryptMessage rely on (the SSL engine may not consume every byte handed to the BIO in one call).
- BIO_CTRL_PENDING / BIO_CTRL_RESET updated accordingly.
The output spill buffer is non-negotiable: SSL_read can also emit bytes to the output BIO (KeyUpdate response, alerts, close_notify on shutdown), so the BIO must always be able to absorb writes even when the caller didn't pre-arm a write window. Spill-stress measurements (forced 100% spill via env var) confirm the path is correct and bounded; normal-workload instrumentation shows zero spill events across ~7.5 M Encrypt calls, so the path is dead code on the hot path under normal traffic.
DecryptMessageDirect
- Interop.OpenSsl.Decrypt(input, output) taking ciphertext and plaintext as separate spans (the prior in-place call site passes the same span for both to preserve today's behavior).
- SSL_pending wrapper to detect plaintext that OpenSSL buffered internally when the user span was smaller than a record's plaintext (up to 16 KiB). SslStream tracks this via _palHasPendingPlaintext (guarded by _handshakeLock) and drains residual via a follow-up direct SSL_read with empty input before the next network IO.
- SslStreamPal.IsDirectDecryptSupported && !buffer.IsEmpty && _handshakeWaiter == null. Non-OK SSL statuses copy any direct-written bytes into extraBuffer so the existing renegotiate/ContextExpired flow keeps working.
- NegotiateClientCertificateAsync _buffer.ActiveLength > 0 precondition extended to also check _palHasPendingPlaintext so the PendingDecryptedData_Throws contract is preserved when the residual lives inside OpenSSL rather than _buffer.
Concurrency
DecryptData / DecryptDataDirect / EncryptData all run under _handshakeLock in SslStream, so the BIO state machine and _palHasPendingPlaintext see a single in-flight SSL_* call at a time. Buffer pointers are stashed only for the duration of the SSL_read / SSL_write call and cleared before the fixed block ends.
Compatibility
All shim entries used (BIO_meth_new, BIO_meth_set_*, BIO_set_data, BIO_get_data, BIO_get_new_index, BIO_set_init, BIO_set_flags / BIO_clear_flags / BIO_test_flags, SSL_pending) are available since OpenSSL 1.1.0, which is our minimum.
Validation
- libs.native + libs clean (0 warnings, 0 errors).
- System.Net.Security.Tests functional: 4933 pass, 0 fail, 8 skip (matches baseline, all expected platform/OS skips).
- aspnet-gold-lin (TLS 1.3, baseline = 3d73a08f1ba, current tip = 7371f2e2a1b):
  - read-write: +28.8% read MB/s (601 → 774), +26.1% write MB/s (695 → 877), +20.5% throughput-per-server-CPU%.
  - handshake: flat (within run noise).
See follow-up comments for full benchmark tables (including CPU usage) and instrumentation results.