Enhance performance with benchmarking and credit management improvements #45
Merged
- Introduced a new script `run-benchmarks.sh` to automate the benchmarking process for Go and .NET implementations.
- Added JSON results for Go benchmarks, comparing Raw TCP, FRP/Yamux, and Smux across various scenarios and data sizes.
- The script builds Docker images for both Go and .NET benchmarks, executes them, and generates a combined report.
- Implemented throughput and game-tick benchmarks in Program.cs.
- Added results for various scenarios in dotnet-results.json.
- Included Go benchmark results file for comparison.
- Introduced pending grant credits in ReadChannel to accumulate credits when in batched flush mode.
- Modified SendCreditGrantAsync to handle immediate and batched flush modes appropriately.
- Implemented SignalFlush method to signal when to flush pending credits.
- Updated WritePendingCreditGrantsAsync to drain and send accumulated credits under a single lock.
- Improved overall flow control and efficiency in credit management during streaming operations.
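The accumulate-then-drain pattern described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `PendingCreditsSketch`, `Accumulate`, and `Drain` are hypothetical names, and the sketch uses `Interlocked` operations where the commit describes draining under a lock.

```csharp
using System.Threading;

// Hypothetical type for illustration; the PR's ReadChannel is not shown here.
public sealed class PendingCreditsSketch
{
    private long _pending;

    // Reader side: in batched flush mode, add consumed bytes to the pending
    // total instead of sending one credit grant per read.
    public void Accumulate(int credits) => Interlocked.Add(ref _pending, credits);

    // Flusher side: atomically take everything accumulated so far and reset
    // the counter, so the total can be sent as a single credit grant.
    public long Drain() => Interlocked.Exchange(ref _pending, 0);
}
```

A flusher would call `Drain()` once per flush and send a grant frame only when the returned value is non-zero.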
…riting
- Updated StreamMultiplexer to utilize PipeWriter for handling writes, reducing memory allocations and improving performance.
- Introduced PipeWriter in place of direct Stream writes, allowing for more efficient buffering and flushing.
- Adjusted methods to accommodate the new PipeWriter, ensuring proper handling of flush modes and credit grants.
- Updated go-results.json with new benchmark results reflecting performance improvements.
- Added System.IO.Pipelines package reference to NetConduit project.
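For readers unfamiliar with `System.IO.Pipelines`, the following sketch shows the general shape of buffering a frame into a `PipeWriter` and flushing separately. The 4-byte big-endian length prefix and the names `PipeWriterFrameSketch`, `BufferFrame`, and `WriteFrameAsync` are assumptions for illustration; the PR's actual frame layout and method names are not shown in this conversation.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.IO.Pipelines;
using System.Threading.Tasks;

// Hypothetical helper for illustration only.
public static class PipeWriterFrameSketch
{
    // Copies an assumed 4-byte length prefix plus the payload into the
    // writer's internal buffer. Nothing reaches the underlying stream yet.
    public static void BufferFrame(PipeWriter writer, ReadOnlySpan<byte> payload)
    {
        Span<byte> header = writer.GetSpan(4);
        BinaryPrimitives.WriteInt32BigEndian(header, payload.Length);
        writer.Advance(4);
        writer.Write(payload); // IBufferWriter<byte> extension from System.Buffers
    }

    // Buffers one frame and optionally flushes, so that several small frames
    // can share a single write to the underlying stream.
    public static async ValueTask WriteFrameAsync(
        PipeWriter writer, ReadOnlyMemory<byte> payload, bool flushNow)
    {
        BufferFrame(writer, payload.Span);
        if (flushNow)
        {
            await writer.FlushAsync();
        }
    }
}
```

A writer over an existing stream can be obtained with `PipeWriter.Create(stream)`; passing `flushNow: false` for all but the last frame in a burst is one way to batch writes.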
- Introduced PROBLEMS.md to document performance issues identified through HotPathProfiler, BottleneckAnalysis, and DeepProfile tools, including detailed evidence and root causes for each problem.
- Created TRY_AND_TRY_UNTIL.md to outline a structured approach for improving bulk throughput while maintaining game-tick performance, including steps for establishing a baseline, executing plans, and evaluating results.
- Established INDEX.md for tracking failed improvement plans, detailing reasons for failure and insights gained from each attempt.
- Documented individual failed plans (PLAN_IMPROVEMENT_001.md to PLAN_IMPROVEMENT_010.md) with specific changes attempted, metrics evaluated, and root causes for failure, providing a comprehensive overview of the optimization efforts.
- Updated Go benchmark results with improved metrics across various implementations.
- Modified `run-benchmarks.sh` to enable local NetConduit usage during .NET builds.
- Enhanced documentation for benchmarks to reflect updated results and methodologies.
- Added new plans for addressing bulk throughput issues, including larger read buffers and immediate flush for large frames.
- Refactored `StreamMultiplexer` and `WriteChannel` to optimize flushing behavior for large data frames, reducing latency and improving throughput.
- Removed outdated benchmarking results for Go implementations from `go-results.json`.
- Updated `go-bench` binary with new results.
- Enhanced flush strategies in NetConduit:
  - Introduced accumulation-based flush thresholds to reduce TCP write overhead.
  - Implemented immediate credit grant flush on the reader side to minimize latency.
  - Increased read PipeReader buffer size to optimize syscall performance.
  - Adapted flush behavior based on contention to improve multi-channel throughput.
- Documented failed and succeeded plans for future reference and analysis.
- Removed failed plans PLAN_IMPROVEMENT_016 and PLAN_IMPROVEMENT_017.
- Increased PipeReader buffer size from 16KB to 1MB in StreamMultiplexer to reduce syscall overhead for large frames.
- Introduced accumulation-based flush mechanism in WriteChannel to minimize ForceFlush overhead, allowing batching of writes until a threshold of 256KB is reached.
- Updated success criteria for both improvements to reflect expected performance gains in throughput and message rates.
- Added benchmark results for game-tick and throughput scenarios for both .NET and Go implementations.
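The accumulation-based flush can be pictured as a byte counter that triggers a flush only once enough unflushed data has built up. The sketch below assumes the 256KB threshold named in the commit; the type `AccumulationFlushSketch` and its `OnWrite` method are hypothetical, and the sketch is deliberately not thread-safe.

```csharp
// Hypothetical type for illustration; not the PR's WriteChannel.
public sealed class AccumulationFlushSketch
{
    private readonly int _thresholdBytes;
    private int _unflushedBytes;

    // 256KB default mirrors the threshold named in the commit message.
    public AccumulationFlushSketch(int thresholdBytes = 256 * 1024)
    {
        _thresholdBytes = thresholdBytes;
    }

    // Records a buffered write. Returns true when the caller should flush
    // now; the counter is reset on the assumption that the flush happens.
    public bool OnWrite(int frameBytes)
    {
        _unflushedBytes += frameBytes;
        if (_unflushedBytes < _thresholdBytes)
        {
            return false;
        }

        _unflushedBytes = 0;
        return true;
    }
}
```

In a real multiplexer a timer or flush loop would still need to flush any remainder below the threshold, otherwise small trailing writes could sit in the buffer indefinitely.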
…ted plans
- Added direct delivery mechanism in ReadChannel to bypass per-channel Pipe when user is waiting for data, reducing memory copies and improving performance.
- Updated INDEX.md files for failed and succeeded plans to reflect new plans (019-022) and their analysis.
- Created detailed improvement plans for aggressive credit grant threshold, direct-to-stream write for large frames, reverting accumulation threshold, and increasing accumulation flush threshold.
- Added comprehensive unit tests for direct delivery functionality to ensure correctness and performance.
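The direct delivery idea, handing data straight to a reader that is already waiting and buffering only otherwise, might look roughly like this. It is a simplified sketch under several assumptions: a single reader at a time, whole frames passed as `byte[]`, and a plain queue standing in for the per-channel Pipe. `DirectDeliverySketch` and its members are hypothetical names.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical type for illustration; assumes at most one pending reader.
public sealed class DirectDeliverySketch
{
    private readonly object _gate = new();
    private readonly Queue<byte[]> _buffered = new(); // stand-in for the per-channel Pipe
    private TaskCompletionSource<byte[]>? _waiter;

    // Reader side: take buffered data if present, otherwise register as waiting.
    public Task<byte[]> ReadAsync()
    {
        lock (_gate)
        {
            if (_buffered.Count > 0)
            {
                return Task.FromResult(_buffered.Dequeue());
            }

            _waiter = new TaskCompletionSource<byte[]>(
                TaskCreationOptions.RunContinuationsAsynchronously);
            return _waiter.Task;
        }
    }

    // Receive-loop side: returns true if the frame went directly to a waiting
    // reader, false if it had to be buffered.
    public bool Deliver(byte[] frame)
    {
        TaskCompletionSource<byte[]>? waiter;
        lock (_gate)
        {
            waiter = _waiter;
            _waiter = null;
            if (waiter is null)
            {
                _buffered.Enqueue(frame);
                return false;
            }
        }

        // Complete outside the lock so reader continuations never run under it.
        waiter.SetResult(frame);
        return true;
    }
}
```

The sketch skips the intermediate buffer only when a reader is already parked, which matches the condition the commit describes ("when user is waiting for data").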
- Updated benchmark results for Go implementations in go-gametick.json and go-throughput.json, reflecting changes in performance metrics.
- Added new plans (027 and 028) to optimize FlushInterval and remove frame-size guard from ForceFlush to reduce flush latency.
- Documented failed plans (024-026) with insights on FlushLoop signaling and ForceFlush strategies.
- Enhanced FlushLoop behavior by adjusting thresholds for signaling and flushing, aiming for improved throughput and reduced latency in multi-channel scenarios.
…sageTransit functionalities
- Implement tests for read channel disposal and coordination in DisconnectionTests.
- Enhance ReconnectionTests with sync state ring buffer behavior validations.
- Introduce DeltaTransit deserialization tests to validate error handling and correct parsing.
- Add comprehensive send/receive tests for DeltaTransit to ensure identical states are delivered correctly.
- Create MessageTransit tests to verify message framing and duplex stream behavior.
…into feat/harden
Add HighMemory collection to serialize ExtremeTests, BenchmarkTests, PerformanceTests, ChaosRobustnessTests, DataIntegrityStressTests, ChaosTargetedTests, and MemoryPressureTests. These tests allocate significant memory (up to 20K channels, 100MB data transfers) and running them in parallel exceeds the 7GB GitHub runner limit.
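Assuming the test project uses xUnit (the term "collection" suggests it, but the commit does not say so), a serialized collection is typically declared along these lines. `HighMemoryCollection` and the placeholder test body are hypothetical; only the collection name `HighMemory` and the test class name come from the commit message.

```csharp
using Xunit;

// Declares the collection. Tests within one xUnit collection never run in
// parallel with each other; DisableParallelization = true additionally keeps
// the collection from running alongside other collections.
[CollectionDefinition("HighMemory", DisableParallelization = true)]
public sealed class HighMemoryCollection
{
}

// Each memory-heavy test class opts in by naming the collection.
[Collection("HighMemory")]
public class ExtremeTests
{
    [Fact]
    public void Placeholder()
    {
        // Hypothetical test body; the real tests are not shown in this PR text.
    }
}
```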
This pull request introduces a comprehensive performance hardening plan for the internal write pipeline, along with significant improvements to the benchmarking infrastructure. The main focus is on documenting a step-by-step optimization roadmap, upgrading to .NET 10, and establishing robust benchmarking environments for both .NET and Go implementations. No public API or feature changes are made; all enhancements are internal and aim to improve throughput and message rates.
Performance Hardening Plan & Documentation:
- `PLAN_HARDEN.md`, outlining completed optimizations, benchmarking results, identified bottlenecks, and a multi-step roadmap for further performance improvements, including gather-write, batched credit grants, buffer reuse, and a future PipeWriter transport. The document also summarizes lessons learned from previous experiments.
Benchmarking Infrastructure Enhancements:
- `Dockerfile.netconduit`, for building and running .NET 10-based comparison benchmarks, including multi-stage build and runtime images.
- `Dockerfile.go` and a supporting Go module, for cross-language performance comparisons with Yamux and Smux. [1] [2]
- `NetConduit.ComparisonBench.csproj`, for internal benchmarking and diagnostics.
- `net10.0`, for access to the latest runtime features and performance improvements.
Diagnostic and Analysis Tools:
- `Diagnostic.cs` utility for fine-grained measurement of time spent in the write path, aiding in identifying and quantifying bottlenecks in single-channel throughput scenarios.