condy-cpp · wokron · Feb 1, 2026 · Feb 1, 2026
diff --git a/docs/bench.md b/docs/bench.md
@@ -5,86 +5,78 @@
 The benchmark source code can be found at [condy-bench](https://github.com/wokron/condy-bench).
 
 Test Environment:
-- CPU: AMD Ryzen 9 7945HX with Radeon Graphics × 16
-- Storage: SK Hynix HFS001TEJ9X115N (NVMe SSD, 1TB, PCIe 4.0 x4)
-- Compiler: clang version 18.1.3
-- OS: Linux Mint 22 Cinnamon
-- Kernel: 6.8.0-90-generic
+- **CPU**: AMD Ryzen 9 7945HX with Radeon Graphics × 16
+- **Storage**: SK Hynix HFS001TEJ9X115N (NVMe SSD, 1TB, PCIe 4.0 x4)
+- **Compiler**: clang version 18.1.3
+- **OS**: Linux Mint 22 Cinnamon
+- **Kernel**: 6.8.0-90-generic
 
 ## Sequential File Read
 
-We tested the throughput of Condy reading a 2GB file sequentially with different block sizes and task counts, and compared the results with Asio, Aio, and synchronous interfaces.
+We measured Condy’s throughput for 64KB sequential reads on an 8GB file and compared it with baseline implementations built on libaio and liburing.
 
-### Varying Block Size
+As shown in the figure, read throughput rises as the queue depth (the number of concurrent tasks) increases. Without Direct IO, Condy’s throughput stays below 3500MB/s. Registering files and buffers (marked as Fixed in the figure) improves Condy’s performance somewhat. With Direct IO enabled, throughput increases significantly. At a queue depth of 4, Condy with Direct IO is only slightly ahead of libaio, but the gap widens as the queue depth grows, and throughput saturates at about 6500MB/s at a queue depth of 64. Adding IO Polling on top of Direct IO improves throughput at low queue depths, but at higher queue depths throughput is actually lower than with Direct IO alone.
 
-We fixed the number of concurrent tasks to 16 and gradually increased the block size for each read. The results are shown below. Both axes use logarithmic scales.
+In addition, Condy’s performance is roughly the same as the baseline program implemented with liburing under the same configuration.
 
-![](file_read_block_size.png)
-
-As shown in the figure, when the block size is less than 16KB, regular Condy reading performs best. When the block size exceeds 16KB, Condy Direct IO mode performs best. As the block size increases, the throughput of Condy Direct IO and Aio gradually increases, reaching saturation (~6700 MB/s) at a block size of 256KB. However, before this point, the throughput growth of Aio is slower than that of Condy Direct IO. Using Fixed Fd & Buffer, Condy achieves better performance, but the improvement is not as significant as with Direct IO.
-
-The performance of synchronous reading is close to that of regular Condy at 4KB block size. However, as the block size increases, the throughput of synchronous reading does not change significantly, thus lagging far behind asynchronous methods. Synchronous Direct IO shows a similar growth pattern to Condy Direct IO: when the block size is small, its throughput is inferior to regular synchronous reading, but as the block size increases, its performance gradually surpasses the regular method.
-
-Asio performs the worst, with results inferior to synchronous Direct IO. It is unclear whether this is due to the use of `asio::random_access_file` instead of `asio::stream_file`. However, `asio::stream_file` cannot achieve concurrent read/write, which means it cannot provide file IO throughput by increasing queue depth.
-
-### Varying Number of Tasks
-
-We fixed the block size to 64KB and gradually increased the number of concurrent read tasks. The results are shown below. Both axes use logarithmic scales.
-
-![](file_read_num_tasks.png)
-
-When the concurrency is 4, regular Condy performance is slightly inferior to synchronous reading. But as the concurrency increases, Condy quickly surpasses synchronous reading. Fixed Fd & Buffer brings some performance improvement, but still less than that brought by Direct IO. At low concurrency, Condy Direct IO is slightly inferior to Aio, but as concurrency increases to 32, Condy Direct IO reaches saturation before Aio. Since 64KB is too small for synchronous Direct IO, its performance is worse than regular synchronous in the figure. Asio still performs the worst.
+<div align="center">
+  <img src="file_read_queue_depth.png" width="60%">
+</div>
 
 ## Random File Read
 
-We tested the throughput of Condy reading a 2GB file randomly with different block sizes and task counts, and compared the results with Asio, Aio, and synchronous interfaces.
-
-### Varying Block Size
-
-We fixed the number of concurrent tasks to 16 and gradually increased the block size for each read. The results are shown below. Both axes use logarithmic scales.
-
-![](file_random_read_block_size.png)
-
-Unlike sequential reading, Direct IO and regular Condy IO have similar performance at 4KB block size. As the block size increases, the throughput of Direct IO grows much faster than that of regular Condy IO. Aio and Condy Direct IO perform similarly at all block sizes. Using Fixed Fd & Buffer does not significantly change Condy's throughput. Synchronous read performance is much weaker than asynchronous methods. Asio performs worse than synchronous methods.
+We measured Condy’s IOPS for 4KB random reads on an 8GB file and compared it with baseline implementations built on libaio and liburing.
 
-### Varying Number of Tasks
+As shown in the figure, read IOPS rises as the queue depth (the number of concurrent tasks) increases. At small queue depths (e.g., <=4), non-Direct IO actually outperforms both libaio and Condy with Direct IO. Registering files and buffers on top of non-Direct IO gives a slight further improvement, though the gain is smaller than in sequential reads. Combining Direct IO with IO Polling gives Condy its best performance at small queue depths (<=8 in the figure).
 
-We fixed the block size to 64KB and gradually increased the number of concurrent read tasks. The results are shown below. Both axes use logarithmic scales.
+At larger queue depths, Direct IO performs better. At a queue depth of 16, libaio achieves the highest IOPS, but that is also where it saturates. Condy, however, continues to gain IOPS as the queue depth increases further. In this range, plain Condy Direct IO performs best, while Direct IO with IO Polling is slightly worse but still ahead of libaio.
 
-![](file_random_read_num_tasks.png)
+Similarly, Condy’s performance is roughly the same as the baseline program implemented with liburing under the same configuration.
 
-The trend of throughput change with increasing number of tasks is similar to that of changing block size. At low concurrency, Direct IO is slightly weaker than regular reading. But as the number of tasks increases, Direct IO shows a more significant performance improvement.
+<div align="center">
+  <img src="file_random_read_queue_depth.png" width="60%">
+</div>
 
 ## Echo Server
 
 We tested the throughput of a TCP echo-server implemented with Condy and other methods as the number of connections increases on a single machine. Since the test is conducted locally, it does not fully reflect the performance of a real network card. However, it still allows us to observe the basic overhead brought by different frameworks on network IO.
 
-![](echo_server_num_connections.png)
+<div align="center">
+  <img src="echo_server_num_connections.png" width="60%">
+</div>
 
 As the number of connections increases, the throughput first rises and then falls. The later decline is mainly due to contention caused by an increased number of client threads. As shown in the figure, Condy outperforms Asio and Epoll. With file registration, Condy can achieve a small additional performance gain.
 
 ## Channel
 
 By varying the number of messages sent and the number of Channels, and measuring the total time taken, we compared the performance of Condy and Asio Channels.
 
-![](channel_number_of_messages.png)
+<div align="center">
+  <img src="channel_number_of_messages.png" width="60%">
+</div>
 
-![](channel_task_pairs.png)
+<div align="center">
+  <img src="channel_task_pairs.png" width="60%">
+</div>
 
 As shown in the figures, as the number of messages and concurrent tasks increases, the total time for both Condy and Asio increases linearly. In terms of execution time, Condy achieves a **20x** performance improvement over Asio.
 
 ## Coroutine Spawn
 
 By varying the number of coroutines created and measuring the total time taken, we compared the efficiency of Condy and Asio in coroutine creation.
 
-![](spawn_benchmark.png)
+<div align="center">
+  <img src="spawn_number_of_tasks.png" width="60%">
+</div>
 
 As shown in the figure, as the number of coroutines increases, the total time for both Condy and Asio increases linearly. In terms of execution time, Condy achieves a **5x** performance improvement over Asio.
 
 ## Coroutine Switch
 
 By repeatedly switching coroutines and measuring the total time taken, we compared the efficiency of Condy and Asio in coroutine switching.
 
-![](post_switch_times.png)
+<div align="center">
+  <img src="post_switch_times.png" width="60%">
+</div>
 
 As shown in the figure, as the number of switches increases, the total time for both Condy and Asio increases linearly. In terms of execution time, Condy achieves a **15x** performance improvement over Asio.
diff --git a/docs/imgs/channel_number_of_messages.png b/docs/imgs/channel_number_of_messages.png
diff --git a/docs/imgs/channel_task_pairs.png b/docs/imgs/channel_task_pairs.png
diff --git a/docs/imgs/echo_server_num_connections.png b/docs/imgs/echo_server_num_connections.png
diff --git a/docs/imgs/file_random_read_block_size.png b/docs/imgs/file_random_read_block_size.png
diff --git a/docs/imgs/file_random_read_num_tasks.png b/docs/imgs/file_random_read_num_tasks.png
diff --git a/docs/imgs/file_random_read_queue_depth.png b/docs/imgs/file_random_read_queue_depth.png
diff --git a/docs/imgs/file_read_block_size.png b/docs/imgs/file_read_block_size.png
diff --git a/docs/imgs/file_read_num_tasks.png b/docs/imgs/file_read_num_tasks.png
diff --git a/docs/imgs/file_read_queue_depth.png b/docs/imgs/file_read_queue_depth.png
diff --git a/docs/imgs/post_switch_times.png b/docs/imgs/post_switch_times.png
diff --git a/docs/imgs/spawn_benchmark.png b/docs/imgs/spawn_benchmark.png
diff --git a/docs/imgs/spawn_number_of_tasks.png b/docs/imgs/spawn_number_of_tasks.png