acl-dev · loongs-zhang · Feb 24, 2025 · Feb 24, 2025
diff --git a/README.md b/README.md
@@ -185,6 +185,9 @@ fn main() {
 
 ## ⚓ Learn More
 
+- [Project Overview](core/docs/en/overview.md)
+- [Background](docs/en/background.md)
+- [Why Rust](docs/en/why-rust.md)
 - [Coroutine Overview](core/docs/en/coroutine.md)
 - [Scalable Stack Overview](core/docs/en/scalable-stack.md)
 - [Monitor Overview](core/docs/en/monitor.md)

diff --git a/core/docs/en/overview.md b/core/docs/en/overview.md
@@ -17,10 +17,8 @@ author: loongs-zhang
 The `open-coroutine` is a simple, efficient and generic stackfull-coroutine library, you can use this as a performance
 replacement for IO thread pools, see [why better](../en/why-better.md).
 
-[//]: # (todo 增加英文版本的文档)
-
-- [Background](../../../docs/cn/background.md)
-- [Why Rust](../../../docs/cn/why-rust.md)
+- [Background](../../../docs/en/background.md)
+- [Why Rust](../../../docs/en/why-rust.md)
 - [Why Better](../en/why-better.md)
 - [Quick Start](../../../README.md)
 - [Coroutine Overview](../en/coroutine.md)

diff --git a/docs/cn/background.md b/docs/cn/background.md
@@ -6,6 +6,8 @@ author: loongs-zhang
 
 # 诞生之因
 
+[English](../en/background.md) | 中文
+
 ## 待调优的线程池
 
 在早期程序员为了支持多个用户并发访问服务应用，往往采用多进程方式，即针对每一个TCP网络连接创建一个服务进程。在2000年左右，比较流行使用CGI方式编写Web服务，当时人们用的比较多的Web服务器是基于多进程模式开发的Apache
@@ -54,7 +56,7 @@ PS：假设单线程，CPU时间片为1s，有100个任务，公平调度指每
 
 协程技术哪家强，编程语言找golang。然而随着更深入的学习，我发现几个`goroutine`的不足：
 
-1. `不是严格的thread-per-core`。goroutine运行时也是由线程池来支撑的，而这个线程池的最大线程为256，这个数字可比thread-per-core的线程数大得多；
+1. `不是thread-per-core`。goroutine运行时也是由线程池来支撑的，而这个线程池的最大线程为256，这个数字一般比thread-per-core的线程数大得多，且调度线程未绑定到CPU；
 2. `抢占调度会打断正在运行的系统调用`。如果这个系统调用需要很长时间才能完成，显然会被打断多次，整体性能反而降低；
 3. `goroutine离极限性能有明显差距`。对比隔壁c/c++协程库，其性能甚至能到goroutine的1.5倍；
 

diff --git a/docs/cn/why-rust.md b/docs/cn/why-rust.md
@@ -6,6 +6,8 @@ author: loongs-zhang
 
 # 语言选择
 
+[English](../en/why-rust.md) | 中文
+
 开发open-coroutine用什么语言呢？这是一个很重要的问题，毕竟不同的语言有不同的特性，选择不同的语言会对最终的结果产生很大的影响。
 
 之前研究c协程库时，有看到大佬已经尝试过用c写动态链接库、然后java通过jni去调这种方式，最终失败了，具体原因得深入JVM源码才能得知，对鄙人来说太高深，告辞，因此排除java/kotlin等JVM字节码语言。
@@ -16,8 +18,8 @@ author: loongs-zhang
 
 从研究过的好几个用c写的协程库来看，c的表达力差了点，需要编写巨量代码。相较之下，c++表达力就强多了，但开发的效率还是低了些，主要体现在以下几个方面：
 
-1. `需要不停地写cmake`，告诉系统怎么编译它，有些麻烦，而这其实是不应该操太多心的部分；
-2. `依赖管理麻烦`。如果要用别人写的类库，把代码拉下来，放到自己项目里，然后需要耗费大量时间来通过编译。如果别人依赖的库没有其他依赖还好，一旦有其他依赖，那么它依赖的依赖，也得按照刚才说的步骤处理，这就十分麻烦了；
+1. `必须写cmake`。纯粹为了告诉系统怎么编译，有些麻烦，而这其实是不应该操心的部分；
+2. `依赖管理麻烦`。如果要用别人写的类库，需要把代码拉下来，放到自己项目里，然后不得不耗费大量时间来通过编译。如果别人的库没有其他依赖还好，一旦有其他依赖，那么它依赖的依赖，也得按照刚才说的步骤处理，这就十分麻烦了；
 3. `内存不安全`。c++很难写出没有内存泄漏/崩溃的代码。
 
 <div style="text-align: center;">

diff --git a/docs/en/background.md b/docs/en/background.md
@@ -0,0 +1,95 @@
+---
+title: Reason for Birth
+date: 2025-02-24 17:08:33
+author: loongs-zhang
+---
+
+# Reason for Birth
+
+English | [中文](../cn/background.md)
+
+## The thread pool needs to be optimized
+
+In the early days, developers often used a multi-process model to serve multiple concurrent users, creating one
+service process for each TCP connection. Around 2000, it was popular to write web services with CGI, and the most
+widely used web server at the time was the Apache 1.3.x series, which was built on the multi-process model. Because
+processes consume more system resources than threads, people moved to multithreading (usually with thread pools) to
+develop web services. This increased the number of concurrent users a single server could support, but resources
+were still wasted.
+
+In 2020, I joined company V. Because the thread pools in our internal systems were occasionally exhausted, and because
+our team lead had
+read [《Java线程池实现原理及其在美团业务中的实践》](https://tech.meituan.com/2020/04/02/java-pooling-pratice-in-meituan.html),
+we decided to build our own dynamic thread pool. Judging by how it went, the results were good:
+
+<div style="text-align: center;">
+    <img src="/docs/img/begin.jpg" width="50%">
+</div>
+
+But this doesn't fundamentally solve the problem. As is well known, thread context switching has a cost, and the more
+threads there are, the greater that cost. For CPU-intensive tasks, making the number of threads equal to the number of
+CPU cores and binding each thread to a specific core (hereinafter referred to as `thread-per-core`) ensures optimal
+performance. For IO-intensive tasks, since the tasks almost always block their threads, the cost of thread context
+switching is generally lower than the cost of blocking. However, when there are too many threads, the cost of context
+switching exceeds the cost of blocking.
+
+The essence of a dynamic thread pool is to adjust the number of threads so that the cost of thread context switching
+stays below the cost of blocking. Since this adjustment is manual, the result cannot be guaranteed.
+
+<div style="text-align: center;">
+    <img src="/docs/img/run.jpg" width="50%">
+</div>
+
+## The pain of using NIO
+
+Is there a technology that can run IO-intensive tasks with performance comparable to multithreading while preserving
+thread-per-core? The answer is `NIO`, but it still has some limitations and unfriendly aspects:
+
+1. The NIO API is more complex to use than the BIO API;
+2. System calls such as sleep still block threads. Achieving optimal performance effectively means banning all
+   blocking calls, which is unfriendly to developers;
+3. In thread pool mode, a single thread can only start the next task after the current task has completed, so fair
+   scheduling between tasks cannot be achieved.
+
+Note: assuming a single thread, a CPU time slice of 1 second, and 100 tasks, fair scheduling means that each task gets
+an equal 10 ms share of the time slice.
+
+The first point can be overcome, but the second and third are serious weaknesses. In fact, if the third point could be
+solved, RPC frameworks would not need many threads; thread-per-core would be enough.
+
+How can we keep things easy for developers while ensuring that IO-intensive tasks perform no worse than with
+multithreading, and still keep thread-per-core? `Coroutine` technology slowly entered my field of vision.
+
+## Goroutine still has shortcomings
+
+When I first started playing with coroutines, I chose `kotlin` because of its low learning cost. However, when I
+realized that kotlin's coroutines require changing APIs (such as replacing Thread.sleep with kotlinx.coroutines.delay)
+to avoid blocking threads, I decisively switched to `golang`. About 2 weeks later:
+
+<div style="text-align: center;">
+    <img src="/docs/img/good.jpeg" width="50%">
+</div>
+
+When it comes to coroutines, Golang is the go-to programming language. However, as I studied it more deeply, I
+discovered several shortcomings of goroutines:
+
+1. `Not thread-per-core`. The goroutine runtime is also backed by a thread pool, and the maximum number of threads in
+   this pool is 256, which is generally much larger than the number of threads under thread-per-core. In addition, the
+   scheduling threads are not bound to CPUs;
+2. `Preemptive scheduling interrupts running system calls`. If a system call takes a long time to complete, it will
+   be interrupted multiple times, which reduces overall performance;
+3. `Goroutines fall well short of peak performance`. By comparison, C/C++ coroutine libraries can reach as much as
+   1.5 times the performance of goroutines.
+
+With regret, I went on to study C/C++ coroutine libraries and found that they either only implemented `hook` (in
+simple terms, hooking means proxying system calls such as sleep: without the hook, the operating system's sleep
+function is called; with the hook, the call is redirected to our own code. For detailed steps, please refer to
+Chapters 41 and 42 of The Linux Programming Interface) or only implemented `work-stealing`. Some libraries provided
+only the most basic `coroutine abstraction`, and most disappointing of all, none of them implemented
+`preemptive scheduling`.
+
+There's no other way; it seems we have to do it ourselves.
+
+<div style="text-align: center;">
+    <img src="/docs/img/just_do_it.jpg" width="100%">
+</div>
diff --git a/docs/en/why-rust.md b/docs/en/why-rust.md
@@ -0,0 +1,38 @@
+---
+title: Language Selection
+date: 2025-02-24 17:37:10
+author: loongs-zhang
+---
+
+# Language Selection
+
+English | [中文](../cn/why-rust.md)
+
+Which language should be used to develop open-coroutine? This is a very important question, as different languages have
+different features, and the choice of language can have a significant impact on the final outcome.
+
+When I was researching C coroutine libraries earlier, I saw that some experts had already tried writing dynamic link
+libraries in C and calling them from Java through JNI, but ultimately failed. Finding the specific reason would mean
+digging into the JVM source code, which is beyond me, so JVM bytecode languages such as Java/Kotlin are excluded.
+
+Obviously, reimplementing goroutines in Golang is no less complex than delving into the JVM source code, and even if it
+were finished, no one would be willing to use it in a production environment, so Golang is excluded.
+
+Now, three candidates remain: C, C++, and Rust.
+
+Judging from the several coroutine libraries written in C that I have studied, C is somewhat lacking in expressiveness
+and requires writing a huge amount of code. In comparison, C++ is much more expressive, but its development efficiency
+is still low, mainly in the following respects:
+
+1. `Have to write cmake`. It exists purely to tell the system how to compile the code, which is a bit troublesome, and
+   this is a part developers shouldn't have to worry about;
+2. `Dependency management is painful`. To use a library written by someone else, you need to pull down the code, put
+   it into your own project, and then spend a lot of time getting it to compile. If the library has no other
+   dependencies, this is manageable. Once it does, its dependencies must also be handled by the same steps, which is
+   very troublesome;
+3. `Memory is unsafe`. It's difficult to write C++ code that is free of memory leaks and crashes.
+
+<div style="text-align: center;">
+    <img src="/docs/img/what_else_can_I_say.jpg" width="50%">
+    <img src="/docs/img/rust.jpeg" width="100%">
+</div>