From d67c6f0b5049512bbf6c3646aa2e56c1fca66a22 Mon Sep 17 00:00:00 2001 From: dragon-zhang Date: Mon, 24 Feb 2025 21:16:41 +0800 Subject: [PATCH] Translate some documents --- README.md | 3 ++ core/docs/en/overview.md | 6 +-- docs/cn/background.md | 4 +- docs/cn/why-rust.md | 6 ++- docs/en/background.md | 95 ++++++++++++++++++++++++++++++++++++++++ docs/en/why-rust.md | 38 ++++++++++++++++ 6 files changed, 145 insertions(+), 7 deletions(-) create mode 100644 docs/en/background.md create mode 100644 docs/en/why-rust.md diff --git a/README.md b/README.md index 7061ffac..b95b34e2 100644 --- a/README.md +++ b/README.md @@ -185,6 +185,9 @@ fn main() { ## ⚓ Learn More +- [Project Overview](core/docs/en/overview.md) +- [Background](docs/en/background.md) +- [Why Rust](docs/en/why-rust.md) - [Coroutine Overview](core/docs/en/coroutine.md) - [Scalable Stack Overview](core/docs/en/scalable-stack.md) - [Monitor Overview](core/docs/en/monitor.md) diff --git a/core/docs/en/overview.md b/core/docs/en/overview.md index 492f639b..7bfb2daf 100644 --- a/core/docs/en/overview.md +++ b/core/docs/en/overview.md @@ -17,10 +17,8 @@ author: loongs-zhang The `open-coroutine` is a simple, efficient and generic stackfull-coroutine library, you can use this as a performance replacement for IO thread pools, see [why better](../en/why-better.md). -[//]: # (todo 增加英文版本的文档) - -- [Background](../../../docs/cn/background.md) -- [Why Rust](../../../docs/cn/why-rust.md) +- [Background](../../../docs/en/background.md) +- [Why Rust](../../../docs/en/why-rust.md) - [Why Better](../en/why-better.md) - [Quick Start](../../../README.md) - [Coroutine Overview](../en/coroutine.md) diff --git a/docs/cn/background.md b/docs/cn/background.md index dbc0bfdf..8866ca66 100644 --- a/docs/cn/background.md +++ b/docs/cn/background.md @@ -6,6 +6,8 @@ author: loongs-zhang # 诞生之因 +[English](../en/background.md) | 中文 + ## 待调优的线程池 在早期程序员为了支持多个用户并发访问服务应用,往往采用多进程方式,即针对每一个TCP网络连接创建一个服务进程。在2000年左右,比较流行使用CGI方式编写Web服务,当时人们用的比较多的Web服务器是基于多进程模式开发的Apache @@ -54,7 +56,7 @@ PS:假设单线程,CPU时间片为1s,有100个任务,公平调度指每 协程技术哪家强,编程语言找golang。然而随着更深入的学习,我发现几个`goroutine`的不足: -1. `不是严格的thread-per-core`。goroutine运行时也是由线程池来支撑的,而这个线程池的最大线程为256,这个数字可比thread-per-core的线程数大得多; +1. `不是thread-per-core`。goroutine运行时也是由线程池来支撑的,而这个线程池的最大线程为256,这个数字一般比thread-per-core的线程数大得多,且调度线程未绑定到CPU; 2. `抢占调度会打断正在运行的系统调用`。如果这个系统调用需要很长时间才能完成,显然会被打断多次,整体性能反而降低; 3. `goroutine离极限性能有明显差距`。对比隔壁c/c++协程库,其性能甚至能到goroutine的1.5倍; diff --git a/docs/cn/why-rust.md b/docs/cn/why-rust.md index 10a03ede..4bb8142d 100644 --- a/docs/cn/why-rust.md +++ b/docs/cn/why-rust.md @@ -6,6 +6,8 @@ author: loongs-zhang # 语言选择 +[English](../en/why-rust.md) | 中文 + 开发open-coroutine用什么语言呢?这是一个很重要的问题,毕竟不同的语言有不同的特性,选择不同的语言会对最终的结果产生很大的影响。 之前研究c协程库时,有看到大佬已经尝试过用c写动态链接库、然后java通过jni去调这种方式,最终失败了,具体原因得深入JVM源码才能得知,对鄙人来说太高深,告辞,因此排除java/kotlin等JVM字节码语言。 @@ -16,8 +18,8 @@ author: loongs-zhang 从研究过的好几个用c写的协程库来看,c的表达力差了点,需要编写巨量代码。相较之下,c++表达力就强多了,但开发的效率还是低了些,主要体现在以下几个方面: -1. `需要不停地写cmake`,告诉系统怎么编译它,有些麻烦,而这其实是不应该操太多心的部分; -2. `依赖管理麻烦`。如果要用别人写的类库,把代码拉下来,放到自己项目里,然后需要耗费大量时间来通过编译。如果别人依赖的库没有其他依赖还好,一旦有其他依赖,那么它依赖的依赖,也得按照刚才说的步骤处理,这就十分麻烦了; +1. `必须写cmake`。纯粹为了告诉系统怎么编译,有些麻烦,而这其实是不应该操心的部分; +2. `依赖管理麻烦`。如果要用别人写的类库,需要把代码拉下来,放到自己项目里,然后不得不耗费大量时间来通过编译。如果别人的库没有其他依赖还好,一旦有其他依赖,那么它依赖的依赖,也得按照刚才说的步骤处理,这就十分麻烦了; 3. `内存不安全`。c++很难写出没有内存泄漏/崩溃的代码。
diff --git a/docs/en/background.md b/docs/en/background.md new file mode 100644 index 00000000..8870bf92 --- /dev/null +++ b/docs/en/background.md @@ -0,0 +1,95 @@
---
title: Reason for Birth
date: 2025-02-24 17:08:33
author: loongs-zhang
---

# Reason for Birth

English | [中文](../cn/background.md)

## The thread pool needs to be optimized

In the early days, programmers often used multiprocessing to let a service application handle many concurrent users,
that is, one service process was created for every TCP network connection. Around 2000 it was quite popular to write web
services with CGI, and the most widely used web server at that time was the Apache 1.3.x series, which was built on the
multiprocess model. Because a process occupies far more system resources than a thread, people later switched to
multithreading (usually with thread pools) to develop web service applications. This raised the number of concurrent
users a single server could support, but resources were still being wasted.

In 2020 I joined company V. Because the thread pool of an internal system occasionally became saturated, and because the
leader had read
[《Java线程池实现原理及其在美团业务中的实践》](https://tech.meituan.com/2020/04/02/java-pooling-pratice-in-meituan.html),
we decided to build our own dynamic thread pool. Judging from the process, the results were good:
But this doesn't fundamentally solve the problem. As is well known, thread context switching has a cost, and the more
threads there are, the greater that cost becomes. For CPU-intensive tasks, simply making the number of threads equal to
the number of CPU cores and binding each thread to a specified core (hereinafter referred to as `thread-per-core`) is
enough to guarantee optimal performance. For IO-intensive tasks, since the tasks almost always block the thread, the
cost of thread context switching is generally lower than the cost of blocking; however, once there are too many threads,
the context-switching cost exceeds the blocking cost.

The essence of a dynamic thread pool is to adjust the number of threads so that the cost of thread context switching
stays below the cost of blocking. Since that adjustment is done by hand, it cannot be guaranteed.
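To make the `thread-per-core` idea above concrete, here is a minimal Rust sketch of pinning one worker thread to each
CPU core. This is only an illustration, not code from open-coroutine, and it assumes the third-party `core_affinity`
crate for the actual binding:

```rust
use std::thread;

fn main() {
    // Enumerate the available CPU cores (assumes the `core_affinity` crate).
    let cores = core_affinity::get_core_ids().expect("failed to enumerate CPU cores");

    // Spawn exactly one worker thread per core and pin it there.
    let workers: Vec<_> = cores
        .into_iter()
        .map(|core| {
            thread::spawn(move || {
                // Returns false if the OS refused to pin this thread.
                let pinned = core_affinity::set_for_current(core);
                assert!(pinned, "failed to pin worker to {core:?}");
                // ... pull tasks from this worker's queue and run them here ...
            })
        })
        .collect();

    for worker in workers {
        worker.join().unwrap();
    }
}
```

With one pinned thread per core, context switching between workers essentially disappears; the open question, as the
rest of this document discusses, is what to do when a task blocks.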
## The pain of using NIO

Is there a technology that can run IO-intensive tasks with performance comparable to multithreading while staying
thread-per-core? The answer is `NIO`, but it still has some limitations and unfriendly aspects:

1. The NIO API is more complex to use than the BIO API;
2. System calls such as sleep still block the thread. To reach optimal performance you effectively have to ban every
   blocking call, which is unfriendly to developers;
3. In thread pool mode, a single thread can only start the next task after the current one has finished, so fair
   scheduling between tasks is impossible.

Note: assuming a single thread, a CPU time slice of 1 second and 100 tasks, fair scheduling means that each task fairly
gets a 10ms slice of that second.

The first point can still be overcome, but the second and third are real weaknesses. In fact, if the third point could
be solved, RPC frameworks would no longer need so many threads; thread-per-core would be enough.

How can we keep development easy while making IO-intensive tasks perform no worse than multithreading, and still stay
thread-per-core? `Coroutine` technology slowly entered my field of vision.

## Goroutine still has shortcomings

When I first started playing with coroutines, I chose `kotlin` because of the learning cost. But once I realized that
kotlin coroutines require switching APIs (for example replacing Thread.sleep with kotlinx.coroutines.delay) to avoid
blocking threads, I decisively changed direction to `golang`. About 2 weeks later:
Which language does coroutines best? Among programming languages, the answer is golang. However, as I dug deeper, I
found several shortcomings of goroutines:

1. `Not thread-per-core`. The goroutine runtime is also backed by a thread pool, and that pool's maximum size is 256,
   which is generally far larger than a thread-per-core thread count; moreover, the scheduling threads are not bound to
   CPUs;
2. `Preemptive scheduling interrupts running system calls`. If a system call takes a long time to complete, it will
   obviously be interrupted many times, and overall performance actually drops;
3. `Goroutines are clearly short of the best achievable performance`. Some C/C++ coroutine libraries can even reach 1.5
   times the performance of goroutines;

With some regret, I went on to study the C/C++ coroutine libraries, and found that they either only implemented `hook`
(a quick explanation of hook technology: in simple terms, it proxies system calls. Take sleep as an example: without the
hook, the operating system's sleep is called; with the hook, the call is redirected to our own code, as sketched at the
end of this section. For the detailed mechanics, see Chapters 41 and 42 of The Linux Programming Interface), or only
implemented `work-stealing`. Some libraries provided nothing more than the basic `coroutine abstraction`, and the most
disappointing thing is that none of them implemented `preemptive scheduling`.

There's no other way; it seems we can only do it ourselves.
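To make the hook idea a little more tangible, here is a minimal, illustrative sketch rather than open-coroutine's real
implementation. It assumes the code is compiled as a `cdylib` on Linux and loaded with `LD_PRELOAD`, so that the
exported symbol shadows libc's `sleep`:

```rust
use std::os::raw::c_uint;

// Same name and signature as libc's `sleep`, so a program started with
// `LD_PRELOAD=./libhook.so ./app` ends up calling this function instead.
#[no_mangle]
pub extern "C" fn sleep(seconds: c_uint) -> c_uint {
    // A coroutine runtime would suspend the current coroutine here, let the
    // scheduler run other coroutines, and resume this one `seconds` later.
    eprintln!("sleep({seconds}) was intercepted by the hook");
    // 0 means the full sleep elapsed, matching libc's contract.
    0
}
```

A real hook library also has to forward to the original system call when needed (for example via
`dlsym(RTLD_NEXT, ...)`) and to cover far more calls than `sleep`; the sketch only shows where the redirection happens.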
diff --git a/docs/en/why-rust.md b/docs/en/why-rust.md new file mode 100644 index 00000000..a21ed698 --- /dev/null +++ b/docs/en/why-rust.md @@ -0,0 +1,38 @@ +--- +title: Language Selection +date: 2025-02-24 17:37:10 +author: loongs-zhang +--- + +# Language Selection + +English | [中文](../cn/why-rust.md) + +What language is used to develop open routine? This is a very important issue, as different languages have different +features, and choosing different language can have a significant impact on the final outcome. + +When researching the C coroutine library before, I saw that some experts had already tried to write dynamic link +libraries in C and call them in Java through JNI, but finally failed. The specific reason needs to be found in the +JVM source code, which is too hard for me, goodbye. So JVM bytecode languages such as Java/Kotlin are excluded. + +Obviously, using Golang to implement a goroutine is no less complex than delving into JVM source code, and even if it is +actually finished, no one would be willing to use it in a production environment, so Golang is excluded. + +Now, there are still three players left: c/c++/rust. + +From several coroutine libraries written in C that have been studied, it can be seen that the expressiveness of C is a +bit lacking and requires writing a huge amount of code. In comparison, C++ has much stronger expressive power, but its +development efficiency is still low, mainly reflected in the following aspects: + +1. `Have to write cmake`. Purely to tell the system how to compile, it's a bit troublesome, but this is actually the + part that shouldn't be worried about; +2. `Difficulty in dependency management`. If you want to use a library written by someone else, you need to pull down + the code and put it into your own project, and then you have to spend a lot of time compiling it. If the library has + no other dependencies, it can barely be handled. Once there are other dependencies, the dependencies it depends on + must also be handled according to the steps just mentioned, which can be very troublesome; +3. `Memory is unsafe`. It's difficult to write code in C++ without memory leaks/crashes. + +
+ + +