3 changes: 3 additions & 0 deletions README.md
@@ -185,6 +185,9 @@ fn main() {

## ⚓ Learn More

- [Project Overview](core/docs/en/overview.md)
- [Background](docs/en/background.md)
- [Why Rust](docs/en/why-rust.md)
- [Coroutine Overview](core/docs/en/coroutine.md)
- [Scalable Stack Overview](core/docs/en/scalable-stack.md)
- [Monitor Overview](core/docs/en/monitor.md)
6 changes: 2 additions & 4 deletions core/docs/en/overview.md
@@ -17,10 +17,8 @@ author: loongs-zhang
The `open-coroutine` is a simple, efficient and generic stackfull-coroutine library, you can use this as a performance
replacement for IO thread pools, see [why better](../en/why-better.md).

[//]: # (todo: add English versions of the docs)

- [Background](../../../docs/cn/background.md)
- [Why Rust](../../../docs/cn/why-rust.md)
- [Background](../../../docs/en/background.md)
- [Why Rust](../../../docs/en/why-rust.md)
- [Why Better](../en/why-better.md)
- [Quick Start](../../../README.md)
- [Coroutine Overview](../en/coroutine.md)
4 changes: 3 additions & 1 deletion docs/cn/background.md
@@ -6,6 +6,8 @@ author: loongs-zhang

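# Reason for Birth

[English](../en/background.md) | 中文

## The thread pool needs to be optimized

In the early days, to let multiple users access a service concurrently, programmers often used multiple processes, creating one service process for each TCP connection. Around 2000, writing web services with CGI was popular, and the web server most people used at the time was the multi-process Apache
@@ -54,7 +56,7 @@ PS: assuming a single thread, a CPU time slice of 1s, and 100 tasks, fair scheduling means each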

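Which coroutine technology is the strongest? Among programming languages, look to golang. However, as I studied further, I found several shortcomings of `goroutine`:

1. `Not strictly thread-per-core`. The goroutine runtime is also backed by a thread pool, and the maximum number of threads in this pool is 256, far more than the number of threads under thread-per-core;
1. `Not thread-per-core`. The goroutine runtime is also backed by a thread pool, and the maximum number of threads in this pool is 256, which is generally far more than the number of threads under thread-per-core, and the scheduling threads are not bound to CPUs
2. `Preemptive scheduling interrupts running system calls`. If the system call takes a long time to complete, it will obviously be interrupted many times, which lowers overall performance;
3. `goroutine falls clearly short of peak performance`. By comparison, the C/C++ coroutine libraries can reach 1.5 times the performance of goroutine;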

6 changes: 4 additions & 2 deletions docs/cn/why-rust.md
@@ -6,6 +6,8 @@ author: loongs-zhang

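# Language Selection

[English](../en/why-rust.md) | 中文

What language should open-coroutine be developed in? This is an important question: different languages have different features, and the choice will greatly affect the final result.

When I studied C coroutine libraries earlier, I saw that experts had already tried writing a dynamic link library in C and calling it from Java through JNI, and eventually failed. The exact reason can only be found deep in the JVM source code, which is too advanced for me, so goodbye. Java/Kotlin and other JVM bytecode languages are therefore ruled out.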
@@ -16,8 +18,8 @@ author: loongs-zhang

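Judging from the several coroutine libraries written in C that I have studied, C is somewhat lacking in expressiveness and requires a huge amount of code. C++ is far more expressive by comparison, but its development efficiency is still low, mainly in the following respects:

1. `You have to keep writing cmake` to tell the system how to compile it, which is a bit tedious, and this is a part you really should not have to worry much about
2. `Dependency management is a hassle`. To use a library written by someone else, you pull the code down, put it into your own project, and then spend a lot of time getting it to compile. It is fine if the libraries it depends on have no other dependencies, but once they do, the dependencies of its dependencies must go through the same steps, which is very troublesome;
1. `You must write cmake`. It exists purely to tell the system how to compile, which is a bit tedious, and this is a part you really should not have to worry about
2. `Dependency management is a hassle`. To use a library written by someone else, you need to pull the code down, put it into your own project, and then have to spend a lot of time getting it to compile. It is fine if that library has no other dependencies, but once it does, the dependencies of its dependencies must go through the same steps, which is very troublesome;
3. `Memory unsafety`. It is very hard to write C++ code that is free of memory leaks and crashes.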

<div style="text-align: center;">
95 changes: 95 additions & 0 deletions docs/en/background.md
@@ -0,0 +1,95 @@
---
title: Reason for Birth
date: 2025-02-24 17:08:33
author: loongs-zhang
---

# Reason for Birth

English | [中文](../cn/background.md)

## The thread pool needs to be optimized

In the early days, developers often used multiple processes to let many users access a service concurrently, creating
one service process for each TCP connection. Around 2000, it was popular to write web services with CGI, and the most
commonly used web server at the time was the Apache 1.3.x series, which was built on the multi-process model. Because
processes consume more system resources than threads, people began using multithreading (usually thread pools) to
develop web services. This increased the number of concurrent users a single server could support, but resources were
still wasted.

In 2020, I joined the V company. Because the thread pool in an internal system was occasionally exhausted, and because
our team lead had
read [《Java线程池实现原理及其在美团业务中的实践》](https://tech.meituan.com/2020/04/02/java-pooling-pratice-in-meituan.html),
we decided to build our own dynamic thread pool. Judging by how it went, the results were good:

<div style="text-align: center;">
<img src="/docs/img/begin.jpg" width="50%">
</div>

But this doesn't fundamentally solve the problem. As is well known, thread context switching has a cost, and the more
threads there are, the greater that cost becomes. For CPU-intensive tasks, making the number of threads equal to the
number of CPU cores and binding each thread to a specific core (hereinafter referred to as `thread-per-core`) ensures
optimal performance. For IO-intensive tasks, since a task almost always blocks its thread, the cost of thread context
switching is generally less than the cost of blocking. However, when there are too many threads, the cost of thread
context switching exceeds the cost of blocking.
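As a rough illustration of the thread-per-core sizing rule, the Rust sketch below spawns exactly one worker per
available core for a CPU-bound job. The names `worker_count` and `parallel_sum` are made up for this example. It
deliberately leaves out the binding step: the Rust standard library has no CPU-affinity API, so pinning a thread to a
core would need a platform-specific call that is not shown here.

```rust
use std::thread;

/// One worker per available core; falls back to 1 if the count is unknown.
fn worker_count() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

/// Sums 0..n by splitting the range across one thread per core.
/// Hypothetical CPU-bound workload, used only to give the workers something to do.
fn parallel_sum(n: u64) -> u64 {
    let workers = worker_count() as u64;
    // Ceiling division so that the chunks cover the whole range.
    let chunk = (n + workers - 1) / workers;
    let handles: Vec<_> = (0..workers)
        .map(|i| {
            let start = (i * chunk).min(n);
            let end = ((i + 1) * chunk).min(n);
            // NOTE: no core binding here; the OS is free to migrate these threads.
            thread::spawn(move || (start..end).sum::<u64>())
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .sum()
}

fn main() {
    println!("{} workers, sum = {}", worker_count(), parallel_sum(1_000_000));
}
```

With more threads than cores, the same job would still finish, but the extra threads would only add context switches.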

The essence of a dynamic thread pool is to adjust the number of threads so that the cost of thread context switching
stays below the cost of blocking. Since the adjustment is manual, this cannot be guaranteed.

<div style="text-align: center;">
<img src="/docs/img/run.jpg" width="50%">
</div>

## The pain of using NIO

Is there a technology that can run IO-intensive tasks with performance comparable to multithreading while staying
thread-per-core? The answer is `NIO`, but it still has some limitations or unfriendly aspects:

1. The NIO API is more complex to use than the BIO API;
2. System calls such as sleep still block the thread. Achieving optimal performance effectively means banning all
blocking calls, which is unfriendly to developers;
3. In thread pool mode, a single thread can only start the next task after the current task has completed, so tasks
cannot be scheduled fairly;

Note: Assuming a single thread, a CPU time slice of 1 second, and 100 tasks, fair scheduling means that each task gets
an equal 10 ms share of that time slice.
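The arithmetic in the note, and the difference it makes, can be sketched in Rust as follows. This is a simulation
under stated assumptions, not a scheduler: `fair_slice`, `round_robin`, and `run_to_completion` are hypothetical
helpers, and task lengths are plain millisecond counts rather than real work.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Equal share of one scheduling period for each task (panics if `tasks` is 0).
fn fair_slice(period: Duration, tasks: u32) -> Duration {
    period / tasks
}

/// Completion time (ms) of each task when every task runs for at most
/// `slice_ms` before the next one gets its turn.
fn round_robin(work_ms: &[u64], slice_ms: u64) -> Vec<u64> {
    assert!(slice_ms > 0, "a zero-length slice would never make progress");
    let mut queue: VecDeque<(usize, u64)> = work_ms.iter().copied().enumerate().collect();
    let mut finished = vec![0u64; work_ms.len()];
    let mut clock = 0u64;
    while let Some((id, left)) = queue.pop_front() {
        let run = left.min(slice_ms);
        clock += run;
        if left > run {
            queue.push_back((id, left - run)); // not done yet: back of the line
        } else {
            finished[id] = clock;
        }
    }
    finished
}

/// Completion time (ms) of each task when a task must finish before the next
/// one starts, as in the thread pool mode described above.
fn run_to_completion(work_ms: &[u64]) -> Vec<u64> {
    let mut clock = 0u64;
    work_ms
        .iter()
        .map(|w| {
            clock += w;
            clock
        })
        .collect()
}

fn main() {
    let slice = fair_slice(Duration::from_secs(1), 100);
    println!("slice = {:?}", slice);
    // One long task followed by one short task.
    let work = [1000, 10];
    println!("run to completion: {:?}", run_to_completion(&work));
    println!("round robin:       {:?}", round_robin(&work, 10));
}
```

In this model the short task waits behind the long one when tasks run to completion, while with 10 ms slices it
finishes after its first turn.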

The first point can be overcome, while the second and third are real weaknesses. In fact, if the third point could be
solved, RPC frameworks would not need so many threads; thread-per-core would be enough.

How can developers keep things easy to use while ensuring that IO-intensive tasks perform no worse than with
multithreading, and still stay thread-per-core? `Coroutine` technology slowly entered my field of vision.

## Goroutine still has shortcomings

When I first started playing with coroutines, I chose `kotlin` to keep the learning cost down. However, when I
realized that kotlin's coroutines require changing APIs (such as replacing Thread.sleep with kotlinx.coroutines.delay)
to avoid blocking threads, I decisively switched to `golang`. About 2 weeks later:

<div style="text-align: center;">
<img src="/docs/img/good.jpeg" width="50%">
</div>

Which coroutine technology is the strongest? Among programming languages, look to Golang. However, as I studied
further, I discovered several shortcomings of goroutines:

1. `Not thread-per-core`. The goroutine runtime is also backed by a thread pool, and the maximum number of threads in
this thread pool is 256, which is generally much larger than the number of threads under thread-per-core, and the
scheduling threads are not bound to CPUs;
2. `Preemptive scheduling interrupts running system calls`. If a system call takes a long time to complete, it will
obviously be interrupted multiple times, which lowers overall performance;
3. `Goroutines fall clearly short of the best achievable performance`. By comparison, the C/C++ coroutine libraries
can reach 1.5 times the performance of goroutines;

With regret, I went on to study the C/C++ coroutine libraries and found that they either only implemented `hook` (a
brief explanation of hook technology: put simply, it proxies system calls. Take a call to sleep: without the hook, the
operating system's sleep function is called; with the hook, the call is redirected to our own code. For detailed
steps, please refer to Chapters 41 and 42 of The Linux Programming Interface), or only implemented `work-stealing`.
Some libraries only provided the most basic `coroutine abstraction`, and most disappointing of all, none of them
implemented `preemptive scheduling`.
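To make the hook idea a little more concrete, here is a toy Rust model of the redirection. It is only a model: a real
hook replaces the symbol at the dynamic-linking level, whereas this sketch swaps a function pointer in an ordinary
struct. `Symbols`, `os_sleep`, `hooked_sleep`, and `business_code` are all hypothetical names, and the hooked version
merely reports what a coroutine runtime would do instead of doing it.

```rust
use std::thread;
use std::time::Duration;

/// Signature shared by the original call and its replacement.
type SleepFn = fn(Duration) -> &'static str;

/// Stand-in for the operating system's sleep: the calling thread really blocks.
fn os_sleep(d: Duration) -> &'static str {
    thread::sleep(d);
    "blocked the thread"
}

/// Stand-in for the hooked version. A real runtime would register a timer and
/// switch to another coroutine here; this toy just returns immediately.
fn hooked_sleep(_d: Duration) -> &'static str {
    "yielded to the scheduler"
}

/// Toy symbol table: which implementation a call to `sleep` resolves to.
struct Symbols {
    sleep: SleepFn,
}

impl Symbols {
    fn without_hook() -> Self {
        Symbols { sleep: os_sleep }
    }

    fn with_hook() -> Self {
        Symbols { sleep: hooked_sleep }
    }
}

/// Caller code is identical in both cases; only the resolution differs.
fn business_code(symbols: &Symbols) -> &'static str {
    (symbols.sleep)(Duration::from_millis(1))
}

fn main() {
    println!("without hook: {}", business_code(&Symbols::without_hook()));
    println!("with hook:    {}", business_code(&Symbols::with_hook()));
}
```

The point of the model is that `business_code` does not change, which is what lets existing blocking code benefit
from a hook without being rewritten.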

There's no other way; it seems we can only do it ourselves.

<div style="text-align: center;">
<img src="/docs/img/just_do_it.jpg" width="100%">
</div>
38 changes: 38 additions & 0 deletions docs/en/why-rust.md
@@ -0,0 +1,38 @@
---
title: Language Selection
date: 2025-02-24 17:37:10
author: loongs-zhang
---

# Language Selection

English | [中文](../cn/why-rust.md)

What language should be used to develop open-coroutine? This is a very important question, as different languages have
different features, and choosing a different language can significantly affect the final outcome.

When I was researching C coroutine libraries earlier, I saw that some experts had already tried writing dynamic link
libraries in C and calling them from Java through JNI, but eventually failed. The specific reason can only be found in
the JVM source code, which is too hard for me, so goodbye. JVM bytecode languages such as Java/Kotlin are therefore
excluded.

Obviously, implementing goroutines in Golang is no less complex than delving into the JVM source code, and even if it
were actually finished, no one would be willing to use it in a production environment, so Golang is excluded.

Now, there are still three players left: c/c++/rust.

Judging from the several coroutine libraries written in C that I have studied, C is somewhat lacking in expressiveness
and requires a huge amount of code. In comparison, C++ is much more expressive, but its development efficiency is
still low, mainly in the following respects:

1. `Have to write cmake`. It exists purely to tell the system how to compile, which is a bit troublesome, and this is
a part that developers shouldn't have to worry about;
2. `Difficulty in dependency management`. If you want to use a library written by someone else, you need to pull down
the code, put it into your own project, and then spend a lot of time getting it to compile. If the library has no
other dependencies, this is manageable. Once it has other dependencies, the dependencies it depends on must also be
handled by the same steps, which can be very troublesome;
3. `Memory is unsafe`. It's difficult to write C++ code that is free of memory leaks and crashes.

<div style="text-align: center;">
<img src="/docs/img/what_else_can_I_say.jpg" width="50%">
<img src="/docs/img/rust.jpeg" width="100%">
</div>