Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG Report] coctx_swap violates the Sys V ABI of i386 and AMD64 (违反Sys V ABI约定) #90

Open
hnes opened this issue Jun 30, 2018 · 14 comments

Comments

@hnes
Copy link

hnes commented Jun 30, 2018

Issue的最下方有BUG的详细中文描述,十分感谢。

Stack Pointer

The end of the input argument area shall be aligned on a 16 (32 or 64, if __m256 or __m512 is passed on stack) byte boundary. In other words, the value (%esp + 4) is always a multiple of 16 (32 or 64) when control is transferred to the function entry point. The stack pointer, %esp, always points to the end of the latest allocated stack frame.

— Intel386-psABI-1.1:2.2.2 The Stack Frame

The stack pointer, %rsp, always points to the end of the latest allocated stack frame.

— Sys V ABI AMD64 Version 1.0:3.2.2 The Stack Frame

The ABI states that the (E|R)SP should always point to the end of the latest allocated stack frame. But in file coctx_swap.S of libco, the (E|R)SP had been used to address the memory on the heap.

By default, the signal handler is invoked on the normal process stack. It is possible to arrange that the signal handler uses an alternate stack; see sigalstack(2) for a discussion of how to do this and when it might be useful.

— man 7 signal : Signal dispositions

Terrible things may happend if the (E|R)SP is pointing to the data structure on the heap when signal comes. (Using the breakpoint and signal commands of gdb could produce such bug conveniently. Although by using sigalstack to change the default signal stack could alleviate the problem, but still, that kind of usage of (E|R)SP still violates the ABI.)

Control Words of x87 FPU and MXCSR

The control words of x87 FPU and MXCSR should be preserved across function calls (callee saved) in the Sys V ABI of i386 and AMD64. But in the current implementation of coctx_swap, there is no saving/restoring stuff about them.

hnes added a commit to hnes/libaco that referenced this issue Jul 1, 2018
@hnes
Copy link
Author

hnes commented Jul 1, 2018

I have created a bug proof in this branch.

To reproduce this bug:

$ git clone -b tencent_libco_bug_report_and_coctx_swap_benchmark https://github.com/hnes/libaco.git
$ cd libaco
$ # require gcc >= 5.0 to progress
$ bash libco_bug_build.sh
$ ls *libco_bug_0
test_libco_bug_0  test_m32_libco_bug_0
$ # use objdump find the address of acosw (coctx_swap) in test_m32_libco_bug_0
08048786 <acosw>: # actually it is indeed coctx_swap, i just changed the name for convenient build
 8048786:       8d 44 24 04             lea    eax,[esp+0x4]
 804878a:       8b 64 24 04             mov    esp,DWORD PTR [esp+0x4]
 804878e:       8d 64 24 20             lea    esp,[esp+0x20]
 8048792:       50                      push   eax
 8048793:       55                      push   ebp
 8048794:       56                      push   esi    # <- breakpoint we would use later
 8048795:       57                      push   edi
 8048796:       52                      push   edx
 8048797:       51                      push   ecx
 8048798:       53                      push   ebx
 8048799:       ff 70 fc                push   DWORD PTR [eax-0x4]
 804879c:       8b 60 04                mov    esp,DWORD PTR [eax+0x4]
 804879f:       58                      pop    eax
 80487a0:       5b                      pop    ebx
 80487a1:       59                      pop    ecx
 80487a2:       5a                      pop    edx
 80487a3:       5f                      pop    edi
 80487a4:       5e                      pop    esi
 80487a5:       5d                      pop    ebp
 80487a6:       5c                      pop    esp
 80487a7:       50                      push   eax
 80487a8:       31 c0                   xor    eax,eax
 80487aa:       c3                      ret
$ gdb ./test_m32_libco_bug_0
...
(gdb) break *0x8048794
Breakpoint 1 at 0x8048794: file coctx_swap.S, line 33.
(gdb) r
Starting program: /mnt/shr/codes/libaco/test_m32_libco_bug_0
main: sp:0xffffd1c8    ####<<<<!!!! main stack pointer

Breakpoint 1, acosw () at coctx_swap.S:33
33              pushl %esi
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.i686
(gdb) bt
#0  acosw () at coctx_swap.S:33
#1  0xffffd1c8 in ?? ()
#2  0xf7e121b3 in __libc_start_main () from /lib/libc.so.6
#3  0xf7ffcfbc in _DYNAMIC () from /lib/ld-linux.so.2
#4  0x00000001 in ?? ()
#5  0x08048693 in main () at test_libco_bug_0.c:61
(gdb) signal SIGINT
Continuing with signal SIGINT.
Received interrupt signal!
signal: sp:0x804b968         ####<<<<!!!! signal stack pointer -> HEAP!!! BUG!!!
signal: co:0x804c088 save_stack:0x804c0d8 share_stack:0xf7bf7000
signal: exit  ####^<<<<!!!! `co` is on the heap

Breakpoint 1, acosw () at coctx_swap.S:33
33              pushl %esi
(gdb)
$ # same amd64 version, i.e. the test_libco_bug_0

@hnes
Copy link
Author

hnes commented Jul 4, 2018

BUG 1: 栈指针的使用方式违反Sys V ABI约定

The end of the input argument area shall be aligned on a 16 (32 or 64, if __m256 or __m512 is passed on stack) byte boundary. In other words, the value (%esp + 4) is always a multiple of 16 (32 or 64) when control is transferred to the function entry point. The stack pointer, %esp, always points to the end of the latest allocated stack frame.

— Intel386-psABI-1.1:2.2.2 The Stack Frame

The stack pointer, %rsp, always points to the end of the latest allocated stack frame.

— Sys V ABI AMD64 Version 1.0:3.2.2 The Stack Frame

ABI规范中规定用户空间程序的栈指针必须时刻指到运行栈的栈顶,而coctx_swap.S中却使用栈指针直接对位于堆中的数据结构进行寻址内存操作,这违反了ABI约定。

By default, the signal handler is invoked on the normal process stack. It is possible to arrange that the signal handler uses an alternate stack; see sigalstack(2) for a discussion of how to do this and when it might be useful.

— man 7 signal : Signal dispositions

当coctx_swap正在用栈指针对位于堆中的数据结构进行寻址内存操作时,若此时执行线程收到了一个信号,接着内核抢占了该执行线程并开始准备接下来用户空间线程的信号处理执行环境,由于在默认情况下,内核将会选择主栈作为信号处理函数的执行栈,但此时栈已经被指向了堆中(用户空间的程序违反ABI约定在先),那么信号处理函数的执行栈就会被错误的放置到堆中,这样,堆中的数据结构在接下来就极有可能会被破坏。

BUG修复的方法可以参考libaco项目的上下文切换函数acosw实现(libaco acosw的速度是Tencent libco coctx_swap速度的1.72倍)。

BUG 2: x87与MXCSR控制字的保存与恢复违反Sys V ABI约定

在Intel386及AMD64 Sys V ABI约定中,x87与MXCSR控制字是“callee saved”的,但是coctx_swap中并没有相关的保存与恢复实现。

根据libaco性能基准测试报告得到的结论,在上下文切换代码中增加对x87与MXCSR控制字的保存与恢复只会给切换汇编带来约0.87%的性能损耗。

@hnes
Copy link
Author

hnes commented Jul 4, 2018

libaco这一分支中,有一个关于coctx_swap的bug最小复现实例。

Bug复现过程:

$ git clone -b tencent_libco_bug_report_and_coctx_swap_benchmark https://github.com/hnes/libaco.git
$ cd libaco
$ # require gcc >= 5.0 to progress
$ # libaco在v1.1之后已经没有对gcc版本的限制了
$ bash libco_bug_build.sh
$ ls *libco_bug_0
test_libco_bug_0  test_m32_libco_bug_0
$ # use `objdump` find the address of acosw (coctx_swap) in test_m32_libco_bug_0
08048786 <acosw>: # 实际上它就是`coctx_swap`,这里为了方便构建直接改写成了libaco的切换函数`acosw`
 8048786:       8d 44 24 04             lea    eax,[esp+0x4]
 804878a:       8b 64 24 04             mov    esp,DWORD PTR [esp+0x4]
 804878e:       8d 64 24 20             lea    esp,[esp+0x20]
 8048792:       50                      push   eax
 8048793:       55                      push   ebp
 8048794:       56                      push   esi    # <- 我们选择的断点
 8048795:       57                      push   edi
 8048796:       52                      push   edx
 8048797:       51                      push   ecx
 8048798:       53                      push   ebx
 8048799:       ff 70 fc                push   DWORD PTR [eax-0x4]
 804879c:       8b 60 04                mov    esp,DWORD PTR [eax+0x4]
 804879f:       58                      pop    eax
 80487a0:       5b                      pop    ebx
 80487a1:       59                      pop    ecx
 80487a2:       5a                      pop    edx
 80487a3:       5f                      pop    edi
 80487a4:       5e                      pop    esi
 80487a5:       5d                      pop    ebp
 80487a6:       5c                      pop    esp
 80487a7:       50                      push   eax
 80487a8:       31 c0                   xor    eax,eax
 80487aa:       c3                      ret
$ gdb ./test_m32_libco_bug_0
...
(gdb) break *0x8048794
Breakpoint 1 at 0x8048794: file coctx_swap.S, line 33.
(gdb) r
Starting program: /mnt/shr/codes/libaco/test_m32_libco_bug_0
main: sp:0xffffd1c8    ####<<<<!!!! 默认的线程主栈

Breakpoint 1, acosw () at coctx_swap.S:33
33              pushl %esi
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.i686
(gdb) bt
#0  acosw () at coctx_swap.S:33
#1  0xffffd1c8 in ?? ()
#2  0xf7e121b3 in __libc_start_main () from /lib/libc.so.6
#3  0xf7ffcfbc in _DYNAMIC () from /lib/ld-linux.so.2
#4  0x00000001 in ?? ()
#5  0x08048693 in main () at test_libco_bug_0.c:61
(gdb) signal SIGINT   
Continuing with signal SIGINT.
Received interrupt signal!
signal: sp:0x804b968         ####<<<<!!!! 信号的执行栈被放置到了堆中 BUG复现
signal: co:0x804c088 save_stack:0x804c0d8 share_stack:0xf7bf7000
signal: exit  ####^<<<<!!!! 这个`co`对象是在堆中分配的

Breakpoint 1, acosw () at coctx_swap.S:33
33              pushl %esi
(gdb)
$ # 对于amd64,可以使用与上述相同的方法对test_libco_bug_0进行bug复现,略。

@hnes hnes changed the title [BUG Report] coctx_swap violates the Sys V ABI of i386 and AMD64 [BUG Report] coctx_swap violates the Sys V ABI of i386 and AMD64 (违反Sys V ABI约定) Jul 5, 2018
@yuanzhubi
Copy link

人人都知道这个bug啦,微信好像内部版本都修复了似的。

@hnes
Copy link
Author

hnes commented Jul 20, 2018

您是指刚刚这几天修复的么?还是指很久以前libco(在这个issue以前)就已经修复了呢(只是一直以来都没有放出来)?

十分感谢 ;-)

@yuanzhubi
Copy link

2016年听他们分享的时候他们就说他们修了。没想到这个开源版还停留在没修改的状态,不太了解具体情况。

@hnes
Copy link
Author

hnes commented Jul 20, 2018

十分感谢您提供的信息,对我很有用处 :D

@Ananfa
Copy link

Ananfa commented Jul 25, 2018

这是我对coctx_swap.S中__x86_64__部分的修改,欢迎指正(手头没有__i386__机器没法试验)

	leaq 8(%rsp), %rax       // rsp + 8 -> rax
	movq %rax, 104(%rdi)
	movq %rbx, 96(%rdi)
	movq -8(%rax), %rdx
	movq %rdx, 72(%rdi)     // ret func addr
	movq %rbp, 48(%rdi)
	movq %r12, 24(%rdi)
	movq %r13, 16(%rdi)
	movq %r14, 8(%rdi)
	movq %r15, 0(%rdi)
	movq %rsi, 64(%rdi)     // param
	movq %rdi, 56(%rdi)     // param
	
	movq 0(%rsi), %r15
	movq 8(%rsi), %r14
	movq 16(%rsi), %r13
	movq 24(%rsi), %r12
	movq 48(%rsi), %rbp
	movq 72(%rsi), %rax     // ret func addr
	movq 96(%rsi), %rbx
	movq 104(%rsi), %rsp
	movq 56(%rsi), %rdi     // param
	movq 64(%rsi), %rsi     // param
	pushq %rax
	
	xorl %eax, %eax
	ret

@hnes
Copy link
Author

hnes commented Jul 25, 2018

抱歉具体的没有仔细看,但是有两个建议哈:

  1. 为什么不直接使用jmp eax类似的方法呢?
pushq %rax
xorl %eax, %eax
ret
  1. 建议直接引用libaco的汇编切换,或者把腾讯libco的协程实现全部换成libaco,因为libaco我会一直维护它。

@hnes
Copy link
Author

hnes commented Jul 26, 2018

对了,如果要测试i386,可以使用gcc的-m32选项进行测试。

@Ananfa
Copy link

Ananfa commented Jul 26, 2018

十分感谢!您的建议很好
已经将

pushq %rax
xorl %eax, %eax
ret

改为

jmp *%rax

另附上__i386__实现

	leal 4(%esp), %edx // sp
	movl 4(%esp), %eax // param1
	movl %edx, 28(%eax)
	movl %ebp, 24(%eax)
	movl %esi, 20(%eax)
	movl %edi, 16(%eax)
	movl %ebx, 4(%eax)
	movl 0(%esp), %edx // ret func addr
	movl %edx, 0(%eax)
	
	movl 8(%esp), %eax // param2
	movl 4(%eax), %ebx
	movl 16(%eax), %edi
	movl 20(%eax), %esi
	movl 24(%eax), %ebp
	movl 28(%eax), %esp
	jmp *(%eax)

@ppLorins
Copy link

ppLorins commented Dec 23, 2018

@Ananfa
__x86_64__部分的修改,是不是少了对 %rcx %rdx对应的操作,应该是这样吧:

leaq 8(%rsp), %rax       // rsp + 8 -> rax
movq %rax, 104(%rdi)
movq %rbx, 96(%rdi)
movq %rcx, 88(%rdi)  # 缺少了 %rcx的拷贝?
movq %rdx, 80(%rdi)  # 缺少了 %rdx的拷贝?
movq -8(%rax), %rdx
movq %rdx, 72(%rdi)     // ret func addr		
movq %rbp, 48(%rdi)
movq %r12, 24(%rdi)
movq %r13, 16(%rdi)
movq %r14, 8(%rdi)
movq %r15, 0(%rdi)
movq %rsi, 64(%rdi)     // param
movq %rdi, 56(%rdi)     // param


movq 0(%rsi), %r15
movq 8(%rsi), %r14
movq 16(%rsi), %r13
movq 24(%rsi), %r12
movq 48(%rsi), %rbp
movq 72(%rsi), %rax     // ret func addr
movq 80(%rsi), %rdx  # 缺少了 %rcx的恢复?
movq 88(%rsi), %rcx  # 缺少了 %rcx的恢复?
movq 96(%rsi), %rbx
movq 104(%rsi), %rsp
movq 56(%rsi), %rdi     // param
movq 64(%rsi), %rsi     // param
pushq %rax

xorl %eax, %eax
ret

另外,__i386__的实现:

movl 0(%esp), %edx // ret func addr
movl %edx, 0(%eax)

能直接这样吗:

movl 0(%esp), 0(%eax) // ret func addr

对汇编不是很懂,如有错误,麻烦指正,谢谢。

@DGuco
Copy link

DGuco commented Nov 11, 2020

BUG 1: 栈指针的使用方式违反Sys V ABI约定.

能麻烦告知在当前有bug的实现中具体是哪句违反了Sys V ABI的约定吗,不是很懂~^o^~~^o^~,

	leaq 112(%rdi),%rsp
	pushq %rax

leaq 112(%rdi),%rsp 这一句把rsp堆栈指针指向了堆内存resg[13] ,然后后面的操作都是在调用push指令,因为堆栈的生长方向是由高到低的所以push指令把对应寄存器中的值按顺序push到resg[13]==>resg[0]中。问题出在不该把rsp堆栈指针指向了堆内存吗?

    mov     r13,QWORD PTR [rsi+0x8]
    mov     r14,QWORD PTR [rsi+0x10]
    mov     r15,QWORD PTR [rsi+0x18]
    mov     rax,QWORD PTR [rsi+0x20] // retaddr
    mov     rcx,QWORD PTR [rsi+0x28] // rsp
    mov     rbx,QWORD PTR [rsi+0x30]
    mov     rbp,QWORD PTR [rsi+0x38]
#ifndef ACO_CONFIG_SHARE_FPU_MXCSR_ENV
    fldcw   WORD PTR      [rsi+0x40]
    ldmxcsr DWORD PTR     [rsi+0x44]
#endif
    mov     rsp,rcx

可是在libaco的实现中mov rsp,rcx这句话不也同样把rsp堆栈指针指向了我们主动malloc出来的堆内存中吗?还是说问题出在leaq 112(%rdi),%rsp 后rsp堆栈指针没有指向用来模拟运行栈帧的堆内存的栈顶。
我又仔细一字一句的看了bug描述,我觉得应该是这样的
首先bug描述:“当coctx_swap正在用栈指针对位于堆中的数据结构进行寻址内存操作时,若此时执行线程收到了一个信号,接着内核抢占了该执行线程并开始准备接下来用户空间线程的信号处理执行环境,由于在默认情况下,内核将会选择主栈作为信号处理函数的执行栈,但此时栈已经被指向了堆中(用户空间的程序违反ABI约定在先),那么信号处理函数的执行栈就会被错误的放置到堆中,这样,堆中的数据结构在接下来就极有可能会被破坏。”
问题的关键在于主栈因为默认情况下,内核将会选择主栈作为信号处理函数的执行栈,而我们在创建协程的时候协程有可能也是从主栈切出的,也就是说有可能出现coctx_swap( mainco(主栈),coctx_t*to )的调用,而恰恰coctx_swap中汇编的第一个片断就是把要切出的栈保存自定义的结构中,此时在有bug的branch中通过 leaq 112(%rdi),%rsp把主栈的rsp指针指向了堆,然后通过push不停的移动rsp指针的位置,达到把要切出的栈保存自定义的结构中的目的。这种操作是不允许的。实际上如果我们不在主栈中往其他的协程切是不会有这个问题的,不知道我的理解对不对?
如果我的理解是正确的话,那么最新版本改成movq后问题bug应该是修复了
有没有大佬出来给个答案 ^o^^o^^o^^o^^o^

@KamenRider-Ming
Copy link

KamenRider-Ming commented Sep 1, 2022

这里感觉回答有点乱,我整理一下:
Bug1:
问题阐述 libco 在处理栈指针上的 movl 4(%esp), %esp 汇编指令,不满足栈顶指针对 16 字节对齐的约定
问题复现:当存在系统中断时,由于不是 16 字节对齐,中断处理函数会以为现在的堆栈是 16 字节对齐的,此时可能会出现一些奔溃,或者意料之外的事情,从而导致程序异常

Bug2:
问题阐述:libco 在浮点单元和信号掩码上没有做保存和恢复
解答:libco 觉得用不到,自己实现的协程库可以实现,性能损耗不大

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants