Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG Report] coctx_swap violates the Sys V ABI of i386 and AMD64 (违反Sys V ABI约定) #90

Open
hnes opened this Issue Jun 30, 2018 · 12 comments

Comments

Projects
None yet
4 participants
@hnes
Copy link

hnes commented Jun 30, 2018

Issue的最下方有BUG的详细中文描述,十分感谢。

Stack Pointer

The end of the input argument area shall be aligned on a 16 (32 or 64, if __m256 or __m512 is passed on stack) byte boundary. In other words, the value (%esp + 4) is always a multiple of 16 (32 or 64) when control is transferred to the function entry point. The stack pointer, %esp, always points to the end of the latest allocated stack frame.

— Intel386-psABI-1.1:2.2.2 The Stack Frame

The stack pointer, %rsp, always points to the end of the latest allocated stack frame.

— Sys V ABI AMD64 Version 1.0:3.2.2 The Stack Frame

The ABI states that the (E|R)SP should always point to the end of the latest allocated stack frame. But in file coctx_swap.S of libco, the (E|R)SP had been used to address the memory on the heap.

By default, the signal handler is invoked on the normal process stack. It is possible to arrange that the signal handler uses an alternate stack; see sigalstack(2) for a discussion of how to do this and when it might be useful.

— man 7 signal : Signal dispositions

Terrible things may happend if the (E|R)SP is pointing to the data structure on the heap when signal comes. (Using the breakpoint and signal commands of gdb could produce such bug conveniently. Although by using sigalstack to change the default signal stack could alleviate the problem, but still, that kind of usage of (E|R)SP still violates the ABI.)

Control Words of x87 FPU and MXCSR

The control words of x87 FPU and MXCSR should be preserved across function calls (callee saved) in the Sys V ABI of i386 and AMD64. But in the current implementation of coctx_swap, there is no saving/restoring stuff about them.

hnes added a commit to hnes/libaco that referenced this issue Jul 1, 2018

@hnes

This comment has been minimized.

Copy link
Author

hnes commented Jul 1, 2018

I have created a bug proof in this branch.

To reproduce this bug:

$ git clone -b tencent_libco_bug_report_and_coctx_swap_benchmark https://github.com/hnes/libaco.git
$ cd libaco
$ # require gcc >= 5.0 to progress
$ bash libco_bug_build.sh
$ ls *libco_bug_0
test_libco_bug_0  test_m32_libco_bug_0
$ # use objdump find the address of acosw (coctx_swap) in test_m32_libco_bug_0
08048786 <acosw>: # actually it is indeed coctx_swap, i just changed the name for convenient build
 8048786:       8d 44 24 04             lea    eax,[esp+0x4]
 804878a:       8b 64 24 04             mov    esp,DWORD PTR [esp+0x4]
 804878e:       8d 64 24 20             lea    esp,[esp+0x20]
 8048792:       50                      push   eax
 8048793:       55                      push   ebp
 8048794:       56                      push   esi    # <- breakpoint we would use later
 8048795:       57                      push   edi
 8048796:       52                      push   edx
 8048797:       51                      push   ecx
 8048798:       53                      push   ebx
 8048799:       ff 70 fc                push   DWORD PTR [eax-0x4]
 804879c:       8b 60 04                mov    esp,DWORD PTR [eax+0x4]
 804879f:       58                      pop    eax
 80487a0:       5b                      pop    ebx
 80487a1:       59                      pop    ecx
 80487a2:       5a                      pop    edx
 80487a3:       5f                      pop    edi
 80487a4:       5e                      pop    esi
 80487a5:       5d                      pop    ebp
 80487a6:       5c                      pop    esp
 80487a7:       50                      push   eax
 80487a8:       31 c0                   xor    eax,eax
 80487aa:       c3                      ret
$ gdb ./test_m32_libco_bug_0
...
(gdb) break *0x8048794
Breakpoint 1 at 0x8048794: file coctx_swap.S, line 33.
(gdb) r
Starting program: /mnt/shr/codes/libaco/test_m32_libco_bug_0
main: sp:0xffffd1c8    ####<<<<!!!! main stack pointer

Breakpoint 1, acosw () at coctx_swap.S:33
33              pushl %esi
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.i686
(gdb) bt
#0  acosw () at coctx_swap.S:33
#1  0xffffd1c8 in ?? ()
#2  0xf7e121b3 in __libc_start_main () from /lib/libc.so.6
#3  0xf7ffcfbc in _DYNAMIC () from /lib/ld-linux.so.2
#4  0x00000001 in ?? ()
#5  0x08048693 in main () at test_libco_bug_0.c:61
(gdb) signal SIGINT
Continuing with signal SIGINT.
Received interrupt signal!
signal: sp:0x804b968         ####<<<<!!!! signal stack pointer -> HEAP!!! BUG!!!
signal: co:0x804c088 save_stack:0x804c0d8 share_stack:0xf7bf7000
signal: exit  ####^<<<<!!!! `co` is on the heap

Breakpoint 1, acosw () at coctx_swap.S:33
33              pushl %esi
(gdb)
$ # same amd64 version, i.e. the test_libco_bug_0
@hnes

This comment has been minimized.

Copy link
Author

hnes commented Jul 4, 2018

BUG 1: 栈指针的使用方式违反Sys V ABI约定

The end of the input argument area shall be aligned on a 16 (32 or 64, if __m256 or __m512 is passed on stack) byte boundary. In other words, the value (%esp + 4) is always a multiple of 16 (32 or 64) when control is transferred to the function entry point. The stack pointer, %esp, always points to the end of the latest allocated stack frame.

— Intel386-psABI-1.1:2.2.2 The Stack Frame

The stack pointer, %rsp, always points to the end of the latest allocated stack frame.

— Sys V ABI AMD64 Version 1.0:3.2.2 The Stack Frame

ABI规范中规定用户空间程序的栈指针必须时刻指到运行栈的栈顶,而coctx_swap.S中却使用栈指针直接对位于堆中的数据结构进行寻址内存操作,这违反了ABI约定。

By default, the signal handler is invoked on the normal process stack. It is possible to arrange that the signal handler uses an alternate stack; see sigalstack(2) for a discussion of how to do this and when it might be useful.

— man 7 signal : Signal dispositions

当coctx_swap正在用栈指针对位于堆中的数据结构进行寻址内存操作时,若此时执行线程收到了一个信号,接着内核抢占了该执行线程并开始准备接下来用户空间线程的信号处理执行环境,由于在默认情况下,内核将会选择主栈作为信号处理函数的执行栈,但此时栈已经被指向了堆中(用户空间的程序违反ABI约定在先),那么信号处理函数的执行栈就会被错误的放置到堆中,这样,堆中的数据结构在接下来就极有可能会被破坏。

BUG修复的方法可以参考libaco项目的上下文切换函数acosw实现(libaco acosw的速度是Tencent libco coctx_swap速度的1.72倍)。

BUG 2: x87与MXCSR控制字的保存与恢复违反Sys V ABI约定

在Intel386及AMD64 Sys V ABI约定中,x87与MXCSR控制字是“callee saved”的,但是coctx_swap中并没有相关的保存与恢复实现。

根据libaco性能基准测试报告得到的结论,在上下文切换代码中增加对x87与MXCSR控制字的保存与恢复只会给切换汇编带来约0.87%的性能损耗。

@hnes

This comment has been minimized.

Copy link
Author

hnes commented Jul 4, 2018

libaco这一分支中,有一个关于coctx_swap的bug最小复现实例。

Bug复现过程:

$ git clone -b tencent_libco_bug_report_and_coctx_swap_benchmark https://github.com/hnes/libaco.git
$ cd libaco
$ # require gcc >= 5.0 to progress
$ # libaco在v1.1之后已经没有对gcc版本的限制了
$ bash libco_bug_build.sh
$ ls *libco_bug_0
test_libco_bug_0  test_m32_libco_bug_0
$ # use `objdump` find the address of acosw (coctx_swap) in test_m32_libco_bug_0
08048786 <acosw>: # 实际上它就是`coctx_swap`,这里为了方便构建直接改写成了libaco的切换函数`acosw`
 8048786:       8d 44 24 04             lea    eax,[esp+0x4]
 804878a:       8b 64 24 04             mov    esp,DWORD PTR [esp+0x4]
 804878e:       8d 64 24 20             lea    esp,[esp+0x20]
 8048792:       50                      push   eax
 8048793:       55                      push   ebp
 8048794:       56                      push   esi    # <- 我们选择的断点
 8048795:       57                      push   edi
 8048796:       52                      push   edx
 8048797:       51                      push   ecx
 8048798:       53                      push   ebx
 8048799:       ff 70 fc                push   DWORD PTR [eax-0x4]
 804879c:       8b 60 04                mov    esp,DWORD PTR [eax+0x4]
 804879f:       58                      pop    eax
 80487a0:       5b                      pop    ebx
 80487a1:       59                      pop    ecx
 80487a2:       5a                      pop    edx
 80487a3:       5f                      pop    edi
 80487a4:       5e                      pop    esi
 80487a5:       5d                      pop    ebp
 80487a6:       5c                      pop    esp
 80487a7:       50                      push   eax
 80487a8:       31 c0                   xor    eax,eax
 80487aa:       c3                      ret
$ gdb ./test_m32_libco_bug_0
...
(gdb) break *0x8048794
Breakpoint 1 at 0x8048794: file coctx_swap.S, line 33.
(gdb) r
Starting program: /mnt/shr/codes/libaco/test_m32_libco_bug_0
main: sp:0xffffd1c8    ####<<<<!!!! 默认的线程主栈

Breakpoint 1, acosw () at coctx_swap.S:33
33              pushl %esi
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.i686
(gdb) bt
#0  acosw () at coctx_swap.S:33
#1  0xffffd1c8 in ?? ()
#2  0xf7e121b3 in __libc_start_main () from /lib/libc.so.6
#3  0xf7ffcfbc in _DYNAMIC () from /lib/ld-linux.so.2
#4  0x00000001 in ?? ()
#5  0x08048693 in main () at test_libco_bug_0.c:61
(gdb) signal SIGINT   
Continuing with signal SIGINT.
Received interrupt signal!
signal: sp:0x804b968         ####<<<<!!!! 信号的执行栈被放置到了堆中 BUG复现
signal: co:0x804c088 save_stack:0x804c0d8 share_stack:0xf7bf7000
signal: exit  ####^<<<<!!!! 这个`co`对象是在堆中分配的

Breakpoint 1, acosw () at coctx_swap.S:33
33              pushl %esi
(gdb)
$ # 对于amd64,可以使用与上述相同的方法对test_libco_bug_0进行bug复现,略。

@hnes hnes changed the title [BUG Report] coctx_swap violates the Sys V ABI of i386 and AMD64 [BUG Report] coctx_swap violates the Sys V ABI of i386 and AMD64 (违反Sys V ABI约定) Jul 5, 2018

@yuanzhubi

This comment has been minimized.

Copy link

yuanzhubi commented Jul 20, 2018

人人都知道这个bug啦,微信好像内部版本都修复了似的。

@hnes

This comment has been minimized.

Copy link
Author

hnes commented Jul 20, 2018

您是指刚刚这几天修复的么?还是指很久以前libco(在这个issue以前)就已经修复了呢(只是一直以来都没有放出来)?

十分感谢 ;-)

@yuanzhubi

This comment has been minimized.

Copy link

yuanzhubi commented Jul 20, 2018

2016年听他们分享的时候他们就说他们修了。没想到这个开源版还停留在没修改的状态,不太了解具体情况。

@hnes

This comment has been minimized.

Copy link
Author

hnes commented Jul 20, 2018

十分感谢您提供的信息,对我很有用处 :D

@Ananfa

This comment has been minimized.

Copy link

Ananfa commented Jul 25, 2018

这是我对coctx_swap.S中__x86_64__部分的修改,欢迎指正(手头没有__i386__机器没法试验)

	leaq 8(%rsp), %rax       // rsp + 8 -> rax
	movq %rax, 104(%rdi)
	movq %rbx, 96(%rdi)
	movq -8(%rax), %rdx
	movq %rdx, 72(%rdi)     // ret func addr
	movq %rbp, 48(%rdi)
	movq %r12, 24(%rdi)
	movq %r13, 16(%rdi)
	movq %r14, 8(%rdi)
	movq %r15, 0(%rdi)
	movq %rsi, 64(%rdi)     // param
	movq %rdi, 56(%rdi)     // param
	
	movq 0(%rsi), %r15
	movq 8(%rsi), %r14
	movq 16(%rsi), %r13
	movq 24(%rsi), %r12
	movq 48(%rsi), %rbp
	movq 72(%rsi), %rax     // ret func addr
	movq 96(%rsi), %rbx
	movq 104(%rsi), %rsp
	movq 56(%rsi), %rdi     // param
	movq 64(%rsi), %rsi     // param
	pushq %rax
	
	xorl %eax, %eax
	ret
@hnes

This comment has been minimized.

Copy link
Author

hnes commented Jul 25, 2018

抱歉具体的没有仔细看,但是有两个建议哈:

  1. 为什么不直接使用jmp eax类似的方法呢?
pushq %rax
xorl %eax, %eax
ret
  1. 建议直接引用libaco的汇编切换,或者把腾讯libco的协程实现全部换成libaco,因为libaco我会一直维护它。
@hnes

This comment has been minimized.

Copy link
Author

hnes commented Jul 26, 2018

对了,如果要测试i386,可以使用gcc的-m32选项进行测试。

@Ananfa

This comment has been minimized.

Copy link

Ananfa commented Jul 26, 2018

十分感谢!您的建议很好
已经将

pushq %rax
xorl %eax, %eax
ret

改为

jmp *%rax

另附上__i386__实现

	leal 4(%esp), %edx // sp
	movl 4(%esp), %eax // param1
	movl %edx, 28(%eax)
	movl %ebp, 24(%eax)
	movl %esi, 20(%eax)
	movl %edi, 16(%eax)
	movl %ebx, 4(%eax)
	movl 0(%esp), %edx // ret func addr
	movl %edx, 0(%eax)
	
	movl 8(%esp), %eax // param2
	movl 4(%eax), %ebx
	movl 16(%eax), %edi
	movl 20(%eax), %esi
	movl 24(%eax), %ebp
	movl 28(%eax), %esp
	jmp *(%eax)
@ppLorins

This comment has been minimized.

Copy link

ppLorins commented Dec 23, 2018

@Ananfa
__x86_64__部分的修改,是不是少了对 %rcx %rdx对应的操作,应该是这样吧:

leaq 8(%rsp), %rax       // rsp + 8 -> rax
movq %rax, 104(%rdi)
movq %rbx, 96(%rdi)
movq %rcx, 88(%rdi)  # 缺少了 %rcx的拷贝?
movq %rdx, 80(%rdi)  # 缺少了 %rdx的拷贝?
movq -8(%rax), %rdx
movq %rdx, 72(%rdi)     // ret func addr		
movq %rbp, 48(%rdi)
movq %r12, 24(%rdi)
movq %r13, 16(%rdi)
movq %r14, 8(%rdi)
movq %r15, 0(%rdi)
movq %rsi, 64(%rdi)     // param
movq %rdi, 56(%rdi)     // param


movq 0(%rsi), %r15
movq 8(%rsi), %r14
movq 16(%rsi), %r13
movq 24(%rsi), %r12
movq 48(%rsi), %rbp
movq 72(%rsi), %rax     // ret func addr
movq 80(%rsi), %rdx  # 缺少了 %rcx的恢复?
movq 88(%rsi), %rcx  # 缺少了 %rcx的恢复?
movq 96(%rsi), %rbx
movq 104(%rsi), %rsp
movq 56(%rsi), %rdi     // param
movq 64(%rsi), %rsi     // param
pushq %rax

xorl %eax, %eax
ret

另外,__i386__的实现:

movl 0(%esp), %edx // ret func addr
movl %edx, 0(%eax)

能直接这样吗:

movl 0(%esp), 0(%eax) // ret func addr

对汇编不是很懂,如有错误,麻烦指正,谢谢。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.