fix: thread exit improvement for 10.14 and later #76

LouisBrunner · 2023-03-09T13:04:27Z

@paulfloyd Tested with your dummy program and no crashes anymore. Is that the kind of things you had in mind?

paulfloyd · 2023-03-09T20:31:52Z

@paulfloyd Tested with your dummy program and no crashes anymore. Is that the kind of things you had in mind?

I had two ideas, either similar to this (trying to get the right arguments for bsdthread_terminate, plus other platforms make the calls in inline asm).

Or alternatively see what happens if we just restart the thread by calling VG_(scheduler)(tid)

I'll try to explain.

At the point that we're trying to kill the thread we're in this line of code

	_pthread_exit(self, (self->fun)(self->arg));

(line 888 here https://github.com/apple-oss-distributions/libpthread/blob/libpthread-514/src/pthread.c)

The client thread function is pointed to by self->fun, at least conceptually. To get that to work, hijack_thread_state et al will have previously replaced the original self->fun with a pointer to pthread_hijack which when executed calls start_thread_NORETURN which will run the original thread function in the Valgrind scheduler.

If we can continue running the scheduler then it should execute _pthread_exit and clean up and kill the thread.

It may be necessary to store the result of the guest thread function in RAX.

I also suspect that the changes made to the stack in _call_on_new_stack_0_1 mean that that the return address (in the scheduler) will no longer take us back to the call to _pthread_exit

LouisBrunner · 2023-03-10T17:11:47Z

I had two ideas, either similar to this (trying to get the right arguments for bsdthread_terminate, plus other platforms make the calls in inline asm).

This would require reading the internal pthread struct right? (https://github.com/apple-oss-distributions/libpthread/blob/libpthread-514/src/pthread.c#L733) I couldn't find any place where Valgrind currently does that, it might also be a bit inconsistent between macOS (and thus libpthread) versions?

Is there any point in doing the call in assembly?

Or alternatively see what happens if we just restart the thread by calling VG_(scheduler)(tid)

Interesting, I will have a look.

paulfloyd · 2023-03-11T06:38:14Z

I had two ideas, either similar to this (trying to get the right arguments for bsdthread_terminate, plus other platforms make the calls in inline asm).

This would require reading the internal pthread struct right? (https://github.com/apple-oss-distributions/libpthread/blob/libpthread-514/src/pthread.c#L733) I couldn't find any place where Valgrind currently does that, it might also be a bit inconsistent between macOS (and thus libpthread) versions?

I really don't know what the right thing to do is. Not setting these values could cause a per-thread leak. But then we're creating the thread stack in pthread_hijack so maybe they should be 0 after all.

The call in pthread,c is

__bsdthread_terminate((void *)freeaddr, freesize, kport, custom_stack_sema);

We can get the first 3 args from 'self' and 'kport' passed to pthread_hijack.
However I don't see any way to access custom_stack_sema.

Is there any point in doing the call in assembly?

I don't see why it was done, but there must have been some reason, possibly a long time ago and only on Linux copied by all the other ports.

Or alternatively see what happens if we just restart the thread by calling VG_(scheduler)(tid)

Interesting, I will have a look.

asm(
".globl _pthread_hijack_asm\n"
"_pthread_hijack_asm:\n"
"   movq %rsp,%rbp\n"
"   push $0\n"    // alignment pad
"   push %rbp\n"  // original sp
                  // other values stay where they are in registers
"   push $0\n"    // fake return address
"   jmp _pthread_hijack\n"
);

The above looks one-way. A call instead of push/jmp might be needed.

LouisBrunner · 2023-03-16T13:20:32Z

I really don't know what the right thing to do is. Not setting these values could cause a per-thread leak. But then we're creating the thread stack in pthread_hijack so maybe they should be 0 after all.

The call in pthread,c is
__bsdthread_terminate((void *)freeaddr, freesize, kport, custom_stack_sema);
We can get the first 3 args from 'self' and 'kport' passed to pthread_hijack. However I don't see any way to access custom_stack_sema.

I dug into pthread a bit to understand the exact flow with the hijack (been a long time since I looked at it). Turns out, we already execute all the _pthread_exit code, including bsdthread_terminate. The syscall is caught by Valgrind and all clean up is already performed in PRE(bsdthread_terminate). Check the logs with --trace-syscalls=yes:

...
Hello from thread_function, and goodbye
SYSCALL[7427,2](unix:397) ... [async] --> Success(0x0:0x28)
SYSCALL[7427,2](unix:331) __disable_threadsignal(1, 40, 0) --> [pre-success] Success(0x0:0x0)
SYSCALL[7427,2](unix:361) bsdthread_terminate( 0x7000034c3000, 81000, mach_thread_self(), NULL ) --> [pre-success] Success(0x0:0x0)
...

So, all that we have left is to actually call bsdthread_terminate as you suggested but no arguments are required.

Is there any point in doing the call in assembly?

I don't see why it was done, but there must have been some reason, possibly a long time ago and only on Linux copied by all the other ports.

Ok, I looked at other ports, no harm to do it in assembly if it helps avoiding messing with the stack.

Or alternatively see what happens if we just restart the thread by calling VG_(scheduler)(tid)

Interesting, I will have a look.
asm(
".globl _pthread_hijack_asm\n"
"_pthread_hijack_asm:\n"
"   movq %rsp,%rbp\n"
"   push $0\n"    // alignment pad
"   push %rbp\n"  // original sp
                  // other values stay where they are in registers
"   push $0\n"    // fake return address
"   jmp _pthread_hijack\n"
);
The above looks one-way. A call instead of push/jmp might be needed.

As explained above, that code is already running.

In conclusion, this PR should be enough to solve the issue. Do you have any way to test on x86? It's been a long time since I last wrote assembly and I don't have any way to test outside amd64/arm64 macOS 13.

fix: thread exit improvement for 10.14 and later

c4c8533