Skip to content

Commit

Permalink
sched: solve the thread completion/destruction race
Browse files Browse the repository at this point in the history
Our code had a serious bug in thread completion: When a thread complete()s,
i.e., finishes its work, it wake()s the thread doing join() on it,  and
that joiner thread in turn deletes the completed thread and its stack.
On rare occasions, the wake() was very slow but the joiner thread was very
quick in deleting the thread - leading to a crash on return (retq)
from wake() because the stack on which it was running has been deleted.

This patch includes a simple, but effective, fix for this bug:

We add a new per-cpu field, cpu::terminating_thread. complete() no longer
calls unref() itself - as the thread unref()ing itself caused the bug.
Instead, complete() just sets terminating_thread to the current thread.
After the scheduler on this CPU switches to the next thread, we
call unref() on the thread specified in terminating_thread. We know
this is safe because this thread is no longer running.

This fix seems simple and effective (the crashes that were apparent
in tst-wake and the sunflow benchmark seem to be gone, as far as I
can tell). Its biggest downside is an extra "if" on every context switch.
It is possible to devise different solutions, without the cost of the
extra if, but these solutions are more complicated and require a lot
more code changes. I'll add a bug-tracker entry documenting them.
  • Loading branch information
nyh committed Jun 23, 2013
1 parent 6278271 commit 96ee424
Show file tree
Hide file tree
Showing 2 changed files with 18 additions and 19 deletions.
36 changes: 17 additions & 19 deletions core/sched.cc
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,7 @@ class thread::reaper {
cpu::cpu(unsigned _id)
: id(_id)
, idle_thread([this] { idle(); }, thread::attr(this))
, terminating_thread(nullptr)
{
percpu_init(id);
}
Expand Down Expand Up @@ -107,6 +108,10 @@ void cpu::reschedule_from_interrupt(bool preempt)
if (preempt) {
p->_fpu.restore();
}
if (p->_cpu->terminating_thread) {
p->_cpu->terminating_thread->unref();
p->_cpu->terminating_thread = nullptr;
}
}
}

Expand Down Expand Up @@ -370,21 +375,16 @@ void thread::ref()
void thread::unref()
{
if (_ref_counter.fetch_add(-1) == 1) {
// thread can't unref() itself, because if it decides to wake joiner,
// it will delete the stack it is currently running on.
assert(thread::current() != this);

// FIXME: we have a problem in case of a race between join() and the
// thread's completion. Here we can see _joiner==0 and not notify
// anyone, but at the same time join() decided to go to sleep (because
// status is not yet status::terminated) and we'll never wake it.
if (_joiner) {
_joiner->wake_with([&] { _status.store(status::terminated); });
// FIXME: At this point, we've awoken _joiner, which can promptly
// clean this thread up, including deleting the stack we and our
// caller (complete()) is running on. So right now we can be
// running on a deleted stack! This is a serious bug (even if
// rare because requires very specific timing).
// NOTE: This bug can especially be seen happen when wake above
// calls wrmsr (for the ipi) which takes a long time (an exit) and
// in the meantime the joiner deletes our stack, and the return
// from the ipi/wrmsr function (retq instruction) gets a segfault.
} else {
_status.store(status::terminated);
}
Expand Down Expand Up @@ -457,18 +457,16 @@ void thread::complete()
if (_attr.detached) {
_s_reaper->add_zombie(this);
}
// If this thread gets preempted after changing status (to terminating
// here, or to terminated in unref()), it will never be scheduled
// again and never get to wake the joiner. So must disable preemption.
// If this thread gets preempted after changing status it will never be
// scheduled again to set terminating_thread. So must disable preemption.
preempt_disable();
_status.store(status::terminating);
// unref() here will either do the exit (change state to terminated, wake
// joiner thread) now, or if a ref() is blocking us, leave the state
// "terminating" and the matching unref() will complete the termination.
unref();
preempt_enable();
// The thread is now in the "terminating" or "terminated" state, so on
// call to schedule() or on preemption, it will never get to run again.
// We want to run unref() here, but can't because it cause the stack we're
// running on to be deleted. Instead, set a _cpu field telling the next
// thread running on this cpu to do the unref() for us.
_cpu->terminating_thread = this;
// The thread is now in the "terminating" state, so on call to schedule()
// it will never get to run again.
while (true) {
schedule();
}
Expand Down
1 change: 1 addition & 0 deletions include/sched.hh
Original file line number Diff line number Diff line change
Expand Up @@ -295,6 +295,7 @@ struct cpu {
typedef lockless_queue<thread, &thread::_wakeup_link> incoming_wakeup_queue;
cpu_set incoming_wakeups_mask;
incoming_wakeup_queue* incoming_wakeups;
thread* terminating_thread;
static cpu* current();
void init_on_cpu();
void schedule(bool yield = false);
Expand Down

0 comments on commit 96ee424

Please sign in to comment.