New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup race in SchedulerBlockingFunction::Block() and ::Cancel() #759

Closed
GoogleCodeExporter opened this Issue Apr 6, 2015 · 3 comments

Comments

Projects
None yet
1 participant
@GoogleCodeExporter
Copy link

GoogleCodeExporter commented Apr 6, 2015

From mikearrgh@gmail.com:

Recently we got strange crash while load testing our reverse proxy with 
integrated PSOL 1.5.28.2. So far it have happened only once and I'm not sure 
whether it's pagespeed to blame or some memory corruption in our proxy.

(gdb) bt
#0  0x00007fdee1f77037 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fdee1f7a698 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fdee1f6fe03 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007fdee1f6feb2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00000000004ce9ad in 
scoped_ptr<net_instaweb::ThreadSystem::Condvar>::operator-> 
(this=0x7fdee541f080 <_rtld_global+32>)
    at pagespeed/src/third_party/chromium/src/base/memory/scoped_ptr.h:201
#5  0x00000000006c4ff6 in net_instaweb::Scheduler::Wakeup (this=0x7fdee541f060 
<_rtld_global>) at pagespeed/kernel/thread/scheduler.cc:387
#6  0x00000000006c525d in net_instaweb::SchedulerBlockingFunction::Cancel 
(this=0x7fded77fbca0) at pagespeed/kernel/thread/scheduler.cc:427
#7  0x00000000006c5229 in net_instaweb::SchedulerBlockingFunction::Run 
(this=0x7fded77fbca0) at pagespeed/kernel/thread/scheduler.cc:422
#8  0x000000000069c70c in net_instaweb::Function::CallRun (this=0x7fded77fbca0) 
at pagespeed/kernel/base/function.cc:51
#9  0x00000000005643db in net_instaweb::RewriteDriver::FlushAsyncDone 
(this=0x7fdecc7cce00, num_rewrites=31, callback=0x7fded77fbca0) at 
net/instaweb/rewriter/rewrite_driver.cc:709
#10 0x000000000057cae6 in 
net_instaweb::MemberFunction2<net_instaweb::RewriteDriver, int, 
net_instaweb::Function*>::Run (this=0x7fded4081150) at 
./pagespeed/kernel/base/function.h:202
#11 0x000000000069c70c in net_instaweb::Function::CallRun (this=0x7fded4081150) 
at pagespeed/kernel/base/function.cc:51
#12 0x00000000006bcfc9 in net_instaweb::QueuedWorkerPool::Run 
(this=0x7fdedd065000, sequence=0x7fded2ca9300, worker=0x7fded047ac00) at 
pagespeed/kernel/thread/queued_worker_pool.cc:156
#13 0x00000000006c4214 in 
net_instaweb::MemberFunction2<net_instaweb::QueuedWorkerPool, 
net_instaweb::QueuedWorkerPool::Sequence*, net_instaweb::QueuedWorker*>::Run 
(this=0x7fded407fce0)
    at ./pagespeed/kernel/base/function.h:202
#14 0x000000000069c70c in net_instaweb::Function::CallRun (this=0x7fded407fce0) 
at pagespeed/kernel/base/function.cc:51
#15 0x00000000006c9a2c in net_instaweb::Worker::WorkThread::Run 
(this=0x7fded509f0c0) at pagespeed/kernel/thread/worker.cc:84
#16 0x00000000006bc5a3 in net_instaweb::PthreadThreadImpl::InvokeRun 
(self_ptr=0x7fded047c3d0) at 
pagespeed/kernel/thread/pthread_thread_system.cc:122
#17 0x00007fdee42a3f8e in start_thread () from 
/lib/x86_64-linux-gnu/libpthread.so.0
#18 0x00007fdee2039e1d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Backtrace shows that crash happened because of invalid Scheduler pointer:

(gdb) f 6
#6  0x00000000006c525d in net_instaweb::SchedulerBlockingFunction::Cancel 
(this=0x7fded77fbca0) at pagespeed/kernel/thread/scheduler.cc:427
427      scheduler_->Wakeup();
(gdb) p *this
$2 = {<net_instaweb::Function> = {_vptr.Function = 0x7fdedd0321c0, 
quit_requested_ = 0x7fded77fbce0, run_called_ = false, cancel_called_ = false, 
delete_after_callback_ = false}, 
  scheduler_ = 0x7fdee541f060 <_rtld_global>, success_ = false, done_ = {value_ = 652166099533665536}}

Looks like SchedulerBlockingFunction object was freed.

Looking at pagespeed code I can see that RewriteDriver::FlushAsyncDone is 
queued from RewriteDriver::FlushAsync and SchedulerBlockingFunction must stay 
alive until FlushAsyncDone is finished.

I've checked all the threads in core dump and there is no FlushAsync in 
progress with corresponding SchedulerBlockingFunction.

I think that SchedulerBlockingFunction was destroyed before FlushAsyncDone was 
finished.

In scheduler.cc I can see next functions:

bool SchedulerBlockingFunction::Block() {
  ScopedMutex lock(scheduler_->mutex());
  while (!done_.value()) {
    scheduler_->ProcessAlarms(10 * Timer::kSecondUs);
  }
  return success_;
}

and 

void SchedulerBlockingFunction::Cancel() {
  done_.set_value(true);
  scheduler_->Wakeup();
}

My understanding is that at some very rare event (some OS thread scheduler 
problem? or hardware?) Block can exit and SchedulerBlockingFunction can be 
destroyed before Cancel takes scheduler_ value. In that case it is possible 
that scheduler_ value is already garbage, since SchedulerBlockingFuntion was 
allocated on stack. 

I find this case very unlikely, but still possible in theory. I guess 
SchedulerBlockingFunction should provide correct synchronization mechanism and 
not rely on atomic boolean. Or perhaps scheduler pointer should be stored on 
stack before setting done_ to true (in SchedulerBlockingFunction::Cancel), thus 
avoiding access to possibly destroyed SchedulerBlockingFunction object.

Original issue reported on code.google.com by jmaes...@google.com on 31 Jul 2013 at 2:50

@GoogleCodeExporter

This comment has been minimized.

Copy link

GoogleCodeExporter commented Apr 6, 2015

Original comment by jmara...@google.com on 12 Aug 2013 at 6:58

  • Changed state: Fixed
@GoogleCodeExporter

This comment has been minimized.

Copy link

GoogleCodeExporter commented Apr 6, 2015

Original comment by jmara...@google.com on 12 Aug 2013 at 6:58

  • Added labels: Milestone-v30, release-note
@GoogleCodeExporter

This comment has been minimized.

Copy link

GoogleCodeExporter commented Apr 6, 2015

Was fixed in https://code.google.com/p/modpagespeed/source/detail?r=3397

Original comment by jmara...@google.com on 12 Aug 2013 at 6:59

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment