Skip to content
This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

Deadlock when rewriting resources #840

Closed
GoogleCodeExporter opened this issue Apr 6, 2015 · 10 comments
Closed

Deadlock when rewriting resources #840

GoogleCodeExporter opened this issue Apr 6, 2015 · 10 comments

Comments

@GoogleCodeExporter
Copy link

Sometimes under high load I can see rewrites being stuck for 10 seconds. gdb 
gives this backtrace:

#0  pthread_cond_timedwait () at 
../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x0000000000aabc36 in net_instaweb::PthreadCondvar::TimedWait 
(this=0x11456e0, timeout_ms=0) at pagespeed/kernel/thread/pthread_condvar.cc:66
#2  0x000000000066dabe in 
net_instaweb::CheckingThreadSystem::CheckingCondvar::TimedWait (this=0x1145730, 
timeout_ms=10000)
    at pagespeed/kernel/base/checking_thread_system.cc:55
#3  0x0000000000695d64 in net_instaweb::Scheduler::AwaitWakeupUntilUs 
(this=0x11455d0, wakeup_time_us=1385368997267462)
    at pagespeed/kernel/thread/scheduler.cc:382
#4  0x0000000000695e51 in net_instaweb::Scheduler::ProcessAlarms 
(this=0x11455d0, timeout_us=10000000) at 
pagespeed/kernel/thread/scheduler.cc:401
#5  0x0000000000696034 in net_instaweb::SchedulerBlockingFunction::Block 
(this=0x7ffff0b69dd0) at pagespeed/kernel/thread/scheduler.cc:434
#6  0x0000000000540226 in net_instaweb::RewriteDriver::FinishParse 
(this=0x13e66e0) at net/instaweb/rewriter/rewrite_driver.cc:2470

It seems that sometimes Wakeup() can be called before AwaitWakeupUntilUs starts 
waiting on condition variable and thus misses condition variable broadcast and 
stucks waiting for 10 second timeout given at 
SchedulerBlockingFunction::Block().

If I lock mutex in Scheduler::Wakeup before broadcasting on condition variable 
then I can no longer see timeouts, everything works fine.

I'm using custom reverse proxy with PSOL 1.5.28.2. Looking at code I believe 
that issue is still present in trunk.

Original issue reported on code.google.com by mikear...@gmail.com on 25 Nov 2013 at 9:03

@GoogleCodeExporter
Copy link
Author

Thanks; will take a look. If Wakeup() got called early it shouldn't really 
matter, though, since it should evaluate the predicate before sleeping. 

Original comment by morlov...@google.com on 25 Nov 2013 at 3:29

@GoogleCodeExporter
Copy link
Author

This looks (from the backtrace) like it might be 
https://code.google.com/p/modpagespeed/issues/detail?id=759 which was fixed.  
Maks, what do you think?

If so, upgrading should sort the problem out.

Original comment by jmaes...@google.com on 25 Nov 2013 at 3:53

@GoogleCodeExporter
Copy link
Author

Issue 759 was reported by me as well and I have fix backported.

Original comment by mikear...@gmail.com on 25 Nov 2013 at 4:09

@GoogleCodeExporter
Copy link
Author

Likely schedule for the race:
1) Thread 1: Acquire mutex()
2) Thread 1: Read done = false (+memory acquire)
3) Thread 2: Write done = true  (+memory release)
4) Thread 2: Wakeup()
5) Thread 1: Wait()

Locking in Wakeup() of course makes it impossible for it to serialize at (4), 
but I think I would prefer to lock in SchedulerBlockingFunction::Cancel to also 
give atomicity to steps 3 + 4 to make things a bit easier to reason about.

Original comment by morlov...@google.com on 3 Dec 2013 at 6:35

@GoogleCodeExporter
Copy link
Author

Yup, that looks like it.  Agree with your suggested fix, sounds like we're not 
consistent about whether Cancel() is run under mutex() or not and that will 
need to be fixed first.

Original comment by jmaes...@google.com on 3 Dec 2013 at 6:56

@GoogleCodeExporter
Copy link
Author

Subbmitted https://code.google.com/p/modpagespeed/source/detail?r=3670
It's similar to what you suggested, but with an additional change to make sure 
that locking state for callbacks is consistent (w/o it one can deadlock trying 
to 
acquire the same lock twice from the same thread).

Original comment by morlov...@google.com on 9 Dec 2013 at 3:20

@GoogleCodeExporter
Copy link
Author

Original comment by jefftk@google.com on 2 Jan 2014 at 4:19

  • Changed state: Fixed

@GoogleCodeExporter
Copy link
Author

Original comment by jefftk@google.com on 2 Jan 2014 at 5:21

  • Added labels: Milestone-v31

@GoogleCodeExporter
Copy link
Author

Original comment by hui...@google.com on 6 Jan 2014 at 6:25

  • Changed title: Deadlock when rewriting resources

@GoogleCodeExporter
Copy link
Author

Not deadlock, just missed wake up.

Original comment by morlov...@google.com on 7 Jan 2014 at 3:09

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

1 participant