Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]a dead lock case #170

Closed
a8156268 opened this issue Jul 13, 2021 · 9 comments · Fixed by #179
Closed

[BUG]a dead lock case #170

a8156268 opened this issue Jul 13, 2021 · 9 comments · Fixed by #179
Labels
Milestone

Comments

@a8156268
Copy link

Describe the bug
A dead lock occured. And then cause download blocked.

Dead lock reason:
cleaner thread acquired connectionLock then try acquiring EventBus Lock
collector thread acquired EventBus Lock then try acquiring connectionLock

stack of ConnectionPool Cleaner thread

"6891.bt.net.pool.cleaner" #480 daemon prio=5 os_prio=0 tid=0x000000000af38800 nid=0xd9fb waiting for monitor entry [0x00007f56e4ef2000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at bt.event.EventBus.firePeerDisconnected(EventBus.java:76)
        - waiting to lock <0x00000001330a6c18> (a bt.event.EventBus)
        at bt.net.PeerConnectionPool.purgeConnection(PeerConnectionPool.java:263)
        at bt.net.PeerConnectionPool.access$200(PeerConnectionPool.java:48)
        at bt.net.PeerConnectionPool$Cleaner.lambda$run$0(PeerConnectionPool.java:239)
        at bt.net.PeerConnectionPool$Cleaner$$Lambda$796/158316397.accept(Unknown Source)
        at bt.net.Connections$$Lambda$797/128473136.accept(Unknown Source)
        at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707)
        at bt.net.Connections.visitConnections(PeerConnectionPool.java:332)
        at bt.net.PeerConnectionPool$Cleaner.run(PeerConnectionPool.java:237)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

stack of peer-collector thread

"6891.bt.peer.peer-collector" #482 daemon prio=5 os_prio=0 tid=0x0000000008379800 nid=0xd9fd waiting on condition [0x00007f56e4e6f000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000001330a7e08> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at bt.net.PeerConnectionPool.getConnection(PeerConnectionPool.java:87)
        at bt.net.ConnectionSource.getExistingOrPendingConnection(ConnectionSource.java:169)
        at bt.net.ConnectionSource.getConnectionAsync(ConnectionSource.java:89)
        at bt.torrent.messaging.TorrentWorker.onPeerDiscovered(TorrentWorker.java:415)
        - locked <0x000000012f000ba8> (a bt.torrent.messaging.TorrentWorker)
        at bt.torrent.messaging.TorrentWorker.lambda$new$0(TorrentWorker.java:108)
        at bt.torrent.messaging.TorrentWorker$$Lambda$698/90086491.accept(Unknown Source)
        at bt.event.EventBus.lambda$addListener$3(EventBus.java:246)
        at bt.event.EventBus$$Lambda$599/292545143.accept(Unknown Source)
        at bt.event.EventBus.doFireEvent(EventBus.java:180)
        at bt.event.EventBus.fireEvent(EventBus.java:163)
        at bt.event.EventBus.firePeerDiscovered(EventBus.java:61)
        - locked <0x00000001330a6c18> (a bt.event.EventBus)
        at bt.peer.PeerRegistry.addPeer(PeerRegistry.java:214)
        at bt.peer.PeerRegistry.queryPeerSource(PeerRegistry.java:194)
        at bt.peer.PeerRegistry.lambda$null$2(PeerRegistry.java:137)
        at bt.peer.PeerRegistry$$Lambda$727/1689179593.accept(Unknown Source)
        at java.lang.Iterable.forEach(Iterable.java:75)
        at bt.peer.PeerRegistry.lambda$collectAndVisitPeers$3(PeerRegistry.java:136)
        at bt.peer.PeerRegistry$$Lambda$681/315981216.accept(Unknown Source)
        at java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649)
        at java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
        at bt.peer.PeerRegistry.collectAndVisitPeers(PeerRegistry.java:104)
        at bt.peer.PeerRegistry$$Lambda$679/1839725766.run(Unknown Source)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
@a8156268 a8156268 added the bug label Jul 13, 2021
@a8156268
Copy link
Author

a8156268 commented Jul 13, 2021

Dose EventBus really need a sync lock?

@a8156268
Copy link
Author

data.worker still been blocked on EventBus


"6891.bt.torrent.data.worker-01" #547 daemon prio=5 os_prio=0 tid=0x000000000c879800 nid=0xda3e waiting for monitor entry [0x00007f56e39bf000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at bt.event.EventBus.nextId(EventBus.java:151)
        - waiting to lock <0x00000001330a6c18> (a bt.event.EventBus)
        at bt.event.EventBus.firePieceVerified(EventBus.java:126)
        at bt.torrent.messaging.PieceConsumer.lambda$null$0(PieceConsumer.java:119)
        at bt.torrent.messaging.PieceConsumer$$Lambda$815/1020742529.accept(Unknown Source)
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1595)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

@a8156268
Copy link
Author

Another deadlock:

=============================
"6881.bt.net.pool.connection-worker-49":
  waiting for ownable synchronizer 0x0000000151000040, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "6881.bt.net.pool.cleaner"
"6881.bt.net.pool.cleaner":
  waiting to lock monitor 0x0000000014c5eef8 (object 0x0000000151001fb0, a bt.torrent.messaging.TorrentWorker),
  which is held by "6881.bt.peer.peer-collector"
"6881.bt.peer.peer-collector":
  waiting for ownable synchronizer 0x0000000151000040, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "6881.bt.net.pool.cleaner"
  
===================================================
"6881.bt.net.pool.connection-worker-49":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000151000040> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at bt.net.PeerConnectionPool.addConnectionIfAbsent(PeerConnectionPool.java:131)
        at bt.net.IncomingConnectionListener.lambda$establishConnection$2(IncomingConnectionListener.java:89)
        at bt.net.IncomingConnectionListener$$Lambda$755/135765622.run(Unknown Source)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
		
"6881.bt.net.pool.cleaner":
        at bt.torrent.messaging.TorrentWorker.onPeerDisconnected(TorrentWorker.java:430)
        - waiting to lock <0x0000000151001fb0> (a bt.torrent.messaging.TorrentWorker)
        at bt.torrent.messaging.TorrentWorker.lambda$new$2(TorrentWorker.java:112)
        at bt.torrent.messaging.TorrentWorker$$Lambda$699/1957213216.accept(Unknown Source)
        at bt.event.EventBus.lambda$addListener$3(EventBus.java:248)
        at bt.event.EventBus$$Lambda$599/1570015850.accept(Unknown Source)
        at bt.event.EventBus.doFireEvent(EventBus.java:182)
        at bt.event.EventBus.fireEvent(EventBus.java:165)
        at bt.event.EventBus.firePeerDisconnected(EventBus.java:80)
        at bt.net.PeerConnectionPool.purgeConnection(PeerConnectionPool.java:265)
        at bt.net.PeerConnectionPool.access$200(PeerConnectionPool.java:48)
        at bt.net.PeerConnectionPool$Cleaner.lambda$run$0(PeerConnectionPool.java:241)
        at bt.net.PeerConnectionPool$Cleaner$$Lambda$792/3405922.accept(Unknown Source)
        at bt.net.Connections$$Lambda$793/250877460.accept(Unknown Source)
        at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707)
        at bt.net.Connections.visitConnections(PeerConnectionPool.java:334)
        at bt.net.PeerConnectionPool$Cleaner.run(PeerConnectionPool.java:239)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
		
		
"6881.bt.peer.peer-collector":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000151000040> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at bt.net.PeerConnectionPool.getConnection(PeerConnectionPool.java:87)
        at bt.net.ConnectionSource.getExistingOrPendingConnection(ConnectionSource.java:169)
        at bt.net.ConnectionSource.getConnectionAsync(ConnectionSource.java:89)
        at bt.torrent.messaging.TorrentWorker.onPeerDiscovered(TorrentWorker.java:412)
        - locked <0x0000000151001fb0> (a bt.torrent.messaging.TorrentWorker)
        at bt.torrent.messaging.TorrentWorker.lambda$new$0(TorrentWorker.java:108)
        at bt.torrent.messaging.TorrentWorker$$Lambda$695/415660731.accept(Unknown Source)
        at bt.event.EventBus.lambda$addListener$3(EventBus.java:248)
        at bt.event.EventBus$$Lambda$599/1570015850.accept(Unknown Source)
        at bt.event.EventBus.doFireEvent(EventBus.java:182)
        at bt.event.EventBus.fireEvent(EventBus.java:165)
        at bt.event.EventBus.firePeerDiscovered(EventBus.java:62)
        at bt.peer.PeerRegistry.addPeer(PeerRegistry.java:214)
        at bt.peer.PeerRegistry.queryPeerSource(PeerRegistry.java:194)
        at bt.peer.PeerRegistry.lambda$null$2(PeerRegistry.java:137)
        at bt.peer.PeerRegistry$$Lambda$720/999371603.accept(Unknown Source)
        at java.lang.Iterable.forEach(Iterable.java:75)
        at bt.peer.PeerRegistry.lambda$collectAndVisitPeers$3(PeerRegistry.java:136)
        at bt.peer.PeerRegistry$$Lambda$678/1741590285.accept(Unknown Source)
        at java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649)
        at java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
        at bt.peer.PeerRegistry.collectAndVisitPeers(PeerRegistry.java:104)
        at bt.peer.PeerRegistry$$Lambda$676/355874953.run(Unknown Source)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

@atomashpolskiy
Copy link
Owner

Thanks for reporting! This needs some additional thought. Seems like the culprit here is not the EventBus but rather the TorrentWorker's synchronized methods (onPeerDiscovered and onPeerDisconnected in the cases that you've encountered, with onPeerDisconnected being the only method that is used by the pool cleaner). I'm not sure why these methods were marked synchronized at the moment they were added (61abc06) -- most probably to enforce the hard limit on the number of peers -- but there's a good chance that things that they do can be done in a different way that would not need lock- or monitor-based synchronization. And onPeerDisconnected method in particular looks like it does not need to be synchronized at all...

@a8156268
Copy link
Author

Yes, both EventBus and TorrentWorker have this issue.
Since EventBus also has synchronized methods(firePeerDiscovered and firePeerConnected and firePeerDisconnected) that same with TorrentWorker(onPeerDiscovered and onPeerDisconnected and onPeerDisconnected)

@atomashpolskiy
Copy link
Owner

atomashpolskiy commented Jul 16, 2021 via email

@a8156268
Copy link
Author

EventBus has 8 fireXXX methods, but only the 3 fireXXX methods marked as synchronized.
From git history, the 3 methods is original first 3 methods, and other 5 was added by flowing commits.
That is, events are processed one at a time is not 100% guaranteed.

@atomashpolskiy
Copy link
Owner

atomashpolskiy commented Jul 16, 2021 via email

@pyckle
Copy link
Collaborator

pyckle commented Jul 18, 2021

Please also see the above pull request too. It uses lockfree algorithms to remove synchronization from TorrentWorker while ensuring that the limits set in the configuration are exactly adhered to.

@atomashpolskiy atomashpolskiy added this to the 1.10 milestone Jul 19, 2021
atomashpolskiy added a commit that referenced this issue Jul 20, 2021
Update comment, fix more race conditions related to Bug: #170
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants