Worker threads get stuck waiting for Semaphore #348

Closed
MichaelRoeder opened this issue Apr 24, 2020 · 1 comment

Error description

All worker threads seem to be stuck while preparing the datasets or doing some post processing of the received annotator results.

The stack traces of the workers:

eTConfig("XXX","ACE2004","A2KB","WEAK_ANNOTATION_MATCH")
state=WAITING
progress=null
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.dataset.SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:47)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getDataset(AbstractDatasetConfiguration.java:50)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:104)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("XXX","ACE2004","A2KB","WEAK_ANNOTATION_MATCH")
state=WAITING
progress=null
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.dataset.SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:47)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getDataset(AbstractDatasetConfiguration.java:50)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:104)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("XXX","ACE2004","A2KB","WEAK_ANNOTATION_MATCH")
state=WAITING
progress=null
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.dataset.SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:47)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getDataset(AbstractDatasetConfiguration.java:50)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:104)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("A-1 (NIF WS)","D-1 (uploaded)","A2KB","WEAK_ANNOTATION_MATCH")
state=WAITING
progress=100.0% of dataset
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.semantic.sameas.impl.cache.FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115)
org.aksw.gerbil.semantic.sameas.impl.AbstractSameAsRetrieverDecorator.addSameURIs(AbstractSameAsRetrieverDecorator.java:43)
org.aksw.gerbil.semantic.sameas.SameAsRetrieverUtils.addSameURIsToMeanings(SameAsRetrieverUtils.java:50)
org.aksw.gerbil.execute.ExperimentTask.prepareAnnotatorResults(ExperimentTask.java:229)
org.aksw.gerbil.execute.ExperimentTask.runExperiment(ExperimentTask.java:330)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:143)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("XXX","ACE2004","A2KB","WEAK_ANNOTATION_MATCH")
state=WAITING
progress=null
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.dataset.SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:47)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getDataset(AbstractDatasetConfiguration.java:50)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:104)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("A-2 (NIF WS)","D-2 (uploaded)","RE","STRONG_ENTITY_MATCH")
state=WAITING
progress=null
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.semantic.sameas.impl.cache.FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115)
org.aksw.gerbil.semantic.sameas.impl.AbstractSameAsRetrieverDecorator.addSameURIs(AbstractSameAsRetrieverDecorator.java:43)
org.aksw.gerbil.semantic.sameas.SameAsRetrieverUtils.addSameURIsToMarkings(SameAsRetrieverUtils.java:31)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getPreparedDataset(AbstractDatasetConfiguration.java:75)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getDataset(AbstractDatasetConfiguration.java:50)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:104)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("XXX","ACE2004","A2KB","WEAK_ANNOTATION_MATCH")
state=WAITING
progress=null
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.dataset.SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:47)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getDataset(AbstractDatasetConfiguration.java:50)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:104)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("A-1 (NIF WS)","D-3 (uploaded)","A2KB","WEAK_ANNOTATION_MATCH")
state=WAITING
progress=null
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.semantic.sameas.impl.cache.FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115)
org.aksw.gerbil.semantic.sameas.impl.AbstractSameAsRetrieverDecorator.addSameURIs(AbstractSameAsRetrieverDecorator.java:43)
org.aksw.gerbil.semantic.sameas.SameAsRetrieverUtils.addSameURIsToMarkings(SameAsRetrieverUtils.java:31)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getPreparedDataset(AbstractDatasetConfiguration.java:75)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getDataset(AbstractDatasetConfiguration.java:50)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:104)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("A-3 (NIF WS)","D-2 (uploaded)","A2KB","WEAK_ANNOTATION_MATCH")
state=WAITING
progress=100.0% of dataset
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.semantic.sameas.impl.cache.FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115)
org.aksw.gerbil.semantic.sameas.impl.AbstractSameAsRetrieverDecorator.addSameURIs(AbstractSameAsRetrieverDecorator.java:43)
org.aksw.gerbil.semantic.sameas.SameAsRetrieverUtils.addSameURIsToMeanings(SameAsRetrieverUtils.java:50)
org.aksw.gerbil.execute.ExperimentTask.prepareAnnotatorResults(ExperimentTask.java:229)
org.aksw.gerbil.execute.ExperimentTask.runExperiment(ExperimentTask.java:330)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:143)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("XXX","ACE2004","A2KB","WEAK_ANNOTATION_MATCH")
state=WAITING
progress=null
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.semantic.sameas.impl.cache.FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115)
org.aksw.gerbil.semantic.sameas.impl.AbstractSameAsRetrieverDecorator.addSameURIs(AbstractSameAsRetrieverDecorator.java:43)
org.aksw.gerbil.semantic.sameas.SameAsRetrieverUtils.addSameURIsToMarkings(SameAsRetrieverUtils.java:31)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getPreparedDataset(AbstractDatasetConfiguration.java:75)
org.aksw.gerbil.dataset.SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:50)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getDataset(AbstractDatasetConfiguration.java:50)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:104)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("A-3 (NIF WS)","D-4 (uploaded)","A2KB","WEAK_ANNOTATION_MATCH")
state=WAITING
progress=100.0% of dataset
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.semantic.sameas.impl.cache.FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115)
org.aksw.gerbil.semantic.sameas.impl.AbstractSameAsRetrieverDecorator.addSameURIs(AbstractSameAsRetrieverDecorator.java:43)
org.aksw.gerbil.semantic.sameas.SameAsRetrieverUtils.addSameURIsToMeanings(SameAsRetrieverUtils.java:50)
org.aksw.gerbil.execute.ExperimentTask.prepareAnnotatorResults(ExperimentTask.java:229)
org.aksw.gerbil.execute.ExperimentTask.runExperiment(ExperimentTask.java:330)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:143)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("A-1 (NIF WS)","D-4 (uploaded)","A2KB","WEAK_ANNOTATION_MATCH")
state=WAITING
progress=100.0% of dataset
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.semantic.sameas.impl.cache.FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115)
org.aksw.gerbil.semantic.sameas.impl.AbstractSameAsRetrieverDecorator.addSameURIs(AbstractSameAsRetrieverDecorator.java:43)
org.aksw.gerbil.semantic.sameas.SameAsRetrieverUtils.addSameURIsToMeanings(SameAsRetrieverUtils.java:50)
org.aksw.gerbil.execute.ExperimentTask.prepareAnnotatorResults(ExperimentTask.java:229)
org.aksw.gerbil.execute.ExperimentTask.runExperiment(ExperimentTask.java:330)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:143)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

Summary of stack traces

  • XXX is one of the built-in annotators
  • A-x is a NIF-based webservice where x denotes an ID
  • D-x is an uploaded dataset

| Thread ID | Annotator | Dataset | Progress | Pos. of thread |
|---|---|---|---|---|
| 1 | XXX | ACE2004 | null | SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:47) |
| 2 | XXX | ACE2004 | null | SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:47) |
| 3 | XXX | ACE2004 | null | SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:47) |
| 4 | A-1 | D-1 | 100% | FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115) |
| 5 | XXX | ACE2004 | null | SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:47) |
| 6 | A-2 | D-2 | null | FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115) |
| 7 | XXX | ACE2004 | null | SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:47) |
| 8 | A-1 | D-3 | null | FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115) |
| 9 | A-3 | D-2 | 100% | FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115) |
| 10 | XXX | ACE2004 | null | FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115) |
| 11 | A-3 | D-4 | 100% | FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115) |
| 12 | A-1 | D-4 | 100% | FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115) |

It looks like threads 1, 2, 3, 5 and 7 are waiting for thread 10 to finish the initialisation of the ACE2004 dataset. This seems to be fine. However, threads 4, 6, 8, 9, 10, 11 and 12 are all waiting to get access to the FileBasedCachingSameAsRetriever, and it is unclear which thread failed to release the Semaphore earlier.

There is no Exception in the logs that seems to be related to that. GERBIL is configured to use 12 worker threads and all of them are still alive - so no thread crashed.

Proposed solution

It should be checked whether the use of the Semaphore class is really necessary. Especially within the FileBasedCachingSameAsRetriever, other, safer synchronization mechanisms could be useful.

@MichaelRoeder

The solution proposed above does not work in this case, since the class makes use of two Semaphores:

    private static final int MAX_CONCURRENT_READERS = 1000;

    private Semaphore cacheReadMutex = new Semaphore(MAX_CONCURRENT_READERS);
    private Semaphore cacheWriteMutex = new Semaphore(1);

In line 115, the cacheReadMutex is acquired and later on, in line 169, released.

        try {
            cacheReadMutex.acquire();
        } catch (InterruptedException e) {
            LOGGER.error("Exception while waiting for read mutex. Returning null.", e);
            return null;
        }
        ...
        cacheReadMutex.release();
        return result;

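The issue is caused by this part of the code not being covered by a try-finally construct. Whenever the code between acquire() and release() exits early, for example through an exception, release() is skipped and the permit is never returned. Because of that, the 1000 available read permits were gradually lost while GERBIL was running. As soon as all of them were lost, the service got stuck.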
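The leak can be reproduced in isolation. The following is only a minimal sketch, not GERBIL code: the class PermitLeakDemo, its methods, the permit count of 3 and the simulated IllegalStateException are all made up for illustration. It assumes that the elided work between acquire() and release() can throw an unchecked exception.

```java
import java.util.concurrent.Semaphore;

class PermitLeakDemo {
    // Hypothetical, deliberately small permit count (the issue used 1000).
    static final int PERMITS = 3;

    // Same shape as the excerpt above: acquire, work, release, no finally.
    static String leakyRead(Semaphore readMutex, boolean fail) {
        try {
            readMutex.acquire();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        if (fail) {
            // Stands in for any unchecked exception thrown by the elided work.
            throw new IllegalStateException("simulated failure");
        }
        readMutex.release();
        return "result";
    }

    // Runs the leaky read the given number of times and reports what is left.
    static int permitsAfterFailures(int failures) {
        Semaphore readMutex = new Semaphore(PERMITS);
        for (int i = 0; i < failures; i++) {
            try {
                leakyRead(readMutex, true);
            } catch (IllegalStateException e) {
                // The caller recovers, but the permit was never released.
            }
        }
        return readMutex.availablePermits();
    }

    public static void main(String[] args) {
        System.out.println(permitsAfterFailures(2)); // prints 1
        System.out.println(permitsAfterFailures(3)); // prints 0
    }
}
```

Once the count reaches 0, the next acquire() parks the calling thread indefinitely, which matches the WAITING state in the stack traces above.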

Proposed solution

  • Fix this part of the class using try-finally.
  • Go through all other classes using Semaphores and check them for the same issue.
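
A minimal sketch of the try-finally pattern from the first point is shown below. It is not the actual commit: the class SafeReadDemo, the lookup() helper and availableReadPermits() are hypothetical stand-ins, and only the field name cacheReadMutex and the constant MAX_CONCURRENT_READERS are taken from the excerpt above.

```java
import java.util.concurrent.Semaphore;

class SafeReadDemo {
    static final int MAX_CONCURRENT_READERS = 1000;

    private final Semaphore cacheReadMutex = new Semaphore(MAX_CONCURRENT_READERS);

    // Hypothetical stand-in for the cache lookup; it may throw.
    String lookup(String uri) {
        if (uri == null) {
            throw new IllegalArgumentException("uri must not be null");
        }
        return "sameAs(" + uri + ")";
    }

    String retrieve(String uri) {
        try {
            cacheReadMutex.acquire();
        } catch (InterruptedException e) {
            // No permit was obtained, so there is nothing to release here.
            Thread.currentThread().interrupt();
            return null;
        }
        try {
            return lookup(uri);
        } finally {
            // Runs on normal return and on any exception thrown by lookup().
            cacheReadMutex.release();
        }
    }

    int availableReadPermits() {
        return cacheReadMutex.availablePermits();
    }
}
```

The acquire() call stays outside the guarded try block on purpose: if it is interrupted, no permit was taken, and releasing one in finally would add a permit that was never acquired.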

MichaelRoeder added a commit that referenced this issue May 1, 2020
Fixed #348 by ensuring that semaphore permits are released.