Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ResidueDB not thread safe. #905

Closed
timosachsenberg opened this issue Jul 20, 2014 · 15 comments
Closed

ResidueDB not thread safe. #905

timosachsenberg opened this issue Jul 20, 2014 · 15 comments
Labels
defect major critical issues that are important to fix OpenMS - library

Comments

@timosachsenberg
Copy link
Contributor

e.g. parsing of AASequences containing modifications with multiple threads causes segmentation faults.
Reason is that modified residues are implicitly added to the ResidueDB singleton if they don't exist.
#5 0x00007ffff6b95586 in OpenMS::Map<OpenMS::String, OpenMS::Map<OpenMS::String, OpenMS::Residue*> >::operator[](this=0x7678e8, key=...)

at /abi-data/sachsenb/OpenMS_IDE/src/openms/include/OpenMS/DATASTRUCTURES/Map.h:127

#6 0x00007ffff6b92179 in OpenMS::ResidueDB::addResidue_ (this=0x7670b0, r=0x7fffdc004400) at /abi-data/sachsenb/OpenMS_IDE/src/openms/source/CHEMISTRY/ResidueDB.cpp:161
#7 0x00007ffff6b94aae in OpenMS::ResidueDB::getModifiedResidue (this=0x7670b0, residue=0x846070, modification=...)

at /abi-data/sachsenb/OpenMS_IDE/src/openms/source/CHEMISTRY/ResidueDB.cpp:483

#8 0x00007ffff6b5b00a in OpenMS::AASequence::setModification (this=0x7fffdc004ad0, index=16, modification=...)

at /abi-data/sachsenb/OpenMS_IDE/src/openms/source/CHEMISTRY/AASequence.cpp:902

#9 0x00007ffff70a8be9 in OpenMS::ModifiedPeptideGenerator::applyFixedModifications (fixed_mods_begin=..., fixed_mods_end=..., peptide=...)

at /abi-data/sachsenb/OpenMS_IDE/src/openms/source/ANALYSIS/RNPXL/ModifiedPeptideGenerator.cpp:71

#10 0x0000000000497534 in SimpleSearchEngine::main_ (.omp_data_i=0x7fffffffaf40) at /abi-data/sachsenb/OpenMS_IDE/src/utils/SimpleSearchEngine.cpp:582
#11 0x00007ffff400ae3a in gomp_thread_start (xdata=) at ../.././libgomp/team.c:115
#12 0x00000033854079d1 in start_thread () from /lib64/libpthread.so.0
#13 0x00000030bd2e8b6d in clone () from /lib64/libc.so.6

@hroest
Copy link
Contributor

hroest commented Jul 21, 2014

#pragma omp critical(addResidue)
{
residue_mod_names_[*it][*mod_it] = r;
}

actually, I am more worried here about the buildResidueNames_ part, that also seems to change the state of the singleton.

Maybe it would be smarter to wrap the whole 483 addResidue_(res); call with a critical section (that should not happen too often, should it?). Another solution would be to only call const-functions on the singleton object (make getModifiedResidue const and return an error code if not found and then call a non-const function for adding the residue) and use critical on all non-const function calls.

@timosachsenberg
Copy link
Contributor Author

I think for rmost application it is fine to wrap addResidue_().

@cbielow
Copy link
Contributor

cbielow commented Jul 21, 2014

correct me if I'm wrong, but calling a const-function does not guarantee correct behaviour if at the same time a non-const (wrapped in critical or not) is changing the underlying data structure....

@timosachsenberg
Copy link
Contributor Author

yep you are right. e.g. reallocation of a container would be an example

@cbielow
Copy link
Contributor

cbielow commented Jul 21, 2014

there is multiple routes to this (there might be more):

  • make all calls to redidue DB critital (probably a no-go for performance reasons)
  • every thread maintains its own copy (overhead for creating entries.. but probably not severe)
  • create a read-only residiue DB which is 'complete' in the first place and does not need modification (don't know if thats feasible).
  • read-only access to ResDB. If an entry is missing, cancel all threads, modify DB, and resume. Since this only happens a limited number of times, the overhead could be bearable...

@timosachsenberg
Copy link
Contributor Author

my 2 cents (as a omp neewb):

  1. option: no-go as to slow (we basically loose all parallelisation)
  2. option: don't know how this could be done in practice (have a Singleton DefaultResidueDBProvider that creates mutable ResidueDB objects from the xml files and that need to be passed along? Include a mechanism to extend the underlying database in DefaultResidueDBProvider with the newly created entries?)
  3. option: partially feasible if all residues are created based on all existing modifications in unimod (doesn't work for tags etc.)
  4. option: don't know how this could be done if threads are started externally

@hroest
Copy link
Contributor

hroest commented Jul 21, 2014

One common solution that I know is uses widely in the linux kernel are spinlocks. One would need a write lock and each thread would first have to acquire a lock before an operation. There is only one write lock and while one thread is using the lock, no other thread may either read or write. Since this will be only a short amount of time, the thread will simply execute a while(locked) {;} NOOP. Of course, changing the lock needs to be atomic and there is overhead to acquire the lock. See also http://en.wikipedia.org/wiki/Semaphore_%28programming%29

furtunately, OpenMP already supplies some locking to some extend:

even though it does not allow our "many can read, only one can write" scenario. One implementation I found for this was here

This way, it is ensured that no reader is writing while a writer is writing and also that a writer actually gets to write at some point by acquiring one read-lock at a time. I wonder why this is so complicated, it seems that other people must have had the same problem before, but apparently there is no generic solution available as far as I can see.

@cbielow
Copy link
Contributor

cbielow commented Jul 21, 2014

sounds good and the last link seems to provide reasonable code.
This needs some benchmarking though, since I'm not too optimistic about OpenMP performance when it comes to locks. In general, the overhead was rather large from other places I've seen.
However, worth a try.
Also, we might need a second (non omp) version, which we can use if #threads == 1, since overhead of lock/unlock might be rather large (needs testing though).

@timosachsenberg
Copy link
Contributor Author

the problem seems to be real: at least I get errors like:

FATAL: uncaught exception!
last entry in the exception handler:
exception of type Parse Error occured in line 957, function static void OpenMS::AASequence::parseString_(const OpenMS::String&, OpenMS::AASequence&, bool) of /abi-data/sachsenb/OpenMS_IDE/src/openms/source/CHEMISTRY/AASequence.cpp
error message: Cannot convert string to amino acid sequence: unexpected character 'P' in: APL...

occassionally when processing modified peptides with -threads 10

I'm pretty sure that's an iterator invalidated by reallocation.

@aiche
Copy link
Contributor

aiche commented Jul 22, 2014

Timo, could you provide a simple test example producing the error, based on this we could test different solutions. I would assume that readers/writer lock is the most sensible solution, but we would need to implement by hand. Any volunteers?

@hroest
Copy link
Contributor

hroest commented Jul 30, 2014

@timosachsenberg does this only happen in multithreaded settings? are the exceptions caught inside or outside the OpenMP loop?

@timosachsenberg
Copy link
Contributor Author

So far only happens in multithreaded setting. Sorry, did not find time to put together a test. Something like calling AASequence::fromString("TEST[" + String(number) + "]PEPTIDE" in omp parallel for using a range of doubles should reproduce it. It is definitly thrown inside the loop but I have to check where it is caught (I think its the global exception handler but I'm not sure - why is this important?).

@hroest
Copy link
Contributor

hroest commented Jul 30, 2014

its important because catching exceptions outside the OpenMP loops are not supported while catching them inside the loop should (in theory) be fine. However in OpenMS there seem to be issues with multi threaded exception handling due to the singletons we use when doing that. Basically, uou can use exceptions inside parallel regions. But they mustn't leave these parallel regions.

See also https://software.intel.com/en-us/blogs/2009/11/05/openmp-and-exceptions

@hroest
Copy link
Contributor

hroest commented May 4, 2018

Related to #957

@hendrikweisser
Copy link
Contributor

Can this be closed based in #3993 or not yet?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect major critical issues that are important to fix OpenMS - library
Development

Successfully merging a pull request may close this issue.

5 participants