Avoid taking open handler mutex from request manager failure. #8130
Observed deadlock:

Thread 1:
- FileTimer::Run holds FileTimer::pMutex
- FileStateHandler::Tick wants to take the FileStateHandler::pMutex

Thread 2:
- FileStateHandler::OnStateError holds the FileStateHandler::pMutex lock
- RequestManager::requestFailure calls OpenHandler::current_source, which wants the OpenHandler::m_mutex

Thread 3:
- OpenHandler::HandleResponseWithHosts holds OpenHandler::m_mutex
- ~FileStateHandler calls FileTimer::UnRegisterFileObject, which tries to get the FileTimer::pMutex

We remove the call to OpenHandler::current_source to break the deadlock. If a file-open is in progress, we cannot take the open handler mutex from within RequestManager::requestFailure.

It is safe to call XrdAdaptor::RequestManager::OpenHandler::open from within the requestFailure callback: if the file-open was in progress, it will return the shared future and not touch Xrootd code. If the file-open was not in progress, it is safe to take the open handler mutex in the first place.
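To make the cycle explicit, here is a minimal sketch (not code from this patch) that models the three threads above as "holds X, wants Y" edges and checks for a cycle in that graph. The mutex names are taken from the description; `LockEdge`, `hasLockCycle`, `observedEdges` and `checkObservedDeadlock` are hypothetical helpers invented for illustration only.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// An edge {held, wanted} means: some thread holds `held` while waiting for `wanted`.
using LockEdge = std::pair<std::string, std::string>;
using LockGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first search; returns true if a cycle is reachable from `node`.
static bool visit(const LockGraph& graph, const std::string& node,
                  std::set<std::string>& onPath, std::set<std::string>& done) {
  if (onPath.count(node)) return true;  // back edge: we came around to a lock already on the path
  if (done.count(node)) return false;   // already fully explored, no cycle through here
  onPath.insert(node);
  auto it = graph.find(node);
  if (it != graph.end()) {
    for (const auto& next : it->second) {
      if (visit(graph, next, onPath, done)) return true;
    }
  }
  onPath.erase(node);
  done.insert(node);
  return false;
}

// True if the "holds -> wants" edges contain a cycle, i.e. a potential deadlock.
bool hasLockCycle(const std::vector<LockEdge>& edges) {
  LockGraph graph;
  for (const auto& e : edges) graph[e.first].push_back(e.second);
  std::set<std::string> onPath, done;
  for (const auto& entry : graph) {
    if (visit(graph, entry.first, onPath, done)) return true;
  }
  return false;
}

// Edges from the observed deadlock. `callCurrentSource` toggles the Thread 2 edge
// that exists only while requestFailure calls OpenHandler::current_source.
std::vector<LockEdge> observedEdges(bool callCurrentSource) {
  std::vector<LockEdge> edges = {
      {"FileTimer::pMutex", "FileStateHandler::pMutex"},  // Thread 1
      {"OpenHandler::m_mutex", "FileTimer::pMutex"},      // Thread 3
  };
  if (callCurrentSource) {
    edges.push_back({"FileStateHandler::pMutex", "OpenHandler::m_mutex"});  // Thread 2
  }
  return edges;
}

// Self-check of the two cases described in the text.
void checkObservedDeadlock() {
  assert(hasLockCycle(observedEdges(true)));    // with current_source: cycle
  assert(!hasLockCycle(observedEdges(false)));  // call removed: no cycle
}
```

With the Thread 2 edge present the three mutexes form a ring; dropping the call to OpenHandler::current_source removes that edge and the ring opens. This only models the three stacks quoted above, not every lock taken in XrdAdaptor.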
@Dr15Jones here's the counterpart to #8129. Still trying to figure out how to completely avoid taking this lock both inside and outside an xrootd callback. I'm at FNAL on Monday/Tuesday; I'll see if I can sneak out of meetings and brainstorm.
A new Pull Request was created by @bbockelm (Brian Bockelman) for CMSSW_7_5_X. Avoid taking open handler mutex from request manager failure. It involves the following packages: Utilities/XrdAdaptor. @cmsbuild, @Dr15Jones, @ktf, @nclopezo can you please review it and eventually sign? Thanks.
Please test
We also need this for 7_4
The tests are being triggered in jenkins.
+1
This pull request is fully signed and it will be integrated in one of the next CMSSW_7_5_X IBs unless changes (tests are also fine). This pull request requires discussion in the ORP meeting before it's merged. @davidlange6, @nclopezo, @ktf, @smuzaffar
+1 |
See also #8129 - that also fixes the deadlock quoted above. However, this patch has the advantage of not trying to acquire OpenHandler::m_mutex when OpenHandler::HandleResponseWithHosts is alive and calling into the Xrootd library.