Multisource xrootd #558
if (m_parent1 && m_parent2)
{
    timespec stop;
    clock_gettime(CLOCK_MONOTONIC, &stop);
Did you test this on mac? IIRC it does not work.
Good catch!
- What should we do on Mac OS X instead of the monotonic clock?
- How do I get started with Mac OS X development in CMSSW?
My plan would be to fix the compilation now, but wait for testing until a pre-release (as this requires a new version of Xrootd too).
Forgot to mention -- this requires Xrootd 3.3.3. If you want to test it (from CERN), you can do:
scram setup /afs/cern.ch/cms/slc5_amd64_gcc472/external/xrootd-toolfile/1.0-cms6/etc/scram.d/xrootd.xml
Do you really need those metrics on mac? If not, I would simply ifdef them out. If yes, you can probably have a look at: http://stackoverflow.com/questions/11680461/monotonic-clock-on-osx
I would prefer to have the timing metrics (and, in general, have this work as well as on Linux) -- I don't want to cripple Mac OS X support.
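For reference, one way to keep the timing metrics on both platforms is to hide the clock behind a small helper. The following is only a minimal sketch of the approach discussed in the linked Stack Overflow question, not the code that was committed in this pull request; the helper names `monotonicNanos` and `elapsedMs` are hypothetical. It assumes `clock_gettime(CLOCK_MONOTONIC, ...)` is available on Linux and falls back to `mach_absolute_time()` on OS X, where `CLOCK_MONOTONIC` is not declared.

```cpp
#include <cstdint>
#include <time.h>
#ifdef __APPLE__
#include <mach/mach_time.h>
#endif

// Hypothetical helper (not from the PR): monotonic timestamp in nanoseconds.
inline std::uint64_t monotonicNanos() {
#ifdef __APPLE__
  // OS X of this era has no clock_gettime(); mach_absolute_time() returns
  // ticks that are converted to nanoseconds with the timebase ratio.
  static mach_timebase_info_data_t info = {0, 0};
  if (info.denom == 0) {
    mach_timebase_info(&info);
  }
  return mach_absolute_time() * info.numer / info.denom;
#else
  timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return static_cast<std::uint64_t>(ts.tv_sec) * 1000000000ULL +
         static_cast<std::uint64_t>(ts.tv_nsec);
#endif
}

// Hypothetical helper: elapsed time between two timestamps, in milliseconds.
inline double elapsedMs(std::uint64_t startNs, std::uint64_t stopNs) {
  return static_cast<double>(stopNs - startNs) / 1e6;
}
```

A caller would take `monotonicNanos()` before and after a request and pass both values to `elapsedMs`. Keeping the `#ifdef` in one place avoids scattering platform checks over every call site that currently uses `clock_gettime` directly.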
Ok. To get mac support the easiest thing is to use the usual recipe to install it https://twiki.cern.ch/twiki/bin/view/CMSPublic/SDTCMSSW_aptinstaller Just use a mac architecture. You can also use
@nclopezo can you test and compile (also on osx108, please).
Hi,
When I ran the RelVals I got the following error in workflow 4.22, step2:
globaltag = PRE_62_V8::All 271 DQMStore::DQMStore
22-Aug-2013 11:18:44 CEST Initiating request to open file root://eoscms//eos/cms/store/data/Run2011A/Cosmics/RAW/v1/000/160/960/049F6443-8E53-E011-A943-003048F117EA.root?svcClass=default
[2013-08-22 11:18:44 +0200][Error ][XRootD ] [lxfsrf49c01.cern.ch:1095] Handling error while processing : [ERROR] Error response.
[2013-08-22 11:18:44 +0200][Error ][File ] [0x638f7140@root://eoscms//eos/cms/store/data/Run2011A/Cosmics/RAW/v1/000/160/960/049F6443-8E53-E011-A943-003048F117EA.root?svcClass=default] Fatal file state error. Message returned with [ERROR] Server responded with an error: [3011] Unable to stat file /eos/cms/store/data/Run2011A/Cosmics/RAW/v1/000/160/960/049F6443-8E53-E011-A943-003048F117EA.root; No such file or directory
22-Aug-2013 11:18:45 CEST Fallback request to file root://xrootd.ba.infn.it//store/data/Run2011A/Cosmics/RAW/v1/000/160/960/049F6443-8E53-E011-A943-003048F117EA.root
XrdSec: No authentication protocols are available.
[2013-08-22 11:18:45 +0200][Error ][XRootDTransport ] [xrootd.ba.infn.it:1094 #0.0] No protocols left to try
[2013-08-22 11:18:45 +0200][Error ][AsyncSock ] [xrootd.ba.infn.it:1094 #0.0] Socket error while handshaking: [FATAL] Auth failed
[2013-08-22 11:18:45 +0200][Error ][PostMaster ] [xrootd.ba.infn.it:1094 #0] Unable to recover: [FATAL] Auth failed.
[2013-08-22 11:18:45 +0200][Error ][XRootD ] [xrootd.ba.infn.it:1094] Impossible to send message . Trying to recover.
[2013-08-22 11:18:45 +0200][Error ][XRootD ] [xrootd.ba.infn.it:1094] Handling error while processing : [FATAL] Auth failed.
----- Begin Fatal Exception 22-Aug-2013 11:18:45 CEST-----------------------
An exception of category 'FallbackFileOpenError' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing input source of type PoolSource
   [2] Calling RootInputFileSequence::initFile()
   [3] Calling StorageFactory::open()
   [4] Calling XrdFile::open()
Exception Message:
Failed to open the file 'root://xrootd.ba.infn.it//store/data/Run2011A/Cosmics/RAW/v1/000/160/960/049F6443-8E53-E011-A943-003048F117EA.root'
   Additional Info:
      [a] Input file root://eoscms//eos/cms/store/data/Run2011A/Cosmics/RAW/v1/000/160/960/049F6443-8E53-E011-A943-003048F117EA.root?svcClass=default could not be opened. Fallback Input file root://xrootd.ba.infn.it//store/data/Run2011A/Cosmics/RAW/v1/000/160/960/049F6443-8E53-E011-A943-003048F117EA.root also could not be opened.
      [b] XrdCl::File::Open(name='root://xrootd.ba.infn.it//store/data/Run2011A/Cosmics/RAW/v1/000/160/960/049F6443-8E53-E011-A943-003048F117EA.root', flags=0x10, permissions=0660) => error '[FATAL] Auth failed' (errno=0, code=204)
----- End Fatal Exception -------------------------------------------------
Hi,
And also, I tried to build a CMSSW_7_0_XROOTD_X_2013-08-22-0200 for osx108, which is CMSSW_7_0_X_2013-08-22-0200 plus this pull request. When I was building I got the following error:
/Volumes/build1/dmendezl/tmp/BUILDROOT/e81c0e286d98ac393fff5d6ad19197f6/opt/cmssw/osx108_amd64_gcc472/cms/cmssw/CMSSW_7_0_XROOTD_X_2013-08-22-0200/src/Utilities/XrdAdaptor/src/XrdRequestManager.cc: In constructor 'XrdAdaptor::RequestManager::RequestManager(const string&, XrdCl::OpenFlags::Flags, XrdCl::Access::Mode)':
/Volumes/build1/dmendezl/tmp/BUILDROOT/e81c0e286d98ac393fff5d6ad19197f6/opt/cmssw/osx108_amd64_gcc472/cms/cmssw/CMSSW_7_0_XROOTD_X_2013-08-22-0200/src/Utilities/XrdAdaptor/src/XrdRequestManager.cc:62:17: error: 'CLOCK_MONOTONIC' was not declared in this scope
/Volumes/build1/dmendezl/tmp/BUILDROOT/e81c0e286d98ac393fff5d6ad19197f6/opt/cmssw/osx108_amd64_gcc472/cms/cmssw/CMSSW_7_0_XROOTD_X_2013-08-22-0200/src/Utilities/XrdAdaptor/src/XrdRequestManager.cc:62:37: error: 'clock_gettime' was not declared in this scope
/Volumes/build1/dmendezl/tmp/BUILDROOT/e81c0e286d98ac393fff5d6ad19197f6/opt/cmssw/osx108_amd64_gcc472/cms/cmssw/CMSSW_7_0_XROOTD_X_2013-08-22-0200/src/Utilities/XrdAdaptor/src/XrdRequestManager.cc: In member function 'std::future XrdAdaptor::RequestManager::handle(std::shared_ptr)':
/Volumes/build1/dmendezl/tmp/BUILDROOT/e81c0e286d98ac393fff5d6ad19197f6/opt/cmssw/osx108_amd64_gcc472/cms/cmssw/CMSSW_7_0_XROOTD_X_2013-08-22-0200/src/Utilities/XrdAdaptor/src/XrdRequestManager.cc:230:17: error: 'CLOCK_MONOTONIC' was not declared in this scope
/Volumes/build1/dmendezl/tmp/BUILDROOT/e81c0e286d98ac393fff5d6ad19197f6/opt/cmssw/osx108_amd64_gcc472/cms/cmssw/CMSSW_7_0_XROOTD_X_2013-08-22-0200/src/Utilities/XrdAdaptor/src/XrdRequestManager.cc:230:38: error: 'clock_gettime' was not declared in this scope
/Volumes/build1/dmendezl/tmp/BUILDROOT/e81c0e286d98ac393fff5d6ad19197f6/opt/cmssw/osx108_amd64_gcc472/cms/cmssw/CMSSW_7_0_XROOTD_X_2013-08-22-0200/src/Utilities/XrdAdaptor/src/XrdRequestManager.cc: In member function 'std::future XrdAdaptor::RequestManager::handle(std::shared_ptr >)':
/Volumes/build1/dmendezl/tmp/BUILDROOT/e81c0e286d98ac393fff5d6ad19197f6/opt/cmssw/osx108_amd64_gcc472/cms/cmssw/CMSSW_7_0_XROOTD_X_2013-08-22-0200/src/Utilities/XrdAdaptor/src/XrdRequestManager.cc:318:19: error: 'CLOCK_MONOTONIC' was not declared in this scope
/Volumes/build1/dmendezl/tmp/BUILDROOT/e81c0e286d98ac393fff5d6ad19197f6/opt/cmssw/osx108_amd64_gcc472/cms/cmssw/CMSSW_7_0_XROOTD_X_2013-08-22-0200/src/Utilities/XrdAdaptor/src/XrdRequestManager.cc:318:40: error: 'clock_gettime' was not declared in this scope
/Volumes/build1/dmendezl/tmp/BUILDROOT/e81c0e286d98ac393fff5d6ad19197f6/opt/cmssw/osx108_amd64_gcc472/cms/cmssw/CMSSW_7_0_XROOTD_X_2013-08-22-0200/src/Utilities/XrdAdaptor/src/XrdRequestManager.cc: In member function 'void XrdAdaptor::RequestManager::requestFailure(std::shared_ptr)':
/Volumes/build1/dmendezl/tmp/BUILDROOT/e81c0e286d98ac393fff5d6ad19197f6/opt/cmssw/osx108_amd64_gcc472/cms/cmssw/CMSSW_7_0_XROOTD_X_2013-08-22-0200/src/Utilities/XrdAdaptor/src/XrdRequestManager.cc:404:23: error: 'CLOCK_MONOTONIC' was not declared in this scope
/Volumes/build1/dmendezl/tmp/BUILDROOT/e81c0e286d98ac393fff5d6ad19197f6/opt/cmssw/osx108_amd64_gcc472/cms/cmssw/CMSSW_7_0_XROOTD_X_2013-08-22-0200/src/Utilities/XrdAdaptor/src/XrdRequestManager.cc:404:44: error: 'clock_gettime' was not declared in this scope
/Volumes/build1/dmendezl/tmp/BUILDROOT/e81c0e286d98ac393fff5d6ad19197f6/opt/cmssw/osx108_amd64_gcc472/cms/cmssw/CMSSW_7_0_XROOTD_X_2013-08-22-0200/src/Utilities/XrdAdaptor/src/XrdRequestManager.cc: In member function 'virtual void XrdAdaptor::RequestManager::OpenHandler::HandleResponseWithHosts(XrdCl::XRootDStatus*, XrdCl::AnyObject*, XrdCl::HostList*)':
/Volumes/build1/dmendezl/tmp/BUILDROOT/e81c0e286d98ac393fff5d6ad19197f6/opt/cmssw/osx108_amd64_gcc472/cms/cmssw/CMSSW_7_0_XROOTD_X_2013-08-22-0200/src/Utilities/XrdAdaptor/src/XrdRequestManager.cc:538:23: error: 'CLOCK_MONOTONIC' was not declared in this scope
/Volumes/build1/dmendezl/tmp/BUILDROOT/e81c0e286d98ac393fff5d6ad19197f6/opt/cmssw/osx108_amd64_gcc472/cms/cmssw/CMSSW_7_0_XROOTD_X_2013-08-22-0200/src/Utilities/XrdAdaptor/src/XrdRequestManager.cc:538:44: error: 'clock_gettime' was not declared in this scope
gmake: *** [tmp/osx108_amd64_gcc472/src/Utilities/XrdAdaptor/src/UtilitiesXrdAdaptor/XrdRequestManager.o] Error 1
@nclopezo - how do I reproduce workflow 4.22, step2? From the message, it looks like an EOS error -- tough to tell though! It would be nice if I had your build available too... I see the issue with CLOCK_MONOTONIC - I had missed fixes for a source file. I'll get to that later today.
Actually, my calendar just reminded me I have a faculty retreat all morning long. I did a quick fix for the CLOCK_MONOTONIC issue - still don't have a way to test on Mac OS X, can you try again?
Hi @bbockelm
To reproduce the error you can execute the following commands:
scram p CMSSW_7_0_X_2013-08-23-0200
cd CMSSW_7_0_X_2013-08-23-0200/
cmsenv
git cms-merge-topic 558
scram setup /afs/cern.ch/cms/slc5_amd64_gcc472/external/xrootd-toolfile/1.0-cms6/etc/scram.d/xrootd.xml
scram b -j 12
runTheMatrix.py -l 4.22
I ran it again with Jenkins and I noticed that the same error appears on workflows 4.53 and 1000.0, you can see the logs here:
Hi @bbockelm I took your last commit and I built again for osx. This time it compiled without errors.
Hi,
I ran the RelVals on my installation on osx, and I am getting the same errors on workflows 4.22, 4.53 and 1000.0 that I showed you in previous messages when I tested on slc5. For example, this is the message for 4.22:
globaltag = PRE_62_V8::All 271 DQMStore::DQMStore
26-Aug-2013 15:35:24 CEST Initiating request to open file root://eoscms//eos/cms/store/data/Run2011A/Cosmics/RAW/v1/000/160/960/049F6443-8E53-E011-A943-003048F117EA.root?svcClass=default
[2013-08-26 15:35:24 +0200][Error ][XRootD ] [lxfsrf45c01.cern.ch:1095] Handling error while processing : [ERROR] Error response.
[2013-08-26 15:35:24 +0200][Error ][File ] [0x2e67b0f0@root://eoscms//eos/cms/store/data/Run2011A/Cosmics/RAW/v1/000/160/960/049F6443-8E53-E011-A943-003048F117EA.root?svcClass=default] Fatal file state error. Message returned with [ERROR] Server responded with an error: [3011] Unable to stat file /eos/cms/store/data/Run2011A/Cosmics/RAW/v1/000/160/960/049F6443-8E53-E011-A943-003048F117EA.root; No such file or directory
----- Begin Fatal Exception 26-Aug-2013 15:35:25 CEST-----------------------
An exception of category 'FileOpenError' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing input source of type PoolSource
   [2] Calling RootInputFileSequence::initFile()
   [3] Calling StorageFactory::open()
   [4] Calling XrdFile::open()
Exception Message:
Failed to open the file 'root://eoscms//eos/cms/store/data/Run2011A/Cosmics/RAW/v1/000/160/960/049F6443-8E53-E011-A943-003048F117EA.root?svcClass=default'
   Additional Info:
      [a] Input file root://eoscms//eos/cms/store/data/Run2011A/Cosmics/RAW/v1/000/160/960/049F6443-8E53-E011-A943-003048F117EA.root?svcClass=default could not be opened.
      [b] XrdCl::File::Stat(name='root://eoscms//eos/cms/store/data/Run2011A/Cosmics/RAW/v1/000/160/960/049F6443-8E53-E011-A943-003048F117EA.root?svcClass=default) => error '[ERROR] Error response: Unknown error: 3011' (errno=3011, code=400)
      [c] Active source: lxfsrf45c01.cern.ch:1095
----- End Fatal Exception -------------------------------------------------
-1
this fails on EOS. Fixup error messages to use the correct ToStr.
fallback file-open failure message.
Figured out the issue - it's a bug in EOS that the old client avoided. The recent push does the same workaround as before; I'll follow up with EOS separately. Following @nclopezo's reproduction recipe above, I can confirm the issue has gone away.
@nclopezo David, please rerun the standard tests on this updated request
@nclopezo - is it reasonable for me to do runTheMatrix from lxplus? If no, do I have access to any machine at CERN where I can do this myself? Would save a few round trips...
Hi @bbockelm I am currently running the tests, and you can see the logs here:
Hi @bbockelm Sorry, I forgot to include the command for setting up xrootd 3.3.3. You can see the logs for the new run here:
The tests finished, all passed
+1 @ktf @davidlt @smuzaffar
This pull request is fully signed and it will be integrated in one of the next IBs unless changes or unless it breaks tests.
@Dr15Jones I just merged my xrootd changes to CMSDIST. All should be available in 0200 IB.
Revert "Revert "Merge pull request cms-sw#558 from bbockelm/multisource-xrootd"" This reverts commit d0a3832.
Adjust include guards. Adjust comments on preprocessor macros.
This patch series switches XrdAdaptor to the new XrdCl's asynchronous interface.
The new code internally tracks file server performance and actively load-balances across two active sources in order to avoid poorly-performing endpoints.
Understandably, this is not a simple algorithm. See Utilities/XrdAdaptor/doc/multisource_algorithm_design.txt for a full description.
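To give a rough feel for what "tracking performance and choosing between two sources" can mean, here is a deliberately simplified sketch. It is not the algorithm from multisource_algorithm_design.txt and does not use any XrdAdaptor or XrdCl types; the class `SourceStats`, the function `pickSource`, the cost metric (milliseconds per megabyte) and the 0.8/0.2 smoothing weights are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical per-source bookkeeping; names and scoring rule are
// illustrative assumptions, not the XrdAdaptor implementation.
class SourceStats {
public:
  explicit SourceStats(std::string name)
      : m_name(std::move(name)), m_avgMsPerMB(0.0), m_samples(0) {}

  // Fold one completed read into an exponentially weighted moving average
  // of the observed cost (milliseconds per megabyte transferred).
  void record(double elapsedMs, double megabytes) {
    if (megabytes <= 0.0) {
      return;  // ignore empty reads
    }
    const double sample = elapsedMs / megabytes;
    m_avgMsPerMB = (m_samples == 0) ? sample
                                    : 0.8 * m_avgMsPerMB + 0.2 * sample;
    ++m_samples;
  }

  // A source with no samples reports a cost of 0, so it is tried first.
  double avgMsPerMB() const { return m_avgMsPerMB; }
  const std::string& name() const { return m_name; }

private:
  std::string m_name;
  double m_avgMsPerMB;
  std::size_t m_samples;
};

// Return the source with the lower observed cost; ties go to the first.
inline const SourceStats& pickSource(const SourceStats& a,
                                     const SourceStats& b) {
  return (b.avgMsPerMB() < a.avgMsPerMB()) ? b : a;
}
```

The moving average is used here only because it lets a source recover after a temporary slowdown; the design document in the repository remains the reference for how sources are actually scored and replaced.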