Migrate Calibration Code to New Random Service Interface #2523
Migrate Calibration code to use the new interface of the random number generator service designed to work with the multithreaded Framework. The main interface change is to require a StreamID or LuminosityBlockIndex argument to the getEngine function. These objects are only available during the event or beginLuminosityBlock method. There is only one calibration module using the random number interface. I do not know how to test this module as nothing else in CMSSW references it.
A new Pull Request was created by @wddgit (W. David Dagenhart) for CMSSW_7_1_X.
Migrate Calibration Code to New Random Service Interface
It involves the following packages:
CalibMuon/DTCalibration
@cmsbuild, @Degano, @diguida, @rcastello, @nclopezo can you please review it and eventually sign? Thanks. |
-1
runTheMatrix-results/401.0_TTbarNewMix+TTbarFSPU2+HARVESTFS/step1_TTbarNewMix+TTbarFSPU2+HARVESTFS.log
----- Begin Fatal Exception 19-Feb-2014 10:08:58 CET-----------------------
An exception of category 'FallbackFileOpenError' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing module: class=MixingModule label='mixGenPU'
   [2] Calling RootInputFileSequence::initFile()
   [3] Calling StorageFactory::open()
   [4] Calling XrdFile::open()
Exception Message:
Failed to open the file 'root://xrootd.ba.infn.it//store/relval/CMSSW_5_3_6-START53_V14/RelValProdMinBias/GEN-SIM-RAW/v2/00000/4677049F-042A-E211-8525-0026189438E8.root'
   Additional Info:
      [a] Input file root://eoscms//eos/cms/store/relval/CMSSW_5_3_6-START53_V14/RelValProdMinBias/GEN-SIM-RAW/v2/00000/4677049F-042A-E211-8525-0026189438E8.root?svcClass=default could not be opened.
          Fallback Input file root://xrootd.ba.infn.it//store/relval/CMSSW_5_3_6-START53_V14/RelValProdMinBias/GEN-SIM-RAW/v2/00000/4677049F-042A-E211-8525-0026189438E8.root also could not be opened.
      [b] XrdClient::Open(name='root://xrootd.ba.infn.it//store/relval/CMSSW_5_3_6-START53_V14/RelValProdMinBias/GEN-SIM-RAW/v2/00000/4677049F-042A-E211-8525-0026189438E8.root', flags=0x10, permissions=0666) => error 'cannot obtain credentials for protocol: Secgsi: ErrParseBuffer: unknown CA: cannot verify server certificate: kXGS_init: unable to get protocol object.' (errno=3010)
      [c] Current server connection: root://xrootd.ba.infn.it:1094//store/relval/CMSSW_5_3_6-START53_V14/RelValProdMinBias/GEN-SIM-RAW/v2/00000/4677049F-042A-E211-8525-0026189438E8.root
----- End Fatal Exception -------------------------------------------------
you can see the results of the tests here: |
xrootd is on holidays today... On 19 Feb 2014, at 10:33, cmsbuild wrote:
|
@ktf but we only go to xrootd if we first fail to find a file in eos. |
I think it should have been "EOS is on holidays". xrootd == EOS in my mind… Anyway, the problem seems to be solved: https://hypernews.cern.ch/HyperNews/CMS/get/cernCompAnnounce/911/1.html David, can you restart the tests? |
I restarted them, the results are in my previous comment. |
+1 |
Hi @wddgit @Dr15Jones |
Can you please mention the contacts here as well and put a -1 until you are ok? |
Hi @ktf |
Hi @olschewski @passaseo @ronchese Thanks in advance for contributing to keep CMSSW well maintained [1]
|
Hi Salvatore, On 20 February 2014 19:14:04 CET, Salvatore Di Guida notifications@github.com wrote:
-- Sent from my tablet with K-9 Mail. Marina Passaseo |
Dear All, On 20/02/2014 19:14, Salvatore Di Guida wrote:
Marina Passaseo, Istituto Nazionale di Fisica Nucleare, Sezione di Padova, Via Marzolo, 8, 35131 Padova. Tel: +390499677099, Fax: +390499677110 |
Yes, beginLuminosityBlock and JobEnd could be on different threads. In fact there could be multiple beginLuminosityBlock's running concurrently, each on its own thread and each would have its own random engine instance. JobEnd would not run concurrently with them, but might or might not be on the same thread as one of them. Independently of that, for replay to work properly, random numbers can only be generated in the event and beginLuminosityBlock methods and the new interface makes it much more difficult to violate this rule. This replay requirement predates this multithreading effort, although there have been a number of violations of it. This is one of the reasons replay often fails to work. |
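To illustrate the replay point, here is a minimal standalone sketch, not CMSSW code. It assumes each luminosity block draws from an engine it alone owns, seeded by a hypothetical recipe (base seed plus block index). `drawForLumi` and `processInOrder` are invented names. Under that assumption, the numbers a block sees do not depend on the order (or thread) in which blocks are processed, which is the property replay relies on.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: draw n numbers for one luminosity block from an
// engine owned by that block alone. The seed recipe (baseSeed + lumiIndex)
// is an assumption for illustration, not how the CMSSW service seeds engines.
std::vector<std::uint32_t> drawForLumi(std::uint32_t baseSeed, unsigned int lumiIndex, int n) {
  std::mt19937 engine(baseSeed + lumiIndex);
  std::vector<std::uint32_t> out;
  for (int i = 0; i < n; ++i) {
    out.push_back(engine());
  }
  return out;
}

// Process the blocks in the given order and store each result under its
// block index. `order` must be a permutation of 0 .. order.size()-1.
std::vector<std::vector<std::uint32_t>> processInOrder(std::vector<unsigned int> const& order,
                                                       std::uint32_t baseSeed,
                                                       int n) {
  std::vector<std::vector<std::uint32_t>> results(order.size());
  for (unsigned int lumi : order) {
    results.at(lumi) = drawForLumi(baseSeed, lumi, n);
  }
  return results;
}
```

Calling `processInOrder({0, 1, 2}, seed, n)` and `processInOrder({2, 0, 1}, seed, n)` yields the same per-block sequences. A single engine shared by all blocks would not have this property.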
I am not sure if you were included in the previous discussions, so I will repeat this. With the interface change, the no-argument getEngine function will be deleted. The current version of this code will fail to compile when that function is deleted. In the future, a StreamID or LuminosityBlockIndex will be required as an argument. It is only possible to get those in the beginLuminosityBlock method and the event method of a module. |
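The shape of the interface change described here can be sketched with stand-in types. Everything below is hypothetical and for illustration only: `MockRandomService` and the local `StreamID` and `LuminosityBlockIndex` structs are not the CMSSW classes, and `std::mt19937` stands in for the real engine type. The sketch only shows how removing the no-argument overload forces every caller to supply an index.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <random>

// Hypothetical stand-ins; these are NOT the CMSSW types of the same name.
struct StreamID { unsigned int value; };
struct LuminosityBlockIndex { unsigned int value; };

class MockRandomService {
public:
  explicit MockRandomService(std::uint32_t seed) : seed_(seed) {}

  // The no-argument overload is deleted, so old call sites such as
  // svc.getEngine() no longer compile.
  std::mt19937& getEngine() = delete;

  // One engine per stream, for use while processing an event.
  std::mt19937& getEngine(StreamID const& id) {
    return lookup(streamEngines_, id.value, 0u);
  }

  // One engine per concurrent luminosity block, for beginLuminosityBlock.
  std::mt19937& getEngine(LuminosityBlockIndex const& index) {
    return lookup(lumiEngines_, index.value, 1000u);
  }

private:
  // Create the engine on first use; the seed offsets are arbitrary choices.
  std::mt19937& lookup(std::map<unsigned int, std::mt19937>& engines,
                       unsigned int key,
                       std::uint32_t offset) {
    auto it = engines.find(key);
    if (it == engines.end()) {
      it = engines.emplace(key, std::mt19937(seed_ + offset + key)).first;
    }
    return it->second;
  }

  std::uint32_t seed_;
  std::map<unsigned int, std::mt19937> streamEngines_;
  std::map<unsigned int, std::mt19937> lumiEngines_;
};
```

With this sketch, `svc.getEngine(StreamID{0})` always returns the same engine for stream 0, while a bare `svc.getEngine()` is rejected at compile time.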
Not approving for now. AlCa, please put a -1 until the responsible people are happy (or get them to agree). |
-1
Regarding the old code:
As far as the
Finally concerning the changes proposed here:
Based on the last two points, I have a question to @wddgit and @Dr15Jones : is the
flag thread-safe? If two threads start at the same time, both could see it as false and trigger the creation of the payload and the writing into the DB. Am I correct? This also reminds me that this will be a "Classic" module (one instance shared across all Thanks |
@passaseo @ronchese
Can you please make the configuration work again? Thanks |
Since this software is not used or written by us,
Marina Passaseo, Istituto Nazionale di Fisica Nucleare, Sezione di Padova, Via Marzolo, 8, 35131 Padova. Tel: +390499677099, Fax: +390499677110 |
The flag in the proposed new code is OK in a "legacy" (aka "classic") type module. It would also be OK in a "one" type module that watches luminosity blocks. If we were to convert the module to a "stream" or "global" type module, the flag would be a thread-safety problem and we would have to do something else, possibly make it an atomic. Based on what the code does, I think we would probably make it a "one" type module. I will make that additional change if you would like. I didn't only because I was trying to do the minimum to migrate to the new random number service interface. (I've migrated more than a hundred modules already and have been doing the minimum so I could finish the migration faster; this is one of 3 modules left to be migrated.) In your good summary, I noticed only one error. The number of concurrent beginLuminosityBlock methods is independent of the number of streams. Actually, when there is a legacy module in the job, it will be limited to one. |
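For reference, the "make it an atomic" option could look like the following standalone sketch. It is not the module's code: `OneShot` and `countWinners` are hypothetical names, and the increment stands in for "create the payload and write it to the DB". The idea is that `std::atomic<bool>::exchange` tests and sets the flag in one indivisible step, so two threads cannot both see it as false.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical one-shot guard, for illustration only.
class OneShot {
public:
  // Atomically sets the flag and returns true only for the first caller.
  bool claim() { return !done_.exchange(true); }

private:
  std::atomic<bool> done_{false};
};

// Start nThreads threads that all race for the guard and count how many
// of them were allowed to do the one-time work.
int countWinners(int nThreads) {
  OneShot guard;
  std::atomic<int> winners{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < nThreads; ++i) {
    threads.emplace_back([&guard, &winners] {
      if (guard.claim()) {
        ++winners;  // stand-in for the one-time payload creation and DB write
      }
    });
  }
  for (auto& t : threads) {
    t.join();
  }
  return winners.load();
}
```

However many threads race, `countWinners` returns 1. A plain `bool` checked and then set in two separate steps would not give that guarantee.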
Fix test configuration. The random number service was being configured with a syntax deprecated in 2006 and for which support was dropped in March 2013. If run without this fix, a configuration exception would be thrown.
I added to this pull request a fix for the test configuration that I said was broken. It had been broken since March 2013. With the fix, it no longer dies with a configuration error. Now on my local machine it dies with the following error (presumably because I am not at CERN): Connection on "sqlite_file:/afs/cern.ch/cms/CAF/CMSALCA/ALCA_MUONCALIB/DTCALIB/COMM09/ttrig/ttrig_ResidCorr_112281.db" cannot be established ( CORAL : "ConnectionPool::getSessionFromNewConnection" from "CORAL/Services/ConnectionService" ). I'm hoping you can approve this soon. As of this morning, this became the very last thing needed to complete this large migration. Even if you decide that you do not like my fix, you can rewrite it in a subsequent pull request however you like, or delete it. |
I understand you are busy with other genuinely important issues, but can you please approve this pull request soon? First of all, I think the modified code will work as well as the previous version. Second, if after having more time to consider this you want to revise this file again in some different way, please feel free to do so. If you want me to modify the pull request, please let me know how and I will gladly do it. This is the last small step of a migration that involved hundreds of files and several pull requests, and it is important for moving the multithreaded development effort forward. |
+1 |
Hi @wddgit |
New Random Number Generator -- Migrate Calibration Code to New Random Service Interface