Add a flag to prevent storage of LHEXMLStringProduct #22935

Merged
merged 2 commits into from Apr 12, 2018

fabiocos
Contributor

LHEXMLStringProduct is no longer really used in production, and it is causing memory issues that are keeping production on hold in 93X. This PR adds a boolean flag, as an untracked parameter, to activate or deactivate the storage of the XML output in the product; the default is "false" (i.e. do not store).

I also took the opportunity to fix the ExternalLHEAsciiDumper analyzer, which is supposed to extract the XML file from the EDM ROOT output and was no longer working.

Both the "false" and "true" options have been tested in wf 512 (10 events), with a visible reduction of the final file size (84 kB vs 112 kB). When the option was "true", the dumper was used to extract the XML file (216 kB after decompression), proving that it is functional again.
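
To make the behaviour described above concrete, here is a minimal Python sketch. It is not the CMSSW C++ code: `ToyXMLStringProduct`, `produce` and `store_xml` are hypothetical names, and the use of Python's `lzma` module is an assumption that only mirrors the fact that the stored XML has to be decompressed by the dumper. The point it shows is that the product is always created, but it is filled only when the flag is switched on.

```python
import lzma
from dataclasses import dataclass, field


@dataclass
class ToyXMLStringProduct:
    """Hypothetical stand-in for the product: holds compressed XML blobs."""
    compressed: list = field(default_factory=list)

    def dump(self) -> str:
        # What a dumper would do: decompress and concatenate the stored text.
        return "".join(lzma.decompress(blob).decode("utf-8") for blob in self.compressed)


def produce(lhe_text: str, store_xml: bool = False) -> ToyXMLStringProduct:
    """Always return a product; fill it only when store_xml is True."""
    product = ToyXMLStringProduct()
    if store_xml:
        product.compressed.append(lzma.compress(lhe_text.encode("utf-8")))
    return product


# Toy input text, not a real LHE file.
lhe_text = "<LesHouchesEvents>\n" + "<event>\n</event>\n" * 10 + "</LesHouchesEvents>\n"

empty = produce(lhe_text)                  # default: product exists but is empty
full = produce(lhe_text, store_xml=True)   # opt in: XML is stored compressed

print(len(empty.compressed), len(full.compressed))
print(full.dump() == lhe_text)
```

With the default, downstream code still finds a product of the expected type, which is why the file structure does not change; only the payload disappears.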

@cmsbuild
Contributor

The code-checks are being triggered in jenkins.

@fabiocos
Contributor Author

@Dr15Jones @bendavid @franzoni @vlimant I assume that this is what we want in order to get rid of the memory problem in 93X (at least part of it) without losing the possibility to optionally save and extract the XML file.

@cmsbuild
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-22935/4318

Code check has found code style and quality issues which could be resolved by applying a patch in https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-22935/4318/git-diff.patch
e.g. curl https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-22935/4318/git-diff.patch | patch -p1

You can run scram build code-checks to apply code checks directly

@cmsbuild
Contributor

The code-checks are being triggered in jenkins.

@Dr15Jones
Contributor

> @Dr15Jones @bendavid @franzoni @vlimant I assume that this is what we want to get rid of the memory problem in 93X (at least part of it) without losing the possibility to optionally save and extract the xml file

From my point of view, it solves the file merge problem. I do not know whether it solves a memory problem as well.

@fabiocos
Contributor Author

@Dr15Jones to be verified, I agree, but it is something that we want to have anyway.

@cmsbuild
Contributor

A new Pull Request was created by @fabiocos (Fabio Cossutti) for master.

It involves the following packages:

GeneratorInterface/LHEInterface

@cmsbuild, @efeyazgan, @perrozzi can you please review it and eventually sign? Thanks.
@alberto-sanchez, @agrohsje, @mkirsano this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@fabiocos
Contributor Author

please test workflow 512

@cmsbuild
Contributor

cmsbuild commented Apr 11, 2018

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/27438/console Started: 2018/04/11 17:22

@cmsbuild
Contributor

-1

Tested at: 1b9b2b8

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-22935/27438/summary.html

I found the following errors while testing this PR

Failed tests: AddOn

  • AddOn:

I found errors in the following addon tests:

cmsDriver.py RelVal -s L1REPACK:Full --data --scenario=pp -n 10 --conditions auto:run2_hlt_Fake2 --relval 9000,50 --datatier "RAW" --eventcontent RAW --customise=HLTrigger/Configuration/CustomConfigs.L1T --era Run2_2016 --fileout file:RelVal_Raw_Fake2_DATA.root --filein /store/data/Run2016B/JetHT/RAW/v1/000/272/762/00000/C666CDE2-E013-E611-B15A-02163E011DBE.root : FAILED - time: date Wed Apr 11 18:40:16 2018-date Wed Apr 11 18:35:14 2018 s - exit: 23552
cmsRun /cvmfs/cms-ib.cern.ch/nweek-02519/slc6_amd64_gcc630/cms/cmssw-patch/CMSSW_10_2_X_2018-04-11-1100/src/HLTrigger/Configuration/test/OnLine_HLT_Fake2.py realData=True globalTag=@ inputFiles=@ : FAILED - time: date Wed Apr 11 18:40:16 2018-date Wed Apr 11 18:35:14 2018 s - exit: 21504
cmsDriver.py RelVal -s HLT:Fake2,RAW2DIGI,L1Reco,RECO --data --scenario=pp -n 10 --conditions auto:run2_data_Fake2 --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run2_2016 --processName=HLTRECO --filein file:RelVal_Raw_Fake2_DATA.root --fileout file:RelVal_Raw_Fake2_DATA_HLT_RECO.root : FAILED - time: date Wed Apr 11 18:40:16 2018-date Wed Apr 11 18:35:14 2018 s - exit: 21504
cmsDriver.py RelVal -s L1REPACK:Full --data --scenario=pp -n 10 --conditions auto:run2_hlt_PRef --relval 9000,50 --datatier "RAW" --customise=HLTrigger/Configuration/CustomConfigs.L1T --era Run2_2018 --eventcontent RAW --fileout file:RelVal_Raw_PRef_DATA.root --filein /store/data/Run2017A/HLTPhysics4/RAW/v1/000/295/606/00000/36DE5E0A-3645-E711-8FA1-02163E01A43B.root : FAILED - time: date Wed Apr 11 18:40:33 2018-date Wed Apr 11 18:35:17 2018 s - exit: 23552
cmsRun /cvmfs/cms-ib.cern.ch/nweek-02519/slc6_amd64_gcc630/cms/cmssw-patch/CMSSW_10_2_X_2018-04-11-1100/src/HLTrigger/Configuration/test/OnLine_HLT_PRef.py realData=True globalTag=@ inputFiles=@ : FAILED - time: date Wed Apr 11 18:40:33 2018-date Wed Apr 11 18:35:17 2018 s - exit: 21504
cmsDriver.py RelVal -s HLT:PRef,RAW2DIGI,L1Reco,RECO --data --scenario=pp -n 10 --conditions auto:run2_data_PRef --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run2_2018 --processName=HLTRECO --filein file:RelVal_Raw_PRef_DATA.root --fileout file:RelVal_Raw_PRef_DATA_HLT_RECO.root : FAILED - time: date Wed Apr 11 18:40:33 2018-date Wed Apr 11 18:35:17 2018 s - exit: 21504
cmsDriver.py RelVal -s L1REPACK:Full2015Data --data --scenario=HeavyIons -n 10 --conditions auto:run2_hlt_HIon --relval 9000,50 --datatier "RAW" --eventcontent RAW --customise=HLTrigger/Configuration/CustomConfigs.L1T --era Run2_2016,Run2_HI --fileout file:RelVal_Raw_HIon_DATA.root --filein /store/hidata/HIRun2015/HIHardProbes/RAW-RECO/HighPtJet-PromptReco-v1/000/263/689/00000/1802CD9A-DDB8-E511-9CF9-02163E0138CA.root : FAILED - time: date Wed Apr 11 18:40:32 2018-date Wed Apr 11 18:35:20 2018 s - exit: 23552
cmsRun /cvmfs/cms-ib.cern.ch/nweek-02519/slc6_amd64_gcc630/cms/cmssw-patch/CMSSW_10_2_X_2018-04-11-1100/src/HLTrigger/Configuration/test/OnLine_HLT_HIon.py realData=True globalTag=@ inputFiles=@ : FAILED - time: date Wed Apr 11 18:40:32 2018-date Wed Apr 11 18:35:20 2018 s - exit: 21504
cmsDriver.py RelVal -s HLT:HIon,RAW2DIGI,L1Reco,RECO --data --scenario=HeavyIons -n 10 --conditions auto:run2_data_HIon --relval 9000,50 --datatier "RAW-HLT-RECO" --eventcontent FEVTDEBUGHLT --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --era Run2_2016,Run2_HI --processName=HLTRECO --filein file:RelVal_Raw_HIon_DATA.root --fileout file:RelVal_Raw_HIon_DATA_HLT_RECO.root : FAILED - time: date Wed Apr 11 18:40:32 2018-date Wed Apr 11 18:35:20 2018 s - exit: 21504

@cmsbuild
Contributor

Comparison job queued.

@cmsbuild
Contributor

cmsbuild commented Apr 11, 2018

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/27444/console Started: 2018/04/11 22:12

@cmsbuild
Contributor

Comparison job queued.

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-22935/27444/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 29
  • DQMHistoTests: Total histograms compared: 2505375
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2505198
  • DQMHistoTests: Total skipped: 176
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.740000000005 KiB( 23 files compared)
  • Checked 119 log files, 9 edm output root files, 29 DQM output files
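
As a quick sanity check of the DQMHistoTests numbers in the summary above, the following Python snippet verifies that the reported categories add up to the total. It assumes that failures, nulls, successes and skipped histograms are disjoint and exhaustive categories, which the summary does not state explicitly.

```python
# Counts copied from the comparison summary above.
total = 2505375
failures, nulls, successes, skipped = 1, 0, 2505198, 176

# Assumption: the four categories are disjoint and cover every compared histogram.
accounted = failures + nulls + successes + skipped
failure_fraction = failures / total

print(accounted == total)
print(f"{failure_fraction:.2e}")
```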

@fabiocos
Contributor Author

@perrozzi @efeyazgan this change, previously discussed, is meant to be backported to 93X to verify to what extent it solves the problems found in production. Please check and sign or comment.

@perrozzi
Contributor

Thanks Fabio. So, as far as I understand, LHEXMLStringProduct was storing the LHE file as is, and was used by the ExternalLHEAsciiDumper to dump it when needed.
Also according to my understanding, there is another copy of the LHE content, the "C++ translation" in LHEEventProduct and LHERunInfoProduct, which saves all the important information, but then there is no direct way to retrieve the original LHE file as it was.
I guess that one could always write another dumper, and it would just be a matter of formatting the output.
I personally used the LHEXMLStringProduct in the past, but only rarely, and maybe we can think of removing it if that is needed to solve the merge (and memory?) issues.

@fabiocos
Contributor Author

@perrozzi in the present implementation nothing changes from the point of view of file structure (the product stays there) and provenance (I added an untracked parameter), but by default the product will be empty and so take very little space. This should hopefully solve at least part of the problems keeping production on hold (@kpedro88 FYI).

The LHEXML product was meant to store the XML file as we did in the past, but I understand this has not been the default for quite some time. The C++ translation is what the CMS framework actually uses in its workflow; the XML can be read to build it with the other code in the package, whereas here it is done on the fly.

I would suggest moving forward with this for the short term. If the GEN group then has plans for a deeper revision of the code, it is welcome to present them. But be careful when you modify the file content: this should not happen in the middle of a production campaign.
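
The provenance remark above relies on the flag being an untracked parameter. As a rough illustration only, the Python sketch below assumes that a configuration identifier is derived from the tracked parameters alone. `toy_config_id`, `store_xml`, `nEvents` and the dictionary layout are hypothetical and are not the CMSSW API.

```python
import hashlib
import json


def toy_config_id(config: dict) -> str:
    """Toy identifier: hash the tracked parameters and ignore the untracked ones."""
    canonical = json.dumps(config["tracked"], sort_keys=True)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


# Hypothetical configurations; only the untracked flag differs between the first two.
without_xml = {"tracked": {"nEvents": 10}, "untracked": {"store_xml": False}}
with_xml = {"tracked": {"nEvents": 10}, "untracked": {"store_xml": True}}
more_events = {"tracked": {"nEvents": 20}, "untracked": {"store_xml": False}}

print(toy_config_id(without_xml) == toy_config_id(with_xml))      # untracked change
print(toy_config_id(without_xml) == toy_config_id(more_events))   # tracked change
```

Under this assumption, toggling the flag leaves the identifier untouched, so files produced with and without the stored XML would share the same recorded configuration.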

@perrozzi
Contributor

+1

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@fabiocos
Contributor Author

+1

@cmsbuild cmsbuild merged commit 85668c8 into cms-sw:master Apr 12, 2018
fabiocos added a commit to fabiocos/cmssw that referenced this pull request Apr 12, 2018
cmsbuild added a commit that referenced this pull request Apr 13, 2018
Backport of #22935 to 93X (Add a flag to prevent storage of LHEXMLStringProduct)
@fabiocos fabiocos deleted the fc-fixLHEProducer branch April 14, 2018 12:46
perrozzi pushed a commit to perrozzi/cmssw that referenced this pull request Apr 19, 2018