Add support for multithreading to runTheMatrix #7754

Merged
davidlange6 merged 4 commits into cms-sw:CMSSW_7_4_X from mt-rm on Feb 24, 2015

@ktf ktf commented Feb 16, 2015

Adds --nThreads option for both IB and WMAgent, percolating it
down to cmsDriver level.
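
As a rough illustration of what "percolating" the option down could look like, here is a minimal sketch in Python. It is not the code from this pull request: `build_parser` and `add_threads` are hypothetical helper names, and the `cmsDriver.py step3` command string is only a placeholder. The default of one thread follows the later comment in this thread that the option was made single-threaded by default.

```python
import argparse


def build_parser():
    """Toy stand-in for the runTheMatrix option parser (hypothetical)."""
    parser = argparse.ArgumentParser(description="toy runTheMatrix options")
    # Single-threaded unless the user explicitly asks for more threads.
    parser.add_argument("--nThreads", type=int, default=1,
                        help="number of threads to request for each step")
    return parser


def add_threads(cmsdriver_cmd, n_threads):
    """Append --nThreads to a cmsDriver command line when more than one
    thread is requested; otherwise return the command unchanged."""
    if n_threads > 1:
        return f"{cmsdriver_cmd} --nThreads {n_threads}"
    return cmsdriver_cmd


if __name__ == "__main__":
    opts = build_parser().parse_args(["--nThreads", "4"])
    # Placeholder command; a real workflow step carries many more options.
    print(add_threads("cmsDriver.py step3", opts.nThreads))
```

Leaving the command untouched when one thread is requested keeps existing single-threaded workflows byte-for-byte identical, which is one plausible reason to gate the flag this way.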

ktf commented Feb 16, 2015

@amaltaro @sextonkennedy

@cmsbuild

A new Pull Request was created by @ktf (Giulio Eulisse) for CMSSW_7_4_X.

Add support for multithreading to runTheMatrix

It involves the following packages:

Configuration/PyReleaseValidation

@cmsbuild, @srimanob, @nclopezo, @boudoul, @franzoni can you please review it and eventually sign? Thanks.
@ghellwig, @Martin-Grunewald this is something you requested to watch as well.
You can sign-off by replying to this message having '+1' in the first line of your reply.
You can reject by replying to this message having '-1' in the first line of your reply.
If you are a L2 or a release manager you can ask for tests by saying 'please test' in the first line of a comment.
@Degano you are the release manager for this.
You can merge this pull request by typing 'merge' in the first line of your comment.

@cmsbuild

-1
Tested at: 3d17728
When I ran the RelVals I found an error in the following workflows:
5.1 step1

runTheMatrix-results/5.1_TTbar+TTbarFS+HARVESTFS/step1_TTbar+TTbarFS+HARVESTFS.log
----- Begin Fatal Exception 17-Feb-2015 10:54:45 CET-----------------------
An exception of category 'FatalRootError' occurred while
   [0] Processing run: 1
   [1] Running path 'validation_step'
   [2] Calling endRun for module BasicHepMCValidation/'basicHepMCValidation'
   Additional Info:
      [a] Fatal Root Error: @SUB=TH1F::Add
Attempt to add histograms with different axis limits
----- End Fatal Exception -------------------------------------------------

135.4 step1

runTheMatrix-results/135.4_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS/step1_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS.log
----- Begin Fatal Exception 17-Feb-2015 11:00:42 CET-----------------------
An exception of category 'FatalRootError' occurred while
   [0] Processing run: 1
   [1] Running path 'validation_step'
   [2] Calling endRun for module BasicHepMCValidation/'basicHepMCValidation'
   Additional Info:
      [a] Fatal Root Error: @SUB=TH1F::Add
Attempt to add histograms with different axis limits
----- End Fatal Exception -------------------------------------------------

1330.0 step3

runTheMatrix-results/1330.0_ZMM_13+ZMM_13+DIGIUP15+RECOUP15+HARVESTUP15+MINIAODMCUP15/step3_ZMM_13+ZMM_13+DIGIUP15+RECOUP15+HARVESTUP15+MINIAODMCUP15.log
----- Begin Fatal Exception 17-Feb-2015 12:02:53 CET-----------------------
An exception of category 'FatalRootError' occurred while
   [0] Processing run: 1
   [1] Running path 'validation_step'
   [2] Calling endRun for module BasicHepMCValidation/'basicHepMCValidation'
   Additional Info:
      [a] Fatal Root Error: @SUB=TH1F::Add
Attempt to add histograms with different axis limits
----- End Fatal Exception -------------------------------------------------

9.0 step3

runTheMatrix-results/9.0_Higgs200ChargedTaus+Higgs200ChargedTaus+DIGI+RECO+HARVEST/step3_Higgs200ChargedTaus+Higgs200ChargedTaus+DIGI+RECO+HARVEST.log
----- Begin Fatal Exception 17-Feb-2015 11:41:04 CET-----------------------
An exception of category 'FatalRootError' occurred while
   [0] Processing run: 1
   [1] Running path 'validation_step'
   [2] Calling endRun for module BasicHepMCValidation/'basicHepMCValidation'
   Additional Info:
      [a] Fatal Root Error: @SUB=TH1F::Add
Attempt to add histograms with different axis limits
----- End Fatal Exception -------------------------------------------------

25.0 step3

runTheMatrix-results/25.0_TTbar+TTbar+DIGI+RECO+HARVEST+ALCATT/step3_TTbar+TTbar+DIGI+RECO+HARVEST+ALCATT.log
----- Begin Fatal Exception 17-Feb-2015 11:41:04 CET-----------------------
An exception of category 'FatalRootError' occurred while
   [0] Processing run: 1
   [1] Running path 'validation_step'
   [2] Calling endRun for module BasicHepMCValidation/'basicHepMCValidation'
   Additional Info:
      [a] Fatal Root Error: @SUB=TH1F::Add
Attempt to add histograms with different axis limits
----- End Fatal Exception -------------------------------------------------

1306.0 step3

runTheMatrix-results/1306.0_SingleMuPt1_UP15+SingleMuPt1_UP15+DIGIUP15+RECOUP15+HARVESTUP15+MINIAODMCUP15/step3_SingleMuPt1_UP15+SingleMuPt1_UP15+DIGIUP15+RECOUP15+HARVESTUP15+MINIAODMCUP15.log
----- Begin Fatal Exception 17-Feb-2015 11:54:02 CET-----------------------
An exception of category 'FatalRootError' occurred while
   [0] Processing run: 1
   [1] Running path 'validation_step'
   [2] Calling endRun for module BasicHepMCValidation/'basicHepMCValidation'
   Additional Info:
      [a] Fatal Root Error: @SUB=TH1F::Add
Attempt to add histograms with different axis limits
----- End Fatal Exception -------------------------------------------------

25202.0 step3

runTheMatrix-results/25202.0_TTbar_13+TTbar_13+DIGIUP15_PU25+RECOUP15_PU25+HARVESTUP15+MINIAODMCUP15/step3_TTbar_13+TTbar_13+DIGIUP15_PU25+RECOUP15_PU25+HARVESTUP15+MINIAODMCUP15.log

you can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7754/2641/summary.html

@Dr15Jones

I'm betting the problem histogram is status1ShortLived
http://cmslxr.fnal.gov/source/Validation/EventGenerator/plugins/BasicHepMCValidation.cc?v=CMSSW_7_4_X_2015-02-13-0200#0182

It is possible that one or more streams never get any entries in the histogram since the fill is conditional
http://cmslxr.fnal.gov/source/Validation/EventGenerator/plugins/BasicHepMCValidation.cc?v=CMSSW_7_4_X_2015-02-13-0200#0182

If so, then this isn't strictly a threading problem since merging files from two different jobs could have the same problem.
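
To illustrate the failure mode suggested above, here is a toy model in plain Python. It is not ROOT and not the actual BasicHepMCValidation code. It assumes, purely for illustration, a histogram whose upper axis limit grows when a value is filled; `ToyHist`, `run_stream`, and the event dictionaries are hypothetical names. A stream that never passes the fill condition keeps its default limits, so adding it to a stream that did fill is rejected.

```python
class AxisMismatchError(Exception):
    """Raised when two toy histograms with different axis limits are added."""


class ToyHist:
    """Toy histogram that only tracks its axis limits and entry count."""

    def __init__(self, lo=0.0, hi=1.0):
        self.lo = lo
        self.hi = hi
        self.entries = 0

    def fill(self, x):
        # Assumption of this toy: the upper limit doubles until x fits.
        while x >= self.hi:
            self.hi *= 2
        self.entries += 1

    def add(self, other):
        # Refuse to merge histograms whose axes disagree.
        if (self.lo, self.hi) != (other.lo, other.hi):
            raise AxisMismatchError(
                "Attempt to add histograms with different axis limits")
        self.entries += other.entries


def run_stream(events):
    """Process one stream's events; the fill is conditional."""
    hist = ToyHist()
    for event in events:
        if event["short_lived"]:
            hist.fill(event["value"])
    return hist


if __name__ == "__main__":
    filled = run_stream([{"short_lived": True, "value": 5.0}])
    empty = run_stream([{"short_lived": False, "value": 5.0}])
    try:
        filled.add(empty)
    except AxisMismatchError as err:
        print("merge failed:", err)
```

Nothing in the sketch depends on threads: the two `run_stream` calls could just as well stand for two separate jobs whose output files are merged afterwards, which is the point made in the comment.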

ktf commented Feb 17, 2015

As discussed, it looks like this was running multi-threaded by accident. I changed it to single-threaded by default.

@franzoni

@cmsbuild can you please test

@franzoni

+1

hello @ktf @sextonkennedy,

The proposed modifications look good to me. If no --nThreads is set, the standard single-threaded behaviour is OK. If I specify --nThreads 2, the .py configuration reflects that, but I could only see one thread running (using top). Is this expected?

I could not test injection; that cannot be done with an IB (a prerelease is necessary).

I'd like showWorkFlows to display "--nThreads" in the cmsDriver commands when using -ne. Could you please look into that?
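
For the thread-count check mentioned above, note that top lists one row per process by default and only shows individual threads with `top -H`. As an alternative, here is a small Linux-only sketch that reads the `Threads:` field from `/proc/<pid>/status`. The helper names `parse_thread_count` and `thread_count` are hypothetical, and the sketch only reports what the kernel sees; it says nothing about whether the job should be using more threads.

```python
def parse_thread_count(status_text):
    """Return the integer in the 'Threads:' line of /proc/<pid>/status
    text, or None if the field is absent."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    return None


def thread_count(pid):
    """Read the current thread count of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_thread_count(handle.read())
```

Calling `thread_count(pid)` with the process ID of a running job returns the number of threads at that moment, so it may need to be sampled a few times while events are being processed.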

@cmsbuild

This pull request is fully signed and it will be integrated in one of the next CMSSW_7_4_X IBs unless changes are made or it breaks tests. This pull request requires discussion in the ORP meeting before it's merged. @davidlange6, @Degano, @ktf, @smuzaffar

ktf commented Feb 24, 2015

@cmsbuild please test

@franzoni, it is not so smart yet... ;)

Ciao,
Giulio

@cmsbuild

The tests are being triggered in jenkins.

@cmsbuild

This pull request is fully signed and it will be integrated in one of the next CMSSW_7_4_X IBs unless changes are made or it breaks tests. This pull request requires discussion in the ORP meeting before it's merged. @davidlange6, @Degano, @ktf, @smuzaffar

davidlange6 added a commit that referenced this pull request Feb 24, 2015
Add support for multithreading to runTheMatrix
@davidlange6 davidlange6 merged commit f418ee6 into cms-sw:CMSSW_7_4_X Feb 24, 2015
@ktf ktf deleted the mt-rm branch February 25, 2015 10:12