Handle cross Path exceptions #16065

Merged

Dr15Jones

Handle the case where a module on one Path depends on data from a module on another Path but the other module throws an exception.

  • To handle the case where a module does not put a data product, e.g. because of an exception, we must still release any modules waiting on that data product. We accomplish this by having the PuttableProductResolver attach a waiting task to the module so that it is notified when the module has completed.
  • If all paths containing a module skip that module, be sure that any tasks waiting on that module to complete are released.
  • Added a unit test for the case where a module on a path depends on data from a module on another path and the other module throws an exception.
  • Removed an unused function and an unused parameter from a function.
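
The release-on-completion behaviour described above can be illustrated with a minimal, single-threaded sketch. It is not the FWCore/Framework code: `WaitingList`, `Module`, `run`, and `skippedOnPath` are hypothetical names chosen for illustration, and the sketch ignores the threading the real framework has to deal with. It only shows the idea that waiters are released when the producing module finishes, whether it put its product, threw, or was skipped on every path that contains it.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a list of tasks waiting on a module.
class WaitingList {
public:
  // A task receives the producer's exception, or nullptr if none was thrown.
  using Task = std::function<void(std::exception_ptr)>;

  // Register a task; if the module already finished, run the task right away.
  void add(Task task) {
    if (done_) {
      task(exception_);
      return;
    }
    tasks_.push_back(std::move(task));
  }

  // Release every waiting task exactly once; later calls are no-ops.
  void doneWaiting(std::exception_ptr e) {
    if (done_) return;
    done_ = true;
    exception_ = e;
    std::vector<Task> ready;
    ready.swap(tasks_);
    for (auto& task : ready) task(e);
  }

private:
  std::vector<Task> tasks_;
  std::exception_ptr exception_;
  bool done_ = false;
};

// Hypothetical module that appears on nPaths paths.
class Module {
public:
  Module(std::function<void()> work, int nPaths)
      : work_(std::move(work)), nPaths_(nPaths) {}

  WaitingList& waiters() { return waiters_; }

  // Run the module. Waiters are released even if the work throws.
  void run() {
    std::exception_ptr e;
    try {
      work_();
    } catch (...) {
      e = std::current_exception();
    }
    waiters_.doneWaiting(e);
  }

  // Called once per path that skips this module. When every path
  // containing the module has skipped it, nobody will run it, so the
  // waiters are released without a product and without an exception.
  void skippedOnPath() {
    if (++nSkipped_ == nPaths_) waiters_.doneWaiting(nullptr);
  }

private:
  std::function<void()> work_;
  WaitingList waiters_;
  int nPaths_;
  int nSkipped_ = 0;
};

// A consumer on another path is released, and sees the exception,
// when the producer throws instead of putting its product.
bool consumerReleasedAfterThrow() {
  Module producer([] { throw std::runtime_error("producer failed"); }, 1);
  bool released = false;
  bool sawException = false;
  producer.waiters().add([&](std::exception_ptr e) {
    released = true;
    sawException = (e != nullptr);
  });
  producer.run();
  return released && sawException;
}

// A consumer is released only after all paths have skipped the producer.
bool consumerReleasedAfterAllPathsSkip() {
  Module producer([] {}, 2);
  bool released = false;
  producer.waiters().add([&](std::exception_ptr) { released = true; });
  producer.skippedOnPath();
  if (released) return false;  // the second path could still run the module
  producer.skippedOnPath();
  return released;
}
```

In this sketch the consumer is always released and then decides what to do based on the exception pointer it receives; a consumer that is never released would hang, which matches the timeouts this pull request is meant to address.
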
@cmsbuild

cmsbuild commented Oct 1, 2016

A new Pull Request was created by @Dr15Jones (Chris Jones) for CMSSW_8_1_DEVEL_X.

It involves the following packages:

FWCore/Framework

@cmsbuild, @smuzaffar, @Dr15Jones can you please review it and eventually sign? Thanks.
@Martin-Grunewald, @wddgit, @wmtan this is something you requested to watch as well.
@slava77, @smuzaffar you are the release manager for this.

cms-bot commands are listed here #13028

@Dr15Jones

please test

@cmsbuild

cmsbuild commented Oct 1, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/15485/console

@Dr15Jones

@smuzaffar this should fix the last two RelVal failures which are timing out.

@cmsbuild

cmsbuild commented Oct 1, 2016

-1

Tested at: 575bdf6

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-16065/15485/summary.html

I found the following errors while testing this PR

Failed tests: UnitTests RelVals AddOn

  • Unit Tests:

I found errors in the following unit tests:

---> test TestFWCoreFrameworkGlobalStreamOne had ERRORS
---> test testRecoMETMETProducers had ERRORS

  • RelVals:

When I ran the RelVals I found an error in the following workflows:

5.1 step1
runTheMatrix-results/5.1_TTbar+TTbarFS+HARVESTFS/step1_TTbar+TTbarFS+HARVESTFS.log

135.4 step1
runTheMatrix-results/135.4_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS/step1_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS.log

  • AddOn:

I found errors in the following addon tests:

cmsDriver.py TTbar_8TeV_TuneCUETP8M1_cfi --conditions auto:run1_mc --fast -n 100 --eventcontent AODSIM,DQM --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,EI,HLT:@Fake,VALIDATION --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --datatier GEN-SIM-DIGI-RECO,DQMIO --beamspot Realistic8TeVCollision : FAILED - time: date Sun Oct 2 00:55:00 2016-date Sun Oct 2 00:53:31 2016 s - exit: 20736
cmsDriver.py TTbar_13TeV_TuneCUETP8M1_cfi --conditions auto:run2_mc --fast -n 100 --eventcontent AODSIM,DQM --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,EI,HLT:@relval25ns,VALIDATION --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --datatier GEN-SIM-DIGI-RECO,DQMIO --beamspot NominalCollision2015 --era Run2_25ns --magField 38T_PostLS1 : FAILED - time: date Sun Oct 2 00:54:28 2016-date Sun Oct 2 00:53:47 2016 s - exit: 23040
cmsDriver.py TTbar_13TeV_TuneCUETP8M1_cfi --conditions auto:run2_mc --fast -n 100 --eventcontent AODSIM,DQM --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,EI,HLT:@relval2016,VALIDATION --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --datatier GEN-SIM-DIGI-RECO,DQMIO --beamspot NominalCollision2015 --era Run2_2016 --magField 38T_PostLS1 : FAILED - time: date Sun Oct 2 01:01:41 2016-date Sun Oct 2 00:53:54 2016 s - exit: 23040

@Dr15Jones

Both errors appear to be transient. The addOnTests failures are ones that also fail in the IB.

@Dr15Jones

+1

@Dr15Jones

please test

@cmsbuild

cmsbuild commented Oct 2, 2016

@cmsbuild

cmsbuild commented Oct 2, 2016

This pull request is fully signed and it will be integrated in one of the next CMSSW_8_1_DEVEL_X IBs after it passes the integration tests.

@cmsbuild

cmsbuild commented Oct 2, 2016

-1

Tested at: 575bdf6

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-16065/15487/summary.html

I found the following errors while testing this PR

Failed tests: UnitTests RelVals AddOn

  • Unit Tests:

I found errors in the following unit tests:

---> test testRecoMETMETProducers had ERRORS

  • RelVals:

When I ran the RelVals I found an error in the following workflows:

5.1 step1
runTheMatrix-results/5.1_TTbar+TTbarFS+HARVESTFS/step1_TTbar+TTbarFS+HARVESTFS.log

135.4 step1
runTheMatrix-results/135.4_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS/step1_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS.log

  • AddOn:

I found errors in the following addon tests:

cmsDriver.py TTbar_8TeV_TuneCUETP8M1_cfi --conditions auto:run1_mc --fast -n 100 --eventcontent AODSIM,DQM --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,EI,HLT:@Fake,VALIDATION --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --datatier GEN-SIM-DIGI-RECO,DQMIO --beamspot Realistic8TeVCollision : FAILED - time: date Sun Oct 2 07:33:05 2016-date Sun Oct 2 07:31:12 2016 s - exit: 20736
cmsDriver.py TTbar_13TeV_TuneCUETP8M1_cfi --conditions auto:run2_mc --fast -n 100 --eventcontent AODSIM,DQM --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,EI,HLT:@relval25ns,VALIDATION --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --datatier GEN-SIM-DIGI-RECO,DQMIO --beamspot NominalCollision2015 --era Run2_25ns --magField 38T_PostLS1 : FAILED - time: date Sun Oct 2 07:32:06 2016-date Sun Oct 2 07:31:25 2016 s - exit: 23040
cmsDriver.py TTbar_13TeV_TuneCUETP8M1_cfi --conditions auto:run2_mc --fast -n 100 --eventcontent AODSIM,DQM --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,EI,HLT:@relval2016,VALIDATION --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --datatier GEN-SIM-DIGI-RECO,DQMIO --beamspot NominalCollision2015 --era Run2_2016 --magField 38T_PostLS1 : FAILED - time: date Sun Oct 2 07:39:36 2016-date Sun Oct 2 07:31:32 2016 s - exit: 23040

@Dr15Jones

The unit test is also failing in the IB.

@Dr15Jones

All the failing RelVal and addOnTests are also failing in the _DEVEL_X IB. This pull request is not meant to fix those particular workflows.

@Dr15Jones

@smuzaffar please merge this into the _DEVEL release.

@smuzaffar smuzaffar merged commit 031cee8 into cms-sw:CMSSW_8_1_DEVEL_X Oct 2, 2016
@Dr15Jones Dr15Jones deleted the betterExceptionHandling branch November 9, 2016 08:01