
Start EndPaths at the same time as we start Paths #21035

Merged
merged 6 commits into master on Oct 30, 2017

Conversation

Dr15Jones
Contributor

Starting EndPaths at the same time as Paths gives the framework more concurrent tasks to use.
If a module on an EndPath depends on the present process's TriggerResults, such as an OutputModule with a SelectEvents configuration, the regular data-dependency system guarantees the proper run order of the modules.
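As a minimal sketch of the configuration pattern being described (the process name, module labels, parameters, and file name here are illustrative, not taken from this PR): the OutputModule's SelectEvents refers to a Path of this process, so it consumes this process's TriggerResults, and even though the EndPath now starts concurrently with the Paths, the data dependency keeps the write from running until the Path decisions exist.

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process("DEMO")
process.source = cms.Source("EmptySource")
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(10))

# A filter and a producer forming an ordinary Path (labels are illustrative).
process.prescale = cms.EDFilter("Prescaler",
    prescaleFactor = cms.int32(2),
    prescaleOffset = cms.int32(0)
)
process.thing = cms.EDProducer("ThingProducer")

process.p1 = cms.Path(process.prescale + process.thing)

# SelectEvents makes this OutputModule depend on this process's TriggerResults,
# so it cannot run until the decision of Path 'p1' is available, regardless of
# when the EndPath itself is started.
process.out = cms.OutputModule("PoolOutputModule",
    fileName = cms.untracked.string("selected.root"),
    SelectEvents = cms.untracked.PSet(SelectEvents = cms.vstring("p1"))
)

process.ep = cms.EndPath(process.out)
```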

@cmsbuild
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/PR-21035/1667

@cmsbuild
Contributor

A new Pull Request was created by @Dr15Jones (Chris Jones) for master.

It involves the following packages:

FWCore/Framework
FWCore/Integration

@cmsbuild, @smuzaffar, @Dr15Jones can you please review it and eventually sign? Thanks.
@Martin-Grunewald, @wddgit this is something you requested to watch as well.
@davidlange6, @slava77 you are the release manager for this.

cms-bot commands are listed here

@Dr15Jones
Contributor Author

please test

@cmsbuild
Contributor

cmsbuild commented Oct 26, 2017

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/24000/console Started: 2017/10/27 00:13

@cmsbuild
Contributor

Comparison job queued.

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-21035/24000/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 70 differences found in the comparisons
  • DQMHistoTests: Total files compared: 26
  • DQMHistoTests: Total histograms compared: 2838442
  • DQMHistoTests: Total failures: 141
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2838130
  • DQMHistoTests: Total skipped: 171
  • DQMHistoTests: Total Missing objects: 0
  • Checked 107 log files, 10 edm output root files, 26 DQM output files

@Dr15Jones
Contributor Author

The differences in the comparisons come strictly from the MessageLogger. This is to be expected: because modules on EndPaths can now start earlier, more modules are potentially run before the ErrorLogger module, and therefore more errors/warnings can be captured.

@Dr15Jones
Contributor Author

+1

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar (and backports should be raised in the release meeting by the corresponding L2)

@cmsbuild
Contributor

Pull request #21035 was updated. @cmsbuild, @smuzaffar, @Dr15Jones can you please check and sign again.

@Dr15Jones
Contributor Author

@davidlange6 wrote

Indeed there are changes in the message logger histograms. Are these improvements?

I'd call them a side step. Because the LogErrorHarvester reads from a global while multiple threads are writing to it, exactly what it records can differ between runs of the same job. The latest change to LogErrorHarvester (merged a while ago) makes sure the harvester at least waits for the specified modules to finish before reading the log. However, if other modules not on the 'wait' list write to the logger, there is no guarantee that their messages will also be harvested. This is the main reason I've never been happy with the LogErrorHarvester as a concept.
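As a generic illustration of that race (plain Python, not the actual MessageLogger/LogErrorHarvester code): a snapshot of a shared log taken while only some of the writers are known to have finished will contain a different set of messages from run to run.

```python
import threading
import random
import time

shared_log = []               # stands in for the global message store
log_lock = threading.Lock()

def writer(name, n_messages):
    # Stands in for modules emitting warnings at unpredictable times.
    for i in range(n_messages):
        time.sleep(random.uniform(0, 0.001))
        with log_lock:
            shared_log.append(f"{name}: message {i}")

def harvest():
    # Stands in for a harvester that reads whatever has been logged so far.
    with log_lock:
        return list(shared_log)

writers = [threading.Thread(target=writer, args=(f"mod{i}", 5)) for i in range(4)]
for t in writers:
    t.start()

# Wait only for the modules on the "wait" list; the others may still be
# appending when the snapshot is taken, so its contents vary between runs.
for t in writers[:2]:
    t.join()

snapshot = harvest()
print(len(snapshot), "messages harvested this run")

for t in writers[2:]:
    t.join()
```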

@Dr15Jones
Contributor Author

please test

@cmsbuild
Contributor

cmsbuild commented Oct 29, 2017

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/24034/console Started: 2017/10/29 15:47

@cmsbuild
Contributor

Comparison job queued.

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-21035/24034/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 70 differences found in the comparisons
  • DQMHistoTests: Total files compared: 26
  • DQMHistoTests: Total histograms compared: 2838442
  • DQMHistoTests: Total failures: 147
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2838123
  • DQMHistoTests: Total skipped: 171
  • DQMHistoTests: Total Missing objects: 0
  • Checked 107 log files, 10 edm output root files, 26 DQM output files

@Dr15Jones
Contributor Author

Dr15Jones commented Oct 30, 2017

@slava77
I looked at the reported RECO changes.

The following only had changes in the number of reports from MessageLogger:
workflows: 4.53, 20434.0, 25202.0, 1000.0, 4.22, 1306.0, 25.0, 1330.0, 20034.0

The following saw a different number of entries in:
/DQMData/Run 1/PixelPhase1V/Run summary/TrackingParticle
workflows: 10024, 11624.0, 10824.0, 10224.0

The following saw 1-3 tau items shift in phi and/or eta:
/DQMData/*/HLT/Run summary/TAU/Inclusive/L1
/DQMData/*/L1T/Run summary/L1TStage2CaloLayer2/Isolated-Tau
workflows: 136.788, 10224.0, 136.731

workflow 136.788, in addition to the above, also saw a change in
/DQMData/Run 297557/L1T/Run summary/L1TStage2CaloLayer2/NonIsolated-Tau

If this pull request had a bug, I would have expected catastrophic failures (e.g. whole modules not being run in the correct order), not a few bins changing in histograms. This pull request can change the order in which non-dependent modules are run, so it is possible that an unexpected side effect of module run order is behind the changed histograms. One possibility is random numbers: if a module used random numbers but did not re-seed the generator at the beginning of each event, then it would depend on some other module that did set the seed running ahead of it. Another possibility is that memory layout shifts around, so an out-of-bounds memory write could hit a different memory address with this pull request.
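A small sketch of the re-seeding point (plain Python, not CMSSW's RandomNumberGeneratorService): a module that re-seeds per event gets the same numbers regardless of which other modules drew from the shared engine first, while one that does not re-seed depends on the schedule.

```python
import random

def other_module(rng):
    # Some unrelated module that also draws from the shared engine.
    return [rng.random() for _ in range(3)]

def my_module(rng, event_id, reseed_per_event):
    if reseed_per_event:
        rng.seed(event_id)   # deterministic regardless of what ran before
    return rng.random()

for reseed in (False, True):
    # Schedule A runs other_module before my_module; schedule B does not.
    rng_a, rng_b = random.Random(42), random.Random(42)
    other_module(rng_a)
    a = my_module(rng_a, event_id=1, reseed_per_event=reseed)
    b = my_module(rng_b, event_id=1, reseed_per_event=reseed)
    print(f"reseed_per_event={reseed}: schedules agree -> {a == b}")
```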

@Dr15Jones
Contributor Author

+1

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar (and backports should be raised in the release meeting by the corresponding L2)

@slava77
Contributor

slava77 commented Oct 30, 2017 via email

@Dr15Jones
Contributor Author

@slava77 I found a memory problem in SiPixelPhase1TrackingParticleV which could explain the randomness.

@davidlange6
Contributor

merge
