
Support concurrent LuminosityBlocks in HLTrigReport #27961

Merged
merged 2 commits into master on Sep 13, 2019

Conversation

Dr15Jones
Contributor

PR description:

Switched to using a LuminosityBlockCache to allow HLTrigReport to not be sensitive to the ordering of completion of a LuminosityBlock. This allows concurrent LuminosityBlocks to be processed properly.

Support for concurrent Runs would require further work.
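A minimal sketch of the idea behind a per-LuminosityBlock cache, written as plain Python rather than the actual CMSSW C++ interface (all names here are illustrative, not the real HLTrigReport or framework API): each in-flight LuminosityBlock accumulates into its own counter object, and the per-lumi counts are merged into the totals only when that lumi ends, so the final result does not depend on which lumi finishes first.

```python
from collections import Counter

class PerLumiReportSim:
    """Toy model of per-LuminosityBlock caching (hypothetical names)."""

    def __init__(self):
        self.caches = {}         # lumi id -> per-lumi trigger counts
        self.totals = Counter()  # job-wide counts, filled at end-of-lumi

    def begin_lumi(self, lumi):
        # Each concurrent LuminosityBlock gets its own independent cache.
        self.caches[lumi] = Counter()

    def analyze(self, lumi, fired_paths):
        # Events touch only their own lumi's cache, so two lumis can be
        # processed at the same time without interfering with each other.
        self.caches[lumi].update(fired_paths)

    def end_lumi(self, lumi):
        # Merging at end-of-lumi makes the totals order-independent.
        self.totals += self.caches.pop(lumi)

r = PerLumiReportSim()
r.begin_lumi(1); r.begin_lumi(2)          # two lumis in flight at once
r.analyze(2, ["HLT_PathA"])
r.analyze(1, ["HLT_PathA", "HLT_PathB"])
r.end_lumi(2); r.end_lumi(1)              # lumi 2 finishes first
print(dict(r.totals))                     # {'HLT_PathA': 2, 'HLT_PathB': 1}
```

Swapping the two end_lumi calls leaves the totals unchanged, which is exactly the property that lets the framework process LuminosityBlocks concurrently.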

Added unit tests to be sure the behavior remains the same.

This change is needed to allow workflows containing this module to use concurrent LuminosityBlocks. This includes the new workflow which is testing such concurrency, 1361.181.

PR validation:

The new unit test passes.

@cmsbuild
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-27961/11834

  • This PR adds an extra 24KB to repository

@Dr15Jones
Contributor Author

please test

@cmsbuild
Contributor

cmsbuild commented Sep 10, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/2451/console Started: 2019/09/10 17:36

@cmsbuild
Contributor

A new Pull Request was created by @Dr15Jones (Chris Jones) for master.

It involves the following packages:

HLTrigger/HLTanalyzers

@Martin-Grunewald, @cmsbuild, @fwyzard can you please review it and eventually sign? Thanks.
@Martin-Grunewald this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@cmsbuild
Contributor

Comparison job queued.

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b0aee8/2451/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 34
  • DQMHistoTests: Total histograms compared: 2957224
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2956882
  • DQMHistoTests: Total skipped: 341
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 33 files compared)
  • Checked 145 log files, 15 edm output root files, 34 DQM output files

@fabiocos
Contributor

@Dr15Jones running 1361.181 with "-t 4" I still see:

%MSG-s ModulesSynchingOnLumis:  AfterModConstruction 11-Sep-2019 10:50:36 CEST  pre-events
The following modules require synchronizing on LuminosityBlock boundaries:
  HLTrigReport hltTrigReport
%MSG

Do you confirm that this is the expected behaviour?

@Dr15Jones
Contributor Author

@fabiocos When I run a modified version of 1361.18 (to have 4 threads and 2 concurrent lumis) under CMSSW_11_0_0_pre7 without this pull request I see that message. If I run the exact same configuration file but with my changes in a CMSSW_11_0_0_pre7 build area, the message does not appear.

from test_hltrigreport_base_cfg import process

process.hlTrigReport.resetBy = "run"
process.hlTrigReport.reportBy = "job"
Contributor

Does it really make sense to reset every run and report only at the end of the job?

Contributor Author

It would give you the last run of the job.
It was what the original code supported so I did the same.
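To see why resetBy = "run" combined with reportBy = "job" reports only the last run, here is a toy simulation (hypothetical names, not the module's actual code): the counters are cleared at every run boundary, so the single end-of-job report contains only what accumulated since the most recent reset.

```python
from collections import Counter

def simulate(runs, reset_by_run=True):
    """Count fired paths over a list of runs, each a list of events."""
    counts = Counter()
    for run_events in runs:
        if reset_by_run:
            counts.clear()             # resetBy = "run"
        for fired_paths in run_events:
            counts.update(fired_paths)
    return dict(counts)                # reportBy = "job": one final report

# Two runs; the per-run reset discards the first run's counts.
runs = [[["HLT_PathA"]], [["HLT_PathB"], ["HLT_PathB"]]]
print(simulate(runs))                       # {'HLT_PathB': 2}
print(simulate(runs, reset_by_run=False))   # {'HLT_PathA': 1, 'HLT_PathB': 2}
```

The same reasoning applies to resetBy = "lumi" with reportBy = "job", which yields the last lumi of the job.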

Contributor

OK... IIRC the possibility was just an accident in the original code, not an intended feature.
Of course, having it doesn't harm, as long as it didn't require extra work to reimplement.

Contributor Author

Of course, having it doesn't harm, as long as it didn't require extra work to reimplement.

It actually does require additional logic in order to implement. But it is there and working.

from test_hltrigreport_base_cfg import process

process.hlTrigReport.resetBy = "lumi"
process.hlTrigReport.reportBy = "job"
Contributor

Does it really make sense to reset every lumi and report only at the end of the job?

Contributor Author

It would give you the last lumi of the job.
Again, it was what the original code supported, so I kept the same behavior.

@fabiocos
Contributor

@Dr15Jones my question is due to the fact that I did a standalone test with 1361.181 -t 4 and did not see what you observe, while I was expecting to. I am repeating the test.

@Dr15Jones
Contributor Author

@fabiocos I appreciate that you did independent testing; I don't know why you do not see the same results as I do. I also added the Tracer service to a very simple job using HLTrigReport, and it also shows that concurrent LuminosityBlocks are working.

@fabiocos
Contributor

@Dr15Jones maybe something is wrong in my setup. I'll try once more, then we can see the output of the new test workflows in the IB.

@Martin-Grunewald could you please check the PR?

@Dr15Jones
Contributor Author

@fabiocos I did

scram project CMSSW_11_0_X_2019-09-11-2300
cd CMSSW_11_0_X_2019-09-11-2300/src
cmsenv
git cms-addpkg HLTrigger/HLTanalyzers
git checkout concurrentLumiHLTrigReport
scram b -j 12
cd ..
mkdir matrix
cd matrix/
runTheMatrix.py -l 1361.181 -t 4

Then looking at the log file for step2, I get

%MSG-i ThreadStreamSetup:  (NoModuleName) 12-Sep-2019 09:15:19 CDT pre-events
setting # threads 4
setting # streams 4
%MSG
12-Sep-2019 09:15:35 CDT  Initiating request to open file file:step1.root
12-Sep-2019 09:15:38 CDT  Successfully opened file file:step1.root
Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 0 at 12-Sep-2019 09:16:06.984 CDT
Begin processing the 2nd record. Run 1, Event 2, LumiSection 1 on stream 1 at 12-Sep-2019 09:16:06.985 CDT
...

I.e. no message about HLTrigReport prohibiting concurrent LuminosityBlocks.

@fabiocos
Contributor

@Dr15Jones I did something wrong in my initial test, I confirm your results.

@Dr15Jones
Contributor Author

@Dr15Jones I did something wrong in my initial test, I confirm your results.

Oh good. You had me worried.

@Martin-Grunewald
Contributor

+1

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@fabiocos
Contributor

+1
