Throw an exception if a module requiring synchronization on lumi boundaries is used when concurrent lumis are enabled #35326

Merged
merged 4 commits into cms-sw:master from throwLumiSynchronize on Oct 5, 2021

Conversation

makortel
Contributor

PR description:

This PR follows the completion of #25090 and simply turns an existing warning into an exception. It also fixes the framework unit tests that would otherwise fail because of the exception. Some unit tests in other packages may fail too, in which case I'll address those separately.

Resolves cms-sw/framework-team#121

PR validation:

Framework unit tests run.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35326/25367

  • This PR adds an extra 68KB to repository

@cmsbuild
Contributor

A new Pull Request was created by @makortel (Matti Kortelainen) for master.

It involves the following packages:

  • FWCore/Concurrency (core)
  • FWCore/Framework (core)
  • FWCore/Integration (core)

@makortel, @smuzaffar, @cmsbuild, @Dr15Jones can you please review it and eventually sign? Thanks.
@wddgit this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@makortel
Contributor Author

test parameters:

  • enable_tests = threading

@makortel
Contributor Author

@cmsbuild, please test

@cmsbuild
Contributor

-1

Failed Tests: AddOn
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c4b1fc/18727/summary.html
COMMIT: 0910e65
CMSSW: CMSSW_12_1_X_2021-09-17-1100/slc7_amd64_gcc900
Additional Tests: THREADING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/35326/18727/install.sh to create a dev area with all the needed externals and cmssw changes.

AddOn Tests

----- Begin Fatal Exception 18-Sep-2021 03:00:22 CEST-----------------------
An exception of category 'ModulesSynchingOnLumis' occurred while
   [0] Calling beginJob
Exception Message:
The framework is configured to use at least two streams, but the following modules
require synchronizing on LuminosityBlock boundaries:
  Pythia8GeneratorFilter generator

The situation can be fixed by either
 * modifying the modules to support concurrent LuminosityBlocks (preferred), or
 * setting 'process.options.numberOfConcurrentLuminosityBlocks = 1' in the configuration file
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 18-Sep-2021 03:00:30 CEST-----------------------
An exception of category 'FileOpenError' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing input source of type PoolSource
   [2] Calling RootInputFileSequence::initTheFile()
   [3] Calling StorageFactory::open()
   [4] Calling File::sysopen()
Exception Message:
Failed to open the file 'RelVal_Raw_Fake_MC.root'
   Additional Info:
      [a] Input file file:RelVal_Raw_Fake_MC.root could not be opened.
      [b] open() failed with system error 'No such file or directory' (error code 2)
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 18-Sep-2021 03:01:12 CEST-----------------------
An exception of category 'FileOpenError' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing input source of type PoolSource
   [2] Calling RootInputFileSequence::initTheFile()
   [3] Calling StorageFactory::open()
   [4] Calling File::sysopen()
Exception Message:
Failed to open the file 'RelVal_Raw_Fake_MC.root'
   Additional Info:
      [a] Input file file:RelVal_Raw_Fake_MC.root could not be opened.
      [b] open() failed with system error 'No such file or directory' (error code 2)
----- End Fatal Exception -------------------------------------------------
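
For readers unfamiliar with the first exception above, the following is a minimal sketch of the kind of check that produces it. It is a plain-Python model, not the framework's C++ implementation: the `Module` class, its `syncs_on_lumi` flag, and the function `check_concurrent_lumis` are hypothetical names introduced for illustration only. The sketch assumes the condition is "two or more concurrent LuminosityBlocks and at least one module that needs to synchronize on lumi boundaries", as described in the PR title.

```python
from dataclasses import dataclass


class ModulesSynchingOnLumis(RuntimeError):
    """Stand-in for the exception category shown in the log (hypothetical class)."""


@dataclass
class Module:
    type_name: str       # e.g. "Pythia8GeneratorFilter"
    label: str           # e.g. "generator"
    syncs_on_lumi: bool  # hypothetical flag: module must synchronize on lumi boundaries


def check_concurrent_lumis(n_concurrent_lumis, modules):
    """Raise if concurrent lumis are enabled and any module needs lumi synchronization."""
    offenders = [m for m in modules if m.syncs_on_lumi]
    if n_concurrent_lumis >= 2 and offenders:
        names = "\n".join(f"  {m.type_name} {m.label}" for m in offenders)
        raise ModulesSynchingOnLumis(
            "the following modules require synchronizing on "
            "LuminosityBlock boundaries:\n" + names
        )


modules = [Module("Pythia8GeneratorFilter", "generator", True)]

# With a single lumi in flight the check passes, which mirrors the workaround
# 'process.options.numberOfConcurrentLuminosityBlocks = 1' quoted in the message.
check_concurrent_lumis(1, modules)

# With two concurrent lumis the same module list is rejected.
try:
    check_concurrent_lumis(2, modules)
    raised = False
    message = ""
except ModulesSynchingOnLumis as exc:
    raised = True
    message = str(exc)
```

Before this PR the same situation only produced a warning, so a job like the one in the AddOn test would have continued running; with the change it stops at beginJob.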

Comparison Summary

The workflows 140.53 have different files in step1_dasquery.log than the ones found in the baseline. You may want to check and retrigger the tests if necessary. You can check it in the "files" directory in the results of the comparisons

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 1305 differences found in the comparisons
  • DQMHistoTests: Total files compared: 39
  • DQMHistoTests: Total histograms compared: 3000833
  • DQMHistoTests: Total failures: 3682
  • DQMHistoTests: Total nulls: 20
  • DQMHistoTests: Total successes: 2997109
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 45.699 KiB( 38 files compared)
  • DQMHistoSizes: changed ( 140.53 ): 44.531 KiB Hcal/DigiRunHarvesting
  • DQMHistoSizes: changed ( 140.53 ): 1.172 KiB RPC/DCSInfo
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 165 log files, 37 edm output root files, 39 DQM output files
  • TriggerResults: no differences found

@makortel
Contributor Author

makortel commented Oct 4, 2021

@cmsbuild, please abort

@makortel
Contributor Author

makortel commented Oct 4, 2021

Nope, not like this

Co-authored-by: Chris Jones <chrisdjones15@gmail.com>
@cmsbuild
Contributor

cmsbuild commented Oct 4, 2021

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35326/25742

@makortel
Contributor Author

makortel commented Oct 4, 2021

@cmsbuild, please test

@cmsbuild
Contributor

cmsbuild commented Oct 4, 2021

Pull request #35326 was updated. @makortel, @smuzaffar, @Dr15Jones can you please check and sign again.

@cmsbuild
Contributor

cmsbuild commented Oct 5, 2021

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c4b1fc/19385/summary.html
COMMIT: 9ab1bbf
CMSSW: CMSSW_12_1_X_2021-10-04-1300/slc7_amd64_gcc900
Additional Tests: THREADING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/35326/19385/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 40
  • DQMHistoTests: Total histograms compared: 3219394
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3219372
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 39 files compared)
  • Checked 169 log files, 37 edm output root files, 40 DQM output files
  • TriggerResults: no differences found

@makortel
Contributor Author

makortel commented Oct 5, 2021

+1

@perrotta @qliphy It would be good to exercise this PR in the IBs for at least one round (mostly to really check that everything still works) before cutting the pre4. It is also ok for me to merge this PR after pre4.

@cmsbuild
Contributor

cmsbuild commented Oct 5, 2021

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Contributor

perrotta commented Oct 5, 2021

+1

  • Try to exercise it in the 1100 IB, and see if any exception is thrown

@cmsbuild cmsbuild merged commit dd549ca into cms-sw:master Oct 5, 2021
@makortel makortel deleted the throwLumiSynchronize branch October 5, 2021 12:44
Development

Successfully merging this pull request may close these issues.

Change LogSystem to exception when trying to run a job with >= 2 concurrent lumis with an incompatible module
5 participants