Fix race condition in DAQ modules when exception is thrown in event processing (only affecting multithreading) - 75X #12200
…g, with another thread already requesting the next event from the source. The source can then open the next LS (internally) and report the number of events in the past LS to the FastMonitoringService. In this case preEndLumi, triggered by the exception, can run later than the source report, and the exception check was then (incorrectly) skipped.
A new Pull Request was created by @smorovic (Srecko Morovic) for CMSSW_7_5_X.
Fix race condition in DAQ modules when exception is thrown in event processing (only affecting multithreading) - 75X
It involves the following packages:
EventFilter/Utilities
@mommsen, @cvuosalo, @cmsbuild, @emeschi, @slava77 can you please review it and eventually sign? Thanks.
@cmsbuild please test
The tests are being triggered in jenkins.
-1
runTheMatrix-results/25.0_TTbar+TTbar+DIGI+RECOAlCaCalo+HARVEST+ALCATT/step3_TTbar+TTbar+DIGI+RECOAlCaCalo+HARVEST+ALCATT.log
----- Begin Fatal Exception 30-Oct-2015 14:12:39 CET-----------------------
An exception of category 'FatalRootError' occurred while
[0] Calling EventProcessor::runToCompletion (which does almost everything after beginJob and before endJob)
Additional Info:
[a] Fatal Root Error: @SUB=TFile::Flush
error flushing file step3_inDQM.root (Disk quota exceeded)
----- End Fatal Exception -------------------------------------------------
1330.0 step1
runTheMatrix-results/1330.0_ZMM_13+ZMM_13+DIGIUP15+RECOUP15+HARVESTUP15/step1_ZMM_13+ZMM_13+DIGIUP15+RECOUP15+HARVESTUP15.log
you can see the results of the tests here:
@cmsbuild please test
The tests are being triggered in jenkins.
+1
This pull request is fully signed and it will be integrated in one of the next CMSSW_7_5_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_7_6_X is complete. This pull request requires discussion in the ORP meeting before it's merged. @davidlange6, @Degano, @smuzaffar
+1
Fix race condition in DAQ modules when exception is thrown in event processing (only affecting multithreading) - 75X
A rare race condition occurs when an exception is thrown while processing the last few events in a file and LS. In this case, another thread may already have requested the next event from the source. If the next event belongs to the next LS, the input source reports the total number of events in the previous LS to the FastMonitoringService.
Normally, when an exception is thrown, we skip writing the JSON stream output (by catching the exception action callback in the FastMonitoringService), and hltd subsequently assigns the missing events as error events to close the micro-merge of that LS. However, this suppression did not happen once the input source had already reported the total number of events to the FastMonitoringService. This led to an incomplete micro-merge for some streams. The problem is present only in multithreading, because in single-threaded mode the source cannot get a request for the next event before the exception on the currently processed event is thrown (i.e. event requests are aborted and the run/LS get closed).
In this update, JSON output is suppressed whenever an exception has been thrown, regardless of the input source report.
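
To make the ordering problem concrete, here is a minimal sketch of the decision logic, not the actual code of this PR. The class `LumiMonitorSketch` and all of its method names are hypothetical stand-ins for the FastMonitoringService bookkeeping, and the "old" check is only a reading of the description above (the exception check being skipped once the source has reported the LS event count).

```cpp
#include <cassert>
#include <map>
#include <mutex>

// Hypothetical model of the monitoring state; names are illustrative only
// and do not correspond to the real FastMonitoringService interface.
class LumiMonitorSketch {
public:
  // Called when event processing throws (exception action callback).
  void reportException() {
    std::lock_guard<std::mutex> lock(mutex_);
    exceptionDetected_ = true;
  }

  // Called by the input source once it has opened the next LS and knows
  // the total number of events of the LS it just left.
  void reportEventsThisLumi(unsigned int ls, unsigned int nEvents) {
    std::lock_guard<std::mutex> lock(mutex_);
    sourceEventsReport_[ls] = nEvents;
  }

  // Assumed old behaviour: if the source report for this LS is already
  // present, the exception check is skipped and JSON is written.
  bool shouldWriteJsonOld(unsigned int ls) const {
    std::lock_guard<std::mutex> lock(mutex_);
    if (sourceEventsReport_.count(ls) > 0)
      return true;
    return !exceptionDetected_;
  }

  // Behaviour described for this update: an exception always suppresses
  // the JSON output, whether or not the source has reported.
  bool shouldWriteJsonFixed(unsigned int /*ls*/) const {
    std::lock_guard<std::mutex> lock(mutex_);
    return !exceptionDetected_;
  }

private:
  mutable std::mutex mutex_;
  bool exceptionDetected_ = false;
  std::map<unsigned int, unsigned int> sourceEventsReport_;
};

struct Decision {
  bool oldWrites;
  bool fixedWrites;
};

// Replays the problematic ordering: the source report for LS 1 arrives
// before the end-of-lumi check triggered by the exception.
Decision raceScenario() {
  LumiMonitorSketch monitor;
  monitor.reportEventsThisLumi(1, 100);  // another thread pulled the next event
  monitor.reportException();             // exception from an event in LS 1
  return {monitor.shouldWriteJsonOld(1), monitor.shouldWriteJsonFixed(1)};
}
```

In this model `raceScenario()` shows the old check still writing JSON for LS 1 while the fixed check suppresses it, which is what lets hltd account for the missing events as error events.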