DQM merge failure in CMSSW_11_0_0_pre3 #27528
Comments
A new Issue was created by @fabiocos Fabio Cossutti. @davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign dqm for the problem itself |
New categories assigned: dqm @jfernan2,@andrius-k,@schneiml,@fioriNTU,@kmaeshima you have been requested to review this Pull request/Issue and eventually sign? Thanks |
assign pdmv for the extension of release tests |
repeating the test with wf 136.85 in the recent IB, using the addition by @Dr15Jones #27473 👍
|
This histogram appears to have been added in #27330 by @hbecerri in https://github.com/cms-sw/cmssw/blame/master/DQM/TrackingMonitor/interface/TrackAnalyzer.h#L405, and is defined in https://github.com/cms-sw/cmssw/blob/master/DQM/TrackingMonitor/src/TrackAnalyzer.cc#L863 |
I wonder whether the problem is not caused by https://github.com/cms-sw/cmssw/blob/master/DQM/TrackingMonitor/src/TrackAnalyzer.cc#L866 where the axis range is dynamically extended |
indeed, removing all the instances of |
Oh sorry, I see you already created it... GitHub had not refreshed the status. |
@fabiocos @jfernan2 @mtosi @schneiml just to add a comment: I believe the Tracker is the main user of the SetCanExtend flag. The use case there is time profiles defined with a fixed number of bins but with a maximum value that is adjusted automatically if the run gets too long. I have used this functionality privately for years, and I can assure you that CanExtend+hadd usually does the right thing; it was one of the most robust (and useful) features of ROOT. In this case I see that the extendable axes are "All", while usually we only need the x-axis to be extendable. I have no idea in which case extending the y-axis of a histogram would be useful, but maybe it is this feature that confuses ROOT. Sorry for the long comment, but I believe that informing the ROOT developers about this issue, instead of changing the DQM code, would be better. |
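To make the use case above concrete, here is a minimal standalone model of a time profile with a fixed number of bins and an x-axis maximum that grows on demand. It is a sketch, not ROOT code: `ExtendableProfile` is a hypothetical type, the lower edge is fixed at 0, the bin count is assumed to be even, and the rule of doubling the range while merging neighbouring bins in pairs is an assumption of this sketch. Merging two such profiles (the role hadd plays for real histograms) first brings both to the same range.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a 1D histogram with an extendable x-axis.
// Assumptions: lower edge is 0, bin count is even, range grows by doubling.
struct ExtendableProfile {
  double xmax;
  std::vector<double> bins;

  ExtendableProfile(std::size_t nbins, double xmax_) : xmax(xmax_), bins(nbins, 0.0) {}

  // Double the range while keeping the bin count:
  // neighbouring bins are merged in pairs into the lower half.
  void extend() {
    std::vector<double> merged(bins.size(), 0.0);
    for (std::size_t i = 0; i < bins.size(); ++i) merged[i / 2] += bins[i];
    bins = std::move(merged);
    xmax *= 2.0;
  }

  // Fill one entry, extending the axis until x fits.
  void fill(double x, double w = 1.0) {
    if (x < 0.0) return;  // underflow is ignored in this sketch
    while (x >= xmax) extend();
    auto i = static_cast<std::size_t>(x / xmax * static_cast<double>(bins.size()));
    i = std::min(i, bins.size() - 1);  // guard against floating-point rounding
    bins[i] += w;
  }

  // Merge another profile with the same bin count and a range that
  // differs from ours by a power of two.
  void add(ExtendableProfile other) {
    while (xmax < other.xmax) extend();
    while (other.xmax < xmax) other.extend();
    for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += other.bins[i];
  }
};
```

In this model a merge is only well defined because both inputs extend by the same rule along a single axis; allowing every axis to extend multiplies the number of ways two inputs can disagree before they are added.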
The exception is from CMSSW, not from ROOT. In CMSSW we 'trap' ROOT Error and Warning messages and turn them into exceptions. This is because way too many of the ROOT messages are actually fatal, but we do not discover the problems until way too late (e.g. missing dictionaries produce just a warning message but cause data to not be stored). This is done here: https://github.com/cms-sw/cmssw/blob/master/FWCore/Services/plugins/InitRootHandlers.cc#L179 We can add special handling for certain messages; however, we really try to avoid this, since we have found that ROOT error and warning messages are almost invariably saying that there is a serious problem that a human is supposed to do something about (i.e. ROOT really is set up to work interactively and is not great for batch processing). |
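The trapping pattern described above can be sketched in a few lines. This is a minimal illustration only: it uses neither ROOT nor the actual code in InitRootHandlers.cc, and the names `Severity`, `StrictMessageHandler`, `allow` and `handle` are hypothetical. Warnings and errors are rethrown as exceptions unless the message matches an explicitly allowed substring, which plays the role of the "special handling for certain messages".

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical severity levels for messages emitted by a library.
enum class Severity { Info, Warning, Error };

// Hypothetical handler: turns warnings and errors into exceptions so that
// a batch job stops at the first sign of trouble instead of running on.
class StrictMessageHandler {
public:
  // Register a substring; messages containing it are let through.
  void allow(std::string substring) { allowed_.push_back(std::move(substring)); }

  // Called for every message; throws for anything at Warning level or above
  // that has not been explicitly allowed.
  void handle(Severity level, const std::string& location, const std::string& message) const {
    if (level == Severity::Info) return;
    for (const auto& s : allowed_) {
      if (message.find(s) != std::string::npos) return;
    }
    throw std::runtime_error(location + ": " + message);
  }

private:
  std::vector<std::string> allowed_;
};
```

Keeping the allow list short mirrors the reluctance expressed above: every entry is a message that a batch job will silently ignore from then on.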
#27535 solves this issue (tested with the workflow 136.85 split into two parts) |
As a closing remark, I think we need to extend test 137.8 so that it also probes the DQMIO file merge (although it would not detect this specific problem) |
The PdmV team reports a massive failure of the validation in CMSSW_11_0_0_pre3 in the DQMIO merge step, with errors like:
seen for instance in
https://cms-unified.web.cern.ch/cms-unified/report/muahmad_RVCMSSW_11_0_0_pre3Higgs200ChargedTaus_13_190711_065022_9820
or
https://cms-unified.web.cern.ch/cms-unified/report/chayanit_RVCMSSW_11_0_0_pre3RunCharmonium2018D__RelVal_2018D_190710_083031_7155
(but the problem is general).
In the set of IB workflows we now have test 137.8, which performs DQM harvesting on two separate files from two periods, without failing. I have also tested the DQMIO merge on it (currently missing from the test; I would add it), but it does not fail. However, if I split 136.85 into two pieces of 50 events each and try the merge, I get the failure.
This looks like a show-stopper for a meaningful validation of the release, and should be solved asap.