DQM: Merge MonitorElement and ConcurrentMonitorElement #28092
The code-checks are being triggered in jenkins. |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28092/12080
A new Pull Request was created by @schneiml (Marcel Schneider) for master. It involves the following packages: DQMServices/Core @smuzaffar, @andrius-k, @Dr15Jones, @kmaeshima, @schneiml, @cmsbuild, @jfernan2, @fioriNTU can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
please test just to see if anything blows up so far. |
The tests are being triggered in jenkins. |
-1 Tested at: 1125474 You can see the results of the tests here: I found the following errors while testing this PR. Failed tests: Build
I found compilation warnings when building: see details on the summary page. |
Comparison not run due to Build errors (RelVals and Igprof tests were also skipped) |
@schneiml |
@rovere Ciao Marco, yes, we'll need to have this discussion in a somewhat broader scope at some point. However, I think it is fine to remove the DQMGUI support on DQMStore's side, since we have changed, or need to change, the inner workings of DQM dramatically on the CMSSW side (first threaded mode, in the future a decentralized DQMStore). IIRC DQMGUI currently uses a CMSSW7 version of the DQMStore, so it has missed a lot of changes already (does the version it uses even have threaded mode?). Even if we stick with the old DQMGUI, it is probably perfectly fine to maintain its version of the DQMStore separately. After all, all that we care about is file IO (and network IO, which is arguably a special case of file IO), and the file formats need to be stable for other reasons anyway. Also, no changes to the file formats are planned at the moment (except maybe minor changes to DQMIO). |
@gennai @Sam-Harper following the comment #28092 (comment) could you please address the question of @schneiml or point to someone able to do that? The point raised by @fwyzard is relevant for the integration of this code, and as 11_0_X needs to be used for productions we cannot afford to use it just to validate choices a posteriori. |
So I'm following up with our experts and hope to have a timing recipe for you soon. In general though, @schneiml, we would prefer it if the test could be run on our dedicated timing machines, vocms003 and vocms004, as this allows us an apples-to-apples comparison. Would it be a problem for you to do so? If not, could you email me your CERN username? I'll add you to the people who can access them. |
@Sam-Harper thank you very much for "activating" your experts so quickly! However, since this measurement has never been done in the context of the DQM core team, and moving on with this PR is quite urgent, would it be possible to have this test run by the HLT expert himself? I do not know how long it would take, but I am sure you would need 1/10 of the time that we would. Do you think it is feasible? |
@fioriNTU, it is not normally our policy to do this for developers, otherwise we quickly get overwhelmed. However, in this particular case we also have to commission the timing in 11_0_X as a reference (still iterating with experts), and therefore it is not any significant extra work to also run the timing for this PR at the same time. So we will do so. |
So in case folks are wondering what happened: the test was run, we realised it was inadequate for this type of change, and it was re-run. Those results have been looked at, and further tests were needed, which are ongoing. |
@Sam-Harper thanks for the update! It looks like this still merges cleanly after #28297, so we might get away without another rebase (unless #28247 goes first). |
btw the first results, I got a 10ms increase but it looks like it was just a random upset as re-running it multiple times we converged on the above result. |
Thank you very much @Sam-Harper ! |
@Sam-Harper @fwyzard @Martin-Grunewald are there other concerns/comments by HLT experts? |
From my side, I'm happy with the result of the test Sam has done. |
+1 |
Thanks @Sam-Harper ! Btw, if you have reasonably easy instructions for how to run these things now, I am still interested (in the longer term) in doing some profiling to see how much impact DQM has at all in these configurations. In Offline, it is typically very little, which is why I think that many speed hacks in the DQM code are out of place... |
+1 |
@christopheralanwest @tlampen @rekovic please check this and comment in case a further iteration is needed; the changes in your area look consequential to the heart of the PR |
merge |
+1 |
PR description:
Moving forward, the separation between `ConcurrentMonitorElement` and `MonitorElement` is rather pointless and annoying, given that all DQM will need to use `ConcurrentMonitorElement` semantics in the future.
This PR introduces a Frankenstein-ME that uses parts of the new ME implementation [1] (namely, the locked `MonitorElementData::Value` type) in the current ME code. The result is that the usual interactions with the ME are now locked and thread-safe, similar to the `ConcurrentMonitorElement`. This means we can now drop the `ConcurrentMonitorElement` and instead use `MonitorElement*`. Of course there are still plenty of non-thread-safe interactions in the `MonitorElement`, but these should not be used.
[1] https://github.com/schneiml/cmssw/blob/dqm-new-dqmstore-on-CMSSW_11_0_0_pre5/DQMServices/Core/src/MonitorElement.cc
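To illustrate the idea of a value type that carries its own lock, here is a minimal sketch. It is not the CMSSW implementation: `LockedValue`, `SketchMonitorElement`, and `fillConcurrently` are hypothetical names invented for this example, and a plain `std::vector<double>` stands in for the ROOT histogram. The only point it shows is that when every access to the data goes through a function that holds the mutex, callers can share a plain pointer to the element across threads.

```cpp
#include <algorithm>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a locked value type: the data and the mutex
// guarding it live together, and the only access path takes the lock.
class LockedValue {
public:
  explicit LockedValue(int nbins) : bins_(nbins, 0.0) {}

  // Run f on the protected data while holding the lock.
  template <typename F>
  auto access(F&& f) {
    std::lock_guard<std::mutex> guard(mutex_);
    return f(bins_);
  }

private:
  std::mutex mutex_;
  std::vector<double> bins_;
};

// Hypothetical monitor element (NOT the CMSSW class): a fixed-range 1D
// histogram whose Fill and read operations are serialized by LockedValue.
class SketchMonitorElement {
public:
  SketchMonitorElement(int nbins, double lo, double hi)
      : value_(nbins), nbins_(nbins), lo_(lo), hi_(hi) {}

  void Fill(double x, double w = 1.0) {
    if (x < lo_ || x >= hi_)
      return;  // out-of-range entries are ignored in this sketch
    int bin = static_cast<int>((x - lo_) / (hi_ - lo_) * nbins_);
    bin = std::min(bin, nbins_ - 1);  // guard against rounding at the upper edge
    value_.access([&](std::vector<double>& bins) { bins[bin] += w; });
  }

  double binContent(int bin) {
    return value_.access([&](std::vector<double>& bins) { return bins.at(bin); });
  }

private:
  LockedValue value_;
  int nbins_;
  double lo_;
  double hi_;
};

// Fill the same element from several threads through a plain pointer and
// return the content of the first bin afterwards.
double fillConcurrently(SketchMonitorElement* me, int nthreads, int fillsPerThread) {
  std::vector<std::thread> workers;
  for (int t = 0; t < nthreads; ++t) {
    workers.emplace_back([me, fillsPerThread] {
      for (int i = 0; i < fillsPerThread; ++i)
        me->Fill(0.5);
    });
  }
  for (auto& worker : workers)
    worker.join();
  return me->binContent(0);
}
```

In this sketch a per-element mutex is taken on every `Fill`, which is simple but adds a lock acquisition to each call; whether that cost matters is the kind of question the timing tests discussed in the conversation are meant to answer.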
PR validation:
No output changes expected, and none observed.
However, some of the tests were probably incorrect -- the code was almost certainly uncompilable yet passed the tests.