Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DQM: Merge MonitorElement and ConcurrentMonitorElement #28092

Merged
merged 17 commits into from Nov 8, 2019

Conversation

schneiml
Copy link
Contributor

@schneiml schneiml commented Sep 30, 2019

PR description:

Moving forward, the separation between ConcurrentMonitorElement and MonitorElement is rather poitnless and annoying, given that all DQM will need to use ConcurrentMonitorElement semantics in the future.

This PR introduces a Frankenstein-ME that uses parts of the new ME implementation [1] (namely, the locked MonitorElementData::Value type) in the current ME code. The result is that the usual interactions with the ME are now locked and thread-safe, similar to the ConcurrentMonitorElement. This means we can now drop the ConcurrentMonitorElement and instead use MonitorElement*. Of course there are still plenty of non-thread-safe interactions in the MonitorElement, but these should not be used.

[1] https://github.com/schneiml/cmssw/blob/dqm-new-dqmstore-on-CMSSW_11_0_0_pre5/DQMServices/Core/src/MonitorElement.cc

PR validation:

No output changes expected, and none observed.
However, some of the tests where probably incorrect -- the code was almost certainly uncompilable yet passed the tests.

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

1 similar comment
@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28092/12080

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @schneiml (Marcel Schneider) for master.

It involves the following packages:

DQMServices/Core
DataFormats/Histograms

@smuzaffar, @andrius-k, @Dr15Jones, @kmaeshima, @schneiml, @cmsbuild, @jfernan2, @fioriNTU can you please review it and eventually sign? Thanks.
@barvic, @rovere this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@schneiml
Copy link
Contributor Author

please test

just to see if anything blows up so far.

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 30, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/2727/console Started: 2019/09/30 14:40

@cmsbuild
Copy link
Contributor

-1

Tested at: 1125474

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-dcb6be/2727/summary.html

I found follow errors while testing this PR

Failed tests: Build

  • Build:

I found compilation warning when building: See details on the summary page.

@cmsbuild
Copy link
Contributor

Comparison not run due to Build errors (RelVals and Igprof tests were also skipped)

@rovere
Copy link
Contributor

rovere commented Sep 30, 2019

@schneiml
Ciao Marcel, at some point CMS/DQM has to take a decision related to DQMGUI. This Pr is fully destructive and the DQMGUI will not even compile against it.
We can still build it against older versions, which is what we do, but a clear road-map and plan have to be put in place.

@schneiml
Copy link
Contributor Author

@rovere Ciao Marco, yes, we'll need to have this discussion in a bit broader scope at some point.

However, I think it is fine to remove the DQMGUI support on DQMStore's side, since we changed/need to change the inner workings of DQM dramatically on CMSSW side (first threaded mode, in the future decentralized DQMStore). IIRC DQMGUI currently uses a CMSSW7 version of the DQMStore, so it has missed a lot of changes already (does the version it uses even have threaded mode?).

Even if we stick with the old DQMGUI, it is probably perfectly fine to maintain it's version of the DQMStore separately. After all, all that we care about is file IO (and network IO, which is arguably a special case of file IO), and the file formats need to be stable for other reasons anyways. Also, no changes to the file formats are planned at the moment (except maybe minor changes to DQMIO).

@fabiocos
Copy link
Contributor

fabiocos commented Nov 4, 2019

@gennai @Sam-Harper following the comment #28092 (comment) could you please address the question of @schneiml or point to someone able to do that? The point raised by @fwyzard is relevant for the integration of this code, and as 11_0_X needs to be used for productions we cannot afford to use it just to validate choices a posteriori.

@schneiml schneiml mentioned this pull request Nov 4, 2019
@Sam-Harper
Copy link
Contributor

so I'm following up with our experts and hope to have a timing recipe for you soon.

In general though @schneiml , we would prefer if it could be run on our dedicated timing machines vocms003, vocms004 as this allows us an apples to apples comparison. Would this be a problem for you do to so. If not, could you email me your cern username, I'll add you to the people who can access then.

@fioriNTU
Copy link
Contributor

fioriNTU commented Nov 4, 2019

@Sam-Harper thank you very much for "activating" your experts so quickly! However, since this measurement has never be done in the context of the DQM core team, and moving on with this PR is quite urgent, it would be possible to have this test ran by the HLT expert himself?

I do not know how long it can take, but I am sure you can take 1/10 of the time with respect to us. Do you think it is feasible?

@Sam-Harper
Copy link
Contributor

@fioriNTU , this is not normally our policy to do this for developers otherwise we quickly get overwhelmed.

However in this particular case we are also having to commission the timing in 11_0_X as a reference (still iterating with experts), and therefore is it not any significant extra work to also run the timing for this PR at the same time. So we will do so.

@Sam-Harper
Copy link
Contributor

So incase folks are wondering what happened, the test was run, realised it was inadequate for this type of change, re-run again. Those results have been looked at and furtther tests were needed which are on going.

@schneiml
Copy link
Contributor Author

schneiml commented Nov 6, 2019

@Sam-Harper thanks for the update!

It looks like this sill merges cleanly after #28297, so we might get away without another rebase (unless #28247 goes first).

@Sam-Harper
Copy link
Contributor

Okay timing is okay

image

image

No real difference within the jitter.

@Sam-Harper
Copy link
Contributor

btw the first results, I got a 10ms increase but it looks like it was just a random upset as re-running it multiple times we converged on the above result.

@fioriNTU
Copy link
Contributor

fioriNTU commented Nov 7, 2019

Thank you very much @Sam-Harper !

@fabiocos
Copy link
Contributor

fabiocos commented Nov 7, 2019

@Sam-Harper @fwyzard @Martin-Grunewald are there other concerns/comments by HLT experts?
@christopheralanwest @tlampen @rekovic could you please check and comment in case?

@fwyzard
Copy link
Contributor

fwyzard commented Nov 7, 2019

From my side, I'm happy with the result of the test Sam has done.

@Martin-Grunewald
Copy link
Contributor

+1

@schneiml
Copy link
Contributor Author

schneiml commented Nov 8, 2019

Thanks @Sam-Harper !

Btw, if you have reasonably easy instructions how to run these things now I am still interested (on the longer term), to do some profiling and see how much impact DQM has at all in these configurations. In Offline, it is typically very little, which is why I think that many speed hacks in the DQM code are out of place...

@fabiocos
Copy link
Contributor

fabiocos commented Nov 8, 2019

+1

@fabiocos
Copy link
Contributor

fabiocos commented Nov 8, 2019

@christopheralanwest @tlampen @rekovic please check this and comment in case for a possible further iteration, the changes in your are looks consequential to the heart of the PR

@fabiocos
Copy link
Contributor

fabiocos commented Nov 8, 2019

merge

@cmsbuild cmsbuild merged commit 8f02b2b into cms-sw:master Nov 8, 2019
@christopheralanwest
Copy link
Contributor

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet