DQM-related developments, fixes and improvements (master branch) #21652
Conversation
@dmitrijus @rovere FYI

The code-checks are being triggered in jenkins.

@cmsbuild, please test

@dmitrijus, I may push more developments here as I go, so there is no need to sign this yet.

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/PR-21652/2454

Code check has found code style and quality issues which could be resolved by applying a patch in https://cmssdt.cern.ch/SDT/code-checks/PR-21652/2454/git-diff.patch

You can run …

The code-checks are being triggered in jenkins.

@cmsbuild, please test

+code-checks

The tests are being triggered in jenkins.

A new Pull Request was created by @fwyzard (Andrea Bocci) for master. It involves the following packages:

DQMServices/Components

@vazzolini, @kmaeshima, @dmitrijus, @cmsbuild, @jfernan2, @vanbesien can you please review it and eventually sign? Thanks. cms-bot commands are listed here
More code checks. |
Comparison is ready. The workflows 1003.0, 1001.0, 1000.0, 140.53, 136.7611, 136.731 and 4.22 have different files in step1_dasquery.log than the ones found in the baseline. You may want to check and retrigger the tests if necessary. You can check it in the "files" directory in the results of the comparisons.

@slava77 comparisons for the following workflows were not done due to missing matrix map:

Comparison Summary:
```cpp
// MonitorElement is being booked.
void setTitle(std::string const& title)
{
  me_->setTitle(title);
```
hi @fwyzard - ignorant question - why not add the same lock_guards in these functions (or would that not be sufficient for making them concurrency-safe?)
I think that you are right, and it would indeed make these methods concurrency-safe.

The reason I did not mark them as such is the DQM usage pattern, or at least the one I have in mind:

- the `ConcurrentMonitorElement`s (CMEs) are booked in the (global) begin run transition, where they are accessible as non-`const` objects;
- the CMEs are later filled in the event loop, where they are accessible only as `const` objects, to emphasise that they are concurrency-safe.

I can add the locks also for the operations on the titles and labels; however, I expect those to be set when the CMEs are created and available as non-`const`, to be used only by one thread, and anyway already serialised by the big DQM lock.
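The `const`-fill pattern described above can be sketched in a few lines. This is a self-contained illustration only: `Histo1D` and `ConcurrentElementSketch` are hypothetical stand-ins, not the real ROOT `TH1F` or the actual `ConcurrentMonitorElement` code.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Simplified stand-in for a ROOT TH1-like histogram (hypothetical, not ROOT).
struct Histo1D {
  std::vector<int> bins;
  explicit Histo1D(int n) : bins(n, 0) {}
  void Fill(int bin) { bins[bin] += 1; }
};

// Sketch of the usage pattern described above: fill() is const and serialised
// by an internal lock; setTitle() is non-const, intended for booking time only.
class ConcurrentElementSketch {
public:
  explicit ConcurrentElementSketch(int nbins) : histo_(nbins) {}

  // non-const: to be called while booking, before concurrent access starts
  void setTitle(std::string const& title) { title_ = title; }

  // const: safe to call concurrently from the event loop
  void fill(int bin) const {
    std::lock_guard<std::mutex> guard(lock_);
    histo_.Fill(bin);
  }

  int entries(int bin) const {
    std::lock_guard<std::mutex> guard(lock_);
    return histo_.bins[bin];
  }

private:
  mutable std::mutex lock_;  // protects histo_ during concurrent fill()
  mutable Histo1D histo_;
  std::string title_;
};
```

Marking `fill()` as `const` with a `mutable` mutex and histogram is what lets the element be handed out as a `const` object in the event loop while still being writable under the lock.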
An alternative approach I've been playing with - which is more flexible but slightly less clean - is to let the users create the histograms on their own via `auto h = std::make_unique<TH1F>(name, ...)` etc., customise the axes, titles, labels etc. via the full ROOT object interface, and when done transfer ownership to the DQMStore via a call to `book1D(name, std::move(h))`.

This should be just as efficient from the memory point of view (no `Clone()` involved) and allow more flexibility, at the cost of one more line of code per histogram, and with the small risk that the users may hold on to the initial pointer and use it later on to bypass the lock.
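As an illustration of this ownership-transfer idea, here is a minimal sketch with hypothetical names (`HistoSketch`, `StoreSketch`); it is not the actual DQMStore API, and real code would use `TH1F` and friends:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a ROOT histogram.
struct HistoSketch {
  std::string name;
  std::string xAxisTitle;
  explicit HistoSketch(std::string n) : name(std::move(n)) {}
};

// Hypothetical store: book1D(name, std::move(h)) transfers ownership, so no
// Clone() is needed and the store is the single owner afterwards.
class StoreSketch {
public:
  void book1D(std::string const& name, std::unique_ptr<HistoSketch> h) {
    booked_[name] = std::move(h);
  }
  bool has(std::string const& name) const { return booked_.count(name) != 0; }

private:
  std::map<std::string, std::unique_ptr<HistoSketch>> booked_;
};
```

The caveat mentioned above is visible here: after `std::move(h)` the user's `unique_ptr` is empty, but nothing prevents them from stashing `h.get()` beforehand and using it later to bypass the store's locking.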
Right - I agree it's the best usage pattern to have all of these CMEs beautified in the begin run transition - but if the code allows it elsewhere, someone will do it...

Your idea of transferring ownership of a TH1F makes things easier, in the sense that a lot of functions would otherwise need wrapping if direct access to the TH* is removed (a good thing) - but the tradeoff is less ease of any future changes to the underlying histogram interface.

The second thought is beyond this PR - do you want to add locks for now?
I think I'd rather start with the present interface, and update it as needed when we start transitioning the analyzers to use it - we may encounter some patterns I haven't anticipated yet...
hi @fwyzard - sorry for my slow answer - that sounds ok, assuming someone thinks they are responsible for catching developers... is that you or @dmitrijus?
I'm updating the modules that run inside the HLT myself.

sounds good - @dmitrijus, ok for you for the online/offline DQM side of this?
My "simple" / "one" solution requires DQM modules to be scheduled differently than they are now, but I am fine with this PR. Even after we move the rest of the DQM to the "simple / one module" approach, DQM should still support the concurrent filling of histograms (which is what this PR introduces). Once this PR / CME interface is finalised, we will move many tracker histograms to it too.
@davidlange6
hi @fwyzard - thanks for following up - instead of a confirmation from @dmitrijus, I was told something new was coming next week that meant he didn't need to answer my original question. If that's not the plan, the question above still stands.
From my side, I am working on further updates, but they are orthogonal to the changes in this PR, so I think this should be merged, and I will open a separate one later on. As for @dmitrijus, I don't know what they plan to push next week.
okay - so I get confusing messages - so back to my question: who thinks they are responsible for catching developers... is that @dmitrijus? (offline DQM developers, that is)
@davidlange6, here is my understanding, based on the discussion with @dmitrijus from a couple of weeks ago. Currently all DQM modules are `stream` modules. My proposal to reduce the memory consumption is to implement a "concurrent" interface to the `DQMStore`. As I said earlier, I am already migrating the modules used within the HLT to this approach; I will look into the HLT DQM modules used offline together with the STEAM group. The proposal from @dmitrijus is somewhat alternative, somewhat complementary: change the DQM modules from `stream` to `one` modules. I assume that either the DQM Core team or the individual developers will be in charge of implementing this migration.
(bottom line, can you please merge this PR?)
seems difficult to get answers to simple questions - maybe @vazzolini, @kmaeshima, @jfernan2 can help - or perhaps the DQM meeting tomorrow will resolve it - I'm hoping we don't really plan two directions forward to solve this problem (I prefer @fwyzard's, unless proven to have contention issues at some low number of threads [and even then, maybe])
The framework provides global, stream, one (and now limited) modules... why would it be a problem if some DQM modules are global while others are stream or one?
Two solutions provide twice the opportunity for people to debug problems. But anyway, it's more a maintenance/responsibility question that I've asked.
> On Dec 18, 2017, at 10:00 AM, Andrea Bocci ***@***.***> wrote:
>
> The framework provides global, stream, one (and now limited) modules... why would it be a problem if some DQM modules are global while others are stream or one?
+1
> @Dr15Jones
> I have another theory as to what caused the problem. There are two different phases in the framework as a lumi is ending. There is the 'global end lumi' phase and the 'write lumi' phase. The 'write lumi' phase is only seen by the OutputModules and is when they 'write the lumi to output'. So the change in cms-sw#21652 deletes some histograms at 'end of global end lumi', which is before 'write lumi' gets called. Therefore, if the deleted histograms were supposed to be saved to the file, they would be missing.

A workaround for the issue is to let the `DQMStore` delete the no-longer-needed lumi-based histograms in the "pre global begin lumi" transition instead of the "post global end lumi".
Let the DQMStore delete unused MEs independently from the output modules

The multithreaded `DQMStore` can create a global copy of each by-lumi ME after each lumisection, in order to merge the stream MEs and write them out. Deleting these MEs has been left to the `DQMFileSaver` or `DQMOutputModule`. This has the disadvantage that …

This PR moves the call to `deleteUnusedLumiHistograms` into the `DQMStore` itself, via the postGlobalEndLumi transition, solving both problems.

Support global per-lumisection MEs

To save memory, it is desirable to introduce thread-safe global `MonitorElement`s, with the only restriction that only one lumisection can be processed at a given time. This PR introduces the code changes needed to support global, per-lumisection MEs:
Support histograms with concurrent filling

- Add a `ConcurrentMonitorElement` wrapper around the `MonitorElement` class that exposes a concurrent filling interface: `fill()` is marked as `const` to indicate concurrency, and uses a lock internally to protect the underlying ROOT object; the `setTitle()`, `setAxis...()` etc. methods are non-`const`, to be used only while the `ConcurrentMonitorElement` is being booked.
- Add a `ConcurrentBooker` class that extends the `IBooker` interface to return `ConcurrentMonitorElement`s instead of `MonitorElement`s, and a `bookConcurrentTransaction()` call to make use of it.
- Implement a base class for "global" DQM analyzers, using the `ConcurrentBooker` and `bookConcurrentTransaction()` calls.

For a quick tutorial about using these new classes, see How to migrate a DQMEDanalyzer to a DQMGlobalEDAnalyzer.