DQMIO and DQM paths need to be serialized #15866
Comments
A new Issue was created by @Dr15Jones Chris Jones. @davidlange6, @smuzaffar, @Dr15Jones can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign core |
assign dqm |
New categories assigned: core,dqm @Dr15Jones,@smuzaffar,@vanbesien,@dmitrijus you have been requested to review this Pull request/Issue and eventually sign? Thanks |
ping do people have any preference for the solution? |
Hello, neither of us is an expert on the framework, but I think solution #3 would be the best. |
At the Core meeting I proposed the following: The idea is to add a new type, EDSummaryProducer, to the framework and convert all DQMEDAnalyzers to this type (done centrally). From the meeting, this would have to be done within the next week (all the work would be done by Core). Would that be acceptable to you? An alternative, but not well liked, idea was to change EDAnalyzers to allow them to put data into the Run and LuminosityBlock at end transitions. That touches less user code but muddles the behaviors of the module types. |
@fwyzard FYI |
Hi, I should have stayed for the core meeting... If it is before data taking, this could be disastrous - DQM is famous for having "out of the loop" sequences. The second point is what happens to the DQMEDHarvester modules? In the official "reco" step, harvesting modules are not run (they run in a separate step), just DQMEDAnalyzers, but in online DQM and in many "custom" user sequences, the two are run together. Would DQMEDHarvester have to be modified to consume the DQMToken? But it has to be run before the output module is run, but after the histograms are merged into the global histograms. If we leave this for after the data taking, the best would be to think about reworking the entire DQM to make it work with multiple containers, instead of one shared container for all the histograms. |
In the meeting, it was stated that the changes had to go in before data taking, else they couldn't happen until November because we need to easily backport. Delaying to November would completely stop further framework development until then. Upon further reflection, I think we can compromise. If the framework changes which add the new module type go in for data taking (i.e. CMSSW_9_2) then the rest of the changes can go into the release following CMSSW_9_2. If a module were changed in the later release, it could still be backported into CMSSW_9_2 because the framework support would be there. This would allow a more leisurely migration of the DQM code without possible disruption to data taking.
To be completely correct, the DQMEDHarvester should become an EDSummaryProducer (since it makes histograms, which ultimately must happen before the OutputModule is run) and it should Your question about the DQMEDHarvester made me go back and look more closely at the code and I found an interesting fact. The DQMEDAnalyzers do all their work on the end stream transition (they are also called on the end global transition but the DQMEDAnalyzer code does nothing). The DQMRootOutputModule and the DQMEDHarvesters do their work on end global transitions. The framework guarantees that all end stream transitions happen before the end global transition. Therefore all DQMEDAnalyzers are guaranteed to have done their job filling histograms before the DQMRootOutputModule or the DQMEDHarvester ever reads those histograms. So if we guarantee that no DQMEDAnalyzer overrides the static methods Therefore it is only the DQMEDHarvesters which need to be converted to EDSummaryProducer because they do their work at the global end transition and need to be run before the DQMRootOutputModule is called. |
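The ordering argument above can be illustrated with a toy model. This is only a sketch, not framework code: `run_end_of_lumi`, `end_stream`, and the `"pt"` histogram key are hypothetical names, and Python threads merely stand in for concurrent streams. It assumes, as stated above, that every end stream transition completes before the end global transition starts.

```python
import threading
from collections import Counter


def run_end_of_lumi(n_streams, entries_per_stream):
    """Toy model: each stream merges its entries at end-stream,
    and the end-global step only runs after all streams are done."""
    histogram = Counter()      # stands in for the shared global histogram
    lock = threading.Lock()    # protects the shared histogram and the log
    log = []

    def end_stream(stream_id):
        # Analyzer-like work: merge this stream's entries into the global histogram.
        with lock:
            histogram["pt"] += entries_per_stream
            log.append(("endStream", stream_id))

    threads = [threading.Thread(target=end_stream, args=(i,))
               for i in range(n_streams)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()               # assumed guarantee: all end-stream work finishes first

    # Harvester/output-like work: safe to read the histogram only now.
    log.append(("endGlobal", None))
    return histogram["pt"], log


total, log = run_end_of_lumi(n_streams=4, entries_per_stream=10)
print(total, log[-1])
```

In this model the reader at the end global step never sees a partially filled histogram, which is the property the analyzers rely on. Two modules that both act at the end global step get no such ordering for free, which is why the harvester versus output module case needs separate treatment.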
On 5/10/17 7:27 AM, Chris Jones wrote:
> In the meeting, it was stated that the changes had to go in before
> data taking
I have interpreted this as "for the summer 93X (previously known as 92X)
release"
rather than by today.
Has this blocking feature been discovered in the past few days only?
Recall that there was no 92X release until end of July on the schedule
until other problems appeared.
|
@slava77 at the meeting I was told any change would have to happen within 2 weeks. Would that mean 9_3_X (which I totally missed hearing about)? This would not be a blocking change (i.e. we would not wait for it). If it didn't make it in then I'd have to figure something else out for the framework to be able to do work for the next 9 months :). |
On 5/10/17 7:58 AM, Chris Jones wrote:
> @slava77 at the meeting I was told any
> change would have to happen within 2 weeks. Would that mean 9_3_X (which
> I totally missed hearing about)?
I may have been myself confused with numbering.
Anyways, IMHO, disruptive framework level changes should not go in to
the "major bug fix" release inserted to fix one specific large problem.
> This would not be a blocking change (i.e. we would not wait for it). If
> it didn't make it in then I'd have to figure something else out for the
> framework to be able to do work for the next 9 months :).
So, this is a newly discovered feature/problem, right?
Otherwise I can't understand the urgency given that 92X is an
exceptional release with O(week) time scale to converge.
|
On May 10, 2017, at 5:04 PM, Slava Krutelyov wrote:
> On 5/10/17 7:58 AM, Chris Jones wrote:
> > @slava77 at the meeting I was told any
> > change would have to happen within 2 weeks. Would that mean 9_3_X (which
> > I totally missed hearing about)?
> I may have been myself confused with numbering.
> Anyways, IMHO, disruptive framework level changes should not go in to
> the "major bug fix" release inserted to fix one specific large problem.
> >
> > This would not be a blocking change (i.e. we would not wait for it). If
> > it didn't make it in then I'd have to figure something else out for the
> > framework to be able to do work for the next 9 months :).
> So, this is a newly discovered feature/problem, right?
> Otherwise I can't understand the urgency given that 92X is an
> exceptional release with O(week) time scale to converge.
was not so disruptive or risky... nor really FWK changes. Rather a small interface change that otherwise people have to back port around. If back port headaches during the run are preferred, I guess it won't affect me:)
|
@slava77 This issue was first opened Sep 15, 2016 so it is not 'new'. |
On 5/10/17 8:13 AM, Chris Jones wrote:
> only introduce a new module type and would change nothing else
sorry, I have misunderstood the significance of the preceding discussion
which was suggesting that existing modules and sequences are going to change
|
@dmitrijus I looked deeper into the DQMEDHarvester and see that users cannot override Therefore it is impossible for a class inheriting from DQMEDHarvester to care about Events. In that case, if we changed the DQMEDHarvester to inherit from The upshot is it appears that we do not have a need for the new module |
@dmitrijus @rovere Can the |
Here is a new proposal
These changes can either be done before data taking or can happen in a later summer release. If we do the change 'now' it means it would be easy to backport any changes to DQMEDHarvester from the Summer development release to the data taking release. If we wait, then part of a backport of a configuration file fragment containing a DQMEDHarvester class would entail changing the type back to Comments? |
Just an idea: add inside FWCore.ParameterSet.Config a line like
and change all python files to use |
I had a similar thought but would prefer the line to be in DQMServices/Core/python/DQMEDHarvester.py since that is where the base class is from. The reason is the DQMEDHarvester is not a type known to the framework. |
True - but it would require users to import a file that they currently do not need to import. Maybe we could have that line in For example we could add |
A couple of things I don't like about a python |
Again, true, of course. I can think of some workarounds, but they seem like more work than this change would be worth... |
@Dr15Jones, all |
I have no idea. They are never used together - at least I don't know anyone using them together. As far as planning goes, the biggest worry is precisely backporting. I just want to avoid the situation where "upstream" DQM changes are no longer trivial to backport into "production". If there are no interface changes or if they are minimal, I am fine with it. |
#18701 makes the needed interface change for the DQMEDHarvester. The actual use of the DQMToken would be done in a later step since it is not critical for allowing easy backporting. |
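As a rough illustration of why a declared token helps, here is a toy scheduler that orders modules purely from what they declare. It is a sketch under stated assumptions and not the CMSSW API: `schedule`, `MODULES`, and the product names `dqm_token` and `harvest_token` are hypothetical, and the standard-library `graphlib` (Python 3.9+) stands in for the framework's scheduling.

```python
from graphlib import TopologicalSorter


def schedule(modules):
    """Order modules so every producer runs before its consumers.

    modules maps a module name to {"produces": set, "consumes": set}.
    """
    # Find which module produces each product.
    producers = {}
    for name, decl in modules.items():
        for product in decl["produces"]:
            producers[product] = name

    # Each module depends on the producers of the products it consumes.
    graph = {
        name: {producers[p] for p in decl["consumes"] if p in producers}
        for name, decl in modules.items()
    }
    return list(TopologicalSorter(graph).static_order())


# Hypothetical declarations, deliberately listed in the "wrong" order.
MODULES = {
    "output":    {"produces": set(),             "consumes": {"harvest_token"}},
    "harvester": {"produces": {"harvest_token"}, "consumes": {"dqm_token"}},
    "analyzer":  {"produces": {"dqm_token"},     "consumes": set()},
}

print(schedule(MODULES))  # → ['analyzer', 'harvester', 'output']
```

If the `consumes` entries are left empty, the scheduler has no information and any order is valid. That is the hidden-dependency situation this issue is about.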
A comment for #18701...
and then use cms.DQMEDHarvester every time they are instantiated. |
@dmitrijus There are some big drawbacks to doing that which I touch on here. If you still want it, we can do it that way since it is your domain. |
@dmitrijus #18717 and #18751 are the two alternate user interfaces. Please let us know which one you wish to support. |
@vanbesien @dmitrijus it is crucial that this issue be resolved before CMSSW_9_2 is released since this is prohibiting the framework from moving forward for greater threading efficiency. We've attempted to give you two options, #18717 and #18751, with explanations about their pros and cons. Since DQM is the one who is responsible for maintaining and documenting that code, the final decision is yours. |
@vanbesien @dmitrijus I spoke with @davidlange6 and the timing for the decision is not this week, but within the next two weeks (i.e. for when CMSSW_9_2 closes). |
@vanbesien @dmitrijus which of the two options do you prefer, #18717 or #18751 ? |
Answered with +1, #18751 |
Thanks! |
@Dr15Jones this is also done, as I understand? |
+1 |
With the coming of running Paths and EndPaths concurrently during Lumi and Run transitions, the dqmoffline_step path and the path holding the DQMOutputModule need to be serialized. The problem is that the DQM EDAnalyzers have a hidden (from the framework) data dependency with the DQMOutputModule. There are several ways to fix this: