Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Follow up after migration to tasks in #17801 #18018

Closed
slava77 opened this issue Mar 21, 2017 · 10 comments
Closed

Follow up after migration to tasks in #17801 #18018

slava77 opened this issue Mar 21, 2017 · 10 comments

Comments

@slava77
Copy link
Contributor

slava77 commented Mar 21, 2017

Several changes made in #17801 will need updates to resolve comments in code review
#17801 (review)

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 21, 2017

A new Issue was created by @slava77 Slava Krutelyov.

@davidlange6, @Dr15Jones, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@slava77
Copy link
Contributor Author

slava77 commented Mar 21, 2017

@Dr15Jones @wddgit

@Dr15Jones
Copy link
Contributor

assign core

@cmsbuild
Copy link
Contributor

New categories assigned: core

@Dr15Jones,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

@wddgit
Copy link
Contributor

wddgit commented Mar 22, 2017

Here are the comments that were not resolved when
the Tasks PR was merged and my replies to them.

  1. A comment to PR Initial Implementation of Tasks in CMS Configurations #17430 : Move MassReplaceInputTag
    and MassReplaceParameter to FWCore/ParameterSet.
    I will do this. Working on it starting today.

The rest were all comments to PR #17801

  1. JetMETCorrections/Configuration/python/JetCorrectorsAllAlgos_cff.py
    These were more comments for schoef and aperloff and
    the improvements suggested were outside the scope of the
    Task PR and Core responsibilities, so I was planning to do nothing
    related to this comment unless Chris asks me to.

  2. JetMETCorrections/Type1MET/python/correctionTermsCaloMet_cff.py
    In general, it seems to me that in cff files that create
    the lowest level sequences only the specific modules required
    on the sequences should be imported and "import *" should not
    be used. But at a higher level, when a cff file imports another
    cff that contains a sequence, then one should use "import *".
    This is to improve maintainability so that when a new module
    is added to a sequence, one only has to modify the file defining
    the sequence and not go through every cff file that uses
    the sequence to ensure that all the needed modules are
    imported. That is the logic I followed in this file.
    I don't think anything should be done here for now.
    There are a couple options I can think of to improve
    this.

Maybe one could reorganize so each Corrector sequence was in
a different cff file or split the Corrector cff into a few cff
files with a smaller number of sequences in each file. Although
that will create many files and I am not sure that is better
or worth the effort. Also this is out of the scope of what
the Core group usually does so I probably would not do this.

It might be possible to define some special function
that is like "load" but which is smart enough to load
a sequence, examine it and load only the modules on
the sequence. But there is nothing like that now and creating
such a function would require some development work ...
And I am not sure what it would do to the CPU time.

To answer the specific question in the comment, after the Python
code is processed and the ParameterSet delivered to the C++ part
of cmsRun, all the Python memory is purged. One could
consider pruning at the end to be wasted effort
because soon after the entire Python interpreter instance
is deleted. At least I think that is how it works. Before Tasks,
it was different because in unscheduled mode all those modules
would get passed into the C++ ParameterSet if they were not pruned.

  1. PhysicsTools/PatAlgos/python/slimming/metFilterPaths_cff.py
    I asked the question "Can the Task be deleted because these are
    EDFilters which do not need to be run unscheduled?" I'm waiting
    for an answer. I'll delete the Task and submit a PR if the answer
    is yes. Otherwise, we can discuss what should be done and who
    should do it.

  2. PhysicsTools/PatAlgos/python/slimming/miniAOD_tools.py
    I will do this. Working on it starting today.

  3. PhysicsTools/PatAlgos/python/slimming/slimming_cff.py
    Having the imports at the top of the file and defining the
    Sequence/Task under all the imports is a common pattern in
    CMSSW. It seems an OK pattern. The alternative is also OK.
    I do not plan to change it, but whoever is maintaining these
    files could make the suggested change, if they think it is worth
    the effort. If applied consistently to PAT or CMSSW there would
    be a lot of files to modify.

  4. PhysicsTools/PatAlgos/python/triggerLayer1/triggerMatcherExamples_cfi.py
    Same reply as 6.

  5. PhysicsTools/PatAlgos/test/miniAOD/patTuple_mini_data.py
    inclusiveVertexing_cff originally had 3 sequences
    defined in it. I just followed the pattern we agreed
    on and made 3 Tasks that corresponded exactly to the
    original 3 sequences.

  6. RecoBTag/SoftLepton/python/softLepton_cff.py
    Same reply as 6.

  7. RecoTauTag/Configuration/python/boostedHPSPFTaus_cfi.py
    I will do this. Working on it starting today.

  8. "I notice that the edmConfigDump of step3 in 136.725 now
    takes a bit longer and it produces 70% more lines."
    Is the concern that there are modules defined that
    are not used? That is not a new problem. Before the Tasks PR,
    all that extra stuff was being created and then was pruned.
    I agree it would be good if unnecessary things were not in
    configurations. Or are you concerned about the size of the
    output of edmConfigDump. If before calling edmConfigDump
    you added a call to the Process prune function at the end
    of the configuration, that would remove the unneeded modules.
    That function still exists.

@slava77
Copy link
Contributor Author

slava77 commented Mar 22, 2017 via email

@wddgit
Copy link
Contributor

wddgit commented Mar 23, 2017

Before the implementation of Tasks it was important to prune when unscheduled was enabled, because if you did not prune then all those extra modules would get constructed in the C++ part of cmsRun and be there waiting to run unscheduled. That reason is now gone, because anything not on a scheduled Task/Path/EndPath does not get passed into the C++. One motivation for not pruning is that it takes CPU time to prune. There is a lot of work going on there identifying what can be pruned and then actually doing the deletes. I've never profiled it so I do not know if it is significant. And in a cmsRun job when the ParameterSet is passed to the C++ part, the Python interpreter gets deleted anyway so the memory is cleaned up then.

There would also be a problem if you prune before the absolute end the configuration, one might be concerned that a customize function would need one of the modules that would have been pruned. other than that, I think the prune function is safe and I am not aware of any problem with it (and if there is one we should fix it).

But I can see your point about extra time in customise and similar methods. And while not configuring unneeded modules in the first place would be better, practically that kind of cleanup may never happen ... I will not do anything on this until I hear from Chris.

@wddgit
Copy link
Contributor

wddgit commented Mar 23, 2017

PR #18057 takes care of items 5 and 10 from my list.

@wddgit
Copy link
Contributor

wddgit commented Mar 24, 2017

PR #18081 Takes care of item 1 from my list. Items 4 and 11 I am waiting for a response from someone else before I do anything and I consider the rest to be the responsibility of someone else. So I am done with my part of this for the moment.

@slava77
Copy link
Contributor Author

slava77 commented Mar 8, 2018

closing as partially resolved

@slava77 slava77 closed this as completed Mar 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants