Add SwitchProducer mechanism to allow runtime decision which algorithm implementation to run #25439

makortel · 2018-12-06T21:49:39Z

This PR provides an implementation for a SwitchProducer to allow a runtime decision on which EDProducer from a given set to run for a given module label. The main use case in mind is heterogeneous/offloaded/accelerated algorithms, where we would run the accelerated EDProducer if the accelerator device is available in the system, and run the "legacy" CPU EDProducer otherwise.

For a given accelerator type, the generic SwitchProducer is meant to be subclassed along the following lines:

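import FWCore.ParameterSet.Config as cms

# isFooDeviceEnabled() is a placeholder for a device-availability check
def _switch_foo():
    return (isFooDeviceEnabled(), 2)

class SwitchProducerFoo(cms.SwitchProducer):
    def __init__(self, **kargs):
        super(SwitchProducerFoo,self).__init__(
            dict(cpu = cms.SwitchProducer.getCpu(),
                 foo = _switch_foo),
            **kargs
        )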

Here the deriving class provides a dictionary of possible cases along with functions that return a (bool, int) tuple. The bool tells whether that device type is enabled on the machine, and the int gives the priority of that device with respect to other devices. SwitchProducer.getCpu() returns a function for the CPU (always enabled, priority 1). Note that in order to be able to pickle the configuration, these functions have to be defined at (python) module level (i.e. lambdas or static methods, for example, are not sufficient).

The decision logic chooses the case that has the highest priority among those that are enabled. The decision is made at the point where the python configuration is transformed for C++, i.e. in the case of production, at the worker node.
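The selection rule can be illustrated in plain Python, independent of CMSSW. This is a minimal sketch and not the framework's implementation: `choose_case` and the example case functions are hypothetical names, and only the (bool, int) convention and the "highest priority among the enabled cases" rule are taken from the description above.

```python
def choose_case(case_functions):
    """Return the name of the enabled case with the highest priority.

    case_functions maps a case name to a zero-argument function that
    returns an (enabled, priority) tuple, as described in the PR text.
    """
    best_name = None
    best_priority = None
    for name, func in case_functions.items():
        enabled, priority = func()
        if not enabled:
            continue  # disabled cases never take part in the comparison
        if best_priority is None or priority > best_priority:
            best_name, best_priority = name, priority
    if best_name is None:
        raise RuntimeError("no enabled case to choose from")
    return best_name


def _switch_cpu():
    # CPU: always enabled, priority 1 (as stated for getCpu())
    return (True, 1)


def _switch_foo_available():
    # Hypothetical: pretend the foo device is present
    return (True, 2)


def _switch_foo_missing():
    # Hypothetical: pretend the foo device is absent
    return (False, 2)


with_device = choose_case({"cpu": _switch_cpu, "foo": _switch_foo_available})
without_device = choose_case({"cpu": _switch_cpu, "foo": _switch_foo_missing})
```

With the device present the sketch picks `foo`, otherwise it falls back to `cpu`. How ties between equal priorities are resolved is not stated in the description; the sketch simply keeps the first case it encounters.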

The switch decision does not affect the configuration hash, meaning that the framework will consider a run/lumi/event produced with isFooDeviceEnabled() == False the same as with True.

The concrete SwitchProducerFoo would then be used as follows (e.g. in a cfi file):

bar = SwitchProducerFoo(
    cpu = cms.EDProducer("Bar", ...),
    foo = cms.EDProducer("BarOnFooDevice", ...)
)

In this example, if isFooDeviceEnabled() returns

True => BarOnFooDevice EDProducer is run
False => Bar EDProducer is run

The switched EDProducers are required to declare exactly the same products with the produces() call.
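To make the requirement concrete, here is a small plain-Python sketch of such a consistency check. It is an illustration only: `check_same_products`, the `(product type, instance name)` representation, and the `BarCollection`/`BarExtra` names are assumptions made for the example, not the framework's actual data structures or error handling.

```python
def check_same_products(declared):
    """Raise ValueError unless every case declares the same set of products.

    declared maps a case name to a set of (product type, instance name)
    pairs, a simplified stand-in for what produces() calls register.
    """
    names = sorted(declared)
    reference_name = names[0]
    reference = declared[reference_name]
    for name in names[1:]:
        if declared[name] != reference:
            # Symmetric difference: products declared by only one of the two
            difference = sorted(declared[name] ^ reference)
            raise ValueError(
                "case '%s' and case '%s' differ in %s"
                % (name, reference_name, difference)
            )
    return reference


# Hypothetical product declarations for the two cases of "bar"
matching = {
    "cpu": {("BarCollection", "")},
    "foo": {("BarCollection", "")},
}
mismatching = {
    "cpu": {("BarCollection", "")},
    "foo": {("BarCollection", ""), ("BarExtra", "debug")},
}

common = check_same_products(matching)

try:
    check_same_products(mismatching)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

The point of the rule is that consumers of `bar` must see the same set of products regardless of which case ends up being chosen.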

When a client asks for a product of bar, it will get pointed to exactly the same edm::Wrapper that contains the product produced by the chosen EDProducer. The additional "layer" is, however, tracked in provenance, such that the parent of a product bar is bar@cpu or bar@foo, depending on which one of the EDProducers was run.
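The aliasing and the extra provenance layer can be pictured with a toy model. This is only a sketch under loud assumptions: `ToyEvent` and its methods are invented for illustration and bear no relation to the real edm::Event or provenance classes; only the `bar@cpu`/`bar@foo` naming and the "same object, extra parent" behaviour come from the text above.

```python
class ToyEvent:
    """Toy stand-in for an event; NOT the CMSSW edm::Event API."""

    def __init__(self):
        self._products = {}  # module label -> product object
        self._parents = {}   # module label -> label of the parent module

    def put(self, label, product):
        self._products[label] = product

    def alias(self, label, chosen_label):
        # Point `label` at the very same object and record the extra layer
        self._products[label] = self._products[chosen_label]
        self._parents[label] = chosen_label

    def get(self, label):
        return self._products[label]

    def parent(self, label):
        return self._parents.get(label)


event = ToyEvent()
chosen_case = "foo"  # assumed outcome of the switch decision
event.put("bar@" + chosen_case, ["hit0", "hit1"])
event.alias("bar", "bar@" + chosen_case)

# The consumer of "bar" sees the identical object, not a copy
same_object = event.get("bar") is event.get("bar@foo")
# The parent recorded for "bar" reveals which case actually ran
parent_of_bar = event.parent("bar")
```

In this toy, inspecting the parent of `bar` afterwards tells which implementation was run, which mirrors the stated purpose of tracking the extra layer in provenance.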

The python interface is likely to evolve with more experience with it, with accelerated EDProducers, and with devices beyond CUDA. Note that this PR adds only the generic infrastructure; a CUDA-specific implementation will follow in a separate PR (or e.g. as part of the effort of #25353).

Tested in 10_4_0_pre3, no changes expected.

cmsbuild · 2018-12-06T21:49:58Z

The code-checks are being triggered in jenkins.

cmsbuild · 2018-12-06T22:02:12Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-25439/7508

cmsbuild · 2018-12-06T22:02:33Z

A new Pull Request was created by @makortel (Matti Kortelainen) for master.

It involves the following packages:

DataFormats/Common
DataFormats/Provenance
FWCore/Framework
FWCore/Integration
FWCore/Modules
FWCore/ParameterSet

@cmsbuild, @smuzaffar, @Dr15Jones can you please review it and eventually sign? Thanks.
@Martin-Grunewald, @rovere, @wddgit this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

fwyzard · 2018-12-06T22:04:11Z

The switched EDProducers are required to declare exactly the same products with the produces() call.

Doesn't this go in the opposite direction of what we discussed earlier, with the possibility of having different "chains" of modules used for e.g. the CPU vs GPU reconstruction?

Or do you expect the switch to be used only for the last module in the chain, and rely on the dependencies to trigger different chains?

But, wouldn't that lead to different provenances?

makortel · 2018-12-06T22:04:43Z

@cmsbuild, please test

cmsbuild · 2018-12-06T22:05:02Z

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/32044/console Started: 2018/12/06 23:05

makortel · 2018-12-06T22:29:13Z

@fwyzard

Doesn't this go in the opposite direction of what we discussed earlier, with the possibility of having different "chains" of modules used for e.g. the CPU vs GPU reconstruction?

No, enabling that has been the plan all along.

Or do you expect the switch to be used only for the last module in the chain, and rely on the dependencies to trigger different chains?

Exactly, the switch is used only for the last module in the chain, and the prefetching (along data dependencies) takes care of the rest.

Or to be more precise, the switch should be used for each step of the chain for which a product is (possibly) needed on the CPU (such that a CPU version of the algorithm exists as well). E.g. for the current state of patatrack (#25353) there would be switches for

siPixelDigis: between SiPixelRawToDigi and (e.g.) SiPixelDigisFromCUDA
siPixelClustersPreSplitting: between SiPixelClusterProducer and (e.g.) SiPixelClustersFromCUDA
siPixelRecHitsPreSplitting: between SiPixelRecHitConverter and (e.g.) SiPixelRecHitsFromCUDA
pixelTracksHitQuadruplets: between CAHitQuadrupletEDProducer and (e.g.) CAHitQuadrupletsFromCUDA

Here all *FromCUDA are the modules that convert the CPU-side SoA to the "legacy" data formats (and are "to be done", naming convention can be discussed).

But, wouldn't that lead to different provenances?

At the event level provenance (ProductProvenance), yes, but that is also intentional, so that we can check afterwards what exactly was run. But it's really not different from a (hypothetical) case now where an EDProducer (producing C) reads a product A on some events and B on the rest. In such a case the ProductProvenance shows that in "A" events the parent of C is A, and in "B" events the parent is B.

cmsbuild · 2018-12-07T04:30:45Z

+1
Tested at: 6ea40f9
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-25439/32044/summary.html

cmsbuild · 2018-12-07T04:30:49Z

Comparison job queued.

cmsbuild · 2018-12-07T06:04:01Z

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-25439/32044/summary.html

Comparison Summary:

No significant changes to the logs found
Reco comparison results: 0 differences found in the comparisons
DQMHistoTests: Total files compared: 33
DQMHistoTests: Total histograms compared: 3136422
DQMHistoTests: Total failures: 100
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3136118
DQMHistoTests: Total skipped: 204
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 32 files compared)
Checked 137 log files, 14 edm output root files, 33 DQM output files

cmsbuild · 2019-01-08T15:56:14Z

The code-checks are being triggered in jenkins.

makortel · 2019-01-08T15:57:54Z

Tidied up the history. Apparently nowadays GitHub provides a nice link to show the diff of a force-push. That provides a way to see a diff of an update even if some earlier commits got updated (as long as the base commit stays the same).

cmsbuild · 2019-01-08T16:02:01Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-25439/7862

This PR adds an extra 152KB to repository

cmsbuild · 2019-01-08T16:02:29Z

Pull request #25439 was updated. @cmsbuild, @smuzaffar, @Dr15Jones can you please check and sign again.

makortel · 2019-01-08T16:03:16Z

@cmsbuild, please test

cmsbuild · 2019-01-08T16:03:44Z

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/32471/console Started: 2019/01/08 17:04

cmsbuild · 2019-01-09T00:05:08Z

+1
Tested at: 3cfce3d
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-25439/32471/summary.html

cmsbuild · 2019-01-09T00:05:13Z

Comparison job queued.

cmsbuild · 2019-01-09T01:48:56Z

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-25439/32471/summary.html

Comparison Summary:

No significant changes to the logs found
Reco comparison results: 2 differences found in the comparisons
DQMHistoTests: Total files compared: 33
DQMHistoTests: Total histograms compared: 3153717
DQMHistoTests: Total failures: 232
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3153281
DQMHistoTests: Total skipped: 204
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 32 files compared)
Checked 137 log files, 14 edm output root files, 33 DQM output files

Dr15Jones · 2019-01-10T14:49:58Z

+1

cmsbuild · 2019-01-10T14:50:30Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

fabiocos · 2019-01-10T21:28:22Z

+1

makortel added 2 commits December 6, 2018 15:36

Allow a module to insert multiple items to a list

9eb83db

Add SwitchProducer to python configuration

e8d5090

cmsbuild added this to the CMSSW_10_4_X milestone Dec 6, 2018

cmsbuild added code-checks-pending comparison-pending core-pending orp-pending pending-signatures tests-pending labels Dec 6, 2018

cmsbuild added code-checks-approved and removed code-checks-pending labels Dec 6, 2018

cmsbuild added tests-started and removed tests-pending labels Dec 6, 2018

cmsbuild added tests-approved and removed tests-started labels Dec 7, 2018

cmsbuild removed the comparison-pending label Dec 7, 2018

cmsbuild added code-checks-approved and removed code-checks-pending labels Jan 8, 2019

cmsbuild added tests-started and removed tests-pending labels Jan 8, 2019

cmsbuild added tests-approved and removed tests-started labels Jan 9, 2019

cmsbuild added comparison-available and removed comparison-pending labels Jan 9, 2019

cmsbuild added core-approved fully-signed and removed core-pending pending-signatures labels Jan 10, 2019

cmsbuild added orp-approved and removed orp-pending labels Jan 10, 2019

cmsbuild merged commit 308403e into cms-sw:master Jan 10, 2019

makortel mentioned this pull request Jan 25, 2019

Actually run the SwitchProducer tests #25769

Merged

makortel mentioned this pull request Mar 12, 2019

Next prototype of the framework integration cms-patatrack/cmssw#119

Open

5 tasks

makortel deleted the switchProducer branch April 7, 2022 14:03
