Accept const session in TF interface. #40161

Merged
merged 2 commits into from
Dec 7, 2022

@riga riga commented Nov 27, 2022

PR description

This PR adds five new functions to the TensorFlow interface in PhysicsTools that accept const sessions: four perform model inference and one handles session deletion. An additional test case covers the handling of const sessions.

With this PR, sessions no longer need to be stored once per stream module instance; they can instead be moved to global caches (as is already the case for tf graphs), reducing copies in memory.

Moving the sessions to global caches is not part of this PR and is left to one or more future PRs. For completeness, the three lists below name the files where the changes would be needed (the last list is optional). I can open a dedicated PR for them once this one is merged (or a separate PR per subsystem, depending on what's easier to sign off on).

Move sessions to global cache
  • L1Trigger/Phase2L1ParticleFlow/src/TauNNId.cc
  • RecoMET/METPUSubtraction/plugins/DeepMETProducer.cc
  • RecoTauTag/HLTProducers/src/L2TauTagNNProducer.cc
  • RecoTauTag/RecoTau/plugins/DeepTauId.cc
  • RecoTracker/TkSeedGenerator/plugins/DeepCoreSeedGenerator.cc
Remove now-obsolete const_casts
  • RecoHGCal/TICL/plugins/PatternRecognitionbyCA.cc
  • RecoHGCal/TICL/plugins/PatternRecognitionbyCLUE3D.cc
  • RecoHGCal/TICL/plugins/PatternRecognitionbyFastJet.cc
  • RecoHGCal/TICL/plugins/TrackstersMergeProducer.cc
  • RecoHGCal/TICL/plugins/TrackstersMergeProducerV3.cc
  • RecoTracker/FinalTrackSelectors/plugins/TrackTfClassifier.cc
  • RecoTracker/MkFit/plugins/MkFitOutputConverter.cc
Use const session (only if needed)
  • DQM/DTMonitorClient/src/DTOccupancyTestML.cc
  • L1Trigger/L1THGCal/src/concentrator/HGCalConcentratorAutoEncoderImpl.cc
  • L1Trigger/L1TMuonEndCap/src/PtAssignmentEngineDxy.cc
  • PhysicsTools/PatAlgos/interface/BaseMVAValueMapProducer.h
  • RecoEcal/EgammaCoreTools/src/DeepSCGraphEvaluation.cc
  • RecoMuon/TrackerSeedGenerator/plugins/TSGForOIDNN.cc
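
As an illustration of the first list, the following is a minimal sketch of how a session held in a global cache could be shared by several stream module instances once the interface accepts const sessions. It does not use TensorFlow or the CMSSW framework: `MockSession`, `GlobalCache`, `StreamModule` and the free function `run` are hypothetical stand-ins introduced only for this example, and the actual plugin code may look different.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for tensorflow::Session (illustration only).
class MockSession {
public:
  explicit MockSession(float scale) : scale_(scale) {}

  // Non-const in its interface, although it does not modify the object.
  float Run(const std::vector<float>& inputs) {
    float sum = 0.f;
    for (float v : inputs) {
      sum += v;
    }
    return scale_ * sum;
  }

private:
  float scale_;
};

// Hypothetical const-accepting helper, in the spirit of the functions added by this PR.
inline float run(const MockSession* session, const std::vector<float>& inputs) {
  assert(session != nullptr);
  // The cast is acceptable here only because Run() is logically const.
  return const_cast<MockSession*>(session)->Run(inputs);
}

// Hypothetical global cache: owns a single session for all stream instances.
struct GlobalCache {
  std::unique_ptr<const MockSession> session;
};

// Hypothetical stream module: keeps a non-owning pointer to the shared cache
// instead of creating its own session.
class StreamModule {
public:
  explicit StreamModule(const GlobalCache* cache) : cache_(cache) {}

  float evaluate(const std::vector<float>& inputs) const { return run(cache_->session.get(), inputs); }

private:
  const GlobalCache* cache_;
};
```

With this layout, two `StreamModule` objects constructed from the same `GlobalCache` evaluate against one session object, so the session is held once rather than once per instance.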

PR validation

I added test cases for handling const sessions.

@valsdav @yongbinfeng @jeongeun @jpata @clacaputo

@cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40161/33164

  • This PR adds an extra 16KB to repository

@cmsbuild

A new Pull Request was created by @riga (Marcel R.) for master.

It involves the following packages:

  • PhysicsTools/TensorFlow (reconstruction)

@cmsbuild, @mandrenguyen, @clacaputo can you please review it and eventually sign? Thanks.
@makortel this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

const std::vector<std::string>& outputNames,
std::vector<Tensor>* outputs,
const thread::ThreadPoolOptions& threadPoolOptions) {
run(const_cast<Session*>(session), inputs, outputNames, outputs, threadPoolOptions);

I'd suggest adding a comment (either a general one, or one copied to every const_cast) explaining that Session::Run() itself is thread safe (and logically const) but is (unfortunately) non-const in the TensorFlow interface, so the const_cast is needed but is ok.

Added in b83516c.
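
For readers following this thread, here is a minimal sketch of the pattern under discussion: a const overload that forwards to the non-const one through a `const_cast`, with the explanatory comment placed next to the cast. `MockSession` is a hypothetical stand-in for `tensorflow::Session`, and the parameter list is simplified (plain float vectors, no thread pool options), so this is not the code from the commit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for tensorflow::Session (illustration only).
class MockSession {
public:
  // Non-const, mirroring the situation described for Session::Run().
  void Run(const std::vector<float>& inputs,
           const std::vector<std::string>& outputNames,
           std::vector<float>* outputs) {
    float sum = 0.f;
    for (float v : inputs) {
      sum += v;
    }
    // Produce one value per requested output name.
    outputs->assign(outputNames.size(), sum);
  }
};

// Overload taking a mutable session.
inline void run(MockSession* session,
                const std::vector<float>& inputs,
                const std::vector<std::string>& outputNames,
                std::vector<float>* outputs) {
  assert(session != nullptr && outputs != nullptr);
  session->Run(inputs, outputNames, outputs);
}

// Overload taking a const session.
inline void run(const MockSession* session,
                const std::vector<float>& inputs,
                const std::vector<std::string>& outputNames,
                std::vector<float>* outputs) {
  // Run() is thread safe and logically const, but it is declared non-const
  // in the interface, so the const_cast is needed here and is ok.
  run(const_cast<MockSession*>(session), inputs, outputNames, outputs);
}
```

A caller holding only a `const MockSession*` can then invoke `run(session, inputs, outputNames, &outputs)` without writing a cast at the call site.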

@makortel

@cmsbuild, please test

@cmsbuild

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-526732/29305/summary.html
COMMIT: fcaa742
CMSSW: CMSSW_13_0_X_2022-11-28-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/40161/29305/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 5 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3417311
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3417283
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 206 log files, 48 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40161/33181

  • This PR adds an extra 12KB to repository

@cmsbuild

Pull request #40161 was updated. @cmsbuild, @mandrenguyen, @clacaputo can you please check and sign again.

@makortel

@cmsbuild, please test

@cmsbuild

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-526732/29326/summary.html
COMMIT: b83516c
CMSSW: CMSSW_13_0_X_2022-11-29-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/40161/29326/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 7 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3421159
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3421131
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 206 log files, 48 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@clacaputo

> I can open a dedicated PR for them once this one is merged (or a separate PR per subsystem, depending on what's easier to sign off on)

Hi @riga, thanks a lot. I would suggest using a "per-subsystem" approach for the next PRs.

@riga

riga commented Dec 2, 2022

@clacaputo Ok, will do.

Shall I also adjust the files in the third list? In these files, it seems like there is only one session used without concurrency (*), and the change would only consist of making it const. However, it could be good to open PRs for these cases anyway, mainly to get confirmation that the assumption (*) is actually correct.

@clacaputo

> Shall I also adjust the files in the third list? In these files, it seems like there is only one session used without concurrency (*), and the change would only consist of making it const. However, it could be good to open PRs for these cases anyway, mainly to get confirmation that the assumption (*) is actually correct.

We could directly ping the relevant POG/DPG, or open an issue.

@clacaputo clacaputo mentioned this pull request Dec 6, 2022
@clacaputo

+reconstruction

@cmsbuild

cmsbuild commented Dec 6, 2022

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@rappoccio

+1
