Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DeepCore (NN jetCore seeding) implementation #32222

Merged

Conversation

vberta
Copy link
Contributor

@vberta vberta commented Nov 21, 2020

PR description:

This PR include DeepCore, the NN-based seeding algorithm for JetCore iteration. The PR should be considered technical, thus DeepCore is disabled by default. The possibility to enable DeepCore is given with a modifier: seedingDeepCore. Anyway, it should not be used for practical purposes because DeepCore has not been fully tested.

Recostruction side:

The seedingDeepCore code is mainly contained in the plugin: DeepCoreSeedGenerator, which use a pre-trained model (contained in the .pb file from the additiona PR: link). The training code (the NN and the training sample preparation scripts) will be provided in another PR in the future. Several modifications are done in JetCoreRegionalStep_cff.py, all protected by the modifier.

There are also some features enable by default (which do not have an impact in the default reconstruction):

  • some protection against 0-hit seeds (needed only in DeepCore case), in LowPtGsfElectronSeedProducer.cc and GoodSeedProducer.cc,TrackFromSeedProducer.cc
  • some minor modification in TrackAssociatorByPositionImpl (I don't use it anymore, I can drop it, but I think is a useful feature to have)

Validation Side:

A dedicated directory in the Track_validation_cff, under tracking/JetCore which contains some MTV instance plots associated byChi2. The production of this plot is also enabled with seedingDeepCore modifier, but this is a temporary solution, since they are useful also in standard JetCore case. These plots are relevant mostly on high qT QCD samples, almost empty in case of no-dense environment samples).

In addition JetCoreMCtruthSeedGenerator plugin is also provided (under validation). It produces a list of seeds for JetCore iteration using the MC information, in the same fashion of DeepCore (i.e. as a list of hitless seed as seed-track parameters).

This PR includes also the removal of several not-needed tracking plots, completely not related to DeepCore o jetCore iteration, but since we're close to the memory limit they have been removed. Essentially have been removed all the plots related to:

  • dzpvcut_pt
  • dzpvsigcut_pt

Currenlty the workflwow 11723.17 is the only one with DeepCore enabled:
11723.17 2021_seedingDeepCore+QCD_Pt_1800_2400_14TeV_TuneCP5_GenSim+Digi+Reco+HARVEST

PR validation:

The content of this PR (enabling DeepCore) has been validated on 9k events with:

cmsDriver.py step3 --conditions auto:phase1_2024_realistic -n -1 --era Run3 --eventcontent RECOSIM,MINIAODSIM,DQM --runUnscheduled -s RAW2DIGI,L1Reco,RECO,RECOSIM,EI,PAT,VALIDATION:@standardValidation+@miniAODValidation,DQM:@standardDQM+@ExtraHLT+@miniAODDQM --datatier GEN-SIM-RECO,MINIAODSIM,DQMIO --geometry DB:Extended --filein file:step2.root --fileout file:step3.root --no_exec --procModifiers seedingDeepCore

and (trackingOnly workflow):
cmsDriver.py step3_trackingOnly --conditions auto:phase1_2024_realistic -n -1 --era Run3 --eventcontent RECOSIM,MINIAODSIM,DQM --runUnscheduled -s RAW2DIGI,RECO,RECOSIM,EI,PAT,VALIDATION:@trackingOnlyValidation,DQM:@trackingOnlyDQM --datatier GEN-SIM-RECO,MINIAODSIM,DQMIO --geometry DB:Extended --filein file:step2.root --fileout file:step3.root --no_exec --procModifiers seedingDeepCore

using as input:
/RelValQCD_Pt_1800_2400_14/CMSSW_11_2_0_pre8-112X_mcRun3_2024_realistic_v10_forTrk-v1/GEN-SIM-DIGI-RAW

PR documentation:

Documentation about DeepCore: https://twiki.cern.ch/twiki/bin/view/CMSPublic/NNJetCoreAtCtD2019
(DPS, talk, proceeding at References section)

Presentation of this PR at Reco meeting (11/12/2020): https://indico.cern.ch/event/983567/contributions/4142430/attachments/2161501/3647084/presentation_deepcorePRnov2020_forReco.pdf

@cmsbuild
Copy link
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32222/19971

ERROR: Build errors found during clang-tidy run.

  std::vector<GlobalVector> splittedClusterDirections(const reco::Candidate&, const TrackerTopology* const, auto pp, const reco::Vertex& jetVertex, int );
                                                                                                            ^
RecoTracker/TkSeedGenerator/plugins/JetCoreMCtruthSeedGenerator.h:176:40: error: 'auto' not allowed in function prototype [clang-diagnostic-error]
  std::vector<PSimHit> coreHitsFilling(auto,const GeomDet*,GlobalVector,const reco::Vertex&);
                                       ^
RecoTracker/TkSeedGenerator/plugins/JetCoreMCtruthSeedGenerator.h:177:105: error: 'auto' not allowed in function prototype [clang-diagnostic-error]
  std::pair<std::vector<SimTrack>,std::vector<SimVertex>> coreTracksFilling(std::vector<PSimHit>, const auto &, const auto &);
                                                                                                        ^
RecoTracker/TkSeedGenerator/plugins/JetCoreMCtruthSeedGenerator.h:177:119: error: 'auto' not allowed in function prototype [clang-diagnostic-error]
  std::pair<std::vector<SimTrack>,std::vector<SimVertex>> coreTracksFilling(std::vector<PSimHit>, const auto &, const auto &);
                                                                                                                      ^
RecoTracker/TkSeedGenerator/plugins/JetCoreMCtruthSeedGenerator.h:181:90: error: 'auto' not allowed in function prototype [clang-diagnostic-error]
  std::pair<std::vector<SimTrack>,std::vector<SimVertex>> coreTracksFillingDeltaR( const auto &, const auto &,const GeomDet* , const reco::Candidate&,auto );
                                                                                         ^
RecoTracker/TkSeedGenerator/plugins/JetCoreMCtruthSeedGenerator.h:181:104: error: 'auto' not allowed in function prototype [clang-diagnostic-error]
  std::pair<std::vector<SimTrack>,std::vector<SimVertex>> coreTracksFillingDeltaR( const auto &, const auto &,const GeomDet* , const reco::Candidate&,auto );
                                                                                                       ^
RecoTracker/TkSeedGenerator/plugins/JetCoreMCtruthSeedGenerator.h:181:151: error: 'auto' not allowed in function prototype [clang-diagnostic-error]
  std::pair<std::vector<SimTrack>,std::vector<SimVertex>> coreTracksFillingDeltaR( const auto &, const auto &,const GeomDet* , const reco::Candidate&,auto );
                                                                                                                                                      ^
Suppressed 3887 warnings (3883 in non-user code, 3 NOLINT, 1 with check filters).
--
gmake: *** [config/SCRAM/GMake/Makefile.coderules:128: code-checks] Error 2
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2

@slava77
Copy link
Contributor

slava77 commented Nov 21, 2020

IMHO, "technical" refers mainly to refactoring changes which do not introduce any new features and preserve physics/output behavior of all runnable code.
This is a new feature. I suggest to drop "technical" from the title to avoid confusion.

@vberta
Copy link
Contributor Author

vberta commented Nov 21, 2020

IMHO, "technical" refers mainly to refactoring changes which do not introduce any new features and preserve physics/output behavior of all runnable code.
This is a new feature. I suggest to drop "technical" from the title to avoid confusion.

I called the PR "technical" because the feature (seedingDeepCore) is disabled by default and should not be enabled for the moment, excluding PR testing purposes. I add the possibility to enable it, by the modifier, to allow the eventual testing of the code of the PR only. DeepCore at the current status is working, but is not fully tested, therefore using it can bring to inefficiencies/unexpected behaviour of the reconstruction.
Anyway, I got your point and I removed the word "technical" from the title, but I would like that everybody knows that seedingDeepCore should not be used.

@vberta vberta changed the title DeepCore (NN jetCore seeding) implementation - technical DeepCore (NN jetCore seeding) implementation Nov 21, 2020
@cmsbuild
Copy link
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32222/19973

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32222/19974

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @vberta (Valerio Bertacchi) for master.

It involves the following packages:

Configuration/ProcessModifiers
Configuration/PyReleaseValidation
RecoEgamma/EgammaElectronProducers
RecoParticleFlow/PFTracking
RecoTracker/IterativeTracking
RecoTracker/TkSeedGenerator
SimTracker/TrackAssociatorProducers
TrackingTools/GsfTracking
Validation/RecoTrack

@perrotta, @andrius-k, @silviodonato, @jordan-martins, @civanch, @chayanit, @wajidalikhan, @ErnestaP, @mdhildreth, @cmsbuild, @kmaeshima, @franzoni, @jfernan2, @fioriNTU, @slava77, @jpata, @qliphy, @fabiocos, @kpedro88, @davidlange6 can you please review it and eventually sign? Thanks.
@felicepantaleo, @jainshilpi, @hatakeyamak, @Martin-Grunewald, @lgray, @threus, @varuns23, @seemasharmafnal, @mmarionncern, @makortel, @JanFSchulte, @dgulhan, @slomeo, @Sam-Harper, @cbernet, @GiacomoSguazzoni, @rovere, @VinInn, @bellan, @wmtford, @mschrode, @ebrondol, @mtosi, @fabiocos, @sobhatta, @lecriste, @afiqaize, @gpetruc, @ram1123 this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@civanch
Copy link
Contributor

civanch commented Nov 21, 2020

please test

@jpata
Copy link
Contributor

jpata commented Dec 16, 2020

+reconstruction

  • implements the NN-based jet core seeding algo DeepCore, full details in the PR description
  • only enabled by the seedingDeepCore modifier, a test workflow 11723.17 is available in the matrix
  • we see no reco differences, but there are DQM differences, which are understood to come from new DQM plots added by this PR. The changes to the existing DQM plots are coming from the fitter that derives the plots being stateful and thus depending on previous fits (DeepCore (NN jetCore seeding) implementation  #32222 (comment))
  • igprof results have shown the memory and CPU consumption to be under control and well understood.
  • requires deepcore barrel model 2017 cms-data/RecoTracker-TkSeedGenerator#1

@civanch
Copy link
Contributor

civanch commented Dec 16, 2020

+1

@kpedro88
Copy link
Contributor

+upgrade

@silviodonato
Copy link
Contributor

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will be automatically merged.

@cmsbuild cmsbuild merged commit 1911f2f into cms-sw:master Dec 17, 2020
@silviodonato
Copy link
Contributor

dasgoclient --limit 0 --query 'file dataset=/RelValQCD_Pt_1800_2400_14/CMSSW_11_2_0_pre8-112X_mcRun3_2021_realistic_v10-v1/GEN-SIM'
gives no results, and then we get error in 11723.17, see https://cmssdt.cern.ch/SDT/html/cmssdt-ib/#/relVal/CMSSW_11_3/2020-12-17-2300?selectedArchs=slc7_amd64_gcc900&selectedFlavors=X&selectedStatus=failed

@silviodonato
Copy link
Contributor

Indeed the new test by Shahzad caught the error
image

@silviodonato
Copy link
Contributor

I've created an issue #32543

@@ -1,3 +1,4 @@
<use name="PhysicsTools/TensorFlow"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really that hard to add the dependencies in alphabetical order ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cms-sw/reconstruction-l2 could you add this suggestion to the list of requests you make of people ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really sorry, nobody ever told me this requirement before (probably it is written in some twiki, but I missed it)

@fwyzard
Copy link
Contributor

fwyzard commented Dec 24, 2020 via email

@vberta
Copy link
Contributor Author

vberta commented Dec 24, 2020

hi Valerio, sorry for jumping at you - I don't think it's a requirement not it is documented anywhere; it's just (IMHO) a reasonable practice with two advantages and no downside: - it makes it easier to find the dependencies and avoid duplicates - it makes it easier to avoid conflicts from different PRs, since different additions or deletions are more likely to end up in different places I only noticed because a long standing PR had a conflict after this was merged :-( Easy to fix, of course.

Yes, sure, I fully agree with you. I will follow these suggestions in the future.
If you would like to know the full story: I moved this line at the top due to several (silly) conflicts in the quite intricate rebasing path of this PR. In the long list of fix, I solved this one in this (as you explain not optimal) way. It was not the best choice, you're right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet