XeXe collision era and relval workflow #20749
The code-checks are being triggered in jenkins.
+code-checks
A new Pull Request was created by @mandrenguyen (Matthew Nguyen) for master. It involves the following packages: Configuration/Eras @perrotta, @prebello, @vazzolini, @dmitrijus, @kmaeshima, @civanch, @perrozzi, @efeyazgan, @kpedro88, @fabozzi, @cmsbuild, @GurpreetSinghChahal, @franzoni, @thuer, @slava77, @mdhildreth, @vanbesien, @govoni, @davidlange6 can you please review it and eventually sign? Thanks. cms-bot commands are listed here
```diff
-peripheralPbPb.toModify(PixelTripletLargeTipGenerator, maxElement = 1000000)
+from Configuration.Eras.Modifier_pp_on_XeXe_2017_cff import pp_on_XeXe_2017
+for e in [peripheralPbPb, pp_on_XeXe_2017]:
+    e.toModify(PixelTripletLargeTipGenerator, maxElement = 1000000)
```
Modifications in this file have no effect for pp reco (though consistency doesn't hurt until this file gets cleaned up).
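A toy sketch of why the loop over `[peripheralPbPb, pp_on_XeXe_2017]` is safe: a modifier only changes parameters when its era is active, so the same `maxElement` bump applies under either era and leaves other configurations alone. `ToyModifier`, `configure` and the default `maxElement` of 100000 are hypothetical stand-ins, not the CMSSW `Modifier` class or the real generator default.

```python
class ToyModifier:
    """Hypothetical stand-in for a CMSSW era Modifier (not the real FWCore class)."""

    def __init__(self, name, active):
        self.name = name
        self.active = active  # in CMSSW this would follow from the chosen --era

    def toModify(self, params, **overrides):
        # Only an active modifier touches the configuration.
        if self.active:
            params.update(overrides)


def configure(era_name):
    # Hypothetical parameter set standing in for PixelTripletLargeTipGenerator;
    # the default value is a placeholder.
    generator = {"maxElement": 100000}
    peripheralPbPb = ToyModifier("peripheralPbPb", era_name == "peripheralPbPb")
    pp_on_XeXe_2017 = ToyModifier("pp_on_XeXe_2017", era_name == "pp_on_XeXe_2017")
    # Same pattern as the PR: one modification shared by two eras.
    for e in [peripheralPbPb, pp_on_XeXe_2017]:
        e.toModify(generator, maxElement=1000000)
    return generator
```

Under this toy model, `configure("pp")` keeps the placeholder default while either heavy-ion-like era gets the larger limit.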
```python
useFoundVertices = True,
originRadius = 1.5
))
)
```
This block (or most of it) is copy-pasted across the affected files. Would it be possible to have one definition somewhere and use that in the toReplaceWith() calls?
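One way to follow this suggestion is a single helper that builds the shared block, with each iteration passing only what differs. This is a sketch: plain dicts stand in for `cms.PSet`, and `make_xexe_region_params`, `tight_region` and `wide_region` are hypothetical names. The values are the ones visible in the hunks of this PR (`useFoundVertices = True`, `originRadius` of 0.02 or 1.5, `ptMin = 0.6`).

```python
def make_xexe_region_params(**overrides):
    """Return the region parameters shared across iterations (hypothetical helper).

    In a real config this would live in one _cff file and be imported
    wherever a toReplaceWith() call needs it.
    """
    params = {
        "useFoundVertices": True,
        "originRadius": 0.02,
    }
    params.update(overrides)  # per-iteration differences only
    return params


# Each iteration now states only what actually changes.
tight_region = make_xexe_region_params(ptMin=0.6)
wide_region = make_xexe_region_params(originRadius=1.5)
```

Because the helper builds a fresh dict on each call, one iteration's overrides cannot leak into another's.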
```python
useFoundVertices = True,
originRadius = 1.5
))
)
```
mostly copy-paste
```python
ptMin = 0.6,
useFoundVertices = True,
originRadius = 0.02
))
```
mostly copy-paste
```python
)
)

pp_on_XeXe_2017.toReplaceWith(firstStepPrimaryVerticesUnsorted.TkClusParameters, _pp_on_XeXe_2017_TkClusParameters)
```
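Please change to

```python
import FWCore.ParameterSet.Config as cms
from Configuration.Eras.Modifier_pp_on_XeXe_2017_cff import pp_on_XeXe_2017

# firstStepPrimaryVerticesUnsorted is already in scope here (the original toReplaceWith() call uses it).
# The dict updates only maxD0Significance; the cms.PSet replaces TkClusParameters as a whole.
pp_on_XeXe_2017.toModify(firstStepPrimaryVerticesUnsorted,
    TkFilterParameters = dict(maxD0Significance = 3.0),
    TkClusParameters = cms.PSet(
        algorithm = cms.string("gap"),
        TkGapClusParameters = cms.PSet(
            zSeparation = cms.double(1.0)
        )
    )
)
```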
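The suggested call mixes two kinds of values. As we understand CMSSW's `toModify`, a plain `dict` is merged into the existing sub-block, while a `cms.PSet` replaces the parameter outright. That is why one call can both tweak the filter and swap the clustering algorithm. Below is a toy model of that behavior, not the CMSSW implementation: `toy_modify` and `Block` are hypothetical, and the starting parameter values are placeholders, not the real `firstStepPrimaryVerticesUnsorted` settings.

```python
class Block(dict):
    """Stands in for cms.PSet: a value that replaces the whole sub-block."""


def toy_modify(params, **changes):
    """Toy model of toModify: plain dicts merge, anything else replaces."""
    for key, value in changes.items():
        if type(value) is dict and isinstance(params.get(key), dict):
            toy_modify(params[key], **value)  # merge into the existing sub-block
        else:
            params[key] = value  # replace wholesale (Block, numbers, strings)
    return params


# Placeholder defaults, not the real vertex-producer configuration.
vertices = {
    "TkFilterParameters": {"maxD0Significance": 4.0, "minPt": 0.0},
    "TkClusParameters": {"algorithm": "DA", "TkDAClusParameters": {"Tmin": 2.0}},
}
toy_modify(
    vertices,
    TkFilterParameters=dict(maxD0Significance=3.0),  # merged
    TkClusParameters=Block(algorithm="gap", TkGapClusParameters=dict(zSeparation=1.0)),  # replaced
)
```

In the toy, `minPt` survives the merge, while the old DA sub-block disappears once the clustering block is replaced.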
```python
useFoundVertices = True,
originRadius = 0.02
))
)
```
mostly copy-paste
```python
useFoundVertices = True,
originRadius = 0.02
))
)
```
mostly copy-paste
please test
Comparison is ready
Comparison Summary:
+1
Hi @slava77 @perrotta @mandrenguyen, are this and its 92X backport nearly ready to go?
On my side, it’s ready
@davidlange6: this is OK for me, as well as the 92X backport
merge
@slava77 I ran 100 events starting from GEN-SIM in 9_2_13. On a 2 GHz machine I'm only clocking in at 20 sec/event. We'll have to create a huge number of PDs to accommodate the 60 sec/evt, so it's important to understand which is more accurate. Is there something missing from my approach? I simply ran the following driver, adding only `process.Timing = cms.Service("Timing")`:

```
step3 --runUnscheduled --conditions auto:phase1_2017_realistic -s RAW2DIGI,L1Reco,RECO,EI,PAT --datatier GEN-SIM-RECO,MINIAODSIM -n 2 --era Run2_2017_pp_on_XeXe --eventcontent RECOSIM,MINIAODSIM --filein /store/user/mnguyen//hydjetDrum5_XeXe_MB_921p12/hydjetDrum5_XeXe_MB_921p12/crab_Hydjet_Quenched_MB_XeXe_5442GeV_DIGI_9212p1/171009_174655/0000step2_DIGI_L1_DIGI2RAW_HLT_1.root --no_exec
```
Let's see what I reproduce on a "standard" CERN box (which are slow...).

BTW, any idea why we need to read this file for every event?

```
Begin processing the 15th record. Run 1, Event 15, LumiSection 1 at 10-Oct-2017 10:11:50.703 CEST
==== LHAPDF6 USING PYTHIA-TYPE LHAGLUE INTERFACE ====
LHAPDF 6.1.6 loading /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/lhapdf/6.1.6-njopjo/share/LHAPDF/cteq6l1/cteq6l1_0000.dat
cteq6l1 PDF set, member #0, version 4; LHAPDF ID = 10042
==== LHAPDF6 USING PYTHIA-TYPE LHAGLUE INTERFACE ====
LHAPDF 6.1.6 loading /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/lhapdf/6.1.6-njopjo/share/LHAPDF/cteq6l1/cteq6l1_0000.dat
cteq6l1 PDF set, member #0, version 4; LHAPDF ID = 10042
==== LHAPDF6 USING PYTHIA-TYPE LHAGLUE INTERFACE ====
LHAPDF 6.1.6 loading /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/lhapdf/6.1.6-njopjo/share/LHAPDF/cteq6l1/cteq6l1_0000.dat
cteq6l1 PDF set, member #0, version 4; LHAPDF ID = 10042
==== LHAPDF6 USING PYTHIA-TYPE LHAGLUE INTERFACE ====
```
So eventually I get something more like 25 CPU seconds/event. I turned off everything aside from RECO for my test, as I had understood we were trying not to run miniAOD, and it seems DQM has picked up a miniAOD dependency (despite defining separate sequences for miniAOD monitoring :( ).
So maybe we are comparing apples and oranges with @slava77?

```
An exception of category 'ProductNotFound' occurred while
[0] Processing Event run: 1 lumi: 1 event: 5 stream: 7
[1] Running path 'dqmoffline_1_step'
[2] Prefetching for module MuonRecoAnalyzer/'muonRecoAnalyzer_miniAOD'
[3] Prefetching for module PATMuonSlimmer/'slimmedMuons'
[4] Calling method for module PATPackedCandidateProducer/'packedPFCandidates'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for type: std::vector<reco::PFCandidate>
Looking for module label: puppiNoLep
Looking for productInstanceName:
```
Ciao,
I think I do not see the log here. Can you please point me to the dependency you are mentioning?
Thanks,
mia
@davidlange6 So you removed the PAT step? I initially had that removed, but then we put back miniAOD and the associated DQM in case it's being produced. You can see that I have that in my driver, and I'm still getting timing close to yours.
I added back PAT and DQM: 40 CPU seconds per event including the startup time, first-event time and shutdown time, so likely 30 seconds in the event loop.
Also, I don't see such a high RSS: more like 8-9 GB/8 cores, at least for 100 events, aside from an increase up to 11 GB near the end of the job that might be a sign of trouble...
[Maybe Slava left the crossing frames production on?]
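The two numbers above (about 40 s/event with startup, first-event and shutdown costs included, about 30 s/event in the loop, over 100 events) imply a fixed per-job cost that short tests spread over few events. Below is a minimal sketch of that arithmetic. It assumes the fixed cost does not depend on the number of events; the function names are illustrative.

```python
def average_with_overhead(loop_s_per_event, overhead_s, n_events):
    """Average CPU s/event when a fixed per-job cost is spread over n_events."""
    return loop_s_per_event + overhead_s / n_events


def implied_overhead(n_events, avg_s_per_event, loop_s_per_event):
    """Fixed per-job cost implied by an average that includes it."""
    return n_events * (avg_s_per_event - loop_s_per_event)
```

Under that assumption, `implied_overhead(100, 40, 30)` gives 1000 s. The same fixed cost spread over 1000 events would add only about 1 s/event, so 100-event tests overstate the steady-state cost noticeably.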
```python
process.dqmoffline_step = cms.EndPath(process.DQMOffline)
process.dqmoffline_1_step = cms.EndPath(process.DQMOfflineMiniAOD)
process.dqmofflineOnPAT_step = cms.EndPath(process.PostDQMOffline)
process.dqmofflineOnPAT_1_step = cms.EndPath(process.PostDQMOfflineMiniAOD)
```

Now I see the reason for my confusion: nonsense path names that don't reflect what's being run on them...
It is unclear from this whether you processed all of the events in the lumisections available in the parent GEN-SIM file.
On my machine, ttbar MC production (RECO+miniAOD) takes about 17 s/event.
You can use this to normalize your 20 sec/event.
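The normalization suggested here uses the ttbar workflow as a common yardstick across machines. Below is a sketch that assumes a single speed factor per machine, i.e. both workflows slow down by the same ratio. The ttbar time on your own machine has to be measured; it is not given in this thread, and the function name is illustrative.

```python
def normalize_to_reference(t_sample_local, t_ref_local, t_ref_remote):
    """Express a locally measured time/event on another machine's scale.

    Assumes both machines scale the sample and the reference workflow by
    the same factor (a rough, single-number machine-speed model).
    """
    ratio = t_sample_local / t_ref_local  # machine-independent cost ratio
    return ratio * t_ref_remote
```

Usage: `normalize_to_reference(t_xexe_mine, t_ttbar_mine, 17.0)` expresses the XeXe timing on the scale of the 17 s/event ttbar machine.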
@slava77 I processed every one of those events, and those were all that were GS'd. So there is no "smaller files finish first" issue.
OK.
Please check the distribution of the number of tracks, and also the time per event ordered in the same way I did, so that we can make a comparison.
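One way to produce this comparison is to pair each event's track count with its CPU time, order the events by track multiplicity and average the time per bin. The sketch below assumes the per-event `(n_tracks, cpu_seconds)` pairs have already been extracted, for example from the Timing service output and the size of the track collection. That extraction step, the function name and the bin count are assumptions.

```python
import math
from statistics import mean


def time_vs_tracks(events, n_bins=4):
    """Average CPU time per event in bins of increasing track multiplicity.

    events: list of (n_tracks, cpu_seconds) pairs, one per event.
    Returns a list of (min_tracks, max_tracks, mean_cpu_seconds), one per bin.
    """
    if not events:
        return []
    ordered = sorted(events)  # ascending track multiplicity
    size = math.ceil(len(ordered) / n_bins)
    rows = []
    for start in range(0, len(ordered), size):
        chunk = ordered[start:start + size]
        rows.append((chunk[0][0], chunk[-1][0], mean(t for _, t in chunk)))
    return rows
```

Running this on both samples with the same bin count puts the two timing curves on a common multiplicity axis.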
I see that the recent checks were made in CMSSW_9_2_13.
I still see slightly more than a factor of 2 larger CPU time compared to wf 10224.0, which is just north of 20 sec/event on my 2 GHz machine. I see no large difference running in the 94X IB compared to 9_2_13. I have confirmed that I see events with up to O(10000) reconstructed tracks, the highest being about 7,000 tracks in my sample.
Customizes the pp reconstruction to keep timing in check.
To be used for XeXe data taking on October 12th (92X version).
Adds a RelVal workflow (148) to test it.
Supersedes #20715