Performance improvement in quadruplet seeding (70X) #791

makortel · 2013-09-11T13:35:56Z

These changes improve the CPU performance of the quadruplet seed generation used with Phase1 pixel geometry. Only the implementation of the QuadrupletSeedMerger is altered, the results should stay the same.

Forward port of #784.

cmsbuild · 2013-09-11T13:37:36Z

A new Pull Request was created by @makortel (Matti Kortelainen) for CMSSW_7_0_X.

Performance improvement in quadruplet seeding (70X)

It involves the following packages:

RecoPixelVertexing/PixelTriplets

@thspeer, @slava77 can you please review it and eventually sign? Thanks.
You can sign-off by replying to this message having '+1' in the first line of your reply.
You can reject by replying to this message having '-1' in the first line of your reply.

slava77 · 2013-09-11T13:48:27Z

@makortel
Hi Matti,

Does this mean that we now have a GT which would allow to run a workflow with this seeding enabled?
(Does this show up in the UPG2017 scenarios?)

makortel · 2013-09-11T14:18:46Z

Hi Slava,

We have a GT in 620_SLHCX at least, about 70X I'm not sure. The 'upgrade2017' in autoCond.py of 70X still points to 612_SLHCX GTs.

If I interpret runTheMatrix.py --show --what upgrade -e correctly, UPG2017 should include quadruplet seeding.

I tested the changes only in 620_SLHC1, and since there were no conflicts between 620_SLHC1 and 70X in this area, I'd expect the patches to work in 70X too (but of course actually testing them would be better).

nclopezo · 2013-09-12T08:32:14Z

Hi,

I ran the usual tests on top of CMSSW_7_0_X_2013-09-12-0200, all tests passed:

https://cmssdt.cern.ch/jenkins/job/Pull-Request-Integration/ARCHITECTURE=slc5_amd64_gcc481/461/console

you can see the artifacts here:

https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/461

slava77 · 2013-09-24T23:22:36Z

@ianna @demattia @davidlange6
Hi
We will need an appropriate pixel upgrade geometry/GT to make this test to work.
If a GT for 70X actually exists, then all that's needed is a matching entry in the autoCond.py
Also, back in the days, the goal was to have this all working in the mainstream development release 😄

Cheers

Slava

davidlange6 · 2013-09-24T23:25:39Z

Hi Slava

Its still the goal. but its not yet achieved... (and not likely to be soon)

On 9/24/2013 4:22 PM, slava77 wrote:

@ianna https://github.com/ianna @demattia
https://github.com/demattia @davidlange6 https://github.com/davidlange6
Hi
We will need an appropriate pixel upgrade geometry/GT to make this test
to work.
If a GT for 70X actually exists, then all that's needed is a matching
entry in the autoCond.py
Also, back in the days, the goal was to have this all working in the
mainstream development release 😄

Cheers

Slava

—
Reply to this email directly or view it on GitHub
#791 (comment).

makortel · 2013-09-26T07:12:57Z

Hi,

Because #772 got merged, this one will no longer compile. @ktf, what would be the preferred way to fix that? Merge #772 to this branch and commit a fix, rebase this on top of recent 70X IB (and make a new PR?), or something else?

I have also a more general question on the integration of this PR (@davidlange6, @slava77, @thspeer). Given that this PR can not be meaningfully tested in 70X any time soon (interpreting David's comment), should these changes be tested then in 620_SLHCX first (#784)? Or what would be a good way to proceed (also with possibly other upgrade-motivated performance improvements)?

slava77 · 2013-09-26T07:28:54Z

Hi Matti
Please rebase this branch onto a more fresh 70X.
Merges in pull requests are often quite inconvenient

Thank you

Slava

ktf · 2013-09-26T08:06:53Z

Agreed. A rebase is usually a much cleaner way of doing things.

makortel · 2013-09-26T08:08:34Z

OK, will do. Just to be clear, should I then push the rebased changes to a new branch and PR?

ktf · 2013-09-26T08:09:23Z

I personally prefer to reuse the same Pull Requests, but it's up to Slava / Thomas.

CPU performance improvement, results stayed the same.

…riplet pairs. CPU performance improvement, results stay the same.

CPU performance improvement, results stay the same.

… hits being shared. CPU performance improvement, results stay the same.

CPU performance improvement, results stay the same.

slava77 · 2013-09-26T09:06:38Z

Hi Matti,
Please use the same PR.

Thanks

Slava

makortel · 2013-09-26T09:57:30Z

Rebased on top of CMSSW_7_0_X_2013-09-26-0200. I fixed the compilation error within a69d6ce.

cmsbuild · 2013-09-26T09:57:33Z

Pull request #791 was updated. @nclopezo, @smuzaffar, @thspeer, @slava77 can you please check and sign again.

nclopezo · 2013-10-02T10:09:46Z

+1
https://cmssdt.cern.ch/jenkins/job/Pull-Request-Integration/ARCHITECTURE=slc5_amd64_gcc481/739/console
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/739/build.log
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/739/unitTests.log
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/739/runTheMatrix-results
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/739/llvm-analysis/
Baseline comparison:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_7_0_X_2013-10-02-0200+791/92/

ktf · 2013-10-04T08:56:09Z

@slava77 @thspeer any news on this?

slava77 · 2013-10-04T09:02:07Z

@ianna @demattia @davidlange6

Hi Giulio,
I was hoping that we will get a valid GT for 2017 in 70X. Two weeks ago it sounded like "yes, it's possible",
but then, see David's comment above, apparently it's not yet testable directly in 70X.
I will check the 62XSLHC version first and then assume it should work in 70X as well.
Should be able to get to this before pre6 deadline.

Slava

slava77 · 2013-10-09T09:04:07Z

@slava77 working on it

slava77 · 2013-10-09T10:49:41Z

@davidlange6
Hi David,
Is the regular runTheMatrix.py supposed to work in 62XSLHC?
I'm trying CMSSW_6_2_0_SLHC1: runTheMatrix.py -s --useInput all
fails in all workflows for various reasons.

davidlange6 · 2013-10-09T12:57:09Z

Hi Slava

No (thus the name)

-----Original Message-----
From: slava77 [notifications@github.commailto:notifications@github.com]
Sent: Wednesday, October 09, 2013 03:49 AM Pacific Standard Time
To: cms-sw/cmssw
Cc: davidlange6
Subject: Re: [cmssw] Performance improvement in quadruplet seeding (70X) (#791)

@davidlange6 https://github.com/davidlange6
Hi David,
Is the regular runTheMatrix.py supposed to work in 62XSLHC?
I'm trying CMSSW_6_2_0_SLHC1: runTheMatrix.py -s --useInput all
fails in all workflows for various reasons.

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/791#issuecomment-25961962.

slava77 · 2013-10-09T20:49:25Z

@davidlange6
Hi David,
Now, sticking to the upgrade scenarios.

I suppose the quadruplets are for the 2017 setup.
Is there a working matrix version with pileup that would include quadruplets?

(I tried, but still keep failing: first the matrix PU dataset was missing, after switching to an existing GEN-SIM minbias input the mixer is still broken with a more obscure "vector::_M_range_check" exception).
Is there a magic spell ?

I'm trying to run 8100.0
the crash is in what's below.
Does any of it look familiar?

Exception of type St12out_of_range (address 0x2b05ac538b40)
Destructor std::out_of_range::~out_of_range() (0x2b056feb7e60)
Stack:
1: 0x2b0584408028 HcalSimParameters::samplingFactor(DetId const&) const + 72 [/afs/cern.ch/cms/slc5_amd64_gcc472/cms/cmssw/CMSSW_6_2_0_SLHC1/lib/slc5_amd64_gcc472/libSimCalorimetryHcalSimAlgos.so + 196648]
2: 0x2b0584408797 HcalSimParameters::simHitToPhotoelectrons(DetId const&) const + 55 [/afs/cern.ch/cms/slc5_amd64_gcc472/cms/cmssw/CMSSW_6_2_0_SLHC1/lib/slc5_amd64_gcc472/libSimCalorimetryHcalSimAlgos.so + 198551]

davidlange6 · 2013-10-09T20:54:09Z

Hi Slava

We usually test with

runTheMatrix.py -w upgrade -l 3300,3400,4100,4400

(where you would care about the first one for 2017)

Maybe Gaelle can clarify about the pileup scenarios - I'm not sure how the minbias gets picked up by the pyrelval stuff.

From: slava77 [notifications@github.com]
Sent: Wednesday, October 09, 2013 1:49 PM
To: cms-sw/cmssw
Cc: davidlange6
Subject: Re: [cmssw] Performance improvement in quadruplet seeding (70X) (#791)

@davidlange6 https://github.com/davidlange6
Hi David,
Now, sticking to the upgrade scenarios.

I suppose the quadruplets are for the 2017 setup.
Is there a working matrix version with pileup that would include quadruplets?

(I tried, but still keep failing: first the matrix PU dataset was missing, after switching to an existing GEN-SIM minbias input the mixer is still broken with a more obscure "vector::Mrange_check" exception).
Is there a magic spell ?

I'm trying to run 8100.0
the crash is in what's below.
Does any of it look familiar?

Exception of type St12out_of_range (address 0x2b05ac538b40)
Destructor std::out_of_range::~out_of_range() (0x2b056feb7e60)
Stack:
1: 0x2b0584408028 HcalSimParameters::samplingFactor(DetId const&) const + 72 [/afs/cern.ch/cms/slc5_amd64_gcc472/cms/cmssw/CMSSW_6_2_0_SLHC1/lib/slc5_amd64_gcc472/libSimCalorimetryHcalSimAlgos.so + 196648]
2: 0x2b0584408797 HcalSimParameters::simHitToPhotoelectrons(DetId const&) const + 55 [/afs/cern.ch/cms/slc5_amd64_gcc472/cms/cmssw/CMSSW_6_2_0_SLHC1/lib/slc5_amd64_gcc472/libSimCalorimetryHcalSimAlgos.so + 198551]

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/791#issuecomment-26007423.

boudoul · 2013-10-10T01:35:37Z

@slava77 , @davidlange6

Hi Slava,
Indeed the 2017 WFs with PU as defined in the runthematrix cannot be run out of the box, you need to modify a bit the configuration manually :
from 620_SLHC1
git cms-addpkg Configuration/PyReleaseValidation
and edit
Configuration/PyReleaseValidation/python/relval_steps.py

Replace DES17_61_V5::All by auto:upgrade2017 everywhere in this file
Replace /RelValMinBias_TuneZ2star_14TeV/CMSSW_6_1_2_SLHC6-DES17_61_V5_UPG2017-v1/GEN-SIM by
/RelValMinBias_TuneZ2star_14TeV/CMSSW_6_2_0_SLHC1-DES17_62_V7_UPG2017-v1/GEN-SIM
everywhere in this file
The PU scenario is defined line 1646 and line 2277 - If you want a different scenario (presently AVE_20_BX_25ns ) you also need to change those 2 lines with your favorite PU scenario

Finally the command:
runTheMatrix.py -l 5100,8100 --what upgrade
will run 2 WFs (FourMuPt1-200 =5100 and TTbar14TeV =8100 ) with PU in the 2017 scenario
Hope this helps

slava77 · 2013-10-10T05:43:02Z

Thanks, Gaelle.
I was missing the global tag.
The minbias sample choice was a bit more obvious (just because the 61X samples are gone by now)

slava77 · 2013-10-10T08:45:14Z

the GT didn't help, I get the same failure.
You can see my step1 and step2 in cmsDriver terms below (this is a slightly modified 8100 wf).
Where did it all go wrong?

cmsDriver.py TTbar_Tauola_14TeV_cfi --conditions auto:upgrade2017 --eventcontent FEVTDEBUG --relval 9000,100 -s GEN,SIM --datatier GEN-SIM --beamspot Gauss --customise SLHCUpgradeSimulations/Configuration/combinedCustoms.cust_2017 --geometry Extended2017 --magField 38T_PostLS1 -n 100 --pileup AVE_45_BX_25ns --customise Validation/Performance/TimeMemoryInfo.py --fileout file:step1.root > step1_TTbar_14TeV+TTbar_UPG2017PU20_14_DES+DIGIPUUP17DES+RECOPUUP17DES+HARVESTUP17DES.log 2>&1

in: /store/disk00/slava77/reltest/CMSSW_6_2_0_SLHC1-orig going to execute cd 8100_TTbar_14TeV+TTbar_UPG2017PU20_14_DES+DIGIPUUP17DES+RECOPUUP17DES+HARVESTUP17DES

cmsDriver.py step2 --conditions auto:upgrade2017 --pileup_input dbs:/RelValMinBias_TuneZ2star_14TeV/CMSSW_6_2_0_SLHC1-DES17_62_V7_UPG2017-v1/GEN-SIM --eventcontent FEVTDEBUGHLT -s DIGI,L1,DIGI2RAW --datatier GEN-SIM-DIGI-RAW --customise SLHCUpgradeSimulations/Configuration/combinedCustoms.cust_2017 --geometry Extended2017 --magField 38T_PostLS1 -n 100 --pileup AVE_45_BX_25ns --customise Validation/Performance/TimeMemoryInfo.py --filein file:step1.root --fileout file:step2.root > step2_TTbar_14TeV+TTbar_UPG2017PU20_14_DES+DIGIPUUP17DES+RECOPUUP17DES+HARVESTUP17DES.log 2>&1

boudoul · 2013-10-10T09:02:08Z

Ah i think it's because you have twice the option --customize in your cmsdriver commands:
--customise SLHCUpgradeSimulations/Configuration/combinedCustoms.cust_2017
--customise Validation/Performance/TimeMemoryInfo.py

Only the last one is taken into account and the cust_2017 is ignored , hence the crash.

Try to replace those 2 customise options by a single one:

--customise SLHCUpgradeSimulations/Configuration/combinedCustoms.cust_2017,Validation/Performance/TimeMemoryInfo.py

slava77 · 2013-10-10T14:37:23Z

Thanks a lot. This works now.

pow() is supposedly (much) slower, although in the big picture the effect on timing is very small. Physics results should stay the same.

cmsbuild · 2013-10-14T11:32:37Z

Pull request #791 was updated. @nclopezo, @smuzaffar, @thspeer, @slava77 can you please check and sign again.

nclopezo · 2013-10-14T12:55:45Z

+1
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/882/summary.html

slava77 · 2013-11-05T20:12:38Z

@makortel
Hi Matti

(before this pull request dies from an old age)
I'd like to sign off on this based on just compilation and the fact that an equivalent was shown to work in 62XSLHC.

The equivalent is in #784, right?
If so, I see differences in changes from this PR and in #784
The code in "foundNodes" is not present in the 62XSLHC version of the file
https://github.com/makortel/cmssw/blame/79db1eabf1abc76d8a0d928bf7f54f18a89b6b75/RecoPixelVertexing/PixelTriplets/src/QuadrupletSeedMerger.cc#L171

https://github.com/cms-sw/cmssw/blob/CMSSW_6_2_X_SLHC/RecoPixelVertexing/PixelTriplets/src/QuadrupletSeedMerger.cc#L171

If this change with foundNodes is important, it should probably go to 62XSLHC as well.
Would you like to make a change for 62XSLHC, so that it can be tested there,
or would you argue that his will actually work without a need to test.

Please advise

Thanks

Slava

makortel · 2013-11-06T07:20:15Z

@slava77 @davidlange6
Hi Slava,

This PR is indeed functionally equivalent to #784. The difference regarding foundNodes is due to #772, which was integrated to 70X, but not backported to 62XSLHC. Of course I can make a PR for backport of #772 to 62XSLHC too if you and/or David want (the backport already exists and I have used it in my further tests on 62XSLHC, so it is just a matter of making a PR).

Maybe the best way to proceed would be to backport #772 to 62XSLHC and test it there? Then the QuadrupletSeedMerger.cc would again be equal between 62XSLHC and 70X.

Cheers,
Matti

slava77 · 2013-11-06T09:55:35Z

Hi Matti,

Thanks for the reminder about #772.
Now I remember that I've seen that code before ;)

I guess this is a good enough explanation/reference to proceed with
signoff of #791

Cheers

    --slava

On 11/6/13, 8:20 AM, Matti Kortelainen wrote:

@slava77 https://github.com/slava77 @davidlange6
https://github.com/davidlange6
Hi Slava,

This PR is indeed functionally equivalent to #784
#784. The difference regarding
|foundNodes| is due to #772 #772,
which was integrated to 70X, but not backported to 62XSLHC. Of course I
can make a PR for backport of #772
#772 to 62XSLHC too if you and/or
David want (the backport already exists and I have used it in my further
tests on 62XSLHC, so it is just a matter of making a PR).

Maybe the best way to proceed would be to backport #772
#772 to 62XSLHC and test it there?
Then the QuadrupletSeedMerger.cc would again be equal between 62XSLHC
and 70X.

Cheers,
Matti

—
Reply to this email directly or view it on GitHub
#791 (comment).

Vyacheslav (Slava) Krutelyov
TAMU: Physics Dept Texas A&M MS4242, College Station, TX 77843-4242
CERN: 42-R-027
AIM/Skype: siava16 googleTalk: slava77@gmail.com
(630) 291-5128 Cell (US) +41 76 275 7116 Cell (CERN)

makortel · 2013-11-06T10:05:25Z

Hi Slava,

Ok, fine with me :)

Thanks,
Matti

slava77 · 2013-11-06T10:08:27Z

+1

compiled 79db1ea in CMSSW_7_0_X_2013-11-04-0200
Signing off based on tests in 62XSLHC done with #784

cmsbuild · 2013-11-06T10:12:39Z

This pull request is fully signed and it will be integrated in one of the next IBs unless changes or unless it breaks tests. @ktf can you please take care of it?

ktf · 2013-11-06T11:34:05Z

I guess we consider this a bugfix / technical enhancement right? In theory pre9
should be bug fixes only...

slava77 · 2013-11-06T12:07:07Z

Here we are talking about a piece of code that doesn't run in regular
workflows.
So, it's mostly in the "technical" land now, in some sense.

If there is a 71X queue, I think it can just go there instead.
When do we get a new devel branch open?

    --slava

On 11/6/13, 12:34 PM, Giulio Eulisse wrote:

I guess we consider this a bugfix / technical enhancement right? In
theory pre9
should be bug fixes only...

—
Reply to this email directly or view it on GitHub
#791 (comment).

Vyacheslav (Slava) Krutelyov
TAMU: Physics Dept Texas A&M MS4242, College Station, TX 77843-4242
CERN: 42-R-027
AIM/Skype: siava16 googleTalk: slava77@gmail.com
(630) 291-5128 Cell (US) +41 76 275 7116 Cell (CERN)

Reco improvements -- Performance improvement in quadruplet seeding (70X)

makortel added 6 commits September 26, 2013 10:11

Ask geographicalId() directly from TransientTrackingRecHit.

afd31de

CPU performance improvement, results stayed the same.

Require triplets to share two hits before inserting them to list of t…

b432f4c

…riplet pairs. CPU performance improvement, results stay the same.

Use k-d tree for search of compatible triplets.

a69d6ce

CPU performance improvement, results stay the same.

Identify the (non-)shared hits of triplets already when requiring two…

9b2e8e9

… hits being shared. CPU performance improvement, results stay the same.

Removed one repeated std::vector allocation+copy in mySort().

fc9d1a5

CPU performance improvement, results stay the same.

Removed the rest of relevant TransientTrackingRecHit::hit() calls.

a453fcd

CPU performance improvement, results stay the same.

makortel mentioned this pull request Oct 14, 2013

Performance improvement in quadruplet seeding (62X_SLHC) #784

Merged

Change pow(x, 2.) to x*x

79db1ea

pow() is supposedly (much) slower, although in the big picture the effect on timing is very small. Physics results should stay the same.

ktf added a commit that referenced this pull request Nov 23, 2013

Merge pull request #791 from makortel/quadrupletseeds70x

f51888d

Reco improvements -- Performance improvement in quadruplet seeding (70X)

ktf merged commit f51888d into cms-sw:CMSSW_7_0_X Nov 23, 2013

makortel deleted the quadrupletseeds70x branch October 20, 2016 10:15

Performance improvement in quadruplet seeding (70X) #791

Performance improvement in quadruplet seeding (70X) #791

Conversation

makortel commented Sep 11, 2013

cmsbuild commented Sep 11, 2013

slava77 commented Sep 11, 2013

makortel commented Sep 11, 2013

nclopezo commented Sep 12, 2013

slava77 commented Sep 24, 2013

davidlange6 commented Sep 24, 2013

makortel commented Sep 26, 2013

slava77 commented Sep 26, 2013

ktf commented Sep 26, 2013

makortel commented Sep 26, 2013

ktf commented Sep 26, 2013

slava77 commented Sep 26, 2013

makortel commented Sep 26, 2013

cmsbuild commented Sep 26, 2013

nclopezo commented Oct 2, 2013

ktf commented Oct 4, 2013

slava77 commented Oct 4, 2013

slava77 commented Oct 9, 2013

slava77 commented Oct 9, 2013

davidlange6 commented Oct 9, 2013

slava77 commented Oct 9, 2013

davidlange6 commented Oct 9, 2013

boudoul commented Oct 10, 2013

slava77 commented Oct 10, 2013

slava77 commented Oct 10, 2013

in: /store/disk00/slava77/reltest/CMSSW_6_2_0_SLHC1-orig going to execute cd 8100_TTbar_14TeV+TTbar_UPG2017PU20_14_DES+DIGIPUUP17DES+RECOPUUP17DES+HARVESTUP17DES

boudoul commented Oct 10, 2013

slava77 commented Oct 10, 2013

cmsbuild commented Oct 14, 2013

nclopezo commented Oct 14, 2013

slava77 commented Nov 5, 2013

makortel commented Nov 6, 2013

slava77 commented Nov 6, 2013

makortel commented Nov 6, 2013

slava77 commented Nov 6, 2013

cmsbuild commented Nov 6, 2013

ktf commented Nov 6, 2013

slava77 commented Nov 6, 2013