Switch FastCircleFit to use Eigen, generalize FastCircleFit and RZLine interfaces #15260

makortel · 2016-07-22T12:51:07Z

This PR switches FastCircleFit from TMatrixD to Eigen and floats (roughly a 3x speedup in the circle fit, though not really visible in the global picture). It also generalizes the FastCircleFit and RZLine interfaces to work with e.g. std::array. At the same time, RZLine

- no longer uses std::vector, which reduces the number of memory allocations from SeedGeneratorFromRegionHitsEDProducer by ~11 % (phase1 ttbar+35PU; for the full job with tracking-only RECO,DQM,VALIDATION the reduction is ~1 %); there is no noticeable effect on CPU time, though (measured with igprof)
- allows passing the square of errZ, to avoid sqr(sqrt(X)) in PixelQuadrupletGenerator and CAHitQuadrupletGenerator

Tested in 8_1_0_pre8 and CMSSW_8_1_X_2016-07-19-2300. Tiny changes are expected in 2017 workflows from moving from double to float in FastCircleFit and removing sqr(sqrt(X)). No changes are expected in 2016 or phase2 workflows.

Here is a set of MTV plots for phase1 (1000 ttbar+35PU events; with an intermediate point after double->float in FastCircleFit)
https://mkortela.web.cern.ch/mkortela/tracking/validation/CMSSW_8_1_0_pre8_fcf_rzline/

@rovere @VinInn @felicepantaleo

…loat With double precision there would be no changes. Moving to float at the same time incurs numerical differences, but in simple tests those seem to be smaller than 1 % in the circle parameters. I think this is acceptable given that the class is called "Fast...".

cmsbuild · 2016-07-22T12:51:17Z

A new Pull Request was created by @makortel (Matti Kortelainen) for CMSSW_8_1_X.

It involves the following packages:

CommonTools/Utils
RecoPixelVertexing/PixelLowPtUtilities
RecoPixelVertexing/PixelTrackFitting
RecoPixelVertexing/PixelTriplets
RecoTracker/TkSeedGenerator
Validation/RecoTrack

@cvuosalo, @dmitrijus, @cmsbuild, @slava77, @vanbesien, @davidlange6 can you please review it and eventually sign? Thanks.
@ghellwig, @GiacomoSguazzoni, @rovere, @VinInn, @mschrode, @wmtford, @gpetruc, @dgulhan this is something you requested to watch as well.
@slava77, @smuzaffar you are the release manager for this.

cms-bot commands are listed here #13028

VinInn · 2016-07-22T12:52:43Z

@cmsbuild , please test

cmsbuild · 2016-07-22T12:53:05Z

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/14209/console

cmsbuild · 2016-07-22T16:23:46Z

+1
Tested at: aa86788
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-15260/14209/summary.html

cmsbuild · 2016-07-22T17:16:54Z

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-15260/14209/summary.html

dmitrijus · 2016-07-26T13:01:53Z

+1

cvuosalo · 2016-07-26T23:04:26Z

RecoPixelVertexing/PixelTrackFitting/interface/RZLine.h

+      z[i] = p.z();
+    }
+
+    float simpleCot2 = sqr( (z[n-1]-z[0])/ (r[n-1]-r[0]) );


Is there a division by zero possibility here if (r[n-1]-r[0]) == 0? Or would that be prevented from happening? Or would it take too long to check for the zero value?

(Note that this code is exactly as before) I guess it could, in principle, happen that the first and last hit would have exactly the same r, if all hits come from FPix/TID/TEC. But I'd expect this situation to be very unlikely, especially given the constraints of seeds to point towards beamspot (even if loosely on strip-triplet steps).

cvuosalo · 2016-07-27T19:33:39Z

RecoPixelVertexing/PixelTrackFitting/interface/RZLine.h

+    linearFit(r.data(), z.data(), n, errZ2.data(), cotTheta_, intercept_, covss_, covii_, covsi_);
+    chi2_ = 0.f;
+    for(size_t i=0; i<n; ++i) {
+      chi2_ += sqr( ((z[i]-intercept_) - cotTheta_*r[i]) ) / errZ2[i];


How safe is this division? Could unusual circumstances cause errZ2 to contain some zero values? Is it worth some possible performance cost to protect against division by zero?

(Note that this code is exactly as before) In practice errZ2 can be zero only if some TrackingRecHit has zero czz or zero rerr. I guess those would be a sign of something going wrong elsewhere.

cvuosalo · 2016-07-27T22:00:38Z

TrackingNtuple.cc has two old problems: It doesn't support multi-threading, and it uses std::isnan. On line 175, the EDAnalyzer inheritance should be changed to the proper multi-threading version. Also, for line 211, the static analyzer reports:
std::isnan / std::isinf does not work when fast-math is used. Please use edm::isNotFinite from 'FWCore/Utilities/interface/isNotFinite.h'

These fixes could be done in this PR or a later one.

cvuosalo · 2016-07-27T22:20:49Z

Jenkins tests show no differences except tiny ones for workflow 10024.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017, as expected. Extended tests with 70 events each against baseline CMSSW_8_1_0_pre9 for workflows 1313.0_QCD_Pt_3000_3500_13, 1316.0_SingleElectronPt1000, and 10024.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017 also show no differences, except for the 2017 one that has numerous, tiny, insignificant differences.

makortel · 2016-07-28T15:33:25Z

@cvuosalo

> TrackingNtuple.cc has two old problems: It doesn't support multi-threading, and it uses std::isnan. On line 175, the EDAnalyzer inheritance should be changed to the proper multi-threading version. Also, for line 211, the static analyzer reports:
> std::isnan / std::isinf does not work when fast-math is used. Please use edm::isNotFinite from 'FWCore/Utilities/interface/isNotFinite.h'
>
> These fixes could be done in this PR or a later one.

I'll follow up in a separate PR, as this PR is not about TrackingNtuple and I have some other updates for TrackingNtuple in the pipeline. The std::isnan should be sufficient here, as the variable being checked for NaN is explicitly set to std::numeric_limits<float>::quiet_NaN() (unless SimHit::timeOfFlight() can return NaN). But I'll change it to edm::isFinite.

By the way, the message from the static analyzer is a bit misleading. In 8_1_X there is no FWCore/Utilities/interface/isNotFinite.h, but FWCore/Utilities/interface/isFinite.h, which has both the edm::isFinite and edm::isNotFinite functions.

cvuosalo · 2016-07-28T20:50:11Z

+1

For #15260 aa86788

For tracking, switching FastCircleFit from using TMatrixD to Eigen.

The code changes are satisfactory. Jenkins tests and extended tests described above show no significant differences except for tiny differences in a 2017 Phase 1 workflow. CPU timing tests on workflow 10024.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017 with 70 events against baseline CMSSW_8_1_0_pre9 show a possible very slight improvement for affected modules.

Measured with DQM and Validation:

    pixelLessStepSeeds   168.827 ms/ev -> 165.48 ms/ev
    tobTecStepSeedsTripl 34.2539 ms/ev -> 34.0271 ms/ev

Another measure without DQM and Validation:

 -0.013014      -0.04%       249.03 ms/ev ->       245.81 ms/ev pixelLessStepSeeds
 -0.019924      -0.01%        52.13 ms/ev ->        51.10 ms/ev tobTecStepSeedsTripl

These values imply about a 1% improvement in timing, but the measurements should not be taken as definitive since the uncertainty is probably at least comparable to the seeming improvement.

makortel · 2016-07-29T07:42:46Z

@cvuosalo Replying to the question you sent via e-mail regarding the timing improvements on pixelLessStepSeeds and tobTecStepSeedsTripl. My MTV plots above actually show improvement on the same modules, although so small that they could be mainly fluctuations. Nevertheless some improvement (even if tiny) is expected because of removing std::vector (and hence heap allocations) from RZLine.

The modules affected by this PR are strip triplet seeding (i.e. the ones above) and pixel quadruplet seeding (initialStepPreSplitting, lowPtQuad, detachedQuad). But as I mentioned in the description, the time spent in FastCircleFit or RZLine is so small that the improvements are smaller than fluctuations of small-scale tests.

cvuosalo · 2016-07-29T17:09:41Z

Here are the timing changes for the affected modules mentioned by @makortel above. These results come from the same tests described in my approval message.

   -0.014277      -0.01%        80.15 ms/ev ->        79.01 ms/ev initialStepSeedsPreSplitting
   -0.004937      -0.01%       213.00 ms/ev ->       211.95 ms/ev lowPtQuadStepSeeds
   -0.014897      -0.04%       212.21 ms/ev ->       209.07 ms/ev detachedQuadStepSeeds

These all show about a 1% timing improvement, again with the caveat that the measurement uncertainty is probably at least comparable to this change.

makortel added 7 commits July 20, 2016 11:04

Fully inline FastCircleFit

a798082

Needed for generic interface, especially for avoiding DynArray when std::array's are given, i.e. we know statically the size.

Add DynArray::data()

38185d9

Restructure RZLine calculations, generalize construction, inline

8c94717

Move to std::array, avoid sqr(sqrt())

d8a7fe3

Move RZLine.h to interface

d8ab96a

Since it's already #included outside this package, interface is the proper place (especially now that the class is header-only)

Document constructor variants

aa86788

cmsbuild added this to the Next CMSSW_8_1_X milestone Jul 22, 2016

cmsbuild added reconstruction-pending dqm-pending analysis-pending pending-signatures tests-pending orp-pending comparison-pending labels Jul 22, 2016

cmsbuild added tests-started and removed tests-pending labels Jul 22, 2016

cmsbuild added tests-approved and removed tests-started labels Jul 22, 2016

cmsbuild added comparison-available and removed comparison-pending labels Jul 22, 2016

cmsbuild added dqm-approved and removed dqm-pending labels Jul 26, 2016

cvuosalo reviewed Jul 26, 2016

cvuosalo reviewed Jul 27, 2016

cmsbuild added reconstruction-approved and removed reconstruction-pending labels Jul 28, 2016

davidlange6 merged commit f43e8f6 into cms-sw:CMSSW_8_1_X Jul 29, 2016

makortel deleted the improveFastCircleRZLine branch February 12, 2018 12:50
