Switch FastCircleFit to use Eigen, generalize FastCircleFit and RZLine interfaces #15260

Merged
merged 7 commits into from Jul 29, 2016


makortel
Contributor

This PR switches FastCircleFit from TMatrixD to Eigen and floats (roughly a 3x speedup in the circle fit, though not really visible in the global picture). It also generalizes the FastCircleFit and RZLine interfaces to work with e.g. std::array. In the same go, RZLine

  • no longer uses std::vector, which reduces the number of memory allocations from SeedGeneratorFromRegionHitsEDProducer by ~11 % (phase1 ttbar+35PU; for the full job with tracking-only RECO,DQM,VALIDATION the reduction is ~1 %); no noticeable effect on CPU time though (with igprof)
  • allows passing the square of errZ, to avoid sqr(sqrt(X)) in PixelQuadrupletGenerator and CAHitQuadrupletGenerator
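
To illustrate the two bullets above, here is a minimal sketch, not the actual RZLine code. It assumes a plain weighted least-squares line fit; the names `fitLineRZ` and `LineFitResult` are hypothetical. The point is that with `std::array` the size is known at compile time, so nothing is allocated on the heap, and that the fit only ever needs the squared errors, so the caller can pass them directly.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical result type for illustration; not the RZLine interface.
struct LineFitResult {
  float cotTheta;   // slope dz/dr
  float intercept;  // z at r = 0
  float chi2;       // sum of squared residuals divided by errZ2
};

// Weighted least-squares fit of z = cotTheta * r + intercept.
// N is a compile-time constant, so all inputs live on the stack.
// errZ2 holds the *squared* z uncertainties, so no sqrt/square round trip is needed.
template <std::size_t N>
LineFitResult fitLineRZ(const std::array<float, N>& r,
                        const std::array<float, N>& z,
                        const std::array<float, N>& errZ2) {
  static_assert(N >= 2, "a line fit needs at least two points");
  float s = 0.f, sr = 0.f, sz = 0.f, srr = 0.f, srz = 0.f;
  for (std::size_t i = 0; i < N; ++i) {
    assert(errZ2[i] > 0.f);  // assumption: callers provide positive variances
    const float w = 1.f / errZ2[i];
    s += w;
    sr += w * r[i];
    sz += w * z[i];
    srr += w * r[i] * r[i];
    srz += w * r[i] * z[i];
  }
  const float det = s * srr - sr * sr;
  assert(det != 0.f);  // all points at the same r would make the fit degenerate
  LineFitResult res{(s * srz - sr * sz) / det, (srr * sz - sr * srz) / det, 0.f};
  for (std::size_t i = 0; i < N; ++i) {
    const float d = (z[i] - res.intercept) - res.cotTheta * r[i];
    res.chi2 += d * d / errZ2[i];
  }
  return res;
}
```

A caller that already has the variances at hand would fill a `std::array<float, 4>` with them and call `fitLineRZ(r, z, errZ2)` without taking any square roots.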

Tested in 8_1_0_pre8 and CMSSW_8_1_X_2016-07-19-2300. Tiny changes are expected in 2017 workflows from moving from double to float in FastCircleFit and removing sqr(sqrt(X)). No changes are expected in 2016 or phase2 workflows.

Here is a set of MTV plots for phase1 (1000 ttbar+35PU events; with an intermediate point after double->float in FastCircleFit)
https://mkortela.web.cern.ch/mkortela/tracking/validation/CMSSW_8_1_0_pre8_fcf_rzline/

@rovere @VinInn @felicepantaleo

…loat

With double precision there would be no changes. Moving at the same time to
float incurs numerical differences, but in simple tests those seem to
be smaller than 1 % in the circle parameters. I think this is
acceptable given that the class is called "Fast...".
Needed for generic interface, especially for avoiding DynArray when
std::array's are given, i.e. we know statically the size.
Since it's already #included outside this package, interface is the
proper place (especially now that the class is header-only)
@cmsbuild
Contributor

A new Pull Request was created by @makortel (Matti Kortelainen) for CMSSW_8_1_X.

It involves the following packages:

CommonTools/Utils
RecoPixelVertexing/PixelLowPtUtilities
RecoPixelVertexing/PixelTrackFitting
RecoPixelVertexing/PixelTriplets
RecoTracker/TkSeedGenerator
Validation/RecoTrack

@cvuosalo, @dmitrijus, @cmsbuild, @slava77, @vanbesien, @davidlange6 can you please review it and eventually sign? Thanks.
@ghellwig, @GiacomoSguazzoni, @rovere, @VinInn, @mschrode, @wmtford, @gpetruc, @dgulhan this is something you requested to watch as well.
@slava77, @smuzaffar you are the release manager for this.

cms-bot commands are listed here #13028

@VinInn
Contributor

VinInn commented Jul 22, 2016

@cmsbuild , please test

@cmsbuild
Contributor

cmsbuild commented Jul 22, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/14209/console

@dmitrijus
Contributor

+1

z[i] = p.z();
}

float simpleCot2 = sqr( (z[n-1]-z[0])/ (r[n-1]-r[0]) );
Contributor

Is there a division by zero possibility here if (r[n-1]-r[0]) == 0? Or would that be prevented from happening? Or would it take too long to check for the zero value?

Contributor Author

(Note that this code is exactly as before) I guess it could, in principle, happen that the first and last hit would have exactly the same r, if all hits come from FPix/TID/TEC. But I'd expect this situation to be very unlikely, especially given the constraints of seeds to point towards beamspot (even if loosely on strip-triplet steps).
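
For reference, the guard being discussed could look roughly like the sketch below. This is an assumption about how one might protect the division, not code from this PR; `simpleCot2Checked` and `sqrF` are hypothetical names. The cost is one comparison and a branch per call.

```cpp
#include <cassert>
#include <cstddef>

inline float sqrF(float x) { return x * x; }

// Hypothetical helper: on success writes ((z_last - z_first) / (r_last - r_first))^2
// to cot2 and returns true. Returns false, leaving cot2 untouched, when there are
// fewer than two hits or when the first and last hit have exactly the same r.
inline bool simpleCot2Checked(const float* r, const float* z, std::size_t n, float& cot2) {
  assert(r != nullptr && z != nullptr);
  if (n < 2) {
    return false;
  }
  const float dr = r[n - 1] - r[0];
  if (dr == 0.f) {
    return false;  // would be a division by zero
  }
  cot2 = sqrF((z[n - 1] - z[0]) / dr);
  return true;
}
```

What the caller should do on `false` (skip the seed, fall back to another estimate) is a separate decision that this sketch does not try to answer.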

linearFit(r.data(), z.data(), n, errZ2.data(), cotTheta_, intercept_, covss_, covii_, covsi_);
chi2_ = 0.f;
for(size_t i=0; i<n; ++i) {
chi2_ += sqr( ((z[i]-intercept_) - cotTheta_*r[i]) ) / errZ2[i];
Contributor

How safe is this division? Could unusual circumstances cause errZ2 to contain some zero values? Is it worth some possible performance cost to protect against division by zero?

Contributor Author

(Note that this code is exactly as before) In practice errZ2 can be zero only if some TrackingRecHit has zero czz or zero rerr. I guess those would be a sign of something going wrong elsewhere.
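
If one did want to catch a zero czz or rerr at this point, a checked variant of the chi2 loop might look like the following sketch. It is illustrative only; `lineChi2Checked` is a hypothetical name and the PR keeps the unchecked loop. It reports bad input through the return value rather than through NaN.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: chi2 of the points (r[i], z[i]) with respect to the line
// z = cotTheta * r + intercept, using squared errors errZ2[i].
// Returns false, leaving chi2 untouched, if any errZ2[i] is not strictly positive.
inline bool lineChi2Checked(const float* r, const float* z, const float* errZ2,
                            std::size_t n, float cotTheta, float intercept, float& chi2) {
  assert(r != nullptr && z != nullptr && errZ2 != nullptr);
  float sum = 0.f;
  for (std::size_t i = 0; i < n; ++i) {
    if (!(errZ2[i] > 0.f)) {
      return false;  // zero or negative variance: likely a problem upstream
    }
    const float d = (z[i] - intercept) - cotTheta * r[i];
    sum += d * d / errZ2[i];
  }
  chi2 = sum;
  return true;
}
```

The extra comparison sits inside the loop, so it is paid once per hit; with three or four hits per seed that is small, but it is still work that the unchecked loop avoids.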

@cvuosalo
Contributor

TrackingNtuple.cc has two old problems: It doesn't support multi-threading, and it uses std::isnan. On line 175, the EDAnalyzer inheritance should be changed to the proper multi-threading version. Also, for line 211, the static analyzer reports:
std::isnan / std::isinf does not work when fast-math is used. Please use edm::isNotFinite from 'FWCore/Utilities/interface/isNotFinite.h'

These fixes could be done in this PR or a later one.

@cvuosalo
Contributor

Jenkins tests show no differences except tiny ones for workflow 10024.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017, as expected. Extended tests with 70 events each against baseline CMSSW_8_1_0_pre9 for workflows 1313.0_QCD_Pt_3000_3500_13, 1316.0_SingleElectronPt1000, and 10024.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017 also show no differences, except for the 2017 one that has numerous, tiny, insignificant differences.

@makortel
Contributor Author

@cvuosalo

TrackingNtuple.cc has two old problems: It doesn't support multi-threading, and it uses std::isnan. On line 175, the EDAnalyzer inheritance should be changed to the proper multi-threading version. Also, for line 211, the static analyzer reports:
std::isnan / std::isinf does not work when fast-math is used. Please use edm::isNotFinite from 'FWCore/Utilities/interface/isNotFinite.h'

These fixes could be done in this PR or a later one.

I'll follow up in a separate PR, as this PR is not about TrackingNtuple and I have some other updates for TrackingNtuple in the pipeline. The std::isnan should be sufficient here, as the variable being checked for NaN is explicitly set to std::numeric_limits<float>::quiet_NaN() (unless SimHit::timeOfFlight() can return NaN). But I'll change it to edm::isFinite.

By the way, the message from static analyzer is a bit misleading. In 8_1_X there is no FWCore/Utilities/interface/isNotFinite.h, but FWCore/Utilities/interface/isFinite.h, which has both edm::isFinite and edm::isNotFinite functions.
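
As background on why the static analyzer flags std::isnan: with fast-math the compiler is allowed to assume that NaN and infinity never occur, so a floating-point based check may be optimized away. One common workaround is to inspect the bit pattern instead. The sketch below shows that idea for IEEE 754 single precision; it is an illustration under that assumption, not a copy of the edm::isFinite implementation, and the names `isFiniteBits` and `isNotFiniteBits` are hypothetical.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559, "sketch assumes IEEE 754 floats");

// A float is NaN or +-infinity exactly when all eight exponent bits are set.
// Only integer operations are used, so the result does not depend on
// floating-point comparison semantics.
inline bool isFiniteBits(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);  // well-defined way to read the bit pattern
  const std::uint32_t exponentMask = 0x7f800000u;
  return (bits & exponentMask) != exponentMask;
}

inline bool isNotFiniteBits(float x) { return !isFiniteBits(x); }
```

For a variable initialized to std::numeric_limits<float>::quiet_NaN(), `isNotFiniteBits` returns true until a finite value has been assigned.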

@cvuosalo
Contributor

+1

For #15260 aa86788

For tracking, switching FastCircleFit from using TMatrixD to Eigen.

The code changes are satisfactory. Jenkins tests and extended tests described above show no significant differences except for tiny differences in a 2017 Phase 1 workflow. CPU timing tests on workflow 10024.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017 with 70 events against baseline CMSSW_8_1_0_pre9 show a possible very slight improvement for affected modules.

Measured with DQM and Validation:

    pixelLessStepSeeds   168.827 ms/ev -> 165.48 ms/ev
    tobTecStepSeedsTripl 34.2539 ms/ev -> 34.0271 ms/ev

Another measure without DQM and Validation:

 -0.013014      -0.04%       249.03 ms/ev ->       245.81 ms/ev pixelLessStepSeeds
 -0.019924      -0.01%        52.13 ms/ev ->        51.10 ms/ev tobTecStepSeedsTripl

These values imply about a 1% improvement in timing, but the measurements should not be taken as definitive since the uncertainty is probably at least comparable to the seeming improvement.
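
The percentage can be recomputed from the before/after columns as a plain relative change. The helper below is only a sketch of that arithmetic; `relativeChangePercent` is a hypothetical name and it says nothing about the measurement uncertainty.

```cpp
#include <cassert>

// Hypothetical helper: relative change in percent when going from `before` to `after`.
// Negative values mean the module got faster.
inline double relativeChangePercent(double before, double after) {
  assert(before > 0.0);
  return 100.0 * (after - before) / before;
}
```

With the pixelLessStepSeeds numbers from the second measurement, `relativeChangePercent(249.03, 245.81)` comes out at about -1.3.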

@makortel
Contributor Author

@cvuosalo Replying to the question you sent via e-mail regarding the timing improvements on pixelLessStepSeeds and tobTecStepSeedsTripl. My MTV plots above actually show improvement on the same modules, although so small that they could be mainly fluctuations. Nevertheless some improvement (even if tiny) is expected because of removing std::vector (and hence heap allocations) from RZLine.

The modules affected by this PR are strip triplet seeding (i.e. the ones above) and pixel quadruplet seeding (initialStepPreSplitting, lowPtQuad, detachedQuad). But as I mentioned in the description, the time spent in FastCircleFit or RZLine is so small that the improvements are smaller than fluctuations of small-scale tests.

@davidlange6 davidlange6 merged commit f43e8f6 into cms-sw:CMSSW_8_1_X Jul 29, 2016
@cvuosalo
Contributor

Here are the timing changes for the affected modules mentioned by @makortel above. These results come from the same tests described in my approval message.

   -0.014277      -0.01%        80.15 ms/ev ->        79.01 ms/ev initialStepSeedsPreSplitting
   -0.004937      -0.01%       213.00 ms/ev ->       211.95 ms/ev lowPtQuadStepSeeds
   -0.014897      -0.04%       212.21 ms/ev ->       209.07 ms/ev detachedQuadStepSeeds

These all show about a 1% timing improvement, again with the caveat that the measurement uncertainty is probably at least comparable to this change.

@makortel makortel deleted the improveFastCircleRZLine branch February 12, 2018 12:50