Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ecal multifit optimized #5321

Merged
merged 8 commits into from Sep 23, 2014

Conversation

bendavid
Copy link
Contributor

Move to Eigen for vectors/matrices, remove std::set, and some optimizations of the algorithm itself (avoid explicit matrix inversion and make some of required matrix solve operations less complex)

Didn't manage to run igprof yet, but based on a test of an earlier eigen version which @VinInn did there shouldn't be anything very bad here.

Output should be strictly identical down to numerical precision. (Verified on one event that all rechits are the same down to 5 decimal places, but there can be smaller numerical differences clearly.)

Speedup of the EcalUncalibRecHitMultiFitAlgo is just short of a factor of 3. (~800ms -> ~280ms for pu40bx25 photon gun)

The EcalUncalibRecHitProducer has also been made a stream module.

@emanueledimarco @argiro @lgray

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @bendavid (Josh Bendavid) for CMSSW_7_2_X.

Ecal multifit optimized

It involves the following packages:

RecoLocalCalo/EcalRecAlgos
RecoLocalCalo/EcalRecProducers

@cmsbuild, @nclopezo, @StoyanStoynev, @slava77 can you please review it and eventually sign? Thanks.
@argiro this is something you requested to watch as well.
You can sign-off by replying to this message having '+1' in the first line of your reply.
You can reject by replying to this message having '-1' in the first line of your reply.
@nclopezo you are the release manager for this.
You can merge this pull request by typing 'merge' in the first line of your comment.

@bendavid
Copy link
Contributor Author

igprof results here
https://bendavid.web.cern.ch/bendavid/cgi-bin/perfResults/igprof-navigator/testecaligprof_0_CMSSW_7_2_X_2014-09-12-1400_slc6_amd64_gcc481/29

This looks really good in the sense that virtually all cpu time is in eigen internal functions, with most of it spent on "real work" (matrix decomposition, matrix solve, and matrix multiplication)

I played a bit with compiler flags without much luck.

I also tried moving to completely fixed size matrices, but this only gave a ~1% speedup, and this would prevent having a configurable (or in the future variable) number of populated bunch crossings in the algorithm. (The current implementation is using matrices with a fixed maximum size on the stack where there is a dynamic size such that a subset of the storage is used. This is done with the built-in eigen functionality of the "MaxColsAtCompileTime" template parameter.)

@VinInn
Copy link
Contributor

VinInn commented Sep 17, 2014

Excellent results. I think this is also an interesting showcase for the use of eigen.

@slava77
Copy link
Contributor

slava77 commented Sep 17, 2014

It seems that this will also fix most of the valgrid errors that I've seen

@bendavid
Copy link
Contributor Author

@slava77 the valgrind errors were within TMatrix/TVector classes themselves?

@slava77
Copy link
Contributor

slava77 commented Sep 17, 2014

@bendavid
not really (I ignore the uninitialized value in TObject default constructor),
apparently the std::arrays were not initialized.

@bendavid
Copy link
Contributor Author

Ok that was intentional (array memory was "hijacked" by the TMatrix/TVectors and initialized from there) Anyway, less relevant now.

@StoyanStoynev
Copy link
Contributor

@cmsbuild please test

@@ -84,7 +87,7 @@ template<class C> class EcalUncalibRecHitTimeWeightsAlgo
int ipulse = std::distance(amplitudes.begin(),amplit);
int bx = ipulse - 5;
int firstsamplet = std::max(0,bx + 3);
int offset = -3-bx;
int offset = 7-3-bx;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This offset appears in exactly the same form three times. Can you make it a simple function of bx or at least define "7-3" once somewhere (could you elaborate for me where is the transition -3 -> 7-3 coming from? having to do with time slots I guess but still not clear why the result).

@bendavid
Copy link
Contributor Author

This is because the "FullSampleVector" and "FullSampleMatrix" were extended from 12 to 19 elements (with the extra elements zero), in order to allow the template matrix and covariance matrix to be filled purely using subvectors and submatrices with built-in eigen functionality.

(In the previous implementation this was done with hand-coded loops over the elements, where the zero elements were simply skipped when filling the templates and covariance matrices)

Could be made common someplace (but a more general solution will be needed if/when the ecal readout is shifted by one bx as is being considered, then this would have to move to the conditions db)

@bendavid
Copy link
Contributor Author

(these numbers are all having to do with offsets between the bx number and vector/matrix index, which is related to the position of a pulse at bx=X in the readout window)

std::swap(_ampvec.coeffRef(_nP-1),_ampvec.coeffRef(ipulseintime));
std::swap(_bxs.coeffRef(_nP-1),_bxs.coeffRef(ipulseintime));
ipulseintime = _nP - 1;
--_nP;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another candidate for a function - see line 285 and above it. Now it is just rewriting (most of) the code.

@bendavid
Copy link
Contributor Author

Yes, the exact set of vectors/matrices which are row/column swapped is not exactly the same in the two places, but the common part which touches the class members could be made common I agree.

@StoyanStoynev
Copy link
Contributor

@bendavid I leave to you to decide what is worth doing now and what later (with conditions, etc.), just let me know.

@bendavid
Copy link
Contributor Author

Indeed I would prefer to leave these issues for further cleanup and reorganization of inputs, migration to conditions, etc, for 73x.

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_7_2_X IBs unless changes (tests are also fine). @nclopezo can you please take care of it?

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_7_2_X IBs unless changes (tests are also fine). @nclopezo can you please take care of it?

@slava77
Copy link
Contributor

slava77 commented Sep 23, 2014

are we waiting for a decision in 72X? (similar to other PRs)
If so, maybe a more obvious is to add an orp signature?

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_7_2_X IBs unless changes (tests are also fine). @nclopezo can you please take care of it?

davidlange6 added a commit that referenced this pull request Sep 23, 2014
@davidlange6 davidlange6 merged commit ea9127e into cms-sw:CMSSW_7_2_X Sep 23, 2014
bendavid pushed a commit to bendavid/cmssw that referenced this pull request Oct 13, 2014
Ecal multifit optimized
Resolved Conflicts:
	RecoLocalCalo/EcalRecAlgos/BuildFile.xml
	RecoLocalCalo/EcalRecProducers/plugins/EcalUncalibRecHitProducer.h
	RecoLocalCalo/EcalRecProducers/plugins/EcalUncalibRecHitWorkerMultiFit.cc
	RecoLocalCalo/EcalRecProducers/plugins/EcalUncalibRecHitWorkerMultiFit.h
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants