Ecal multifit optimized #5321
…ion with optimized code
A new Pull Request was created by @bendavid (Josh Bendavid) for CMSSW_7_2_X.
Ecal multifit optimized
It involves the following packages:
RecoLocalCalo/EcalRecAlgos
@cmsbuild, @nclopezo, @StoyanStoynev, @slava77 can you please review it and eventually sign? Thanks.
igprof results here
This looks really good in the sense that virtually all cpu time is in eigen internal functions, with most of it spent on "real work" (matrix decomposition, matrix solve, and matrix multiplication).
I played a bit with compiler flags without much luck. I also tried moving to completely fixed-size matrices, but this only gave a ~1% speedup, and it would prevent having a configurable (or, in the future, variable) number of populated bunch crossings in the algorithm. (The current implementation uses matrices with a fixed maximum size on the stack and a dynamic size, such that only a subset of the storage is used. This is done with the built-in eigen functionality of the "MaxColsAtCompileTime" template parameter.)
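To make the "fixed maximum size on the stack, dynamic size at run time" idea concrete, here is a minimal sketch in plain C++. It is not the eigen implementation and not the code in this PR: `BoundedMatrix` and its members are hypothetical names, and the 10x10 capacity in the usage note is only an illustrative choice. In eigen itself the same effect comes from the maximum-size template parameters of `Eigen::Matrix`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in: capacity is fixed at compile time (so the storage
// lives inside the object, with no heap allocation), while the active number
// of rows and columns is chosen at run time.
template <std::size_t MaxRows, std::size_t MaxCols>
class BoundedMatrix {
public:
  BoundedMatrix(std::size_t rows, std::size_t cols) : rows_(rows), cols_(cols) {
    assert(rows <= MaxRows && cols <= MaxCols);
    storage_.fill(0.0);
  }

  // Change the active size; only the bookkeeping changes, not the storage.
  void resize(std::size_t rows, std::size_t cols) {
    assert(rows <= MaxRows && cols <= MaxCols);
    rows_ = rows;
    cols_ = cols;
  }

  // Column-major access with a fixed stride of MaxRows.
  double& operator()(std::size_t r, std::size_t c) {
    assert(r < rows_ && c < cols_);
    return storage_[c * MaxRows + r];
  }

  std::size_t rows() const { return rows_; }
  std::size_t cols() const { return cols_; }

private:
  std::array<double, MaxRows * MaxCols> storage_;
  std::size_t rows_;
  std::size_t cols_;
};
```

For example, `BoundedMatrix<10, 10> m(10, 3);` reserves room for 100 doubles inside the object but exposes only a 10x3 view, and `m.resize(10, 5)` later widens the view without allocating. This is the trade-off described above: slightly more bookkeeping than a fully fixed-size matrix, in exchange for a configurable number of columns.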
Excellent results. I think this is also an interesting showcase for the use of eigen.
It seems that this will also fix most of the valgrind errors that I've seen.
@slava77 the valgrind errors were within the TMatrix/TVector classes themselves?
@bendavid
Ok, that was intentional (array memory was "hijacked" by the TMatrix/TVectors and initialized from there). Anyway, less relevant now.
@cmsbuild please test
@@ -84,7 +87,7 @@ template<class C> class EcalUncalibRecHitTimeWeightsAlgo
 int ipulse = std::distance(amplitudes.begin(),amplit);
 int bx = ipulse - 5;
 int firstsamplet = std::max(0,bx + 3);
-int offset = -3-bx;
+int offset = 7-3-bx;
This offset appears in exactly the same form three times. Can you make it a simple function of bx, or at least define "7-3" once somewhere? Could you also elaborate on where the transition -3 -> 7-3 comes from? It has to do with time slots, I guess, but the reason for the result is still not clear to me.
This is because the "FullSampleVector" and "FullSampleMatrix" were extended from 12 to 19 elements (with the extra elements zero), in order to allow the template matrix and covariance matrix to be filled purely using subvectors and submatrices with built-in eigen functionality. (In the previous implementation this was done with hand-coded loops over the elements, where the zero elements were simply skipped when filling the templates and covariance matrices.)
This could be made common someplace, but a more general solution will be needed if/when the ecal readout is shifted by one bx, as is being considered; it would then have to move to the conditions db.
(These numbers all have to do with offsets between the bx number and the vector/matrix index, which is related to the position of a pulse at bx=X in the readout window.)
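As a rough illustration of the reviewer's suggestion, the index arithmetic from the diff could be collected in one place along the following lines. All function and constant names here are hypothetical; the numeric values (5, 3, 12, 19) are taken from the diff and the reply above, and reading the 7 as the 19 - 12 zero padding is an interpretation of that reply, not something confirmed in the code shown.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical names; values come from the diff and discussion above.
const int kInTimePulseIndex = 5;   // bx = ipulse - 5
const int kFirstSampleShift = 3;   // firstsamplet = max(0, bx + 3)
const int kOldFullSize = 12;       // full vector size before this PR
const int kNewFullSize = 19;       // full vector size after this PR
const int kZeroPadding = kNewFullSize - kOldFullSize;  // 7 (assumed origin of "7")

// Bunch crossing corresponding to a position in the amplitude vector.
inline int bxFromPulseIndex(int ipulse) {
  assert(ipulse >= 0);
  return ipulse - kInTimePulseIndex;
}

// First readout sample that a pulse at this bx can populate.
inline int firstSample(int bx) { return std::max(0, bx + kFirstSampleShift); }

// Offset into the extended full vector/matrix: the "7-3-bx" of the new code.
inline int fullVectorOffset(int bx) {
  return kZeroPadding - kFirstSampleShift - bx;
}
```

With these definitions the in-time pulse (bx = 0) gives `firstSample(0) == 3` and `fullVectorOffset(0) == 4`, and the new offset differs from the old `-3-bx` by exactly the 7 padding elements for every bx.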
std::swap(_ampvec.coeffRef(_nP-1),_ampvec.coeffRef(ipulseintime));
std::swap(_bxs.coeffRef(_nP-1),_bxs.coeffRef(ipulseintime));
ipulseintime = _nP - 1;
--_nP;
Another candidate for a function: see line 285 and above it. As it stands, (most of) the code is simply repeated.
Yes. The exact set of vectors/matrices which are row/column swapped is not exactly the same in the two places, but I agree that the common part which touches the class members could be factored out.
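One way the common part could be factored out is sketched below in plain C++. This is only an assumption about what such a helper might look like: `FitState`, `nFree` and `moveToConstrained` are hypothetical names, `std::array` stands in for the eigen vectors, and the row/column swaps on the matrices (which differ between the two call sites) are deliberately left to the caller.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical container for the members touched in both places.
template <std::size_t N>
struct FitState {
  std::array<double, N> amplitudes{};  // plays the role of _ampvec
  std::array<int, N> bxs{};            // plays the role of _bxs
  std::size_t nFree = N;               // plays the role of _nP
};

// Swap entry `index` with the last entry of the free range [0, nFree) and
// shrink the range by one. Returns the new position of the moved entry so
// the caller can update any index that tracked it.
template <std::size_t N>
std::size_t moveToConstrained(FitState<N>& s, std::size_t index) {
  assert(s.nFree > 0 && index < s.nFree);
  const std::size_t last = s.nFree - 1;
  std::swap(s.amplitudes[index], s.amplitudes[last]);
  std::swap(s.bxs[index], s.bxs[last]);
  --s.nFree;
  return last;
}
```

A call such as `ipulseintime = moveToConstrained(state, ipulseintime);` would then replace the four lines quoted above, with any additional matrix swaps done next to the call.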
@bendavid I leave it to you to decide what is worth doing now and what later (with conditions, etc.), just let me know.
Indeed I would prefer to leave these issues for further cleanup and reorganization of inputs, migration to conditions, etc, for 73x.
This pull request is fully signed and it will be integrated in one of the next CMSSW_7_2_X IBs unless changes (tests are also fine). @nclopezo can you please take care of it?
Are we waiting for a decision in 72X? (similar to other PRs)
Ecal multifit optimized
Resolved Conflicts:
RecoLocalCalo/EcalRecAlgos/BuildFile.xml
RecoLocalCalo/EcalRecProducers/plugins/EcalUncalibRecHitProducer.h
RecoLocalCalo/EcalRecProducers/plugins/EcalUncalibRecHitWorkerMultiFit.cc
RecoLocalCalo/EcalRecProducers/plugins/EcalUncalibRecHitWorkerMultiFit.h
Move to Eigen for vectors/matrices, remove std::set, and some optimizations of the algorithm itself (avoid explicit matrix inversion and make some of the required matrix solve operations less complex).
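To illustrate what "avoid explicit matrix inversion" can mean in practice, here is a self-contained sketch that solves A x = b for a small symmetric positive-definite matrix through a Cholesky factorization followed by two triangular solves, instead of forming the inverse of A. The choice of Cholesky is an assumption made for the example; the description above does not say which decomposition the PR uses. `Vec`, `Mat` and `solveSpd` are hypothetical names, and plain `std::array` is used instead of eigen types. In eigen, an expression such as `A.llt().solve(b)` plays the same role.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

template <std::size_t N>
using Vec = std::array<double, N>;
template <std::size_t N>
using Mat = std::array<std::array<double, N>, N>;

// Solve A x = b for symmetric positive-definite A using A = L L^T,
// without ever computing the inverse of A.
template <std::size_t N>
Vec<N> solveSpd(const Mat<N>& a, const Vec<N>& b) {
  Mat<N> l{};  // lower-triangular factor, zero-initialized
  for (std::size_t j = 0; j < N; ++j) {
    double diag = a[j][j];
    for (std::size_t k = 0; k < j; ++k) diag -= l[j][k] * l[j][k];
    assert(diag > 0.0);  // fails if A is not positive definite
    l[j][j] = std::sqrt(diag);
    for (std::size_t i = j + 1; i < N; ++i) {
      double sum = a[i][j];
      for (std::size_t k = 0; k < j; ++k) sum -= l[i][k] * l[j][k];
      l[i][j] = sum / l[j][j];
    }
  }

  // Forward substitution: L y = b
  Vec<N> y{};
  for (std::size_t i = 0; i < N; ++i) {
    double sum = b[i];
    for (std::size_t k = 0; k < i; ++k) sum -= l[i][k] * y[k];
    y[i] = sum / l[i][i];
  }

  // Back substitution: L^T x = y
  Vec<N> x{};
  for (std::size_t i = N; i-- > 0;) {
    double sum = y[i];
    for (std::size_t k = i + 1; k < N; ++k) sum -= l[k][i] * x[k];
    x[i] = sum / l[i][i];
  }
  return x;
}
```

Solving through a factorization costs less than computing the full inverse and then multiplying, and it is generally better behaved numerically, which fits the observation elsewhere in this thread that most of the remaining time is spent in decomposition and solve routines.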
Didn't manage to run igprof yet, but based on a test of an earlier eigen version which @VinInn did there shouldn't be anything very bad here.
Output should be strictly identical down to numerical precision. (Verified on one event that all rechits are the same down to 5 decimal places; smaller numerical differences are of course still possible.)
Speedup of the EcalUncalibRecHitMultiFitAlgo is just short of a factor of 3. (~800ms -> ~280ms for pu40bx25 photon gun)
The EcalUncalibRecHitProducer has also been made a stream module.
@emanueledimarco @argiro @lgray