(For MRPT 2.0) Discussion on Eigen, matrix classes and alternatives for a faster build time #496

Closed
2 tasks done
jlblancoc opened this issue Mar 30, 2017 · 9 comments · Fixed by #916

Comments

@jlblancoc
Member

jlblancoc commented Mar 30, 2017

Tasks:

  • Refactor MRPT matrix classes into C++14 containers + an asEigen() getter.
  • Move "Eigen plugin header" stuff to those new containers, minimizing the use of in-header template implementations. This will avoid the need to always include MRPT headers before Eigen, so the plugin mechanism works.

------- Original discussion --------
(cc: @jolting ) I noticed a large part of compilation time slow-down comes from the use of extensive, header-only libs, such as Eigen.

I wonder if there might exist an efficient solution that fulfills all:

  • Allows us to get Eigen #includes out of MRPT headers,
  • But does not enforce using dynamic memory allocation.

Perhaps one of the value variant containers may do the work?

I thought of using the PIMPL idiom, then letting users who really need all the Eigen methods do the #include on their side and call a templatized getter method, or something similar.
Obviously, this is not acceptable, since replacing all statically allocated matrices (they are inside all CPose3D, etc. classes) with dynamic memory would be shooting ourselves in the foot!

I think it's not possible, but just wanted to start this conversation in case there is some remote possibility of doing this without a huge efficiency cost...
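
To make the trade-off concrete, here is a minimal sketch contrasting inline storage with a PIMPL wrapper. It is not MRPT code: `PoseInline`, `PosePimpl` and their members are hypothetical names chosen for illustration, and a plain `std::array` stands in for the Eigen matrix.

```cpp
#include <array>
#include <cstddef>
#include <memory>

// Hypothetical pose with inline storage: the 4x4 block lives inside the
// object itself, so a stack-allocated pose never touches the heap.
struct PoseInline
{
    std::array<double, 16> m{};  // row-major 4x4, zero-initialized
    double& at(std::size_t r, std::size_t c) { return m[r * 4 + c]; }
};

// Hypothetical PIMPL variant: the header only needs a forward declaration
// of Impl (so heavy matrix headers could stay in the .cpp file), but every
// object now owns a separately allocated heap block.
class PosePimpl
{
public:
    PosePimpl();
    ~PosePimpl();  // must be defined where Impl is a complete type
    double& at(std::size_t r, std::size_t c);

private:
    struct Impl;
    std::unique_ptr<Impl> p_;
};

// ---- what would live in the .cpp file ----
struct PosePimpl::Impl
{
    std::array<double, 16> m{};
};

PosePimpl::PosePimpl() : p_(std::make_unique<Impl>()) {}  // one heap allocation per pose
PosePimpl::~PosePimpl() = default;
double& PosePimpl::at(std::size_t r, std::size_t c) { return p_->m[r * 4 + c]; }
```

The PIMPL object is only a pointer-sized handle, so creating one costs an allocation and every element access goes through an extra indirection, which is the efficiency concern raised above.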

@jolting
Member

jolting commented Mar 30, 2017

Maybe something like the PCL methodology?

They instantiate template types in a CPP file.
https://github.com/PointCloudLibrary/pcl/blob/master/filters/src/approximate_voxel_grid.cpp

They implement it in an impl header
https://github.com/PointCloudLibrary/pcl/blob/master/filters/include/pcl/filters/impl/approximate_voxel_grid.hpp

Typically you don't need the impl header (which takes the longest to compile) unless you need to instantiate additional versions.
https://github.com/PointCloudLibrary/pcl/blob/master/filters/include/pcl/filters/approximate_voxel_grid.h
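
The three-file layout described above can be sketched in a single listing. This is only an illustration of the pattern, not PCL or MRPT code: the file names in the comments and the `mean()` function are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ---- mean.h (hypothetical): declaration only, cheap to include ----
template <typename T>
T mean(const std::vector<T>& values);

// ---- impl/mean.hpp (hypothetical): the template definition ----
// Only needed by code that instantiates mean() for a new type.
template <typename T>
T mean(const std::vector<T>& values)
{
    if (values.empty()) return T(0);
    T sum = T(0);
    for (const T& v : values) sum += v;
    return sum / static_cast<T>(values.size());
}

// ---- mean.cpp (hypothetical): explicit instantiations, compiled once ----
template float mean<float>(const std::vector<float>&);
template double mean<double>(const std::vector<double>&);

// ---- user code: would include only mean.h and link against mean.cpp ----
void check_mean()
{
    assert(mean(std::vector<double>{1.0, 2.0, 3.0}) == 2.0);
}
```

In a real project the three sections would be separate files, so that most translation units parse only the declaration and the definition is compiled once for the pre-instantiated types.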

@jlblancoc
Member Author

good! 👍

Will check if Eigen headers are already in a suitable form...

@jolting
Member

jolting commented Mar 30, 2017

I don't believe they are. This will make deriving from Eigen types almost impossible. Possibly using composition instead of derivation will be the only solution. Basically, only use Eigen in the impl header.

@jlblancoc
Member Author

jlblancoc commented Apr 2, 2017

As a reference to think about, here is the output from gcc -ftime-report for:

  1. A "simple" file without any heavy use of templates (0.34 s to compile)
[ 25%] Building CXX object libs/base/CMakeFiles/mrpt-base.dir/src/utils/CServerTCPSocket_common.cpp.o

Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 2%) wall    1383 kB ( 5%) ggc
 phase parsing           :   0.09 (26%) usr   0.03 (38%) sys   0.12 (28%) wall   11235 kB (37%) ggc
 phase lang. deferred    :   0.08 (24%) usr   0.03 (38%) sys   0.11 (26%) wall   11698 kB (39%) ggc
 phase opt and generate  :   0.17 (50%) usr   0.02 (25%) sys   0.19 (44%) wall    5822 kB (19%) ggc
 |name lookup            :   0.03 ( 9%) usr   0.00 ( 0%) sys   0.05 (12%) wall    2738 kB ( 9%) ggc
 |overload resolution    :   0.06 (18%) usr   0.02 (25%) sys   0.08 (19%) wall    8013 kB (27%) ggc
 callgraph construction  :   0.02 ( 6%) usr   0.00 ( 0%) sys   0.04 ( 9%) wall    1419 kB ( 5%) ggc
 callgraph optimization  :   0.00 ( 0%) usr   0.01 (13%) sys   0.01 ( 2%) wall     151 kB ( 1%) ggc
 ipa SRA                 :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 2%) wall     263 kB ( 1%) ggc
 df live regs            :   0.01 ( 3%) usr   0.00 ( 0%) sys   0.01 ( 2%) wall       0 kB ( 0%) ggc
 alias stmt walking      :   0.00 ( 0%) usr   0.01 (12%) sys   0.00 ( 0%) wall      14 kB ( 0%) ggc
 preprocessing           :   0.02 ( 6%) usr   0.01 (12%) sys   0.03 ( 7%) wall     386 kB ( 1%) ggc
 parser (global)         :   0.01 ( 3%) usr   0.00 ( 0%) sys   0.02 ( 5%) wall    2844 kB ( 9%) ggc
 parser struct body      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall    1271 kB ( 4%) ggc
 parser function body    :   0.01 ( 3%) usr   0.01 (12%) sys   0.01 ( 2%) wall    1241 kB ( 4%) ggc
 parser inl. func. body  :   0.01 ( 3%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      61 kB ( 0%) ggc
 parser inl. meth. body  :   0.02 ( 6%) usr   0.01 (12%) sys   0.03 ( 7%) wall    1998 kB ( 7%) ggc
 template instantiation  :   0.10 (29%) usr   0.01 (13%) sys   0.14 (33%) wall   14901 kB (49%) ggc
 integration             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 2%) wall     598 kB ( 2%) ggc
 tree gimplify           :   0.01 ( 3%) usr   0.00 ( 0%) sys   0.01 ( 2%) wall     293 kB ( 1%) ggc
 tree CFG cleanup        :   0.02 ( 6%) usr   0.00 ( 0%) sys   0.01 ( 2%) wall       4 kB ( 0%) ggc
 tree PTA                :   0.01 ( 3%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      31 kB ( 0%) ggc
 dominator optimization  :   0.01 ( 3%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      23 kB ( 0%) ggc
 tree CCP                :   0.01 ( 3%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      21 kB ( 0%) ggc
 tree PRE                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 2%) wall      22 kB ( 0%) ggc
 tree FRE                :   0.01 ( 3%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      56 kB ( 0%) ggc
 CSE                     :   0.01 ( 3%) usr   0.00 ( 0%) sys   0.01 ( 2%) wall       9 kB ( 0%) ggc
 dead store elim1        :   0.01 ( 3%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      15 kB ( 0%) ggc
 CPROP                   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 2%) wall      22 kB ( 0%) ggc
 combiner                :   0.01 ( 3%) usr   0.00 ( 0%) sys   0.01 ( 2%) wall      58 kB ( 0%) ggc
 integrated RA           :   0.01 ( 3%) usr   0.00 ( 0%) sys   0.01 ( 2%) wall     314 kB ( 1%) ggc
 LRA non-specific        :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 2%) wall       6 kB ( 0%) ggc
 scheduling 2            :   0.01 ( 3%) usr   0.00 ( 0%) sys   0.01 ( 2%) wall      17 kB ( 0%) ggc
 rest of compilation     :   0.02 ( 6%) usr   0.00 ( 0%) sys   0.01 ( 2%) wall      39 kB ( 0%) ggc
 unaccounted todo        :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 2%) wall       0 kB ( 0%) ggc
 TOTAL                 :   0.34             0.08             0.43              30167 kB
  2. A file making extensive use of templates (Eigen, mostly), taking 16.6 s to build:
[  9%] Building CXX object libs/base/CMakeFiles/mrpt-base.dir/src/random/RandomGenerator.cpp.o

Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall    1383 kB ( 0%) ggc
 phase parsing           :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall    4374 kB ( 0%) ggc
 phase lang. deferred    :   5.70 (34%) usr   0.99 (50%) sys   6.69 (36%) wall  620261 kB (63%) ggc
 phase opt and generate  :  10.85 (65%) usr   0.98 (50%) sys  11.83 (64%) wall  355111 kB (36%) ggc
 |name lookup            :   0.64 ( 4%) usr   0.22 (11%) sys   0.95 ( 5%) wall   48747 kB ( 5%) ggc
 |overload resolution    :   3.40 (20%) usr   0.66 (34%) sys   4.15 (22%) wall  441614 kB (45%) ggc
 garbage collection      :   1.37 ( 8%) usr   0.00 ( 0%) sys   1.39 ( 7%) wall       0 kB ( 0%) ggc
 dump files              :   0.13 ( 1%) usr   0.05 ( 3%) sys   0.17 ( 1%) wall       0 kB ( 0%) ggc
 PCH preprocessor state restore:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 callgraph construction  :   0.35 ( 2%) usr   0.03 ( 2%) sys   0.36 ( 2%) wall   12859 kB ( 1%) ggc
 callgraph optimization  :   0.27 ( 2%) usr   0.06 ( 3%) sys   0.39 ( 2%) wall   13989 kB ( 1%) ggc
 ipa dead code removal   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 ipa cp                  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1417 kB ( 0%) ggc
 ipa inlining heuristics :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall    1857 kB ( 0%) ggc
 ipa function splitting  :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     213 kB ( 0%) ggc
 ipa profile             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const          :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall      43 kB ( 0%) ggc
 ipa icf                 :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       9 kB ( 0%) ggc
 ipa SRA                 :   0.16 ( 1%) usr   0.04 ( 2%) sys   0.23 ( 1%) wall   21851 kB ( 2%) ggc
 cfg construction        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall    1006 kB ( 0%) ggc
 cfg cleanup             :   0.10 ( 1%) usr   0.00 ( 0%) sys   0.12 ( 1%) wall    1264 kB ( 0%) ggc
 trivially dead code     :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 df scan insns           :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall      12 kB ( 0%) ggc
 df multiple defs        :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 df reaching defs        :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 1%) wall       0 kB ( 0%) ggc
 df live regs            :   0.35 ( 2%) usr   0.00 ( 0%) sys   0.39 ( 2%) wall       0 kB ( 0%) ggc
 df live&initialized regs:   0.12 ( 1%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   0.19 ( 1%) usr   0.00 ( 0%) sys   0.12 ( 1%) wall    1981 kB ( 0%) ggc
 register information    :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 1%) wall       0 kB ( 0%) ggc
 alias analysis          :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 1%) wall    6709 kB ( 1%) ggc
 alias stmt walking      :   0.14 ( 1%) usr   0.01 ( 1%) sys   0.13 ( 1%) wall     546 kB ( 0%) ggc
 register scan           :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      88 kB ( 0%) ggc
 parser (global)         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     488 kB ( 0%) ggc
 parser function body    :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     577 kB ( 0%) ggc
 parser inl. meth. body  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     790 kB ( 0%) ggc
 template instantiation  :   5.00 (30%) usr   0.99 (50%) sys   5.98 (32%) wall  622199 kB (63%) ggc
 early inlining heuristics:   0.08 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 1%) wall    3896 kB ( 0%) ggc
 inline parameters       :   0.07 ( 0%) usr   0.01 ( 1%) sys   0.16 ( 1%) wall    7784 kB ( 1%) ggc
 integration             :   0.45 ( 3%) usr   0.01 ( 1%) sys   0.48 ( 3%) wall   49620 kB ( 5%) ggc
 tree gimplify           :   0.10 ( 1%) usr   0.05 ( 3%) sys   0.19 ( 1%) wall   17035 kB ( 2%) ggc
 tree eh                 :   0.03 ( 0%) usr   0.03 ( 2%) sys   0.02 ( 0%) wall    3865 kB ( 0%) ggc
 tree CFG construction   :   0.01 ( 0%) usr   0.02 ( 1%) sys   0.05 ( 0%) wall   13758 kB ( 1%) ggc
 tree CFG cleanup        :   0.15 ( 1%) usr   0.01 ( 1%) sys   0.10 ( 1%) wall     434 kB ( 0%) ggc
 tree VRP                :   0.15 ( 1%) usr   0.01 ( 1%) sys   0.18 ( 1%) wall    5203 kB ( 1%) ggc
 tree copy propagation   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     114 kB ( 0%) ggc
 tree PTA                :   0.31 ( 2%) usr   0.07 ( 4%) sys   0.31 ( 2%) wall    2265 kB ( 0%) ggc
 tree PHI insertion      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     843 kB ( 0%) ggc
 tree SSA rewrite        :   0.05 ( 0%) usr   0.02 ( 1%) sys   0.04 ( 0%) wall    9867 kB ( 1%) ggc
 tree SSA other          :   0.05 ( 0%) usr   0.02 ( 1%) sys   0.06 ( 0%) wall    1373 kB ( 0%) ggc
 tree SSA incremental    :   0.09 ( 1%) usr   0.01 ( 1%) sys   0.11 ( 1%) wall    1274 kB ( 0%) ggc
 tree operand scan       :   0.13 ( 1%) usr   0.07 ( 4%) sys   0.13 ( 1%) wall   27810 kB ( 3%) ggc
 dominator optimization  :   0.09 ( 1%) usr   0.02 ( 1%) sys   0.04 ( 0%) wall    2253 kB ( 0%) ggc
 tree SRA                :   0.06 ( 0%) usr   0.02 ( 1%) sys   0.12 ( 1%) wall    1031 kB ( 0%) ggc
 isolate eroneous paths  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree CCP                :   0.06 ( 0%) usr   0.04 ( 2%) sys   0.12 ( 1%) wall     370 kB ( 0%) ggc
 tree split crit edges   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1597 kB ( 0%) ggc
 tree reassociation      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      26 kB ( 0%) ggc
 tree PRE                :   0.27 ( 2%) usr   0.01 ( 1%) sys   0.29 ( 2%) wall    3292 kB ( 0%) ggc
 tree FRE                :   0.26 ( 2%) usr   0.04 ( 2%) sys   0.33 ( 2%) wall    4999 kB ( 1%) ggc
 tree code sinking       :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     489 kB ( 0%) ggc
 tree forward propagate  :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 1%) wall     751 kB ( 0%) ggc
 tree aggressive DCE     :   0.10 ( 1%) usr   0.02 ( 1%) sys   0.11 ( 1%) wall   11357 kB ( 1%) ggc
 tree buildin call DCE   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree DSE                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      18 kB ( 0%) ggc
 tree loop bounds        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     238 kB ( 0%) ggc
 tree loop invariant motion:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall      69 kB ( 0%) ggc
 tree canonical iv       :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     839 kB ( 0%) ggc
 scev constant prop      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     387 kB ( 0%) ggc
 tree loop unswitching   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     694 kB ( 0%) ggc
 complete unrolling      :   0.12 ( 1%) usr   0.01 ( 1%) sys   0.07 ( 0%) wall    4146 kB ( 0%) ggc
 tree vectorization      :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall    3806 kB ( 0%) ggc
 tree slp vectorization  :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall    2809 kB ( 0%) ggc
 tree iv optimization    :   0.11 ( 1%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall    6548 kB ( 1%) ggc
 predictive commoning    :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     942 kB ( 0%) ggc
 tree rename SSA copies  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 dominance frontiers     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation   :   0.30 ( 2%) usr   0.06 ( 3%) sys   0.28 ( 2%) wall       0 kB ( 0%) ggc
 out of ssa              :   0.04 ( 0%) usr   0.01 ( 1%) sys   0.03 ( 0%) wall      35 kB ( 0%) ggc
 expand vars             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall    1365 kB ( 0%) ggc
 expand                  :   0.08 ( 0%) usr   0.01 ( 1%) sys   0.07 ( 0%) wall    8779 kB ( 1%) ggc
 forward prop            :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall    1840 kB ( 0%) ggc
 CSE                     :   0.08 ( 0%) usr   0.01 ( 1%) sys   0.19 ( 1%) wall     164 kB ( 0%) ggc
 dead code elimination   :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 dead store elim1        :   0.10 ( 1%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall    2753 kB ( 0%) ggc
 dead store elim2        :   0.10 ( 1%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall    3116 kB ( 0%) ggc
 loop init               :   0.14 ( 1%) usr   0.02 ( 1%) sys   0.13 ( 1%) wall    8608 kB ( 1%) ggc
 loop invariant motion   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      28 kB ( 0%) ggc
 loop unrolling          :   0.15 ( 1%) usr   0.00 ( 0%) sys   0.13 ( 1%) wall   16870 kB ( 2%) ggc
 loop fini               :   0.02 ( 0%) usr   0.01 ( 1%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 CPROP                   :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.18 ( 1%) wall    3730 kB ( 0%) ggc
 PRE                     :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     206 kB ( 0%) ggc
 web                     :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall    1137 kB ( 0%) ggc
 CSE 2                   :   0.13 ( 1%) usr   0.01 ( 1%) sys   0.11 ( 1%) wall     271 kB ( 0%) ggc
 branch prediction       :   0.05 ( 0%) usr   0.02 ( 1%) sys   0.08 ( 0%) wall    2222 kB ( 0%) ggc
 combiner                :   0.38 ( 2%) usr   0.02 ( 1%) sys   0.39 ( 2%) wall    9365 kB ( 1%) ggc
 if-conversion           :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      86 kB ( 0%) ggc
 integrated RA           :   0.54 ( 3%) usr   0.00 ( 0%) sys   0.58 ( 3%) wall   20063 kB ( 2%) ggc
 LRA non-specific        :   0.26 ( 2%) usr   0.00 ( 0%) sys   0.18 ( 1%) wall    2307 kB ( 0%) ggc
 LRA virtuals elimination:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     642 kB ( 0%) ggc
 LRA reload inheritance  :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     265 kB ( 0%) ggc
 LRA create live ranges  :   0.16 ( 1%) usr   0.00 ( 0%) sys   0.24 ( 1%) wall     273 kB ( 0%) ggc
 LRA hard reg assignment :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 LRA rematerialization   :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       1 kB ( 0%) ggc
 reload                  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 reload CSE regs         :   0.22 ( 1%) usr   0.00 ( 0%) sys   0.18 ( 1%) wall    4475 kB ( 0%) ggc
 load CSE after reload   :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     199 kB ( 0%) ggc
 ree                     :   0.01 ( 0%) usr   0.01 ( 1%) sys   0.00 ( 0%) wall       1 kB ( 0%) ggc
 thread pro- & epilogue  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     417 kB ( 0%) ggc
 peephole 2              :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     446 kB ( 0%) ggc
 rename registers        :   0.15 ( 1%) usr   0.00 ( 0%) sys   0.21 ( 1%) wall    1395 kB ( 0%) ggc
 hard reg cprop          :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall      19 kB ( 0%) ggc
 scheduling 2            :   0.71 ( 4%) usr   0.00 ( 0%) sys   0.72 ( 4%) wall     734 kB ( 0%) ggc
 reorder blocks          :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall    1560 kB ( 0%) ggc
 shorten branches        :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 final                   :   0.09 ( 1%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall    2542 kB ( 0%) ggc
 straight-line strength reduction:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     157 kB ( 0%) ggc
 rest of compilation     :   0.17 ( 1%) usr   0.00 ( 0%) sys   0.11 ( 1%) wall    1081 kB ( 0%) ggc
 remove unused locals    :   0.06 ( 0%) usr   0.01 ( 1%) sys   0.12 ( 1%) wall      64 kB ( 0%) ggc
 address taken           :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall      18 kB ( 0%) ggc
 unaccounted todo        :   0.26 ( 2%) usr   0.11 ( 6%) sys   0.35 ( 2%) wall       5 kB ( 0%) ggc
 repair loop structures  :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 :  16.60             1.97            18.58             981159 kB

It's relevant, I think, that it takes 6 s wall time to instantiate templates:

 phase opt and generate  :  10.85 (65%) usr   0.98 (50%) sys  11.83 (64%) wall  355111 kB (36%) ggc
...
 template instantiation  :   5.00 (30%) usr   0.99 (50%) sys   5.98 (32%) wall  622199 kB (63%) ggc

Just a crazy idea: might it be useful to have Eigen refactored into declaration headers and implementation files, which might be explicitly instantiated? It might imply a small performance loss, but it should be measured to evaluate if it's acceptable.
I know Eigen is extremely flexible and large, but this idea could be applied to just the most commonly-used instantiations, then linked against in most parts instead of telling the compiler to instantiate, generate and optimize the same code over and over again...

This could be a big workload in itself and no one has spare time to work on it, but... I just wanted to share my thoughts!
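
As a rough illustration of that idea (not Eigen code; `SmallVec` and `dot()` are hypothetical stand-ins), C++11 explicit instantiation declarations let a header keep its full template definitions while telling each translation unit to link against one shared instantiation instead of generating its own:

```cpp
#include <cassert>
#include <cstddef>

// ---- header: the full definition stays visible, as in a header-only lib ----
template <typename T, std::size_t N>
struct SmallVec
{
    T v[N];
    T dot(const SmallVec& o) const;
};

// Defined outside the class so that it is not implicitly inline:
// explicit instantiation declarations do not suppress inline functions.
template <typename T, std::size_t N>
T SmallVec<T, N>::dot(const SmallVec& o) const
{
    T s = T(0);
    for (std::size_t i = 0; i < N; ++i) s += v[i] * o.v[i];
    return s;
}

// Explicit instantiation declaration: translation units including the
// header do not instantiate SmallVec<double, 3> themselves.
extern template struct SmallVec<double, 3>;

// ---- exactly one .cpp file: explicit instantiation definition ----
template struct SmallVec<double, 3>;

// ---- user code ----
void check_dot()
{
    SmallVec<double, 3> a{{1.0, 2.0, 3.0}};
    assert(a.dot(a) == 14.0);
}
```

The price is that calls to the pre-instantiated members are less likely to be inlined across translation units, which is the kind of performance loss that would have to be measured.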

@jolting
Member

jolting commented Apr 2, 2017

My thought was to refactor CMatrix and associated types into a C++14 implementation which allows explicit conversions to Eigen. I think this would be a good start.

C++11 arrays can be stack allocated, which should improve performance.

Ideally, the C++14 implementation should focus on constexpr, so that the compiler can do as many optimizations at compile time as possible.

http://codereview.stackexchange.com/questions/136541/simple-matrix-class-c14
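
A minimal sketch of what such a constexpr-friendly, stack-allocated container could look like is given below. The names `FixedMatrix`, `identity()` and `multiply()` are hypothetical and this is not the class from the linked review. A built-in array is used for storage because the non-const `operator[]` of `std::array` is not constexpr before C++17.

```cpp
#include <cstddef>

// Hypothetical fixed-size matrix: storage and element access only,
// with no dependency on Eigen and no dynamic allocation.
template <typename T, std::size_t ROWS, std::size_t COLS>
struct FixedMatrix
{
    T data[ROWS * COLS] = {};  // row-major, value-initialized

    constexpr const T& operator()(std::size_t r, std::size_t c) const
    {
        return data[r * COLS + c];
    }
    constexpr T& operator()(std::size_t r, std::size_t c)
    {
        return data[r * COLS + c];
    }
};

// Builds an N x N identity matrix; usable in constant expressions (C++14).
template <typename T, std::size_t N>
constexpr FixedMatrix<T, N, N> identity()
{
    FixedMatrix<T, N, N> m{};
    for (std::size_t i = 0; i < N; ++i) m(i, i) = T(1);
    return m;
}

// Naive (R x K) * (K x C) product, also evaluable at compile time.
template <typename T, std::size_t R, std::size_t K, std::size_t C>
constexpr FixedMatrix<T, R, C> multiply(const FixedMatrix<T, R, K>& a,
                                        const FixedMatrix<T, K, C>& b)
{
    FixedMatrix<T, R, C> out{};
    for (std::size_t i = 0; i < R; ++i)
        for (std::size_t j = 0; j < C; ++j)
            for (std::size_t k = 0; k < K; ++k)
                out(i, j) += a(i, k) * b(k, j);
    return out;
}

// The compiler evaluates this at compile time; no runtime code is needed.
constexpr auto I3 = identity<double, 3>();
static_assert(I3(1, 1) == 1.0 && I3(0, 1) == 0.0, "identity built at compile time");
```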

@jlblancoc jlblancoc changed the title (For MRPT 2.0) Is it possible to use PIMPL-like idiom for Eigen? (For MRPT 2.0) Discussion on Eigen, matrix classes and alternatives for a faster build time Apr 3, 2017
@jlblancoc
Member Author

mmm... it feels like rolling back the decision taken more than 5 years ago when moving to Eigen from custom matrix classes!

But it's worth considering for 2.0: it might be the simplest way of achieving small, simple templates for storing matrices, then only including <Eigen/...> where really necessary.

That idea would work if we draw a clear border for what operations lie on "our side", with the rest being for Eigen. An idea is to just implement the most basic storage.

By the way: Eigen::Map<> is a way to go for such a custom matrix storage, e.g. (just a tentative idea)

template <...>
class matrix 
{
...
auto toEigen() {
  return Eigen::Map<Eigen::Matrix<...> > (this->m_storage);
}
...
};

(PS: just edited this issue title)

@jlblancoc
Member Author

I just realized now that my last code snippet didn't make sense, since it implies having #included all Eigen headers so Eigen::Map<> and such are defined, and that's exactly what we want to avoid! :-)

This one should be better, as it will work only from user sources where Eigen headers are included.
This time the code is tested (in MSVC 2015):

HEADER:

#include <array>

// Forward declarations of the Eigen types used below (no Eigen headers needed here):
namespace Eigen {
	template<typename PlainObjectType, int MapOptions, typename StrideType> class Map;
	template<int Value> class InnerStride;
}

template <typename T, int ROWS, int COLS>
struct Matrix
{
	// Returns an Eigen::Map view over the internal storage. Only usable from
	// sources that include the Eigen headers, where Map is a complete type.
	template <typename EIGEN_MATRIX, typename EIGEN_MAP = Eigen::Map<EIGEN_MATRIX, 1 /* 16-byte aligned */, Eigen::InnerStride<1> /* contiguous elements */> >
	EIGEN_MAP asEigen() {
		return EIGEN_MAP(&m_matrix[0], ROWS, COLS);
	}

private:
	alignas(16) std::array<T, ROWS*COLS> m_matrix;
};

SOURCE:

#include <mrpt/utils/types_math.h>  // Eigen hdrs
#include <iostream>
// (plus the header above, which defines Matrix<>)

int main()
{
	Matrix<double, 4, 4> M;

	auto em = M.asEigen< Eigen::Matrix4d >();
	em.setIdentity();
	std::cout << em << std::endl;
}
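
The reason this works without Eigen headers is that the return type is a template parameter whose default names a forward-declared class template, so the type only has to be complete where the getter is instantiated. The following Eigen-free sketch isolates that mechanism; `heavy::View`, `Storage` and `asView()` are hypothetical names standing in for `Eigen::Map`, `Matrix` and `asEigen()`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// ---- light header: only a forward declaration of the heavy type ----
namespace heavy { template <typename T> class View; }

template <typename T, std::size_t N>
class Storage
{
public:
    // VIEW is a dependent type, so heavy::View<T> must be complete only
    // where asView() is instantiated, not where this header is parsed.
    template <typename VIEW = heavy::View<T>>
    VIEW asView() { return VIEW(m_data.data(), N); }

private:
    std::array<T, N> m_data{};
};

// ---- heavy header: included only by the sources that need it ----
namespace heavy {
template <typename T>
class View
{
public:
    View(T* p, std::size_t n) : p_(p), n_(n) {}
    void fill(T value) { for (std::size_t i = 0; i < n_; ++i) p_[i] = value; }
    T sum() const
    {
        T s = T(0);
        for (std::size_t i = 0; i < n_; ++i) s += p_[i];
        return s;
    }

private:
    T* p_;
    std::size_t n_;
};
}  // namespace heavy

// ---- user source: sees both headers ----
double demo_view()
{
    Storage<double, 4> s;
    s.asView().fill(1.5);           // the view writes through to the storage
    const double total = s.asView().sum();
    assert(total == 6.0);           // 4 elements * 1.5
    return total;
}
```

A source file that includes only the light header can still declare and pass around `Storage` objects; it just cannot call `asView()`.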

@jlblancoc
Member Author

Updated issue first comment to reflect a couple of TO-DO tasks related to this discussion.

@jlblancoc jlblancoc mentioned this issue Jun 14, 2017
31 tasks
@jlblancoc jlblancoc added this to the Release 2.0.0-alpha milestone Feb 14, 2019
@jlblancoc
Member Author

WIP for this issue is branch: matrix-hidden-eigen


2 participants