
Slow FIT_TO_TEMPLATE in v2.8 #680

Closed
GiovanniBussi opened this issue Apr 9, 2021 · 5 comments

@GiovanniBussi
Member

This was reported by @Jayashrita.

Apparently, commit ac993af introduces a severe performance hit in the apply() phase of FIT_TO_TEMPLATE. I checked the code, and the only relevant change seems to be this one:

@@ -248,11 +248,11 @@ public:
 ///calculate with close structure, i.e. approximate the RMSD without expensive computation of rotation matrix by reusing saved rotation matrices from previous iterations
   double calculateWithCloseStructure( const std::vector<Vector>& positions, std::vector<Vector> &DDistDPos, Tensor & rotationPosClose, Tensor & rotationRefClose, std::array<std::array<Tensor,3>,3> & drotationPosCloseDrr01, const bool squared=false   );
 /// static convenience method to get the matrix i,a from drotdpos (which is a bit tricky)
-  static  Tensor getMatrixFromDRot(Matrix< std::vector<Vector> > &drotdpos, const unsigned &i, const unsigned &a) {
+  static  Tensor getMatrixFromDRot(const Matrix< std::vector<Vector> > &drotdpos, const unsigned &i, const unsigned &a) {
     Tensor t;
-    t[0][0]=drotdpos[0][0][i][a]; t[0][1]=drotdpos[0][1][i][a]; t[0][2]=drotdpos[0][2][i][a];
-    t[1][0]=drotdpos[1][0][i][a]; t[1][1]=drotdpos[1][1][i][a]; t[1][2]=drotdpos[1][2][i][a];
-    t[2][0]=drotdpos[2][0][i][a]; t[2][1]=drotdpos[2][1][i][a]; t[2][2]=drotdpos[2][2][i][a];
+    t[0][0]=drotdpos(0,0)[i][a]; t[0][1]=drotdpos(0,1)[i][a]; t[0][2]=drotdpos(0,2)[i][a];
+    t[1][0]=drotdpos(1,0)[i][a]; t[1][1]=drotdpos(1,1)[i][a]; t[1][2]=drotdpos(1,2)[i][a];
+    t[2][0]=drotdpos(2,0)[i][a]; t[2][1]=drotdpos(2,1)[i][a]; t[2][2]=drotdpos(2,2)[i][a];
     return t;
   };
 };
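For context, here is a minimal sketch of how a row-major matrix with both access styles typically works (a hypothetical stand-in, not PLUMED's actual Matrix class, modeled on the data[j+i*cl] layout quoted later in the thread). It shows why the diff looks semantically neutral: bracket access m[i][j] and call access m(i,j) read the same flat storage.

```cpp
#include <cassert>
#include <vector>

// Hypothetical row-major matrix sketch (NOT PLUMED's Matrix),
// illustrating the data[j + i*cl] storage convention.
template <typename T>
class MiniMatrix {
  unsigned rw, cl;        // rows, columns
  std::vector<T> data;    // flat row-major storage
public:
  MiniMatrix(unsigned r, unsigned c) : rw(r), cl(c), data(r * c) {}
  // "function call" style element access
  T& operator()(unsigned i, unsigned j) { return data[j + i * cl]; }
  // "bracket" style row access: a pointer into the same storage,
  // so m[i][j] reads exactly the element m(i,j) does
  T* operator[](unsigned i) { return data.data() + i * cl; }
};
```

Both spellings index the same memory, so switching between them should not change behavior by itself; the performance difference discussed below comes from elsewhere.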

This function is called in FitToTemplate:

    for(unsigned i=0; i<aligned.size(); i++) {
      Vector g;
      for(unsigned k=0; k<3; k++) {
// this could be made faster computing only the diagonal of d
        Tensor d=matmul(ww,RMSD::getMatrixFromDRot(drotdpos,i,k));
        g[k]=(d(0,0)+d(1,1)+d(2,2));
      }
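As an aside, the in-code comment's hint can be made concrete: g[k] only needs the trace of d, and tr(A·B) can be accumulated directly as Σᵢⱼ A(i,j)·B(j,i) without forming the full product. A rough sketch with a plain-array stand-in for Tensor (hypothetical names, not PLUMED's API):

```cpp
#include <array>
#include <cassert>

// Plain 3x3 stand-in for a Tensor (hypothetical, for illustration only).
using Tensor3 = std::array<std::array<double, 3>, 3>;

// What the loop above does: full 3x3 product, then sum of the diagonal.
double traceViaFullProduct(const Tensor3& ww, const Tensor3& m) {
  Tensor3 d{};
  for (unsigned i = 0; i < 3; ++i)
    for (unsigned j = 0; j < 3; ++j)
      for (unsigned k = 0; k < 3; ++k)
        d[i][j] += ww[i][k] * m[k][j];
  return d[0][0] + d[1][1] + d[2][2];
}

// The optimization hinted at by the comment: accumulate only the terms
// that land on the diagonal, using tr(A*B) = sum_{i,j} A(i,j)*B(j,i).
double traceDirect(const Tensor3& ww, const Tensor3& m) {
  double tr = 0.0;
  for (unsigned i = 0; i < 3; ++i)
    for (unsigned j = 0; j < 3; ++j)
      tr += ww[i][j] * m[j][i];
  return tr;
}
```

The direct version does 9 multiply-adds instead of 27, though this micro-optimization is unrelated to the regression discussed here.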

In principle this change should have no impact, but perhaps it fools the compiler into missing an inlining opportunity.

I will try to reproduce the performance hit on my computer and then fix it.

@GiovanniBussi GiovanniBussi self-assigned this Apr 9, 2021
@GiovanniBussi
Member Author

Small update: I could reproduce this on a Colab with gcc-9. Still working on a solution.

@GiovanniBussi
Member Author

It looks like this is the problematic change:

-  static  Tensor getMatrixFromDRot(Matrix< std::vector<Vector> > &drotdpos, const unsigned &i, const unsigned &a) {
+  static  Tensor getMatrixFromDRot(const Matrix< std::vector<Vector> > &drotdpos, const unsigned &i, const unsigned &a) {

Reverting this change alone fixes the problem.

In the Colab, the timings are at the end of the notebook:

  • fast is before the commit
  • slow is after the commit
  • slow-test1 is after the commit but with the single change above reverted; it is actually fast

@Jayashrita could you please test this on your system? You should just apply this change (it should also work on the master branch):

cd src/tools
# possible fix
sed 's/Tensor getMatrixFromDRot(const Matrix/Tensor getMatrixFromDRot(Matrix/' RMSD.h > RMSD.h.tmp
mv RMSD.h.tmp RMSD.h

@Jayashrita

I tested this on my system and this solves the problem. The simulations are as fast as before.

@GiovanniBussi
Member Author

I checked more carefully and found the bug: an incorrect implementation of operator() in the Matrix<> class. Namely, this line:

inline T operator () (const unsigned& i, const unsigned& j) const { return data[j+i*cl]; }

should have been:

inline const T & operator () (const unsigned& i, const unsigned& j) const { return data[j+i*cl]; }

With the original implementation, every access to an element of a const Matrix object makes a copy of that element. This is not a problem if the element is a double, but if the element is a std::vector or something large, the copy adds a large overhead. The corrected implementation instead returns a const reference, which can be read without making any copy.
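The effect is easy to demonstrate with a small standalone example (hypothetical toy types, not PLUMED code): two matrix-like classes that differ only in the const accessor's return type, plus an element type that counts how often it is copied.

```cpp
#include <cstddef>
#include <vector>

// Element type that counts copy constructions (demo only).
struct CopyCounter {
  static std::size_t copies;
  std::vector<double> payload = std::vector<double>(1000);  // "something big"
  CopyCounter() = default;
  CopyCounter(const CopyCounter& other) : payload(other.payload) { ++copies; }
};
std::size_t CopyCounter::copies = 0;

// Buggy accessor: returns by value, so every const read copies the element.
struct MatByValue {
  std::vector<CopyCounter> data = std::vector<CopyCounter>(9);
  unsigned cl = 3;
  CopyCounter operator()(unsigned i, unsigned j) const { return data[j + i * cl]; }
};

// Fixed accessor: returns a const reference, so reads make no copy at all.
struct MatByRef {
  std::vector<CopyCounter> data = std::vector<CopyCounter>(9);
  unsigned cl = 3;
  const CopyCounter& operator()(unsigned i, unsigned j) const { return data[j + i * cl]; }
};
```

Reading all nine elements of MatByValue copies the element (payload included) nine times; the same loop over MatByRef performs zero copies, which matches the overhead described above.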

@Jayashrita you might want to try this (instead of what I suggested above). I made a test at the end of the colab and this little change alone seems to fix the problem.

Since this issue has been present in the Matrix class since the very beginning and might significantly affect performance in some operations, I will backport the fix to v2.5.

@Jayashrita

The change in the Matrix class fixes the problem too.
