
Slow FIT_TO_TEMPLATE in v2.8 #680

Closed
GiovanniBussi opened this issue Apr 9, 2021 · 5 comments

@GiovanniBussi
Member

This was reported by @Jayashrita.

Apparently, commit ac993af introduces a severe performance hit in the apply() phase of FIT_TO_TEMPLATE. I checked the code, and the only relevant change seems to be this one:

@@ -248,11 +248,11 @@ public:
 ///calculate with close structure, i.e. approximate the RMSD without expensive computation of rotation matrix by reusing saved rotation matrices from previous iterations
   double calculateWithCloseStructure( const std::vector<Vector>& positions, std::vector<Vector> &DDistDPos, Tensor & rotationPosClose, Tensor & rotationRefClose, std::array<std::array<Tensor,3>,3> & drotationPosCloseDrr01, const bool squared=false   );
 /// static convenience method to get the matrix i,a from drotdpos (which is a bit tricky)
-  static  Tensor getMatrixFromDRot(Matrix< std::vector<Vector> > &drotdpos, const unsigned &i, const unsigned &a) {
+  static  Tensor getMatrixFromDRot(const Matrix< std::vector<Vector> > &drotdpos, const unsigned &i, const unsigned &a) {
     Tensor t;
-    t[0][0]=drotdpos[0][0][i][a]; t[0][1]=drotdpos[0][1][i][a]; t[0][2]=drotdpos[0][2][i][a];
-    t[1][0]=drotdpos[1][0][i][a]; t[1][1]=drotdpos[1][1][i][a]; t[1][2]=drotdpos[1][2][i][a];
-    t[2][0]=drotdpos[2][0][i][a]; t[2][1]=drotdpos[2][1][i][a]; t[2][2]=drotdpos[2][2][i][a];
+    t[0][0]=drotdpos(0,0)[i][a]; t[0][1]=drotdpos(0,1)[i][a]; t[0][2]=drotdpos(0,2)[i][a];
+    t[1][0]=drotdpos(1,0)[i][a]; t[1][1]=drotdpos(1,1)[i][a]; t[1][2]=drotdpos(1,2)[i][a];
+    t[2][0]=drotdpos(2,0)[i][a]; t[2][1]=drotdpos(2,1)[i][a]; t[2][2]=drotdpos(2,2)[i][a];
     return t;
   };
 };
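For context, here is a minimal sketch of how a row-major matrix with both access styles typically works (a hypothetical stand-in, not PLUMED's actual Matrix class, modeled on the data[j+i*cl] layout quoted later in the thread). It shows why the diff looks semantically neutral: bracket access m[i][j] and call access m(i,j) read the same flat storage.

```cpp
#include <cassert>
#include <vector>

// Hypothetical row-major matrix sketch (NOT PLUMED's Matrix),
// illustrating the data[j + i*cl] storage convention.
template <typename T>
class MiniMatrix {
  unsigned rw, cl;        // rows, columns
  std::vector<T> data;    // flat row-major storage
public:
  MiniMatrix(unsigned r, unsigned c) : rw(r), cl(c), data(r * c) {}
  // "function call" style element access
  T& operator()(unsigned i, unsigned j) { return data[j + i * cl]; }
  // "bracket" style row access: a pointer into the same storage,
  // so m[i][j] reads exactly the element m(i,j) does
  T* operator[](unsigned i) { return data.data() + i * cl; }
};
```

Both spellings index the same memory, so switching between them should not change behavior by itself; the performance difference discussed below comes from elsewhere.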

This function is called in FitToTemplate:

    for(unsigned i=0; i<aligned.size(); i++) {
      Vector g;
      for(unsigned k=0; k<3; k++) {
// this could be made faster computing only the diagonal of d
        Tensor d=matmul(ww,RMSD::getMatrixFromDRot(drotdpos,i,k));
        g[k]=(d(0,0)+d(1,1)+d(2,2));
      }
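As an aside, the in-code comment's hint can be made concrete: g[k] only needs the trace of d, and tr(A·B) can be accumulated directly as Σᵢⱼ A(i,j)·B(j,i) without forming the full product. A rough sketch with a plain-array stand-in for Tensor (hypothetical names, not PLUMED's API):

```cpp
#include <array>
#include <cassert>

// Plain 3x3 stand-in for a Tensor (hypothetical, for illustration only).
using Tensor3 = std::array<std::array<double, 3>, 3>;

// What the loop above does: full 3x3 product, then sum of the diagonal.
double traceViaFullProduct(const Tensor3& ww, const Tensor3& m) {
  Tensor3 d{};
  for (unsigned i = 0; i < 3; ++i)
    for (unsigned j = 0; j < 3; ++j)
      for (unsigned k = 0; k < 3; ++k)
        d[i][j] += ww[i][k] * m[k][j];
  return d[0][0] + d[1][1] + d[2][2];
}

// The optimization hinted at by the comment: accumulate only the terms
// that land on the diagonal, using tr(A*B) = sum_{i,j} A(i,j)*B(j,i).
double traceDirect(const Tensor3& ww, const Tensor3& m) {
  double tr = 0.0;
  for (unsigned i = 0; i < 3; ++i)
    for (unsigned j = 0; j < 3; ++j)
      tr += ww[i][j] * m[j][i];
  return tr;
}
```

The direct version does 9 multiply-adds instead of 27, though this micro-optimization is unrelated to the regression discussed here.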

In principle this change should have no impact, but perhaps it fools the compiler into missing an inlining opportunity.

I will try to reproduce the performance hit on my computer and then fix it.

@GiovanniBussi GiovanniBussi self-assigned this Apr 9, 2021
@GiovanniBussi
Member Author

Small update: I could reproduce this on a Colab with gcc-9. Still working on a solution.

@GiovanniBussi
Member Author

It looks like this is the problematic change:

-  static  Tensor getMatrixFromDRot(Matrix< std::vector<Vector> > &drotdpos, const unsigned &i, const unsigned &a) {
+  static  Tensor getMatrixFromDRot(const Matrix< std::vector<Vector> > &drotdpos, const unsigned &i, const unsigned &a) {

Reverting this change alone fixes the problem.

In the Colab, the timings are at the end of the notebook:

  • fast is before the commit
  • slow is after the commit
  • slow-test1 is after the commit but with the single change above reverted; it is actually fast

@Jayashrita could you please test this on your system? You should just apply this change (it should also work on the master branch):

cd src/tools
# possible fix
sed 's/Tensor getMatrixFromDRot(const Matrix/Tensor getMatrixFromDRot(Matrix/' RMSD.h > RMSD.h.tmp
mv RMSD.h.tmp RMSD.h

@Jayashrita

I tested this on my system and this solves the problem. The simulations are as fast as before.

@GiovanniBussi
Member Author

I checked more carefully and found the bug: an incorrect implementation of operator() in the Matrix<> class. Namely, this line:

inline T operator () (const unsigned& i, const unsigned& j) const { return data[j+i*cl]; }

should have been:

inline const T & operator () (const unsigned& i, const unsigned& j) const { return data[j+i*cl]; }

With the original implementation, every access to an element of a const Matrix object makes a copy of that element. This is not a problem if the element is a double, but if the element is a std::vector or something large, the copy adds a large overhead. The corrected implementation instead returns a const reference, which can be read without making any copy.
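The effect is easy to demonstrate with a small standalone example (hypothetical toy types, not PLUMED code): two matrix-like classes that differ only in the const accessor's return type, plus an element type that counts how often it is copied.

```cpp
#include <cstddef>
#include <vector>

// Element type that counts copy constructions (demo only).
struct CopyCounter {
  static std::size_t copies;
  std::vector<double> payload = std::vector<double>(1000);  // "something big"
  CopyCounter() = default;
  CopyCounter(const CopyCounter& other) : payload(other.payload) { ++copies; }
};
std::size_t CopyCounter::copies = 0;

// Buggy accessor: returns by value, so every const read copies the element.
struct MatByValue {
  std::vector<CopyCounter> data = std::vector<CopyCounter>(9);
  unsigned cl = 3;
  CopyCounter operator()(unsigned i, unsigned j) const { return data[j + i * cl]; }
};

// Fixed accessor: returns a const reference, so reads make no copy at all.
struct MatByRef {
  std::vector<CopyCounter> data = std::vector<CopyCounter>(9);
  unsigned cl = 3;
  const CopyCounter& operator()(unsigned i, unsigned j) const { return data[j + i * cl]; }
};
```

Reading all nine elements of MatByValue copies the element (payload included) nine times; the same loop over MatByRef performs zero copies, which matches the overhead described above.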

@Jayashrita you might want to try this (instead of what I suggested above). I made a test at the end of the colab and this little change alone seems to fix the problem.

Since this issue has been present in the Matrix class since the very beginning and might significantly affect performance in some operations, I will backport the fix to v2.5.

@Jayashrita

The change in the Matrix class fixes the problem too.
