
Make U matrices not persistent to reduce state_dict size. #124

Open
pfebrer opened this issue Jun 28, 2023 · 8 comments

pfebrer commented Jun 28, 2023

Is your feature request related to a problem? Please describe.
We are using mace with max angular momentum 4 and correlation 3, and the checkpoint files are huge because the U matrices of the Contraction class are stored in them (~400 MB, while the model parameters only occupy ~5 MB).

Describe the solution you'd like
We would like the U matrices not to be stored in the checkpoint file.

Describe alternatives you've considered
We think that passing persistent=False in this line:

self.register_buffer(f"U_matrix_{nu}", U_matrix)
should solve the problem without causing any harm to the model. Otherwise, allowing the user to choose whether they are persistent would also be nice.
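
For illustration, here is a minimal sketch (a simplified module, not MACE's actual Contraction class) of how persistent=False keeps a buffer available at runtime while leaving it out of the state_dict:

```python
import torch

class Contraction(torch.nn.Module):
    def __init__(self, U_matrix: torch.Tensor):
        super().__init__()
        # persistent=False keeps the buffer on the module (it still moves with .to()/.cuda())
        # but excludes it from state_dict(), so it is not written to the checkpoint.
        self.register_buffer("U_matrix_1", U_matrix, persistent=False)
        self.weight = torch.nn.Parameter(torch.randn(8))

module = Contraction(torch.randn(100, 100))
print(list(module.state_dict().keys()))  # ['weight']; 'U_matrix_1' is omitted
```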

ilyes319 (Contributor) commented

Hey @pfebrer,

That is a reasonable solution. Could you try it and check whether it solves your issue?

pfebrer (Author) commented Jun 29, 2023

It indeed solved the problem: my checkpoint file size went from 680 MB down to 26 MB :) I could also restart training without problems.

But maybe in some cases it is useful to store them precomputed?

pfebrer (Author) commented Jun 29, 2023

I now noticed that this change breaks compatibility with loading models.

E.g., if you load a model that stored the U matrices into a version of mace that sets them to persistent=False, this results in an error, and the same happens in the opposite case.

peterbjorgensen commented

> I now noticed that this change breaks compatibility with loading models.
>
> E.g., if you load a model that stored the U matrices into a version of mace that sets them to persistent=False, this results in an error, and the same happens in the opposite case.

This could maybe be fixed with non-strict loading: the module's load_state_dict function has a strict=False keyword. But it might be better to just break backwards compatibility, or to explicitly remove the U matrices from old checkpoints.
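
A minimal sketch of the non-strict loading idea (the simplified class and tensors below are placeholders, not MACE's real modules):

```python
import torch

class NewContraction(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.zeros(4))
        # New behaviour: the U matrix is a non-persistent buffer.
        self.register_buffer("U_matrix_1", torch.zeros(4, 4), persistent=False)

# Pretend this is an old checkpoint that still contains the U matrix.
old_state_dict = {"weight": torch.ones(4), "U_matrix_1": torch.ones(4, 4)}

model = NewContraction()
# strict=False ignores keys that exist only on one side (here the stored U_matrix_1)
# instead of raising a RuntimeError.
result = model.load_state_dict(old_state_dict, strict=False)
print(result.unexpected_keys)  # ['U_matrix_1']
```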

pfebrer (Author) commented Jul 13, 2023

@ilyes319 do you think you can make them persistent=False? To load an old model with the new implementation, it would just be a matter of "cleaning" the checkpoint file, i.e. removing the matrices from it.
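
A sketch of such a cleaning step, assuming the checkpoint is a plain state_dict and the stored buffers keep the U_matrix_ naming (file names are placeholders):

```python
import torch

# Load the old checkpoint (assumed to be a plain state_dict mapping names to tensors).
checkpoint = torch.load("old_model.pt", map_location="cpu")

# Drop every stored U matrix; with persistent=False the module rebuilds them itself.
cleaned = {
    k: v for k, v in checkpoint.items()
    if not k.split(".")[-1].startswith("U_matrix_")
}

torch.save(cleaned, "old_model_cleaned.pt")
```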

ilyes319 (Contributor) commented

I wonder how this interacts with torchscript and libtorch, though. I guess the safest would be to make it an argument and keep the default set to true. Would this be alright?

pfebrer (Author) commented Jul 13, 2023

Yes, if it can be configured through an argument of the SymmetricContraction module (not just Contraction), I think it would be fine for us 👍

ilyes319 added the enhancement (New feature or request) label on Jul 31, 2023
pfebrer (Author) commented Aug 24, 2023

Could we add this? :) (a persistent_U_matrices argument, or something similarly named, on SymmetricContraction that defaults to True)

I can submit a PR.
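
A sketch of what the proposed argument could look like (persistent_U_matrices and these simplified classes are illustrative assumptions, not MACE's actual implementation):

```python
import torch

class Contraction(torch.nn.Module):
    def __init__(self, U_matrix: torch.Tensor, persistent_U_matrices: bool = True):
        super().__init__()
        # Default True keeps existing checkpoints compatible; False shrinks the state_dict.
        self.register_buffer("U_matrix_1", U_matrix, persistent=persistent_U_matrices)

class SymmetricContraction(torch.nn.Module):
    def __init__(self, U_matrices: list, persistent_U_matrices: bool = True):
        super().__init__()
        # Forward the flag to every inner Contraction.
        self.contractions = torch.nn.ModuleList(
            [Contraction(U, persistent_U_matrices=persistent_U_matrices) for U in U_matrices]
        )
```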
