
Make U matrices not persistent to reduce state_dict size. #124

Open
pfebrer opened this issue Jun 28, 2023 · 8 comments

pfebrer commented Jun 28, 2023

Is your feature request related to a problem? Please describe.
We are using mace with max angular momentum 4 and correlation 3, and the checkpoint files are huge because the U matrices of the Contraction class are stored in them (~400 MB, while the model parameters only occupy ~5 MB).

Describe the solution you'd like
We would like the U matrices not to be stored in the checkpoint file.

Describe alternatives you've considered
We think that passing persistent=False in this line:

self.register_buffer(f"U_matrix_{nu}", U_matrix)
should solve the problem without causing any harm to the model. Otherwise, allowing the user to choose whether they are persistent would also be nice.
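
For illustration, here is a minimal sketch (a simplified module, not MACE's actual Contraction class) of how persistent=False keeps a buffer available at runtime while leaving it out of the state_dict:

```python
import torch

class Contraction(torch.nn.Module):
    def __init__(self, U_matrix: torch.Tensor):
        super().__init__()
        # persistent=False keeps the buffer on the module (it still moves with .to()/.cuda())
        # but excludes it from state_dict(), so it is not written to the checkpoint.
        self.register_buffer("U_matrix_1", U_matrix, persistent=False)
        self.weight = torch.nn.Parameter(torch.randn(8))

module = Contraction(torch.randn(100, 100))
print(list(module.state_dict().keys()))  # ['weight']; 'U_matrix_1' is omitted
```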

ilyes319 (Contributor) commented

Hey @pfebrer,

That is a reasonable solution. Could you try it and check whether it solves your issue?

pfebrer (Author) commented Jun 29, 2023

It indeed solved the problem: my checkpoint file size went from 680 MB down to 26 MB :) I could also restart training without problems.

But maybe in some cases it is useful to store them precomputed?

pfebrer (Author) commented Jun 29, 2023

I now noticed that this change breaks compatibility with loading models.

E.g., if you load a model that stored the U matrices into a version of mace that sets them to persistent=False, this results in an error, and the same happens in the opposite case.

peterbjorgensen commented

> I now noticed that this change breaks compatibility with loading models.
>
> E.g., if you load a model that stored the U matrices into a version of mace that sets them to persistent=False, this results in an error, and the same happens in the opposite case.

This could maybe be fixed with non-strict loading: the module's load_state_dict function has a strict=False keyword. But it might be better to just break backwards compatibility, or to explicitly remove the U matrices from old checkpoints.
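
A minimal sketch of the non-strict loading idea (the simplified class and tensors below are placeholders, not MACE's real modules):

```python
import torch

class NewContraction(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.zeros(4))
        # New behaviour: the U matrix is a non-persistent buffer.
        self.register_buffer("U_matrix_1", torch.zeros(4, 4), persistent=False)

# Pretend this is an old checkpoint that still contains the U matrix.
old_state_dict = {"weight": torch.ones(4), "U_matrix_1": torch.ones(4, 4)}

model = NewContraction()
# strict=False ignores keys that exist only on one side (here the stored U_matrix_1)
# instead of raising a RuntimeError.
result = model.load_state_dict(old_state_dict, strict=False)
print(result.unexpected_keys)  # ['U_matrix_1']
```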

pfebrer (Author) commented Jul 13, 2023

@ilyes319 do you think you can make them persistent=False? To load an old model with the new implementation, it would just be a matter of "cleaning" the checkpoint file, i.e. removing the matrices from it.
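
A sketch of such a cleaning step, assuming the checkpoint is a plain state_dict and the stored buffers keep the U_matrix_ naming (file names are placeholders):

```python
import torch

# Load the old checkpoint (assumed to be a plain state_dict mapping names to tensors).
checkpoint = torch.load("old_model.pt", map_location="cpu")

# Drop every stored U matrix; with persistent=False the module rebuilds them itself.
cleaned = {
    k: v for k, v in checkpoint.items()
    if not k.split(".")[-1].startswith("U_matrix_")
}

torch.save(cleaned, "old_model_cleaned.pt")
```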

ilyes319 (Contributor) commented

I wonder how this interacts with torchscript and libtorch, though. I guess the safest would be to make it an argument and keep the default set to true. Would this be alright?

pfebrer (Author) commented Jul 13, 2023

Yes, if it can be configured through an argument of the SymmetricContraction module (not just Contraction), I think it would be fine for us 👍

ilyes319 added the enhancement (New feature or request) label on Jul 31, 2023
pfebrer (Author) commented Aug 24, 2023

Could we add this? :) (a persistent_U_matrices argument, or something similarly named, on SymmetricContraction that defaults to True)

I can submit a PR.
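
A sketch of what the proposed argument could look like (persistent_U_matrices and these simplified classes are illustrative assumptions, not MACE's actual implementation):

```python
import torch

class Contraction(torch.nn.Module):
    def __init__(self, U_matrix: torch.Tensor, persistent_U_matrices: bool = True):
        super().__init__()
        # Default True keeps existing checkpoints compatible; False shrinks the state_dict.
        self.register_buffer("U_matrix_1", U_matrix, persistent=persistent_U_matrices)

class SymmetricContraction(torch.nn.Module):
    def __init__(self, U_matrices: list, persistent_U_matrices: bool = True):
        super().__init__()
        # Forward the flag to every inner Contraction.
        self.contractions = torch.nn.ModuleList(
            [Contraction(U, persistent_U_matrices=persistent_U_matrices) for U in U_matrices]
        )
```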
