Make U matrices not persistent to reduce state_dict size #124
Comments
Hey @pfebrer, that is a reasonable solution. Could you try it and check if it solves your issue?
It indeed solved the problem: my checkpoint file size went from 680 MB to 26 MB :) I could also restart the training without problems. But maybe in some cases it is useful to store them precomputed?
I now noticed that this change breaks compatibility when loading models. E.g.: if you load a model that stored the U matrices into a version of mace that registers them with `persistent=False`, loading fails because the checkpoint contains keys the model no longer expects.
This could maybe be fixed with non-strict loading. The torch `load_state_dict` function has a `strict` keyword argument that can be set to `False` to tolerate mismatched keys.
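(For context, a minimal sketch of what non-strict loading looks like; `TinyModel` and the `symmetric_contractions.U_matrix` key below are made-up stand-ins, not actual MACE names.)

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    # Stand-in for a MACE model; only the loading behaviour matters here.
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

model = TinyModel()

# Fake "old" checkpoint that still contains a U-matrix entry,
# as saved by a version where the buffer was persistent.
old_state = model.state_dict()
old_state["symmetric_contractions.U_matrix"] = torch.zeros(3, 3)

# strict=False reports mismatched keys instead of raising an error.
result = model.load_state_dict(old_state, strict=False)
print("missing keys:   ", result.missing_keys)     # []
print("unexpected keys:", result.unexpected_keys)  # ['symmetric_contractions.U_matrix']
```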
@ilyes319 do you think you can make them non-persistent?
I wonder how this interacts with torchscript and libtorch, though. I guess the safest would be to add an argument and keep the default to true. Would this be alright?
Yes, if it can be configured through an argument, that would be great.
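(A rough sketch of that opt-in pattern; the class and argument names below, such as `persistent_u`, are hypothetical and not actual MACE API.)

```python
import torch
import torch.nn as nn

class ContractionLike(nn.Module):
    # Toy stand-in for the Contraction module; only the buffer handling is shown.
    def __init__(self, U: torch.Tensor, persistent_u: bool = True):
        super().__init__()
        # Default True keeps today's behaviour (U is saved in checkpoints);
        # persistent_u=False keeps U on the module (it still follows .to()/.cuda())
        # but leaves it out of state_dict, so checkpoints stay small.
        self.register_buffer("U_matrix", U, persistent=persistent_u)

U = torch.randn(8, 8, 8)
print("U_matrix" in ContractionLike(U).state_dict())                      # True (default)
print("U_matrix" in ContractionLike(U, persistent_u=False).state_dict())  # False
```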
Could we add this? :) (I can submit a PR.)
Is your feature request related to a problem? Please describe.
We are using mace with max angular momentum 4 and correlation 3, and the checkpoint files are huge because the U matrices of the `Contraction` class are stored in them (~400 MB, while the parameters of the model only occupy ~5 MB).

Describe the solution you'd like
We would like the U matrices not to be stored in the checkpoint file.
Describe alternatives you've considered
We think that passing `persistent=False` to the `register_buffer` call in this line would solve the problem:

mace/mace/modules/symmetric_contraction.py, line 110 (at 44b6da4)
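(As a minimal illustration of the effect of that change; the shapes and the `Demo` module below are arbitrary placeholders, not the real line 110, and the U tensor stands in for whatever the real class precomputes at construction time.)

```python
import io
import torch
import torch.nn as nn

class Demo(nn.Module):
    def __init__(self, persistent: bool):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(16, 16))
        # Placeholder for a large precomputed U matrix.
        self.register_buffer("U_matrix", torch.randn(64, 64, 64), persistent=persistent)

def checkpoint_bytes(module: nn.Module) -> int:
    # Size of the serialized state_dict, i.e. what ends up in a checkpoint.
    buf = io.BytesIO()
    torch.save(module.state_dict(), buf)
    return buf.getbuffer().nbytes

print(checkpoint_bytes(Demo(persistent=True)))   # ~1 MB: parameters + U matrix
print(checkpoint_bytes(Demo(persistent=False)))  # ~1 KB: parameters only
```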