
Lean Model Checkpointing #504

Closed
dorukhansergin opened this issue Feb 8, 2019 · 5 comments
Labels: docs (More/better docs)

@dorukhansergin (Contributor)

Hi,

I ran a Trial and saved my model using torchbearer.callbacks.checkpointers.Best to a file model.pt.

When I load the file with torch.load and try to make a forward pass with it, I get the following error:

import torch

model = MyModule()
state_dict = torch.load('vae.pt')
model.load_state_dict(state_dict)  # <== I get the error here

AttributeError: 'StateKey' object has no attribute 'startswith'

I get that the model is being saved so that it can be recovered ready for use with torchbearer, but how can we save the model lean?

It seems like here, the model is only saved for reusability by torchbearer.
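
For what it's worth, plain PyTorch can always write a lean copy of just the weights after training, outside of torchbearer's checkpointers. A minimal sketch (model_lean.pt is just a placeholder name):

import torch

# Save only the module's parameters, not the full torchbearer state.
torch.save(model.state_dict(), 'model_lean.pt')

# Later, restore into a fresh instance of the same architecture.
model = MyModule()
model.load_state_dict(torch.load('model_lean.pt'))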

Thanks a lot!

@ethanwharris (Member)

Hello,

Thanks for your feedback :) We need to document this better, but you should hopefully be able to load the model with model.load_state_dict(state_dict[torchbearer.MODEL]), provided model is a subclass of nn.Module. Let us know if that works. I'll leave the issue open, as this should be presented better; any ideas you may have are welcome!
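
Concretely, a minimal sketch based on the snippet above (MyModule and vae.pt are from the original report):

import torch
import torchbearer

# The checkpoint is torchbearer's full state dict, keyed by StateKey
# objects rather than strings, which is why passing it straight to
# nn.Module.load_state_dict raises the 'startswith' AttributeError.
checkpoint = torch.load('vae.pt')

model = MyModule()
model.load_state_dict(checkpoint[torchbearer.MODEL])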

@dorukhansergin (Contributor, Author)

@ethanwharris, thank you so much for the prompt reply.

I will try to create a simple script to replicate the error(s), along with the details of my configuration. I came up with a band-aid solution, which I hope to publish too for people who might be experiencing the same problem. Or maybe I'm just missing something.

I see that you got rid of pass_state in the master repo. This issue might be related to that.

@dorukhansergin (Contributor, Author)

Okay, now I get it.

  • model.load_state_dict(state_dict[torchbearer.MODEL]) works great. Thanks for the heads-up.
  • I totally forgot that I defined forward's signature with state so that I could pass state. My post-training operations assume the signature to be forward(x) only, so they throw errors about state not being passed. I therefore need to overload forward in my Module with the original signature to be on the safe side (a sketch of one such fix follows below). I don't know if this is a common problem, but it could be added to the documentation. In fact, as an apology for taking up your time, allow me to open the pull request for it.
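
A minimal sketch of that signature fix, making state an optional keyword instead of overloading (this default-argument variant is my own workaround, and the layer is a placeholder):

import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 2)  # placeholder layer

    def forward(self, x, state=None):
        # state is supplied when running inside a torchbearer Trial,
        # but defaults to None so plain calls like model(x) still work.
        return self.linear(x)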

I'm a big fan of torchbearer and the team's work. Keep it up!

Thanks for the help!

@ethanwharris (Member)

No problem, glad to be of help! A PR would be much appreciated, as this is definitely something we should document better.

@ethanwharris (Member)

Closed by #508
