Checkpoints cannot be loaded in non-pl env #2653

Closed
s-rog opened this issue Jul 21, 2020 · 9 comments · Fixed by #3287
Labels
bug (Something isn't working), help wanted (Open to be worked on)

Comments

@s-rog
Contributor

s-rog commented Jul 21, 2020

## 🚀 Feature
Add an option to save only state_dict for ModelCheckpoint callbacks

## 🐛 Bug

PL checkpoints cannot be loaded in non-pl envs

Motivation

To be able to move trained models and weights into pytorch only environments

Additional context

Currently, calling torch.load() on a PL-generated checkpoint in an environment without PL raises a pickling error. For my current use case, I have to load the checkpoints in my training environment and save them again with only the state_dict for the weights.

See reply below for more info
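
A minimal sketch of that workaround (run inside the PL/training environment; the checkpoint path and output filename below are placeholders):

import torch

# Load the PL checkpoint in the PL environment and re-save only the
# state_dict so it can be read later without pytorch_lightning installed.
ckpt = torch.load("lightning_logs/version_0/checkpoints/epoch=9.ckpt", map_location="cpu")
torch.save(ckpt["state_dict"], "weights_only.pth")

# In the PyTorch-only environment this then works:
# state_dict = torch.load("weights_only.pth", map_location="cpu")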

s-rog added the feature (Is an improvement or enhancement) and help wanted (Open to be worked on) labels on Jul 21, 2020
@rohitgr7
Contributor

You can use the save_weights_only parameter in ModelCheckpoint to save only the weights. It will still save epoch, global_step and the PL version, but that won't be a problem there, I guess. Also, can you show the pickling error you are getting?
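
For reference, a minimal sketch of that suggestion (how the callback is passed to Trainer differs between PL versions, and the model/dataloader setup is omitted):

import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

# Save only the model weights (plus epoch/global_step/pl_version metadata),
# skipping optimizer, lr scheduler and callback states.
checkpoint_cb = ModelCheckpoint(save_weights_only=True)

trainer = pl.Trainer(callbacks=[checkpoint_cb], max_epochs=10)
# trainer.fit(model)  # model is a LightningModule defined elsewhere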

@s-rog
Contributor Author

s-rog commented Jul 22, 2020

I am using save_weights_only and that still causes a pickling error about the pytorch_lightning module not being found (don't have the exact error atm)

@rohitgr7
Contributor

Can you load that checkpoint manually in a PL env and check what keys the file has?

@s-rog
Contributor Author

s-rog commented Aug 27, 2020

Error in non-pl env

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-10-dbc5018f5317> in <module>
----> 1 pretrained_dict = torch.load('../input/weights/test.ckpt', map_location=torch.device('cpu'))

/opt/conda/lib/python3.7/site-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
    591                     return torch.jit.load(f)
    592                 return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
--> 593         return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
    594 
    595 

/opt/conda/lib/python3.7/site-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
    771     unpickler = pickle_module.Unpickler(f, **pickle_load_args)
    772     unpickler.persistent_load = persistent_load
--> 773     result = unpickler.load()
    774 
    775     deserialized_storage_keys = pickle_module.load(f, **pickle_load_args)

ModuleNotFoundError: No module named 'pytorch_lightning'

keys in pl env
dict_keys(['epoch', 'global_step', 'pytorch-lightning_version', 'state_dict', 'hparams_name', 'hyper_parameters'])

@rohitgr7 sorry about the late reply, completely forgot about this issue

Edit:
found the issue, I'll look into a fix

# print the type of every top-level value in the checkpoint
for k, v in pretrained_dict.items():
    print(type(k), type(v))

<class 'str'> <class 'int'>
<class 'str'> <class 'int'>
<class 'str'> <class 'str'>
<class 'str'> <class 'collections.OrderedDict'>
<class 'str'> <class 'str'>
<class 'str'> <class 'pytorch_lightning.utilities.parsing.AttributeDict'>

Edit 2:
I'll submit a PR after refactor week!

s-rog changed the title from "Option to save only 'state_dict' for checkpoints" to "Checkpoints cannot be loaded in non-pl env" on Aug 27, 2020
Borda added the bug (Something isn't working) label and removed the feature (Is an improvement or enhancement) label on Aug 27, 2020
@s-rog
Contributor Author

s-rog commented Aug 31, 2020

I got around to testing and can now load checkpoints in non-PL envs. The only change needed was to cast hyper_parameters to a dict in dump_checkpoint in pytorch_lightning/trainer/training_io.py:

- checkpoint[LightningModule.CHECKPOINT_HYPER_PARAMS_KEY] = model.hparams
+ checkpoint[LightningModule.CHECKPOINT_HYPER_PARAMS_KEY] = dict(model.hparams)

Thoughts?
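
With hyper_parameters stored as a plain dict, loading in a PyTorch-only environment should work along these lines (a sketch; "test.ckpt" is a placeholder path, and the key-prefix handling depends on how the LightningModule wraps its submodules):

import torch

# No pytorch_lightning import is needed once the checkpoint contains only
# built-in types and an OrderedDict state_dict.
ckpt = torch.load("test.ckpt", map_location="cpu")
print(ckpt.keys())  # epoch, global_step, pytorch-lightning_version, state_dict, ...

state_dict = ckpt["state_dict"]
# If the LightningModule held the network in an attribute (e.g. self.model),
# the keys carry that prefix and need stripping before use in plain PyTorch.
state_dict = {
    (k[len("model."):] if k.startswith("model.") else k): v
    for k, v in state_dict.items()
}
# plain_model.load_state_dict(state_dict)  # plain_model: a torch.nn.Module defined elsewhere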

@rohitgr7
Contributor

Yeah, this looks good to avoid that error, since AttributeDict is a PL thing.

@rohitgr7
Contributor

@s-rog, I tried on master with save_weights_only=True and these are the dict keys I got. No hyperparams.

dict_keys(['epoch', 'global_step', 'pytorch-lightning_version', 'state_dict'])

@s-rog
Contributor Author

s-rog commented Sep 1, 2020

@rohitgr7 Did the model have self.hparams?

If you look at dump_checkpoint(), the weights_only arg only controls whether
callbacks, optimizer_states, lr_schedulers, native_amp_scaling_state and amp_scaling_state are written.

hparams logging is controlled only by if model.hparams: (see the sketch below)
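
A rough sketch of that control flow (not the actual PL source, just the behaviour described above; the real function lives in pytorch_lightning/trainer/training_io.py):

def dump_checkpoint(model, weights_only=False):
    checkpoint = {
        "epoch": ...,
        "global_step": ...,
        "pytorch-lightning_version": ...,
    }

    if not weights_only:
        # only these entries are gated by weights_only
        checkpoint["callbacks"] = ...
        checkpoint["optimizer_states"] = ...
        checkpoint["lr_schedulers"] = ...
        checkpoint["native_amp_scaling_state"] = ...
        checkpoint["amp_scaling_state"] = ...

    # hparams are added whenever the model defines them, regardless of weights_only
    if model.hparams:
        checkpoint["hyper_parameters"] = dict(model.hparams)  # the proposed cast to a plain dict

    checkpoint["state_dict"] = model.state_dict()
    return checkpoint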

@rohitgr7
Copy link
Contributor

rohitgr7 commented Sep 1, 2020

ok, yeah my bad :)
