
RuntimeError: Error(s) in loading state_dict when adding/updating metrics to a trained model. #4666

@Vichoko


❓ Questions and Help

For context, I trained many models over several weeks, tracking loss and accuracy in the train, validation, and test steps.
Now I want to evaluate more metrics on the test set. Specifically, I added recall, precision, and confusion matrix metrics (from the ptl.metrics module) to the test_step and test_epoch_end methods of the Lightning module.

Also, I replaced my custom accuracy with the class-based Accuracy implemented in the ptl.metrics package.

When I try to test my model and get the metrics for the trained model on the test set, I get this error loading the checkpoint:

Traceback (most recent call last):
  File "model_manager.py", line 283, in <module>
    helper.test()
  File "model_manager.py", line 119, in test
    self.trainer.test(self.module)
  File "/data/anaconda3/envs/aidio2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 748, in test
    results = self.__test_given_model(model, test_dataloaders)
  File "/data/anaconda3/envs/aidio2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 813, in __test_given_model
    results = self.fit(model)
  File "/data/anaconda3/envs/aidio2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 459, in fit
    results = self.accelerator_backend.train()
  File "/data/anaconda3/envs/aidio2/lib/python3.8/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 61, in train
    self.trainer.train_loop.setup_training(model)
  File "/data/anaconda3/envs/aidio2/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 174, in setup_training
    self.trainer.checkpoint_connector.restore_weights(model)
  File "/data/anaconda3/envs/aidio2/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 75, in restore_weights
    self.restore(self.trainer.resume_from_checkpoint, on_gpu=self.trainer.on_gpu)
  File "/data/anaconda3/envs/aidio2/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 107, in restore
    self.restore_model_state(model, checkpoint)
  File "/data/anaconda3/envs/aidio2/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 128, in restore_model_state
    model.load_state_dict(checkpoint['state_dict'])
  File "/data/anaconda3/envs/aidio2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1044, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for L_WavenetLSTMClassifier:
        Unexpected key(s) in state_dict: "train_acc.correct", "train_acc.total", "val_acc.correct", "val_acc.total", "test_acc.correct", "test_acc.total".

What is your question?

In my case, retraining the models is not feasible because it takes many weeks. Is there a way to load the already trained model anyway and obtain the updated test metrics from a test run?

Actually, I only care about loading the model parameters to run the test cycle. I don't understand why the other entries have to be loaded; the old metric states don't seem essential to me.

What have you tried?

I read that this exception is raised by torch's model.load_state_dict method and can be avoided by passing strict=False.
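For reference, here is a minimal plain-PyTorch sketch of what strict=False does. The classes OldAccuracy, OldModel, and NewModel are hypothetical stand-ins, not the real L_WavenetLSTMClassifier or the Lightning metric classes; the buffers only mimic the "train_acc.correct" and "train_acc.total" entries from the error message.

```python
import torch
from torch import nn


class OldAccuracy(nn.Module):
    """Hypothetical stand-in for a metric that stored its state in the checkpoint."""

    def __init__(self):
        super().__init__()
        self.register_buffer("correct", torch.tensor(0))
        self.register_buffer("total", torch.tensor(0))


class OldModel(nn.Module):
    """The module as it looked when the checkpoint was saved."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)
        self.train_acc = OldAccuracy()


class NewModel(nn.Module):
    """The module after the metrics were changed; it no longer owns train_acc state."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)


old_state = OldModel().state_dict()
new_model = NewModel()

# strict=True (the default) would raise RuntimeError because of the extra keys.
# strict=False loads every matching key and reports the rest instead of raising.
result = new_model.load_state_dict(old_state, strict=False)
print("unexpected:", result.unexpected_keys)
print("missing:", result.missing_keys)
```

The returned object lists the keys that were skipped, so it is worth checking that only metric-related keys appear there and that no real weights were dropped silently.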

In my case I load the trained model through the resume_from_checkpoint parameter of the pytorch-lightning Trainer class, so I don't see where I could pass that option.
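One direction I am considering is to bypass resume_from_checkpoint and filter the saved state_dict by hand before loading it. Below is a rough sketch of that idea, not a confirmed Lightning recipe. The helper keep_expected_keys, the key "lstm.weight", and the checkpoint path in the comments are all made up for illustration; only the checkpoint['state_dict'] layout is taken from the traceback above.

```python
from typing import Any, Dict, Iterable


def keep_expected_keys(
    state_dict: Dict[str, Any], expected_keys: Iterable[str]
) -> Dict[str, Any]:
    """Return a copy of state_dict restricted to keys the current model defines."""
    expected = set(expected_keys)
    return {key: value for key, value in state_dict.items() if key in expected}


# Toy stand-in for checkpoint['state_dict']; real values would be tensors.
saved = {
    "lstm.weight": "w",        # hypothetical parameter name
    "train_acc.correct": 0,    # stale metric state reported in the error
    "train_acc.total": 0,
    "val_acc.correct": 0,
    "val_acc.total": 0,
    "test_acc.correct": 0,
    "test_acc.total": 0,
}
current_keys = ["lstm.weight"]  # would be model.state_dict().keys() in practice

filtered = keep_expected_keys(saved, current_keys)
print(sorted(filtered))

# With a real checkpoint the same idea would look roughly like this
# (assumption: the Trainer is created without resume_from_checkpoint):
#   checkpoint = torch.load("path/to/checkpoint.ckpt", map_location="cpu")
#   filtered = keep_expected_keys(checkpoint["state_dict"], model.state_dict().keys())
#   model.load_state_dict(filtered)
#   trainer.test(model)
```

If the updated module registers new state of its own, the filtered dict will be missing those keys, and a strict load would still fail; in that case strict=False would be needed here too.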

What's your environment?

  • OS: [e.g. iOS, Linux, Win]: Win
  • Version [e.g. 0.5.2.1]: Latest master branch (November 13th, 2020)
