
finetune: Cannot load model parameters from checkpoint #23

Closed
FiorellaArtuso opened this issue May 24, 2022 · 2 comments
@FiorellaArtuso

I was trying to fine-tune your pretrained model. However, when I launch ./command/finetune/finetune.sh, I get the following error:

2022-05-24 12:51:00 | INFO | fairseq_cli.train | task: SimilarityTask
2022-05-24 12:51:00 | INFO | fairseq_cli.train | model: TrexModel
2022-05-24 12:51:00 | INFO | fairseq_cli.train | criterion: SimilarityCriterion
2022-05-24 12:51:00 | INFO | fairseq_cli.train | num. shared model params: 61,787,413 (num. trained: 61,787,413)
2022-05-24 12:51:00 | INFO | fairseq_cli.train | num. expert model params: 0 (num. trained: 0)
2022-05-24 12:51:00 | INFO | fairseq.data.data_utils | loaded 2,005 examples from: data-bin/comp_similarity/input0/static/valid
2022-05-24 12:51:00 | INFO | fairseq.data.data_utils | loaded 2,005 examples from: data-bin/comp_similarity/input1/static/valid
2022-05-24 12:51:00 | INFO | fairseq.data.data_utils | loaded 2,005 examples from: data-bin/comp_similarity/input0/inst_pos_emb/valid
2022-05-24 12:51:00 | INFO | fairseq.data.data_utils | loaded 2,005 examples from: data-bin/comp_similarity/input1/inst_pos_emb/valid
2022-05-24 12:51:00 | INFO | fairseq.data.data_utils | loaded 2,005 examples from: data-bin/comp_similarity/input0/op_pos_emb/valid
2022-05-24 12:51:00 | INFO | fairseq.data.data_utils | loaded 2,005 examples from: data-bin/comp_similarity/input1/op_pos_emb/valid
2022-05-24 12:51:00 | INFO | fairseq.data.data_utils | loaded 2,005 examples from: data-bin/comp_similarity/input0/arch_emb/valid
2022-05-24 12:51:00 | INFO | fairseq.data.data_utils | loaded 2,005 examples from: data-bin/comp_similarity/input1/arch_emb/valid
2022-05-24 12:51:00 | INFO | fairseq.data.data_utils | loaded 2,005 examples from: data-bin/comp_similarity/input0/byte1/valid
2022-05-24 12:51:00 | INFO | fairseq.data.data_utils | loaded 2,005 examples from: data-bin/comp_similarity/input1/byte1/valid
2022-05-24 12:51:00 | INFO | fairseq.data.data_utils | loaded 2,005 examples from: data-bin/comp_similarity/input0/byte2/valid
2022-05-24 12:51:00 | INFO | fairseq.data.data_utils | loaded 2,005 examples from: data-bin/comp_similarity/input1/byte2/valid
2022-05-24 12:51:00 | INFO | fairseq.data.data_utils | loaded 2,005 examples from: data-bin/comp_similarity/input0/byte3/valid
2022-05-24 12:51:00 | INFO | fairseq.data.data_utils | loaded 2,005 examples from: data-bin/comp_similarity/input1/byte3/valid
2022-05-24 12:51:00 | INFO | fairseq.data.data_utils | loaded 2,005 examples from: data-bin/comp_similarity/input0/byte4/valid
2022-05-24 12:51:00 | INFO | fairseq.data.data_utils | loaded 2,005 examples from: data-bin/comp_similarity/input1/byte4/valid
2022-05-24 12:51:00 | INFO | fairseq.tasks.similarity | Loaded valid with #samples: 2005
2022-05-24 12:51:00 | INFO | fairseq.trainer | detected shared parameter: encoder.sentence_encoder.embed_tokens.static.weight <- encoder.lm_code_head.weight
2022-05-24 12:51:00 | INFO | fairseq_cli.train | training on 1 devices (GPUs/TPUs)
2022-05-24 12:51:00 | INFO | fairseq_cli.train | max tokens per device = None and max sentences per device = 16
2022-05-24 12:51:00 | INFO | fairseq.trainer | Preparing to load checkpoint checkpoints/similarity/checkpoint_best.pt
2022-05-24 12:51:01 | INFO | fairseq.models.trex.model | Overwriting classification_heads.similarity.dense.weight
2022-05-24 12:51:01 | INFO | fairseq.models.trex.model | Overwriting classification_heads.similarity.dense.bias
2022-05-24 12:51:01 | INFO | fairseq.models.trex.model | Overwriting classification_heads.similarity.out_proj.weight
2022-05-24 12:51:01 | INFO | fairseq.models.trex.model | Overwriting classification_heads.similarity.out_proj.bias
Traceback (most recent call last):
  File "/home/trex/fairseq/trainer.py", line 460, in load_checkpoint
    self.model.load_state_dict(
  File "/home/trex/fairseq/models/fairseq_model.py", line 125, in load_state_dict
    return super().load_state_dict(new_state_dict, strict)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1497, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for TrexModel:
        Missing key(s) in state_dict: "encoder.sentence_encoder.embed_bytes.weight".
        size mismatch for encoder.sentence_encoder.byte_combine.convolutions.0.weight: copying a param with shape torch.Size([4, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 768, 1]).
        size mismatch for encoder.sentence_encoder.byte_combine.convolutions.0.bias: copying a param with shape torch.Size([4]) from checkpoint, the shape in current model is torch.Size([64]).
        size mismatch for encoder.sentence_encoder.byte_combine.convolutions.1.weight: copying a param with shape torch.Size([8, 1, 2]) from checkpoint, the shape in current model is torch.Size([128, 768, 2]).
        size mismatch for encoder.sentence_encoder.byte_combine.convolutions.1.bias: copying a param with shape torch.Size([8]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for encoder.sentence_encoder.byte_combine.convolutions.2.weight: copying a param with shape torch.Size([12, 1, 3]) from checkpoint, the shape in current model is torch.Size([192, 768, 3]).
        size mismatch for encoder.sentence_encoder.byte_combine.convolutions.2.bias: copying a param with shape torch.Size([12]) from checkpoint, the shape in current model is torch.Size([192]).
        size mismatch for encoder.sentence_encoder.byte_combine.highway.layers.0.weight: copying a param with shape torch.Size([48, 24]) from checkpoint, the shape in current model is torch.Size([768, 384]).
        size mismatch for encoder.sentence_encoder.byte_combine.highway.layers.0.bias: copying a param with shape torch.Size([48]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for encoder.sentence_encoder.byte_combine.highway.layers.1.weight: copying a param with shape torch.Size([48, 24]) from checkpoint, the shape in current model is torch.Size([768, 384]).
        size mismatch for encoder.sentence_encoder.byte_combine.highway.layers.1.bias: copying a param with shape torch.Size([48]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for encoder.sentence_encoder.byte_combine.projection.weight: copying a param with shape torch.Size([768, 24]) from checkpoint, the shape in current model is torch.Size([768, 384]).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 14, in <module>
    cli_main()
  File "/home/trex/fairseq_cli/train.py", line 496, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/home/trex/fairseq/distributed/utils.py", line 369, in call_main
    main(cfg, **kwargs)
  File "/home/trex/fairseq_cli/train.py", line 149, in main
    extra_state, epoch_itr = checkpoint_utils.load_checkpoint(
  File "/home/trex/fairseq/checkpoint_utils.py", line 213, in load_checkpoint
    extra_state = trainer.load_checkpoint(
  File "/home/trex/fairseq/trainer.py", line 472, in load_checkpoint
    raise Exception(
Exception: Cannot load model parameters from checkpoint checkpoints/similarity/checkpoint_best.pt; please ensure that the architectures match.

How can I solve this?
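For reference, one way to narrow down this kind of error is to compare the parameter shapes stored in the checkpoint against those of the freshly built model before calling load_state_dict. In practice the checkpoint shapes would come from torch.load("checkpoints/similarity/checkpoint_best.pt")["model"] (fairseq checkpoints keep the model weights under the "model" key); the hard-coded shape dictionaries below are taken from the log above so the sketch is self-contained:

```python
# Sketch: diagnose a load_state_dict failure by diffing parameter shapes.
# The two dictionaries below stand in for
#   torch.load(ckpt_path)["model"]  (checkpoint) and
#   model.state_dict()              (current model),
# with shapes copied from the error log above.

checkpoint_shapes = {
    "encoder.sentence_encoder.byte_combine.convolutions.0.weight": (4, 1, 1),
    "encoder.sentence_encoder.byte_combine.projection.weight": (768, 24),
}
model_shapes = {
    "encoder.sentence_encoder.byte_combine.convolutions.0.weight": (64, 768, 1),
    "encoder.sentence_encoder.byte_combine.projection.weight": (768, 384),
}

def diff_shapes(ckpt, model):
    """Return {name: (ckpt_shape, model_shape)} for every problematic parameter.

    A None on either side marks an unexpected / missing key rather than
    a plain size mismatch.
    """
    problems = {}
    for name, shape in ckpt.items():
        if name not in model:
            problems[name] = (shape, None)           # unexpected key in checkpoint
        elif model[name] != shape:
            problems[name] = (shape, model[name])    # size mismatch
    for name, shape in model.items():
        if name not in ckpt:
            problems[name] = (None, shape)           # key missing from checkpoint
    return problems

for name, (ckpt_shape, model_shape) in diff_shapes(checkpoint_shapes, model_shapes).items():
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

A systematic mismatch like the one above (every byte_combine tensor is smaller in the checkpoint than in the model, or vice versa) usually points at a configuration difference between pre-training and fine-tuning rather than a corrupted file.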

@FiorellaArtuso
Author

Sorry, it was a problem on my side!

@zztian007

I have a similar problem when running get_embedding.py. How did you solve the size-mismatch problem?

RuntimeError: Error(s) in loading state_dict for TrexModel:
        Unexpected key(s) in state_dict: "encoder.sentence_encoder.embed_bytes.weight".
        size mismatch for encoder.sentence_encoder.byte_combine.convolutions.0.weight: copying a param with shape torch.Size([64, 768, 1]) from checkpoint, the shape in current model is torch.Size([4, 1, 1]).
        size mismatch for encoder.sentence_encoder.byte_combine.convolutions.0.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([4]).
        size mismatch for encoder.sentence_encoder.byte_combine.convolutions.1.weight: copying a param with shape torch.Size([128, 768, 2]) from checkpoint, the shape in current model is torch.Size([8, 1, 2]).
        size mismatch for encoder.sentence_encoder.byte_combine.convolutions.1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([8]).
        size mismatch for encoder.sentence_encoder.byte_combine.convolutions.2.weight: copying a param with shape torch.Size([192, 768, 3]) from checkpoint, the shape in current model is torch.Size([12, 1, 3]).
        size mismatch for encoder.sentence_encoder.byte_combine.convolutions.2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([12]).
        size mismatch for encoder.sentence_encoder.byte_combine.highway.layers.0.weight: copying a param with shape torch.Size([768, 384]) from checkpoint, the shape in current model is torch.Size([48, 24]).
        size mismatch for encoder.sentence_encoder.byte_combine.highway.layers.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for encoder.sentence_encoder.byte_combine.highway.layers.1.weight: copying a param with shape torch.Size([768, 384]) from checkpoint, the shape in current model is torch.Size([48, 24]).
        size mismatch for encoder.sentence_encoder.byte_combine.highway.layers.1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for encoder.sentence_encoder.byte_combine.projection.weight: copying a param with shape torch.Size([768, 384]) from checkpoint, the shape in current model is torch.Size([768, 24]).
