
Bug in evaluate #4

Closed
Wangpeiyi9979 opened this issue May 19, 2021 · 15 comments

@Wangpeiyi9979

Hi, when I run
python bin/predict_amrs.py \
    --datasets <AMR-ROOT>/data/amrs/split/test/*.txt \
    --gold-path data/tmp/amr2.0/gold.amr.txt \
    --pred-path data/tmp/amr2.0/pred.amr.txt \
    --checkpoint runs/<checkpoint>.pt \
    --beam-size 5 \
    --batch-size 500 \
    --device cuda \
    --penman-linearization --use-pointer-tokens

I get the following error:

RuntimeError: Error(s) in loading state_dict for AMRBartForConditionalGeneration:
size mismatch for final_logits_bias: copying a param with shape torch.Size([1, 53587]) from checkpoint, the shape in current model is torch.Size([1, 53075])

Could you help me?
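(A minimal diagnostic sketch for this kind of mismatch, assuming the checkpoint stores its weights under a 'model' key as in the load call quoted later in this thread; the path below is a placeholder:)

    import torch

    # Illustrative only, not part of the SPRING codebase.
    ckpt_path = "runs/checkpoint.pt"  # placeholder; point this at your checkpoint
    state = torch.load(ckpt_path, map_location="cpu")["model"]

    # final_logits_bias has shape (1, vocab_size), so its last dimension tells
    # you which vocabulary size the checkpoint was saved with.
    print("checkpoint vocab size:", state["final_logits_bias"].shape[-1])
    # If this differs from the size the current code builds (53587 vs 53075 in
    # the error above), the model/tokenizer were instantiated with a different
    # set of special tokens than the ones used when the checkpoint was saved.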

@Wangpeiyi9979
Author

Wangpeiyi9979 commented May 24, 2021

The <checkpoint>.pt is the checkpoint saved during training.

@mbevila
Collaborator

mbevila commented May 24, 2021

So you have trained your own model? Does it work with the pretrained checkpoints we released?

@Wangpeiyi9979
Author

Oh, sorry, it works well after I re-downloaded bart-large. It is strange.
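(For readers hitting the same thing: "re-downloading bart-large" usually just means clearing the cached weights so transformers fetches them again on the next run. The exact cache location depends on the transformers version; the path below is the usual default for the 2.x line pinned in the README, so verify it before deleting anything:)

    # Remove cached model files so they are re-downloaded on the next run.
    rm -rf ~/.cache/torch/transformers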

@eloitanguy

Hello, I believe I am experiencing the same issue.
I have been trying to use predict_amrs_from_plaintext.py to test the pre-trained text-to-AMR checkpoint on my own text files.

I am also hitting Wangpeiyi9979's error:

RuntimeError: Error(s) in loading state_dict for AMRBartForConditionalGeneration:
        size mismatch for final_logits_bias: copying a param with shape torch.Size([1, 53587]) from checkpoint, the shape in current model is torch.Size([1, 53075]).
        size mismatch for model.shared.weight: copying a param with shape torch.Size([53587, 1024]) from checkpoint, the shape in current model is torch.Size([53075, 1024]).
        size mismatch for model.encoder.embed_tokens.weight: copying a param with shape torch.Size([53587, 1024]) from checkpoint, the shape in current model is torch.Size([53075, 1024]).
        size mismatch for model.decoder.embed_tokens.weight: copying a param with shape torch.Size([53587, 1024]) from checkpoint, the shape in current model is torch.Size([53075, 1024]).

Since Wangpeiyi9979 mentioned this issue could come from BART-large, I deleted the entire model cache and tried again, but hit the same error.

Additional information:

  • I am running in a separate conda env (Python 3.7).
  • I have followed the README's installation instructions (notably, I do have transformers 2.11.0, which will be a problem for me later on, since I would like to integrate this AMR pipeline into my research project's pipeline).

I would like to apologise if this error comes from a poor use of your code or an improper installation.

Thank you very much for your work on AMR and thanks in advance for your response!

@mbevila
Collaborator

mbevila commented Jun 29, 2021

Can you try re-downloading the pretrained weights? A few months back (I think) we pruned a few params in the checkpoint that did not play well with the current code.

@eloitanguy

Thank you for your answer! I re-downloaded the 3.0 parsing weights and the same issue arose...

@eloitanguy

Hello again, if you have the time I would greatly appreciate some help; my issue still hasn't been resolved... Thanks in advance!

@mbevila
Collaborator

mbevila commented Jul 5, 2021

Try to use this checkpoint:

https://drive.google.com/file/d/1p7oyQPacWSF-WTXapaA55TPuRP_pJ-Rc/view?usp=sharing

Does it work?

@mbevila mbevila reopened this Jul 5, 2021
@eloitanguy

Thank you for taking the time to send me this checkpoint, I'm sorry to say I still have the same error...

@mbevila
Collaborator

mbevila commented Jul 6, 2021

Clone the repository again and create a new env from scratch. It worked for me! As a last resort, try patching the checkpoint with https://github.com/SapienzaNLP/spring/blob/main/bin/patch_legacy_checkpoint.py, but the checkpoint I sent you was already patched.

@eloitanguy

Hello again, I created a new conda env (with Python 3.7), installed the requirements from requirements.txt, and ran:

python bin/predict_amrs_from_plaintext.py --texts test_text.txt --checkpoint AMR3.amr-lin3.patched.pt

And observed the usual result:

wandb: WARNING W&B installed but not logged in.  Run `wandb login` or set the WANDB_API_KEY env variable.
Traceback (most recent call last):
  File "bin/predict_amrs_from_plaintext.py", line 90, in <module>
    model.load_state_dict(torch.load(args.checkpoint, map_location='cpu')['model'])
  File "/home/eloi/miniconda3/envs/spring/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for AMRBartForConditionalGeneration:
	size mismatch for final_logits_bias: copying a param with shape torch.Size([1, 53587]) from checkpoint, the shape in current model is torch.Size([1, 53075]).
	size mismatch for model.shared.weight: copying a param with shape torch.Size([53587, 1024]) from checkpoint, the shape in current model is torch.Size([53075, 1024]).
	size mismatch for model.encoder.embed_tokens.weight: copying a param with shape torch.Size([53587, 1024]) from checkpoint, the shape in current model is torch.Size([53075, 1024]).
	size mismatch for model.decoder.embed_tokens.weight: copying a param with shape torch.Size([53587, 1024]) from checkpoint, the shape in current model is torch.Size([53075, 1024]).

I am as puzzled as you are; I really don't see why it would behave differently on our two machines...

@eloitanguy

FYI, I tried the same thing on a different machine and encountered the same error :(

@mbevila
Collaborator

mbevila commented Jul 8, 2021

You are missing two arguments: --penman-linearization --use-pointer-tokens
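For reference, adding those two flags to the command quoted above gives:

    python bin/predict_amrs_from_plaintext.py \
        --texts test_text.txt \
        --checkpoint AMR3.amr-lin3.patched.pt \
        --penman-linearization --use-pointer-tokens

These flags make the script build the same linearization and pointer-token vocabulary the released checkpoint expects; the extra special tokens plausibly account for the 512-entry gap between the two vocabulary sizes in the error (53587 vs 53075).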

@eloitanguy

Thank you very much, this seems to have been my issue all along, embarrassingly enough. The issue is solved. I'll now look into whether it is possible to run this with the latest version of transformers (mentioning it in case you have any insight on that matter), and I'll let you know if it works out! ^^

@mbevila
Collaborator

mbevila commented Jul 8, 2021

Thanks a lot!

In the meantime, I'll close the issue.
