
Issue with BERT implementation #3

Open
ajfisch opened this issue Oct 4, 2019 · 2 comments

Comments


ajfisch commented Oct 4, 2019

Hi,

It seems that you're trying to decode auto-regressively using BERT representations as a drop-in replacement for word embeddings. But BERT is bi-directional; the representation at token i has information about all tokens j > i. So, your model already knows what it needs to predict, before it predicts it.

For this to be correct, you need to mask attention to all tokens j > i, which I don't think you currently do.
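
For illustration, here is a minimal sketch of the causal mask being described, using PyTorch's standard multi-head attention. The shapes and names are hypothetical, not taken from this repo; the point is only that position i is blocked from attending to any j > i:

```python
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    # True marks positions that may NOT be attended to:
    # entry (i, j) is True whenever j > i.
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

# Hypothetical example: with attn_mask set, token i only sees tokens <= i.
mha = torch.nn.MultiheadAttention(embed_dim=768, num_heads=12, batch_first=True)
x = torch.randn(2, 10, 768)  # (batch, seq_len, embed_dim)
out, _ = mha(x, x, x, attn_mask=causal_mask(10))
```

Note that for the leak described above, the mask would have to be applied inside the encoder's own self-attention, not just at the decoding step, since BERT's representations are already contextualized over the full sequence.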


leyuan commented Mar 29, 2020

@ajfisch I think you are right. By any chance, have you fixed the issue?


enes3774 commented Aug 2, 2022

@ajfisch I think you are right. You have to use an autoregressive model such as GPT-2; I think https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning is a good enough model for image captioning.
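
For what it's worth, a minimal sketch of autoregressive decoding with GPT-2 via Hugging Face transformers, just to show the left-to-right generation idea (this is plain text generation, not the captioning setup from this repo):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# GPT-2 uses causal attention, so each token only sees earlier tokens
# and there is no future-token leakage during generation.
inputs = tokenizer("A dog is running through the", return_tensors="pt")
output = model.generate(
    inputs.input_ids,
    max_length=20,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```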
