It seems that you're trying to decode auto-regressively using BERT representations as a drop-in replacement for word embeddings. But BERT is bidirectional: the representation at token i contains information about all tokens j > i, so the model already knows what it needs to predict before it predicts it.
For this to be correct, you need to mask attention to all tokens j > i, which I don't think you do currently.
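For reference, here is a minimal sketch of such a causal (upper-triangular) attention mask in plain PyTorch; the function name is illustrative, and exactly where to apply the mask depends on how the model code computes attention:

```python
import torch

def causal_attention_mask(seq_len: int) -> torch.Tensor:
    # True entries mark positions that must NOT be attended to:
    # position i may only attend to positions j <= i.
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

# Example: a mask like this can be passed as `attn_mask` to
# torch.nn.MultiheadAttention so that each step cannot see future tokens.
mask = causal_attention_mask(4)
# tensor([[False,  True,  True,  True],
#         [False, False,  True,  True],
#         [False, False, False,  True],
#         [False, False, False, False]])
```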