
ValueError: "Using pad_token, but it is not set yet." while using GPT-Neo model from Hugging Face #1418

Closed
gousemd73 opened this issue Feb 11, 2022 · 4 comments


@gousemd73

Hi,
I am trying to use the GPT-Neo model from the Hugging Face library to generate sentence embeddings with the Sentence Transformers library.

from sentence_transformers import SentenceTransformer
gpt = SentenceTransformer('EleutherAI/gpt-neo-1.3B')
embeddings = gpt.encode(['This is example of using GPT'])

Running the code above to generate the sentence embeddings raises the following error:
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via `tokenizer.add_special_tokens({'pad_token': '[PAD]'})`.

As far as I can tell, the sentence-transformers library offers no way to add the required special token to the tokenizer. Can anyone please help me with this error?

Environment :
python - 3.8
sentence-transformers - 2.0.0
transformers - 4.11.1

@nreimers
Member

This model is not supported. I also don't think it would work well. Encoder models like bert/roberta/mpnet-base work better.

@gousemd73
Author

Is there a chance of including this model in the sentence-transformers package, so that we can generate sentence embeddings as easily as with encoder models?

@nreimers
Member

Hmm, not sure if it will be easy. As it lacks a padding token, it is hard to use in a batched fashion.
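To illustrate why batching needs a padding token, here is a minimal pure-Python sketch. The function `pad_batch` and the token ids are hypothetical and are not the transformers API: shorter sequences are right-padded with `pad_id`, and an attention mask marks real tokens with 1 and padding with 0. Without a pad id there is nothing to fill the gaps with, so the sketch raises a `ValueError`, analogous to the tokenizer check reported in this issue. As far as I can tell, sentence-transformers tokenizes with padding enabled even for a single sentence, which would explain why the one-sentence example still hits the error. The value 50256 is assumed here as a stand-in for the GPT-2/GPT-Neo `<|endoftext|>` (EOS) id.

```python
from typing import List, Optional, Tuple


def pad_batch(
    sequences: List[List[int]], pad_id: Optional[int]
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad token-id sequences to a common length and build attention masks."""
    if pad_id is None:
        # No pad token means there is no id to fill the shorter sequences with.
        raise ValueError("Asking to pad but no padding token is set")
    if not sequences:
        return [], []
    max_len = max(len(seq) for seq in sequences)
    padded, masks = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padded.append(seq + [pad_id] * n_pad)        # fill with the pad id
        masks.append([1] * len(seq) + [0] * n_pad)   # 1 = real token, 0 = padding
    return padded, masks


# Hypothetical token ids; EOS_ID is an assumed stand-in for GPT-Neo's EOS id.
EOS_ID = 50256
batch = [[101, 102, 103], [104]]
ids, mask = pad_batch(batch, pad_id=EOS_ID)
print(ids)   # [[101, 102, 103], [104, 50256, 50256]]
print(mask)  # [[1, 1, 1], [1, 0, 0]]
```

Reusing the EOS id as `pad_id` is the same idea as `tokenizer.pad_token = tokenizer.eos_token`: the mask, not the id itself, tells downstream code which positions to ignore.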

@gousemd73
Author

The issue was solved by setting the tokenizer's pad token before generating the embeddings.
Modified code:

from sentence_transformers import SentenceTransformer
gpt = SentenceTransformer('EleutherAI/gpt-neo-1.3B')
# GPT-Neo ships without a pad token, so reuse the end-of-sequence token for padding
gpt.tokenizer.pad_token = gpt.tokenizer.eos_token
embeddings = gpt.encode(['This is example of using GPT'])

Thanks @nreimers for your quick reply and help.
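Reusing `eos_token` as the pad token should not change the embeddings of the real tokens. This rests on two assumptions: that sentence-transformers applies its default mean pooling when loading a plain Hugging Face checkpoint such as `EleutherAI/gpt-neo-1.3B`, and that padding is on the right, so the causal model's hidden states for real tokens never attend to the padding. The toy sketch below covers the pooling half of that argument. The function `masked_mean_pool` and the 2-d vectors are hypothetical: positions whose attention mask is 0 are skipped, so whatever the model emits at pad positions does not affect the pooled vector.

```python
from typing import List


def masked_mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average token vectors, skipping positions where the attention mask is 0."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, m in zip(token_embeddings, attention_mask):
        if m:  # only real tokens contribute to the sum
            for i, x in enumerate(vec):
                total[i] += x
            count += 1
    count = max(count, 1)  # guard against an all-padding row
    return [t / count for t in total]


# Toy 2-d "hidden states" for a 2-token sentence padded to length 4.
real = [[1.0, 2.0], [3.0, 4.0]]
pad_states_a = [[9.0, 9.0], [9.0, 9.0]]   # states at pad positions with one pad token
pad_states_b = [[-5.0, 0.0], [7.0, 1.0]]  # a different pad token would give different states
mask = [1, 1, 0, 0]

emb_a = masked_mean_pool(real + pad_states_a, mask)
emb_b = masked_mean_pool(real + pad_states_b, mask)
print(emb_a, emb_b)  # [2.0, 3.0] [2.0, 3.0]
```

Because only masked-in positions are averaged, the choice of pad token matters only in that one must exist for the tokenizer to build the batch.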
