
Issue using BertTokenizer (AttributeError) #119

Open
Khazmadoz opened this issue Dec 13, 2022 · 2 comments

Comments

@Khazmadoz

Hey there,

I'm trying to use your transformers-interpret package with a bert-base-uncased model and the corresponding tokenizer. I'm loading the tokenizer using:

tokenizer = BertTokenizer.from_pretrained(BERT_USED_MODEL_PATH, do_lower_case=True)

When I then create the explainer with:

from transformers_interpret import SequenceClassificationExplainer
cls_explainer = SequenceClassificationExplainer(model=model, tokenizer=tokenizer)

I get an error:
Traceback (most recent call last):
  File "/home/khazmadoz/ML_GG/main.py", line 232, in <module>
    cls_explainer = SequenceClassificationExplainer(model=model,
  File "/home/khazmadoz/anaconda3/lib/python3.9/site-packages/transformers_interpret/explainers/text/sequence_classification.py", line 53, in __init__
    super().__init__(model, tokenizer)
  File "/home/khazmadoz/anaconda3/lib/python3.9/site-packages/transformers_interpret/explainer.py", line 22, in __init__
    self.ref_token_id = self.tokenizer.pad_token_id
AttributeError: 'BertTokenizer' object has no attribute 'pad_token_id'

Do you know how to fix this? The tokenizer has an attribute 'pad_token' but no 'pad_token_id'.
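
For comparison, a stock bert-base-uncased tokenizer exposes both attributes, so a quick sanity check (a minimal sketch, assuming the standard Hugging Face checkpoint name rather than my local path) would be:

from transformers import BertTokenizer

# load the standard checkpoint instead of the local fine-tuned path
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=True)

print(tokenizer.pad_token)                                   # '[PAD]'
# pad_token_id is normally just the numerical id of pad_token
print(tokenizer.convert_tokens_to_ids(tokenizer.pad_token))  # 0 for bert-base-uncased
print(tokenizer.pad_token_id)                                # should also be 0 if the attribute is present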

Thank you,
David

@cdpierse
Owner

Hi @Khazmadoz, is this a custom tokenizer? It seems odd that it would have a pad_token but not a pad_token_id, which is just its numerical form. Would you be able to run the code below and paste your output?

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(PATH_TO_YOUR_TOKENIZER)
print(tokenizer.all_special_tokens)
print(tokenizer.all_special_ids)
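
For reference, and assuming a stock bert-base-uncased tokenizer rather than a customised one, that should print something along the lines of the following (the exact ordering can differ between transformers versions):

['[UNK]', '[SEP]', '[PAD]', '[CLS]', '[MASK]']
[100, 102, 0, 101, 103]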

@Khazmadoz
Author

Khazmadoz commented Dec 15, 2022

Hey @cdpierse,

The script I use was written by one of my colleagues, who got it from someone else, and I came to the conclusion that it just wasn't nicely coded. I adapted another example of Twitter analysis with a multiclass model, using an AutoTokenizer, and now it works nicely 😄 Thank you for your help anyway. However, I wondered whether there is a way to display all the words in a dataset that the fine-tuned model used for decisions in favour of a chosen class (I think LIME has something like that, a kind of tree). I don't know if you have already implemented something like this; I will dive into the documentation to see if there is. If not, it would be a very nice feature to add. 😊 A rough sketch of what I mean is below.
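
A minimal sketch, not an existing feature of transformers-interpret as far as I know: it assumes model, tokenizer and a list of strings texts already exist, and that class_name matches one of the model's label names.

from collections import defaultdict

from transformers_interpret import SequenceClassificationExplainer

cls_explainer = SequenceClassificationExplainer(model=model, tokenizer=tokenizer)

word_totals = defaultdict(float)
for text in texts:
    # attributions are computed with respect to the chosen class rather than the predicted one
    for word, score in cls_explainer(text, class_name="POSITIVE"):
        word_totals[word] += score

# words that, summed over the dataset, pushed the model most strongly towards that class
top_words = sorted(word_totals.items(), key=lambda kv: kv[1], reverse=True)[:20]
print(top_words)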
