some tokenizers do not have additional_special_tokens_ids attribute (#5642) (#5648)

Signed-off-by: arendu <adithya.r@gmail.com>

Co-authored-by: Adi Renduchintala <108822655+arendu@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
3 people authored and erastorgueva-nv committed Jan 12, 2023
1 parent bdaf431 commit 4e292c9
Showing 1 changed file with 7 additions and 3 deletions.
@@ -128,9 +128,13 @@ def validation_step(self, batch, batch_idx, inference=False):
 idx = pred.index(self.tokenizer.eos_id)
 pred = pred[:idx]
 
-pred = [id for id in pred if id not in self.tokenizer.tokenizer.additional_special_tokens_ids]
-label = [id for id in label if id not in self.tokenizer.tokenizer.additional_special_tokens_ids]
-enc_input = [id for id in enc_input if id not in self.tokenizer.tokenizer.additional_special_tokens_ids]
+additional_special_tokens_ids = []
+if hasattr(self.tokenizer.tokenizer, "additional_special_tokens_ids"):
+    additional_special_tokens_ids = self.tokenizer.tokenizer.additional_special_tokens_ids
+
+pred = [id for id in pred if id not in additional_special_tokens_ids]
+label = [id for id in label if id not in additional_special_tokens_ids]
+enc_input = [id for id in enc_input if id not in additional_special_tokens_ids]
 
 pred = self.tokenizer.ids_to_text(pred)
 label = self.tokenizer.ids_to_text(label)
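
The guard in this hunk can be tried in isolation. The following is a minimal sketch, not the code from the repository: the two tokenizer classes, the token ids, and the helper name `strip_additional_special_tokens` are hypothetical stand-ins, and `getattr` with a default is used as an equivalent of the `hasattr` check.

```python
class TokenizerWithSpecials:
    """Hypothetical stand-in for a tokenizer that exposes the attribute."""

    additional_special_tokens_ids = [32000, 32001]


class TokenizerWithoutSpecials:
    """Hypothetical stand-in for a tokenizer that lacks the attribute."""


def strip_additional_special_tokens(ids, inner_tokenizer):
    # Fall back to an empty list when the attribute is missing, which has
    # the same effect as the hasattr guard in the hunk above.
    special_ids = set(getattr(inner_tokenizer, "additional_special_tokens_ids", []))
    return [token_id for token_id in ids if token_id not in special_ids]


pred = [5, 32000, 17, 32001, 9]
print(strip_additional_special_tokens(pred, TokenizerWithSpecials()))
print(strip_additional_special_tokens(pred, TokenizerWithoutSpecials()))
```

With the first tokenizer the two special ids are filtered out. With the second, the list passes through unchanged instead of raising `AttributeError`.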