Special token treatment. #81

prwoolley · 2024-04-10T13:15:29Z

By default, the tokenizer adds special tokens to the "input_ids", specifically [CLS] at the beginning and [SEP] at the end of each token array. Was DNABERT-2 trained with these tokens present? If so, has the [CLS] token been used for finetuning, as an alternative to mean pooling?

Thanks for the model!

Zhihan1996 · 2024-05-28T06:24:34Z

Sorry for the late reply!

Yes, DNABERT-2 is trained with these tokens present. And by default, it is fine-tuned with [CLS] token instead of mean pooling.

Zhihan1996 closed this as completed May 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Special token treatment. #81

Special token treatment. #81

prwoolley commented Apr 10, 2024

Zhihan1996 commented May 28, 2024

Special token treatment. #81

Special token treatment. #81

Comments

prwoolley commented Apr 10, 2024

Zhihan1996 commented May 28, 2024