
Why only use the pre-trained BERT tokenizer but not the entire pre-trained BERT model (including the pre-trained encoder)? #115

@KevinGoodman


I am not sure why the implementation only uses the tokenizer from Hugging Face but does not use the pre-trained encoder. In other words, why does the BERT-like transformer need to be retrained? Are the text embeddings from the original BERT model not good enough? And why train from scratch instead of fine-tuning? For reference, a minimal sketch of the two options I mean (assuming the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint) is shown below.
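```python
from transformers import BertTokenizer, BertModel, BertConfig

# Option A (what this repo appears to do): reuse only the pre-trained tokenizer,
# then build a fresh, randomly initialized encoder and train it from scratch.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
scratch_encoder = BertModel(BertConfig())  # random weights, no pre-training

# Option B (what I am asking about): also load the pre-trained encoder weights
# and fine-tune them, instead of retraining an encoder from scratch.
pretrained_encoder = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("An example sentence.", return_tensors="pt")
outputs = pretrained_encoder(**inputs)
print(outputs.last_hidden_state.shape)  # contextual embeddings from pre-trained BERT
```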
