I am not sure why the implementation only uses the tokenizer from Hugging Face but not the pre-trained encoder. Why does the BERT-like transformer need to be retrained? Are the text embeddings from the original BERT model not good enough? And why train from scratch instead of fine-tuning?
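For context, reusing the pre-trained encoder instead of only the tokenizer would look roughly like the sketch below. This is just an assumed baseline (the checkpoint name `bert-base-uncased` and the mean-pooling step are my choices, not something from the implementation in question), to show how the off-the-shelf embeddings could be obtained or fine-tuned:

```python
import torch
from transformers import AutoModel, AutoTokenizer


def embed(texts, model_name="bert-base-uncased"):
    """Encode sentences with a pre-trained BERT encoder (not retrained).

    Assumes mean-pooling over the last hidden states; other pooling
    strategies (e.g. the [CLS] token) are equally common.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # inference only; call model.train() to fine-tune instead

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)

    # Mask out padding tokens, then average token embeddings per sentence.
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)


vecs = embed(["why retrain the encoder?", "fine-tuning is usually cheaper"])
print(vecs.shape)  # one 768-dim vector per sentence for bert-base
```

Fine-tuning this encoder (rather than training from scratch) would just mean unfreezing it and attaching a task head, which is normally far cheaper than retraining, so the question of why the implementation avoids it seems fair.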