Fix loading custom vocab in transformers style for LM finetuning #155

tholor · 2019-11-20T11:14:23Z

LM fine-tuning with a custom vocab broke after we switched to the transformers style of handling custom vocab (adding new tokens instead of reusing the "unused" tokens).
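To make the difference between the two styles concrete, here is a toy sketch in plain Python. It is not FARM or transformers code: the vocab and the helpers `add_via_unused` / `add_via_append` are hypothetical. With the old style, custom words overwrite reserved `[unusedN]` slots and the vocab size stays fixed. With the transformers style, they get new ids appended after the existing vocab, so the vocab, and every layer sized by it, grows.

```python
# Toy illustration only: plain dicts stand in for a BERT-style vocab.
# This is NOT FARM or transformers code; all names here are hypothetical.

base_vocab = {"[PAD]": 0, "[unused0]": 1, "[unused1]": 2, "hello": 3, "world": 4}
custom_words = ["somecustomtoken", "specialrareword"]


def add_via_unused(vocab, new_words):
    """Old style: overwrite reserved [unusedN] slots, so the vocab size stays fixed."""
    vocab = dict(vocab)
    # Collect the reserved slots in id order before modifying the dict.
    unused = sorted((tok for tok in vocab if tok.startswith("[unused")), key=vocab.get)
    if len(new_words) > len(unused):
        raise ValueError("not enough [unused] slots for the custom words")
    for word, slot in zip(new_words, unused):
        vocab[word] = vocab.pop(slot)  # reuse the slot's id for the new word
    return vocab


def add_via_append(vocab, new_words):
    """transformers style: give each new word the next free id, so the vocab grows."""
    vocab = dict(vocab)
    added = {}
    for word in new_words:
        if word not in vocab:
            vocab[word] = len(vocab)  # next id after the current vocab
            added[word] = vocab[word]
    return vocab, added


old_style = add_via_unused(base_vocab, custom_words)
new_style, added = add_via_append(base_vocab, custom_words)

print(len(old_style))  # 5: same size as base_vocab, embedding layer can stay as is
print(len(new_style))  # 7: two extra rows needed in the embedding layer and decoder
print(added)           # {'somecustomtoken': 5, 'specialrareword': 6}
```

In this toy, `added` loosely plays the role of the tokenizer's record of added tokens; its length is the number of extra rows the model layers need.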

The complication is that a larger vocab requires resizing both the embedding layer in the LM and the decoder (weights and bias) in the prediction head (PH). On top of that, the decoder shares its weights with the embedding layer.

We therefore now need to supply an extra argument, "n_added_tokens", when loading the PH / LM.

Example:

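    from farm.modeling.language_model import LanguageModel
    from farm.modeling.prediction_head import BertLMHead

    ...
    # add custom tokens (transformers style: they get new ids appended to the vocab)
    tokenizer.add_tokens(["somecustomtoken", "specialrareword"])

    ...
    # LM embeddings and PH decoder must both be enlarged by the same number of tokens
    n_added_tokens = len(tokenizer.added_tokens_decoder)
    language_model = LanguageModel.load(lang_model, n_added_tokens=n_added_tokens)
    lm_prediction_head = BertLMHead.load(lang_model, n_added_tokens=n_added_tokens)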
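To see why the LM and the PH both need the same "n_added_tokens", here is a framework-free toy sketch of growing a tied vocab. `Embedding`, `LMHeadDecoder` and `resize_vocab` are hypothetical stand-ins, not FARM's implementation. The sketch assumes resizing builds a new, larger embedding matrix (similar in spirit to how transformers resizes token embeddings), so the decoder's tie must be re-established and its bias, which is not shared, must be extended separately.

```python
# Toy sketch with plain lists; Embedding, LMHeadDecoder and resize_vocab are
# hypothetical stand-ins, not FARM or transformers classes.
import random


class Embedding:
    def __init__(self, vocab_size, dim, rng):
        # One row of `dim` floats per token id.
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(vocab_size)]


class LMHeadDecoder:
    """Output layer of the LM head: one logit per vocab entry."""

    def __init__(self, embedding):
        self.weight = embedding.weight  # tied: the very same matrix object, not a copy
        self.bias = [0.0] * len(embedding.weight)  # not shared, owned by the head


def resize_vocab(embedding, decoder, n_added_tokens, rng):
    dim = len(embedding.weight[0])
    new_rows = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n_added_tokens)]
    # Build a new, larger matrix that keeps the pretrained rows at the same ids.
    embedding.weight = embedding.weight + new_rows
    # The decoder still points at the old matrix, so the tie must be redone.
    decoder.weight = embedding.weight
    # The bias is not part of the tie, so it has to grow on its own.
    decoder.bias = decoder.bias + [0.0] * n_added_tokens


rng = random.Random(0)
base_vocab_size, dim, n_added_tokens = 5, 4, 2

emb = Embedding(base_vocab_size, dim, rng)
dec = LMHeadDecoder(emb)
pretrained_row0 = list(emb.weight[0])

resize_vocab(emb, dec, n_added_tokens, rng)

print(len(emb.weight), len(dec.weight), len(dec.bias))  # 7 7 7
print(dec.weight is emb.weight)                         # True: still tied
print(emb.weight[0] == pretrained_row0)                 # True: pretrained rows kept
```

If only one side were resized, the embedding rows, the decoder rows and the bias length would disagree, which is the mismatch that passing the same count to both loaders avoids.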

brandenchan

Looks good, I just think a clearer info message would help.

farm/modeling/language_model.py

tholor added 2 commits November 20, 2019 12:16

Fix loading custom vocab in transformers style for LM finetuning

ec3d7f9

Make loading of PH weights strict again

3684b37

tholor force-pushed the fix_custom_vocab_lm_finetuning branch from be010ac to 3684b37 on November 20, 2019 11:17

tholor requested a review from brandenchan November 20, 2019 11:26

brandenchan reviewed Nov 20, 2019


farm/modeling/language_model.py (outdated, resolved)

prettify info message

71b918e

tholor added the labels bug (Something isn't working), part: model and task: LM fine (Language model fine-tuning) on Nov 20, 2019

tholor merged commit 484d26c into master Nov 20, 2019

tholor deleted the fix_custom_vocab_lm_finetuning branch April 28, 2020 07:30
