
[Enhancement] Support self-supervised pre-training #3

Closed

athewsey opened this issue Nov 3, 2021 · 0 comments

Labels: enhancement (New feature or request), good first issue (Good for newcomers)

athewsey (Contributor) commented Nov 3, 2021

Notebook 2 currently has a placeholder section discussing how model accuracy might be improved by self-supervised pre-training (such as masked language modelling) on a broader corpus of unlabelled data, before fine-tuning on the annotated end task.
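
For context, MLM pre-training needs only raw tokens, not task labels: the model learns by reconstructing randomly masked words. A minimal sketch of the objective with the Hugging Face transformers library (the checkpoint and mask rate here are illustrative assumptions; in practice LayoutLM's bounding-box inputs would also need a custom collator, which the default one doesn't handle):

```python
# Sketch of the MLM objective with Hugging Face transformers. The checkpoint
# and 15% mask rate are illustrative; LayoutLM's bbox inputs would need a
# custom collator to be padded alongside the token IDs.
from transformers import (
    DataCollatorForLanguageModeling,
    LayoutLMForMaskedLM,
    LayoutLMTokenizerFast,
)

tokenizer = LayoutLMTokenizerFast.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMForMaskedLM.from_pretrained("microsoft/layoutlm-base-uncased")

# Randomly masks 15% of input tokens; the training loss is the model's
# ability to predict the original tokens, so no annotations are required --
# only raw text (here, words extracted from Textract JSON).
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)
```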

However, the model training scripts in notebooks/src aren't set up for that yet.

It'd be great to flesh this out to the point where users can optionally run a pre-training job on Textract JSON alone (ideally still with the option to warm-start from a pre-trained model in the HF model zoo), and then use that job's output model, rather than the model zoo checkpoints, as the starting point for the fine-tuning task.
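
A rough sketch of how the two jobs might chain together on SageMaker; the entry points (pretrain.py, train.py), channel names, hyperparameters, and S3 URIs are hypothetical, not the repo's actual interface, and the container versions follow the 4.11 upgrade referenced in the commit below:

```python
# Hypothetical sketch of chaining the two SageMaker training jobs; entry
# points, channel names, and hyperparameters are assumptions for
# illustration, not the actual interface of notebooks/src.
import sagemaker
from sagemaker.huggingface import HuggingFace

role = sagemaker.get_execution_role()

common = dict(
    source_dir="notebooks/src",
    role=role,
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    transformers_version="4.11",
    pytorch_version="1.9",
    py_version="py38",
)

# 1) Self-supervised MLM pre-training on Textract JSON only (no labels),
#    still warm-starting from a Hugging Face model zoo checkpoint:
pretrain = HuggingFace(
    entry_point="pretrain.py",  # hypothetical script name
    hyperparameters={"model_name_or_path": "microsoft/layoutlm-base-uncased"},
    **common,
)
pretrain.fit({"textract": "s3://MY-BUCKET/textract-json/"})  # placeholder URI

# 2) Supervised fine-tuning on the annotated end task, starting from the
#    pre-training job's output artifact instead of a model zoo checkpoint:
finetune = HuggingFace(
    entry_point="train.py",  # hypothetical script name
    **common,
)
finetune.fit({
    "train": "s3://MY-BUCKET/labelled/",  # placeholder URI
    "base_model": pretrain.model_data,    # model.tar.gz from step 1
})
```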

athewsey added the enhancement (New feature or request) and good first issue (Good for newcomers) labels on Nov 3, 2021
athewsey added a commit that referenced this issue Nov 17, 2021
Refactor LayoutLM model training & inference code to support MLM
pre-training before NER fine-tuning, and present steps in notebook.
Upgrade HF container versions 4.6->4.11.

Draft for #3
athewsey self-assigned this Nov 17, 2021