Notebook 2 currently has a placeholder section discussing how model accuracy might be improved by self-supervised pre-training (such as masked language modelling) on a broader corpus of unlabelled data, before fine-tuning on the end task with annotations.

However, the model training scripts in `notebooks/src` aren't set up for that yet.

It would be great to flesh this out to the point where users can optionally run a pre-training job on Textract JSON only (ideally still with the option to initialise from a pre-trained model in the HF model zoo), and then use the output of that job, rather than a model zoo checkpoint, as the starting point for the fine-tuning task.
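In `transformers` terms, the two stages might look something like the sketch below. This is a minimal illustration, not the repo's code: the dummy corpus, the `PRETRAIN_DIR` output path, and the placeholder label set are all assumptions, and a full LayoutLM pipeline would also pass the word bounding boxes parsed from the Textract JSON (LayoutLM defaults `bbox` to zeros when it isn't supplied).

```python
# Minimal two-stage sketch: MLM pre-training on unlabelled text, then NER
# fine-tuning initialised from the pre-training output instead of the model zoo.
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    LayoutLMForMaskedLM,
    LayoutLMForTokenClassification,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "microsoft/layoutlm-base-uncased"  # HF model zoo starting point
PRETRAIN_DIR = "model/pretrained"               # illustrative output path

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# Placeholder corpus standing in for text extracted from Textract JSON.
texts = ["example line of text parsed from a Textract JSON page"]
unlabelled_dataset = [tokenizer(t, truncation=True, max_length=512) for t in texts]

# Stage 1: self-supervised MLM pre-training -- no annotations needed.
mlm_model = LayoutLMForMaskedLM.from_pretrained(BASE_MODEL)
Trainer(
    model=mlm_model,
    args=TrainingArguments(output_dir=PRETRAIN_DIR, num_train_epochs=1),
    data_collator=DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm_probability=0.15
    ),
    train_dataset=unlabelled_dataset,
).train()
mlm_model.save_pretrained(PRETRAIN_DIR)
tokenizer.save_pretrained(PRETRAIN_DIR)

# Stage 2: NER fine-tuning starts from the stage-1 checkpoint.
entity_labels = ["O", "B-FIELD", "I-FIELD"]  # placeholder label set
ner_model = LayoutLMForTokenClassification.from_pretrained(
    PRETRAIN_DIR, num_labels=len(entity_labels)
)
```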
Refactor LayoutLM model training & inference code to support MLM pre-training before NER fine-tuning, and present steps in notebook. Upgrade HF container versions 4.6 -> 4.11.
Draft for #3
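As a rough sketch of how the two training jobs might be chained on SageMaker with the upgraded 4.11 containers: the entry-point script names, instance type, and S3 paths below are illustrative assumptions, not the repo's actual values.

```python
# Hedged sketch (not the repo's actual code): chain two SageMaker HuggingFace
# jobs so the MLM pre-training artifact seeds the NER fine-tuning job.
import sagemaker
from sagemaker.huggingface import HuggingFace

role = sagemaker.get_execution_role()

# Stage 1: MLM pre-training on unlabelled Textract JSON.
pretrain = HuggingFace(
    entry_point="train_mlm.py",   # assumed script name
    source_dir="notebooks/src",
    role=role,
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    transformers_version="4.11",  # upgraded from 4.6 per this PR
    pytorch_version="1.9",
    py_version="py38",
    hyperparameters={"model_name_or_path": "microsoft/layoutlm-base-uncased"},
)
pretrain.fit({"train": "s3://my-bucket/textract-json/"})  # assumed S3 path

# Stage 2: NER fine-tuning, initialised from the stage-1 model artifact
# instead of a model zoo checkpoint. The artifact arrives as a model.tar.gz
# in the "pretrained" channel, so the training script would need to extract it.
finetune = HuggingFace(
    entry_point="train_ner.py",   # assumed script name
    source_dir="notebooks/src",
    role=role,
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    transformers_version="4.11",
    pytorch_version="1.9",
    py_version="py38",
    hyperparameters={"model_name_or_path": "/opt/ml/input/data/pretrained"},
)
finetune.fit({
    "train": "s3://my-bucket/annotations/",  # assumed labelled data path
    "pretrained": pretrain.model_data,       # stage-1 model.tar.gz
})
```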