[FEATURE] add pretrained model BERT #100
Conversation
Codecov Report
@@            Coverage Diff            @@
##               dev      #100     +/- ##
==========================================
  Coverage   100.00%   100.00%
==========================================
  Files           46        48      +2
  Lines         1371      1488    +117
==========================================
+ Hits          1371      1488    +117
Continue to review full report at Codecov.
    gradient_accumulation_steps=gradient_accumulation_steps,
)

trainer = Trainer(
It seems that the Trainer trains on the raw items, which are only processed by the original AutoTokenizer inside BertTokenizer. In that case, the special tokens of EduNLP in the items are not parsed by the PureTextTokenizer in BertTokenizer. Should there be a data-preprocessing step before training?
The input items of the function finetune_bert need to already be tokenized by BertTokenizer, which means the special tokens have already been mapped to token ids. Inside this function the tokenizer is not actually used for tokenization; it is only used to read some attributes (e.g. the vocabulary size). An example of this function can be found in tests/test_vec/test_bert.py.
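For context, a minimal sketch of the workflow described above: run the items through EduNLP's BertTokenizer first, then pass the already-tokenized items to finetune_bert. The import path, the BertTokenizer constructor argument, whether the tokenizer is callable, and the train_params keys are assumptions for illustration; only the finetune_bert signature comes from this PR.

# Hedged sketch of the intended workflow, based on this PR's discussion.
# The import path, the BertTokenizer constructor argument, and the
# train_params keys are assumptions; only the finetune_bert signature
# (items, output_dir, pretrain_model, train_params) appears in the PR itself.
from EduNLP.Pretrain import BertTokenizer, finetune_bert

raw_items = [
    "如图，在三角形ABC中，角A=90度，求BC的长",
]

# BertTokenizer wraps PureTextTokenizer plus the HuggingFace AutoTokenizer,
# so EduNLP's special tokens are parsed and mapped to token ids at this step.
tokenizer = BertTokenizer(pretrain_model="bert-base-chinese")
token_items = [tokenizer(item) for item in raw_items]

# finetune_bert only reads attributes from its tokenizer (e.g. vocabulary
# size) and trains on the already-tokenized items.
finetune_bert(
    token_items,
    output_dir="./bert_finetuned",
    pretrain_model="bert-base-chinese",
    train_params={"epochs": 1, "batch_size": 4},
)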
I will complete the code comments for better understanding.
Good, I get it.
        return self.len


def finetune_bert(items, output_dir, pretrain_model="bert-base-chinese", train_params=None):
Please complete the code comments of the functions.
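For illustration, one possible shape for those comments is sketched below. The signature is taken from the diff above; the docstring wording, parameter descriptions, and default behaviour described are assumptions, not the author's final documentation.

def finetune_bert(items, output_dir, pretrain_model="bert-base-chinese", train_params=None):
    """
    Finetune a pretrained BERT model on already-tokenized items.

    Parameters
    ----------
    items: list
        Items that have already been tokenized by BertTokenizer,
        i.e. EduNLP's special tokens are mapped to token ids.
    output_dir: str
        Directory in which the finetuned model is saved.
    pretrain_model: str
        Name or path of the pretrained BERT model to start from.
    train_params: dict, optional
        Training hyperparameters, e.g. epochs, batch_size,
        gradient_accumulation_steps.
    """
    ...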
Thanks for sending a pull request!
Please make sure you click the link above to view the contribution guidelines,
then fill out the blanks below.
Description
(Brief description of what this PR is about)
Add pretrained model BERT.
What does this implement/fix? Explain your changes.
Pull request type
Changes
Does this close any currently open issues?
issue #64
Any relevant logs, error output, etc?
N/A
Checklist
Before you submit a pull request, please make sure you have the following:
Essentials
Comments