checking existence of file before copy #2578

Merged

4 commits merged into PaddlePaddle:develop from fix-ernie-tiny-tokenizer on Jun 20, 2022

Conversation

@wj-Mcat (Contributor) commented Jun 20, 2022

PR types

Bug fixes

PR changes

Models

Description

Tries to fix #2577.
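
The change guards the resource copy in the tokenizer save path. A minimal sketch of that kind of guard, assuming a hypothetical helper named `copy_resource_file` (the exact PaddleNLP method and signature may differ):

```python
import os
import shutil

def copy_resource_file(src_path: str, save_directory: str, filename: str) -> None:
    """Copy a tokenizer resource file into save_directory, but only when
    the source exists and is not already the destination file itself."""
    dst_path = os.path.join(save_directory, filename)
    # Skip missing sources, and skip copying a file onto itself,
    # which would raise shutil.SameFileError.
    if os.path.exists(src_path) and os.path.abspath(src_path) != os.path.abspath(dst_path):
        shutil.copyfile(src_path, dst_path)
```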

@guoshengCS (Collaborator) commented:

The failures of the other tokenizers, caused by the auto-saving in `from_pretrained`, also need to be fixed.
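
For context, `from_pretrained` auto-saves resolved resource files back into the cache directory, so an unguarded copy fails when a file was resolved from that same directory. A hedged sketch of the failing path (names and structure are illustrative, not the actual PaddleNLP internals):

```python
import os
import shutil

def auto_save_resources(resolved_files: dict, cache_dir: str) -> None:
    """Illustrative failure path: copy each resolved resource file
    into cache_dir without any existence/same-file check."""
    for filename, resolved_path in resolved_files.items():
        target = os.path.join(cache_dir, filename)
        # When resolved_path already points at target (the file was
        # resolved from the cache), this raises shutil.SameFileError.
        shutil.copyfile(resolved_path, target)
```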

@wj-Mcat (Contributor, Author) commented Jun 20, 2022

After updating the `save_resources` method, only three tokenizer classes still raise the error; the failing resources are listed in the table below, and a reproduction sketch follows it.

Error table:

| Tokenizer class | Resource | Result |
| --- | --- | --- |
| BertJapaneseTokenizer | cl-tohoku/bert-base-japanese | False |
| BertJapaneseTokenizer | cl-tohoku/bert-base-japanese-whole-word-masking | False |
| BertJapaneseTokenizer | cl-tohoku/bert-base-japanese-char | False |
| BertJapaneseTokenizer | cl-tohoku/bert-base-japanese-char-whole-word-masking | False |
| ChineseBertTokenizer | ChineseBERT-base | False |
| ErnieTokenizer | rocketqa-v1-marco-query-encoder | False |
| ErnieTokenizer | rocketqa-v1-marco-para-encoder | False |
| ErnieTokenizer | rocketqa-v1-marco-cross-encoder | False |
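
For reference, a sweep like the one behind this table can be scripted roughly as below; the class names come from `paddlenlp.transformers`, but the harness itself is an assumption, not the actual test code:

```python
from paddlenlp.transformers import (
    BertJapaneseTokenizer,
    ChineseBertTokenizer,
    ErnieTokenizer,
)

CASES = [
    (BertJapaneseTokenizer, "cl-tohoku/bert-base-japanese"),
    (ChineseBertTokenizer, "ChineseBERT-base"),
    (ErnieTokenizer, "rocketqa-v1-marco-query-encoder"),
]

for tokenizer_class, resource in CASES:
    try:
        # Load twice: the second call exercises the auto-save path with
        # the resource files already on disk, which is what used to fail.
        tokenizer_class.from_pretrained(resource)
        tokenizer_class.from_pretrained(resource)
        print(tokenizer_class.__name__, resource, True)
    except Exception as exc:  # record the failure instead of aborting
        print(tokenizer_class.__name__, resource, False, exc)
```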

@wj-Mcat (Contributor, Author) commented Jun 20, 2022

Please take some time to review this PR. @guoshengCS

@guoshengCS (Collaborator) left a review comment:

LGTM

@guoshengCS (Collaborator) commented Jun 20, 2022

> After updating the `save_resources` method, only three tokenizer classes still raise the error; the failing resources are listed in the table above.

Got it. Thanks.

@guoshengCS merged commit b3d9348 into PaddlePaddle:develop on Jun 20, 2022
@wj-Mcat deleted the fix-ernie-tiny-tokenizer branch on Jun 21, 2022
Successfully merging this pull request may close the following issue:

ernie-tiny tokenizer can't save the special_tokens_map.json file when it exists