[NER] Use BertPreTokenizer for pre-tokenization #2989

Merged: 3 commits into autogluon:master, Mar 2, 2023

Conversation

cheungdaven
Contributor

The backend pre-tokenizer does not split on punctuation, so this PR switches NER to BertPreTokenizer, which splits on both whitespace and punctuation during pre-tokenization.
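
To illustrate the difference described above, here is a minimal standard-library sketch. It is only an approximation of the two behaviors, not the code in this PR and not the `tokenizers` library's implementation: both function names are hypothetical, and the regex treats `_` as a word character, which a real BERT-style pre-tokenizer may not. Each function returns `(token, (start, end))` pairs, since NER labels are usually aligned to character offsets.

```python
import re


def whitespace_pre_tokenize(text):
    """Hypothetical helper: split on whitespace only.

    Punctuation stays attached to the neighboring word.
    """
    return [(m.group(), m.span()) for m in re.finditer(r"\S+", text)]


def punctuation_aware_pre_tokenize(text):
    """Hypothetical helper: split on whitespace and punctuation.

    Every punctuation character becomes its own token, which is roughly
    the behavior the PR description attributes to BertPreTokenizer.
    """
    return [(m.group(), m.span()) for m in re.finditer(r"\w+|[^\w\s]", text)]


text = "Dr. Smith lives in Seattle, WA."

ws_tokens = whitespace_pre_tokenize(text)
punct_tokens = punctuation_aware_pre_tokenize(text)

print([tok for tok, _ in ws_tokens])
print([tok for tok, _ in punct_tokens])
```

With whitespace-only splitting, the entity "Seattle" is only reachable as the token "Seattle," with the trailing comma inside its span; with punctuation-aware splitting, the comma is a separate token and the entity span ends at the word boundary.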

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@cheungdaven added the "model list checked" label (You have updated the model list after modifying multimodal unit tests/docs) on Mar 2, 2023

github-actions bot commented Mar 2, 2023

Job PR-2989-27c28cc is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2989/27c28cc/index.html

github-actions bot commented Mar 2, 2023

Job PR-2989-628d0a5 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2989/628d0a5/index.html

@cheungdaven cheungdaven merged commit 8d1308b into autogluon:master Mar 2, 2023
@cheungdaven cheungdaven deleted the ner_fix branch March 15, 2023 17:33
Labels: model list checked (You have updated the model list after modifying multimodal unit tests/docs)
4 participants