Enable more tokenizers, s.a. SentencePiece #2

Open
MarkusSagen opened this issue Mar 12, 2022 · 0 comments
Labels
enhancement New feature or request feature request New feature or functionality wanted

Comments

@MarkusSagen
Collaborator

The current implementation relies on converting tokenizers to TensorFlow's BertTokenizer/WordPieceTokenizer.
Ideally, we would like to extend this conversion to support more tokenizers, starting with SentencePiece.
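
One way the extension could be structured is a small registry that maps a tokenizer kind to its converter, so that adding SentencePiece does not touch the existing WordPiece path. The sketch below is only an illustration under assumptions: the names `CONVERTERS`, `register`, `convert_tokenizer`, and the config keys are hypothetical and not taken from this project's code, and the converter bodies are stand-ins rather than real TensorFlow or SentencePiece calls.

```python
from typing import Callable, Dict

# Hypothetical registry: tokenizer kind -> converter function.
CONVERTERS: Dict[str, Callable[[dict], dict]] = {}


def register(kind: str):
    """Decorator that registers a converter under the given tokenizer kind."""
    def decorator(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        CONVERTERS[kind] = fn
        return fn
    return decorator


@register("wordpiece")
def convert_wordpiece(config: dict) -> dict:
    # Stand-in for the existing BertTokenizer/WordPiece conversion path.
    return {
        "backend": "wordpiece",
        "vocab": config["vocab"],
        "lower_case": config.get("lower_case", True),
    }


@register("sentencepiece")
def convert_sentencepiece(config: dict) -> dict:
    # Stand-in for the requested SentencePiece path (assumed to need a model file).
    return {"backend": "sentencepiece", "model_file": config["model_file"]}


def convert_tokenizer(kind: str, config: dict) -> dict:
    """Dispatch to the converter registered for `kind`, or fail with a clear error."""
    try:
        converter = CONVERTERS[kind]
    except KeyError:
        raise ValueError(
            f"Unsupported tokenizer kind: {kind!r}; known kinds: {sorted(CONVERTERS)}"
        ) from None
    return converter(config)
```

With this layout, supporting a further tokenizer would mean registering one more function, and unsupported kinds fail early with a message listing what is available.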

@MarkusSagen MarkusSagen added the enhancement New feature or request label Mar 12, 2022
@MarkusSagen MarkusSagen changed the title from "Extend tokenizers to no-BERT / WordPiece tokenizers" to "[Feature Request] Enable more tokenizers, s.a. SentencePiece" Mar 12, 2022
This was referenced Mar 12, 2022
@MarkusSagen MarkusSagen self-assigned this Mar 12, 2022
@MarkusSagen MarkusSagen added the feature request New feature or functionality wanted label Mar 12, 2022
@MarkusSagen MarkusSagen changed the title from "[Feature Request] Enable more tokenizers, s.a. SentencePiece" to "Enable more tokenizers, s.a. SentencePiece" Mar 12, 2022