Add more Tokenizer functionality - Create without download sync/async ; Trim APIs #7043

Closed
tarekgh opened this issue Mar 1, 2024 · 0 comments · Fixed by #7047

@tarekgh
Member

tarekgh commented Mar 1, 2024

We need to incorporate the following enhancements into the tokenizer:

  • Enable creating tokenizers from streams, both synchronously and asynchronously, so that vocabulary files do not have to be downloaded on demand.
  • Introduce an API that encodes text up to a specified maximum token count.
  • Introduce an API that encodes text starting from the end, up to the specified maximum token count.
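The requested behaviors can be pictured with a small sketch. It is written in Python purely for illustration and is not the project's actual API: the class `ToyTokenizer`, the methods `from_stream`, `encode_up_to` and `encode_up_to_from_end`, the tab-separated vocabulary format, and the greedy longest-match algorithm are all hypothetical. It shows building a tokenizer from an already-open stream instead of downloading a vocabulary, and trimming to a token budget from either the start or the end of the text.

```python
import io
import re
from dataclasses import dataclass
from typing import Dict, List, TextIO, Tuple


@dataclass(frozen=True)
class Token:
    id: int
    start: int  # character offset in the original text (inclusive)
    end: int    # character offset in the original text (exclusive)


class ToyTokenizer:
    """Hypothetical greedy longest-match tokenizer, for illustration only."""

    def __init__(self, vocab: Dict[str, int], unk_id: int) -> None:
        self.vocab = vocab
        self.unk_id = unk_id
        self.max_len = max((len(t) for t in vocab), default=1)

    @classmethod
    def from_stream(cls, stream: TextIO, unk_id: int = 0) -> "ToyTokenizer":
        # Build from an already-open stream (file, in-memory buffer, ...) rather
        # than downloading the vocabulary on demand.
        # Assumed format: one "token<TAB>id" pair per line.
        vocab: Dict[str, int] = {}
        for line in stream:
            line = line.rstrip("\r\n")
            if line:
                token, token_id = line.split("\t")
                vocab[token] = int(token_id)
        return cls(vocab, unk_id)

    def encode(self, text: str) -> List[Token]:
        tokens: List[Token] = []
        for match in re.finditer(r"\S+", text):  # split on whitespace first
            pos, end = match.start(), match.end()
            while pos < end:
                # Try the longest vocabulary entry first, then shorter ones.
                for length in range(min(self.max_len, end - pos), 0, -1):
                    piece = text[pos:pos + length]
                    if piece in self.vocab:
                        tokens.append(Token(self.vocab[piece], pos, pos + length))
                        pos += length
                        break
                else:
                    # Nothing matched: emit one unknown token for this character.
                    tokens.append(Token(self.unk_id, pos, pos + 1))
                    pos += 1
        return tokens

    def encode_up_to(self, text: str, max_tokens: int) -> Tuple[List[Token], int]:
        """Keep at most max_tokens tokens from the start; also return the index
        where the kept prefix ends, so text[:index] is the trimmed text."""
        if max_tokens < 0:
            raise ValueError("max_tokens must be non-negative")
        kept = self.encode(text)[:max_tokens]
        index = kept[-1].end if kept else 0
        return kept, index

    def encode_up_to_from_end(self, text: str, max_tokens: int) -> Tuple[List[Token], int]:
        """Keep at most max_tokens tokens from the end; also return the index
        where the kept suffix starts, so text[index:] is the trimmed text."""
        if max_tokens < 0:
            raise ValueError("max_tokens must be non-negative")
        tokens = self.encode(text)
        # max(0, ...) avoids a negative slice start when the budget exceeds the count.
        kept = tokens[max(0, len(tokens) - max_tokens):] if max_tokens > 0 else []
        index = kept[0].start if kept else len(text)
        return kept, index


if __name__ == "__main__":
    vocab_file = io.StringIO("[UNK]\t0\nhello\t1\nplay\t2\ning\t3\nworld\t4\n")
    tokenizer = ToyTokenizer.from_stream(vocab_file)
    text = "hello playing world"
    _, end = tokenizer.encode_up_to(text, 2)
    print(repr(text[:end]))  # 'hello play'
    _, start = tokenizer.encode_up_to_from_end(text, 2)
    print(repr(text[start:]))  # 'ing world'
```

Returning a character index alongside the kept tokens lets a caller cut the original string at a token boundary, for example to fit a prompt into a fixed token budget while keeping either its beginning or its most recent part.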
@tarekgh self-assigned this Mar 1, 2024
@dotnet-policy-service bot added the untriaged label (New issue has not been triaged) Mar 1, 2024
@tarekgh added the Tokenizers label and removed the untriaged label Mar 1, 2024
@ericstj changed the title from "Add more Tokenizer functionality" to "Add more Tokenizer functionality - Create without download sync/async ; Trim APIs" Mar 4, 2024
@github-actions bot locked and limited conversation to collaborators Apr 7, 2024