Document different package options #48

Open
transitive-bullshit opened this issue Jun 1, 2023 · 2 comments

transitive-bullshit commented Jun 1, 2023

It seems there are multiple NPM packages associated with this tiktoken port (@dqbd/tiktoken, js-tiktoken, and tiktoken), and I wasn't able to find the differences clearly documented anywhere.

Langchainjs seems to be using js-tiktoken (reference and associated commit langchain-ai/langchainjs@d60eae5), so I'm going with that for now, but the readme on this project uses tiktoken instead of js-tiktoken, and @dqbd/tiktoken looks like it's still around.

@dqbd would love any clarity you can provide here, and thank you again for your amazing work on this project 🙏

Also, what does the js-tiktoken/lite version actually do differently than the other packages?

Owner

dqbd commented Jun 2, 2023

Hello!

I got a little swamped with (school) work recently, so my apologies for the lack of documentation and clarity. I will update the README.md soon, but here is the gist of the changes and the rationale:

This repository maintains two packages.

  • tiktoken (formerly hosted at @dqbd/tiktoken): WASM bindings for the original Python library, providing full 1-to-1 feature parity.
  • js-tiktoken: Pure JavaScript port of the original library with the core functionality, suitable for environments where WASM is not well supported or not desired (such as edge runtimes).

The main reason for porting tiktoken to JS is the constraints of edge environments (large WASM bundle, the setup needed to get WASM working, etc.) and toolchain-runtime combinations (#37). The issues are compounded when users are not using the package directly but rather as a dependency of another library such as LangchainJS (langchain-ai/langchainjs#1239).

The plan going forward is to converge the APIs of both libraries so they are interchangeable, allowing isomorphic behaviour (#43), and to add appropriate documentation soon (with an additional PR for benchmarking both packages). Will close the issue after that is done :)
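
Until the APIs converge, calling code can stay agnostic about which package it receives by depending on a small shared shape. The following is only a rough sketch under stated assumptions: `EncoderLike`, `withEncoder`, `countTokens`, and `createFakeEncoder` are hypothetical names invented for illustration, not part of either package. It assumes that an encoder exposes `encode(text)` returning an array-like of token ids, and that a WASM-backed encoder may additionally expose `free()` to release its memory while a pure JS one may not.

```typescript
// Hypothetical minimal shape that encoders from either package could satisfy.
// Assumption: encode() returns an array-like of token ids; free() is optional
// and only present on encoders that hold memory needing explicit release.
interface EncoderLike {
  encode(text: string): ArrayLike<number>;
  free?(): void;
}

// Create an encoder, run `fn` with it, and always release it afterwards
// (a no-op when the encoder has no free() method).
function withEncoder<T>(
  create: () => EncoderLike,
  fn: (enc: EncoderLike) => T
): T {
  const enc = create();
  try {
    return fn(enc);
  } finally {
    enc.free?.();
  }
}

// Count tokens without caring which implementation backs the encoder.
function countTokens(create: () => EncoderLike, text: string): number {
  return withEncoder(create, (enc) => enc.encode(text).length);
}

// Stand-in encoder for demonstration only: one "token" per whitespace-separated
// word. This is NOT a real BPE tokenizer and produces meaningless ids.
function createFakeEncoder(): EncoderLike {
  return {
    encode: (text) =>
      text
        .split(/\s+/)
        .filter((word) => word.length > 0)
        .map((_, index) => index),
  };
}

console.log(countTokens(createFakeEncoder, "hello tiktoken world"));
```

In real use, the factory passed to `countTokens` would wrap whichever package is installed, for example `() => get_encoding("cl100k_base")` from tiktoken or `() => getEncoding("cl100k_base")` from js-tiktoken; check each package's README for the exact exports in the version you are using.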

Hope that clears things up!

@transitive-bullshit
Author

First off, you rock @dqbd 🔥

This makes a ton of sense, and no worries about being swamped w/ school / work. Totally understand and it's all part of open source :)

Thanks for the thorough explanation – will update https://github.com/transitive-bullshit/compare-tokenizers and my other projects accordingly 🙏
