Default Embedding Function for JS #1382
Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving:
- Testing, Bugs, Errors, Logs, Documentation
- System Compatibility
- Quality
Just FYI, both web-ai and transformers.js use ONNX runtime under the hood.
Echoing Hammad, and as we discussed in person, my preference here would be for us to bundle the same ONNX model we ship with Python into the JS client, using a framework which supports ONNX models, rather than asking a user to pull in a separate dependency (250MB worth) if they want to use our default.
@atroyn OK, we will need to see if that is possible. Even the MSFT onnx-runtime, which I assume is pretty narrow, is very heavy.
The MSFT ONNX runtime appears to be only ~3.65MB? https://www.npmjs.com/package/onnxjs
without a lot more surgery (rolling our own solution
The other alternative would be to bundle it in
@xenova I'm not sure how I got this wrong, I can confirm
Another idea here is to do
This makes sense to me!
How do you use these in JS? I tried: `import { embeddingFunction } from 'chromadb-default-embed';`
@Cinemacloud we haven't released the JS client fully yet (though we probably will tonight). Can you try installing
It works, thanks!
End DX: `npm install chromadb chromadb-default-embed`

`chromadb-default-embed` is a fork of `@xenova/transformers` to maintain stability.

***

Motivation
- good defaults are good DX
- good defaults lower the barrier to getting started
- currently JS usage is gated on having an API key (seems bad)

`npm install --save chromadb @xenova/transformers`
- We want to use the same EF as `python`: `all-MiniLM-L6-v2`
- We want to keep our default package size small (currently 4.5MB)
- We want a happy path for devs getting started: they shouldn't need to create any accounts or get API keys
- `@xenova/transformers` is great, but it's huge (~250MB!), so we can't bundle it by default
- To have a happy path but keep the bundle size small, we just ask users to run `npm install --save chromadb @xenova/transformers` to install chroma. We can add a comment like `// optional default embedding function`

I also evaluated `https://github.com/visheratin/web-ai`, which is small (~8MB), but I don't think it supports this model yet (though it is potentially possible), and https://github.com/microsoft/onnxruntime/tree/main, which is also massive (over 100MB).

I confirmed that if you just install `chromadb` and pass `OpenAIEmbeddingFunction` (or another embedding function), it doesn't complain or yell at you that you have a missing dep. If you try to use the default and don't have `@xenova/transformers` installed, it will tell you to install it.

Todo
- [ ] test no require warnings in `nextjs` - this has been an issue in the past

Thoughts about this DX?
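The "optional dependency" behaviour described above (no complaint when another embedding function is passed, and an install hint only when the default is actually used) could look roughly like the sketch below. This is an illustration, not the code from this PR: `EmbedFn`, `ModuleLoader`, `missingDependencyMessage`, the `embed` export, and the shape of `DefaultEmbeddingFunction` are all hypothetical names chosen for the example, and the loader is injected so the sketch runs without the real package installed.

```typescript
// Sketch only: every name here is hypothetical and not taken from the chromadb source.
type EmbedFn = (texts: string[]) => Promise<number[][]>;
type ModuleLoader = () => Promise<{ embed: EmbedFn }>;

// Package name taken from the PR description.
const OPTIONAL_PACKAGE = "chromadb-default-embed";

// Build the hint shown when the optional package cannot be loaded.
function missingDependencyMessage(pkg: string): string {
  return (
    `The default embedding function needs the optional package "${pkg}". ` +
    `Install it with: npm install --save ${pkg}`
  );
}

class DefaultEmbeddingFunction {
  private embedFn: EmbedFn | null = null;
  private readonly loader: ModuleLoader;

  // The loader is injected (assumption) so that nothing is imported at
  // construction time and the heavy package is only touched on first use.
  constructor(loader: ModuleLoader) {
    this.loader = loader;
  }

  // Load the optional module once, then reuse the cached embed function.
  private async load(): Promise<EmbedFn> {
    if (this.embedFn !== null) return this.embedFn;
    let mod: { embed: EmbedFn };
    try {
      mod = await this.loader();
    } catch {
      // Any load failure is reported as an install hint rather than a raw module error.
      throw new Error(missingDependencyMessage(OPTIONAL_PACKAGE));
    }
    this.embedFn = mod.embed;
    return this.embedFn;
  }

  async generate(texts: string[]): Promise<number[][]> {
    const embed = await this.load();
    return embed(texts);
  }
}
```

In a real client the loader would presumably wrap a dynamic `import()` of the optional package. Deferring the import until `generate` is first called is what keeps users of other embedding functions from ever seeing a missing-dependency error.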
@Cinemacloud super