[ENH] URI Data Loader #1294
Do we support the case where you want to add an image that you have preloaded but also give its URI so you can fetch it later? I think that will be a common usage pattern.
Good catch. On the one hand, I can see that your image might be already loaded in memory and stored somewhere, so loading it again from the URI to embed it is redundant. We can avoid that by giving [...] Additionally, it might imply the semantics that we store your images if you provide us a URL to a file on [...]
## Description of changes

This PR introduces multi-modal embeddings into Chroma.

- It adds the generic `EmbeddingFunction` which can take various data types. Existing functions take the `Documents` type.
- Adds `Images` as a type (numpy NDArray taking ints or floats)
- Add `OpenCLIPEmbeddingFunction` which is an `EmbeddingFunction[Union[Documents, Images]]`

## Test

Integration tests pass.

A new test for multimodal embedding functions: [chromadb/test/ef/test_multimodal_ef.py](https://github.com/chroma-core/chroma/blob/86a9e2620352ee0b2844bc3233f9e001cc4aa3d9/chromadb/test/ef/test_multimodal_ef.py)

## Documentation

See #1294

## TODOs

- [x] Tests
- [x] ~Wiring through FastAPI~ Nothing to wire through
- [x] Documentation
- [x] Telemetry
- [ ] ~JavaScript~
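To make the idea of an embedding function that is generic over its input type more concrete, here is a minimal sketch. It is not Chroma's implementation: the type aliases, the `ToyMultimodalEF` class, and the toy embedding arithmetic are all assumptions made for illustration, and nested Python lists stand in for the numpy arrays the description mentions so that the example has no dependencies.

```python
from typing import Generic, List, Sequence, TypeVar, Union

# Illustrative stand-ins only; these are NOT Chroma's type definitions.
Document = str
Image = List[List[float]]  # nested lists standing in for a numpy NDArray
Embedding = List[float]

D = TypeVar("D")


class EmbeddingFunction(Generic[D]):
    """A callable that maps a batch of inputs of type D to embeddings."""

    def __call__(self, input: Sequence[D]) -> List[Embedding]:
        raise NotImplementedError


class ToyMultimodalEF(EmbeddingFunction[Union[Document, Image]]):
    """Hypothetical function accepting both text and images in one batch."""

    def __call__(self, input: Sequence[Union[Document, Image]]) -> List[Embedding]:
        embeddings: List[Embedding] = []
        for item in input:
            if isinstance(item, str):
                # Toy text "embedding": the string length in the first slot.
                embeddings.append([float(len(item)), 0.0])
            else:
                # Toy image "embedding": the mean pixel value in the second slot.
                pixels = [p for row in item for p in row]
                mean = sum(pixels) / len(pixels) if pixels else 0.0
                embeddings.append([0.0, mean])
        return embeddings
```

A real multimodal model would place text and images in a shared vector space; the sketch only shows how one callable can be typed to accept a union of data types.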
Pending:
* chroma-core/chroma#1294
* chroma-core/chroma#1293

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Description of changes
This PR adds URIs and DataLoaders into Chroma.
- `DataLoader` works like `EmbeddingFunction`, except it takes `URIs` and outputs the specified datatype.
- A `DataLoader` using `pillow` for image file loading.
- `uris` as a field on `add`, `query`, as well as an `include` field
- `data` as an `include` field

URIs specify a place where data can be loaded from, and can be used to load data for embedding, or as the result of retrieval.
This makes multimodal retrieval with data stored externally as files seamless and extensible.
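The loader concept described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `DataLoader` protocol here is a simplified stand-in, `TextFileLoader`, `embed_from_uris`, and `demo` are hypothetical names, and the loader reads plain text files rather than images so the example runs without `pillow`.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Protocol, Sequence, TypeVar

URI = str
Embedding = List[float]
L_co = TypeVar("L_co", covariant=True)


class DataLoader(Protocol[L_co]):
    """Simplified stand-in: a callable from URIs to loaded data of type L_co."""

    def __call__(self, uris: Sequence[URI]) -> L_co: ...


class TextFileLoader:
    """Hypothetical loader that treats each URI as a local UTF-8 text file path."""

    def __call__(self, uris: Sequence[URI]) -> List[str]:
        return [Path(uri).read_text(encoding="utf-8") for uri in uris]


def embed_from_uris(
    uris: Sequence[URI],
    loader: DataLoader[List[str]],
    embed: Callable[[List[str]], List[Embedding]],
) -> List[Embedding]:
    # Load the data the URIs point at, then hand it to the embedding function.
    return embed(loader(uris))


def demo() -> List[Embedding]:
    # Write a small file, then embed it by URI using a toy length-based embedding.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "note.txt"
        path.write_text("hello", encoding="utf-8")
        toy_embed = lambda docs: [[float(len(d))] for d in docs]
        return embed_from_uris([str(path)], TextFileLoader(), toy_embed)
```

Keeping the loader separate from the embedding function means the same URI can serve both purposes the description names: loading data to embed at write time, and loading it again to return as a retrieval result.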
This PR is stacked on #1293
Test
Integration tests pass.
A new unit test for data loaders: https://github.com/chroma-core/chroma/blob/c71c3efa15d2a9252db26470b9ff1cdb10a2b681/chromadb/test/data_loader/test_data_loader.py
Try the notebook: https://github.com/chroma-core/chroma/blob/c71c3efa15d2a9252db26470b9ff1cdb10a2b681/examples/multimodal/multimodal_retrieval.ipynb
Documentation
Documentation for this and #1293: chroma-core/docs#157
TODOs