[ENH] Multimodal Embeddings #1293
Merged
Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving:

- Testing, Bugs, Errors, Logs, Documentation
- System Compatibility
- Quality
atroyn
commented
Oct 26, 2023
HammadB
reviewed
Nov 6, 2023
HammadB
reviewed
Nov 6, 2023
HammadB
reviewed
Nov 6, 2023
HammadB
reviewed
Nov 6, 2023
HammadB
approved these changes
Nov 7, 2023
atroyn
added a commit
that referenced
this pull request
Nov 7, 2023
atroyn
added a commit
that referenced
this pull request
Nov 7, 2023
## Description of changes

This PR adds URIs and DataLoaders into Chroma.

- `DataLoader` works like `EmbeddingFunction`, except it takes `URIs` and outputs the specified datatype.
- A `DataLoader` using `pillow` for image file loading.
- Adds `uris` as a field on `add` and `query`, and as an `include` field.
- Adds `data` as an `include` field.

URIs specify a place where data can be loaded from, and can be used to load data for embedding, or as the result of retrieval. This makes multimodal retrieval with data stored externally as files seamless and extensible.

This PR is stacked on #1293.

## Test

Integration tests pass.

A new unit test for data loaders: https://github.com/chroma-core/chroma/blob/c71c3efa15d2a9252db26470b9ff1cdb10a2b681/chromadb/test/data_loader/test_data_loader.py

Try the notebook: https://github.com/chroma-core/chroma/blob/c71c3efa15d2a9252db26470b9ff1cdb10a2b681/examples/multimodal/multimodal_retrieval.ipynb

## Documentation

Documentation for this and #1293: chroma-core/docs#157

## TODOs

- [x] Concurrent Loading
- [x] Tests
- [x] Wiring through FastAPI
- [x] Documentation
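To make the `DataLoader` idea in the description above more concrete, here is a minimal sketch of a loader that maps URIs to data, one result per URI, and loads them concurrently. It is not Chroma's implementation: the `DataLoader` protocol shape, the `BytesFileLoader` class, and the treatment of a URI as a local file path are all assumptions made for illustration. It reads raw bytes where the PR's loader uses `pillow` to decode images.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List, Optional, Protocol, Sequence, TypeVar

# Type of the loaded data (bytes here; an image array in the PR's loader).
D = TypeVar("D", covariant=True)


class DataLoader(Protocol[D]):
    """Hypothetical interface: maps URIs to loaded data, one item per URI."""

    def __call__(self, uris: Sequence[Optional[str]]) -> Sequence[Optional[D]]:
        ...


class BytesFileLoader:
    """Toy loader (assumption): treats each URI as a local file path."""

    def __init__(self, max_workers: int = 4) -> None:
        self._max_workers = max_workers

    def _load_one(self, uri: Optional[str]) -> Optional[bytes]:
        # A missing URI yields no data rather than an error.
        if uri is None:
            return None
        return Path(uri).read_bytes()

    def __call__(self, uris: Sequence[Optional[str]]) -> List[Optional[bytes]]:
        # executor.map preserves input order, so results line up with uris.
        with ThreadPoolExecutor(max_workers=self._max_workers) as executor:
            return list(executor.map(self._load_one, uris))
```

Keeping the output aligned with the input, including `None` placeholders, is a design choice of this sketch: it lets a caller pair each loaded item with its record without extra bookkeeping.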
baskaryan
added a commit
to langchain-ai/langchain
that referenced
this pull request
Nov 10, 2023
Pending: * chroma-core/chroma#1294 * chroma-core/chroma#1293 --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Bagatur <baskaryan@gmail.com>
pprados
pushed a commit
to pprados/langchain
that referenced
this pull request
Nov 20, 2023
Pending: * chroma-core/chroma#1294 * chroma-core/chroma#1293 --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Bagatur <baskaryan@gmail.com>
xieqihui
pushed a commit
to xieqihui/langchain
that referenced
this pull request
Nov 21, 2023
Pending: * chroma-core/chroma#1294 * chroma-core/chroma#1293 --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Bagatur <baskaryan@gmail.com>
## Description of changes

This PR introduces multi-modal embeddings into Chroma.

- `EmbeddingFunction`, which can take various data types. Existing functions take the `Documents` type.
- `Images` as a type (numpy NDArray taking ints or floats).
- `OpenCLIPEmbeddingFunction`, which is an `EmbeddingFunction[Union[Documents, Images]]`.
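The idea of an embedding function that is generic over its input type can be sketched as follows. This is an illustration only, not the code in this PR: the protocol shape, the `ToyMultimodalEmbeddingFunction` class, and the hand-made "embeddings" are assumptions, and `Image` is modeled as a nested list of floats as a stand-in for the numpy NDArray the PR uses.

```python
from typing import List, Protocol, Sequence, TypeVar, Union

Document = str
Image = List[List[float]]  # assumption: stand-in for a numpy NDArray
Embedding = List[float]

# Input type of the embedding function (documents, images, or both).
D = TypeVar("D", contravariant=True)


class EmbeddingFunction(Protocol[D]):
    """Hypothetical generic interface: embeds a batch of inputs of type D."""

    def __call__(self, input: Sequence[D]) -> List[Embedding]:
        ...


class ToyMultimodalEmbeddingFunction:
    """Toy stand-in for a multimodal model; accepts documents and images."""

    def __call__(self, input: Sequence[Union[Document, Image]]) -> List[Embedding]:
        embeddings: List[Embedding] = []
        for item in input:
            if isinstance(item, str):
                # Text "embedding": [character count, word count].
                embeddings.append([float(len(item)), float(len(item.split()))])
            else:
                # Image "embedding": [pixel count, mean pixel value].
                pixels = [value for row in item for value in row]
                mean = sum(pixels) / len(pixels) if pixels else 0.0
                embeddings.append([float(len(pixels)), mean])
        return embeddings


# The toy class satisfies EmbeddingFunction[Union[Document, Image]] structurally.
ef: EmbeddingFunction[Union[Document, Image]] = ToyMultimodalEmbeddingFunction()
```

A function typed for `Union[Document, Image]` can serve text-only or image-only callers as well, which is why the type parameter is declared contravariant in this sketch.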
## Test

Integration tests pass.

A new test for multimodal embedding functions: chromadb/test/ef/test_multimodal_ef.py

## Documentation

See #1294

## TODOs

- Wiring through FastAPI: nothing to wire through