Semantic search #17822
Hi @rijkvanzanten, are you planning to introduce support for the pgvector extension for PostgreSQL in Directus in the near future?
This seems like something you could accomplish through an extension.
Heya! Thanks for opening this feature request! This feature request has received over 15 votes from the community. This means we'll move this feature request to the Under Review state! The Core team will schedule a meeting to review this request as soon as possible. The discussion will then be approved or denied. You may or may not be invited to join this meeting with the core team. For more information, see our Feature Request Process.
My gosh, this would be an excellent feature. Right now we are using Supabase just for the embeddings support, which is just pgvector. We use Directus for literally everything else. If this were natively supported, that would be amazing. I can imagine it being a pretty big piece of marketing as well. You could even easily add a LangChain integration, which I imagine could attract more users or companies currently building AI stacks. Something to think about. I will probably build an extension in the meantime.
@rijkvanzanten I'm sure you've seen this, but https://typesense.org/ is open source and could fulfill this need without having to rely solely on Postgres or pay for Algolia/Elastic.
Any update on this one? It would be amazing to be able to identify a field as a 'vector field' that we could fill with an arbitrary vector value, and then it would magically become vector-searchable through a set of new API endpoints/filters! We could start by supporting SQLite VSS (e.g. through LangChain https://python.langchain.com/docs/integrations/vectorstores/sqlitevss/) to give an awesome out-of-the-box experience for self-hosted. Would be great!
Summary
Support the pgvector extension for PostgreSQL or the Pinecone database for semantic indexing and search using embeddings.
We need to mark which fields we wish to index for different collections and these will then become searchable using semantic search.
Basic Example
Operator: Semantic search
Finds matches based on semantic similarity rather than the exact matching of keywords or phrases. It uses word embeddings and cosine similarity techniques to identify semantically similar content to the search query.
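To make the cosine similarity idea concrete, here is a minimal sketch in plain Python. The three-dimensional vectors and the item names are made up for illustration; real embeddings would come from an embedding model and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, items):
    """Sort (item_id, score) pairs from most to least similar to the query."""
    scored = [(item_id, cosine_similarity(query_vec, vec))
              for item_id, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings, not output from any real model.
items = {
    "reset-password": [0.9, 0.1, 0.0],
    "billing-faq": [0.1, 0.9, 0.2],
    "login-help": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]

ranking = rank_by_similarity(query, items)
for item_id, score in ranking:
    print(f"{item_id}: {score:.3f}")
```

Items whose vectors point in nearly the same direction as the query rank first, regardless of whether they share any keywords with it.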
Other alternatives: _similar_to, _corresponding, _equivalent.
Motivation
There are several potential use cases for semantic search in Directus, including:
I think it can be an elegant addition to what Directus has to offer, which might impact Directus adoption positively as semantic search, in combination with some intelligent OpenAI operations in Flows, will be very popular.
It will be possible to create stores for OpenAI chat to make it more aware of a specific subject. A support database is a good example. It will also be a great addition to conventional search capabilities. Here are a few real-life examples where semantic search functionality can be helpful:
Detailed Design
The main idea behind this feature is to enable Directus to perform semantic searches by leveraging the power of vector embeddings using the OpenAI text-embedding-ada-002 data model that can help with the following:
_sem
into embedding or vector representation of their own and then comparing the distances between the vectors using the functionality of pgvector or Pinecone.
It will fit right into the overall experience of Directus filters, besides the indexing part, where it might require some explanation and additional field configuration.
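As a rough sketch of the "comparing the distances" step on the pgvector side: pgvector provides a `<=>` operator for cosine distance, so a nearest-neighbour lookup is an ORDER BY on that operator. The helper names, the `articles` table, and the `embedding` column below are hypothetical, and nothing here reflects how Directus would actually implement it. The function only builds the query and parameters; running it would need a PostgreSQL driver and a database with pgvector installed.

```python
def to_vector_literal(embedding):
    """Format a list of numbers as a pgvector text literal, e.g. '[0.1,0.2,1.0]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_similarity_query(table, column, query_embedding, limit=5):
    """Build a parameterized nearest-neighbour query using cosine distance.

    Table and column names cannot be bound as parameters, so they are
    checked to be plain identifiers before being interpolated.
    """
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")

    sql = (
        f"SELECT id, 1 - ({column} <=> %s::vector) AS similarity "
        f"FROM {table} "
        f"ORDER BY {column} <=> %s::vector "
        f"LIMIT %s"
    )
    literal = to_vector_literal(query_embedding)
    # The literal appears twice: once for the score, once for the ordering.
    return sql, (literal, literal, limit)


# Hypothetical table and column names, with a toy 3-dimensional query vector.
sql, params = build_similarity_query("articles", "embedding", [0.1, 0.2, 1], limit=3)
print(sql)
print(params)
```

The query embedding itself would come from the same model used to index the field, so that both sides live in the same vector space.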
Requirements List
Must Have:
Should Have:
text-embedding-ada-002 model for generating embeddings.
Could Have:
Won't Have:
Drawbacks
One potential drawback is the code complexity required to implement these features. Even if the amount of code required is relatively small, it could still add to the overall complexity of the Directus codebase and make it harder to maintain and debug in the future.
Additionally, while OpenAI's APIs may be easy to implement, there could still be issues with integration and maintenance over time. OpenAI is a third-party platform, and changes to its APIs or services could unexpectedly impact Directus. This could lead to additional development and maintenance costs over time.
Alternatives
Not sure right now what good alternatives there are, but it might require additional investigation.
Adoption Strategy
Making semantic search available will make a lot of sense for anyone and will be easy to adopt.
Unresolved Questions
Maybe support for pgvector is sufficient. How much better is Pinecone?