Closed
Description
Context / Scenario
I want a new feature that lets us use date filters when searching documents. For example, if I search for "documents before 2021 with information on LLM," the system should first narrow the set to documents published before 2021, then search those documents for "LLM". This mix of vector search and regular filtered search would help us find exactly what we need much faster.
The problem
Right now, our system only lets us filter documents by tags. We can't filter by other properties such as date, which makes it hard to find older documents quickly.
Proposed solution
- Document Model Update:
  - Modify the document model to include typed metadata fields, such as text and date, alongside the content.
  - Example:
    { "documentId": "doc123", "content": "Here's the content", "publishDate": "2020-12-01" }
- Data Indexing:
  - Index documents by both content and metadata. For vector data, process text through an embedding model.
- Query Processing:
  - Create a parser to extract filters like dates from user queries and separate them from vector search terms.
  - Example query: "Find documents before 2021 about renewable energy."
- Search Execution:
  - First, apply traditional filters (e.g., date). Then, within those results, perform a vector-based search.
  - Use tools like Azure AI Search or Elasticsearch to handle both aspects.
- Result Handling:
  - Combine and display results, ensuring each one both satisfies the property filters and is relevant to the search text.
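
To make the proposed flow concrete, here is a minimal sketch of the filter-then-vector-search idea in plain Python. Everything in it is an assumption for illustration: the `Document` class, `parse_query`, `embed`, and `hybrid_search` are hypothetical names, the parser only understands the phrase "before YYYY", and `embed` is a toy word-count stand-in for a real embedding model. A real implementation would likely delegate filtering and vector ranking to a backend such as Azure AI Search or Elasticsearch.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    # Hypothetical model mirroring the JSON example (documentId, content, publishDate).
    document_id: str
    content: str
    publish_date: date


def parse_query(query):
    """Split a query into an optional 'before YYYY' cutoff date and the remaining text."""
    match = re.search(r"\bbefore (\d{4})\b", query, flags=re.IGNORECASE)
    if match is None:
        return None, query.strip()
    cutoff = date(int(match.group(1)), 1, 1)
    remaining = query[:match.start()] + query[match.end():]
    return cutoff, " ".join(remaining.split())


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_search(query, documents, top_k=3):
    """Apply the date filter first, then rank the surviving documents by similarity."""
    cutoff, text = parse_query(query)
    candidates = [d for d in documents if cutoff is None or d.publish_date < cutoff]
    query_vec = embed(text)
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, embed(d.content)), reverse=True)
    return ranked[:top_k]


# Sample data: the first entry comes from the example above, the rest are made up.
SAMPLE_DOCS = [
    Document("doc123", "Here's the content", date.fromisoformat("2020-12-01")),
    Document("doc_a", "Renewable energy outlook for solar and wind", date.fromisoformat("2019-05-10")),
    Document("doc_b", "Renewable energy policy update", date.fromisoformat("2022-03-01")),
    Document("doc_c", "Quarterly sales report", date.fromisoformat("2018-01-15")),
]

if __name__ == "__main__":
    for doc in hybrid_search("Find documents before 2021 about renewable energy.", SAMPLE_DOCS):
        print(doc.document_id, doc.publish_date)
```

In this sketch the date filter runs before any similarity scoring, so `doc_b` (published in 2022) is never ranked even though its content matches the search text. Filtering first also keeps the vector comparison limited to the documents that can actually appear in the results.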
Importance
would be great to have