[Feature Request] Hybrid Vector and Traditional Search #416

Closed
@KSemenenko

Context / Scenario

I want a new feature that lets us use date filters when searching documents. For example, if I search for "documents before 2021 with information on LLM," the system should first narrow the set down to documents dated before 2021, then run the vector search for 'LLM' within that subset only. This mix of vector and traditional search would help us find exactly what we need much faster.
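To make the scenario concrete, here is a minimal sketch of the "filter first, then vector search" idea. It is only an illustration under assumptions: the `Doc` class, the `hybrid_search` function, and the two-dimensional toy embeddings are all hypothetical and are not part of any existing API; a real setup would get its vectors from an embedding model.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    doc_id: str
    publish_date: date
    embedding: list[float]  # toy vector; assumed to come from an embedding model


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(docs, query_embedding, before: date, top_k: int = 3):
    # Step 1: traditional filter keeps only documents published before the cutoff.
    candidates = [d for d in docs if d.publish_date < before]
    # Step 2: vector search ranks the remaining candidates by similarity.
    ranked = sorted(
        candidates,
        key=lambda d: cosine(d.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:top_k]


docs = [
    Doc("doc-a", date(2020, 12, 1), [0.9, 0.1]),
    Doc("doc-b", date(2022, 3, 15), [1.0, 0.0]),  # best vector match, but too recent
    Doc("doc-c", date(2019, 6, 30), [0.1, 0.9]),
]

results = hybrid_search(docs, query_embedding=[1.0, 0.0], before=date(2021, 1, 1))
print([d.doc_id for d in results])  # ['doc-a', 'doc-c']
```

Note that `doc-b` is the closest vector match but never reaches the ranking step, because the date filter removes it first. That is the behavior I am asking for.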

The problem

Right now, our system only lets us filter documents by tags. We can't filter by other properties such as date, which makes it hard to find older documents quickly.

Proposed solution

  1. Document Model Update:

    • Modify the document model to include typed metadata fields (for example, a date) alongside the text content.
    • Example: { "documentId": "doc123", "content": "Here's the content", "publishDate": "2020-12-01" }
  2. Data Indexing:

    • Index documents by both content and metadata. For vector data, process text through an embedding model.
  3. Query Processing:

    • Create a parser to extract filters like dates from user queries and separate them from vector search terms.
    • Example query: "Find documents before 2021 about renewable energy."
  4. Search Execution:

    • First, apply traditional filters (e.g., date). Then, within those results, perform a vector-based search.
    • Use tools like Azure AI Search or Elasticsearch to handle both aspects.
  5. Result Handling:

    • Combine and display results, ensuring they meet both content relevance and specific property filters.
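As a rough sketch of step 3 (query processing), the snippet below pulls a "before YEAR" filter out of the user's query and leaves the rest for the vector search. Everything here is an assumption for illustration: `ParsedQuery`, `parse_query`, and the regular expression are hypothetical, and the pattern only understands the phrase "before" followed by a four-digit year. A real parser would need to handle many more date expressions.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ParsedQuery:
    text: str                      # remainder to send to the vector search
    before: Optional[date] = None  # exclusive upper bound for publishDate


# Hypothetical pattern: matches only "before <year>", with years 1000-2999.
_BEFORE_YEAR = re.compile(r"\bbefore\s+([12]\d{3})\b", re.IGNORECASE)


def parse_query(query: str) -> ParsedQuery:
    match = _BEFORE_YEAR.search(query)
    if match is None:
        # No date filter found: the whole query goes to the vector search.
        return ParsedQuery(text=query.strip())
    # "before 2021" is read as "published before 2021-01-01".
    cutoff = date(int(match.group(1)), 1, 1)
    # Drop the date phrase and collapse the whitespace it leaves behind.
    remainder = query[: match.start()] + query[match.end():]
    remainder = re.sub(r"\s+", " ", remainder).strip()
    return ParsedQuery(text=remainder, before=cutoff)


parsed = parse_query("Find documents before 2021 about renewable energy.")
print(parsed.before)  # 2021-01-01
print(parsed.text)    # Find documents about renewable energy.
```

The `before` value would feed the traditional filter in step 4, and `text` would be embedded and used for the vector search over the filtered documents.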

Importance

would be great to have

Labels: enhancement (New feature or request)
