feat: Add filter_policy init parameter to all retrievers #781

Closed
wants to merge 6 commits into from


vblagoje
Member

@vblagoje vblagoje commented Jun 3, 2024

Why:

The changes across various integrations (Astra, Chroma, Elasticsearch, MongoDB Atlas, OpenSearch, pgvector, Pinecone, Qdrant, and Weaviate) address a common need for more flexible filtering options within the retrieval process. By introducing a filter policy option (replace or merge), developers can now control how runtime filters are applied relative to initialization time filters.

What:

  • Added filter_policy parameter with options replace and merge across multiple retrievers to control filter behavior dynamically.
  • Adjusted the initialization, serialization (to_dict), and run methods to support the new filter policy behavior.
  • Unit tests were updated or added to validate the new functionality.
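
As a rough illustration of the three touch points listed above (init, to_dict, run), here is a minimal sketch. `SketchRetriever` is a hypothetical class, not one of the real integration retrievers; the plain-string policy values, the `from_dict` fallback, and the flat-dict filter format are assumptions made for the example, and the actual implementations in this PR may differ.

```python
from typing import Any, Dict, Optional


class SketchRetriever:
    """Hypothetical retriever used only to illustrate the new parameter."""

    def __init__(
        self,
        filters: Optional[Dict[str, Any]] = None,
        filter_policy: str = "replace",  # assumed default, keeps existing behavior
    ):
        if filter_policy not in ("replace", "merge"):
            raise ValueError(f"Unknown filter_policy: {filter_policy!r}")
        self.filters = filters or {}
        self.filter_policy = filter_policy

    def to_dict(self) -> Dict[str, Any]:
        # Serialize the policy next to the filters so a round trip preserves it.
        return {"filters": self.filters, "filter_policy": self.filter_policy}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SketchRetriever":
        # Data serialized before the parameter existed falls back to "replace".
        return cls(
            filters=data.get("filters"),
            filter_policy=data.get("filter_policy", "replace"),
        )

    def run(self, filters: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        if self.filter_policy == "merge" and filters:
            # Runtime keys win over init-time keys with the same name.
            effective = {**self.filters, **filters}
        else:
            # "replace": runtime filters, when given, are used on their own.
            effective = filters or self.filters
        # A real retriever would query its document store here.
        return {"effective_filters": effective}
```

Returning the effective filters from `run` is only for illustration; it makes the policy's effect visible without needing a document store.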

How can it be used:

  • Dynamic Filter Behavior Adjustment: Users can decide whether runtime filters completely override the filters set during the retriever's initialization (replace) or are merged with them, with the runtime filters taking precedence on conflicts (merge).

    # Example: Using the filter_policy parameter
    retriever = SomeRetriever(document_store=my_doc_store, filters={"status": "active"}, filter_policy="merge")
  • Complex Search Scenarios:

    • In cases where the context of a query might dictate altering pre-set filters without discarding them, the merge option allows for an additive approach.
    • For strict query contexts that require ignoring initial filters, the replace option offers a clean slate for filters at runtime.
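
To make the difference between the two options concrete, the sketch below resolves init-time and runtime filters under each policy. `PolicySketch` and `resolve_filters` are hypothetical names introduced for this example, and the filters are assumed to be flat key/value dicts like the `{"status": "active"}` example above; nested or operator-based filters would need more careful merge logic than a dict update.

```python
from enum import Enum
from typing import Any, Dict, Optional


class PolicySketch(Enum):
    """Hypothetical stand-in for the two filter_policy options."""

    REPLACE = "replace"
    MERGE = "merge"


def resolve_filters(
    policy: PolicySketch,
    init_filters: Optional[Dict[str, Any]] = None,
    runtime_filters: Optional[Dict[str, Any]] = None,
) -> Optional[Dict[str, Any]]:
    """Return the filters a query would use under the given policy."""
    if policy is PolicySketch.MERGE and runtime_filters:
        # Additive: keep init filters, let runtime values win on shared keys.
        return {**(init_filters or {}), **runtime_filters}
    # Replace (or nothing to merge): runtime filters stand alone when given,
    # otherwise the init filters apply unchanged.
    return runtime_filters or init_filters


init = {"status": "active"}

merged = resolve_filters(PolicySketch.MERGE, init, {"lang": "en"})
overridden = resolve_filters(PolicySketch.MERGE, init, {"status": "archived"})
replaced = resolve_filters(PolicySketch.REPLACE, init, {"lang": "en"})

print(merged)      # {'status': 'active', 'lang': 'en'}
print(overridden)  # {'status': 'archived'}
print(replaced)    # {'lang': 'en'}
```

Under merge, the init filter survives unless the runtime filters name the same key; under replace, any runtime filters discard the init filters entirely.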

How did you test it:

  • Unit tests were enhanced or newly created to cover both replace and merge scenarios for the filter_policy parameter.
  • Tests ensure that filter logic is correctly applied based on the policy setting, whether it merges runtime filters with initial filters or replaces them entirely.
  • Additional test cases should be considered for complex filter merge scenarios to ensure priority and override mechanisms function as expected.

Notes for the reviewer:

  • Make sure all retrievers in the project are accounted for and adjusted appropriately
  • Check that all retrievers have the same new init parameter and proper pydoc
  • Verify unit tests were adjusted appropriately
  • Pay special attention to the Qdrant retrievers: there we needed (or not, I'm not 100% sure) to keep the filters attribute as None
  • It's essential to ensure backward compatibility; existing implementations should default to replace to maintain current behavior unless explicitly set to merge.

@vblagoje vblagoje changed the title Add filter_policy init parameter to all retrievers feat: Add filter_policy init parameter to all retrievers Jun 3, 2024
@vblagoje vblagoje marked this pull request as ready for review June 3, 2024 15:01
@vblagoje vblagoje requested a review from a team as a code owner June 3, 2024 15:01
@vblagoje vblagoje requested review from silvanocerza and removed request for a team June 3, 2024 15:01
@vblagoje
Member Author

vblagoje commented Jun 3, 2024

@silvanocerza we also need to address retrievers in Haystack project itself. I suppose we need to somehow coordinate this PR and that one and their releases?

Contributor

@masci masci left a comment

Can we please break this down into single PRs for single integrations? This will make rolling out these changes with appropriate releases for the single packages easier, as well as producing better release notes.

@vblagoje
Member Author

Superseded by:

#819
#820
#821
#822
#823
#824
#825
#826
#827

@vblagoje vblagoje closed this Jun 19, 2024