feat: allow filtering documents on all fields (v2) #4773

Merged: 19 commits into main from v2-memory-extended-filters on May 10, 2023

ZanSara commented Apr 27, 2023

Related Issues

Proposed Changes:

  • To implement BM25 retrieval in MemoryDocumentStore, we need to be able to filter out non-text documents.
  • This use case could be handy in many other situations, so I decided to extend the filtering.
  • This PR allows filtering documents by metadata as before, and now also by any other Document field. For example:
docstore.filter_documents(filters={
    'some_metadata_key': 'some metadata value',
    'content_type': 'text'   # This is the new feature
})
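
To illustrate the idea behind the example above, here is a minimal sketch of equality filtering over a flattened document. It is not the code from this PR: `Doc`, `flatten`, and `filter_docs` are hypothetical names, and the sketch assumes that "filtering on all fields" can be modelled by merging metadata keys into the same dictionary as the top-level fields.

```python
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a document; not Haystack's Document class."""
    content: str
    content_type: str = "text"
    metadata: Dict[str, Any] = field(default_factory=dict)


def flatten(doc: Doc) -> Dict[str, Any]:
    """Merge top-level fields and metadata keys into one flat dict.

    Assumption: a metadata key that collides with a field name overwrites it.
    """
    flat = asdict(doc)
    flat.update(flat.pop("metadata"))
    return flat


def filter_docs(docs: List[Doc], filters: Dict[str, Any]) -> List[Doc]:
    """Keep documents whose flattened view equals every filter value."""
    missing = object()  # sentinel so that an absent key never matches
    return [
        doc
        for doc in docs
        if all(flatten(doc).get(key, missing) == value for key, value in filters.items())
    ]


docs = [
    Doc("hello", "text", {"some_metadata_key": "some metadata value"}),
    Doc("table data", "table", {"some_metadata_key": "some metadata value"}),
]
matches = filter_docs(
    docs,
    {"some_metadata_key": "some metadata value", "content_type": "text"},
)
print([d.content for d in matches])  # ['hello']
```

With this layout a metadata key and a regular field such as `content_type` are looked up the same way, which is what lets a single filter dictionary mix both.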

How did you test it?

  • Ran the unit tests locally
  • Added many more tests to the base suite

Notes for the reviewer

Related to #4768, but it's not necessary to wait for it.

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added tests that demonstrate the correct behavior of the change
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

ZanSara requested a review from a team as a code owner April 27, 2023 16:29
ZanSara requested review from vblagoje and removed request for a team April 27, 2023 16:29
ZanSara removed the request for review from vblagoje April 27, 2023 16:29
github-actions bot added the "type:documentation" (Improvements on the docs) label Apr 27, 2023
ZanSara requested a review from masci April 27, 2023 16:29
ZanSara added the "2.x" (Related to Haystack v2.0) label Apr 27, 2023

masci left a comment

Left a couple of comments, but overall it looks good to me. For the sake of being an old broken record, I would probably have shipped the safe operators and the docs flattening in two separate PRs 😄

haystack/preview/document_stores/memory/_filters.py (outdated, resolved)
haystack/preview/document_stores/memory/_filters.py (outdated, resolved)
haystack/preview/document_stores/memory/_filters.py (outdated, resolved)
test/preview/document_stores/_base.py (outdated, resolved)

ZanSara commented May 5, 2023

> For the sake of being an old broken record, I would probably have shipped the safe operators and the docs flattening in two separate PRs 😄

Ugh good point... I didn't realize we needed the safe operators until I tested on flat Documents, but in hindsight that's true. Thanks for the reminder! One PR at a time I'll learn to notice 😄


coveralls commented May 5, 2023

Coverage: 37.158% (-0.2%) from 37.336% when pulling 0808a96 on v2-memory-extended-filters into 9cb153d on main.

ZanSara requested a review from masci May 8, 2023 09:26
haystack/preview/dataclasses/document.py (resolved)
test/preview/document_stores/_base.py (outdated, resolved)
ZanSara requested a review from masci May 8, 2023 13:59
ZanSara mentioned this pull request May 8, 2023
16 tasks
ZanSara self-assigned this May 8, 2023

masci left a comment

LGTM!

ZanSara merged commit 3a6db68 into main May 10, 2023
ZanSara deleted the v2-memory-extended-filters branch May 10, 2023 14:33
Labels
  • 2.x (Related to Haystack v2.0)
  • topic:tests
  • type:documentation (Improvements on the docs)
Linked issue: Improve MemoryDocumentStore filtering to ease retrieval
3 participants