Skip to content

fix: SentenceWindowRetriever convert metadata fields back to int#8304

Closed
davidsbatista wants to merge 3 commits intomainfrom
fix-make-sentence-window-retriever-work-picone
Closed

fix: SentenceWindowRetriever convert metadata fields back to int#8304
davidsbatista wants to merge 3 commits intomainfrom
fix-make-sentence-window-retriever-work-picone

Conversation

@davidsbatista
Copy link
Contributor

@davidsbatista davidsbatista commented Aug 28, 2024

Proposed Changes:

Pinecone converts all meta numbers in the meta field to float (https://docs.pinecone.io/guides/data/filter-with-metadata). This causes the SentenceWindowRetriever to crash completely.

This PR checks if the metadata values are floats and converts them back to integers making PineCone supported by the SentenceWindowRetriever

How did you test it?

  • unit tests,
  • integration tests
  • manual verification

Checklist

@github-actions github-actions bot added topic:tests type:documentation Improvements on the docs labels Aug 28, 2024
@coveralls
Copy link
Collaborator

coveralls commented Aug 28, 2024

Pull Request Test Coverage Report for Build 10594089336

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.2%) to 90.36%

Totals Coverage Status
Change from base Build 10592675152: 0.2%
Covered Lines: 7021
Relevant Lines: 7770

💛 - Coveralls

@davidsbatista davidsbatista added the ignore-for-release-notes PRs with this flag won't be included in the release notes. label Aug 28, 2024
@davidsbatista davidsbatista changed the title fix: SentenceWindowRetriever convert metadata fields back to int fix: SentenceWindowRetriever convert metadata fields back to int Aug 28, 2024
@davidsbatista davidsbatista marked this pull request as ready for review August 28, 2024 09:33
@davidsbatista davidsbatista requested a review from a team as a code owner August 28, 2024 09:33
@davidsbatista davidsbatista requested review from Amnah199 and anakin87 and removed request for a team August 28, 2024 09:33
@anakin87
Copy link
Member

Since this change is related to how Pinecone handles metadata,
I think it would be simpler and more appropriate to intervene on the Pinecone side.

Here, for example: https://github.com/deepset-ai/haystack-core-integrations/blob/50352b95e0caeedeb5779eeec3b6a7bd79542d80/integrations/pinecone/src/haystack_integrations/document_stores/pinecone/document_store.py#L270

WDYT?

@davidsbatista
Copy link
Contributor Author

It's a good suggestion, but we can't just blindly convert everything back to integers.

To make this generic, we need to know which type the values were originally, i.e.: before being stored in Pinecone; and I don't know where to store that information.

@anakin87
Copy link
Member

Yes, I understand.

I'm simply suggesting that we move the _convert_to_int method to Pinecone with the same keys used here.

Unrelated: I have the impression that the fact that our DocumentSplitter creates a page_number meta field poses risks of overriding user-provided information.

@davidsbatista
Copy link
Contributor Author

``

Yes, I understand.

I'm simply suggesting that we move the _convert_to_int method to Pinecone with the same keys used here.

ah ok, now I understand what you suggested - the only thing is that for the tests we will need to import Pinecone/add them to the CI

@anakin87
Copy link
Member

ah ok, now I understand what you suggested - the only thing is that for the tests we will need to import Pinecone/add them to the CI

No, I suggest modifying the Pinecone Document Store in core-integrations.

Here:
https://github.com/deepset-ai/haystack-core-integrations/blob/50352b95e0caeedeb5779eeec3b6a7bd79542d80/integrations/pinecone/src/haystack_integrations/document_stores/pinecone/document_store.py#L270

@davidsbatista
Copy link
Contributor Author

moving to core-integrations

@davidsbatista davidsbatista deleted the fix-make-sentence-window-retriever-work-picone branch August 29, 2024 12:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ignore-for-release-notes PRs with this flag won't be included in the release notes. topic:tests type:documentation Improvements on the docs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants