Pinecone: change dummy vector #6932

anakin87 · 2024-02-07T11:04:28Z

Related Issues

fixes Pinecone: dummy vector is not compatible with the new API #6931
similar to Pinecone: Dense vectors must contain at least one non-zero value. haystack-core-integrations#300

Proposed Changes:

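I changed the value of the dummy vector so that it is no longer all zeros (the new Pinecone API requires dense vectors to contain at least one non-zero value).
I also refactored the code to create this dummy vector once, at init time, instead of rebuilding it in several places.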
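
For readers skimming the PR, here is a minimal sketch of the idea, not the actual implementation. The class name `DummyVectorStore`, the method `embeddings_for`, and the default `embedding_dim` of 768 are hypothetical; the fill value `-10.0` is taken from the review comment below, and the last line of `embeddings_for` mirrors the added line in the diff.

```python
from typing import List


class DummyVectorStore:
    """Hypothetical stand-in for the document store (not the real PineconeDocumentStore)."""

    def __init__(self, embedding_dim: int = 768, dummy_value: float = -10.0):
        # embedding_dim default is an assumption made for this sketch.
        self.embedding_dim = embedding_dim
        # Build the placeholder once at init time; a non-zero fill value
        # satisfies the "at least one non-zero value" requirement.
        self.dummy_vector: List[float] = [dummy_value] * embedding_dim

    def embeddings_for(self, document_chunk: list) -> List[List[float]]:
        # One placeholder per document that has no embedding yet.
        return [self.dummy_vector] * len(document_chunk)


store = DummyVectorStore(embedding_dim=4)
placeholders = store.embeddings_for(["doc-a", "doc-b", "doc-c"])
```

Note that every entry in the returned list refers to the same list object, which is fine as long as the placeholders are only read and never mutated in place.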

How did you test it?

CI

Unfortunately, our CI tests are based on an outdated mock, which is also why this problem did not surface until a user reported it. (In haystack-core-integrations, the situation is much better.)

So I reproduced the issue and tested the change locally using this code:

from haystack.utils import fetch_archive_from_http
from haystack.utils import convert_files_to_docs
from haystack.nodes import PreProcessor
from haystack.document_stores.pinecone import PineconeDocumentStore
from haystack.nodes import EmbeddingRetriever
doc_dir = "data/tutorial8"
s3_url = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/preprocessing_tutorial8.zip"
fetch_archive_from_http(url=s3_url, output_dir=doc_dir)

all_docs = convert_files_to_docs(dir_path=doc_dir)

preprocessor = PreProcessor(
    clean_empty_lines=True,
    clean_whitespace=True,
    split_by="word",
    split_length=100,
    split_respect_sentence_boundary=True,
)

# Split the converted documents into chunks of roughly 100 words
docs_default = preprocessor.process(all_docs)

document_store = PineconeDocumentStore(
    api_key="FAKE-API-KEY", environment="gcp-starter", index="default"
)

print(document_store.get_document_count())

document_store.write_documents(docs_default)

retriever = EmbeddingRetriever(
    document_store=document_store, embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1"
)

print(retriever.retrieve("What is masking in language models?"))

Notes for the reviewer

Checklist

I have read the contributors guidelines and the code of conduct
I have updated the related issue with new insights and changes
I added unit tests and updated the docstrings
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.
I documented my code
I ran pre-commit hooks and fixed any issue

vblagoje · 2024-02-07T11:52:28Z

haystack/document_stores/pinecone.py

-                        embeddings_to_index = np.zeros((len(document_chunk), self.embedding_dim), dtype="float32")
-                        # Convert embeddings to list objects
-                        embeddings = [embed.tolist() if embed is not None else None for embed in embeddings_to_index]
+                        embeddings = [self.dummy_vector] * len(document_chunk)


@anakin87 This is the only diff (aside from -10.0) right? Everything else is the same, no?

vblagoje

🚢

anakin87 added 2 commits February 7, 2024 11:52

change dummy_vector

9971014

reno

cf1b119

anakin87 requested review from a team as code owners February 7, 2024 11:04

anakin87 requested review from dfokina, julian-risch and vblagoje and removed request for a team and julian-risch February 7, 2024 11:04

github-actions bot added topic:document_store topic:pinecone labels Feb 7, 2024

vblagoje reviewed Feb 7, 2024

vblagoje self-requested a review February 7, 2024 16:37

vblagoje approved these changes Feb 7, 2024

anakin87 merged commit b08deef into v1.x Feb 7, 2024
54 checks passed

anakin87 deleted the pinecone-change-dummy-vector branch February 7, 2024 16:39

anakin87 mentioned this pull request Feb 7, 2024

Pinecone: dummy vector is not compatible with the new API #6931

Closed
