We are investigating whether retrieving documents from a vector store through LangChain abstractions is preferable to using the client that the vector store itself provides.

In [1]:
import os
from pinecone import Pinecone

### Initialization

In [4]:
pinecone_key = os.getenv("PINECONE_API_KEY")
google_key = os.getenv("GOOGLE_KEY")

In [5]:
pc = Pinecone(
    api_key=pinecone_key,
    environment='gcp-starter'
)

In [6]:
pc.list_indexes()

{'indexes': [{'dimension': 768,
              'host': 'general-0c42iuw.svc.gcp-starter.pinecone.io',
              'metric': 'cosine',
              'name': 'general',
              'spec': {'pod': {'environment': 'gcp-starter',
                               'pod_type': 'starter',
                               'pods': 1,
                               'replicas': 1,
                               'shards': 1}},
              'status': {'ready': True, 'state': 'Ready'}}]}

In [11]:
index_name = pc.list_indexes().index_list['indexes'][0]['name']

In [17]:
index = pc.Index(index_name)

### Langchain Retrievers

In [13]:
from langchain.embeddings import GooglePalmEmbeddings
from langchain.vectorstores import Pinecone as LangPinecone

In [15]:
with open("Data/sample.txt", encoding='utf-8') as f:
    text = f.read()

In [18]:
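docsearch = LangPinecone(
    index=index,
    embedding=GooglePalmEmbeddings(google_api_key=google_key),
    # text_key names the metadata field that stores each chunk's raw text;
    # "text" is an assumption here, not the document contents themselves.
    text_key="text"
)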

In [20]:
retriever = docsearch.as_retriever()

In [22]:
query = "Who were Hansel and Gretel"
retriever.get_relevant_documents(query)

TypeError: Index.query() got multiple values for argument 'top_k'
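
The traceback above has the shape Python produces when a single parameter receives both a positional and a keyword value. The sketch below reproduces that class of error with a hypothetical stand-in function. It is not Pinecone's real signature, and whether this is the exact mechanism inside the library is an assumption.

```python
def query(top_k, vector=None):
    """Hypothetical stand-in for a query method; NOT Pinecone's real signature."""
    return top_k, vector


def call_with_positional_vector():
    """Pass the vector positionally and top_k by keyword; return the error text."""
    try:
        # The list binds to the first parameter (top_k), and the keyword
        # argument then tries to bind top_k a second time.
        query([0.1, 0.2], top_k=4)
    except TypeError as exc:
        return str(exc)
    return None


print(call_with_positional_vector())
```

Passing every argument by keyword avoids this kind of clash.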

### Remarks
From our investigation, the answer is no. The problems include:

1. We have to compute embeddings on every call.
2. We gradually clutter the namespace every time we use the retriever.
3. Calling `.get_relevant_documents(query)` fails with newer versions of the pinecone client:
```
TypeError: Index.query() got multiple values for argument 'top_k'
```
    
This defeats the point of the LangChain abstraction: it is easier to query the database directly with the Pinecone client.
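
As a rough illustration of querying through the client directly, here is a minimal sketch. The helper name `query_texts`, the `embed_query` callable, and the `"text"` metadata field are assumptions made for this example. The only client call it relies on is `index.query(vector=..., top_k=..., include_metadata=True)`.

```python
from typing import Callable, List, Tuple


def query_texts(
    index,
    embed_query: Callable[[str], List[float]],
    query: str,
    top_k: int = 4,
    text_key: str = "text",  # assumed metadata field holding the raw text
) -> List[Tuple[str, float]]:
    """Embed `query`, search `index`, and return (text, score) pairs."""
    vector = embed_query(query)
    # Every argument is passed by keyword.
    response = index.query(vector=vector, top_k=top_k, include_metadata=True)
    return [
        (match["metadata"][text_key], match["score"])
        for match in response["matches"]
    ]
```

With the objects defined earlier in this notebook, a call might look like `query_texts(index, GooglePalmEmbeddings(google_api_key=google_key).embed_query, query)`, assuming the vectors in the index were upserted with their text stored under the `"text"` metadata key.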