<a href="https://colab.research.google.com/gist/Daethyra/0e3f515a41d78d89babbea00c057b8d2/langchain-embeddings-retrieval-agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/docs/langchain-retrieval-agent.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/docs/langchain-retrieval-agent.ipynb)

#### [LangChain Handbook](https://pinecone.io/learn/langchain)

# Retrieval Agents

We've seen in previous chapters how powerful [retrieval augmentation](https://www.pinecone.io/learn/langchain-retrieval-augmentation/) and [conversational agents](https://www.pinecone.io/learn/langchain-agents/) can be. They become even more impressive when we begin using them together.

Conversational agents can struggle with data freshness, knowledge about specific domains, or accessing internal documentation. Coupling agents with retrieval augmentation tools addresses these problems.

On the other hand, using "naive" retrieval augmentation without an agent means we retrieve contexts with *every* query. This isn't always ideal, as not every query requires access to external knowledge.

Merging these methods gives us the best of both worlds. In this notebook we'll learn how to do this.

[![Open full notebook](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/full-link.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/generation/langchain/handbook/08-langchain-retrieval-agent.ipynb)

To begin, we must install the prerequisite libraries that we will be using in this notebook.

In [7]:
%pip install -qU \
    openai \
    "pinecone-client[grpc]" \
    pinecone-datasets \
    langchain \
    tiktoken

Note: you may need to restart the kernel to use updated packages.




## Building the Knowledge Base

We will download a pre-embedded dataset from `pinecone-datasets`, which allows us to skip the embedding and preprocessing steps. If you'd rather work through those steps, you can find the [full notebook here](https://github.com/pinecone-io/examples/blob/master/generation/langchain/handbook/08-langchain-retrieval-agent.ipynb).

In [8]:
from pinecone_datasets import load_dataset, list_datasets

# Check available datasets
list_datasets()

['ANN_DEEP1B_d96_angular',
 'ANN_Fashion-MNIST_d784_euclidean',
 'ANN_GIST_d960_euclidean',
 'ANN_GloVe_d100_angular',
 'ANN_GloVe_d200_angular',
 'ANN_GloVe_d25_angular',
 'ANN_GloVe_d50_angular',
 'ANN_LastFM_d64_angular',
 'ANN_MNIST_d784_euclidean',
 'ANN_NYTimes_d256_angular',
 'ANN_SIFT1M_d128_euclidean',
 'amazon_toys_quora_all-MiniLM-L6-bm25',
 'it-threat-data-test',
 'it-threat-data-train',
 'langchain-python-docs-text-embedding-ada-002',
 'movielens-user-ratings',
 'msmarco-v1-bm25-allMiniLML6V2',
 'quora_all-MiniLM-L6-bm25-100K',
 'quora_all-MiniLM-L6-bm25',
 'quora_all-MiniLM-L6-v2_Splade-100K',
 'quora_all-MiniLM-L6-v2_Splade',
 'squad-text-embedding-ada-002',
 'wikipedia-simple-text-embedding-ada-002-100K',
 'wikipedia-simple-text-embedding-ada-002',
 'youtube-transcripts-text-embedding-ada-002']

In [10]:
dataset = load_dataset("squad-text-embedding-ada-002")
dataset.head()

_request non-retriable exception: Invalid bucket name: 'pinecone-datasets-dev\squad-text-embedding-ada-002', 400
Traceback (most recent call last):
  File "c:\Users\dae\.vscode\Software\.venv\lib\site-packages\gcsfs\retry.py", line 123, in retry_request
    return await func(*args, **kwargs)
  File "c:\Users\dae\.vscode\Software\.venv\lib\site-packages\gcsfs\core.py", line 430, in _request
    validate_response(status, contents, path, args)
  File "c:\Users\dae\.vscode\Software\.venv\lib\site-packages\gcsfs\retry.py", line 110, in validate_response
    raise HttpError(error)
gcsfs.retry.HttpError: Invalid bucket name: 'pinecone-datasets-dev\squad-text-embedding-ada-002', 400


HttpError: Invalid bucket name: 'pinecone-datasets-dev\squad-text-embedding-ada-002', 400

The traceback above comes from a run on Windows (note the `c:\Users\...` paths), where the bucket path contains a backslash (`pinecone-datasets-dev\squad-text-embedding-ada-002`) and is rejected as an invalid bucket name. The outputs shown in the following cells appear to come from a run in which the dataset loaded successfully.

In [None]:
len(dataset)

18891

We'll format the dataset ready for upsert by dropping the columns we don't need.

In [None]:
# we drop sparse_values and blob as they are not needed for this example
dataset.documents.drop(['sparse_values', 'blob'], axis=1, inplace=True)

dataset.head()

| | id | values | metadata |
|---|---|---|---|
| 0 | 5733be284776f41900661182 | [-0.010262451963272523, 0.02222637996192584, -... | {'text': 'Architecturally, the school has a Ca... |
| 1 | 5733bf84d058e614000b61be | [-0.009786712423983223, -0.013988726438873078,... | {'text': 'As at most other universities, Notre... |
| 2 | 5733bed24776f41900661188 | [0.013343917696606181, -0.0007001232846109822,... | {'text': 'The university is the major seat of ... |
| 3 | 5733a6424776f41900660f51 | [-0.0085222901071539, 0.004399558219521822, -0... | {'text': 'The College of Engineering was estab... |
| 4 | 5733a70c4776f41900660f64 | [-0.006695996885869355, -0.02067068565761649, ... | {'text': 'All of Notre Dame's undergraduate st... |


## Vector Database

Next we initialize a Pinecone vector database. For this we need a [free API key](https://app.pinecone.io/), then we create the index:

In [None]:
import pinecone
import os

# Load Pinecone API key
api_key = os.getenv('PINECONE_API_KEY') or 'api_key'
# Set Pinecone environment. Find next to API key in console
env = os.getenv('PINECONE_ENVIRONMENT') or "us-central1-gcp"

pinecone.init(api_key=api_key, environment=env)

In [None]:
index_name = 'langchain-retrieval-agent-fast'

In [None]:
import time

if index_name in pinecone.list_indexes():
    pinecone.delete_index(index_name)

# we create a new index
pinecone.create_index(
    name=index_name,
    metric='dotproduct',
    dimension=1536  # 1536 dim of text-embedding-ada-002
)

# wait for index to be initialized
while not pinecone.describe_index(index_name).status['ready']:
    time.sleep(1)
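
The polling loop above waits indefinitely if the index never becomes ready. A minimal sketch of a bounded wait follows. `wait_until_ready` and its `is_ready` callable are hypothetical helpers, not part of the Pinecone client; with Pinecone, `is_ready` would wrap the same `describe_index(index_name).status['ready']` check used above.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float = 60.0,
    poll_interval_s: float = 1.0,
) -> float:
    """Poll is_ready() until it returns True; raise TimeoutError after timeout_s.

    Returns the number of seconds spent waiting.
    """
    start = time.monotonic()
    while not is_ready():
        if time.monotonic() - start >= timeout_s:
            raise TimeoutError(f"not ready after {timeout_s} seconds")
        time.sleep(poll_interval_s)
    return time.monotonic() - start


# Stand-in readiness check that succeeds on the third call (illustration only).
calls = {"n": 0}


def fake_is_ready() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


waited = wait_until_ready(fake_is_ready, timeout_s=5.0, poll_interval_s=0.01)
```
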

Then connect to the index:

In [None]:
index = pinecone.GRPCIndex(index_name)
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}

We should see that the new Pinecone index has a `total_vector_count` of `0`, as we haven't added any vectors yet.

Now we upsert the data to Pinecone:

In [None]:
index.upsert_from_dataframe(dataset.documents, batch_size=128)

sending upsert requests:   0%|          | 0/18891 [00:00<?, ?it/s]

collecting async responses:   0%|          | 0/148 [00:00<?, ?it/s]

upserted_count: 18891
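
The second progress bar above counts 148 responses, which is consistent with splitting 18,891 vectors into batches of 128. A minimal sketch of that chunking, assuming a plain list stands in for the DataFrame rows (`batched` is a hypothetical helper; the client's internal batching may differ):

```python
import math
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of items with at most batch_size elements each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Placeholder IDs standing in for the 18,891 dataset rows.
rows = list(range(18891))
batches = list(batched(rows, batch_size=128))

num_batches = len(batches)          # equals ceil(18891 / 128)
last_batch_size = len(batches[-1])  # the final batch holds the remainder
```
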

We've indexed everything. We can now check the number of vectors in our index like so:

In [None]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.1,
 'namespaces': {'': {'vector_count': 18891}},
 'total_vector_count': 18891}

## Creating a Vector Store and Querying

In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings

openai_api_key = os.getenv('OPENAI_API_KEY') or 'sk-'
model_name = 'text-embedding-ada-002'

embed = OpenAIEmbeddings(
    model=model_name,
    openai_api_key=openai_api_key
)

Now that we've built our index, we can switch back over to LangChain. We start by initializing a vector store using the same index we just built. We do that like so:

In [None]:
from langchain.vectorstores import Pinecone

text_field = "text"

# switch back to normal index for langchain
index = pinecone.Index(index_name)

vectorstore = Pinecone(
    index, embed.embed_query, text_field
)

As in previous examples, we can use the `similarity_search` method to do a pure semantic search (without the generation component).

In [None]:
query = "What universities had the most intergenerational wealth?"

vectorstore.similarity_search(
    query,  # our search query
    k=5  # return the 5 most relevant docs
)

[Document(page_content='Episcopalians and Presbyterians, as well as other WASPs, tend to be considerably wealthier and better educated (having graduate and post-graduate degrees per capita) than most other religious groups in United States, and are disproportionately represented in the upper reaches of American business, law and politics, especially the Republican Party. Numbers of the most wealthy and affluent American families as the Vanderbilts and the Astors, Rockefeller, Du Pont, Roosevelt, Forbes, Whitneys, the Morgans and Harrimans are Mainline Protestant families.', metadata={'title': 'Protestantism'}),
 Document(page_content='Yale has had many financial supporters, but some stand out by the magnitude or timeliness of their contributions. Among those who have made large donations commemorated at the university are: Elihu Yale; Jeremiah Dummer; the Harkness family (Edward, Anna, and William); the Beinecke family (Edwin, Frederick, and Walter); John William Sterling; Payne Whitne

Looks like we're getting good results. Let's take a look at how we can begin integrating this into a conversational agent.
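
Conceptually, a similarity search embeds the query, scores it against the stored vectors with the index metric (dot product here), and returns the `k` best matches. A minimal brute-force sketch with tiny made-up 3-dimensional vectors follows. `top_k` and the toy data are hypothetical, and a real vector database would not scan every vector like this exhaustive loop does.

```python
from typing import Dict, List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def top_k(
    query: List[float], vectors: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k (id, score) pairs with the highest dot-product score."""
    scored = [(vec_id, dot(query, vec)) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (made up for illustration).
toy_index = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.5, 1.0, 0.0],
    "doc-c": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 1.0, 0.0], toy_index, k=2)
```
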

## Initializing the Conversational Agent

Our conversational agent needs a Chat LLM, conversational memory, and a `RetrievalQA` chain to initialize. We create these using:

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain.chains import RetrievalQA

# chat completion llm
llm = ChatOpenAI(
    openai_api_key=openai_api_key,
    model_name='gpt-3.5-turbo',
    temperature=0.0
)
# conversational memory
conversational_memory = ConversationBufferWindowMemory(
    memory_key='chat_history',
    k=5,
    return_messages=True
)
# retrieval qa chain
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever()
)

Using these we can generate an answer using the `run` method:

In [None]:
qa.run(query)

'Based on the provided context, Yale University is mentioned as having received significant donations from wealthy individuals and families, such as Elihu Yale, the Harkness family, the Beinecke family, John William Sterling, Payne Whitney, Joseph E. Sheffield, Paul Mellon, Charles B. G. Murphy, William K. Lanman, and the Yale Class of 1954. These donations suggest a strong presence of intergenerational wealth at Yale University. However, it is important to note that this information does not provide a comprehensive ranking of universities based on intergenerational wealth.'
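
The `stuff` chain type used above places all retrieved documents into a single prompt alongside the question. A minimal sketch of that idea follows. `build_stuff_prompt` and the template wording are hypothetical and do not reproduce LangChain's actual prompt.

```python
from typing import List


def build_stuff_prompt(question: str, documents: List[str]) -> str:
    """Join every retrieved document into one context block, then append the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder snippets standing in for retrieved documents.
docs = ["First retrieved passage.", "Second retrieved passage."]
prompt = build_stuff_prompt(
    "What universities had the most intergenerational wealth?", docs
)
```

Because every document is included verbatim, the prompt grows with the number and length of the retrieved documents.
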

But this isn't yet ready for our conversational agent. For that we need to convert this retrieval chain into a tool. We do that like so:

In [None]:
from langchain.agents import Tool

tools = [
    Tool(
        name='Knowledge Base',
        func=qa.run,
        description=(
            'use this tool when answering general knowledge queries to get '
            'more information about the topic'
        )
    )
]

Now we can initialize the agent like so:

In [None]:
from langchain.agents import initialize_agent

agent = initialize_agent(
    agent='chat-conversational-react-description',
    tools=tools,
    llm=llm,
    verbose=True,
    max_iterations=3,
    early_stopping_method='generate',
    memory=conversational_memory
)

With that our retrieval augmented conversational agent is ready and we can begin using it.
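
At each step, this type of agent has the LLM emit a JSON object naming an `action` and an `action_input`; the executor then either calls the named tool or returns the final answer, for at most `max_iterations` steps. A minimal sketch of that loop, assuming a scripted stand-in for the LLM. `run_agent`, `scripted_llm`, and `fake_knowledge_base` are hypothetical and much simpler than LangChain's `AgentExecutor`.

```python
import json
from typing import Callable, Dict, List


def run_agent(
    llm: Callable[[str, List[str]], str],
    tools: Dict[str, Callable[[str], str]],
    query: str,
    max_iterations: int = 3,
) -> str:
    """Ask the LLM for an action, run the chosen tool, and stop on 'Final Answer'."""
    observations: List[str] = []
    for _ in range(max_iterations):
        step = json.loads(llm(query, observations))
        if step["action"] == "Final Answer":
            return step["action_input"]
        tool = tools[step["action"]]
        observations.append(tool(step["action_input"]))
    # Simplification: this sketch returns a fixed message when the step budget runs out.
    return "Stopped: reached max_iterations without a final answer."


def fake_knowledge_base(text: str) -> str:
    """Stand-in for qa.run; returns a canned observation."""
    return f"(retrieved notes about: {text})"


def scripted_llm(query: str, observations: List[str]) -> str:
    """Stand-in LLM: answers arithmetic directly, otherwise consults the tool once."""
    if "2 * 7" in query:
        return json.dumps({"action": "Final Answer", "action_input": "14"})
    if not observations:
        return json.dumps({"action": "Knowledge Base", "action_input": query})
    return json.dumps({"action": "Final Answer", "action_input": observations[-1]})


toy_tools = {"Knowledge Base": fake_knowledge_base}
direct = run_agent(scripted_llm, toy_tools, "what is 2 * 7?")
via_tool = run_agent(scripted_llm, toy_tools, "legacy admissions")
```
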

### Using the Conversational Agent

To make queries we simply call the `agent` directly.

In [None]:
agent(query)



> Entering new AgentExecutor chain...
{
    "action": "Knowledge Base",
    "action_input": "Universities with the most intergenerational wealth"
}
Observation: I don't have specific information on universities with the most intergenerational wealth. However, some universities in the United States have significant endowments, which can contribute to their overall wealth. Examples of universities with large endowments include Harvard University, Stanford University, and Princeton University.
Thought:{
    "action": "Final Answer",
    "action_input": "Some universities in the United States with large endowments include Harvard University, Stanford University, and Princeton University."
}

> Finished chain.


{'input': 'What universities had the most intergenerational wealth?',
 'chat_history': [],
 'output': 'Some universities in the United States with large endowments include Harvard University, Stanford University, and Princeton University.'}

Looks great. Now what if we ask it a question that doesn't need general knowledge?

In [None]:
agent("what is 2 * 7?")



> Entering new AgentExecutor chain...
{
    "action": "Final Answer",
    "action_input": "The product of 2 multiplied by 7 is 14."
}

> Finished chain.


{'input': 'what is 2 * 7?',
 'chat_history': [HumanMessage(content='What universities had the most intergenerational wealth?', additional_kwargs={}, example=False),
  AIMessage(content='Some universities in the United States with large endowments include Harvard University, Stanford University, and Princeton University.', additional_kwargs={}, example=False)],
 'output': 'The product of 2 multiplied by 7 is 14.'}

Perfect, the agent is able to recognize that it doesn't need to refer to its general knowledge tool for that question. Let's try some more questions.

In [None]:
agent("can you tell me some facts about legacy admissions?")



> Entering new AgentExecutor chain...
{
    "action": "Knowledge Base",
    "action_input": "legacy admissions"
}
Observation: Legacy admissions refer to the practice of giving preferential treatment to applicants who have family members who attended the university in question. This means that if a student's parent, grandparent, or sibling attended the university, they may have a higher chance of being admitted compared to other applicants with similar qualifications. Legacy admissions are one of the factors that can be taken into account in the holistic admissions process used by some universities in the United States.
Thought:{
    "action": "Final Answer",
    "action_input": "Legacy admissions refer to the practice of giving preferential treatment to applicants who have family members who attended the university in question. This means that if a student's parent, grandparent, or sibling attended the university, they may have 

{'input': 'can you tell me some facts about legacy admissions?',
 'chat_history': [HumanMessage(content='What universities had the most intergenerational wealth?', additional_kwargs={}, example=False),
  AIMessage(content='Some universities in the United States with large endowments include Harvard University, Stanford University, and Princeton University.', additional_kwargs={}, example=False),
  HumanMessage(content='what is 2 * 7?', additional_kwargs={}, example=False),
  AIMessage(content='The product of 2 multiplied by 7 is 14.', additional_kwargs={}, example=False)],
 'output': "Legacy admissions refer to the practice of giving preferential treatment to applicants who have family members who attended the university in question. This means that if a student's parent, grandparent, or sibling attended the university, they may have a higher chance of being admitted compared to other applicants with similar qualifications. Legacy admissions are one of the factors that can be taken int

In [None]:
agent("Teach a class of 7th graders how legacy admissions ruin the playing field.")



> Entering new AgentExecutor chain...
{
    "action": "Knowledge Base",
    "action_input": "Legacy admissions and their impact on the playing field in college admissions"
}
Observation: Legacy admissions refer to the practice of giving preferential treatment to applicants who have family members, usually parents or grandparents, who attended the university in question. This practice is controversial because it can perpetuate social and economic advantages for certain groups of people, particularly those from wealthier backgrounds. 

Legacy admissions can have an impact on the playing field in college admissions by potentially disadvantaging applicants from underrepresented or disadvantaged backgrounds. By giving preference to legacy applicants, universities may be prioritizing the continuation of a privileged class rather than promoting diversity and equal opportunity. This can create a system where certain groups have a higher likelihood of gai

{'input': 'Teach a class of 7th graders how legacy admissions ruin the playing field.',
 'chat_history': [HumanMessage(content='What universities had the most intergenerational wealth?', additional_kwargs={}, example=False),
  AIMessage(content='Some universities in the United States with large endowments include Harvard University, Stanford University, and Princeton University.', additional_kwargs={}, example=False),
  HumanMessage(content='what is 2 * 7?', additional_kwargs={}, example=False),
  AIMessage(content='The product of 2 multiplied by 7 is 14.', additional_kwargs={}, example=False),
  HumanMessage(content='can you tell me some facts about legacy admissions?', additional_kwargs={}, example=False),
  AIMessage(content="Legacy admissions refer to the practice of giving preferential treatment to applicants who have family members who attended the university in question. This means that if a student's parent, grandparent, or sibling attended the university, they may have a highe

Looks great! We're also able to ask questions that refer to previous interactions in the conversation, and the agent can use the conversation history as a source of information.
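
The `chat_history` lists in the outputs above grow with each exchange; with `k=5`, the window memory is configured to keep only the five most recent exchanges. A minimal sketch of that windowing using `collections.deque` follows. `WindowMemory` is a hypothetical class, not LangChain's implementation.

```python
from collections import deque
from typing import Deque, List, Tuple


class WindowMemory:
    """Keep only the k most recent (human, ai) exchanges."""

    def __init__(self, k: int) -> None:
        self._exchanges: Deque[Tuple[str, str]] = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        # Appending to a full deque silently drops the oldest exchange.
        self._exchanges.append((human, ai))

    def history(self) -> List[Tuple[str, str]]:
        return list(self._exchanges)


memory = WindowMemory(k=5)
for i in range(1, 8):  # seven exchanges, two more than the window holds
    memory.save(f"question {i}", f"answer {i}")

kept = memory.history()
```
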

That's all for this example of building a retrieval augmented conversational agent with OpenAI and Pinecone (the OP stack) and LangChain.

Once finished, we delete the Pinecone index to save resources:

In [None]:
pinecone.delete_index(index_name)

---