# Lab | Agent & Vector store

**Replace the State of the Union dataset with a different one and replicate this lab, updating the prompts accordingly.**

One option is the [sonnets.txt](https://github.com/martin-gorner/tensorflow-rnn-shakespeare/blob/master/shakespeare/sonnets.txt) dataset, or any other file of your choice from the same repository.
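
The link above points at a GitHub page rather than at the file itself, while `TextLoader` needs a local text file. A minimal sketch of fetching the file is shown below. The helper names `github_blob_to_raw` and `download_text` are hypothetical, and the conversion assumes a branch name without slashes.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def github_blob_to_raw(blob_url: str) -> str:
    """Convert a github.com '/blob/' page URL into its raw-content URL."""
    parsed = urlparse(blob_url)
    parts = parsed.path.strip("/").split("/")
    # Expected layout: <user>/<repo>/blob/<branch>/<path...>
    if parsed.netloc != "github.com" or len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    user, repo, _, branch, *path = parts
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, branch, *path])


def download_text(blob_url: str, dest: str) -> Path:
    """Download the raw file to `dest` (requires network access)."""
    with urlopen(github_blob_to_raw(blob_url)) as resp:
        text = resp.read().decode("utf-8")
    out = Path(dest)
    out.write_text(text, encoding="utf-8")
    return out


SONNETS_URL = (
    "https://github.com/martin-gorner/tensorflow-rnn-shakespeare"
    "/blob/master/shakespeare/sonnets.txt"
)
print(github_blob_to_raw(SONNETS_URL))
```

Calling `download_text(SONNETS_URL, "sonnets.txt")` would then give a path that can be passed to `TextLoader` in place of `Hamlet.txt`.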

# Combine agents and vector stores

This notebook covers how to combine agents and vector stores. The use case for this is that you've ingested your data into a vector store and want to interact with it in an agentic manner.

The recommended method for doing so is to create a `RetrievalQA` chain and then use it as a tool in the overall agent. Let's take a look at doing this below. You can do this with several vector stores and use the agent to route between them. There are two ways of doing this: you can either let the agent use the vector stores as normal tools, or you can set `return_direct=True` so that the agent acts purely as a router.
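
For intuition about what a vector store does once data has been ingested, here is a toy sketch. It is not how Chroma or OpenAI embeddings work internally: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `ToyVectorStore` is an illustrative class, not a library API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts (real stores use dense model embeddings)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Stores (text, vector) pairs and ranks them against a query vector."""

    def __init__(self, texts):
        self.items = [(t, embed(t)) for t in texts]

    def similarity_search(self, query: str, k: int = 1):
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore([
    "The ghost tells Hamlet that Claudius murdered the king.",
    "Core ML runs models on the CPU, GPU, and Neural Engine.",
])
print(store.similarity_search("Who murdered the king?"))
```

A retrieval QA chain wraps a search like this and passes the retrieved text to the LLM, which is what makes it usable as a single question-in, answer-out tool.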

## Create the vector store

In [1]:
#!pip install langchain chromadb openai langchain-openai

In [2]:
import os
from dotenv import load_dotenv, find_dotenv
from pathlib import Path
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader, WebBaseLoader
from langchain_core.documents import Document
from langchain.agents import AgentType, Tool, initialize_agent

In [3]:
# Load environment variables
_ = load_dotenv(find_dotenv())

# API keys
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
HUGGINGFACEHUB_API_TOKEN = os.getenv('HUGGINGFACEHUB_API_TOKEN')

In [4]:
llm = OpenAI(temperature=0)

In [5]:
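# Walk down the parts of the current working directory and stop early if a
# "langchain/docs/modules" ancestor is found; otherwise the full working
# directory is used. (Path is already imported above.)
relevant_parts = []
for p in Path(".").absolute().parts:
    relevant_parts.append(p)
    if relevant_parts[-3:] == ["langchain", "docs", "modules"]:
        break
# Hamlet.txt is expected to sit in that directory.
doc_path = str(Path(*relevant_parts) / "Hamlet.txt")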

In [6]:
loader = TextLoader(doc_path)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings, collection_name="hamlet")

Created a chunk of size 1251, which is longer than the specified 1000
Created a chunk of size 1743, which is longer than the specified 1000
Created a chunk of size 1361, which is longer than the specified 1000
Created a chunk of size 1400, which is longer than the specified 1000
Created a chunk of size 1511, which is longer than the specified 1000
Created a chunk of size 1227, which is longer than the specified 1000
Created a chunk of size 1102, which is longer than the specified 1000
Created a chunk of size 2077, which is longer than the specified 1000
Created a chunk of size 1527, which is longer than the specified 1000
Created a chunk of size 1277, which is longer than the specified 1000
Created a chunk of size 2481, which is longer than the specified 1000
Created a chunk of size 1529, which is longer than the specified 1000
Created a chunk of size 1022, which is longer than the specified 1000
Created a chunk of size 1441, which is longer than the specified 1000
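
The warnings above appear because `CharacterTextSplitter` splits on a separator (by default a blank line, `"\n\n"`) and does not cut inside a piece, so any paragraph longer than `chunk_size` is kept whole. The sketch below imitates that behavior in plain Python. It is a simplified illustration, not the library's implementation; it ignores `chunk_overlap`, and `split_on_separator` is a hypothetical name.

```python
def split_on_separator(text: str, chunk_size: int, separator: str = "\n\n"):
    """Greedily merge separator-delimited pieces into chunks up to chunk_size.

    A piece that is itself longer than chunk_size is emitted whole and its
    length is recorded, mirroring the situation the warnings report.
    """
    chunks, oversized, current = [], [], ""
    for piece in text.split(separator):
        if not piece:
            continue
        if len(piece) > chunk_size:
            oversized.append(len(piece))
        candidate = current + separator + piece if current else piece
        if current and len(candidate) > chunk_size:
            chunks.append(current)  # flush what we have, start a new chunk
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks, oversized


text = "short one\n\n" + "x" * 30 + "\n\nshort two\n\nshort three"
chunks, oversized = split_on_separator(text, chunk_size=25)
print([len(c) for c in chunks], oversized)  # [9, 30, 22] [30]
```

If strict chunk sizes matter, `RecursiveCharacterTextSplitter` from the same `langchain_text_splitters` package is worth trying, since it falls back to smaller separators when a piece is too long.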

In [8]:
# Create the RetrievalQA system for Hamlet
hamlet_qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=docsearch.as_retriever()
)
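
The `chain_type="stuff"` setting means the retrieved chunks are all placed ("stuffed") into a single prompt together with the question. A rough sketch of that idea follows. The prompt wording and the function name `build_stuff_prompt` are assumptions for illustration, not LangChain's actual template.

```python
from typing import Sequence


def build_stuff_prompt(question: str, documents: Sequence[str]) -> str:
    """Concatenate all retrieved chunks into one context block plus the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


retrieved = [
    "HAMLET: To be, or not to be, that is the question.",
    "GHOST: I am thy father's spirit.",
]
print(build_stuff_prompt("Who speaks to Hamlet as a spirit?", retrieved))
```

Because everything goes into one prompt, this approach only works while the retrieved chunks fit in the model's context window.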

In [9]:
from langchain_community.document_loaders import WebBaseLoader

In [10]:
loader = WebBaseLoader("https://apple.github.io/coremltools/docs-guides/source/overview-coremltools.html")

In [12]:
docs = loader.load()
coreml_texts = text_splitter.split_documents(docs)
coreml_db = Chroma.from_documents(coreml_texts, embeddings, collection_name="coreml")
coreml_qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=coreml_db.as_retriever()
)

## Create the Agent

In [13]:
# Import things that are needed generically
from langchain.agents import AgentType, Tool, initialize_agent
from langchain_openai import OpenAI

In [14]:
# Define tools for the agent
tools = [
    Tool(
        name="Hamlet QA System",
        func=hamlet_qa.run,
        description="Useful for answering questions about the text of Hamlet.",
    ),
    Tool(
        name="CoreML QA System",
        func=coreml_qa.run,
        description="Useful for answering questions about the CoreML tools documentation.",
    ),
]

In [15]:
# Initialize the agent
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)


In [16]:
agent.invoke(
    "What is the main theme of Hamlet?"
)



> Entering new AgentExecutor chain...
I should use the Hamlet QA System to find the main theme of the play.
Action: Hamlet QA System
Action Input: "What is the main theme of Hamlet?"
Observation: The main theme of Hamlet is the struggle with mortality and the question of whether it is better to live and suffer or to die and end all pain.
Thought: I now know the final answer.
Final Answer: The main theme of Hamlet is the struggle with mortality and the question of whether it is better to live and suffer or to die and end all pain.

> Finished chain.


{'input': 'What is the main theme of Hamlet?',
 'output': 'The main theme of Hamlet is the struggle with mortality and the question of whether it is better to live and suffer or to die and end all pain.'}
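
The trace above follows a Thought / Action / Action Input / Observation pattern that ends with a Final Answer. The agent executor has to turn each piece of model text into either a tool call or a final result. The sketch below shows one simple way to do that with regular expressions; it is a simplified stand-in, not LangChain's own output parser, and `parse_react_step` is a hypothetical name.

```python
import re


def parse_react_step(llm_text: str):
    """Return ('final', answer) or ('action', tool_name, tool_input)."""
    final = re.search(r"Final Answer:\s*(.*)", llm_text, re.DOTALL)
    if final:
        return ("final", final.group(1).strip())
    action = re.search(r"Action:\s*(.*?)\s*\nAction Input:\s*(.*)", llm_text, re.DOTALL)
    if action:
        tool_name = action.group(1).strip()
        tool_input = action.group(2).strip().strip('"')  # drop surrounding quotes
        return ("action", tool_name, tool_input)
    raise ValueError("could not parse agent output")


step = (
    "I should use the Hamlet QA System to find the main theme of the play.\n"
    "Action: Hamlet QA System\n"
    'Action Input: "What is the main theme of Hamlet?"'
)
print(parse_react_step(step))
print(parse_react_step("I now know the final answer.\nFinal Answer: Mortality."))
```

The tool name extracted here must match a `Tool(name=...)` entry exactly, which is why the names and descriptions in the tool list matter.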

In [17]:
agent.invoke("What is CoreML and what are its main features?")



> Entering new AgentExecutor chain...
I should use the CoreML QA System to answer this question.
Action: CoreML QA System
Action Input: "What is CoreML and what are its main features?"
Observation: CoreML is a machine learning framework developed by Apple for use in iOS, macOS, watchOS, and tvOS applications. It allows developers to integrate trained machine learning models into their apps, making it easier to incorporate features such as image recognition, natural language processing, and more. Some of its main features include support for various libraries and frameworks, a unified conversion process, and the ability to convert models to both ML programs and neural networks.
Thought: I now know the final answer.
Final Answer: CoreML is a machine learning framework developed by Apple for use in iOS, macOS, watchOS, and tvOS applications. Some of its main features include support for various libraries and frameworks, a unified 

{'input': 'What is CoreML and what are its main features?',
 'output': 'CoreML is a machine learning framework developed by Apple for use in iOS, macOS, watchOS, and tvOS applications. Some of its main features include support for various libraries and frameworks, a unified conversion process, and the ability to convert models to both ML programs and neural networks.'}

## Use the Agent solely as a router

You can also set `return_direct=True` if you intend to use the agent purely as a router and want to return the result of the `RetrievalQA` chain directly.

Notice that in the examples above the agent did some extra work after querying the `RetrievalQA` chain: it took one more reasoning step to restate the observation as its final answer. You can avoid that and return the tool's result directly.
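
The difference between the two modes can be sketched with a one-step toy agent. Everything here is illustrative: `ToyTool` and `run_agent` are hypothetical names, and `choose_tool` and `summarize` stand in for the LLM calls a real agent would make.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyTool:
    name: str
    func: Callable[[str], str]
    return_direct: bool = False


def run_agent(question, tools, choose_tool, summarize):
    """Pick a tool, call it, then either return its output or post-process it."""
    tool = tools[choose_tool(question)]
    observation = tool.func(question)
    if tool.return_direct:
        return observation  # router mode: the tool output is the final answer
    return summarize(question, observation)  # normal mode: one more model step


tools = {
    "hamlet": ToyTool("hamlet", lambda q: "raw hamlet answer", return_direct=True),
    "coreml": ToyTool("coreml", lambda q: "raw coreml answer"),
}
choose = lambda q: "hamlet" if "hamlet" in q.lower() else "coreml"
summarize = lambda q, obs: f"Final Answer: {obs}"

print(run_agent("Who is Hamlet?", tools, choose, summarize))    # raw hamlet answer
print(run_agent("What is Core ML?", tools, choose, summarize))  # Final Answer: raw coreml answer
```

With `return_direct=True` the model is consulted only to choose the tool, which saves a call but also means the agent cannot combine results from several tools.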

In [18]:
# Define tools for the agent
tools = [
    Tool(
        name="Hamlet QA System",
        func=hamlet_qa.run,
        description="Useful for answering questions about the text of Hamlet.",
        return_direct=True
    ),
    Tool(
        name="CoreML QA System",
        func=coreml_qa.run,
        description="Useful for answering questions about the CoreML tools documentation.",
        return_direct=True
    ),
]

In [19]:
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

In [20]:
agent.invoke(
    "To be or not to be?"
)



> Entering new AgentExecutor chain...
This is a famous quote from Hamlet, so I should use the Hamlet QA System.
Action: Hamlet QA System
Action Input: "To be or not to be?"
Observation: The question is whether it is better to live and suffer through life's challenges or to end one's life and potentially escape those challenges.

> Finished chain.


{'input': 'To be or not to be?',
 'output': " The question is whether it is better to live and suffer through life's challenges or to end one's life and potentially escape those challenges."}

In [21]:
agent.invoke("How does it acceralate my Mac M1?")



> Entering new AgentExecutor chain...
I should use the CoreML QA System to find information about the Mac M1.
Action: CoreML QA System
Action Input: "How does it accelerate my Mac M1?"
Observation: Core ML optimizes on-device performance by leveraging the CPU, GPU, and Neural Engine (NE) while minimizing its memory footprint and power consumption. This allows for faster and more efficient processing of machine learning models on the Mac M1.

> Finished chain.


{'input': 'How does it acceralate my Mac M1?',
 'output': ' Core ML optimizes on-device performance by leveraging the CPU, GPU, and Neural Engine (NE) while minimizing its memory footprint and power consumption. This allows for faster and more efficient processing of machine learning models on the Mac M1.'}

## Multi-Hop vector store reasoning

Because vector stores are easily usable as tools in agents, it is easy to answer multi-hop questions that depend on vector stores using the existing agent framework.
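
A multi-hop question is one where a later tool call depends on what an earlier call returned. The loop below sketches that mechanism with a scripted policy in place of the LLM. All names (`run_multi_hop`, `scripted_policy`, the toy tools) are hypothetical and only illustrate how observations accumulate between steps.

```python
def run_multi_hop(question, tools, policy, max_steps=5):
    """Repeatedly ask the policy for the next step until it gives a final answer."""
    scratchpad = []  # list of (tool_name, tool_input, observation)
    for _ in range(max_steps):
        step = policy(question, scratchpad)
        if step[0] == "final":
            return step[1], scratchpad
        _, tool_name, tool_input = step
        observation = tools[tool_name](tool_input)
        scratchpad.append((tool_name, tool_input, observation))
    raise RuntimeError("no final answer within max_steps")


def scripted_policy(question, scratchpad):
    """Stand-in for the LLM: the second hop is built from the first observation."""
    if len(scratchpad) == 0:
        return ("action", "store_a", "Who murdered the king?")
    if len(scratchpad) == 1:
        first_observation = scratchpad[0][2]
        return ("action", "store_b", f"What happens to {first_observation}?")
    return ("final", scratchpad[-1][2])


toy_tools = {
    "store_a": lambda q: "Claudius",
    "store_b": lambda q: f"answer to: {q}",
}
answer, trace = run_multi_hop("toy question", toy_tools, scripted_policy)
print(answer)      # answer to: What happens to Claudius?
print(len(trace))  # 2
```

The `max_steps` cap is a design choice in this sketch: without some bound, a policy that never produces a final answer would loop forever.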

In [22]:
tools = [
    Tool(
        name="Hamlet QA System",
        func=hamlet_qa.run,
        description="Useful for answering questions about the text of Hamlet.",
    ),
    Tool(
        name="CoreML QA System",
        func=coreml_qa.run,
        description="Useful for answering questions about the CoreML tools documentation.",
    ),
]

In [23]:
# Construct the agent. We will use the default agent type here.
# See documentation for a full list of options.
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

In [24]:
agent.invoke(
    "Short story about Hamlet?"
)



> Entering new AgentExecutor chain...
I should use the Hamlet QA System to find a summary of the story.
Action: Hamlet QA System
Action Input: "summary of Hamlet"
Observation: Hamlet is a play by William Shakespeare that follows the story of Prince Hamlet, who is seeking revenge for his father's murder. The play is set in Denmark and includes a cast of characters such as King Claudius, Hamlet's uncle and the current king, Polonius, the lord chamberlain, and Horatio, Hamlet's friend. The play also features the ghost of Hamlet's father, who reveals that he was murdered by Claudius. Hamlet struggles with his desire for revenge and his own moral code, leading to a series of tragic events. The play also explores themes of madness, betrayal, and family relationships.
Thought: This is a good summary, but I should also check the CoreML QA System for any additional information.
Action: CoreML QA System
Action Input: "Hamlet summary"

{'input': 'Short story about Hamlet?',
 'output': "Hamlet is a play by William Shakespeare that follows the story of Prince Hamlet, who is seeking revenge for his father's murder. The play is set in Denmark and includes a cast of characters such as King Claudius, Hamlet's uncle and the current king, Polonius, the lord chamberlain, and Horatio, Hamlet's friend. The play also features the ghost of Hamlet's father, who reveals that he was murdered by Claudius. Hamlet struggles with his desire for revenge and his own moral code, leading to a series of tragic events. The play also explores themes of madness, betrayal, and family relationships."}

In [25]:
agent.invoke(
    "short summary of the end of the play"
)



> Entering new AgentExecutor chain...
I should use the Hamlet QA System to find the answer
Action: Hamlet QA System
Action Input: "What happens at the end of Hamlet?"
Observation: At the end of Hamlet, Hamlet's mother, Queen Gertrude, reveals that Hamlet has killed Polonius in a fit of madness. This leads to a chain of events that ultimately results in the deaths of several characters, including Hamlet himself.
Thought: This is a good summary, but I should also check the CoreML QA System for any additional information.
Action: CoreML QA System
Action Input: "What are the main events that lead to the end of the play?"
Observation: I don't know, as this context does not mention a play or any events leading to its end.
Thought: I should try rephrasing my question to be more specific.
Action: CoreML QA System
Action Input: "What are the main events that lead to the end of Hamlet?"
Observation:

{'input': 'short summary of the end of the play',
 'output': "The end of Hamlet is marked by a series of tragic events, including the deaths of several characters and Hamlet's fulfillment of his revenge against his uncle Claudius."}