#Combine agents and vector stores
This notebook covers how to combine agents and vectorstores. The use case for this is that you've ingested your data into a vectorstore and want to interact with it in an agentic manner.

The recommended method is to create a RetrievalQA chain and use it as a tool in the overall agent, as shown below. You can do this with several vector databases and use the agent to route between them. There are two ways of doing this: let the agent use the vectorstores as normal tools, or set return_direct=True so that the agent acts purely as a router.

#Integrating LangSmith

LangSmith is used here to demonstrate debugging. Connect to https://smith.langchain.com

Harrison Chase, creator of LangChain, was kind enough to provide us with a beta access code: **langsmith_partner_2023**

Please feel free to send write-ups and feedback about LangSmith to Harrison and @langchainAI. It is a work in progress, with changes implemented all the time.

In [89]:
!pip install -q langchain openai colab_env chromadb tiktoken

In [90]:
import colab_env
import langchain
import openai
import os

#Export the project name to LangSmith

**Note: you need to add this name as a project at https://smith.langchain.com**

In [91]:
# Connect this notebook to LangSmith
import os

os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'

# This key is sourced from vars.env
# os.environ['LANGCHAIN_API_KEY'] = '<your-api-key>'  # Uncomment and replace <your-api-key> with your actual API key

os.environ['LANGCHAIN_PROJECT'] = 'GAI-Demo-Agents_Combined_with_vector_stores_as_tools'

# To verify, you can print the variables
print(os.environ.get('LANGCHAIN_TRACING_V2'))
print(os.environ.get('LANGCHAIN_ENDPOINT'))
#print(os.environ.get('LANGCHAIN_API_KEY'))  # Uncomment if you want to print your API key (be careful with sharing your notebook)
print(os.environ.get('LANGCHAIN_PROJECT'))


true
https://api.smith.langchain.com
GAI-Demo-Agents_Combined_with_vector_stores_as_tools
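
Before running the cells that call LangSmith or OpenAI, it can help to confirm that every required variable is set without echoing any secret. The helper below is a minimal sketch of that check; `missing_env_vars` and `mask` are hypothetical names introduced here for illustration and are not part of LangChain or LangSmith.

```python
import os

# Variables this notebook reads; adjust the list to your own setup.
REQUIRED_VARS = [
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_ENDPOINT",
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_PROJECT",
    "OPENAI_API_KEY",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def mask(value, visible=4):
    """Hide all but the last few characters of a secret for safe printing."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Report what is missing instead of printing the key values themselves.
print("Missing variables:", missing_env_vars(REQUIRED_VARS))
```

Printing only the missing names (or a masked value) keeps API keys out of the saved notebook output.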


In [92]:
# Pull your OpenAI key from your vars.env or dotenv
openai_api_key = os.environ['OPENAI_API_KEY']

In [93]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.document_loaders import DirectoryLoader

llm = OpenAI(temperature=0)

In [94]:
#from pathlib import Path

#relevant_parts = []
#for p in Path(".").absolute().parts:
#    relevant_parts.append(p)
#    if relevant_parts[-3:] == ["langchain", "docs", "modules"]:
#        break
#doc_path = str(Path(*relevant_parts) / "/content/gdrive/MyDrive/Projects/GAIHackathon0823/transcripts/day1/Talk1_-_OpenAI_Summit_-_Techstrong.txt")

In [95]:
documents = []
# Note: this uses a path in my Google Drive. If running from a terminal or VS Code, uncomment the line below instead.
# loader = DirectoryLoader('../transcripts/', glob='**/*.txt')

# Google Colab / Gdrive code
loader = DirectoryLoader('/content/gdrive/MyDrive/Projects/GAIHackathon0823/transcripts/', glob='**/*.txt')

In [96]:
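from langchain.document_loaders import TextLoader

# doc_path is never defined in this notebook (the cell that built it is commented out),
# so load every transcript in the directory instead, reading each file with TextLoader.
loader = DirectoryLoader(
    '/content/gdrive/MyDrive/Projects/GAIHackathon0823/transcripts/',
    glob='**/*.txt',
    loader_cls=TextLoader,
)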
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings, collection_name="GAIhackathon-transcripts")
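
To make the `chunk_size=1000, chunk_overlap=0` settings above more concrete, here is a simplified sketch of greedy, separator-based chunking. It is an illustration only, not LangChain's implementation: `split_text` is a hypothetical helper, the `"\n\n"` separator is an assumption, and overlap is left out because the notebook sets it to 0.

```python
def split_text(text, chunk_size=1000, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole, so a chunk can exceed the limit.
    """
    chunks = []
    current = ""
    for piece in text.split(separator):
        if not piece.strip():
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if current and len(candidate) > chunk_size:
            # Adding this piece would overflow: close the current chunk and start a new one.
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\n\nbbbb\n\ncccc"
print(split_text(sample, chunk_size=10))
```

Smaller chunks give the retriever more precise passages to match against, at the cost of less context per passage.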



In [97]:
gai_hackathon = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=docsearch.as_retriever()
)

In [98]:
from langchain.document_loaders import WebBaseLoader

In [99]:
#loader = WebBaseLoader("https://python.langchain.com/docs/")
loader = WebBaseLoader("https://beta.ruff.rs/docs/faq/")

In [100]:
docs = loader.load()
ruff_qa_texts = text_splitter.split_documents(docs)
ruff_qa_db = Chroma.from_documents(ruff_qa_texts, embeddings, collection_name="ruff_qa")
ruff_qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=ruff_qa_db.as_retriever()
)
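
Conceptually, a retriever-backed "stuff" chain does two things: it finds the chunks most similar to the question, then pastes them all into one prompt for the LLM. The sketch below imitates that flow with the standard library only. The bag-of-words `embed` function is a toy stand-in for `OpenAIEmbeddings`, and `retrieve`, `stuff_prompt`, and the prompt wording are hypothetical, not LangChain's own.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vec, embed(chunk)), reverse=True)
    return ranked[:k]


def stuff_prompt(question, context_chunks):
    """Build one prompt with every retrieved chunk placed before the question."""
    context = "\n\n".join(context_chunks)
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {question}"


chunks = [
    "Ruff is a Python linter",
    "nbQA can lint a Jupyter notebook",
    "Chroma stores embeddings",
]
question = "lint jupyter notebook"
print(stuff_prompt(question, retrieve(question, chunks, k=1)))
```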



In [101]:
len(docs)

1

#Create the agent

In [102]:
# Import only what the agent cells below use
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType

In [103]:
tools = [
    Tool(
        name="GAI Hackathon QA System",
        func=gai_hackathon.run,
        description="useful for when you need to answer questions about the most recent GAI Hackathon discussions. Input should be a fully formed question.",
    ),
    Tool(
        name="Ruff QA System",
        func=ruff_qa.run,
        description="useful for when you need to answer questions about Ruff (a Python linter). Input should be a fully formed question.",
    ),
]

In [104]:
# Construct the agent. We will use the default agent type here.
# See documentation for a full list of options.
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

In [105]:
agent.run(
    "What was said about LLMs, if there are multiple statements compare and contrast all?"
)



> Entering new AgentExecutor chain...
I need to find out what was said about LLMs.
Action: GAI Hackathon QA System
Action Input: What was said about LLMs?
Observation: Speaker 10 said that LLMs can be used to interpret unique terms differently and that they can be used to patch externally trained LM ER models to a specific use case. They also mentioned that using different acronyms can open up new security vectors and prompt injection.
Thought: I need to compare and contrast the statements.
Action: GAI Hackathon QA System
Action Input: What are the differences between the statements about LLMs?
Observation: The speaker is discussing how different LLMs interpret unique terms differently and how this can open up new security vectors and prompt injection when using a different acronym.
Thought: I now know the final answer.
Final Answer: LLMs can be used to interpret unique terms differently and c

'LLMs can be used to interpret unique terms differently and can be used to patch externally trained LM ER models to a specific use case. Using different acronyms can open up new security vectors and prompt injection. The differences between the statements are that LLMs interpret unique terms differently and using different acronyms can open up new security vectors and prompt injection.'

In [106]:
agent.run("How would I lint a jupyter notebook?")





> Entering new AgentExecutor chain...
I need to use a linter to check the code in the notebook
Action: Ruff QA System
Action Input: How do I lint a jupyter notebook?
Observation: You can lint a Jupyter notebook by running Ruff over it with nbQA. After installing Ruff and nbQA, you can run Ruff over a notebook like so: nbqa ruff Untitled.ipynb.
Thought: I now know the final answer
Final Answer: To lint a Jupyter notebook, you can use Ruff with nbQA by running nbqa ruff Untitled.ipynb.

> Finished chain.


'To lint a Jupyter notebook, you can use Ruff with nbQA by running nbqa ruff Untitled.ipynb.'
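
The verbose traces above follow a fixed text layout: a thought, then either an `Action` / `Action Input` pair or a `Final Answer`. As a rough illustration of how such a reply can be turned into a decision, here is a small parser sketch. It is not LangChain's own parser; `parse_step` and the returned dictionary shape are hypothetical.

```python
import re

# Matches "Action: <tool>" followed on the next line by "Action Input: <input>".
ACTION_PATTERN = re.compile(
    r"Action\s*:\s*(.*?)\s*\n\s*Action Input\s*:\s*(.*)", re.DOTALL
)


def parse_step(llm_output):
    """Classify one LLM reply as either a tool call or a final answer."""
    if "Final Answer:" in llm_output:
        answer = llm_output.split("Final Answer:", 1)[1].strip()
        return {"type": "finish", "output": answer}
    match = ACTION_PATTERN.search(llm_output)
    if match is None:
        raise ValueError(f"Could not parse LLM output: {llm_output!r}")
    return {
        "type": "action",
        "tool": match.group(1).strip(),
        "tool_input": match.group(2).strip(),
    }


reply = (
    "I need to use a linter to check the code in the notebook\n"
    "Action: Ruff QA System\n"
    "Action Input: How do I lint a jupyter notebook?"
)
print(parse_step(reply))
```

The tool name extracted here must match a `Tool` name exactly, which is why the names in the `tools` list matter.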

#Use the Agent solely as a router
You can also set return_direct=True if you intend to use the agent as a router and just want to return the result of the RetrievalQA chain directly.

Notice that in the examples above the agent did some extra work after querying the RetrievalQA chain: it made a further LLM call to produce its own final answer. You can avoid that and return the chain's result directly.
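
The difference between the two modes can be sketched without any LLM at all. In the toy below, keyword overlap stands in for the agent's choice of tool and a `summarize` callback stands in for the extra LLM step. `ToyTool`, `route`, and `run_agent` are hypothetical names for illustration and do not mirror LangChain internals.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ToyTool:
    name: str
    func: Callable[[str], str]
    keywords: Tuple[str, ...]
    return_direct: bool = False


def route(question, tools):
    """Pick the tool whose keywords overlap most with the question (stand-in for the LLM)."""
    words = set(question.lower().replace("?", "").split())
    return max(tools, key=lambda tool: len(words & set(tool.keywords)))


def run_agent(question, tools, summarize):
    """Call the chosen tool; skip the final summarizing step when return_direct is set."""
    tool = route(question, tools)
    observation = tool.func(question)
    if tool.return_direct:
        return observation  # router mode: hand back the tool output unchanged
    return summarize(question, observation)  # normal mode: one more step over the output


demo_tools = [
    ToyTool("hackathon", lambda q: "hackathon answer", ("llms", "hackathon"), return_direct=True),
    ToyTool("ruff", lambda q: "ruff answer", ("ruff", "lint", "linter")),
]
print(run_agent("What was said about LLMs?", demo_tools, lambda q, o: "Summary: " + o))
print(run_agent("How do I lint a notebook?", demo_tools, lambda q, o: "Summary: " + o))
```

With `return_direct=True` the answer is exactly what the tool returned, which saves one LLM call but also means the agent cannot combine results from several tools.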

In [107]:
tools = [
    Tool(
        name="GAI Hackathon QA System",
        func=gai_hackathon.run,
        description="useful for when you need to answer questions about the most recent GAI Hackathon discussion. Input should be a fully formed question.",
        return_direct=True,
    ),
    Tool(
        name="ruff_qa QA System",
        func=ruff_qa.run,
        description="useful for when you need to answer questions about Ruff (a Python linter). Input should be a fully formed question.",
        return_direct=True,
    ),
]

In [108]:
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

In [109]:
agent.run(
    "What was said about LLMs?"
)



> Entering new AgentExecutor chain...
I need to find out what was said about LLMs
Action: GAI Hackathon QA System
Action Input: What was said about LLMs?
Observation: Speaker 10 said that LLMs have a codex of three letter acronyms and that it is interesting to think about how to take an LLM model trained externally and patch it internally for a specific use case, and what security vectors that could open up.

> Finished chain.


' Speaker 10 said that LLMs have a codex of three letter acronyms and that it is interesting to think about how to take an LLM model trained externally and patch it internally for a specific use case, and what security vectors that could open up.'

#Multi-Hop vectorstore reasoning
Because vectorstores are easily usable as tools in agents, it is easy to answer multi-hop questions that depend on several vectorstores using the existing agent framework.
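
As a rough picture of what a multi-hop run involves, the sketch below splits a compound query into standalone sub-questions, sends each to one tool, and joins the observations. In the real agent the LLM decides each hop from its running scratchpad; here a fixed splitting rule and a keyword check stand in for that. All names (`split_questions`, `answer_multi_hop`, `pick_tool`) are hypothetical, and the tool functions are stubs returning placeholder strings, not real chain output.

```python
def split_questions(compound):
    """Split a compound query into standalone sub-questions on '?'."""
    parts = [part.strip(" ,") for part in compound.split("?")]
    return [part + "?" for part in parts if part]


def answer_multi_hop(compound, pick_tool, tools):
    """Run one tool call per sub-question and return the joined answer plus the hops taken."""
    hops = []
    for sub_question in split_questions(compound):
        tool_name = pick_tool(sub_question)
        observation = tools[tool_name](sub_question)
        hops.append((tool_name, sub_question, observation))
    final_answer = " ".join(observation for _, _, observation in hops)
    return final_answer, hops


def pick_tool(question):
    """Keyword stand-in for the agent's tool choice."""
    return "ruff_qa QA System" if "lint" in question.lower() else "GAI Hackathon QA System"


# Stub tools with placeholder answers (assumptions for the demo only).
stub_tools = {
    "ruff_qa QA System": lambda q: "Run nbqa ruff on the notebook.",
    "GAI Hackathon QA System": lambda q: "It depends on the use case.",
}

answer, hops = answer_multi_hop(
    "How can I lint my jupyter notebook?, how would I use LLMs in the enterprise?",
    pick_tool,
    stub_tools,
)
print(answer)
```

Each sub-question is passed on as a complete sentence, which is the reason the tool descriptions ask for fully formed questions without dangling pronouns.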

In [110]:
tools = [
    Tool(
        name="GAI Hackathon QA System",
        func=gai_hackathon.run,
        description="useful for when you need to answer questions about the most recent GAI Hackathon discussion. Input should be a fully formed question, not referencing any obscure pronouns from the conversation before.",
    ),
    Tool(
        name="ruff_qa QA System",
        func=ruff_qa.run,
        description="useful for when you need to answer questions about Ruff (a Python linter). Input should be a fully formed question, not referencing any obscure pronouns from the conversation before.",
    ),
]

In [111]:
# Construct the agent. We will use the default agent type here.
# See documentation for a full list of options.
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

In [112]:
agent.run(
    "How can I lint my jupyter notebook?, how would I use LLMs in the enterprise?"
)



> Entering new AgentExecutor chain...
I need to use the right tool for the job
Action: ruff_qa QA System
Action Input: How can I lint my jupyter notebook?
Observation: You can lint your Jupyter Notebook by running Ruff over it. After installing Ruff and nbQA, you can run Ruff over a notebook like so: nbqa ruff Untitled.ipynb.
Thought: I need to use the right tool for the job
Action: GAI Hackathon QA System
Action Input: How would I use LLMs in the enterprise?
Observation: It depends on the use case. LLMs can be used to patch externally trained models to an internal use case, but this could open up new security vectors and prompt injection risks. It is important to consider the codex of three letter acronyms and special words used in the enterprise, as well as the incident management and confluence or wiki documents.
Thought: I now know the final answer
Final Answer: You can lint your Jupyter N

'You can lint your Jupyter Notebook by running Ruff over it. When using LLMs in the enterprise, it is important to consider the codex of three letter acronyms and special words used in the enterprise, as well as the incident management and confluence or wiki documents.'