# Building a Custom Document Retrieval Tool with Deep Lake and LangChain: A Step-by-Step Workflow

## Workflow

In [None]:
"""
1. Setting up Deep Lake: We use Deep Lake as the vector store that holds the document embeddings and their corresponding text.

2. Storing documents in Deep Lake: Once Deep Lake is set up, we create embeddings for our documents and store each embedding alongside its source text. The result is a vector database that is ready to be queried.

3. Creating the retrieval tool: Next, we use LangChain to create a custom tool that interacts with Deep Lake. The tool is essentially a function that takes a query as input and returns the most similar documents from Deep Lake as output. To find them, the tool first computes the embedding of the query using the same model we used for the documents. It then queries Deep Lake with this embedding, and Deep Lake returns the documents whose embeddings are most similar to it.

4. Using the tool with an agent: Finally, we give this custom tool to a LangChain agent. When the agent receives a question, it uses the tool to retrieve relevant documents from Deep Lake and then uses its language model to generate a response based on those documents."""
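
The retrieval idea in step 3 can be illustrated without any external service. The sketch below is only a toy: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `top_k` plays the role that Deep Lake's similarity search plays in the rest of this notebook. It is meant to show the "embed the query, compare, return the closest documents" flow, not how Deep Lake implements it.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents whose toy embeddings are closest to the query's."""
    query_vec = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


docs = [
    "paypal keeps your financial information private",
    "you can use paypal in millions of places worldwide",
    "no balance is needed to pay with paypal",
]
print(top_k("is my financial information private", docs, k=1))
```

A real embedding model captures meaning rather than exact word overlap, which is why the notebook uses one instead of word counts.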

### Setting up the database


In [3]:
from langchain.vectorstores import DeepLake
from langchain_google_genai.embeddings import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model = 'models/embedding-001')

my_activeloop_org_id = "samman"
my_activeloop_dataset_name = "langchain_course_custom_tool-1"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"
db = DeepLake(dataset_path=dataset_path, embedding=embeddings)

Your Deep Lake dataset has been successfully created!
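
The cell above needs credentials for both services. A minimal pre-flight check might look like the following, assuming the usual defaults where Deep Lake reads `ACTIVELOOP_TOKEN` and the Google Generative AI integration reads `GOOGLE_API_KEY` from the environment. The helper name `missing_credentials` is hypothetical, and your setup may pass keys differently.

```python
import os

# Assumed variable names; adjust if your environment passes credentials another way.
REQUIRED_ENV_VARS = ("ACTIVELOOP_TOKEN", "GOOGLE_API_KEY")


def missing_credentials(env=os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Set these environment variables before running the cells:", ", ".join(missing))
```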


 

### Storing documents in Deep Lake

In [4]:
# Add the FAQs (frequently asked questions) to the dataset

faqs = [
    "What is PayPal?\nPayPal is a digital wallet that follows you wherever you go. Pay any way you want. Link your credit cards to your PayPal Digital wallet, and when you want to pay, simply log in with your username and password and pick which one you want to use.",
    "Why should I use PayPal?\nIt's Fast! We will help you pay in just a few clicks. Enter your email address and password, and you're pretty much done! It's Simple! There's no need to run around searching for your wallet. Better yet, you don't need to type in your financial details again and again when making a purchase online. We make it simple for you to pay with just your email address and password.",
    "Is it secure?\nPayPal is the safer way to pay because we keep your financial information private. It isn't shared with anyone else when you shop, so you don't have to worry about paying businesses and people you don't know. On top of that, we've got your back. If your eligible purchase doesn't arrive or doesn't match its description, we will refund you the full purchase price plus shipping costs with PayPal's Buyer Protection program.",
    "Where can I use PayPal?\nThere are millions of places you can use PayPal worldwide. In addition to online stores, there are many charities that use PayPal to raise money. Find a list of charities you can donate to here. Additionally, you can send funds internationally to anyone almost anywhere in the world with PayPal. All you need is their email address. Sending payments abroad has never been easier.",
    "Do I need a balance in my account to use it?\nYou do not need to have any balance in your account to use PayPal. Similar to a physical wallet, when you are making a purchase, you can choose to pay for your items with any of the credit cards that are attached to your account. There is no need to pre-fund your account."
]

db.add_texts(faqs)

Creating 5 embeddings in 1 batches of size 5:: 100%|██████████| 1/1 [00:36<00:00, 36.63s/it]

Dataset(path='hub://samman/langchain_course_custom_tool-1', tensors=['text', 'metadata', 'embedding', 'id'])

  tensor      htype     shape     dtype  compression
  -------    -------   -------   -------  ------- 
   text       text      (5, 1)     str     None   
 metadata     json      (5, 1)     str     None   
 embedding  embedding  (5, 768)  float32   None   
    id        text      (5, 1)     str     None   





['9995338b-c82f-11ee-b60b-60189524c791',
 '9995338c-c82f-11ee-98a7-60189524c791',
 '9995338d-c82f-11ee-b082-60189524c791',
 '9995338e-c82f-11ee-b8f9-60189524c791',
 '9995338f-c82f-11ee-9aea-60189524c791']
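
The `metadata` tensor in the output above is empty because `add_texts` was called with texts only. As a possible variation, each FAQ entry could be split into its question and answer so the question can be stored as metadata. The helper `split_faq` below is hypothetical, and the commented-out call assumes `db` and `faqs` from the earlier cells.

```python
def split_faq(entry: str) -> dict:
    """Split a 'question\\nanswer' FAQ string at the first newline."""
    question, _, answer = entry.partition("\n")
    return {"question": question, "answer": answer}


sample = "What is PayPal?\nPayPal is a digital wallet that follows you wherever you go."
print(split_faq(sample)["question"])  # prints: What is PayPal?

# Hypothetical use with the notebook's objects (not run here):
# metadatas = [{"question": split_faq(f)["question"]} for f in faqs]
# db.add_texts(faqs, metadatas=metadatas)
```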

### Creating the retrieval tool

In [9]:
from langchain.agents import tool

retriever = db.as_retriever()

CUSTOM_TOOL_N_DOCS = 3  # number of documents retrieved from Deep Lake to consider
CUSTOM_TOOL_DOCS_SEPARATOR = "\n\n"  # separator used to join the retrieved docs into a single string

# We use the tool decorator to wrap a function that will become our custom tool
# Note that the tool has a single string as input and returns a single string as output
# The name of the function will be the name of our custom tool
# The docstring of the function will be the description of our custom tool
# The description is used by the agent to decide whether to use the tool for a specific query

@tool
def retrieve_n_docs_tool(query: str) -> str:
    """ Searches for relevant documents that may contain the answer to the query."""
    docs = retriever.get_relevant_documents(query)[:CUSTOM_TOOL_N_DOCS]
    texts = [doc.page_content for doc in docs]
    texts_merged = CUSTOM_TOOL_DOCS_SEPARATOR.join(texts)
    return texts_merged
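
To see what the tool body does without calling Deep Lake, the same slice-and-join logic can be exercised against a stand-in retriever. `FakeDoc`, `FakeRetriever`, and `merge_top_docs` are hypothetical names introduced only for this sketch. The stand-in mimics the two things the tool relies on: a `get_relevant_documents` method and a `page_content` attribute.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Stand-in for a retrieved document; only page_content is needed here."""
    page_content: str


class FakeRetriever:
    """Stand-in retriever that returns every stored text, in order."""

    def __init__(self, texts):
        self._texts = list(texts)

    def get_relevant_documents(self, query):
        return [FakeDoc(text) for text in self._texts]


def merge_top_docs(retriever, query, n_docs=3, separator="\n\n"):
    """Same logic as the tool body: keep the first n_docs and join their text."""
    docs = retriever.get_relevant_documents(query)[:n_docs]
    return separator.join(doc.page_content for doc in docs)


fake = FakeRetriever(["a", "b", "c", "d"])
print(repr(merge_top_docs(fake, "anything")))  # prints: 'a\n\nb\n\nc'
```

The slice matters because the retriever may return more documents than the tool wants to pass on. An alternative, if it suits your setup, is to limit the count at retrieval time via `db.as_retriever(search_kwargs={"k": 3})`.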

### Using the tool with an agent

In [14]:
from langchain_google_genai import GoogleGenerativeAI
from langchain.agents import initialize_agent, AgentType

llm = GoogleGenerativeAI(model="gemini-pro", temperature=0)

agent = initialize_agent(
    [retrieve_n_docs_tool],
    llm,
    agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)

In [15]:
response = agent.run("Are my info kept private when I shop with Paypal?")
print(response)



> Entering new AgentExecutor chain...
Question: Are my info kept private when I shop with Paypal?
Thought: I should search for documents that discuss Paypal's privacy policy.
Action:
```
{
  "action": "retrieve_n_docs_tool",
  "action_input": {
    "query": "Paypal privacy policy"
  }
}
```

Observation: Is it secure?
PayPal is the safer way to pay because we keep your financial information private. It isn't shared with anyone else when you shop, so you don't have to worry about paying businesses and people you don't know. On top of that, we've got your back. If your eligible purchase doesn't arrive or doesn't match its description, we will refund you the full purchase price plus shipping costs with PayPal's Buyer Protection program.

What is PayPal?
PayPal is a digital wallet that follows you wherever you go. Pay any way you want. Link your credit cards to your PayPal Digital wallet, and when you want to pay, simply log in with your username and