## Build your own digital Doraemon: The Full Python Notebook

LLM agents can keep a knapsack of tools in their arsenal and decide when to use them and when not to. In this notebook, we will build my JasontheFoodie bot as an LLM agent.

#### 1. Load LLM to use in the agent

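For easier implementation, this notebook was set up for the Azure OpenAI API. Those settings are kept below as comments, and the active code uses the standard OpenAI API through `langchain_openai`.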

In [1]:
import os
# from langchain.chat_models import AzureChatOpenAI

# OpenAI model
from langchain_openai import ChatOpenAI

# never hard-code a secret key in a notebook; read it from the environment instead
openai_api_key = os.environ.get("OPENAI_API_KEY", "YOUR_API_KEY_HERE")
os.environ["OPENAI_API_KEY"] = openai_api_key

In [2]:
# define env variables for AzureOpenAI model
# os.environ["OPENAI_API_TYPE"] = "azure"
# os.environ["OPENAI_API_BASE"] = "YOUR_ENDPOINT_HERE"
# os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY_HERE"
# os.environ["OPENAI_API_VERSION"] = "2023-03-15-preview"

In [3]:
# Instantiate LLM
# llm = AzureChatOpenAI(
#     deployment_name='YOUR_DEPLOYMENT_NAME_HERE',
#     temperature=0
# )
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0, openai_api_key = openai_api_key)

#### 2. Define functions for LLM agent to use as tools

The ReAct agent type can draw on several tools. Below are the functions we will expose to the agent as tools.

##### 2a. Create Vector Store Retriever

A vector store holds the food review documents and their embeddings, and also acts as the retriever. This notebook uses the Deep Lake vector store.
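
Under the hood, a vector store ranks stored embeddings by how similar they are to the query embedding. The sketch below is a toy, in-memory illustration of cosine-similarity top-k retrieval, not Deep Lake's implementation. The three-dimensional vectors and document names are made up for illustration; the embeddings stored in this notebook are 1536-dimensional.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Return up to k (doc_id, score) pairs, highest cosine similarity first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# hypothetical toy "embeddings" for three reviews
toy_store = {
    "seafood bee hoon": [0.9, 0.1, 0.0],
    "short ribs galbi": [0.1, 0.9, 0.2],
    "omakase course": [0.2, 0.3, 0.9],
}

print(top_k([0.1, 0.2, 1.0], toy_store, k=2))
```

A real vector store uses an index rather than scoring every document, but the ranking it returns follows the same idea.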

In [4]:
import pandas as pd
from langchain.vectorstores import DeepLake
from langchain_openai import OpenAIEmbeddings  # replaces the deprecated langchain.embeddings.openai import
from langchain.document_loaders.csv_loader import CSVLoader

In [5]:
# instantiate OpenAIEmbeddings to create embeddings from the document content
# (the deployment name is an Azure OpenAI setting)
embeddings = OpenAIEmbeddings(deployment="EMBEDDING_DEPLOYMENT_NAME", chunk_size=16)


In [6]:
# instantiate the CSV loader and load the food reviews, using each review link as the document source
# (you could also experiment with loading only certain columns)
loader = CSVLoader(
    file_path="dummy_data.csv",
    csv_args={"delimiter": ","},
    encoding="utf-8",
    source_column="review_link",
)
data = loader.load()
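
CSVLoader turns each CSV row into one document. Judging by the output below, the page content is the row's `column: value` pairs joined by newlines, and the metadata holds the `source` (here the `review_link` column) and the `row` index. The helper below is a hypothetical, stdlib-only approximation of that behaviour for inspecting your own CSV; it is not LangChain's code, and the two-column sample is made up.

```python
import csv
import io


def rows_to_documents(csv_text: str, source_column: str) -> list[dict]:
    """Approximate CSVLoader output: one dict per row with page_content and metadata."""
    docs = []
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=",")
    for i, row in enumerate(reader):
        # every column, including the source column, becomes a "key: value" line
        page_content = "\n".join(f"{key.strip()}: {(value or '').strip()}" for key, value in row.items())
        docs.append({"page_content": page_content, "metadata": {"source": row[source_column], "row": i}})
    return docs


# hypothetical two-column sample
sample = "food_desc_title,review_link\nGood food and wines,https://abc.xyz/review2\n"
docs = rows_to_documents(sample, source_column="review_link")
print(docs[0])
```

Seeing the exact text that gets embedded makes it easier to decide which columns are worth keeping in the CSV.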

In [7]:
data

[Document(page_content="food_desc_title: Absolutely stunning plate of seafood bee hoon!\nfood_desc_body: From the fresh, succulent and thick fish slices, to the charring of the bee hoon and ingredients to an amazingly robust seafood broth, it had it all. Each mouthful of the bee hoon and soup just shouts flavour, due to the huge amount of umami and smokey flavour. Love the fresh crispy pork lard as well!\nThe portion came pretty big too! Really extremely worth it and I can't wait to come back to have another plate of this!\nreview_link: https://abc.xyz/review1\nnum_stars: 5\nvenue_name: ABC Beehoon\nvenue_url: https://abc.xyz/1\nvenue_price: ~$10/pax\nvenue_tag: Hawker Food\nSeafood", metadata={'source': 'https://abc.xyz/review1', 'row': 0}),
 Document(page_content='food_desc_title: Good food and wines\nfood_desc_body: The Short Ribs Galbi really stole the show for me. Tender and extremely flavourful, the galbi pieces were also very smokey. The sweet and savoury marinade further enhanc

In [12]:
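# augment each document's metadata with lat and long values; the distance-filter tool in 2c needs them
df = pd.read_csv("dummy_data.csv", index_col=False)

# fail early with a clear message if the CSV has no coordinate columns
missing = {"lat", "long"} - set(df.columns)
if missing:
    raise ValueError(f"dummy_data.csv is missing column(s): {sorted(missing)}")

for lat, long, item in zip(df["lat"].tolist(), df["long"].tolist(), data):
    item.metadata["lat"] = lat
    item.metadata["long"] = long

Note: `dummy_data.csv` has no `lat` or `long` columns, so this cell fails on it (accessing `df.lat` raises `AttributeError: 'DataFrame' object has no attribute 'lat'`). Add both columns to the CSV before relying on the distance-filtered search in 2c.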

In [13]:
# create deeplake db
db = DeepLake(
    dataset_path="./my_deeplake_2/", embedding=embeddings, overwrite=True
)
db.add_documents(data)

Creating 6 embeddings in 1 batches of size 6:: 100%|██████████| 1/1 [00:00<00:00,  1.44it/s]

Dataset(path='./my_deeplake_2/', tensors=['text', 'metadata', 'embedding', 'id'])

  tensor      htype      shape     dtype  compression
  -------    -------    -------   -------  ------- 
   text       text      (6, 1)      str     None   
 metadata     json      (6, 1)      str     None   
 embedding  embedding  (6, 1536)  float32   None   
    id        text      (6, 1)      str     None   

['e8914f70-dcf3-11ee-9dee-fa163e42c4ba',
 'e891507e-dcf3-11ee-9dee-fa163e42c4ba',
 'e89150c4-dcf3-11ee-9dee-fa163e42c4ba',
 'e89150e2-dcf3-11ee-9dee-fa163e42c4ba',
 'e891510a-dcf3-11ee-9dee-fa163e42c4ba',
 'e8915128-dcf3-11ee-9dee-fa163e42c4ba']

In [14]:
# load from existing DB, if database already exists
db = DeepLake(
    dataset_path="./my_deeplake_2/", embedding=embeddings, read_only=True
)

Deep Lake Dataset in ./my_deeplake_2/ already exists, loading from the storage


I would like to filter out retrieved documents whose similarity score falls below a certain threshold. The built-in `similarity_score_threshold` search type appeared to be buggy at the time of writing, so I wrote my own custom retriever below.
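
The filtering step boils down to keeping `(document, score)` pairs whose score clears a cut-off. A minimal, library-free sketch of that step follows. It assumes, as the notebook does, that the store returns a similarity (higher is closer) for `distance_metric='cos'`; if your store returns a distance instead, the comparison must flip. The helper name and sample pairs are hypothetical.

```python
from typing import Any


def keep_above_threshold(results: list[tuple[Any, float]], threshold: float = 0.8) -> list[Any]:
    """Drop (doc, score) pairs scoring below threshold and return only the docs.

    Assumes higher scores mean more similar (cosine similarity, not distance).
    """
    return [doc for doc, score in results if score >= threshold]


# hypothetical (document, score) pairs as a similarity search might return them
pairs = [("omakase review", 0.91), ("galbi review", 0.80), ("bee hoon review", 0.42)]
print(keep_above_threshold(pairs))
```

A single list comprehension does the work of the `filter` + `lambda` pair, and the same helper could be shared by both retrievers.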

In [15]:
# a plain retriever: cosine-similarity search that keeps only results scoring 0.8 or higher
def search(query):
    search_results = db.similarity_search_with_score(
        query=query,
        k=5,
        distance_metric="cos",
    )

    filtered_res = list(filter(lambda x: x[1] >= 0.8, search_results))
    filtered_res = [t[0] for t in filtered_res]

    return filtered_res

##### 2b. Create Geolocation Retriever from Foursquare API

In [16]:
import requests

In [17]:
# headers for the Foursquare API call, including the API key
headers = {
    "Accept": "application/json",
    "Authorization": "YOUR_API_KEY"
}

In [18]:
# retrieve the coordinates and name of the most relevant match on the Foursquare Places API
def get_coordinates(address):
    url = "https://api.foursquare.com/v3/places/search"
    params = {"query": address.lower().strip(), "near": "singapore"}

    # let requests URL-encode the query and parse the JSON body (never eval() a response)
    req = requests.get(url, headers=headers, params=params, timeout=10)
    req.raise_for_status()
    results_dict = req.json()

    top = results_dict["results"][0]
    result_name = top["name"]
    result_geo = top["geocodes"]["main"]

    return (result_geo["latitude"], result_geo["longitude"]), result_name
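
Two details of `get_coordinates` are easy to get wrong: URL-encoding the free-text query and picking fields out of the JSON response. The sketch below isolates both as pure functions so they can be checked without an API key. The function names are hypothetical, and the payload is a trimmed example shaped only after the keys `get_coordinates` reads (`results[0].name` and `results[0].geocodes.main`); its name and coordinates are the ones the agent received in the example run at the end of this notebook.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.foursquare.com/v3/places/search"


def build_search_url(address: str, near: str = "singapore") -> str:
    """Build the Places search URL with a properly percent-encoded query string."""
    return f"{SEARCH_URL}?{urlencode({'query': address.lower().strip(), 'near': near})}"


def parse_top_result(payload: dict) -> tuple[tuple[float, float], str]:
    """Return ((lat, long), name) for the first search result, or raise if there is none."""
    results = payload.get("results") or []
    if not results:
        raise ValueError("no places found for this query")
    top = results[0]
    geo = top["geocodes"]["main"]
    return (geo["latitude"], geo["longitude"]), top["name"]


# hypothetical, trimmed response body
payload = {
    "results": [
        {"name": "Tanjong Pagar Mrt Station",
         "geocodes": {"main": {"latitude": 1.276525, "longitude": 103.845725}}}
    ]
}
print(build_search_url("Tanjong Pagar MRT"))
print(parse_top_result(payload))
```

`urlencode` also escapes characters such as `&`, which a plain `replace(" ", "%20")` would pass through and break the query string.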

##### 2c. Create a retriever with geolocation filter

To search for relevant listings near a certain location, we can build a retriever with a custom filter that removes listings more than 800 m from the search location. This is useful when you want to combine a text search with a location constraint.
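
The distance check relies on the haversine (great-circle) formula, which the `haversine` package implements. A stdlib-only sketch follows for reference. It assumes a mean Earth radius of about 6371 km, which is close to the package's own constant, so the results should agree for practical purposes. The venue coordinates come from the example answer at the end of the notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; assumed value


def haversine_km(p1: tuple[float, float], p2: tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, long) points given in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


tanjong_pagar_mrt = (1.276525, 103.845725)
kei_hachi = (1.279512, 103.8415799)
dist = haversine_km(tanjong_pagar_mrt, kei_hachi)
print(f"{dist:.3f} km, within 800 m: {dist <= 0.8}")
```

Because the function returns kilometres, the 800 m cut-off is written as `0.8` in the filter below.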

In [32]:
from haversine import haversine as hs
import deeplake

In [33]:
# create a filter function for deeplake to use
# deeplake.compute decorator is needed for filter function with multiple arguments
@deeplake.compute
def filter_fn(x, coor):
    venue_lat = float(x['metadata'].data()['value']['lat'])
    venue_long = float(x['metadata'].data()['value']['long'])

    dist = hs(coor, (venue_lat, venue_long))  # haversine distance, in km by default
    return dist <= 0.8  # listings more than 800 m from the target location are filtered out

In [34]:
# create a function to do similarity search with filtering using deeplake.search
def search_with_filter(prompt_coor_string: str) -> list:

    # separate the search query from the coordinates string and build a (lat, long) float tuple
    params_ls = prompt_coor_string.split(" | ")
    prompt = params_ls[0].replace('"', '')
    coor_ls = params_ls[1].split(", ")
    coor = (float(coor_ls[0]), float(coor_ls[1]))
    
    search_results = db.similarity_search_with_score(k = 5,
                                                query = prompt,
                                                filter = filter_fn(coor),
                                                distance_metric = 'cos'
                                            )
    
    # only keep reviews that have a similarity score of 0.8 and above
    filtered_res = list(filter(lambda x: x[1] >= 0.8, search_results))
    filtered_res = [t[0] for t in filtered_res]
    
    return filtered_res
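
The agent passes `search_with_filter` a single string, so the parsing at the top of that function is the contract between the two tools. Here is a slightly more defensive, standalone version of just that parsing step (the name `parse_prompt_and_coords` is hypothetical). It tolerates missing spaces around the separators and reports malformed input with a readable error instead of an `IndexError`.

```python
def parse_prompt_and_coords(prompt_coor_string: str) -> tuple[str, tuple[float, float]]:
    """Split '"query" | lat, long' into (query, (lat, long))."""
    try:
        prompt_part, coor_part = prompt_coor_string.split("|", 1)
        lat_str, long_str = coor_part.split(",", 1)
        coor = (float(lat_str), float(long_str))  # float() ignores surrounding whitespace
    except ValueError as err:
        raise ValueError(f'expected \'"query" | lat, long\', got {prompt_coor_string!r}') from err
    # strip surrounding whitespace and the quotes the LLM tends to add
    prompt = prompt_part.strip().strip('"').strip()
    return prompt, coor


print(parse_prompt_and_coords('"omakase courses" | 1.276525, 103.845725'))
```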

In [35]:
# an example on how the function will be used
search_with_filter('"omakase courses" | 1.276525, 103.845725')

100%|██████████| 665/665 [00:00<00:00, 2180.14it/s]


[Document(page_content="place_title: Kei Hachi\nplace_url: https://www.burpple.com/kei-hachi?bp_ref=%2Ff%2FWr8X_PCG\nfood_desc_title: Best Meal I've Had So Far\nfood_desc_body: Just some random photos of dishes served during their Kei Hachi Lunch Omakase ($128) because its too hard to choose specific favourites from their beautiful course meal. All of their food in the course are right on point and so darn delicious. Each food item is presented like a form of art and paired with the beautiful ambience, this is one hell of a treat. You get a huge variety of different food preparation styles and definitely a filling meal by the end of the course. Loved the hospitality of the chefs and the servers. Highly recommended if you love Japanese food and would like a good treat! Indeed the best meal I have ever had so far 😍😍\nreview_link: https://www.burpple.com/f/R6qr9qKk\npos_prob: 0.9908563\nnum_stars: 5\nvenue_price: ~$130/pax\nvenue_tag: Date Night\nFine Dining\nJapanese\nvenue_loc: https://

#### 3. Create the agent

We will now initialise the LLM agent with the three functions we created as its tools.

In [36]:
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType

In [48]:
# initialise the 3 tools to be added to the agent
# Note: the name and description of each tool are very important for a ReAct agent, as it relies heavily on them to decide when to use each tool
tools = [
    Tool(
        name="Database Search",
        func=search,
        description="Useful for querying food review data. Input should be the user text query to find relevant food review articles."
    ),
    Tool(
        name="Database Search with Distance Filter",
        func=search_with_filter,
        description="Useful for searching food places near specific locations. This function is to be used after Geolocation Search. The input string should be a text search query and a set of latitude and longitude coordinates separated by |."
    ),
    Tool(
        name="Geolocation Search",
        func=get_coordinates,
        description="Useful to retrieve latitude and longitude coordinates and name of place from a defined location. This is used before Database Search with Distance Filter. Input should be the place location given by the user. Output should be a tuple containing latitude and longitude coordinates and name of place."
    )
]
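
Because the agent chains Geolocation Search into Database Search with Distance Filter, the first tool's output has to be rewritten into the second tool's input format. The LLM does this itself in the example run, but writing the contract down as code helps when debugging. `build_filter_input` is a hypothetical helper, not part of LangChain.

```python
def build_filter_input(query: str, geolocation_output: tuple[tuple[float, float], str]) -> str:
    """Turn a Geolocation Search result into the '"query" | lat, long' string the filter tool expects."""
    (lat, long), _place_name = geolocation_output
    return f'"{query}" | {lat}, {long}'


geo = ((1.276525, 103.845725), "Tanjong Pagar Mrt Station")
print(build_filter_input("omakase courses", geo))
```

The printed string matches the `Action Input` the agent produced for the distance-filtered search in the trace below.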

In [55]:
# Define prefix of LLM agent
# Note: this is where you can prompt-engineer the template for the agent, so that it clearly understands which tasks it should do and how to format its answers.
PREFIX = """
Answer the following questions as best you can. You have access to the tools below, which can help you find food reviews. Only answer questions that are related to the use of the given tools.
If the question is unrelated, reject it. If no answer is found, just say that you do not have any relevant results. Return the final answer when you think you are ready. In your final answer, you MUST list the reviews in the following format:

Here are my recommendations:
🏠 [Name of place]
<i>[venue tags]</i>
✨ Avg Rating: [Rating of venue]
💸 Price: [Estimated price of dining at venue] (this is optional. If not found or not clear, use a dash instead.)
📍 <a href=[Location of venue] ></a>
📝 Reviews:
[list of review_link, separated by line breaks] (Use this format: 1. <a href=[review_link] >[food_desc_title text]</a>)

If you cannot find any reviews to respond to the user, just say that you don't know.
"""

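
To check what a correctly formatted answer looks like, or to post-process results without relying on the LLM, the template can also be filled deterministically. The sketch below is one way to do it. The dictionary keys are hypothetical and would need to be mapped from your document metadata; the sample values are taken from the example answer at the end of this notebook.

```python
def format_recommendation(venue: dict) -> str:
    """Render one venue in the answer format requested by PREFIX."""
    price = venue.get("price") or "-"  # PREFIX asks for a dash when the price is unknown
    reviews = "\n".join(
        f"{i}. <a href={link} >{title}</a>" for i, (link, title) in enumerate(venue["reviews"], start=1)
    )
    return (
        f"🏠 {venue['name']}\n"
        f"<i>{', '.join(venue['tags'])}</i>\n"
        f"✨ Avg Rating: {venue['rating']}\n"
        f"💸 Price: {price}\n"
        f"📍 <a href={venue['location_url']} ></a>\n"
        "📝 Reviews:\n"
        f"{reviews}"
    )


kei_hachi = {
    "name": "Kei Hachi",
    "tags": ["Date Night", "Fine Dining", "Japanese"],
    "rating": 5,
    "price": "~$130/pax",
    "location_url": "https://www.google.com/maps/search/?api=1&query=1.279512,103.8415799",
    "reviews": [("https://www.burpple.com/f/R6qr9qKk", "Best Meal I've Had So Far")],
}
print("Here are my recommendations:\n" + format_recommendation(kei_hachi))
```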
In [56]:
# Construct the agent
# Note: sometimes the agent will hallucinate and not format its answer properly. Setting handle_parsing_errors=True sends the parsing error back to the LLM as an observation so it can correct its output, instead of raising an exception.
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    handle_parsing_errors=True,
    agent_kwargs={
        'prefix': PREFIX
    }
)
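
A ReAct agent works by having the LLM emit `Action:` and `Action Input:` lines (visible in the trace below), which the executor parses to decide which tool to call. The toy dispatcher below illustrates that step and what `handle_parsing_errors=True` buys you: when the text cannot be parsed, an error message is returned as the observation so the LLM can try again, instead of the run crashing. It is a simplified sketch, not LangChain's actual parser; `run_step` and the stand-in tool are hypothetical.

```python
import re
from typing import Callable

ACTION_RE = re.compile(r"Action\s*:\s*(.+?)\s*\nAction\s*Input\s*:\s*(.+)", re.DOTALL)


def run_step(llm_text: str, tools: dict[str, Callable[[str], str]]) -> str:
    """Parse one ReAct step and return the observation to feed back to the LLM."""
    match = ACTION_RE.search(llm_text)
    if match is None:
        # analogous to handle_parsing_errors=True: report the problem instead of raising
        return "Invalid format: expected 'Action:' and 'Action Input:' lines."
    tool_name, tool_input = match.group(1).strip(), match.group(2).strip()
    if tool_name not in tools:
        return f"{tool_name} is not a valid tool, try one of {sorted(tools)}."
    return tools[tool_name](tool_input)


# stand-in tool that returns a canned answer
fake_tools = {"Geolocation Search": lambda place: "((1.276525, 103.845725), 'Tanjong Pagar Mrt Station')"}

step = "I should find the coordinates first.\nAction: Geolocation Search\nAction Input: Tanjong Pagar MRT"
print(run_step(step, fake_tools))
print(run_step("I think I know the answer.", fake_tools))
```

This is also why the exact tool names matter: the agent has to reproduce them verbatim on the `Action:` line.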

In [57]:
# run an example
agent.run("Do you know any restaurants that have omakase courses near Tanjong Pagar MRT?")



> Entering new AgentExecutor chain...
I should use the Geolocation Search tool to find the latitude and longitude coordinates of Tanjong Pagar MRT. Then I can use the Database Search with Distance Filter tool to search for restaurants with omakase courses near that location.
Action: Geolocation Search
Action Input: Tanjong Pagar MRT
Observation: ((1.276525, 103.845725), 'Tanjong Pagar Mrt Station')
Thought: Now that I have the latitude and longitude coordinates of Tanjong Pagar MRT, I can use the Database Search with Distance Filter tool to search for restaurants with omakase courses near that location.
Action: Database Search with Distance Filter
Action Input: "omakase courses" | 1.276525, 103.845725

100%|██████████| 665/665 [00:00<00:00, 2916.32it/s]



Observation: [33;1m[1;3m[Document(page_content="place_title: Kei Hachi\nplace_url: https://www.burpple.com/kei-hachi?bp_ref=%2Ff%2FWr8X_PCG\nfood_desc_title: Best Meal I've Had So Far\nfood_desc_body: Just some random photos of dishes served during their Kei Hachi Lunch Omakase ($128) because its too hard to choose specific favourites from their beautiful course meal. All of their food in the course are right on point and so darn delicious. Each food item is presented like a form of art and paired with the beautiful ambience, this is one hell of a treat. You get a huge variety of different food preparation styles and definitely a filling meal by the end of the course. Loved the hospitality of the chefs and the servers. Highly recommended if you love Japanese food and would like a good treat! Indeed the best meal I have ever had so far 😍😍\nreview_link: https://www.burpple.com/f/R6qr9qKk\npos_prob: 0.9908563\nnum_stars: 5\nvenue_price: ~$130/pax\nvenue_tag: Date Night\nFine Dining\nJa

"🏠 Kei Hachi\n<i>Date Night, Fine Dining, Japanese</i>\n✨ Avg Rating: 5\n💸 Price: ~$130/pax\n📍 <a href=https://www.google.com/maps/search/?api=1&query=1.279512,103.8415799 ></a>\n📝 Reviews:\n1. <a href=https://www.burpple.com/f/R6qr9qKk >Best Meal I've Had So Far</a>\n\n🏠 KYUU By Shunsui\n<i>Late Night, Japanese, Dinner With Drinks</i>\n✨ Avg Rating: 4\n💸 Price: ~$130/pax\n📍 <a href=https://www.google.com/maps/search/?api=1&query=1.2799799,103.841516 ></a>\n📝 Reviews:\n1. <a href=https://www.burpple.com/f/bAd77B8k >Do come here if you love your ikura!</a>\n\n🏠 Teppei Japanese Restaurant (Orchid Hotel)\n<i>Sushi, Chirashi, Seafood, Date Night, Japanese</i>\n✨ Avg Rating: 5\n💸 Price: ~$100/pax\n📍 <a href=https://www.google.com/maps/search/?api=1&query=1.276993,103.843891 ></a>\n📝 Reviews:\n1. <a href=https://www.burpple.com/f/PWJbmM5Z >Very Good Omakase, Worth The Price</a>"