### Document QA using Large Language Models (LLMs)

Using LLM document extraction methods for better querying of food review data

This is the dummy notebook that accompanies the article I have written here:

To try my food recommender bot on Telegram, use this link: https://t.me/jasonthefoodie_bot

#### 1. Check Out the Dataset

In [1]:
import pandas as pd

In [3]:
# import data to take a quick look
df = pd.read_csv("dummy_data.csv")
df

| Unnamed: 0 | food_desc_title | food_desc_body | review_link | num_stars | venue_name | venue_url | venue_price | venue_tag |
|---|---|---|---|---|---|---|---|---|
| 0 | Absolutely stunning plate of seafood bee hoon! | From the fresh, succulent and thick fish slice... | https://abc.xyz/review1 | 5 | ABC Beehoon | https://abc.xyz/1 | ~$10/pax | Hawker Food\nSeafood |
| 1 | Good food and wines | The Short Ribs Galbi really stole the show for... | https://abc.xyz/review2 | 5 | New Place Seafood House | https://abc.xyz/2 | ~$50/pax | Zi Char\nSeafood\nGood For Groups\nChinese |
| 2 | Lovely sandwiches and sides! | A relatively new cafe serving up superb sandwi... | https://abc.xyz/review3 | 4 | Sando Place | https://abc.xyz/3 | ~$15/pax | Breakfast & Brunch\nCafes & Coffee\nSandwiches |
| 3 | So many lovely dishes that were well executed! | What a spread! The Gyoza set ($13.90) comes wi... | https://abc.xyz/review4 | 5 | Dumpling King | https://abc.xyz/4 | ~$20/pax | Japanese |
| 4 | Not bad of a coffee! | A quaint coffee place in the heart of Chinatow... | https://abc.xyz/review5 | 2 | Coffee House | https://abc.xyz/5 | ~$5/pax | Cafes & Coffee |
| 5 | Lovely thick fish slices! | What's there to not love from a good bowl of f... | https://abc.xyz/review6 | 5 | Fish Soup Ban Mian | https://abc.xyz/6 | ~$5/pax | Hawker Food |
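
The `venue_tag` column packs several tags into one newline-separated string, and `venue_price` is stored as text such as `~$10/pax`. The following is a minimal sketch, assuming those two formats hold for every row, of how the columns could be normalised for a quick sanity check. The helper names `split_tags` and `parse_price` are hypothetical and not part of the notebook.

```python
import re

def split_tags(raw):
    """Split a newline-separated tag string into a list of clean tags."""
    return [tag.strip() for tag in raw.split("\n") if tag.strip()]

def parse_price(raw):
    """Extract the numeric part of a price such as '~$10/pax'; None if absent."""
    match = re.search(r"\$(\d+(?:\.\d+)?)", raw)
    return float(match.group(1)) if match else None

# Values copied from the dummy dataset
print(split_tags("Zi Char\nSeafood\nGood For Groups\nChinese"))
print(parse_price("~$50/pax"))
```

With the DataFrame loaded, the same helpers could be applied per column, for example `df["venue_tag"].map(split_tags)`.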


#### 2. Setting up the Vector Store

In [4]:
from langchain.vectorstores import DeepLake
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.document_loaders.csv_loader import CSVLoader
import os

In [5]:
# define env variables for AzureOpenAI model
# os.environ["OPENAI_API_TYPE"] = "azure"
# os.environ["OPENAI_API_BASE"] = "OPENAI_API_BASE"
# os.environ["OPENAI_API_KEY"] = "OPENAI_API_KEY"
# os.environ["OPENAI_API_VERSION"] = "2023-03-15-preview"

# openai model
from langchain_openai import ChatOpenAI
# read the API key from the environment instead of hardcoding a secret in the notebook
openai_api_key = os.environ["OPENAI_API_KEY"]



In [6]:
# instantiate OpenAIEmbeddings
# note that chunk_size is set to 1 due to AzureOpenAI limitations: https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#embeddings
embeddings = OpenAIEmbeddings(deployment="embedding", chunk_size=1)



In [7]:
# instantiate CSV loader and load food reviews with review link as source
loader = CSVLoader(
    file_path="dummy_data.csv",
    csv_args={"delimiter": ","},
    encoding="utf-8",
    source_column="review_link",
)
data = loader.load()

In [8]:
# see what the document content is like
print(data[0])

page_content="food_desc_title: Absolutely stunning plate of seafood bee hoon!\nfood_desc_body: From the fresh, succulent and thick fish slices, to the charring of the bee hoon and ingredients to an amazingly robust seafood broth, it had it all. Each mouthful of the bee hoon and soup just shouts flavour, due to the huge amount of umami and smokey flavour. Love the fresh crispy pork lard as well!\nThe portion came pretty big too! Really extremely worth it and I can't wait to come back to have another plate of this!\nreview_link: https://abc.xyz/review1\nnum_stars: 5\nvenue_name: ABC Beehoon\nvenue_url: https://abc.xyz/1\nvenue_price: ~$10/pax\nvenue_tag: Hawker Food\nSeafood" metadata={'source': 'https://abc.xyz/review1', 'row': 0}
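
The printed document shows the shape each CSV row takes: one `column: value` line per field in `page_content`, plus a `metadata` dict holding the source and the row index. The sketch below approximates that mapping with the standard library only. It is an illustration of the output format, not the loader's actual implementation, and `rows_to_documents` is a hypothetical name.

```python
import csv
import io

def rows_to_documents(csv_text, source_column):
    """Turn each CSV row into a dict with 'page_content' and 'metadata'."""
    documents = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader):
        # One "column: value" line per field, joined with newlines
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        metadata = {"source": row[source_column], "row": row_number}
        documents.append({"page_content": content, "metadata": metadata})
    return documents

# Small sample using three of the dataset's columns
sample = (
    "food_desc_title,review_link,num_stars\n"
    "Lovely thick fish slices!,https://abc.xyz/review6,5\n"
)
sample_docs = rows_to_documents(sample, source_column="review_link")
print(sample_docs[0]["page_content"])
print(sample_docs[0]["metadata"])
```

Because the whole row is flattened into a single text, fields such as `venue_tag` and `venue_price` are embedded together with the review body and can influence retrieval.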


In [9]:
# create deeplake db
db = DeepLake(
    dataset_path="./my_deeplake/", embedding_function=embeddings, overwrite=True
)
db.add_documents(data)

Using embedding function is deprecated and will be removed in the future. Please use embedding instead.
Creating 6 embeddings in 1 batches of size 6:: 100%|██████████| 1/1 [01:02<00:00, 62.49s/it]

Dataset(path='./my_deeplake/', tensors=['text', 'metadata', 'embedding', 'id'])

  tensor      htype      shape     dtype  compression
  -------    -------    -------   -------  ------- 
   text       text      (6, 1)      str     None   
 metadata     json      (6, 1)      str     None   
 embedding  embedding  (6, 1536)  float32   None   
    id        text      (6, 1)      str     None   





['e44717fe-dcf0-11ee-b388-fa163e42c4ba',
 'e4471916-dcf0-11ee-b388-fa163e42c4ba',
 'e44719ac-dcf0-11ee-b388-fa163e42c4ba',
 'e44719de-dcf0-11ee-b388-fa163e42c4ba',
 'e4471a06-dcf0-11ee-b388-fa163e42c4ba',
 'e4471a2e-dcf0-11ee-b388-fa163e42c4ba']

In [10]:
# load from existing DB, if database exists
db = DeepLake(
    dataset_path="./my_deeplake/", embedding_function=embeddings, read_only=True
)

Using embedding function is deprecated and will be removed in the future. Please use embedding instead.


Deep Lake Dataset in ./my_deeplake/ already exists, loading from the storage


In [11]:
# example query
query = "What places selling seafood bee hoon have you been to?"
docs = db.similarity_search(query)

In [12]:
docs

[Document(page_content="food_desc_title: Absolutely stunning plate of seafood bee hoon!\nfood_desc_body: From the fresh, succulent and thick fish slices, to the charring of the bee hoon and ingredients to an amazingly robust seafood broth, it had it all. Each mouthful of the bee hoon and soup just shouts flavour, due to the huge amount of umami and smokey flavour. Love the fresh crispy pork lard as well!\nThe portion came pretty big too! Really extremely worth it and I can't wait to come back to have another plate of this!\nreview_link: https://abc.xyz/review1\nnum_stars: 5\nvenue_name: ABC Beehoon\nvenue_url: https://abc.xyz/1\nvenue_price: ~$10/pax\nvenue_tag: Hawker Food\nSeafood", metadata={'source': 'https://abc.xyz/review1', 'row': 0}),
 Document(page_content='food_desc_title: Good food and wines\nfood_desc_body: The Short Ribs Galbi really stole the show for me. Tender and extremely flavourful, the galbi pieces were also very smokey. The sweet and savoury marinade further enhanc
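
Conceptually, a similarity search embeds the query and returns the stored documents whose vectors are closest to it. The sketch below shows that idea with cosine similarity over toy vectors. The metric, the default of `k=4`, and the helper names are assumptions for illustration, and the vector store's actual search may differ.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, stored, k=4):
    """Return the k (text, score) pairs most similar to the query."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in stored]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy 3-dimensional vectors standing in for the 1536-dimensional embeddings
stored = [
    ("seafood bee hoon review", [0.9, 0.1, 0.0]),
    ("coffee review", [0.0, 0.2, 0.9]),
    ("fish soup review", [0.7, 0.3, 0.1]),
]
query_vector = [1.0, 0.0, 0.0]
for text, score in top_k(query_vector, stored, k=2):
    print(f"{score:.3f}  {text}")
```

Note that a search of this kind always returns the nearest documents, even when none of them is a good match, which is one reason the prompt later asks the model to filter out low-confidence results.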

#### 3. Generating Prompts with Context

In [13]:
from langchain.chat_models import AzureChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains.question_answering import load_qa_chain

In [14]:
# define prompt template
prompt_template = """You are a food recommender bot that has visited and given reviews for places given in the context. Help users find food recommendations.
Use only the context given to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
Filter out any results from the context that you are not so confident of.
Answer the user directly first and then list down your suggestions according to the format below, if the user is asking for suggestions. End your answer right after giving the suggestions. Placeholders are indicated using [] and comments are indicated using (). Recommend more than 1 option to the user, if possible.
Keep your answer to at most 3500 characters.

[Short direct answer to the user's question]

Here are my recommendations:
🏠 [Name of place]
<i>[venue tags]</i>
✨ Avg Rating: [Rating of venue]
💸 Price: [Estimated price of dining at venue] (this is optional. If not found or not clear, use a dash instead.)
📍 <a href=[Location of venue] ></a>
📝 Reviews:
[list of review_link, separated by line breaks] (Use this format: 1. <a href=[review_link] >[food_desc_title text]</a>)

For example,

🏠 Doodak
<i>Steak, Date Night, Korean, Seafood</i>
✨ Avg Rating: 4
💸 Price: ~$100/pax
📍 <a href="https://www.google.com/maps/search/?api=1&query=1.3521,103.8198"></a>
📝 Reviews:
1. <a href="https://abc.xyz/review1">Good food and presentation</a>

Here is the context:
{context}

Question: {question}
"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
PROMPT

PromptTemplate(input_variables=['context', 'question'], template='You are a food recommender bot that has visited and given reviews for places given in the context. Help users find food recommendations.\nUse only the context given to answer the question at the end. If you don\'t know the answer, just say that you don\'t know, don\'t try to make up an answer.\nFilter out any results from the context that you are not so confident of.\nAnswer the user directly first and then list down your suggestions according to the format below, if the user is asking for suggestions. End your answer right after giving the suggestions. Placeholders are indicated using [] and comments are indicated using (). Recommend more than 1 option to the user, if possible.\nKeep your answer to at most 3500 characters.\n\n[Short direct answer to the user\'s question]\n\nHere are my recommendations:\n🏠 [Name of place]\n<i>[venue tags]</i>\n✨ Avg Rating: [Rating of venue]\n💸 Price: [Estimated price of dining at venue] 

In [15]:
# Instantiate LLM
# llm = AzureChatOpenAI(
#     deployment_name='DEPLOYMENT_NAME',
#     temperature=0
# )
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0, openai_api_key=openai_api_key)

In [16]:
# instantiate QA chain
chain = load_qa_chain(llm, chain_type="stuff", prompt=PROMPT)
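
With `chain_type="stuff"`, all retrieved documents are placed into the prompt's `{context}` slot and sent to the model in a single call. The sketch below illustrates that step with plain string formatting. The function `stuff_prompt`, the shortened template, and the blank-line separator are assumptions for illustration, not the chain's actual code.

```python
def stuff_prompt(template, documents, question, separator="\n\n"):
    """Join all document texts into one context string and fill the template."""
    context = separator.join(documents)
    return template.format(context=context, question=question)

# Shortened, hypothetical template with the same two placeholders as PROMPT
template = "Here is the context:\n{context}\n\nQuestion: {question}\n"
documents = [
    "venue_name: ABC Beehoon\nvenue_tag: Hawker Food\nSeafood",
    "venue_name: Fish Soup Ban Mian\nvenue_tag: Hawker Food",
]
filled = stuff_prompt(template, documents, "Any hawker food to recommend?")
print(filled)
```

Since everything goes into one prompt, the retrieved reviews plus the instructions must fit within the model's context window, so this approach suits a small number of short documents.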

In [17]:
# pass example query to vector store and QA chain
query = "Any hawker food to recommend?"
docs = db.similarity_search(query)
output = chain({"input_documents": docs, "question": query}, return_only_outputs=True)



In [18]:
# print results
# note that there is some hallucination here, as links to the review are not mine.
print(output["output_text"])

Yes, I have a great hawker food recommendation for you!

Here are my recommendations:
🏠 ABC Beehoon
<i>Hawker Food, Seafood</i>
✨ Avg Rating: 5
💸 Price: ~$10/pax
📍 <a href="https://abc.xyz/1"></a>
📝 Reviews:
1. <a href="https://abc.xyz/review1">Absolutely stunning plate of seafood bee hoon!</a>
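
Since the model can hallucinate links, one simple safeguard is to check every link in the answer against the URLs present in the retrieved context. The sketch below is one possible way to do that. It assumes double-quoted `href` attributes as in the answer format, and `find_unsupported_links` is a hypothetical helper, not part of the notebook.

```python
import re

def find_unsupported_links(answer, context_texts):
    """Return links in the answer that never appear in the retrieved context."""
    # Collect every URL that occurs in the context as an exact token
    known_urls = set()
    for text in context_texts:
        known_urls.update(re.findall(r"https?://\S+", text))
    # Collect the targets of all double-quoted href attributes in the answer
    answer_links = re.findall(r'href="([^"]+)"', answer)
    return [link for link in answer_links if link not in known_urls]

context_texts = [
    "review_link: https://abc.xyz/review1\nvenue_url: https://abc.xyz/1",
]
answer = (
    '📍 <a href="https://abc.xyz/1"></a>\n'
    '1. <a href="https://abc.xyz/review1">Seafood bee hoon</a>\n'
    '2. <a href="https://abc.xyz/review9">Made-up review</a>'
)
print(find_unsupported_links(answer, context_texts))
```

In the notebook, the context could be taken from the retrieved documents, for example `[doc.page_content for doc in docs]`, and any flagged link dropped before the answer is sent to the user.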
