# LLM Workshop Notebook - Additional RAG examples

**Author:** Aron Brand

**Copyright (c) 2024**

This notebook is an integral part of Aron Brand's LLM Workshop, designed to explore and utilize the capabilities of large language models (LLMs).

**License:**

This program is free software: you are encouraged to redistribute and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation. This software is provided under either version 3 of the License, or (at your discretion) any later version.

The program is distributed in the hope that it will be useful and informative. However, it comes with no warranty; not even the implied warranty of merchantability or fitness for a particular purpose. For more details, please refer to the GNU General Public License.

For a copy of the GNU General Public License, please visit [https://www.gnu.org/licenses/](https://www.gnu.org/licenses/).

In [2]:
%pip install langchain_chroma

Note: you may need to restart the kernel to use updated packages.


## Prepare OpenAI Connection

In [3]:
import configparser

# Create a ConfigParser object
config = configparser.ConfigParser()

# Read the configuration file
config.read('config.ini')

# Access the values
OPENAI_API_KEY = config['openai']['OPENAI_API_KEY']

In [4]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    model_name="gpt-3.5-turbo",
    temperature=0.2
)

In [5]:
from langchain_openai import OpenAIEmbeddings 

embeddings = OpenAIEmbeddings(
    openai_api_key=OPENAI_API_KEY,
    model="text-embedding-3-small"
)


## RAG for CSV Files

This example demonstrates how to use Retrieval-Augmented Generation (RAG) on a CSV file containing city information. It retrieves a recommended city for a user query based on the information in the file.

Each row in the CSV is treated as a document. To facilitate embedding in a vector database, these documents are further split into chunks of up to 200 characters.

The ParentDocumentRetriever then maps each matching chunk back to its complete row (the parent document), so that the full row is returned rather than an isolated fragment.
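
The parent/child idea can be illustrated without any libraries. The sketch below is a simplified stand-in, not LangChain's implementation: the fixed-width splitter, the two toy rows, and the keyword match that replaces the similarity search are all assumptions made for illustration.

```python
import uuid

def split_into_chunks(text, chunk_size):
    """Naive fixed-width splitter, standing in for a real text splitter."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

# Toy parent documents (rows); the contents are illustrative only.
rows = [
    "City: Toronto\nFamous for: Diversity\nFamous Festivals: Toronto International Film Festival",
    "City: Bangkok\nFamous for: Temples\nTourist Attractions: Wat Phra Kaew",
]

docstore = {}  # parent id -> full row (the role of the document store)
chunks = []    # (chunk text, parent id) pairs (the role of the vector store)

for row in rows:
    doc_id = str(uuid.uuid4())
    docstore[doc_id] = row
    for chunk in split_into_chunks(row, chunk_size=40):
        chunks.append((chunk, doc_id))

def retrieve_parents(matching_chunks):
    """Map matching chunks back to their parent rows, dropping duplicates."""
    seen, parents = set(), []
    for _, doc_id in matching_chunks:
        if doc_id not in seen:
            seen.add(doc_id)
            parents.append(docstore[doc_id])
    return parents

# Pretend the similarity search matched every chunk mentioning "Festival".
hits = [c for c in chunks if "Festival" in c[0]]
parents = retrieve_parents(hits)
print(parents)
```

Several chunks of the same row can match a query, which is why the lookup deduplicates by parent id before returning the rows.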

In [6]:
from langchain_community.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(file_path="./cities.csv", source_column="City")

data = loader.load()



In [7]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_chroma import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

vectorstore = Chroma(collection_name="full_documents", embedding_function=embeddings)

child_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=5)

# The storage layer for the parent documents
store = InMemoryStore()
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

In [8]:
retriever.add_documents(data, ids=None)

This returns the top k chunks that best match a given query (k defaults to 4).

In [9]:
vectorstore.similarity_search("city with cinema event")

[Document(page_content='Timezone: UTC+7\nMayor: Anies Baswedan\nFamous for: Culture\nTourist Attractions: Ancol Dreamland\nAirport Code: CGK\nHotel Name: The Ritz-Carlton Jakarta\nRestaurant Name: Nasi Goreng', metadata={'doc_id': 'c722724c-0d77-4426-aed5-e07e0f886358', 'row': 21, 'source': 'Jakarta'}),
 Document(page_content='Mayor: Eric Garcetti\nFamous for: Entertainment\nTourist Attractions: Hollywood Walk of Fame\nAirport Code: LAX\nHotel Name: The Beverly Hills Hotel\nRestaurant Name: Providence\nMuseum Name: Getty Center', metadata={'doc_id': '28c82c64-a1c2-4a39-b92c-3627d7b50940', 'row': 5, 'source': 'Los Angeles'}),
 Document(page_content='Mayor: Kishori Pednekar\nFamous for: Bollywood\nTourist Attractions: Marine Drive\nAirport Code: BOM\nHotel Name: Taj Mahal Palace\nRestaurant Name: Gajalee', metadata={'doc_id': 'aafa14b4-f21a-4ce0-b06b-e376428761cc', 'row': 18, 'source': 'Mumbai'}),
 Document(page_content='Public Transportation: Metro\nUniversities: Fudan University\nSport
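
Conceptually, a top-k similarity search scores every stored chunk against the query embedding and keeps the k best. The sketch below assumes cosine similarity as the ranking measure (the vector store's actual metric may differ) and uses made-up 3-dimensional vectors in place of real embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, indexed, k=4):
    """Return the k (text, score) pairs most similar to the query."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical 3-dimensional "embeddings"; real ones have many more dimensions.
indexed = [
    ("Famous for: Entertainment", [0.9, 0.1, 0.0]),
    ("Famous for: Bollywood", [0.8, 0.3, 0.1]),
    ("Famous for: Temples", [0.1, 0.9, 0.2]),
    ("Public Transportation: Metro", [0.0, 0.2, 0.9]),
    ("Currency: Thai Baht", [0.1, 0.1, 0.8]),
]
query_vec = [1.0, 0.2, 0.0]  # stands in for the embedded query text

results = top_k(query_vec, indexed, k=4)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

The search always returns k results, even when the lowest-ranked ones are only weakly related to the query, which is visible in the last chunk of the output above.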

This returns the parent documents (full rows) whose chunks best match a given query.

In [10]:
retrieved_docs = retriever.invoke("a great festival for movie lovers")
retrieved_docs

[Document(page_content='City: Toronto\nCountry: Canada\nPopulation: 2930000\nArea (sq. km): 630.21\nGDP ($): 276000000000\nMajor Landmark: CN Tower\nClimate: Humid continental\nLanguage: English\nCurrency: Canadian Dollar\nTimezone: UTC-5\nMayor: John Tory\nFamous for: Diversity\nTourist Attractions: Royal Ontario Museum\nAirport Code: YYZ\nHotel Name: The Ritz-Carlton\nRestaurant Name: Canoe\nMuseum Name: Art Gallery of Ontario\nPark Name: High Park\nPopulation Density (/sq. km): 4652\nAnnual Visitors: 27400000\nAverage Temperature (Celsius): 8\nAnnual Rainfall (mm): 831\nElevation (m): 76\nPublic Transportation: TTC\nUniversities: University of Toronto\nSports Teams: Toronto Maple Leafs\nFamous Festivals: Toronto International Film Festival\nFamous Foods: Poutine\nMajor Industries: Technology', metadata={'source': 'Toronto', 'row': 19}),
 Document(page_content='City: Delhi\nCountry: India\nPopulation: 30236316\nArea (sq. km): 1484\nGDP ($): 300000000000\nMajor Landmark: India Gate\nC

This shows how to perform RAG with the cities CSV data.

In [11]:
from langchain.chains import RetrievalQA
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate

prompt_template = """You are a travel agent. Recommend a single travel destination from the below list based on the user preferences, and explain why. 
Rely only on the provided context and do not provide any additional details on the city. Provide a booking code and Airport Code, average Temperature.
If there is no suitable city, say you don't have any suitable recommendation.
Destinations: {context}
User preferences: {question}
"""

document_prompt = PromptTemplate(input_variables=["page_content", "row"], template="{page_content}, booking_code: {row}")

cityqa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={
        "prompt": ChatPromptTemplate.from_template(template=prompt_template),
        "document_prompt": document_prompt,
    },
)
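
With the "stuff" chain type, each retrieved document is formatted with the document prompt, and the results are placed together into the `{context}` slot of the main prompt. The following is a rough sketch of that assembly using plain string formatting. It is not LangChain's implementation: the `Doc` class, the `build_prompt` helper, the shortened templates, and the blank-line separator are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

# Shortened versions of the templates used in the cell above.
DOCUMENT_TEMPLATE = "{page_content}, booking_code: {row}"
PROMPT_TEMPLATE = "Destinations: {context}\nUser preferences: {question}\n"

def build_prompt(docs, question):
    # Format each retrieved document, exposing its metadata to the template.
    formatted = [
        DOCUMENT_TEMPLATE.format(page_content=d.page_content, **d.metadata)
        for d in docs
    ]
    # "Stuff" all formatted documents into the single {context} slot
    # (the blank-line separator is an assumption of this sketch).
    context = "\n\n".join(formatted)
    return PROMPT_TEMPLATE.format(context=context, question=question)

docs = [
    Doc("City: Bangkok\nFamous for: Temples", {"source": "Bangkok", "row": 11}),
    Doc("City: Los Angeles\nFamous for: Entertainment", {"source": "Los Angeles", "row": 5}),
]
final_prompt = build_prompt(docs, "I like religious sites")
print(final_prompt)
```

Because the document prompt maps the `row` metadata field to `booking_code`, the booking code the model reports is simply the CSV row number of the recommended city.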

In [12]:
cityqa.invoke({"query": "I like religious sites"})

{'query': 'I like religious sites',
 'result': "I recommend the city of Bangkok, Thailand. Bangkok is famous for its temples, with attractions such as Wat Phra Kaew and the Grand Palace. The city's rich cultural heritage and religious sites make it a perfect destination for travelers who enjoy exploring religious landmarks. \n\nBooking Code: 11\nAirport Code: BKK\nAverage Temperature: 28°C",
 'source_documents': [Document(page_content='City: Bangkok\nCountry: Thailand\nPopulation: 8271051\nArea (sq. km): 1568.7\nGDP ($): 403000000000\nMajor Landmark: Grand Palace\nClimate: Tropical\nLanguage: Thai\nCurrency: Thai Baht\nTimezone: UTC+7\nMayor: Aswin Kwanmuang\nFamous for: Temples\nTourist Attractions: Wat Phra Kaew\nAirport Code: BKK\nHotel Name: Mandarin Oriental\nRestaurant Name: Nahm\nMuseum Name: Bangkok National Museum\nPark Name: Lumpini Park\nPopulation Density (/sq. km): 5282\nAnnual Visitors: 22700000\nAverage Temperature (Celsius): 28\nAnnual Rainfall (mm): 1500\nElevation (m)

In [13]:
cityqa.invoke({"query": "I want to meet the mayor of a city famous for cinema"})

{'query': 'I want to meet the mayor of a city famous for cinema',
 'result': "I recommend Los Angeles, USA. Los Angeles is famous for its entertainment industry, including Hollywood, and is home to many celebrities and movie studios. The city's mayor, Eric Garcetti, is actively involved in promoting the city's film industry and is a great person to meet if you are interested in cinema. Additionally, Los Angeles has a Mediterranean climate with an average temperature of 19 degrees Celsius, making it a pleasant destination to visit. \n\nBooking code: 5\nAirport Code: LAX\nAverage Temperature: 19°C",
 'source_documents': [Document(page_content='City: Los Angeles\nCountry: USA\nPopulation: 3990456\nArea (sq. km): 1302\nGDP ($): 866000000000\nMajor Landmark: Hollywood Sign\nClimate: Mediterranean\nLanguage: English\nCurrency: US Dollar\nTimezone: UTC-8\nMayor: Eric Garcetti\nFamous for: Entertainment\nTourist Attractions: Hollywood Walk of Fame\nAirport Code: LAX\nHotel Name: The Beverly Hi

Next, we demonstrate Retrieval-Augmented Generation (RAG) on data stored in Python dictionaries. Only the "description" field of each city is embedded. The other attributes are stored as metadata: they play no part in the similarity search, but the document prompt can still pass them to the Large Language Model (LLM) during the generation phase of RAG, as is done with the booking code below.

In [14]:
cities = [
    {
        "city_name": "Paris",
        "description": "Paris is known for its iconic landmarks such as the Eiffel Tower, Louvre Museum, and Notre-Dame Cathedral. Visitors can also explore charming neighborhoods like Montmartre and enjoy delicious French cuisine. The city's elevation is approximately 35 meters (115 feet) above sea level.",
        "booking_code": 35,
    },
    {
        "city_name": "Tokyo",
        "description": "Tokyo, the capital of Japan, offers a blend of modern skyscrapers and historic temples. Tourists can visit attractions like the Tokyo Tower, Senso-ji Temple, and the bustling Shibuya crossing. With an elevation of about 40 meters (131 feet) above sea level, Tokyo is a vibrant metropolis.",
        "booking_code": 40
    },
    {
        "city_name": "Rome",
        "description": "Rome is a city steeped in history, with landmarks such as the Colosseum, Vatican City, and the Trevi Fountain. Visitors can explore ancient ruins, dine on authentic Italian cuisine, and soak in the vibrant atmosphere. Rome's elevation is approximately 21 meters (69 feet) above sea level.",
        "booking_code": 21
    },
    {
        "city_name": "New York City",
        "description": "New York City, also known as the Big Apple, is a bustling metropolis with iconic sights like Times Square, Central Park, and the Statue of Liberty. Visitors can enjoy Broadway shows, world-class museums, and diverse cuisine options. The city's elevation varies, but most of Manhattan is around 10 meters (33 feet) above sea level.",
        "booking_code": 10
    },
    {
        "city_name": "Sydney",
        "description": "Sydney, located on Australia's east coast, boasts attractions such as the Sydney Opera House, Sydney Harbour Bridge, and Bondi Beach. Visitors can explore the city's vibrant culture, enjoy outdoor activities, and savor fresh seafood. Sydney's elevation is approximately 19 meters (62 feet) above sea level.",
        "booking_code": 19
    },
    {
        "city_name": "Rio de Janeiro",
        "description": "Rio de Janeiro is famous for its stunning beaches, including Copacabana and Ipanema, as well as landmarks like Christ the Redeemer and Sugarloaf Mountain. Visitors can experience the vibrant Carnival atmosphere, explore lush rainforests, and enjoy breathtaking views. The city's elevation is around 0 meters (0 feet) above sea level.",
        "booking_code": 0
    },
    {
        "city_name": "London",
        "description": "London, the capital of England, offers a mix of history, culture, and modernity. Tourists can visit attractions such as the Tower of London, Buckingham Palace, and the British Museum. With an elevation of approximately 35 meters (115 feet) above sea level, London is a dynamic global city.",
        "booking_code": 35
    },
    {
        "city_name": "Cape Town",
        "description": "Cape Town is a coastal city in South Africa known for its stunning natural beauty, including Table Mountain and Cape Point. Visitors can explore historic sites like Robben Island, indulge in wine tasting in the nearby Winelands, and enjoy outdoor activities like hiking and surfing. The city's elevation varies, but most areas are around 10 meters (33 feet) above sea level.",
        "booking_code": 10
    },
    {
        "city_name": "Venice",
        "description": "Venice is a unique city built on water, famous for its picturesque canals, historic architecture, and romantic ambiance. Tourists can visit landmarks such as St. Mark's Basilica, the Grand Canal, and the Rialto Bridge. With an elevation of around 1 meter (3 feet) above sea level, Venice is a must-visit destination.",
        "booking_code": 1
    },
    {
        "city_name": "Dubai",
        "description": "Dubai is known for its futuristic skyline, luxury shopping malls, and extravagant attractions like the Burj Khalifa and Palm Jumeirah. Visitors can experience desert safaris, indoor skiing, and world-class dining. The city's elevation is approximately 16 meters (52 feet) above sea level.",
        "booking_code": 16
    }
]


In [15]:
from langchain_core.documents import Document
import copy

# Convert a city dictionary into a Document: the description becomes the
# embedded page content, and the remaining fields become metadata
def dict_to_document(city):
    metadata = copy.deepcopy(city)  # copy so the original dictionary is not mutated
    description = metadata.pop('description')
    return Document(page_content=description, metadata=metadata)

# Convert the list of city dictionaries into a list of Documents
documents = [dict_to_document(city) for city in cities]

documents


[Document(page_content="Paris is known for its iconic landmarks such as the Eiffel Tower, Louvre Museum, and Notre-Dame Cathedral. Visitors can also explore charming neighborhoods like Montmartre and enjoy delicious French cuisine. The city's elevation is approximately 35 meters (115 feet) above sea level.", metadata={'city_name': 'Paris', 'booking_code': 35}),
 Document(page_content='Tokyo, the capital of Japan, offers a blend of modern skyscrapers and historic temples. Tourists can visit attractions like the Tokyo Tower, Senso-ji Temple, and the bustling Shibuya crossing. With an elevation of about 40 meters (131 feet) above sea level, Tokyo is a vibrant metropolis.', metadata={'city_name': 'Tokyo', 'booking_code': 40}),
 Document(page_content="Rome is a city steeped in history, with landmarks such as the Colosseum, Vatican City, and the Trevi Fountain. Visitors can explore ancient ruins, dine on authentic Italian cuisine, and soak in the vibrant atmosphere. Rome's elevation is appro

In [16]:

from langchain_chroma import Chroma

db = Chroma.from_documents(documents, embedding=embeddings)

In [17]:
from langchain.chains import RetrievalQA
from langchain_core.vectorstores import VectorStoreRetriever
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate

retriever = VectorStoreRetriever(vectorstore=db)

prompt_template = """You are a travel agent. Recommend a single travel destination from the below list based on the user preferences, and explain why.
Rely only on the provided context and do not provide any additional details on the city. 
Say if you want to book, provide booking code (the code).
If there is no suitable city, say you don't have any suitable recommendation.
Destinations: {context}
User preferences: {question}
"""

document_prompt = PromptTemplate(input_variables=["page_content", "booking_code"], template="{page_content}, booking_code: {booking_code}")

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={
        "prompt": ChatPromptTemplate.from_template(template=prompt_template),
        "document_prompt": document_prompt,
    },
)


In [18]:
qa.invoke({"query": "I like religious sites in the far east"})

{'query': 'I like religious sites in the far east',
 'result': 'Based on your preference for religious sites in the far east, I recommend Tokyo, the capital of Japan. Tokyo offers a mix of modern skyscrapers and historic temples, including the famous Senso-ji Temple. You can also explore other attractions like the Tokyo Tower while experiencing the vibrant metropolis. \n\nIf you would like to book a trip to Tokyo, please use booking code: 40.',
 'source_documents': [Document(page_content='Tokyo, the capital of Japan, offers a blend of modern skyscrapers and historic temples. Tourists can visit attractions like the Tokyo Tower, Senso-ji Temple, and the bustling Shibuya crossing. With an elevation of about 40 meters (131 feet) above sea level, Tokyo is a vibrant metropolis.', metadata={'booking_code': 40, 'city_name': 'Tokyo'}),
  Document(page_content="Sydney, located on Australia's east coast, boasts attractions such as the Sydney Opera House, Sydney Harbour Bridge, and Bondi Beach. Vi

In [19]:
qa.invoke({"query": "I like boats"})

{'query': 'I like boats',
 'result': "I recommend Venice for you. Venice is a unique city built on water, famous for its picturesque canals. You can explore the city by boat and enjoy the romantic ambiance of this historic destination. Don't miss landmarks such as St. Mark's Basilica, the Grand Canal, and the Rialto Bridge. To book your trip to Venice, use booking code 1.",
 'source_documents': [Document(page_content="Venice is a unique city built on water, famous for its picturesque canals, historic architecture, and romantic ambiance. Tourists can visit landmarks such as St. Mark's Basilica, the Grand Canal, and the Rialto Bridge. With an elevation of around 1 meter (3 feet) above sea level, Venice is a must-visit destination.", metadata={'booking_code': 1, 'city_name': 'Venice'}),
  Document(page_content="Sydney, located on Australia's east coast, boasts attractions such as the Sydney Opera House, Sydney Harbour Bridge, and Bondi Beach. Visitors can explore the city's vibrant cultur

In [24]:
qa.invoke({"query": "I like dogs"})

{'query': 'I like dogs',
 'result': "I don't have any suitable recommendation based on the user preferences provided.",
 'source_documents': [Document(page_content="Rio de Janeiro is famous for its stunning beaches, including Copacabana and Ipanema, as well as landmarks like Christ the Redeemer and Sugarloaf Mountain. Visitors can experience the vibrant Carnival atmosphere, explore lush rainforests, and enjoy breathtaking views. The city's elevation is around 0 meters (0 feet) above sea level.", metadata={'booking_code': 0, 'city_name': 'Rio de Janeiro'}),
  Document(page_content="Dubai is known for its futuristic skyline, luxury shopping malls, and extravagant attractions like the Burj Khalifa and Palm Jumeirah. Visitors can experience desert safaris, indoor skiing, and world-class dining. The city's elevation is approximately 16 meters (52 feet) above sea level.", metadata={'booking_code': 16, 'city_name': 'Dubai'}),
  Document(page_content="Venice is a unique city built on water, fa

In [25]:
qa.invoke({"query": "I like nature and romance"})

{'query': 'I like nature and romance',
 'result': "I would recommend Venice for you. Venice is a unique city built on water, known for its picturesque canals and romantic ambiance. You can explore historic architecture, take a gondola ride through the canals, and visit landmarks like St. Mark's Basilica. The city's elevation is around 1 meter (3 feet) above sea level, making it a must-visit destination for nature lovers and those seeking a romantic getaway.\n\nBooking code: 1",
 'source_documents': [Document(page_content="Rio de Janeiro is famous for its stunning beaches, including Copacabana and Ipanema, as well as landmarks like Christ the Redeemer and Sugarloaf Mountain. Visitors can experience the vibrant Carnival atmosphere, explore lush rainforests, and enjoy breathtaking views. The city's elevation is around 0 meters (0 feet) above sea level.", metadata={'booking_code': 0, 'city_name': 'Rio de Janeiro'}),
  Document(page_content="Venice is a unique city built on water, famous fo

In [29]:
qa.invoke({"query": "Find the most modern city that is not too hot"})

{'query': 'Find the most modern city that is not too hot',
 'result': 'Based on the user preferences of finding the most modern city that is not too hot, I would recommend Tokyo, the capital of Japan. Tokyo offers a blend of modern skyscrapers and futuristic technology while also maintaining a comfortable temperature for most of the year. With an elevation of about 40 meters (131 feet) above sea level, Tokyo is a vibrant metropolis that is not too hot compared to other cities on the list. \n\nBooking code: 40',
 'source_documents': [Document(page_content='Tokyo, the capital of Japan, offers a blend of modern skyscrapers and historic temples. Tourists can visit attractions like the Tokyo Tower, Senso-ji Temple, and the bustling Shibuya crossing. With an elevation of about 40 meters (131 feet) above sea level, Tokyo is a vibrant metropolis.', metadata={'booking_code': 40, 'city_name': 'Tokyo'}),
  Document(page_content='London, the capital of England, offers a mix of history, culture, an