# abc Trip Planner (MVP)

Members
* Daniel Jiang (1009509)
* John Mo (1009513)
* Timothy Zheng (1009502)

## Notes

To run this program, create a `.env` file and add your OpenAI API key to it.

`.env`

```bash
OPENAI_API_KEY = "your_key_here"
```
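
A quick check before the first API call can catch a missing key early. The sketch below is not part of the original notebook: `missing_env_vars` is a hypothetical helper that reports which required variables are unset or blank, and it assumes `load_dotenv()` has already been called.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_vars(
    names: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the variable names that are absent or blank in the environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


# Hypothetical usage, after load_dotenv() has populated os.environ
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

The same helper could be reused for any other keys the notebook reads from `.env`.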

### TODO
* Set up OpenAI key
* Implement LLM
* RAG system
* (eval) compare to "ground truth"

### Innovative "ideas"
* Take note of the places the LLM suggests, then have the system search online for their prices and output an estimated cost for the trip.
* Approach the topic as a "narrative journey": add the history of places, culture, etc., and make the narrative the primary focus, with attractions serving as "references" within it.
* Attach a packing list
* Integrate a weather forecast into the trip plan

In [1]:
%pip uninstall pandas -y
%pip uninstall numpy -y
%pip uninstall pytz -y
%pip uninstall python-dateutil -y

Found existing installation: pandas 2.2.3
Uninstalling pandas-2.2.3:
  Successfully uninstalled pandas-2.2.3
Note: you may need to restart the kernel to use updated packages.
Found existing installation: numpy 1.26.4
Uninstalling numpy-1.26.4:
  Successfully uninstalled numpy-1.26.4
Note: you may need to restart the kernel to use updated packages.
Found existing installation: pytz 2024.2
Uninstalling pytz-2024.2:
  Successfully uninstalled pytz-2024.2
Note: you may need to restart the kernel to use updated packages.
Found existing installation: python-dateutil 2.9.0.post0
Uninstalling python-dateutil-2.9.0.post0:
  Successfully uninstalled python-dateutil-2.9.0.post0
Note: you may need to restart the kernel to use updated packages.


In [2]:
%pip install --upgrade pip --quiet
%pip install openai --quiet
%pip install python-dotenv --quiet
%pip install praw --quiet
%pip install pydantic --quiet
%pip install langchain-community --quiet
%pip install bs4 --quiet
%pip install pandas --quiet

Note: you may need to restart the kernel to use updated packages.


In [3]:
from openai import OpenAI
from dotenv import load_dotenv
from IPython.display import display, Markdown, clear_output
import time

load_dotenv() # Requires .env file

True

In [4]:
client = OpenAI()

system_prompt = """You are a knowledgeable travel assistant. Provide a comprehensive and well-structured guide for the specified destination. Your response should be detailed, accurate, and formatted using markdown. Follow these guidelines:

1. Begin with a brief introduction about the destination.
2. Use level 2 headers (##) to separate main sections.
3. Use bolding (**) for subsections or important points within sections.
4. Include exclusively the following sections in exactly this order.
   - Best Time to Visit
   - Top Attractions
   - Local Culture and Etiquette
   - Getting Around
   - Food and Dining
   - Accommodation
   - Safety and Health
   - Budget Tips
   - Unique Experiences
   - Essential Phrases (if applicable)

5. Provide practical, up-to-date information and insider tips, highlighting any date-specific events if possible.
6. Use bullet points sparingly, preferring well-structured paragraphs.
7. Include any recent developments or changes affecting travel to the destination.
8. Conclude with a brief summary or final travel tip.

Aim for a comprehensive guide that's easy to read and informative for travelers."""

In [5]:
destination = "Chiang Mai"
stream = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Provide key travel tips for {destination}"}
    ],
    stream=True
)

full_response = ""
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        content = chunk.choices[0].delta.content
        full_response += content
        
        # Clear the current output and display the updated markdown
        clear_output(wait=True)
        display(Markdown(full_response))
        
        # Add a small delay to make the streaming visible
        time.sleep(0.01)

## Introduction
Chiang Mai, a charming city in northern Thailand, is known for its rich cultural heritage, stunning temples, vibrant markets, and lush landscapes. Travelers flock to this city to experience its unique blend of history, nature, and modern amenities.

## Best Time to Visit
**The best time to visit Chiang Mai is during the cool season, from November to February, when the weather is pleasant and ideal for outdoor activities.**

## Top Attractions
- **Wat Phra That Doi Suthep:** A sacred temple located on a mountain with panoramic views of the city.
- **Old City:** Explore the ancient walls and moats, visit temples, and experience the local way of life.
- **Night Bazaar:** Shop for local handicrafts, souvenirs, and enjoy street food.
- **Elephant Nature Park:** Interact with rescued elephants ethically and learn about conservation efforts.
- **Doi Inthanon National Park:** Discover Thailand's highest peak, waterfalls, and hill tribe villages.

## Local Culture and Etiquette
- **Respect for the King and the Royal Family is paramount, and it's important to dress modestly when visiting temples.**
- **Remove shoes before entering homes or temples.**
- **Greet locals with a slight bow (wai) and a smile.**
- **Avoid public displays of affection.**

## Getting Around
- **Tuk-tuks and songthaews are popular modes of transportation within the city.**
- **Renting a scooter is a convenient way to explore Chiang Mai and its surrounding areas.**
- **Grab (ride-hailing app) and traditional taxis are also available for longer distances.**

## Food and Dining
- **Don't miss trying khao soi, a popular Northern Thai coconut curry noodle soup.**
- **Visit local markets like Warorot Market and Muang Mai Market for authentic Thai street food.**
- **Experience a traditional Khantoke dinner with cultural performances.**

## Accommodation
- **Chiang Mai offers a range of accommodation options from budget hostels to luxury resorts.**
- **Stay in the Old City for easy access to temples and markets, or Nimmanhaemin Road for a trendy vibe.**

## Safety and Health
- **Chiang Mai is relatively safe, but be cautious of pickpocketing in crowded areas.**
- **Drink bottled water and use mosquito repellent to prevent mosquito-borne illnesses.**
- **Check travel advisories for any health concerns or updates on safety issues.**

## Budget Tips
- **Street food and local markets offer affordable dining options.**
- **Take advantage of free temple visits and explore nature parks for budget-friendly activities.**
- **Consider staying in guesthouses or hostels for budget accommodation.**

## Unique Experiences
- **Attend the Yi Peng Lantern Festival in November for a magical sight of sky lanterns floating into the night sky.**
- **Learn Thai cooking with a local cooking class and visit organic farms.**
- **Explore the countryside by trekking through the hills and staying in remote villages.**

## Essential Phrases
- **Sawadee ka/kap (Hello)**
- **Kob khun ka/kap (Thank you)**
- **Nitt mo nah? (How much is this?)**

## Summary
Chiang Mai offers a perfect blend of cultural experiences, natural beauty, and modern amenities for travelers. By respecting local customs, trying regional cuisine, and exploring the city's unique attractions, visitors can immerse themselves in the charm of this northern Thai gem.

***

## RAG Implementation

## TODO: Build a system that dynamically finds Reddit threads (and TripAdvisor pages, if time permits) for a given location (refer to the diagram in the slides)
* Search the Google API for pages using queries such as "things to do in <location> site:reddit.com"
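
The query pattern above can be sketched as a small helper. This is only an illustration under assumptions: `build_search_queries` is a hypothetical name, the list of sites is an example, and the actual call to a search API is left out because the notebook has not chosen one yet.

```python
from typing import List
from urllib.parse import quote_plus


def build_search_queries(location: str, sites: List[str]) -> List[str]:
    """Build one site-restricted search query per site for a location."""
    location = location.strip()
    return [f"things to do in {location} site:{site}" for site in sites]


# Example: queries for Reddit and TripAdvisor, plus their URL-encoded form
queries = build_search_queries("Chiang Mai", ["reddit.com", "tripadvisor.com"])
for query in queries:
    print(query, "->", quote_plus(query))
```

The URLs returned by such a search would then replace the hard-coded test list of sources.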

In [6]:
# code here

### Testing
* Below is a test list of sources for Chiang Mai (still need to implement the system for fetching docs from Google)

In [7]:
sources = [
    'https://www.reddit.com/r/chiangmai/comments/1cb3fx4/yi_pengloi_krathong_2024/',
    'https://www.reddit.com/r/ThailandTourism/comments/1cpccho/yi_peng_festival_on_a_budget/',
    'https://www.reddit.com/r/ThailandTourism/comments/13b0gld/help_me_with_chiang_mai_itinerary/',
    'https://www.reddit.com/r/chiangmai/comments/1evxoyu/4_days_trip_in_chiang_mai_any_suggestions_for/',
    'https://www.reddit.com/r/chiangmai/comments/1er2mj4/things_to_do_in_chiang_mai/',
    'https://www.reddit.com/r/chiangmai/comments/1bmosut/chiang_mai_trip_report_places_to_eat_activities/',
    'https://www.reddit.com/r/chiangmai/comments/185wni7/what_to_do_in_chiang_mai/',
    
    # buggy tripadvisor
    'https://www.tripadvisor.com/ShowUserReviews-g293917-d8820428-r621075297-Yi_Peng_and_Loy_Krathong_Lantern_Festival-Chiang_Mai.html'
]

In [8]:
import praw
import requests
from bs4 import BeautifulSoup, SoupStrainer
from urllib.parse import urlparse
import pandas as pd
import time
from typing import List, Dict
import os




### Reddit API Key
Create a Reddit API key by following the instructions [here](https://www.reddit.com/prefs/apps) (create an app of type "script").

Add the following to your `.env` file:
```bash
REDDIT_CLIENT_ID = "<your-reddit-client-id>"
REDDIT_CLIENT_SECRET = "<your-reddit-client-secret>" 
REDDIT_USER_AGENT = "python:<app-name>:v1.0 (by /u/<your-reddit-username>)"
```

In [9]:
# Scrape reddit using praw

# using praw to better separate reddit data
reddit = praw.Reddit(
    client_id=os.environ.get("REDDIT_CLIENT_ID"),
    client_secret=os.environ.get("REDDIT_CLIENT_SECRET"),
    user_agent=os.environ.get("REDDIT_USER_AGENT"),
)

def scrape_reddit_post(url: str) -> Dict:
    """Scrape content from a Reddit post."""
    try:
        # Extract post ID from URL
        submission_id = url.split('comments/')[1].split('/')[0]
        submission = reddit.submission(id=submission_id)
        
        # Get post content
        post_data = {
            'title': submission.title,
            'text': submission.selftext,
            'comments': [],
            'url': url,
            'source': 'reddit'
        }
        
        # Get comments (limit to top-level comments)
        submission.comments.replace_more(limit=0)
        for comment in submission.comments:
            if not comment.stickied:  # Skip stickied comments
                post_data['comments'].append(comment.body)
        
        return post_data
    
    except Exception as e:
        print(f"Error scraping Reddit post {url}: {str(e)}")
        return None
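
The string split used to pull the post ID out of the URL raises an `IndexError` for any URL without a `comments/` segment, which the broad `except` then reports as a scraping error. A slightly more explicit alternative is sketched below; `extract_submission_id` is a hypothetical helper, and the pattern assumes Reddit post IDs are short alphanumeric strings, as in the test URLs.

```python
import re
from typing import Optional

# Matches the ID segment that follows "/comments/" in a Reddit post URL
_SUBMISSION_ID = re.compile(r"/comments/([a-z0-9]+)", re.IGNORECASE)


def extract_submission_id(url: str) -> Optional[str]:
    """Return the Reddit submission ID from a post URL, or None if absent."""
    match = _SUBMISSION_ID.search(url)
    return match.group(1) if match else None


print(extract_submission_id(
    "https://www.reddit.com/r/chiangmai/comments/1cb3fx4/yi_pengloi_krathong_2024/"
))
```

Returning `None` lets the caller skip non-post URLs deliberately instead of relying on an exception.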


In [10]:
from langchain_community.document_loaders import WebBaseLoader

# TODO: scraping TripAdvisor is too buggy; requests keep getting rate limited
def scrape_tripadvisor(url: str):
    loader = WebBaseLoader(
        web_paths=(url,),
        bs_kwargs=dict(
            # Only parse elements with these CSS classes
            parse_only=SoupStrainer(
                class_=("post-content", "post-title", "post-header")
            )
        ),
    )
    return loader.load()[0]

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [11]:
# Combined function
def scrape_urls(urls: List[str]) -> pd.DataFrame:
    """Scrape content from a list of URLs."""
    scraped_data = []
    
    for url in urls:
        print(f"Scraping {url}")
        domain = urlparse(url).netloc
        
        # Add delay between requests
        time.sleep(1)
        
        if 'reddit.com' in domain:
            data = scrape_reddit_post(url)
        else:
            data = scrape_tripadvisor(url)
            
        if data:
            scraped_data.append(data)
    
    # Convert to DataFrame
    df = pd.DataFrame(scraped_data)
    return df

# non-reddit links keep getting rate limited
# display(scrape_urls(['https://www.tripadvisor.com/ShowUserReviews-g293917-d8820428-r621075297-Yi_Peng_and_Loy_Krathong_Lantern_Festival-Chiang_Mai.html']))

In [12]:
# Scrape reddit only
reddit_results = scrape_urls([s for s in sources if 'reddit' in s])

# Display first few rows
display(reddit_results.head())

Scraping https://www.reddit.com/r/chiangmai/comments/1cb3fx4/yi_pengloi_krathong_2024/
Scraping https://www.reddit.com/r/ThailandTourism/comments/1cpccho/yi_peng_festival_on_a_budget/
Scraping https://www.reddit.com/r/ThailandTourism/comments/13b0gld/help_me_with_chiang_mai_itinerary/
Scraping https://www.reddit.com/r/chiangmai/comments/1evxoyu/4_days_trip_in_chiang_mai_any_suggestions_for/
Scraping https://www.reddit.com/r/chiangmai/comments/1er2mj4/things_to_do_in_chiang_mai/
Scraping https://www.reddit.com/r/chiangmai/comments/1bmosut/chiang_mai_trip_report_places_to_eat_activities/
Scraping https://www.reddit.com/r/chiangmai/comments/185wni7/what_to_do_in_chiang_mai/


Unnamed: 0,title,text,comments,url,source
0,Yi Peng/Loi Krathong 2024,Hey everyone!\n\nMy Girl and me want to visit ...,"[There is events you can pay to go, but I woul...",https://www.reddit.com/r/chiangmai/comments/1c...,reddit
1,Yi Peng Festival - on a Budget,"Hi, so I'm from the Philippines! \n\nMy friend...",[No matter where you stay in the Old City area...,https://www.reddit.com/r/ThailandTourism/comme...,reddit
2,Help me with Chiang Mai itinerary,This September me and my friends will be in Ch...,[You'll be completely fucked after a night bus...,https://www.reddit.com/r/ThailandTourism/comme...,reddit
3,4 days trip in Chiang Mai - Any suggestions fo...,Hello all! I'm planning on a solo trip to Chia...,[Sticky waterfall: a wonder that will mess wit...,https://www.reddit.com/r/chiangmai/comments/1e...,reddit
4,Things to do in Chiang Mai,Hey guys! I'm an Irish Tourist and am in Chian...,[Here’s a helpful [Chiang Mai Itinerary](https...,https://www.reddit.com/r/chiangmai/comments/1e...,reddit


### Encode Reddit data in single string for embedding

Target structure: Markdown

* Markdown is reported to be cheaper and more token-efficient than JSON when prompting GPT-4
  * https://arxiv.org/html/2411.10541v1
    * GPT-3.5-turbo prefers JSON, whereas GPT-4 favors Markdown
    * Encode in Markdown since we are using GPT-4
  * https://community.openai.com/t/markdown-is-15-more-token-efficient-than-json/841742
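
To get a rough feel for the size difference, the sketch below encodes the same thread as Markdown and as JSON and compares character counts. This is an illustration only: character count is a crude proxy for tokens rather than a tokenizer measurement, the two encoder functions are hypothetical, and the sample strings are shortened from the scraped Chiang Mai thread.

```python
import json
from typing import List


def thread_to_markdown(title: str, text: str, comments: List[str]) -> str:
    """Encode a thread as a small Markdown document."""
    lines = [f"# {title}", "", text, "", "### Comments", ""]
    lines += [f"- {comment}" for comment in comments]
    return "\n".join(lines)


def thread_to_json(title: str, text: str, comments: List[str]) -> str:
    """Encode the same thread as a JSON object."""
    return json.dumps({"title": title, "text": text, "comments": comments})


title = "Things to do in Chiang Mai"
text = "Does anyone have any recommendations for food or drinks?"
comments = ["Go to santitham and never leave"]

markdown_version = thread_to_markdown(title, text, comments)
json_version = thread_to_json(title, text, comments)
print(len(markdown_version), len(json_version))
```

For a real comparison, the character counts would be replaced by token counts from the tokenizer of the target model.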

In [13]:
first_post = reddit_results.iloc[0]
first_post

title                               Yi Peng/Loi Krathong 2024
text        Hey everyone!\n\nMy Girl and me want to visit ...
comments    [There is events you can pay to go, but I woul...
url         https://www.reddit.com/r/chiangmai/comments/1c...
source                                                 reddit
Name: 0, dtype: object

In [14]:
from dataclasses import dataclass
from typing import List
import textwrap

@dataclass
class RedditThread:
    title: str
    text: str
    comments: List[str]
    url: str
    source: str
    
    def _format_comments(self) -> List[str]:
        """Wrap each comment as a markdown list item."""
        formatted = []
        for comment in self.comments:
            # Treat leading spaces as nesting depth (two spaces per level)
            depth = len(comment) - len(comment.lstrip())
            indent = "  " * (depth // 2)
            # Strip the leading spaces, then wrap with a list marker
            formatted.append(textwrap.fill(
                comment.lstrip(),
                width=80,
                initial_indent=f"{indent}- ",
                subsequent_indent=f"{indent}  ",
            ))
        return formatted

    def __str__(self) -> str:
        # Full thread: title, original post, comments, then the source URL
        output = [
            f"# {self.title}",
            "",
            "## Original Post",
            "",
            textwrap.fill(self.text, width=80),
            "",
            "### Comments",
            "",
            *self._format_comments(),
            "",
            "#### URL",
            self.url,
        ]
        return "\n".join(output)

    def get_comments_as_str(self) -> str:
        # Comments section only, used for the vector store metadata
        return "\n".join(["### Comments", *self._format_comments()])

In [15]:
reddit_threads = [RedditThread(
    title=row['title'],
    text=row['text'],
    comments=row['comments'],
    url=row['url'],
    source=row['source']
) for index, row in reddit_results.iterrows()]

In [16]:
str(reddit_threads[0])

"# Yi Peng/Loi Krathong 2024\n\n## Original Post\n\nHey everyone!  My Girl and me want to visit Thailand in November. We saw that\naround the 15./16. of November there will be the Yi Peng and Loi Krathong\nfestivities in Chiang Mai. It is not easy to get an overview of what to expect\nthere. So I am grateful for any information!   So here are a few questions:  -\nDo we need to attend the paid festival by CAD? Or is this just a touristic cash\ngrab? - Where to get a lantern? - Are we allowed to let a lantern start from\nwithin the city?  - What is a “must see” at that time in Chiang mai? e.g. any\nspecial ceremonies? Parades? - Is there like a timetable what is happening when\nin the city?   Thank & Cheers!\n\n### Comments\n\n- There is events you can pay to go, but I would advise against it. It's very\n  simple festival, go to any bridge on the ping river, there is lanterns\n  everywhere to buy from street vendors, you can buy flowers to set on the water\n  etc. You dont have to stress

In [17]:
display(Markdown(str(reddit_threads[0])))

# Yi Peng/Loi Krathong 2024

## Original Post

Hey everyone!  My Girl and me want to visit Thailand in November. We saw that
around the 15./16. of November there will be the Yi Peng and Loi Krathong
festivities in Chiang Mai. It is not easy to get an overview of what to expect
there. So I am grateful for any information!   So here are a few questions:  -
Do we need to attend the paid festival by CAD? Or is this just a touristic cash
grab? - Where to get a lantern? - Are we allowed to let a lantern start from
within the city?  - What is a “must see” at that time in Chiang mai? e.g. any
special ceremonies? Parades? - Is there like a timetable what is happening when
in the city?   Thank & Cheers!

### Comments

- There is events you can pay to go, but I would advise against it. It's very
  simple festival, go to any bridge on the ping river, there is lanterns
  everywhere to buy from street vendors, you can buy flowers to set on the water
  etc. You dont have to stress to much about time, when the sun goes down around
  7 is when you can expect the lanterns to raise to the air. Go have a nice
  dinner near ping river (Riverside or The Dukes, and when dinner is done go to
  the river and enjoy the sights and buy some lanterns and some flower
  decorations to give to the water spirits ( which is from my understanding what
  the whole festival Loi Krathong is about)   I hired a photographer for 2 hours
  in 2017 and got some amazing shots with me and my family.
- Hey, I was in CM for Yi Peng/Loi Krathong last November, it was very good. No
  need to go to the paid festival, the city centre (particularly around the
  river area) is absolutely buzzing with people/festivities and decorated
  beautifully.  In terms of launching lanterns from the city - people seemed to
  be launching them mostly from Nawarat Bridge (literally hundreds, maybe
  thousands). I can't endorse this though as it's banned due to fire risk. It
  also can't be great for the environment / wildlife. Unfortunately there was a
  big fire last year that took hours to put out
  (https://www.nationthailand.com/thailand/general/40033278). I witnessed it
  from my balcony and it was a very sad sight. Apparently this has happed in
  previous years too. Fortunately I think nobody was hurt, but people probably
  lost homes, or at very least businesses.  Other than that, it was a really
  great festival. CM (and Thailand in general) has some of the strongest
  festival game I've ever seen. Each one truly magical and memorable. And yes,
  there was parade around the perimeter of the Old City on one of the last days,
  that was good too.  One thing I'd say is to try and book accommodation and
  flights (if that's how you'll be travelling) well in advance as a lot of
  people visit CM for this festival. I was there for the the whole of November
  and the influx was very noticeable. Afterwards, as my departure from CM
  happened to coincide with the end of Yi Peng/Loi Krathong, I ended up having
  to get a 10hr bus to BKK to just get a reasonably priced last(ish) minute
  flight to HCM as this was \~$300 from CM (usually $50-70 for this route). I
  imagine it would've been cheaper from CM if I'd booked more in advance.  Have
  a great time! I love CM so much, one of my favourite places in the world. If
  you have time, drive \~20mins from the centre (or get a Grab taxi for \~300
  bhat) to Royal Park Rajapruek. Afterwards, you can also drive/taxi 5-10min up
  to Wat Phra That Doi Kham, the temple overlooking the park + has a great view
  over CM (if you go up by Grab taxi you may have to get a songthaew back down
  though as I'm not sure if Grab collect from up there, or perhaps you could try
  and tipping the Grab driver that took you up to wait for you). This area is
  truly special and relatively unknown / unvisited by tourists as it's a bit out
  of town.
- The paid event is worth it if you dgaf about the cost and want something less
  crowded and organized including transportation to/from and everything planned
  out.  There’s a free event at doi saket lakes which is fun but insanely
  overcrowded. It’s absolutely crazy how many people are here for it. We’re
  already sold out of beds for this year.

#### URL
https://www.reddit.com/r/chiangmai/comments/1cb3fx4/yi_pengloi_krathong_2024/

### Add Reddit data to the vector store

In [18]:
%pip install --quiet sentence-transformers
%pip install --quiet langchain
%pip install --quiet langchain_openai

Note: you may need to restart the kernel to use updated packages.


In [19]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o"
)

In [20]:
reddit_thread_strings = [str(t) for t in reddit_threads] # not being used anymore

In [21]:
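# Import from langchain_community / langchain_core; the langchain.* paths are deprecated
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document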
import numpy as np
from typing import List, Optional
import re

def create_reddit_vectorstore(
    reddit_threads: List[RedditThread], 
    persist_directory: str = "./chroma_db",
) -> Chroma:
    # Initialize embeddings model
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L12-v2", # allows embedding the entire reddit thread, which is probably more useful
        model_kwargs={'device': 'mps'}
    )
    
    # Convert threads to Documents with metadata
    documents = []
    for i, thread in enumerate(reddit_threads):
        metadata = {
            "thread_id": i,
            "title": thread.title,
            "content": thread.text,
            "comments": thread.get_comments_as_str(),
            "url": thread.url,
            "type": "reddit_thread",
        }
        
        doc = Document(
            page_content=str(thread),
            metadata=metadata
        )
        documents.append(doc)
    
    vectorstore = Chroma.from_documents(
        documents=documents,
        embedding=embeddings,
        persist_directory=persist_directory
    )
    
    vectorstore.persist()
    return vectorstore

# Create vectorstore
vectorstore = create_reddit_vectorstore(reddit_threads)

retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={
        "k": 3,
        "filter": None,
    }
)



In [22]:
from langchain_core.prompts import ChatPromptTemplate

rag_system_prompt = (
    "You're a helpful AI assistant. Given a user question "
    "and some Reddit threads formatted in markdown, each consisting of a user question and comments, answer the user "
    "question using the replies in the comments based on the topic of the thread. If none of the threads answer the question, "
    "just say you don't know."
    "\n\nHere are the Reddit Threads in full: "
    "{context}"
)

rag_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", rag_system_prompt),
        ("human", "{input}"),
    ]
)

In [23]:
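def format_reddit_threads(threads):
    # Retrieved items are LangChain Documents; use page_content so the prompt
    # contains only the markdown thread, not the Document repr and metadata
    enumerated_reddit_threads = [
        f"Source ID: {i}\nReddit Thread: {getattr(thread, 'page_content', thread)}"
        for i, thread in enumerate(threads)
    ]

    return "\n\n" + "\n\n".join(enumerated_reddit_threads)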


In [24]:
from pydantic import BaseModel, Field
from langchain_core.runnables import RunnablePassthrough


class Citation(BaseModel):
    source_id: int = Field(
        ...,
        description="The integer ID of a SPECIFIC source which justifies the answer.",
    )
    quote: str = Field(
        ...,
        description="The VERBATIM quote from the specified source that justifies the answer.",
    )


class QuotedAnswer(BaseModel):
    """Answer the user question based only on the given sources, and cite the sources used."""

    answer: str = Field(
        ...,
        description="The answer to the user question, which is based only on the given sources.",
    )
    citations: List[Citation] = Field(
        ..., description="Citations from the given sources that justify the answer."
    )

rag_chain_from_docs = (
    RunnablePassthrough.assign(context=(lambda x: format_reddit_threads(x["context"])))
    | rag_prompt
    | llm.with_structured_output(QuotedAnswer)
)


retrieve_docs = (lambda x: x["input"]) | retriever

chain = RunnablePassthrough.assign(context=retrieve_docs).assign(
    answer=rag_chain_from_docs
)

In [25]:
result = chain.invoke({"input": f"What is a must-eat food in {destination}?"})

In [26]:
result["answer"]

QuotedAnswer(answer='In Chiang Mai, a must-eat food is Khao Soi, a northern Thai dish. Additionally, you should try Hung Lay curry and explore Lanna (northern) Thai food, which includes dishes such as nam prik ong, guang hung lay, and sai ua.', citations=[Citation(source_id=2, quote='Hung Lay curry.'), Citation(source_id=2, quote="do make sure you explore Lanna (northern) Thai food of which khao soi is only one dish and there's some great dishes to try, including nam prik ong, guang hung lay, sai ua and more.")])

In [27]:
result["answer"].answer

'In Chiang Mai, a must-eat food is Khao Soi, a northern Thai dish. Additionally, you should try Hung Lay curry and explore Lanna (northern) Thai food, which includes dishes such as nam prik ong, guang hung lay, and sai ua.'

In [30]:
import json
def format_citations_to_json(citations: List[Citation]) -> str:
    # Convert citations to list of dictionaries
    citations_dict = [citation.model_dump() for citation in citations]
    
    # Convert to JSON with proper formatting
    json_output = json.dumps(citations_dict, indent=2)
    return json_output

In [32]:
print(format_citations_to_json(result["answer"].citations))

[
  {
    "source_id": 2,
    "quote": "Hung Lay curry."
  },
  {
    "source_id": 2,
    "quote": "do make sure you explore Lanna (northern) Thai food of which khao soi is only one dish and there's some great dishes to try, including nam prik ong, guang hung lay, sai ua and more."
  }
]
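
Because the schema asks for verbatim quotes, it is possible to check each citation against the source it points to. The sketch below is one way to do that and is not part of the original pipeline: `find_unsupported_citations` is a hypothetical helper, it takes citations as dictionaries (the shape produced by `model_dump()`), and it assumes `source_id` indexes into the list of retrieved thread texts in the order they were passed to the prompt. Whitespace and case are normalized because the threads are line-wrapped.

```python
import re
from typing import Dict, List


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line wrapping does not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_unsupported_citations(
    citations: List[Dict], sources: List[str]
) -> List[Dict]:
    """Return citations whose quote is not found in the cited source text."""
    unsupported = []
    for citation in citations:
        source_id = citation["source_id"]
        in_range = 0 <= source_id < len(sources)
        if not in_range or _normalize(citation["quote"]) not in _normalize(sources[source_id]):
            unsupported.append(citation)
    return unsupported


# Toy example with made-up source texts
sources = ["first thread", "second thread", "Try the\n  Hung Lay curry. It is great."]
citations = [{"source_id": 2, "quote": "Hung Lay curry."}]
print(find_unsupported_citations(citations, sources))
```

An empty result means every quote was found in its cited source; anything returned is worth inspecting before it reaches the final answer.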


In [33]:
first_result = result['context'][0]
print(first_result.page_content)

print("Metadata")

import json

print(json.dumps(
    first_result.metadata,
    sort_keys=False,
    indent=4,
    separators=(',', ': ')
))

# Things to do in Chiang Mai

## Original Post

Hey guys! I'm an Irish Tourist and am in Chiang Mai for a few days Does anyone
have any recommendations for food or drinks?  I love a good bar, and I love good
food. Apart from that please feel free to mention any other cool stuff!
(Picture: Mae Salong, Chiang Rai)

### Comments

- Here’s a helpful [Chiang Mai Itinerary](https://travelhiatus.com/the-
  perfect-4-days-in-chiang-mai-itinerary/) guide
- https://www.tripadvisor.com/Tourism-g293917-Chiang_Mai-Vacations.html
- Go to santitham and never leave
- Skugga estate, the queen's botanical garden and Elephant Nature Park and
  sunday night market were our highlights.
- Well the UN Irish pub for a pint of Guinness and there's plenty on that road
  to keep you entertained.

#### URL
https://www.reddit.com/r/chiangmai/comments/1er2mj4/things_to_do_in_chiang_mai/
Metadata
{
    "comments": "### Comments\n- Here\u2019s a helpful [Chiang Mai Itinerary](https://travelhiatus.com/the-\n  perfect-

In [34]:
# Here's all the context
result['context']

[Document(metadata={'comments': "### Comments\n- Here’s a helpful [Chiang Mai Itinerary](https://travelhiatus.com/the-\n  perfect-4-days-in-chiang-mai-itinerary/) guide\n- https://www.tripadvisor.com/Tourism-g293917-Chiang_Mai-Vacations.html\n- Go to santitham and never leave\n- Skugga estate, the queen's botanical garden and Elephant Nature Park and\n  sunday night market were our highlights.\n- Well the UN Irish pub for a pint of Guinness and there's plenty on that road\n  to keep you entertained.", 'content': "Hey guys! I'm an Irish Tourist and am in Chiang Mai for a few days\nDoes anyone have any recommendations for food or drinks? \nI love a good bar, and I love good food.\nApart from that please feel free to mention any other cool stuff! \n(Picture: Mae Salong, Chiang Rai)\n", 'thread_id': 4, 'title': 'Things to do in Chiang Mai', 'type': 'reddit_thread', 'url': 'https://www.reddit.com/r/chiangmai/comments/1er2mj4/things_to_do_in_chiang_mai/'}, page_content="# Things to do in Chi

### Generate list of cited claims to add to the enriched RAG answer
* Small change in architecture: our program will output a list of cited claims, which is then passed into the LLM to generate the final answer.
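
The hand-off described above can be sketched as a function that folds the cited claims into the messages for the final generation call. This is a minimal sketch under assumptions: `build_final_messages` is a hypothetical name, and the instruction wording in the user message is illustrative rather than the prompt the project uses.

```python
from typing import Dict, List


def build_final_messages(
    destination: str, base_system_prompt: str, cited_claims: str
) -> List[Dict[str, str]]:
    """Combine the guide prompt with cited Reddit claims for the final LLM call."""
    user_content = (
        f"Provide key travel tips for {destination}.\n\n"
        "Use the following cited claims from Reddit where relevant:\n\n"
        f"{cited_claims}"
    )
    return [
        {"role": "system", "content": base_system_prompt},
        {"role": "user", "content": user_content},
    ]


# Toy example with a placeholder claim string
messages = build_final_messages(
    "Chiang Mai",
    "You are a knowledgeable travel assistant.",
    "Q: Must-try food?\nA: Khao soi.",
)
print(messages[1]["content"])
```

The returned list has the same shape as the `messages` argument already used with `client.chat.completions.create` earlier in the notebook.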

In [36]:
rag_queries = [
    # Core travel information
    f"What are the most underrated or secret spots in {destination} that tourists usually miss?",
    f"What's the current situation with tourists in {destination} as of 2024?",
    f"What are common tourist scams to avoid in {destination} right now?",
    
    # Local insights
    f"Locals of {destination}, what tips would you give to tourists in 2024?",
    f"What are the best neighborhoods to stay in {destination} for different budgets?",
    f"What's the best way to get around {destination} nowadays?",
    
    # Food and culture
    f"What are must-try local foods in {destination} and where to find them?",
    f"What cultural faux pas should tourists avoid in {destination}?",
    f"Which local festivals or events are worth planning a trip around in {destination}?",
    
    # Practical advice
    f"How much does everything cost in {destination} in 2024?",
    f"What's the best time to visit {destination} and why?",
    f"What should I absolutely pack for {destination} that most tourists forget?",
    
    # Safety and logistics
    f"How safe is {destination} for solo travelers in 2024?",
    f"What areas should tourists avoid in {destination}?",
    f"What's the current transportation situation in {destination}?"
]

In [37]:
from typing import Any, Dict, List, Tuple
import re

def generate_qa_pairs(destination: str, chain, queries: List[str]) -> List[Tuple[str, str, Any, Any]]:
    """Run each query through the RAG chain.

    Returns (query, answer, citations, retrieved_docs) tuples.
    `destination` is not used here because the queries already embed it.
    """
    qa_pairs = []

    for query in queries:
        # Run the RAG chain: retrieves documents and generates a cited answer
        result = chain.invoke({"input": query})

        if not result:
            print(f"Unable to get result for query {query}, skipping")
            continue

        # Unpack the structured answer, its citations, and the retrieved documents
        answer = result['answer'].answer
        citations = format_citations_to_json(result['answer'].citations)
        retrieved_docs = result['context']

        answer = f"Based on recent Reddit discussions:\n{answer}\nCitations:\n{citations}"

        # Also store the retrieved docs for benchmarking
        qa_pairs.append((query, answer, citations, retrieved_docs))

    return qa_pairs

### Generate Answer

In [39]:
qa_pairs = generate_qa_pairs(destination=destination, chain=chain, queries=rag_queries[:2])
# Only the query and the formatted answer go into the prompt
reddit_insights = "\n\n".join(
    f"Q: {q}\nA: {a}" for q, a, _citations, _docs in qa_pairs
)

In [41]:
print(reddit_insights)

Q: What are the most underrated or secret spots in Chiang Mai that tourists usually miss?
A: Based on recent Reddit discussions:
Some underrated or secret spots in Chiang Mai that tourists usually miss include:

1. **Sticky Waterfalls**: A unique waterfall experience where the algae-free rocks allow you to climb up the falls.
2. **Elely Cafe**: A little riverside cafe with elephants, overshadowed by about 20 dachshunds, offering a unique cafe experience.
3. **Monjam**: Spend a night in "bubble" "glamping" hotels with a beautiful view of the mountains.
4. **Wat Umong**: A favorite temple located south of the Suthep CMU campus, known for its tranquility.
5. **Doi Inthanon**: While it's known as Thailand’s highest mountain, the cool climate and the temple at the top make it a worthwhile visit.
6. **Mae Ngat Dam**: Offers beautiful places to stay and is less frequented by tourists.
7. **Tham Muang On**: A cave temple sometimes visited by monkeys, located in Chiang Dao.
Citations:
[
  {
   

In [42]:
rag_system_prompt = """You are a knowledgeable travel assistant. Provide a comprehensive and well-structured guide for the specified destination. Your response should be detailed, accurate, and formatted using markdown.


The following information has been verified from recent Reddit discussions and should be incorporated into your response. 
Each piece of information you use must be cited with a number in brackets (e.g., [1], [2]). Only include information that has citations:

{reddit_insights}

Make sure you follow these guidelines:

1. Begin with a brief introduction about the destination.
2. Use level 2 headers (##) to separate main sections.
3. Use bolding (**) for subsections or important points within sections.
4. Include exclusively the following sections in exactly this order.
   - Best Time to Visit
   - Top Attractions
   - Local Culture and Etiquette
   - Getting Around
   - Food and Dining
   - Accommodation
   - Safety and Health
   - Budget Tips
   - Unique Experiences
   - Essential Phrases (if applicable)

5. Provide practical, up-to-date information and insider tips, highlighting any date-specific events if possible.
6. Use bullet points sparingly, preferring well-structured paragraphs.
7. Include any recent developments or changes affecting travel to the destination.
8. Cite every factual claim using numbers in brackets (e.g., "The Sticky Waterfalls are known for their unique climbing experience[1]").
9. At the end of your response, include a "Citations" section with all referenced quotes.
10. Prioritize recent information and current trends.

Citations must be formatted as:
[1] "exact quote from source"
[2] "exact quote from source"

Aim for a comprehensive guide that's easy to read and informative for travelers."""

### Add Citations to Answer

In [43]:
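import time
from IPython.display import Markdown, clear_output, display

destination = "Chiang Mai"
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # Fill the {reddit_insights} placeholder; without .format() the model
        # only sees the literal placeholder and never the cited claims
        {"role": "system", "content": rag_system_prompt.format(reddit_insights=reddit_insights)},
        {"role": "user", "content": f"Provide key travel tips for {destination}"}
    ],
    stream=True
)

full_response = ""
for chunk in stream:
    # Skip chunks that carry no choices or no text delta
    if chunk.choices and chunk.choices[0].delta.content is not None:
        content = chunk.choices[0].delta.content
        full_response += content

        # Clear the current output and display the updated markdown
        clear_output(wait=True)
        display(Markdown(full_response))

        # Add a small delay to make the streaming visible
        time.sleep(0.01)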

# Chiang Mai Travel Guide

Chiang Mai, located in Northern Thailand, is a city renowned for its beautiful mountainous scenery, rich culture, and historical significance. Often referred to as the "Rose of the North," Chiang Mai offers a harmonious blend of historical heritage, vibrant festivals, and natural wonders, making it a must-visit destination for travelers.

## Best Time to Visit

**November to February** is generally considered the best time to visit Chiang Mai as the weather is cooler and more pleasant. This period falls right after the rainy season, bringing lush greenery to the region. The famous **Yi Peng Lantern Festival** also takes place in November, offering a spectacular sight as thousands of lanterns light up the night sky[1].

## Top Attractions

Chiang Mai is home to over 300 Buddhist temples, with some of the most iconic being:

- **Wat Phra That Doi Suthep**: This sacred temple is perched on Doi Suthep mountain, providing panoramic views of Chiang Mai city[2].
- **Wat Chedi Luang**: Known for its massive pagoda, this temple is a significant historical and spiritual site in the city[2].

The **Sticky Waterfalls**, also known as Bua Thong Waterfalls, are a unique natural attraction famous for their climbable limestone surface, making it a popular spot for visitors[3].

## Local Culture and Etiquette

Chiang Mai locals are generally polite and friendly. Understanding a few cultural etiquettes can enhance your experience:

- **Dress modestly** when visiting temples; shoulders and knees should be covered.
- **Remove shoes** before entering homes and temples.
- **Respect the monarchy**; speaking against the royal family is a serious offence in Thailand[4].

## Getting Around

Chiang Mai offers several modes of transportation:

- **Songthaews (Red Trucks)**: These are shared taxis and a cost-effective way to travel around the city.
- **Tuk-tuks**: Convenient for short distances but agree on a fare before starting the journey.
- **Renting a scooter**: A popular option for exploring the surrounding countryside; ensure you have a valid international driving permit[5].

## Food and Dining

Chiang Mai's cuisine is a highlight of any visit. Must-try dishes include:

- **Khao Soi**: A rich, coconut-based curry noodle soup that is emblematic of Northern Thai cuisine.
- **Sai Oua**: A spicy Northern Thai sausage, often enjoyed as a snack.
- For a memorable dining experience, visit the **Sunday Walking Street Market**, where local street food and crafts are abundant[6].

## Accommodation

Chiang Mai offers a wide range of accommodations:

- **Old City**: Ideal for first-time visitors seeking cultural sights and easy access to temples.
- **Nimmanhaemin (Nimman)**: Popular among young travelers and expats for its trendy cafes and nightlife.
- Budget travelers can find numerous **hostels and guesthouses** throughout the city, particularly in the Old City area[7].

## Safety and Health

Chiang Mai is generally safe, but it's wise to take basic precautions:

- Avoid poorly lit areas at night.
- Be cautious of pickpockets in crowded places.
- The air quality can deteriorate between February and April due to seasonal burning, so those with respiratory issues should take precautions[8].

## Budget Tips

- **Street food** is not only affordable but offers some of the best local flavors.
- Use **public transport** such as songthaews and local buses to save on travel costs.
- Many of Chiang Mai's temples are free; others request a small donation[9].

## Unique Experiences

- **Elephant Sanctuaries**: Opt to visit ethical sanctuaries that prioritize the well-being of the animals, such as the Elephant Nature Park[10].
- **Cooking Classes**: Learn to cook classic Thai dishes with fresh ingredients from a local market[11].
- **Hot Springs**: Visit the San Kamphaeng Hot Springs for a relaxing day in nature[12].

## Essential Phrases

While many locals speak English, learning a few Thai phrases can enrich your travel experience:

- **Sawasdee (krub/kha)**: Hello
- **Khob khun (krub/kha)**: Thank you
- **Hong nam yu tee nai?**: Where is the bathroom?

## Citations

[1] "November to February is generally considered the best time to visit Chiang Mai as the weather is cooler and more pleasant... Yi Peng Lantern Festival."  
[2] "Wat Phra That Doi Suthep... Wat Chedi Luang."  
[3] "The Sticky Waterfalls... unique natural attraction famous for their climbable limestone surface."  
[4] "Respect the monarchy; speaking against the royal family is a serious offence in Thailand."  
[5] "Popular option for exploring the surrounding countryside; ensure you have a valid international driving permit."  
[6] "Sunday Walking Street Market, where local street food and crafts are abundant."  
[7] "Budget travelers can find numerous hostels and guesthouses throughout the city, particularly in the Old City area."  
[8] "The air quality can deteriorate between February and April due to seasonal burning."  
[9] "Many of Chiang Mai's temples are free; others request a small donation."  
[10] "Opt to visit ethical sanctuaries that prioritize the well-being of the animals, such as the Elephant Nature Park."  
[11] "Cooking Classes; Learn to cook classic Thai dishes with fresh ingredients from a local market."  
[12] "Visit the San Kamphaeng Hot Springs for a relaxing day in nature."  
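
Because the prompt asks for an "exact quote from source" per citation, a generated guide can be checked mechanically before any model-based evaluation. The following is a minimal sketch under stated assumptions: the guide ends with a `## Citations` section in the `[n] "quote"` format requested by the prompt, and `source_text` stands for whatever text was given to the model (for example `reddit_insights`). The helper names are hypothetical. The exact-substring test is deliberately strict, so quotes that were shortened with ellipses or reworded will be flagged.

```python
import re
from typing import Dict, List, Tuple


def split_guide(markdown_text: str) -> Tuple[str, str]:
    """Split a guide into (body, citations section) at the '## Citations' header."""
    body, _, citations = markdown_text.partition("## Citations")
    return body, citations


def parse_citations(citations_text: str) -> Dict[int, str]:
    """Map each citation number to its quoted text, e.g. {1: 'exact quote'}."""
    pattern = re.compile(r'^\[(\d+)\]\s*"(.*)"\s*$', re.MULTILINE)
    return {int(num): quote for num, quote in pattern.findall(citations_text)}


def check_citations(markdown_text: str, source_text: str) -> Dict[str, List[int]]:
    """Report citation numbers that are inconsistent or unsupported."""
    body, citations_text = split_guide(markdown_text)
    cited_in_body = {int(n) for n in re.findall(r"\[(\d+)\]", body)}
    quotes = parse_citations(citations_text)
    listed = set(quotes)
    return {
        # Markers used in the body with no entry in the Citations section
        "missing_entry": sorted(cited_in_body - listed),
        # Entries in the Citations section that the body never references
        "unused_entry": sorted(listed - cited_in_body),
        # Quotes that do not appear verbatim in the source text
        "quote_not_in_source": sorted(
            n for n, quote in quotes.items() if quote not in source_text
        ),
    }


# Small illustrative guide and source text (not real model output)
guide = '''## Top Attractions
The Sticky Waterfalls are climbable[1]. Wat Umong is tranquil[2]. Songthaews are cheap[3].

## Citations
[1] "Sticky Waterfalls"
[2] "a quote that is not in the source"
'''
source = "Sticky Waterfalls: a unique waterfall experience. Wat Umong: a favorite temple."
report = check_citations(guide, source)
print(report)
```

A non-empty `quote_not_in_source` list suggests the model wrote its own "quotes" instead of copying them from the retrieved material.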

### Evaluate citations using NLI model (from RAG - tutorial slides)

In [None]:
# TODO: download a Hugging Face NLI model and score each citation against the claim it supports
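
One possible shape for this step, sketched under assumptions: the NLI model is treated as a plain callable that takes a premise (the cited quote) and a hypothesis (the claim in the guide) and returns a label. The notebook does not yet specify which model or label names will be used, so `NliClassifier`, `evaluate_citations`, and `toy_classifier` are hypothetical. The toy classifier is only a word-overlap stand-in so that the sketch runs without downloading anything; it is not an NLI model.

```python
from typing import Callable, Dict, List, Tuple

# (premise, hypothesis) -> label such as "entailment", "neutral", "contradiction"
NliClassifier = Callable[[str, str], str]


def evaluate_citations(
    pairs: List[Tuple[str, str]], classify: NliClassifier
) -> Dict[str, float]:
    """Return the fraction of (quote, claim) pairs that received each label."""
    counts: Dict[str, int] = {"entailment": 0, "neutral": 0, "contradiction": 0}
    for quote, claim in pairs:
        label = classify(quote, claim).lower()
        # Labels outside the three expected ones are counted under their own key
        counts[label] = counts.get(label, 0) + 1
    total = len(pairs) or 1  # avoid division by zero for an empty list
    return {label: count / total for label, count in counts.items()}


def toy_classifier(premise: str, hypothesis: str) -> str:
    """Stand-in for a real NLI model: a word-overlap heuristic for demonstration."""
    premise_words = set(premise.lower().split())
    hypothesis_words = set(hypothesis.lower().split())
    overlap = len(premise_words & hypothesis_words) / max(len(hypothesis_words), 1)
    return "entailment" if overlap >= 0.5 else "neutral"


# (quote, claim) pairs; the quote comes from the Reddit thread shown earlier
pairs = [
    ("Go to santitham and never leave", "go to santitham"),
    ("Go to santitham and never leave", "Chiang Mai has over 300 temples"),
]
scores = evaluate_citations(pairs, toy_classifier)
print(scores)
```

Passing the classifier in as a callable keeps the scoring loop independent of the model, so a real NLI model can replace `toy_classifier` later without changing `evaluate_citations`.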

### Present with-RAG vs. without-RAG relevance (the evaluation step)