# abc Trip Planner (MVP)

Members
* Daniel Jiang (1009509)
* John Mo (1009513)
* Timothy Zheng (1009502)

## Notes

To run this program, you will need to create a `.env` file and populate it with your OpenAI API key.

`.env`

```bash
OPENAI_API_KEY = "your_key_here"
```

### TODO
* Set up OpenAI key
* Implement LLM
* RAG system
* (eval) compare to "ground truth"

### Innovative "ideas"
* Note the places the LLM suggests, then have the system search online for their prices and output an estimated cost for the trip.
* Approach the trip as a "narrative journey": add the history and culture of each place, and make the narrative the primary focus, with attractions serving as "references" within it.
* Attach a packing list
* Incorporate the weather forecast into the trip plan

In [1]:
%pip uninstall pandas -y
%pip uninstall numpy -y
%pip uninstall pytz -y
%pip uninstall python-dateutil -y

Found existing installation: pandas 2.2.3
Uninstalling pandas-2.2.3:
  Successfully uninstalled pandas-2.2.3
Note: you may need to restart the kernel to use updated packages.
Found existing installation: numpy 1.26.4
Uninstalling numpy-1.26.4:
  Successfully uninstalled numpy-1.26.4
Note: you may need to restart the kernel to use updated packages.
Found existing installation: pytz 2024.2
Uninstalling pytz-2024.2:
  Successfully uninstalled pytz-2024.2
Note: you may need to restart the kernel to use updated packages.
Found existing installation: python-dateutil 2.9.0.post0
Uninstalling python-dateutil-2.9.0.post0:
  Successfully uninstalled python-dateutil-2.9.0.post0
Note: you may need to restart the kernel to use updated packages.


In [2]:
%pip install --upgrade pip --quiet
%pip install openai --quiet
%pip install python-dotenv --quiet
%pip install praw --quiet
%pip install pydantic --quiet
%pip install langchain-community --quiet
%pip install bs4 --quiet
%pip install pandas --quiet

Note: you may need to restart the kernel to use updated packages.


In [3]:
from openai import OpenAI
from dotenv import load_dotenv
from IPython.display import display, Markdown, clear_output
import time

load_dotenv() # Requires .env file

True

In [4]:
client = OpenAI()

system_prompt = """You are a knowledgeable travel assistant. Provide a comprehensive and well-structured guide for the specified destination. Your response should be detailed, accurate, and formatted using markdown. Follow these guidelines:

1. Begin with a brief introduction about the destination.
2. Use level 2 headers (##) to separate main sections.
3. Use bolding (**) for subsections or important points within sections.
4. Include exclusively the following sections in exactly this order.
   - Best Time to Visit
   - Top Attractions
   - Local Culture and Etiquette
   - Getting Around
   - Food and Dining
   - Accommodation
   - Safety and Health
   - Budget Tips
   - Unique Experiences
   - Essential Phrases (if applicable)

5. Provide practical, up-to-date information and insider tips, highlighting any date-specific events if possible.
6. Use bullet points sparingly, preferring well-structured paragraphs.
7. Include any recent developments or changes affecting travel to the destination.
8. Conclude with a brief summary or final travel tip.

Aim for a comprehensive guide that's easy to read and informative for travelers."""

In [5]:
destination = "Chiang Mai"
stream = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Provide key travel tips for {destination}"}
    ],
    stream=True
)

full_response = ""
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        content = chunk.choices[0].delta.content
        full_response += content
        
        # Clear the current output and display the updated markdown
        clear_output(wait=True)
        display(Markdown(full_response))
        
        # Add a small delay to make the streaming visible
        time.sleep(0.01)

# Travel Guide to Chiang Mai, Thailand

Chiang Mai, located in northern Thailand, is a culturally rich city known for its ancient temples, vibrant markets, and picturesque mountain landscapes. Travelers flock to Chiang Mai to experience its laid-back atmosphere, delicious cuisine, and numerous outdoor activities.

## Best Time to Visit

**The best time to visit Chiang Mai is between November and February** when the weather is cooler and drier, making it ideal for exploring the city and its surroundings. Avoid the hot season from March to May and the rainy season from June to October.

## Top Attractions

**1. Wat Phra That Doi Suthep:** A must-visit temple situated on a mountain with stunning views of the city below.
**2. Chiang Mai Old City:** Explore the historic center with its ancient walls, moats, and numerous temples like Wat Chedi Luang.
**3. Night Bazaar:** Shop for local handicrafts, souvenirs, and try delicious street food at this bustling market.
**4. Elephant Nature Park:** Interact with rescued elephants in an ethical and sustainable environment.
**5. Doi Inthanon National Park:** Discover Thailand’s highest peak, waterfalls, and hill tribe villages.

## Local Culture and Etiquette

**1. Dress modestly when visiting temples.**
**2. Remove your shoes before entering someone's home or a temple.**
**3. Greet locals with a respectful "wai" (placing your palms together at chest level).**
**4. Avoid public displays of affection.**
**5. Bargain politely at markets but remember to be respectful.**

## Getting Around

**1. Red Trucks (Songthaews):** These shared taxis are a common and affordable way to get around the city.
**2. Tuk-tuks:** Negotiate the price before getting in.
**3. Grab:** A convenient option for getting around, similar to Uber.
**4. Renting a scooter or bicycle:** Great for exploring areas outside the city center.

## Food and Dining

**1. Try Khao Soi:** A traditional Northern Thai coconut curry noodle soup.
**2. Visit a night market for local dishes like grilled meats, papaya salad, and mango sticky rice.**
**3. Attend a cooking class to learn how to make traditional Thai dishes.**

## Accommodation

**1. Old City:** Stay in this area for easy access to temples, markets, and restaurants.
**2. Nimmanhaemin Road:** Trendy area with boutique hotels, cafes, and shopping.
**3. Riverside:** Offers beautiful views and a quieter atmosphere.

## Safety and Health

**1. Stay hydrated, especially during the hot season.**
**2. Be cautious of traffic when crossing the streets.**
**3. Watch out for scams, especially around popular tourist areas.**
**4. Ensure your vaccinations are up to date before traveling.**

## Budget Tips

**1. Eat at local markets and street stalls for affordable meals.**
**2. Opt for accommodations slightly outside the city center for better rates.**
**3. Bargain when shopping at markets but do so respectfully.**

## Unique Experiences

**1. Monk Chat:** Engage in conversations with local monks to learn about Buddhism and Thai culture.
**2. Take a traditional Lanna cooking class to learn about Northern Thai cuisine.**
**3. Attend a Yi Peng lantern release ceremony during the Loy Krathong festival if you visit in November.**

## Final Travel Tip

While in Chiang Mai, don’t miss the opportunity to participate in a monk’s alms giving ceremony early in the morning. It provides a unique insight into the local culture and spirituality. Enjoy your trip to this enchanting city!
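
One rough way to check whether a response follows the section list in `system_prompt` is to compare its level 2 headers against the expected names. The sketch below is only an assumption about how such a check could look: `EXPECTED_SECTIONS`, `extract_h2_headers`, `missing_sections`, and `unexpected_sections` are hypothetical names that do not appear elsewhere in this notebook.

```python
import re
from typing import List, Sequence

# Section names copied from the system prompt, in the required order.
EXPECTED_SECTIONS = [
    "Best Time to Visit",
    "Top Attractions",
    "Local Culture and Etiquette",
    "Getting Around",
    "Food and Dining",
    "Accommodation",
    "Safety and Health",
    "Budget Tips",
    "Unique Experiences",
    "Essential Phrases",
]


def extract_h2_headers(markdown_text: str) -> List[str]:
    """Return the text of every level 2 (##) header, in document order."""
    pattern = re.compile(r"^##[ \t]+(.+)$", flags=re.MULTILINE)
    return [match.group(1).strip() for match in pattern.finditer(markdown_text)]


def missing_sections(
    markdown_text: str, expected: Sequence[str] = EXPECTED_SECTIONS
) -> List[str]:
    """Expected section names that never appear as a level 2 header."""
    found = set(extract_h2_headers(markdown_text))
    return [name for name in expected if name not in found]


def unexpected_sections(
    markdown_text: str, expected: Sequence[str] = EXPECTED_SECTIONS
) -> List[str]:
    """Level 2 headers in the response that are not in the expected list."""
    allowed = set(expected)
    return [name for name in extract_h2_headers(markdown_text) if name not in allowed]
```

Calling `missing_sections(full_response)` and `unexpected_sections(full_response)` after the streaming loop would give a quick signal of how closely the model followed the prompt's "exclusively the following sections" instruction.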

***

## RAG Implementation

### TODO
* Create a system that dynamically finds Reddit threads (and maybe TripAdvisor pages, if time permits) for a given location (refer to the diagram in the slides)
* Search the Google API with queries such as "things to do in <location> site:reddit.com"
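
As a starting point for the search step, the query strings themselves can be built without any network access. This is a minimal sketch under assumptions: `build_search_queries` is a hypothetical helper, and the call to an actual search API is deliberately left out.

```python
from typing import List, Sequence


def build_search_queries(
    location: str, sites: Sequence[str] = ("reddit.com",)
) -> List[str]:
    """Build one site-restricted query per site for the given location."""
    location = location.strip()
    if not location:
        raise ValueError("location must not be empty")
    # Mirrors the pattern in the TODO: "things to do in <location> site:<site>"
    return [f"things to do in {location} site:{site}" for site in sites]
```

For example, `build_search_queries("Chiang Mai", ["reddit.com", "tripadvisor.com"])` yields one query per site, which could then be passed to whichever search API the project settles on.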

In [6]:
# code here

### Testing
* Below is a test list of sources for Chiang Mai (still need to implement the system for fetching docs from Google)

In [7]:
sources = [
    'https://www.reddit.com/r/chiangmai/comments/1cb3fx4/yi_pengloi_krathong_2024/',
    'https://www.reddit.com/r/ThailandTourism/comments/1cpccho/yi_peng_festival_on_a_budget/',
    'https://www.reddit.com/r/ThailandTourism/comments/13b0gld/help_me_with_chiang_mai_itinerary/',
    'https://www.reddit.com/r/chiangmai/comments/1evxoyu/4_days_trip_in_chiang_mai_any_suggestions_for/',
    'https://www.reddit.com/r/chiangmai/comments/1er2mj4/things_to_do_in_chiang_mai/',
    'https://www.reddit.com/r/chiangmai/comments/1bmosut/chiang_mai_trip_report_places_to_eat_activities/',
    'https://www.reddit.com/r/chiangmai/comments/185wni7/what_to_do_in_chiang_mai/',
    
    # buggy tripadvisor
    'https://www.tripadvisor.com/ShowUserReviews-g293917-d8820428-r621075297-Yi_Peng_and_Loy_Krathong_Lantern_Festival-Chiang_Mai.html'
]

In [8]:
import praw
import requests
from bs4 import BeautifulSoup, SoupStrainer
from urllib.parse import urlparse
import pandas as pd
import time
from typing import List, Dict
import os




### Reddit API Key
Create a Reddit API key by following the instructions [here](https://www.reddit.com/prefs/apps) (create a "script" app).

Add the following to your `.env` file:
```bash
REDDIT_CLIENT_ID = "<your-reddit-client-id>"
REDDIT_CLIENT_SECRET = "<your-reddit-client-secret>" 
REDDIT_USER_AGENT = "python:<app-name>:v1.0 (by /u/<your-reddit-username>)"
```
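
Since the notebook depends on several keys from `.env`, it can help to fail early when one is missing. The sketch below assumes the four variable names used in this notebook; `REQUIRED_ENV_VARS` and `missing_env_vars` are hypothetical names introduced for illustration.

```python
import os
from typing import List, Mapping, Optional, Sequence

# Variable names taken from the .env snippets in this notebook.
REQUIRED_ENV_VARS = (
    "OPENAI_API_KEY",
    "REDDIT_CLIENT_ID",
    "REDDIT_CLIENT_SECRET",
    "REDDIT_USER_AGENT",
)


def missing_env_vars(
    names: Sequence[str] = REQUIRED_ENV_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]
```

Running `missing_env_vars()` right after `load_dotenv()` and raising an error if the list is non-empty would surface configuration problems before any API call is made.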

In [9]:
# Scrape Reddit using PRAW

# Use PRAW so the post title, body, and comments come back as separate fields
reddit = praw.Reddit(
    client_id=os.environ.get("REDDIT_CLIENT_ID"),
    client_secret=os.environ.get("REDDIT_CLIENT_SECRET"),
    user_agent=os.environ.get("REDDIT_USER_AGENT"),
)

def scrape_reddit_post(url: str) -> Dict:
    """Scrape content from a Reddit post."""
    try:
        # Extract post ID from URL
        submission_id = url.split('comments/')[1].split('/')[0]
        submission = reddit.submission(id=submission_id)
        
        # Get post content
        post_data = {
            'title': submission.title,
            'text': submission.selftext,
            'comments': [],
            'url': url,
            'source': 'reddit'
        }
        
        # Get comments (limit to top-level comments)
        submission.comments.replace_more(limit=0)
        for comment in submission.comments:
            if not comment.stickied:  # Skip stickied comments
                post_data['comments'].append(comment.body)
        
        return post_data
    
    except Exception as e:
        print(f"Error scraping Reddit post {url}: {str(e)}")
        return None
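
The ID extraction in `scrape_reddit_post` relies on chained `split` calls, which raise an `IndexError` for URLs without a `comments/` segment. A slightly more defensive variant is sketched below; `extract_submission_id` is a hypothetical helper, and the regular expression is an assumption about the URL shape seen in `sources`.

```python
import re
from typing import Optional

# Matches the ID segment in URLs shaped like .../comments/<id>/<slug>/
_SUBMISSION_ID = re.compile(r"/comments/([A-Za-z0-9]+)")


def extract_submission_id(url: str) -> Optional[str]:
    """Return the submission ID from a Reddit post URL, or None if absent."""
    match = _SUBMISSION_ID.search(url)
    return match.group(1) if match else None
```

Returning `None` lets the caller skip a malformed URL explicitly instead of depending on the broad `except` block. PRAW's `reddit.submission` also accepts a `url=` argument, which may make manual parsing unnecessary.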


In [10]:
from langchain_community.document_loaders import WebBaseLoader

# TODO: scraping TripAdvisor is too buggy; requests keep getting rate limited
def scrape_tripadvisor(url: str):
    loader = WebBaseLoader(
        web_paths=(url,),
        bs_kwargs=dict(
            parse_only=SoupStrainer(
                class_=("post-content", "post-title", "post-header")
            )
        ),
    )
    return loader.load()[0]

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [11]:
# Combined function
def scrape_urls(urls: List[str]) -> pd.DataFrame:
    """Scrape content from a list of URLs."""
    scraped_data = []
    
    for url in urls:
        print(f"Scraping {url}")
        domain = urlparse(url).netloc
        
        # Add delay between requests
        time.sleep(1)
        
        if 'reddit.com' in domain:
            data = scrape_reddit_post(url)
        else:
            data = scrape_tripadvisor(url)
            
        if data:
            scraped_data.append(data)
    
    # Convert to DataFrame
    df = pd.DataFrame(scraped_data)
    return df

# non-reddit links keep getting rate limited
# display(scrape_urls(['https://www.tripadvisor.com/ShowUserReviews-g293917-d8820428-r621075297-Yi_Peng_and_Loy_Krathong_Lantern_Festival-Chiang_Mai.html']))
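
The routing in `scrape_urls` uses a substring test (`'reddit.com' in domain`), which would also match an unrelated host such as `notreddit.com`. A stricter check is sketched below; `is_reddit_url` is a hypothetical helper, not part of the notebook.

```python
from urllib.parse import urlparse


def is_reddit_url(url: str) -> bool:
    """True only for reddit.com itself or one of its subdomains."""
    # hostname is None for strings that are not absolute URLs
    host = (urlparse(url).hostname or "").lower()
    return host == "reddit.com" or host.endswith(".reddit.com")
```

The same helper could replace the `'reddit' in s` filter used when selecting Reddit links from `sources`.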

In [12]:
# Scrape reddit only
reddit_results = scrape_urls([s for s in sources if 'reddit' in s])

# Display first few rows
display(reddit_results.head())

Scraping https://www.reddit.com/r/chiangmai/comments/1cb3fx4/yi_pengloi_krathong_2024/
Scraping https://www.reddit.com/r/ThailandTourism/comments/1cpccho/yi_peng_festival_on_a_budget/
Scraping https://www.reddit.com/r/ThailandTourism/comments/13b0gld/help_me_with_chiang_mai_itinerary/
Scraping https://www.reddit.com/r/chiangmai/comments/1evxoyu/4_days_trip_in_chiang_mai_any_suggestions_for/
Scraping https://www.reddit.com/r/chiangmai/comments/1er2mj4/things_to_do_in_chiang_mai/
Scraping https://www.reddit.com/r/chiangmai/comments/1bmosut/chiang_mai_trip_report_places_to_eat_activities/
Scraping https://www.reddit.com/r/chiangmai/comments/185wni7/what_to_do_in_chiang_mai/


| | title | text | comments | url | source |
|---|---|---|---|---|---|
| 0 | Yi Peng/Loi Krathong 2024 | Hey everyone!\n\nMy Girl and me want to visit ... | [There is events you can pay to go, but I woul... | https://www.reddit.com/r/chiangmai/comments/1c... | reddit |
| 1 | Yi Peng Festival - on a Budget | Hi, so I'm from the Philippines! \n\nMy friend... | [No matter where you stay in the Old City area... | https://www.reddit.com/r/ThailandTourism/comme... | reddit |
| 2 | Help me with Chiang Mai itinerary | This September me and my friends will be in Ch... | [You'll be completely fucked after a night bus... | https://www.reddit.com/r/ThailandTourism/comme... | reddit |
| 3 | 4 days trip in Chiang Mai - Any suggestions fo... | Hello all! I'm planning on a solo trip to Chia... | [Sticky waterfall: a wonder that will mess wit... | https://www.reddit.com/r/chiangmai/comments/1e... | reddit |
| 4 | Things to do in Chiang Mai | Hey guys! I'm an Irish Tourist and am in Chian... | [Here’s a helpful [Chiang Mai Itinerary](https... | https://www.reddit.com/r/chiangmai/comments/1e... | reddit |


### Encode Reddit data in single string for embedding

Target structure (Markdown):

* Markdown is the target format because it suits GPT-4-class models and is more token-efficient (so cheaper) than JSON
   * https://arxiv.org/html/2411.10541v1
      * GPT-3.5-turbo prefers JSON, whereas GPT-4 favors Markdown
      * Encode in Markdown since we are using GPT-4o
   * https://community.openai.com/t/markdown-is-15-more-token-efficient-than-json/841742
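
To get a feel for the size difference, the same record can be serialized both ways and compared. The sketch below is a rough illustration only: it compares character counts, which are a crude stand-in for token counts (a real comparison would use the model's tokenizer), and `SAMPLE_THREAD`, `to_json`, and `to_markdown` are hypothetical names with made-up sample content.

```python
import json
from typing import Dict, List, Union

Record = Dict[str, Union[str, List[str]]]

# Made-up sample record, shaped like one row of the scraped data.
SAMPLE_THREAD: Record = {
    "title": "Things to do in Chiang Mai",
    "text": "Any tips?",
    "comments": ["Visit Doi Suthep", "Try khao soi"],
}


def to_json(record: Record) -> str:
    """Serialize the record as compact single-line JSON."""
    return json.dumps(record, ensure_ascii=False)


def to_markdown(record: Record) -> str:
    """Serialize the record as a title header, body text, and bullet list."""
    lines = [f"# {record['title']}", "", str(record["text"]), ""]
    lines += [f"- {comment}" for comment in record["comments"]]
    return "\n".join(lines)


if __name__ == "__main__":
    print("JSON characters:    ", len(to_json(SAMPLE_THREAD)))
    print("Markdown characters:", len(to_markdown(SAMPLE_THREAD)))
```

The Markdown form drops the repeated keys, quotes, and brackets that JSON needs, which is where most of the savings in this example come from.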

In [19]:
first_post = reddit_results.iloc[0]
first_post

title                               Yi Peng/Loi Krathong 2024
text        Hey everyone!\n\nMy Girl and me want to visit ...
comments    [There is events you can pay to go, but I woul...
url         https://www.reddit.com/r/chiangmai/comments/1c...
source                                                 reddit
Name: 0, dtype: object

In [47]:
from dataclasses import dataclass
from typing import List
import textwrap

@dataclass
class RedditThread:
    title: str
    text: str
    comments: List[str]
    url: str
    source: str
    
    def __str__(self) -> str:
        # Format the main post
        output = [
            f"# {self.title}",
            "",
            f"## Original Post",
            "",
            textwrap.fill(self.text, width=80),
            "",
            "### Comments",
            ""
        ]
        
        # Format comments with indentation to show thread structure
        for comment in self.comments:
            # Calculate comment depth based on leading spaces
            depth = len(comment) - len(comment.lstrip())
            # Remove leading spaces and wrap text
            clean_comment = comment.lstrip()
            # Add markdown list marker with proper indentation
            indent = "  " * (depth // 2)  # Adjust indentation based on depth
            wrapped_comment = textwrap.fill(
                clean_comment,
                width=80,
                initial_indent=f"{indent}- ",
                subsequent_indent=f"{indent}  "
            )
            output.append(wrapped_comment)
        
        output += [
            "",
            "#### URL",
            self.url
        ]
        
        return "\n".join(output)

In [48]:
reddit_threads = [RedditThread(
    title=row['title'],
    text=row['text'],
    comments=row['comments'],
    url=row['url'],
    source=row['source']
) for index, row in reddit_results.iterrows()]

In [49]:
str(reddit_threads[0])

"# Yi Peng/Loi Krathong 2024\n\n## Original Post\n\nHey everyone!  My Girl and me want to visit Thailand in November. We saw that\naround the 15./16. of November there will be the Yi Peng and Loi Krathong\nfestivities in Chiang Mai. It is not easy to get an overview of what to expect\nthere. So I am grateful for any information!   So here are a few questions:  -\nDo we need to attend the paid festival by CAD? Or is this just a touristic cash\ngrab? - Where to get a lantern? - Are we allowed to let a lantern start from\nwithin the city?  - What is a “must see” at that time in Chiang mai? e.g. any\nspecial ceremonies? Parades? - Is there like a timetable what is happening when\nin the city?   Thank & Cheers!\n\n### Comments\n\n- There is events you can pay to go, but I would advise against it. It's very\n  simple festival, go to any bridge on the ping river, there is lanterns\n  everywhere to buy from street vendors, you can buy flowers to set on the water\n  etc. You dont have to stress

In [50]:
display(Markdown(str(reddit_threads[0])))

# Yi Peng/Loi Krathong 2024

## Original Post

Hey everyone!  My Girl and me want to visit Thailand in November. We saw that
around the 15./16. of November there will be the Yi Peng and Loi Krathong
festivities in Chiang Mai. It is not easy to get an overview of what to expect
there. So I am grateful for any information!   So here are a few questions:  -
Do we need to attend the paid festival by CAD? Or is this just a touristic cash
grab? - Where to get a lantern? - Are we allowed to let a lantern start from
within the city?  - What is a “must see” at that time in Chiang mai? e.g. any
special ceremonies? Parades? - Is there like a timetable what is happening when
in the city?   Thank & Cheers!

### Comments

- There is events you can pay to go, but I would advise against it. It's very
  simple festival, go to any bridge on the ping river, there is lanterns
  everywhere to buy from street vendors, you can buy flowers to set on the water
  etc. You dont have to stress to much about time, when the sun goes down around
  7 is when you can expect the lanterns to raise to the air. Go have a nice
  dinner near ping river (Riverside or The Dukes, and when dinner is done go to
  the river and enjoy the sights and buy some lanterns and some flower
  decorations to give to the water spirits ( which is from my understanding what
  the whole festival Loi Krathong is about)   I hired a photographer for 2 hours
  in 2017 and got some amazing shots with me and my family.
- Hey, I was in CM for Yi Peng/Loi Krathong last November, it was very good. No
  need to go to the paid festival, the city centre (particularly around the
  river area) is absolutely buzzing with people/festivities and decorated
  beautifully.  In terms of launching lanterns from the city - people seemed to
  be launching them mostly from Nawarat Bridge (literally hundreds, maybe
  thousands). I can't endorse this though as it's banned due to fire risk. It
  also can't be great for the environment / wildlife. Unfortunately there was a
  big fire last year that took hours to put out
  (https://www.nationthailand.com/thailand/general/40033278). I witnessed it
  from my balcony and it was a very sad sight. Apparently this has happed in
  previous years too. Fortunately I think nobody was hurt, but people probably
  lost homes, or at very least businesses.  Other than that, it was a really
  great festival. CM (and Thailand in general) has some of the strongest
  festival game I've ever seen. Each one truly magical and memorable. And yes,
  there was parade around the perimeter of the Old City on one of the last days,
  that was good too.  One thing I'd say is to try and book accommodation and
  flights (if that's how you'll be travelling) well in advance as a lot of
  people visit CM for this festival. I was there for the the whole of November
  and the influx was very noticeable. Afterwards, as my departure from CM
  happened to coincide with the end of Yi Peng/Loi Krathong, I ended up having
  to get a 10hr bus to BKK to just get a reasonably priced last(ish) minute
  flight to HCM as this was \~$300 from CM (usually $50-70 for this route). I
  imagine it would've been cheaper from CM if I'd booked more in advance.  Have
  a great time! I love CM so much, one of my favourite places in the world. If
  you have time, drive \~20mins from the centre (or get a Grab taxi for \~300
  bhat) to Royal Park Rajapruek. Afterwards, you can also drive/taxi 5-10min up
  to Wat Phra That Doi Kham, the temple overlooking the park + has a great view
  over CM (if you go up by Grab taxi you may have to get a songthaew back down
  though as I'm not sure if Grab collect from up there, or perhaps you could try
  and tipping the Grab driver that took you up to wait for you). This area is
  truly special and relatively unknown / unvisited by tourists as it's a bit out
  of town.
- The paid event is worth it if you dgaf about the cost and want something less
  crowded and organized including transportation to/from and everything planned
  out.  There’s a free event at doi saket lakes which is fun but insanely
  overcrowded. It’s absolutely crazy how many people are here for it. We’re
  already sold out of beds for this year.

#### URL
https://www.reddit.com/r/chiangmai/comments/1cb3fx4/yi_pengloi_krathong_2024/

### Add Reddit data to the vector store

In [58]:
%pip install --quiet chromadb
%pip install --quiet sentence-transformers
%pip install --quiet langchain
%pip install --quiet langchain_openai

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Note: you may need to restart the kernel to use updated packages.




In [59]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o"
)

In [None]:
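from langchain.text_splitter import MarkdownHeaderTextSplitter, RecursiveCharacterTextSplitter
# The embeddings and vector store integrations live in langchain_community (installed above)
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma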
import chromadb

reddit_thread_strings = [str(t) for t in reddit_threads]

def create_reddit_vectorstore(reddit_threads: list[str], persist_directory: str = "./chroma_db"):
    # Initialize the markdown header splitter
    headers_to_split_on = [
        ("#", "Title"),
        ("##", "Original Post"),
        ("###", "Comments"),
        ("####", "URL")
    ]
    markdown_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
    
    # Initialize recursive splitter for further splitting
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=512,
        chunk_overlap=50,
        separators=["\n\n", "\n", " ", ""],
        length_function=len
    )
    
    # Initialize embeddings model
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2",  # small, general-purpose sentence-embedding model
        model_kwargs={'device': 'mps'}  # 'mps' is Apple Silicon only; use 'cpu' or 'cuda' elsewhere
    )
    
    # Process all documents
    all_splits = []
    for thread in reddit_threads:
        # Split by markdown headers first
        md_splits = markdown_splitter.split_text(thread)
        
        # Further split the content
        for md_split in md_splits:
            splits = text_splitter.split_text(md_split.page_content)
            all_splits.extend(splits)
    
    # Create and persist vectorstore
    vectorstore = Chroma.from_texts(
        texts=all_splits,
        embedding=embeddings,
        persist_directory=persist_directory
    )
    vectorstore.persist()
    
    return vectorstore

vectorstore = create_reddit_vectorstore(reddit_thread_strings)

retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.5}
)
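
To make the `chunk_size=512` and `chunk_overlap=50` settings concrete, here is a simplified fixed-window chunker. It is not what `RecursiveCharacterTextSplitter` does internally (that splitter prefers to break on the listed separators); `sliding_chunks` is a hypothetical function that only illustrates how size and overlap interact.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

With these defaults each window advances by 462 characters, so the last 50 characters of one chunk reappear at the start of the next. That overlap reduces the chance that a sentence relevant to a query is split across two chunks with neither half retrievable on its own.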

In [None]:
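def format_reddit_threads(threads):
    # The retriever returns Document objects, so use their text rather than their repr;
    # plain strings are passed through unchanged
    enumerated_reddit_threads = [
        f"Source ID: {i}\nReddit Thread: {getattr(thread, 'page_content', thread)}"
        for i, thread in enumerate(threads)
    ]

    return "\n\n" + "\n\n".join(enumerated_reddit_threads)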


In [55]:
from langchain_core.prompts import ChatPromptTemplate

rag_system_prompt = (
    "You're a helpful AI assistant. Given a user question "
    "and some Reddit thread snippets including the original post and top comments, answer the user "
    "question. If none of the threads answer the question, "
    "just say you don't know."
    "\n\nHere are the Reddit Threads: "
    "{context}"
)

rag_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", rag_system_prompt),
        ("human", "{input}"),
    ]
)

In [None]:
from pydantic import BaseModel, Field
from langchain_core.runnables import RunnablePassthrough


class Citation(BaseModel):
    source_id: int = Field(
        ...,
        description="The integer ID of a SPECIFIC source which justifies the answer.",
    )
    quote: str = Field(
        ...,
        description="The VERBATIM quote from the specified source that justifies the answer.",
    )


class QuotedAnswer(BaseModel):
    """Answer the user question based only on the given sources, and cite the sources used."""

    answer: str = Field(
        ...,
        description="The answer to the user question, which is based only on the given sources.",
    )
    citations: List[Citation] = Field(
        ..., description="Citations from the given sources that justify the answer."
    )

rag_chain_from_docs = (
    RunnablePassthrough.assign(context=(lambda x: format_reddit_threads(x["context"])))
    | rag_prompt
    | llm.with_structured_output(QuotedAnswer)
)


retrieve_docs = (lambda x: x["input"]) | retriever

chain = RunnablePassthrough.assign(context=retrieve_docs).assign(
    answer=rag_chain_from_docs
)

In [None]:
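# Off-topic query (the threads are about Chiang Mai), so the prompt's fallback of saying it doesn't know should apply
result = chain.invoke({"input": "How fast are cheetahs?"})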

In [None]:
result["answer"]

In [None]:
result['context'][0]

### Vectorstore retrieval based on relevance
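
The retriever above is configured with `search_type="similarity_score_threshold"` and a `score_threshold` of 0.5. The sketch below illustrates the general idea with plain cosine similarity over toy vectors. It is an assumption-laden illustration: the scores the vector store actually reports are derived from its own distance metric and may not equal cosine similarity, and `cosine_similarity` and `filter_by_threshold` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_threshold(
    query_vec: Sequence[float],
    docs: Sequence[Tuple[str, Sequence[float]]],
    threshold: float = 0.5,
) -> List[Tuple[float, str]]:
    """Keep (score, text) pairs scoring at or above threshold, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in docs]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)
```

A threshold filter can return an empty list when nothing is similar enough, in which case the prompt's instruction to say "you don't know" is what the chain falls back on.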

### Generate Answer

In [14]:
# code here

### Add Citations to Answer

In [15]:
## TODO: Create citation
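
Because the `Citation` model asks for a VERBATIM quote, one cheap sanity check is to confirm that each quote really occurs in the source it points to. The sketch below is a possible approach, not the notebook's implementation: `SimpleCitation` is a hypothetical stand-in for `Citation` (a plain dataclass, so the example needs no third-party packages), and whitespace is normalized because the thread text was wrapped with `textwrap.fill`.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SimpleCitation:
    """Stand-in for the Citation model: a source index plus a quoted span."""
    source_id: int
    quote: str


def _normalize(text: str) -> str:
    """Collapse all runs of whitespace (including newlines) to single spaces."""
    return " ".join(text.split())


def unsupported_citations(
    citations: Sequence[SimpleCitation], sources: Sequence[str]
) -> List[SimpleCitation]:
    """Return citations whose quote is empty, out of range, or not found verbatim."""
    unsupported = []
    for citation in citations:
        quote = _normalize(citation.quote)
        in_range = 0 <= citation.source_id < len(sources)
        if not quote or not in_range or quote not in _normalize(sources[citation.source_id]):
            unsupported.append(citation)
    return unsupported
```

An exact substring match is strict: it flags paraphrased quotes as unsupported, which is where an NLI-based check could be more forgiving.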

### Evaluate citations using an NLI model (from the RAG tutorial slides)

In [16]:
# download huggingface model and do stuff

### Present with RAG / without RAG relevance (the evaluation step)