# HamiltonBot

HamiltonBot is a retrieval-augmented generation (RAG) app for the City of Hamilton. This notebook contains the core RAG code, free of `streamlit` UI components.

## Architecture Diagram

![diagram](architecture_diagram.png)

This notebook is intended for three audiences: myself in the future, employers who just want to see the RAG code, and future interns who want only the core functionality to improve upon. I chose the following packages for this project:
- `langchain`: I like the simplicity and elegance the abstractions provide.
- `langchain_openai`: At the time of writing (January 2024), I consider OpenAI's models the best available.
- `chromadb`: Chroma gives me a local database for storing the embeddings, which simplifies security and drives down cost.
- `unstructured`: The RFPs are fairly complex documents with tables and images, so I need a way to parse them into HTML and base64 formats to feed into the LLMs. (TODO: check the data-tenancy issues. They say that they don't store documents; if that is not allowed, try the hosted SaaS API.)

In [None]:
from langchain_openai import ChatOpenAI
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_experimental.text_splitter import SemanticChunker

## Query Transformation

A common first step in a RAG pipeline is to transform the query so that the retrieved documents are more relevant and of higher quality. There are a few techniques for this:
1. Rewrite-Retrieve-Read: Tell an LLM to improve the query by rephrasing it.
2. Multi-Query: Have an LLM generate 2-3 queries that ask the same thing in different ways.
3. Step-back Prompting: Have the LLM ask some "more basic" questions, such as which principles underlie the question.
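4. RAG-Fusion: Generate several queries as in Multi-Query, then combine the ranked results of all queries with reciprocal rank fusion.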

I have currently selected Multi-Query as the query transformation. I believe it balances cost against performance: Rewrite-Retrieve-Read would be too simple, and Step-back Prompting would be too expensive and slow.
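
To make the Multi-Query idea concrete, here is a minimal sketch in plain Python. It is not how `MultiQueryRetriever` is implemented; `toy_variants` and `toy_retrieve` are hypothetical stand-ins for the LLM call and the vectorstore lookup, and including the original question in the query list is an assumption of this sketch.

```python
from typing import Callable, List


def multi_query_retrieve(
    question: str,
    generate_variants: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    """Retrieve for the question and each rephrasing, keeping unique hits in first-seen order."""
    queries = [question] + generate_variants(question)
    seen = set()
    unique_docs = []
    for query in queries:
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs


def toy_variants(question: str) -> List[str]:
    # Hypothetical stand-in for the LLM that rephrases the question.
    return ["Which examples of superlinear returns exist?"]


def toy_retrieve(query: str) -> List[str]:
    # Hypothetical stand-in for a vectorstore: keyword -> document ids.
    hits = {
        "use cases": ["doc_a", "doc_b"],
        "examples": ["doc_b", "doc_c"],
    }
    return [doc for key, docs in hits.items() if key in query for doc in docs]


result = multi_query_retrieve(
    "What are the use cases of superlinear returns?", toy_variants, toy_retrieve
)
print(result)
```

The rephrased query surfaces `doc_c`, which the original wording alone would have missed, while `doc_b` is returned only once.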

>ℹ️ The generated questions used to query the vectorstore are not returned as regular strings. They are only written to the log, so some extra code is required if you want them as strings.
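
One way to get at the logged questions is to attach a small handler to the retriever's logger. This is a sketch under two assumptions: that the logger is named `langchain.retrievers.multi_query` (the module path of `MultiQueryRetriever`), and that the queries are logged at `INFO` level. `QueryCapture` is a hypothetical helper, not part of LangChain.

```python
import logging


class QueryCapture(logging.Handler):
    """Collects formatted log messages so they can be used as plain strings."""

    def __init__(self) -> None:
        super().__init__(level=logging.INFO)
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


# Assumed logger name; adjust it if your LangChain version logs elsewhere.
mq_logger = logging.getLogger("langchain.retrievers.multi_query")
mq_logger.setLevel(logging.INFO)

capture = QueryCapture()
mq_logger.addHandler(capture)
```

After running the retriever, `capture.messages` should hold whatever that logger emitted, which can then be parsed into individual questions.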

[Article on Query Transformations on the Langchain Blog](https://blog.langchain.dev/query-transformations/).

## Document Loaders

Since the eventual use case for this system is Q&A over large (200+ page) documents, it's important to split them into more manageable chunks. I would like to experiment with Semantic Splitting by swapping in a different prebuilt splitter. Semantic Splitting works through the document text three consecutive sentences at a time. If the embeddings of two adjacent groups of sentences are similar, it merges the groups into a single chunk. This way, sentences with similar semantic content end up in the same chunk.

[![Video explaining semantic splitting](https://img.youtube.com/vi/8OJC21T2SL4/0.jpg)](https://www.youtube.com/watch?v=8OJC21T2SL4)
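
The grouping idea described above can be sketched in a few lines of plain Python. This follows the description literally (non-overlapping groups of three sentences, merged when adjacent groups are similar) and is not a reimplementation of `SemanticChunker`, whose internals may differ. `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and the `0.5` threshold is an arbitrary assumption.

```python
import math
import re
from collections import Counter
from typing import List


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_chunks(
    sentences: List[str], group_size: int = 3, threshold: float = 0.5
) -> List[str]:
    """Merge adjacent sentence groups whose embeddings are similar enough."""
    groups = [
        " ".join(sentences[i : i + group_size])
        for i in range(0, len(sentences), group_size)
    ]
    if not groups:
        return []
    chunks = [groups[0]]
    for prev, curr in zip(groups, groups[1:]):
        if cosine_similarity(toy_embed(prev), toy_embed(curr)) >= threshold:
            chunks[-1] = chunks[-1] + " " + curr  # same topic: extend the chunk
        else:
            chunks.append(curr)  # topic shift: start a new chunk
    return chunks


sentences = [
    "Cats purr when happy.", "Cats sleep all day.", "Cats chase mice.",
    "Cats groom their fur.", "Cats climb trees.", "Cats purr loudly.",
    "Bonds pay fixed interest.", "Bonds mature after years.", "Bonds carry credit risk.",
]
chunks = semantic_chunks(sentences)
for chunk in chunks:
    print(chunk)
```

With this toy data the two groups about cats merge into one chunk, and the group about bonds becomes its own chunk.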

## Retrieval Method

When getting documents from the vectorstore that relate to a certain query, there are a couple of options for selecting them.
1. The naive approach is to return the $k$ documents whose embeddings are most similar to the query. This is fine, but for more complex documents (like RFPs), it can be helpful to maximize the diversity of the retrieved documents.
2. Maximal Marginal Relevance (MMR): This works by finding the embeddings with the greatest cosine similarity to the query while also penalizing them for similarity to already selected documents.

Given the nature of RFPs, I will use MMR. It is simply an option you can pass as `search_type` to the `.as_retriever()` method.
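
For intuition, here is a small, self-contained sketch of greedy MMR selection on toy 2-D vectors. It is an illustration, not Chroma's or LangChain's implementation. The weighting parameter is called `lambda_mult` here and defaults to `0.5`; both the name and the default are assumptions of this sketch.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def mmr(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int = 3,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Greedily pick k candidate indices, trading relevance against redundancy."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Highest similarity to anything already picked (0 if nothing picked yet).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.5]
candidates = [
    [1.0, 0.4],  # 0: most relevant
    [1.0, 0.3],  # 1: near-duplicate of 0
    [0.2, 1.0],  # 2: less relevant, but different
]

top_k = sorted(range(len(candidates)), key=lambda i: -cosine(query, candidates[i]))[:2]
mmr_pick = mmr(query, candidates, k=2)
print(top_k, mmr_pick)
```

Plain similarity search returns the two near-duplicates (indices 0 and 1), while MMR swaps the second one for the more diverse candidate (index 2).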

[Langchain Docs for Selecting Documents with MMR](https://python.langchain.com/docs/modules/model_io/prompts/example_selector_types/mmr)

In [None]:


# Load a sample web page to stand in for an RFP document.
loader = WebBaseLoader("http://www.paulgraham.com/superlinear.html")
docs = loader.load()

# Both OpenAI clients read the key from the OPENAI_API_KEY environment
# variable, so no secret is hard-coded in the notebook.
embedding = OpenAIEmbeddings()

# Split the page into semantically coherent chunks.
text_splitter = SemanticChunker(embedding)
chunks = text_splitter.split_documents(docs)

# `from_documents` is a classmethod, so the persist directory is passed to it directly.
vectordb = Chroma.from_documents(
    documents=chunks,
    embedding=embedding,
    persist_directory="./chroma_db",
)

question = "What are the fundamental use cases of superlinear returns?"

# Multi-Query on top of an MMR retriever that returns 3 documents per query.
retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever=vectordb.as_retriever(search_type="mmr", search_kwargs={"k": 3}),
    llm=ChatOpenAI(temperature=0),
)

unique_docs = retriever_from_llm.get_relevant_documents(query=question)

unique_docs


Number of requested results 20 is greater than number of elements in index 19, updating n_results = 19
Number of requested results 20 is greater than number of elements in index 19, updating n_results = 19
Number of requested results 20 is greater than number of elements in index 19, updating n_results = 19


[Document(page_content="to another. But though the borders of these concepts are blurry,\nthey're not meaningless. I've tried to write about them as precisely\nas I could without crossing into error.[1]\nEvolution itself is probably the most pervasive example of\nsuperlinear returns for performance. But this is hard for us to\nempathize with because we're not the recipients; we're the returns.[2]\nKnowledge did of course have a practical effect before the\nIndustrial Revolution. The development of agriculture changed human\nlife completely. But this kind of change was the result of broad,\ngradual improvements in technique, not the discoveries of a few\nexceptionally learned people.[3]\nIt's not mathematically correct to describe a step function as\nsuperlinear, but a step function starting from zero works like a\nsuperlinear function when it describes the reward curve for effort\nby a rational actor. If it starts at zero then the part before the\nstep is below any linearly increasing 

In [4]:
import ipywidgets as widgets

widgets.FileUpload(
    accept='',  # Accepted file extension e.g. '.txt', '.pdf', 'image/*', 'image/*,.pdf'
    multiple=False  # True to accept multiple files upload else False
)

FileUpload(value=(), description='Upload')