# HamiltonBot

HamiltonBot is a retrieval-augmented generation (RAG) app for the City of Hamilton. This notebook contains the core RAG code, without the `streamlit` UI components.

## Architecture Overview

- Semantic-Splitting (Document Loaders)
- ~~Multi-query (Query Transformation)~~
- Summary-based Multi-vector (Indexing)
- ~~Maximal Marginal Relevance~~
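
Maximal Marginal Relevance (MMR) is named in the list above and requested further down through `search_type="mmr"`. As a rough illustration of the general idea only (this is not Chroma's or LangChain's code), the sketch below greedily picks documents that are similar to the query but dissimilar to what has already been picked. The function names, the toy vectors, and the `lambda_mult` default are assumptions made for this example.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    k: int = 3,
    lambda_mult: float = 0.5,  # 1.0 = pure relevance, 0.0 = pure diversity
) -> List[int]:
    """Return the indices of up to k candidates chosen greedily by MMR score."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    while remaining and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Highest similarity to anything already selected (0.0 on the first pick).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)

    return selected


# Toy embeddings: documents 0 and 1 are near-duplicates, document 2 is different.
query_vec = [1.0, 0.0]
doc_vecs = [[0.9, 0.1], [0.9, 0.12], [0.7, -0.7]]

diverse = mmr(query_vec, doc_vecs, k=2)
relevance_only = mmr(query_vec, doc_vecs, k=2, lambda_mult=1.0)
print(diverse, relevance_only)
```

With the toy vectors, the balanced setting skips the near-duplicate in favour of the less similar document, while `lambda_mult=1.0` falls back to plain similarity ranking.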

## Multi-Query

This section generates additional queries to pass to the vector database. The user's query is sent to an LLM, which is asked to write three similar queries. These keep the semantic content of the original question but may use different keywords. Note that the generated queries surface as `INFO` log records from the `langchain.retrievers.multi_query` logger rather than being returned as strings.
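
The mechanism described above can be sketched without any LLM or database. This is a minimal illustration, not LangChain's implementation: `generate_variants` and `search` are hypothetical stand-ins for the LLM call and the vector-store lookup, and the toy index is invented for the example.

```python
from typing import Callable, List


def multi_query_retrieve(
    question: str,
    generate_variants: Callable[[str], List[str]],  # hypothetical stand-in for the LLM call
    search: Callable[[str], List[str]],  # hypothetical stand-in for the vector-store lookup
) -> List[str]:
    """Search with every generated variant and return the de-duplicated union."""
    unique_results: List[str] = []
    seen = set()
    for query in generate_variants(question):
        for doc in search(query):
            if doc not in seen:  # keep the first occurrence, preserve order
                seen.add(doc)
                unique_results.append(doc)
    return unique_results


# Toy stand-ins so the sketch runs on its own.
def fake_variants(question: str) -> List[str]:
    return [f"{question} (rephrasing {i})" for i in range(1, 4)]


TOY_INDEX = {
    1: ["chunk A", "chunk B"],
    2: ["chunk B", "chunk C"],
    3: ["chunk A", "chunk D"],
}


def fake_search(query: str) -> List[str]:
    # Look up results by the rephrasing number embedded in the toy query.
    number = int(query.rstrip(")").split()[-1])
    return TOY_INDEX[number]


results = multi_query_retrieve("What are superlinear returns?", fake_variants, fake_search)
print(results)
```

Overlapping hits from different rephrasings are returned once, which is why the real retriever's result in this notebook is named `unique_docs`.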

In [15]:
from langchain_openai import OpenAIEmbeddings
from langchain_openai import ChatOpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.document_loaders import WebBaseLoader
from langchain.retrievers.multi_query import MultiQueryRetriever


loader = WebBaseLoader("http://www.paulgraham.com/superlinear.html")

docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=0)
chunks = text_splitter.split_documents(docs)

# The API key is read from the OPENAI_API_KEY environment variable; never hard-code it.
embedding = OpenAIEmbeddings()
# from_documents is a classmethod, so persist_directory must be passed to it directly.
vectordb = Chroma.from_documents(
    documents=chunks,
    embedding=embedding,
    persist_directory="./chroma_db",
)

question = "What are the fundamental use cases of superlinear returns?"
# ChatOpenAI reads the key from the OPENAI_API_KEY environment variable.
llm = ChatOpenAI(temperature=0)

retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever=vectordb.as_retriever(search_type="mmr", search_kwargs={"k": 3}),
    llm=llm,
)

unique_docs = retriever_from_llm.get_relevant_documents(query=question)
unique_docs


INFO:langchain.retrievers.multi_query:Generated queries: ['1. Can you provide examples of the primary applications where superlinear returns are observed?', '2. In what scenarios do we typically see superlinear returns being utilized?', '3. Could you list some fundamental use cases that demonstrate the concept of superlinear returns?']


[Document(page_content="to another. But though the borders of these concepts are blurry,\nthey're not meaningless. I've tried to write about them as precisely\nas I could without crossing into error.[1]\nEvolution itself is probably the most pervasive example of\nsuperlinear returns for performance. But this is hard for us to\nempathize with because we're not the recipients; we're the returns.[2]\nKnowledge did of course have a practical effect before the\nIndustrial Revolution. The development of agriculture changed human\nlife completely. But this kind of change was the result of broad,\ngradual improvements in technique, not the discoveries of a few\nexceptionally learned people.[3]\nIt's not mathematically correct to describe a step function as\nsuperlinear, but a step function starting from zero works like a\nsuperlinear function when it describes the reward curve for effort\nby a rational actor. If it starts at zero then the part before the\nstep is below any linearly increasing 
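
The generated queries in the output above arrive as log records, not as a return value. One possible way to get them as plain strings is to attach a small handler to the logger named in that `INFO` line. This is a sketch under assumptions: `ListHandler` is a hypothetical helper written for this example, and the manual `logger.info` call stands in for the retriever call so the snippet runs on its own.

```python
import logging
from typing import List


class ListHandler(logging.Handler):
    """Logging handler that stores each formatted message in a list."""

    def __init__(self) -> None:
        super().__init__(level=logging.INFO)
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


# Logger name taken from the INFO line in the notebook output.
logger = logging.getLogger("langchain.retrievers.multi_query")
logger.setLevel(logging.INFO)

handler = ListHandler()
logger.addHandler(handler)

# In the notebook, the retriever call would go here. A record is emitted
# manually as a stand-in so that this sketch needs no API key.
logger.info("Generated queries: %s", ["query one", "query two"])

logger.removeHandler(handler)
captured = handler.messages
print(captured)
```

Each captured entry is the full log message, so the list of queries still has to be parsed out of the string if the individual queries are needed.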

In [17]:
import ipywidgets as widgets

widgets.FileUpload(
    accept='',  # Accepted file extension e.g. '.txt', '.pdf', 'image/*', 'image/*,.pdf'
    multiple=False  # True to accept multiple files upload else False
)

FileUpload(value=(), description='Upload')
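
Once a file has been selected, its content has to be pulled out of the widget's `value`. The sketch below assumes the ipywidgets 8 layout, in which `value` is a tuple of dict-like entries with `name` and `content` keys and `content` is a bytes-like object. The helper `read_uploaded_text` and the `fake_value` tuple are hypothetical names introduced for this example.

```python
from typing import Dict, Mapping, Sequence


def read_uploaded_text(uploads: Sequence[Mapping], encoding: str = "utf-8") -> Dict[str, str]:
    """Map each uploaded file name to its decoded text content."""
    texts: Dict[str, str] = {}
    for item in uploads:
        # `content` is assumed to be bytes-like (a memoryview in ipywidgets 8).
        texts[item["name"]] = bytes(item["content"]).decode(encoding)
    return texts


# Stand-in for the widget's `value`, so the sketch runs without a live widget.
fake_value = (
    {"name": "notes.txt", "type": "text/plain", "size": 5, "content": memoryview(b"hello")},
)

texts = read_uploaded_text(fake_value)
print(texts)
```

In the notebook, the widget would first need to be assigned to a variable so that its `value` can be passed to the helper after the upload completes.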