## Features

This is the prototype version of the RAG chatbot, due for the Week 3 progress meeting (11/02/25). It currently covers:

- Embedding documents and storing them in a vector store (the code below uses a local FAISS index, not Pinecone)
- Early RAG (only one document is searched at the moment)
- LLM prompting

## Code

In [1]:
from PyPDF2 import PdfReader
# Current package locations for these classes; the older langchain.* paths are deprecated
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.vectorstores import FAISS

In [2]:
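import os
# Fail early if the key is missing (assigning None to os.environ raises TypeError)
if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set")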

In [3]:
pdfreader = PdfReader('Data/Policies/StudentContract.pdf')

In [4]:
# Read the text from every page of the PDF
raw_text = ''
for page in pdfreader.pages:
    content = page.extract_text()
    if content:  # extract_text() can return an empty string for image-only pages
        raw_text += content

In [5]:
# Split the text into overlapping chunks so that no single chunk grows too large
# (with length_function=len, sizes are measured in characters, not tokens)
text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=800,
    chunk_overlap=200,
    length_function=len,
)
texts = text_splitter.split_text(raw_text)
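
To illustrate what `chunk_size` and `chunk_overlap` control, here is a minimal sketch of a fixed-window chunker. It is a simplification written for this note, not LangChain's algorithm: `CharacterTextSplitter` first splits on the separator and then merges pieces, whereas the hypothetical `split_with_overlap` below simply slides a window over the characters.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Simplified sliding-window chunker (hypothetical helper, not LangChain's logic)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window has reached the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters of toy text
chunks = split_with_overlap(sample, chunk_size=12, chunk_overlap=4)
for chunk in chunks:
    print(repr(chunk))
```

Each chunk repeats the last `chunk_overlap` characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk.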

In [8]:
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

In [9]:
document_search = FAISS.from_texts(texts, embeddings)
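
Conceptually, the index built above maps each chunk to an embedding vector, and the `similarity_search` call used in a later cell returns the chunks whose vectors lie closest to the query vector. The sketch below shows that idea in plain Python, assuming Euclidean (L2) distance and hand-made three-dimensional vectors; the names `l2_distance`, `top_k`, and `toy_index` are illustrative and are not part of FAISS or LangChain.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int = 4) -> list[str]:
    """Return the k texts whose vectors are nearest to the query (brute force)."""
    ranked = sorted(indexed, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Toy "embeddings": real ones come from the embedding model and have many more dimensions.
toy_index = [
    ("cancellation clause", [0.9, 0.1, 0.0]),
    ("tuition fees", [0.1, 0.9, 0.0]),
    ("library rules", [0.0, 0.1, 0.9]),
]
toy_query = [0.8, 0.2, 0.0]
nearest = top_k(toy_query, toy_index, k=2)
print(nearest)
```

A real FAISS index performs the same nearest-neighbour lookup far more efficiently than this brute-force sort.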

In [13]:
from langchain.chains.question_answering import load_qa_chain
from langchain_openai import ChatOpenAI

In [14]:
llm = ChatOpenAI(
    model = "gpt-4o-mini",
    api_key = os.environ["OPENAI_API_KEY"],
    temperature = 0.0
)

chain = load_qa_chain(llm, chain_type="stuff")

Note: `load_qa_chain` raises a deprecation warning in recent LangChain releases. The warning points to these migration guides, by `chain_type`:

- stuff: https://python.langchain.com/docs/versions/migrating_chains/stuff_docs_chain
- map_reduce: https://python.langchain.com/docs/versions/migrating_chains/map_reduce_chain
- refine: https://python.langchain.com/docs/versions/migrating_chains/refine_chain
- map_rerank: https://python.langchain.com/docs/versions/migrating_chains/map_rerank_docs_chain

See also the guides on retrieval and question answering: https://python.langchain.com/docs/how_to/#qa-with-rag
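
The "stuff" chain type places all retrieved chunks into a single prompt and sends it to the LLM in one call. A rough sketch of that idea follows; the prompt wording and the helper `build_stuff_prompt` are assumptions made for illustration, not the template LangChain actually uses, and the chunk strings are placeholders.

```python
def build_stuff_prompt(chunks: list[str], question: str) -> str:
    """Concatenate every retrieved chunk into one prompt (hypothetical template)."""
    context = "\n\n".join(chunks)  # chunks are kept in retrieval order
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


placeholder_chunks = ["<text of retrieved chunk 1>", "<text of retrieved chunk 2>"]
prompt = build_stuff_prompt(placeholder_chunks, "Can I cancel the contract?")
print(prompt)
```

Because everything goes into one prompt, this approach only works while the retrieved chunks together fit in the model's context window; the other chain types listed in the migration guides handle larger inputs.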


In [15]:
query = "Can I cancel the contract?"
docs = document_search.similarity_search(query)
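# invoke() replaces the deprecated run(); the answer text is under "output_text"
chain.invoke({"input_documents": docs, "question": query})["output_text"]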

'Yes, you can cancel the contract. New students have a statutory right to cancel the contract without giving any reason within two cancellation periods: the first is 14 days from the day you accept the offer of a place at the University, and the second is 14 days after the start of the course. To exercise this right, you must inform the University of your decision to cancel by a clear statement, such as an email.'