# Naive RAG

The code below implements a naive RAG (retrieval-augmented generation) pipeline.

Load the necessary libraries:

- dotenv - to load the API keys from a `.env` file so they stay out of the notebook
- os - to read environment variables
- ChatOpenAI - to call an OpenAI chat model
- UnstructuredLoader - to load the txt file and turn it into multiple Documents
- RecursiveCharacterTextSplitter - to split the Documents into chunks
- Chroma - an AI-native open-source vector database
- filter_complex_metadata - to drop metadata values that the vector store cannot save
- HuggingFaceEmbeddings - to convert text to embeddings
- hub - to pull the prompt template
- RunnablePassthrough - to pass the question through the chain unchanged
- StrOutputParser - to keep only the text content of the LLM output

In [65]:
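import os
from dotenv import load_dotenv
# Load variables such as OPENAI_API_KEY from a local .env file into os.environ.
load_dotenv()
# Fail early with a clear message if the key is missing.
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file.")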
from langchain_openai import ChatOpenAI
from langchain_unstructured import UnstructuredLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_community.vectorstores.utils import filter_complex_metadata
from langchain_huggingface import HuggingFaceEmbeddings  # requires the langchain-huggingface package; the langchain.embeddings import path is deprecated
from langchain import hub
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser


In [66]:
llm = ChatOpenAI(model="gpt-4o-mini")

## Store the data in a vector database

In [67]:
FILE_PATH = "/Users/krimssmirk/Desktop/rag-llm/document.txt"

In [68]:
# Load the contents
loader = UnstructuredLoader(FILE_PATH)
docs = loader.load()  # returns a list of Document objects

In [69]:
# Split the documents into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
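
To illustrate what `chunk_size=1000` and `chunk_overlap=200` mean, here is a minimal sketch of fixed-window chunking in plain Python. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally: the real splitter tries to break on separators such as paragraph and line breaks before falling back to smaller units. The function name `chunk_text` and the sample text are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap (illustrative only)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


# Hypothetical sample: 2500 characters of repeating digits.
sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)
print([len(c) for c in chunks])
# The last 200 characters of one chunk reappear at the start of the next.
print(chunks[0][-200:] == chunks[1][:200])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.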

In [72]:
# Embed the chunks and store them in a Chroma vector store
vectorstore = Chroma.from_documents(
    documents=filter_complex_metadata(splits),  # drop metadata values Chroma cannot store
    embedding=HuggingFaceEmbeddings(),
)

INFO: Use pytorch device_name: cpu
INFO: Load pretrained SentenceTransformer: sentence-transformers/all-mpnet-base-v2
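
Once the chunks are embedded, retrieval comes down to comparing a query vector against the stored vectors. The sketch below shows that idea with cosine similarity in plain Python. The three-dimensional vectors, the chunk labels, and the helper names `cosine_similarity` and `top_k` are made up for illustration; real embeddings have many more dimensions, and the distance metric Chroma actually applies depends on how the collection is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k chunk texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical toy index: (chunk text, embedding vector) pairs.
indexed = [
    ("chunk about birth certificates", [0.9, 0.1, 0.0]),
    ("chunk about marriage certificates", [0.1, 0.9, 0.0]),
    ("chunk about fees", [0.0, 0.1, 0.9]),
]
query_vec = [0.8, 0.3, 0.0]  # hypothetical embedding of the user question
results = top_k(query_vec, indexed, k=2)
print(results)
```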


## RAG demo

In [76]:
# Retrieve the relevant chunks of the document and generate an answer from them.
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What are the documents needed if the applicant his/her psa unreadable and cannot provide it?")

INFO: HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


'If the PSA birth certificate is unreadable or cannot be provided, the applicant should submit a Birth Certificate issued by the Local Civil Registrar and a Negative Certificate issued by the PSA. For married applicants, if the marriage certificate issued by PSA is unreadable, they should submit a Marriage Certificate issued by the Local Civil Registrar. If there is no record of marriage in PSA, they must also provide a Marriage Certificate from the Local Civil Registrar along with a Negative Certificate from PSA.'
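
To make the data flow of `rag_chain` easier to follow, here is a rough plain-Python sketch of the same steps without LangChain. In the real chain, the dictionary sends the same question to both branches: one retrieves and formats the context, the other passes the question through unchanged. Everything below other than `format_docs` is a hypothetical stand-in: `Doc`, `fake_retriever`, `build_prompt`, `fake_llm`, and `parse_output` only mimic the roles of the retriever, the prompt, the model, and `StrOutputParser`, and the prompt wording is not the text of `rlm/rag-prompt`.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str


def format_docs(docs):
    # Same helper as in the notebook: join chunk texts with blank lines.
    return "\n\n".join(doc.page_content for doc in docs)


def fake_retriever(question: str) -> list[Doc]:
    # Stand-in: a real retriever would run a similarity search in the vector store.
    return [Doc("First relevant chunk."), Doc("Second relevant chunk.")]


def build_prompt(inputs: dict) -> str:
    # Stand-in for the prompt template: fill in context and question.
    return f"Context:\n{inputs['context']}\n\nQuestion: {inputs['question']}"


def fake_llm(prompt_text: str) -> dict:
    # Stand-in for the chat model: returns a message-like dict.
    return {"content": f"(answer based on {len(prompt_text)} prompt characters)"}


def parse_output(message: dict) -> str:
    # Stand-in for StrOutputParser: keep only the text content.
    return message["content"]


def run_chain(question: str) -> str:
    # Both dictionary entries are built from the same input question.
    inputs = {
        "context": format_docs(fake_retriever(question)),
        "question": question,
    }
    return parse_output(fake_llm(build_prompt(inputs)))


print(run_chain("Which documents are required?"))
```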