# Naive RAG

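The code below implements a naive RAG (retrieval-augmented generation) pipeline: load the documents, embed them into a vector store, retrieve the pages most similar to a question, and pass them to the LLM as context.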

Load the necessary libraries:

- dotenv - loads the API keys from a .env file so they stay out of the notebook
- os - reads environment variables and lists files
- ChatOpenAI - chat client for OpenAI models
- UnstructuredLoader - loads a txt file and turns it into multiple Documents
- RecursiveCharacterTextSplitter - splits the Documents into chunks
- Chroma - an AI-native open-source vector database
- filter_complex_metadata - drops metadata values that the vector store cannot store
- HuggingFaceEmbeddings - converts text to embeddings
- hub - pulls the prompt template
- RunnablePassthrough - passes the question through the chain unchanged
- StrOutputParser - keeps only the text content of the LLM output

In [3]:
from dotenv import load_dotenv
load_dotenv()
import os
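# load_dotenv() has already copied OPENAI_API_KEY into os.environ; fail early if it is missing
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")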
from langchain_openai import ChatOpenAI
from langchain_unstructured import UnstructuredLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_community.vectorstores.utils import filter_complex_metadata
from langchain.embeddings import HuggingFaceEmbeddings
from langchain import hub
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser





In [4]:
llm = ChatOpenAI(model="gpt-4o-mini")

## Store the data in a vector database

In [37]:
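def get_pdf_paths(root="PDFs"):
    """Return absolute paths of all PDFs under root, which holds one subfolder per category."""
    root = os.path.abspath(root)
    paths = []
    for category in os.listdir(root):
        category_dir = os.path.join(root, category)
        # Skip stray files (e.g. .DS_Store) that are not category folders
        if not os.path.isdir(category_dir):
            continue
        for filename in os.listdir(category_dir):
            # PyPDFLoader only parses PDFs, so ignore any other file type
            if filename.lower().endswith(".pdf"):
                paths.append(os.path.join(category_dir, filename))
    return paths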


from langchain_community.document_loaders import PyPDFLoader

file_paths = get_pdf_paths()

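# Load every PDF page as one Document.
# Top-level "async for" works in Jupyter/IPython but is a syntax error in a plain script.
pages = []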
for file_path in file_paths:
    loader = PyPDFLoader(file_path)
    async for page in loader.alazy_load():
        pages.append(page)

In [51]:
print(llm.invoke(f"summarize the following: {pages[2].page_content}").content)

INFO: HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


**Summary of Requirements for Visiting Relatives in Japan (May 2023)**

**A. Purpose:**  
To visit relatives in Japan within the third degree of relationship. For visiting distant relatives or friends, different requirements apply.

**B. Requirements:**  
- **Applicant's Requirements:**
  1. **Passport:** Must include holder’s signature.
  2. **Application Form:** A facial photo (4.5x3.5 cm) is required. Proof of relationship (birth/marriage certificates) must be submitted.
  3. **Birth Certificate:** Issued by PSA within the last year; additional documents may be needed if unreadable or if there is a late registration.
  4. **Marriage Certificate:** For married applicants, issued by PSA within the last year; similar additional documents apply as for the birth certificate.
  5. **Bank Certificate:** Required if the applicant will cover part or all of the travel expenses.
  6. **Photocopy of Income Tax Return:** Required if the applicant will cover part or all of the travel expenses.

-

In [40]:
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings

vector_store = InMemoryVectorStore.from_documents(pages, OpenAIEmbeddings())
# Optional sanity check: fetch the two pages most similar to a query
# docs = vector_store.similarity_search("list the requirements of visiting relatives", k=2)
# for doc in docs:
#     print(f'Page {doc.metadata["page"]}: {doc.page_content}\n')

INFO: HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO: HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
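
Conceptually, a similarity search embeds the query and ranks the stored pages by how close their embedding vectors are to the query vector. The sketch below shows that ranking with cosine similarity, a common choice, on toy 2-dimensional vectors. The vectors and the helper names cosine_similarity and top_k are made up for illustration; real embeddings have hundreds or thousands of dimensions, and this is not the vector store's actual code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy "embeddings" for three pages and one query (hypothetical values)
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_vec = [1.0, 0.1]
best = top_k(query_vec, doc_vecs, k=2)
```

Here the query points almost in the same direction as the first vector, so that page ranks first and the diagonal vector ranks second.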


In [6]:
FILE_PATH = "/Users/krimssmirk/Desktop/rag-llm/document.txt"

In [7]:
# Load the contents
# loader = UnstructuredLoader(FILE_PATH)
# docs = loader.load() # return list of Documents

In [15]:
# chunk the contents
# text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
# splits = text_splitter.split_documents(docs)
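
To make the chunk_size and chunk_overlap parameters above concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is an illustration only: RecursiveCharacterTextSplitter additionally tries to split on separators such as paragraph and line breaks, which this sketch ignores, and the helper name chunk_text is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 2000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # distance between consecutive chunk starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# Tiny example so the overlap is easy to see
example_chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
```

With chunk_size=2000 and chunk_overlap=200, each chunk in this sketch starts 1800 characters after the previous one, so neighbouring chunks share 200 characters and a sentence cut at a boundary still appears whole in one of them.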

In [16]:
# index the contents and store it
# vectorstore = Chroma.from_documents(documents=filter_complex_metadata(splits), embedding=HuggingFaceEmbeddings())

INFO: Load pretrained SentenceTransformer: sentence-transformers/all-mpnet-base-v2
INFO: Use pytorch device: cpu


## RAG demo

In [52]:
# Retrieve the relevant PDF pages and generate an answer from them.
retriever = vector_store.as_retriever()
prompt = hub.pull("rlm/rag-prompt")


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
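
The chain above can be read as plain steps: retrieve documents for the question, join them into one context string, fill the prompt, and call the model. The sketch below spells out that data flow with stand-in callables so it runs without any API calls. The names run_naive_rag, fake_retrieve, fake_fill_prompt and fake_llm are hypothetical placeholders, not LangChain APIs, and the prompt text is invented rather than the one pulled from the hub.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for a LangChain Document; only page_content is needed here."""
    page_content: str


def format_docs(docs):
    """Join the retrieved documents into a single context string."""
    return "\n\n".join(doc.page_content for doc in docs)


def run_naive_rag(question, retrieve, fill_prompt, call_llm):
    """Retrieve, build the context, fill the prompt, then generate."""
    context = format_docs(retrieve(question))       # the "context" branch of the chain
    prompt_text = fill_prompt(context=context, question=question)  # question is passed through as-is
    return call_llm(prompt_text)                    # returns plain text, like StrOutputParser


# Hypothetical stand-ins for the retriever, the prompt template and the model
def fake_retrieve(question):
    return [Doc("Passport with signature."), Doc("Application form with photo.")]


def fake_fill_prompt(context, question):
    return f"Context:\n{context}\n\nQuestion: {question}"


def fake_llm(prompt_text):
    return f"(model answer based on {len(prompt_text)} prompt characters)"


answer = run_naive_rag("list the requirements", fake_retrieve, fake_fill_prompt, fake_llm)
```

Swapping the stand-ins for a real retriever, prompt template and model gives the same flow that the pipe operators express in the cell above.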



In [56]:
retriever.invoke("what is the requirements of visiting relatives?")

INFO: HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


[Document(id='7b287cb1-9e08-43c2-9e97-5b938f84dd01', metadata={'source': '/Users/krimssmirk/Desktop/rag-llm/PDFs/temporary_visitor/visiting_relatives.pdf', 'page': 0}, page_content='VISITING RELATIVES   \nMAY  2023  \nA. PURPOSE  \nVisit relatives residing in Japan within the third degree . \n(If you visit relatives beyond the third degree, refer the requirements for “VISITNG FRIENDS  AND \nDISTANT RELATIVES ”.) \nB. Requirements （Details→ https://www.ph.em b-japan.go.jp/itpr_ja/11_000001_00898.html ） \n  ※ Downloadable from this website   \n APPLICANT’S REQUIREMENTS  \n(1) Passport（Holder ’s signature required ） \n(2) Application Form ※（A facial Photo (4.5×3.5cm) must be attached. ） \n☞ Submit All the relatives’ Birth Certi ficate/ Marriage Certificate enough to prove the third degree \nrelationship between Applicant and Inviter.  \n   (3) Birth Certificate (issued by PSA within 1 year)  \n   【ADDITIONAL REQUIREMENTS 】 \n    - If (3)  is unreadable, submit a Birth certificate  issued 

In [57]:
rag_chain.invoke("list the requirements of visiting relatives")

INFO: HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO: HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


'To visit relatives in Japan within the third degree, the requirements include a passport, an application form with a facial photo, and documentation proving the relationship, such as birth or marriage certificates. Additional documents may include a bank certificate and income tax return if the applicant will cover travel expenses. An invitation letter, itinerary, and residence certificate from the inviter are also necessary.'