# Hi! 

Here, I explore how to build a RAG pipeline that lets me "chat with" my research papers and look up details in them. If there is time, I also want to trace each response back to the exact source it originated from, similar to OLMo 2: https://allenai.org/olmo

This notebook does the following:

- Loads your PDFs
- Extracts text and metadata (including source file and page)
- Creates embeddings using Ollama (e.g., nomic-embed-text; the cells below use llama3)
- Runs a local RAG chain with an Ollama LLM (e.g., llama3)
- Returns the answer along with the source file and page number

# Installations! 

In [13]:
# Install dependencies if needed (comment out when finished)
# pathlib ships with Python, so it is not installed here; PyMuPDFLoader needs pymupdf
# !pip install langchain-ollama langchain chromadb langchain-community pymupdf


# Imports!


In [14]:
from langchain_ollama import OllamaEmbeddings, OllamaLLM
from langchain_community.document_loaders import PyMuPDFLoader 
from langchain_community.vectorstores import Chroma 
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA

from pathlib import Path
from datetime import datetime
import json
import os


# Paths to your PDFs and saved responses!


In [None]:
pdf_directory = './pdfs'  # Change this to your local directory with PDFs
prompts_and_responses_directory = './prompts_and_responses'

# Create both directories if they do not exist yet
for path in [pdf_directory, prompts_and_responses_directory]:
    os.makedirs(path, exist_ok=True)

# Load your PDFs!



In [16]:
docs = []

for file in Path(pdf_directory).glob("*.pdf"):
    loader = PyMuPDFLoader(str(file))
    loaded_docs = loader.load()
    for doc in loaded_docs:
        doc.metadata['source'] = file.name
    docs.extend(loaded_docs)

print(f"Loaded {len(docs)} chunks from PDFs")

Loaded 8 chunks from PDFs
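
A quick sanity check I find useful after loading: count how many documents came from each file. As far as I can tell the loader returns one document per PDF page, so the "chunks" reported above are really pages. This is a minimal sketch; the stand-in objects and file names below are hypothetical and only mimic the `metadata` dict of the loaded documents.

```python
from collections import Counter
from types import SimpleNamespace


def pages_per_source(docs):
    """Count how many loaded documents (pages) came from each source file."""
    return Counter(doc.metadata.get("source", "unknown") for doc in docs)


# Stand-in objects shaped like the loaded documents (hypothetical file names).
example_docs = [
    SimpleNamespace(metadata={"source": "paper_a.pdf", "page": 0}),
    SimpleNamespace(metadata={"source": "paper_a.pdf", "page": 1}),
    SimpleNamespace(metadata={"source": "paper_b.pdf", "page": 0}),
]

for name, count in pages_per_source(example_docs).items():
    print(f"{name}: {count} page(s)")
```

In the notebook itself, `pages_per_source(docs)` should work on the real `docs` list, since each entry carries the `source` key set in the loop above.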


Now for something a little more technical: let's split the text into overlapping chunks. Feel free to adjust the numbers!

Then create a Chroma vector store.


In [None]:
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=200)
split_docs = splitter.split_documents(docs)
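
To get a feel for what `chunk_size` and `chunk_overlap` mean, here is a simplified sketch of overlapping windows in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to cut at separators such as paragraph and line breaks); the function name `sliding_chunks` is my own, and the tiny numbers are only there to make the overlap visible.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in sliding_chunks(sample, chunk_size=10, chunk_overlap=4):
    print(chunk)
```

Each window repeats the last four characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk. With the settings above (500 and 200), each chunk shares up to 200 characters with its neighbor.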

## Pull the desired model in your terminal!
Get what you need: `$ ollama pull llama3`

Show what you have: `$ ollama list`

This is crucial before running the next cell!

In [20]:
embedding = OllamaEmbeddings(model='llama3')  # Change model if needed, e.g. 'nomic-embed-text' (pull it first)
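
Why embeddings at all? Each chunk becomes a vector, and retrieval picks the chunks whose vectors sit closest to the query vector. Below is a minimal sketch of that idea using cosine similarity on toy vectors. The helper names are my own, and the metric a particular vector store actually uses may differ; cosine similarity is just one common choice.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, chunk_vectors):
    """Return chunk indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vector, v) for v in chunk_vectors]
    return sorted(range(len(chunk_vectors)), key=lambda i: scores[i], reverse=True)


# Toy 3-dimensional vectors; real embeddings have far more dimensions.
query_vector = [1.0, 0.0, 1.0]
chunk_vectors = [[0.0, 1.0, 0.0], [1.0, 0.1, 0.9], [0.5, 0.5, 0.5]]
print(rank_by_similarity(query_vector, chunk_vectors))
```

In the notebook, real vectors would come from the embedding model (for example via `embedding.embed_query(...)`), and the vector store does the ranking for you.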

# Build the vector store!

In [None]:
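# Create a Chroma vector store and persist it to disk.
# This embeds the text of split_docs into numerical vectors using the embedding
# model and stores them in the database (this step takes a while).
persist_directory = f"./chroma_db_{datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}"
vectordb = Chroma.from_documents(
    split_docs,
    embedding=embedding,
    persist_directory=persist_directory
)
# Recent Chroma releases write to disk automatically; drop this call if it warns or errors.
vectordb.persist()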
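
Because the store directory is timestamped, every run creates a new `chroma_db_...` folder. Here is a small sketch for finding the most recent one so the slow embedding step can be skipped next time. The helper name `latest_chroma_dir` is my own, and it assumes the folders follow the naming pattern used above, which sorts chronologically when sorted as text.

```python
from pathlib import Path


def latest_chroma_dir(base_dir="."):
    """Return the newest chroma_db_<timestamp> directory, or None if there is none."""
    candidates = sorted(
        p for p in Path(base_dir).glob("chroma_db_*") if p.is_dir()
    )
    return candidates[-1] if candidates else None


print(latest_chroma_dir())
```

I believe the returned path can then be passed as `persist_directory`, together with `embedding_function=embedding`, when constructing `Chroma(...)` to reopen the existing store; check this against your installed version.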



In [26]:
# Initialize RAG with Ollama LLM
llm = OllamaLLM(model='llama3')
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
)
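
What does `chain_type="stuff"` mean? As I understand it, the retrieved chunks are simply "stuffed" into a single prompt along with the question. Here is a rough sketch of that idea. The function name and the prompt wording are hypothetical, not LangChain's actual template, and the example chunks are made up.

```python
def build_stuff_prompt(question, chunks):
    """Concatenate every retrieved chunk into one context block ahead of the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Made-up chunks standing in for what the retriever would return.
retrieved_chunks = [
    "Large language models (LLMs) are trained on large text corpora.",
    "LLMs can be combined with retrieval to ground their answers.",
]
print(build_stuff_prompt("what's an LLM?", retrieved_chunks))
```

The trade-off is that every retrieved chunk has to fit in the model's context window at once, which is one reason to keep `chunk_size` modest.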

# Get ready to Prompt!

In [42]:

def save_rag_response_json(query, response, output_dir="./prompts_and_responses"):
    """
    Saves a RAG query, its answer, and source documents to a timestamped JSON file.

    Args:
        query (str): The question asked to the QA chain.
        response (dict): The response from LangChain's RetrievalQA with 'result' and 'source_documents'.
        output_dir (str): Directory to save the output file. Created if it doesn't exist.

    Returns:
        str: Path to the saved JSON file.
    """

    timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    os.makedirs(output_dir, exist_ok=True)
    filename = os.path.join(output_dir, f"{timestamp}_rag_response.json")

    # Format the data to be saved
    data = {
        "datetime": timestamp,
        "response_heading": response["result"],
        "query": query,
        "sources": [
            {
                "source": doc.metadata.get("source", "unknown"),
                "page": doc.metadata.get("page", "?"),
                "metadata": doc.metadata
            }
            for doc in response["source_documents"]
        ]
    }

    # Save to JSON first, so the "saved" message below is only printed on success.
    # default=str keeps the dump from failing on metadata values that are not JSON types.
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False, default=str)

    print(response['result'] + "\n\n")
    print(f"(ﾉ◕ヮ◕)ﾉ Response saved to JSON: {filename} \n\n")
    print(json.dumps(data, indent=4, default=str))  # Pretty-print with 4-space indentation

    return filename

Type your prompt as a string.

In [43]:
query = "what's an LLM?"
response = qa_chain.invoke({"query": query})  # calling the chain directly is deprecated in newer LangChain
save_rag_response_json(query, response)

Based on the provided context, I can infer that an LLM refers to a "Large Language Model". This is mentioned in several places throughout the text, including references 1 and 3.


(ﾉ◕ヮ◕)ﾉ Response saved to JSON: ./prompts_and_responses/2025-08-17_14-14-59_rag_response.json 


{
    "datetime": "2025-08-17_14-14-59",
    "response_heading": "Based on the provided context, I can infer that an LLM refers to a \"Large Language Model\". This is mentioned in several places throughout the text, including references 1 and 3.",
    "query": "what's an LLM?",
    "sources": [
        {
            "source": "s41524-024-01423-2.pdf",
            "page": 6,
            "metadata": {
                "keywords": "",
                "producer": "iText\u00ae 5.3.5 \u00a92000-2012 1T3XT BVBA (SPRINGER SBM; licensed version)",
                "creationDate": "D:20241102043739+05'30'",
                "moddate": "2024-11-05T09:43:13+01:00",
                "page": 6,
                "source": "s41524-024

'./prompts_and_responses/2025-08-17_14-14-59_rag_response.json'
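
The JSON dump is thorough but noisy, so here is a small sketch that boils the `sources` list down to short "file, page" citations. The helper name `format_sources` is my own. It assumes the stored page numbers are zero-based (hence the default offset of 1), which is worth checking against one of your PDFs, and `other_paper.pdf` is a made-up entry.

```python
def format_sources(sources, page_offset=1):
    """Turn source entries into unique 'file, p. N' strings, keeping first-seen order."""
    seen = []
    for entry in sources:
        page = entry.get("page", "?")
        if isinstance(page, int):
            page += page_offset  # assumes stored page numbers are zero-based
        label = f"{entry.get('source', 'unknown')}, p. {page}"
        if label not in seen:
            seen.append(label)  # skip repeats when several chunks share a page
    return seen


example_sources = [
    {"source": "s41524-024-01423-2.pdf", "page": 6},
    {"source": "s41524-024-01423-2.pdf", "page": 6},
    {"source": "other_paper.pdf", "page": "?"},  # hypothetical entry with no page
]
for citation in format_sources(example_sources):
    print(citation)
```

The same function should work on the `sources` list inside a saved JSON file after loading it with `json.load`.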

# Finished!