# Hi! 

Here, I explore how to build a RAG (retrieval-augmented generation) pipeline that lets me "chat with" my research papers and find details in them. If there is time, I also want to trace exactly which source a response originated from, similar to Olmo2. https://allenai.org/olmo

This notebook can do the following:


- Loads your PDFs
- Extracts text and metadata (including source file and page)
- Creates embeddings using Ollama (e.g., nomic-embed-text; the cells below currently use llama3)
- Runs a local RAG chain with Ollama LLM (e.g., llama3)
- Returns the answer along with the source file and page number

# Installations! 

In [33]:
# Install dependencies if needed (uncomment to run). pathlib ships with Python; PyMuPDFLoader needs pymupdf.
# !pip install langchain-ollama langchain chromadb langchain-community pymupdf


# Imports!


In [45]:
from langchain_ollama import OllamaEmbeddings, OllamaLLM
from langchain_community.document_loaders import PyMuPDFLoader 
from langchain_community.vectorstores import Chroma 
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA

from pathlib import Path
from datetime import datetime
import json
import os


# Paths to your PDFs, Chroma DB, and saved responses!


In [35]:
pdf_directory = './pdfs'  # Change this to your local directory with PDFs
persist_directory = './chroma_db'
prompts_and_responses_directory = './prompts_and_responses'

# Create each directory if it does not exist yet
for path in [pdf_directory, persist_directory, prompts_and_responses_directory]:
    os.makedirs(path, exist_ok=True)

# Load your PDFs!



In [36]:
docs = []

for file in Path(pdf_directory).glob("*.pdf"):
    loader = PyMuPDFLoader(str(file))
    loaded_docs = loader.load()  # one Document per PDF page
    for doc in loaded_docs:
        doc.metadata['source'] = file.name  # keep just the file name for citations
    docs.extend(loaded_docs)

print(f"Loaded {len(docs)} pages from PDFs")

Loaded 8 pages from PDFs


Now for something a little more technical: let's split the text into overlapping chunks. Feel free to adjust the numbers!

Next, we'll embed those chunks and store them in a Chroma vector store.


In [37]:
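# chunk_size and chunk_overlap are measured in characters by default
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
split_docs = splitter.split_documents(docs)  # each chunk keeps its page's metadata (source, page)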
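To see what "overlap" means, here's a simplified sketch. It is NOT how RecursiveCharacterTextSplitter works internally (that splitter first tries to break on separators such as paragraphs, lines, and spaces); it only shows fixed-size character windows that share `chunk_overlap` characters with their neighbor. `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows; neighbors share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

# Tiny numbers so the overlap is easy to see: each chunk repeats the last 2 characters of the previous one
print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbor, at the cost of storing (and embedding) some text twice.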

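#### Make sure you pull the Ollama model from your terminal!

The cells below use llama3 for both embeddings and answers. If you switch to another model (e.g., tinyllama or nomic-embed-text), pull that one too.

1. Pull what you need:
   - $ ollama pull llama3
2. Show what you have:
   - $ ollama list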
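To double-check from Python that the model is installed, here's a small sketch that asks the local Ollama server which models it has. It assumes Ollama is running on its default address (http://localhost:11434) and uses its /api/tags endpoint; `missing_models` and `fetch_tags` are hypothetical helper names, not part of LangChain or Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default local address

def missing_models(tags_payload: dict, required: list[str]) -> list[str]:
    """Return the required model names that do not appear in an /api/tags response."""
    # Ollama reports names like "llama3:latest"; compare only the part before ':'
    installed = {m["name"].split(":")[0] for m in tags_payload.get("models", [])}
    return [name for name in required if name.split(":")[0] not in installed]

def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """Fetch the list of locally installed models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)

# Usage (requires the Ollama server to be running):
# print(missing_models(fetch_tags(), ["llama3"]))  # an empty list means you're good to go
```

Comparing only the base name is a simplification: it treats llama3:8b and llama3:latest as the same model.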

In [38]:
# Initialize the embedding function: OllamaEmbeddings tells LangChain to use Ollama's local inference server to generate vector embeddings.
embedding = OllamaEmbeddings(model='llama3')  # Change model if needed (e.g., 'nomic-embed-text', a dedicated embedding model)

What does Chroma.from_documents(...) do?
- split_docs: your list of short text chunks from PDFs
- embedding=...: the function used to convert each chunk into a vector
- persist_directory=...: the folder to store your vector database on disk
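Under the hood, retrieval boils down to "find the stored vectors closest to the query vector and return their metadata." Here's a toy, pure-Python sketch of that idea with made-up 2-D vectors and cosine similarity. Real embeddings have as many dimensions as the cell below prints, and Chroma uses an approximate nearest-neighbor index with its own distance metric, so treat this as an illustration only. `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], store: list[tuple[list[float], dict]], k: int = 2) -> list[dict]:
    """store holds (vector, metadata) pairs, like a tiny in-memory vector DB."""
    scored = [(cosine_similarity(query_vec, vec), meta) for vec, meta in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return [meta for _, meta in scored[:k]]

# Made-up vectors standing in for embedded chunks
store = [
    ([1.0, 0.0], {"source": "a.pdf", "page": 0}),
    ([0.0, 1.0], {"source": "b.pdf", "page": 3}),
    ([0.7, 0.7], {"source": "c.pdf", "page": 1}),
]
print(top_k([1.0, 0.1], store, k=2))  # a.pdf first, then c.pdf
```

This is also why the metadata matters: whatever is attached to each vector (source file, page) is what comes back with the answer.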

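What does vectordb.persist() do?
- Saves everything (vectors + metadata) to the persist_directory
- This makes your RAG system persistent, so you can skip reprocessing the next time you run the notebook. (Chroma 0.4 and later persist automatically whenever persist_directory is set, so the explicit call is only needed on older versions.)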

In [39]:
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
vec = embedding.embed_query("test")  # embed a dummy string to discover the vector size
dim = len(vec)
print(f"Embedding dimension size: {dim}")

# The directory name includes the dimension and a timestamp, so stores built with different embedding models never mix
persist_directory = f"./chroma_db_llama_{dim}_{timestamp}/"
vectordb = Chroma.from_documents(
    split_docs,
    embedding=embedding,
    persist_directory=persist_directory
)
vectordb.persist()  # only needed for Chroma < 0.4; newer versions persist automatically
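Because the cell above writes a brand-new timestamped directory on every run, it re-embeds everything each time. One way to actually skip reprocessing is to look for the newest existing store with the same embedding dimension and reopen it. This is a sketch: `latest_store_dir` is a hypothetical helper, and the commented usage assumes the `Chroma(persist_directory=..., embedding_function=...)` constructor from langchain_community.

```python
from pathlib import Path
from typing import Optional

def latest_store_dir(root: str, dim: int, prefix: str = "chroma_db_llama") -> Optional[Path]:
    """Return the newest {prefix}_{dim}_{timestamp} directory under root, or None if there is none."""
    # Timestamps use %Y-%m-%d_%H-%M-%S, so sorting names lexicographically sorts them chronologically
    candidates = sorted(p for p in Path(root).glob(f"{prefix}_{dim}_*") if p.is_dir())
    return candidates[-1] if candidates else None

# Hypothetical usage in this notebook: reopen the newest matching store instead of re-embedding
# existing = latest_store_dir(".", dim)
# if existing is not None:
#     vectordb = Chroma(persist_directory=str(existing), embedding_function=embedding)
```

Matching on the dimension in the name is a cheap guard, not a guarantee: two different models can produce vectors of the same size, so reuse a store only with the model that built it.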

In [40]:
# vectordb = Chroma.from_documents(split_docs, embedding=embedding, persist_directory=persist_directory) # comment when done, this one takes a while
# vectordb.persist()

In [42]:
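# Initialize the RAG chain with the Ollama LLM
llm = OllamaLLM(model='llama3')

# chain_type="stuff" puts all retrieved chunks into one prompt; return_source_documents=True keeps them for citations
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
)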
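What does "stuff" mean here? The retriever fetches the most similar chunks (4 by default), and the "stuff" chain simply concatenates all of them into a single prompt for the LLM. Here's a rough sketch of that prompt-building step. The template wording is hypothetical (LangChain's built-in prompt is worded differently), and labeling each chunk with its source and page is my own addition, not what RetrievalQA does by default.

```python
def stuff_prompt(question: str, chunks: list[dict]) -> str:
    """Concatenate ("stuff") every retrieved chunk into one prompt string."""
    context = "\n\n".join(
        f"[{c['source']}, page {c['page']}]\n{c['text']}" for c in chunks
    )
    # Hypothetical template for illustration only
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

print(stuff_prompt("what is an llm?", [{"source": "a.pdf", "page": 0, "text": "LLMs are large neural networks..."}]))
```

Stuffing is simple and needs one LLM call, but everything must fit in the model's context window; that's one reason to keep chunks small.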

# Get ready to roll! Ask questions of your PDF dataset!

In [48]:

def save_rag_response_json(query, response, output_dir="./prompts_and_responses"):
    """
    Saves a RAG query, its answer, and source documents to a timestamped JSON file.

    Args:
        query (str): The question asked to the QA chain.
        response (dict): The response from LangChain's RetrievalQA with 'result' and 'source_documents'.
        output_dir (str): Directory to save the output file. Created if it doesn't exist.

    Returns:
        str: Path to the saved JSON file.
    """

    timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    os.makedirs(output_dir, exist_ok=True)
    filename = os.path.join(output_dir, f"{timestamp}_rag_response.json")

    # Format the data to be saved
    data = {
        "datetime": timestamp,
        "response_heading": response["result"],
        "query": query,
        "sources": [
            {
                "source": doc.metadata.get("source", "unknown"),
                "page": doc.metadata.get("page", "?"),  # PyMuPDFLoader pages are 0-based
                "metadata": doc.metadata
            }
            for doc in response["source_documents"]
        ]
    }

    # Save to JSON
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)

    print(response['result'] + "\n\n")
    print(f"Done! (ﾉ◕ヮ◕)ﾉ !! Response saved to JSON: {filename}")
    return filename
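Since the goal is to trace answers back to their sources, here's a sketch that reads one of the saved JSON files and lists the unique pages cited per PDF. `cited_pages` is a hypothetical helper; it assumes the JSON layout written by `save_rag_response_json` above and adds 1 to integer page numbers because PyMuPDFLoader counts pages from 0.

```python
import json

def cited_pages(json_path: str) -> dict[str, list]:
    """Read a saved RAG response and list the unique pages cited per source file."""
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    pages: dict[str, list] = {}
    for src in data["sources"]:
        page = src["page"]
        # Convert 0-based page indices to human-readable page numbers; leave "?" as-is
        display = page + 1 if isinstance(page, int) else page
        bucket = pages.setdefault(src["source"], [])
        if display not in bucket:  # several chunks often come from the same page
            bucket.append(display)
    return pages

# Usage: pass the path that save_rag_response_json returns
# print(cited_pages(filename))
```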

Type your query here:

In [49]:
query = "what is an llm?"
response = qa_chain.invoke({"query": query})  # calling qa_chain(query) directly is deprecated
save_rag_response_json(query, response)

A helpful answer!

According to the provided context, a Large Language Model (LLM) is not explicitly defined. However, we can infer that it refers to a type of language model that is capable of processing and generating human-like text.

From the references provided, we see that LLMs are mentioned as being used in various applications such as robotics, scientific instrumentation, and conversational AI systems. This suggests that LLMs are large-scale neural networks trained on vast amounts of text data, enabling them to understand and generate human language.

In summary, while an explicit definition is not provided, we can infer from the context that a Large Language Model (LLM) is a type of AI model designed for processing and generating human-like text.


Done! (ﾉ◕ヮ◕)ﾉ !! Response saved to JSON: ./prompts_and_responses/2025-07-27_15-14-44_rag_response.json


'./prompts_and_responses/2025-07-27_15-14-44_rag_response.json'
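Want to ask several questions in one go? Here's a small sketch that loops over a list of queries and optionally saves each response. `ask_all` is a hypothetical helper; it only assumes the chain has an `.invoke()` method that takes `{"query": ...}` and returns a dict with a "result" key, as RetrievalQA does.

```python
def ask_all(chain, queries, save_fn=None):
    """Run several questions through a RetrievalQA-style chain, optionally saving each response."""
    answers = []
    for q in queries:
        response = chain.invoke({"query": q})  # RetrievalQA reads its input from the "query" key
        if save_fn is not None:
            save_fn(q, response)
        answers.append(response["result"])
    return answers

# Hypothetical usage in this notebook:
# ask_all(qa_chain, ["what is an llm?", "which datasets are used?"], save_fn=save_rag_response_json)
```

Note that the save function names files by the second, so two answers saved within the same second would overwrite each other.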