# POC - Notebook to Build a Retrieval-Augmented Generation (RAG) Model

I use this notebook to walk through the steps of building and testing a RAG model, in preparation for developing a Python application that uses the model to answer questions about an iPod shuffle.

## Importing Modules

In [20]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate

from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

from langchain_ollama import ChatOllama
from langchain_ollama.llms import OllamaLLM
from langchain.text_splitter import RecursiveCharacterTextSplitter

import ollama
from operator import itemgetter

#import streamlit as st

## Set some constants 

In [None]:
SAMPLE_PDF = "/Users/lancehester/Documents/ai_rag_user_guide/data/ipod_shuffle_2015_user_guide.pdf"

MODEL = "llama3.2"

## Loading the PDF data source for the RAG model

In [3]:
loader = PyPDFLoader(SAMPLE_PDF)
pages = loader.load()

print(f"Number of pages: {len(pages)}")
print(f"Length of a page: {len(pages[1].page_content)}")
print("Content of a page: ", pages[1].page_content)

Number of pages: 30
Length of a page: 854
Content of a page:  Contents
3 Chapter 1:  Ab out iPod shuffle
4 Chapter 2:  iPod shuffle Basics
4 iPod shuffle at a glance
5 Use the iPod shuffle controls
5 Connect and disconnect iPod shuffle
7 About the iPod shuffle battery
9 Chapter 3:  Setting up i Pod shuffle
9 About iTunes
10 Set up your iTunes library
10 Organize your music
11 Connect iPod shuffle to a computer for the first time
12 Add content to iPod shuffle
17 Chapter 4:  Listening t o Music
17 Play music
18 Use VoiceOver
20 Set tracks to play at the same volume
20 Set a volume limit
21 Lock and unlock the iPod shuffle buttons
22 Chapter 5:  Tips and Tr oubleshooting
24 Update and restore iPod shuffle software
25 Chapter 6:  Saf ety and Handling
25 Important safety information
26 Important handling information
27 Chapter 7:  Lea rning More, Service, and Support
28 Regulatory Compliance Information
  2


## Need to Split PDF into Chunks for Ease of Tokenization and Semantic Search

In [4]:
splitter = RecursiveCharacterTextSplitter(chunk_size=1500,chunk_overlap=100)

chunks = splitter.split_documents(pages)

print(f"Number of chunks: {len(chunks)}")
print(f"Length of a chunk: {len(chunks[1].page_content)}")
print("Content of a chunk: ", chunks[1].page_content)

Number of chunks: 58
Length of a chunk: 854
Content of a chunk:  Contents
3 Chapter 1:  Ab out iPod shuffle
4 Chapter 2:  iPod shuffle Basics
4 iPod shuffle at a glance
5 Use the iPod shuffle controls
5 Connect and disconnect iPod shuffle
7 About the iPod shuffle battery
9 Chapter 3:  Setting up i Pod shuffle
9 About iTunes
10 Set up your iTunes library
10 Organize your music
11 Connect iPod shuffle to a computer for the first time
12 Add content to iPod shuffle
17 Chapter 4:  Listening t o Music
17 Play music
18 Use VoiceOver
20 Set tracks to play at the same volume
20 Set a volume limit
21 Lock and unlock the iPod shuffle buttons
22 Chapter 5:  Tips and Tr oubleshooting
24 Update and restore iPod shuffle software
25 Chapter 6:  Saf ety and Handling
25 Important safety information
26 Important handling information
27 Chapter 7:  Lea rning More, Service, and Support
28 Regulatory Compliance Information
  2
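
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-window splitter. It is a simplification, not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraphs and lines); the function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
```

Each piece repeats the last three characters of the previous one, so a sentence that straddles a boundary is still fully visible in at least one chunk.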


## Storing the Chunks in a Vector Store

A vector store is a database designed to store vectors and search them efficiently. Here, I use FAISS as a stopgap because it keeps the embedded data in memory.

In production, we would use a more robust vector store such as `Pinecone`.

Available options include Pinecone, Milvus, Weaviate, FAISS, Chroma, Qdrant, Elasticsearch (with vector search capabilities), Pgvector, and Anthos Vector Database. All of them are designed to efficiently store and search high-dimensional vectors, enabling similarity-based queries in applications like semantic search and recommendation systems.

Vector databases are versatile and can be used in both small and large projects. For small-scale projects, open-source solutions like Chroma, FAISS, and Weaviate offer robust capabilities. For enterprise-scale projects, managed platforms like Pinecone provide scalability and performance optimization.


## Steps:
1.   I generate embeddings (converting each chunk into a numeric vector).
  *  LangChain provides the embedding class (`OllamaEmbeddings`), which calls the local Ollama model.

2.   I store the embeddings in FAISS. Facebook AI Similarity Search (FAISS) is a library for efficient similarity search and clustering of dense vectors. It lets developers quickly find embeddings that are similar to each other. It addresses limitations of traditional query search engines, which are optimized for hash-based searches, and provides more scalable similarity search. Through LangChain, FAISS also serves as a basic vector store that temporarily holds the data in RAM.
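
To illustrate what "similarity search" means here, the sketch below does an exact, brute-force nearest-neighbor lookup with Euclidean (L2) distance in plain Python. This is only a conceptual picture, roughly what an exact flat index does; FAISS itself is far more optimized. The vectors are made-up 3-dimensional values, and the helper names `l2_distance` and `nearest` are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, vectors, k=2):
    """Return (index, distance) pairs for the k stored vectors closest to query."""
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Made-up "embeddings"; real ones have hundreds or thousands of dimensions.
stored = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
hits = nearest([1.0, 0.05, 0.0], stored, k=2)
print(hits)
```

The two vectors pointing in nearly the same direction as the query come back first, which is the behavior the retriever relies on later.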

In [5]:
# Need to pull the llama3.2 model
ollama.pull(MODEL)

ProgressResponse(status='success', completed=None, total=None, digest=None)

In [6]:
ollama_endpoint = "http://127.0.0.1:11434"
embeddings = OllamaEmbeddings(model=MODEL, base_url=ollama_endpoint)
print(embeddings)
vectorstore = FAISS.from_documents(chunks, embeddings)

base_url='http://127.0.0.1:11434' model='llama3.2' embed_instruction='passage: ' query_instruction='query: ' mirostat=None mirostat_eta=None mirostat_tau=None num_ctx=None num_gpu=None num_thread=None repeat_last_n=None repeat_penalty=None temperature=None stop=None tfs_z=None top_k=None top_p=None show_progress=False headers=None model_kwargs=None


## Setting up a Retriever 

In RAG, a retriever is a component that efficiently locates and retrieves relevant information from a knowledge base or external data source. It acts as a search engine within the RAG system, pinpointing the most pertinent data to augment the input query before it's passed to the generator for response creation. 

We can use a retriever to find chunks in the vector store that are similar to a supplied question.
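
As a rough mental model, a retriever scores every stored chunk against the question and returns the top `k`. The toy class below uses shared-word counts as a crude stand-in for embedding similarity; `ToyRetriever` and `score` are hypothetical names, and none of this is LangChain's implementation.

```python
def score(question: str, chunk: str) -> int:
    """Count distinct words shared by question and chunk (stand-in for embedding similarity)."""
    q_words = set(question.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words)


class ToyRetriever:
    """Rank stored chunks against a question and return the best k."""

    def __init__(self, chunks, k=4):
        self.chunks = chunks
        self.k = k

    def invoke(self, question):
        ranked = sorted(self.chunks, key=lambda c: score(question, c), reverse=True)
        return ranked[: self.k]


toy_chunks = [
    "charge the battery with the usb cable",
    "slide the power switch to turn on the player",
    "use voiceover to hear the track title",
]
toy_retriever = ToyRetriever(toy_chunks, k=1)
top = toy_retriever.invoke("how do i turn on the player")
print(top)
```

With real embeddings the ranking reflects meaning rather than exact word matches, but the shape of the operation is the same: question in, a short list of chunks out.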


In [7]:
retriever = vectorstore.as_retriever()
retriever.invoke("How to turn on you ipod shuffle?") #returns by default the 4 most relevant chunks



[Document(id='84312baa-3a30-481e-b2fe-1e127888e395', metadata={'producer': 'Adobe PDF Library 10.0.1', 'creator': 'Adobe InDesign CS6 (Macintosh)', 'creationdate': '2015-06-23T22:27:58+03:00', 'author': 'Apple Inc.', 'moddate': '2015-06-24T15:13:57-07:00', 'title': 'iPod shuffle User Guide', 'trapped': '/False', 'source': '/Users/lancehester/Documents/ai_rag_user_guide/data/ipod_shuffle_2015_user_guide.pdf', 'total_pages': 30, 'page': 4, 'page_label': '5'}, page_content='Press\xa0and\xa0hold\xa0Play/Pause\xa0 \xa0until\xa0the\xa0status\xa0light\xa0\nblinks\xa0orange\xa0three\xa0times.\nRepeat\xa0to\xa0unlock\xa0the\xa0buttons.\nReset iPod\xa0shuffle\n(if\xa0iPod\xa0shuffle\xa0isn’t\xa0responding\xa0or\xa0the\xa0status\xa0light\xa0is\xa0\nsolid\xa0red)\nTurn\xa0iPod\xa0shuffle\xa0off,\xa0wait\xa010\xa0sec onds,\xa0then\xa0turn\xa0it\xa0back\xa0\non\xa0again.\nFind the iPod\xa0shuffle serial number Look\xa0under\xa0the\xa0clip\xa0on\xa0iPod\xa0shuffle.\xa0Or,\xa0in\xa0iT unes\xa0(with\xa

---

## Configuring the Model

We'll be using `Ollama` to load the local model in memory. After creating the model, we can invoke it with a question to get the response back. The flow looks like the following:

**question---model---response**


We use `ChatOllama`, the chat-model class that lets us ask questions and get responses, much like a chatbot.

```
ChatOllama(model=MODEL, temperature=0)
```

* model = MODEL = "llama3.2"

* temperature controls how creative the model is. A temperature of 0 in ChatOllama means the model will always choose the most likely next token at each step of text generation.
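
To see why temperature changes how adventurous the output is, here is a small sketch of temperature-scaled softmax over made-up scores for three candidate tokens. Treating temperature 0 as a pure "pick the top token" rule is an assumption made for illustration; the function name `next_token_probs` and the scores are hypothetical.

```python
import math


def next_token_probs(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Turn raw scores into probabilities; temperature 0 is treated as greedy."""
    if temperature == 0:
        best = max(logits, key=logits.get)
        return {tok: (1.0 if tok == best else 0.0) for tok in logits}
    scaled = {tok: value / temperature for tok, value in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - top) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


toy_logits = {"Apple": 2.0, "Sony": 1.0, "banana": 0.1}
greedy = next_token_probs(toy_logits, temperature=0)
warm = next_token_probs(toy_logits, temperature=1.0)
hot = next_token_probs(toy_logits, temperature=5.0)
print(greedy, warm, hot, sep="\n")
```

As the temperature rises, the probabilities flatten and less likely tokens get sampled more often. For question answering over a user guide, the low end of the range is usually what I want.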



In [9]:
model = ChatOllama(model=MODEL, temperature=0)
model.invoke("who is the president of the united states?") #this tests general knowledge of the model.


AIMessage(content="I'm not aware of my current status or the current President of the United States. My knowledge cutoff is December 2023, but I do not have real-time information. As of my knowledge cutoff, Joe Biden was the President of the United States. However, please note that this information may have changed since then.", additional_kwargs={}, response_metadata={'model': 'llama3.2', 'created_at': '2025-06-18T22:32:13.570735Z', 'done': True, 'done_reason': 'stop', 'total_duration': 1007725875, 'load_duration': 28959750, 'prompt_eval_count': 34, 'prompt_eval_duration': 202272209, 'eval_count': 65, 'eval_duration': 775741166, 'model_name': 'llama3.2'}, id='run--d2d59281-a65c-4acc-a93f-8997f77c3114-0', usage_metadata={'input_tokens': 34, 'output_tokens': 65, 'total_tokens': 99})

---

## Parsing the model's response

A parser is a LangChain class that converts the model's response into the form the user expects to see.

Remember, LangChain chains processes together so that the output of one process becomes the input of the next. For this project, the process flow looks like this:

**question** ---> `<start chain>` model--response--parser `<end chain>` ----> **answer**
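
The `|` syntax can look like magic, so here is a toy sketch of how piping steps together can be built with Python's `__or__` method. The `Step` class and the fake model and parser are hypothetical stand-ins; this is not LangChain's actual implementation.

```python
class Step:
    """Wrap a function so that steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Build a new step that runs self first, then feeds the result to other.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stand-ins for the model and the parser.
fake_model = Step(lambda question: {"content": f"echo: {question}", "metadata": {"tokens": 3}})
fake_parser = Step(lambda response: response["content"])

toy_chain = fake_model | fake_parser
result = toy_chain.invoke("hello")
print(result)
```

The fake parser drops the metadata and keeps only the text, which mirrors the role the string parser plays in this notebook.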


In [None]:
# See how the parser strips the unwanted information from the response. 
parser = StrOutputParser()

chain = model | parser
print(chain.invoke("who is the president of the united states?"))


I'm not aware of my current status or the current President of the United States. My knowledge cutoff is December 2023, but I do not have real-time information. As of my knowledge cutoff, Joe Biden was the President of the United States. However, please note that this information may have changed since then.


## Setting up an AI Prompt

Now that we have the flow, I can pass context and up-to-date information (i.e., my iPod PDF data) along with the chatbot question to form a `prompt`.

I create a `prompt template`, which makes the prompt easy to automate and share with others.
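
At its core, a prompt template is a string with named placeholders that get filled in at run time. The sketch below shows that idea with plain `str.format`; the helper `fill` and the sample text are hypothetical, and LangChain's template classes add validation and chaining support on top of this.

```python
toy_template = "Context: {context}\n\nQuestion: {question}"


def fill(template: str, **values: str) -> str:
    """Substitute named placeholders; raises KeyError if one is missing."""
    return template.format(**values)


filled = fill(toy_template, context="The clip is on the back.", question="Where is the clip?")
print(filled)
```

Because the placeholders are named, the same template can be reused for every question while only the context and question change.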

In [14]:
template = """
You are an assistant that provides answers to questions based on a given context.

Answer the question based on the context.
If you can't answer the question, reply "I do not know".

Be as concise as possible and go straight to the point.

Context: {context}

Question: {question}
"""

prompt = PromptTemplate.from_template(template=template)
print(prompt.format(context="Here is some context", question="here is a question")) # this just shows the prompt



You are an assistant that provides answers to questions based on a given context.

Answer the question based on the context.
If you can't answer the question, reply "I do not know".

Be as concise as possible and go straight to the point.

Context: Here is some context

Question: here is a question



## Add Prompt to the Previous Chain

In [16]:
# Here is a quick example
chain = prompt | model | parser
chain.invoke({
    "context": "Darth Vader is Luke's father",
    "question": "Who is Luke's father"
})

'Darth Vader.'

## Adding the Retriever to the Chain

I can connect the retriever to the chain to get the context from the vector store.

In [18]:
chain = (
  {
    "context": itemgetter("question") | retriever,
    "question": itemgetter("question"),
  }
  | prompt
  | model
  | parser
)
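
The dictionary at the start of the chain can be read as "build the prompt inputs from the incoming question". Here is a plain-Python picture of that step, using a hypothetical `fake_retriever` with canned chunks. Joining the chunk text with blank lines is my own assumption for readability; the chain above hands the retriever's documents to the prompt as they are.

```python
from operator import itemgetter


def fake_retriever(question: str) -> list[str]:
    """Hypothetical retriever that returns canned chunks."""
    return ["Slide the power switch to turn on.", "The status light blinks green."]


def run_mapping(inputs: dict) -> dict:
    """Mimic the dict step: each key is computed from the same incoming inputs."""
    get_question = itemgetter("question")
    return {
        "context": "\n\n".join(fake_retriever(get_question(inputs))),
        "question": get_question(inputs),
    }


prompt_inputs = run_mapping({"question": "How do I turn it on?"})
print(prompt_inputs)
```

The result has exactly the two keys the prompt template expects, `context` and `question`, so it can flow straight into the next step.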

## Finally, I Pull It All Together



In [19]:
questions = [
  "What is an IPOD Shuffle?",
  "How do I turn on the IPOD Shuffle?",
  "How do I play a song?"
]

for question in questions:
  print(f"Question: {question}")
  print(f"Answer: {chain.invoke({'question': question})}")
  print("***************************\n")

Question: What is an IPOD Shuffle?
Answer: An iPod shuffle is a portable music player produced by Apple Inc. It allows users to create playlists and listen to songs, audiobooks, or podcasts on the go.
***************************

Question: How do I turn on the IPOD Shuffle?
Answer: To turn on the iPod shuffle, connect it to a USB port on your computer. The battery may need to be recharged. Turn the iPod shuffle off, wait 10 seconds, and then turn it back on again. If the iPod shuffle won't turn on or respond, try connecting it to a USB port and restoring its software.
***************************

Question: How do I play a song?
Answer: I do not know. The provided context does not mention how to play a song on the iPod shuffle. It only provides information on troubleshooting and managing music content.
***************************

