# RAG with LangChain, Ollama, and FAISS Vector Store

## PDF Dataset
https://github.com/aydiegithub/rag-system-ollama/tree/dac3b4563a66b8b11962aaa08349ba5138be396a/rag-dataset-main

![Document Ingestion](flowcharts/Flowchart.png)

In [1]:
# pip install -U langchain-community faiss-cpu langchain-huggingface pymupdf tiktoken langchain-ollama python-dotenv

In [2]:
import warnings
warnings.filterwarnings("ignore")

In [3]:
import os
from dotenv import load_dotenv

os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'
load_dotenv()

True

## Document loader

In [4]:
from langchain_community.document_loaders import PyMuPDFLoader

loader = PyMuPDFLoader("rag-dataset-main/machine-learning/MACHINE LEARNING(R17A0534).pdf")

docs = loader.load()

In [5]:
doc = docs[10]

In [6]:
print(doc.page_content)

6 
 
covers less distance physically than by train because a plane is unrestricted. Similarly, in chess, the 
concept of distance depends on the piece used – for example, a Bishop can move diagonally.   Thus, 
depending on the entity and the mode of travel, the concept of distance can be experienced differently. 
The distance metrics commonly used are Euclidean, Minkowski, Manhattan, and Mahalanobis. 
 
 
Distance is applied through the concept of neighbours and exemplars. Neighbours are points in 
proximity with respect to the distance measure expressed through exemplars. Exemplars are 
either centroids that ﬁnd a centre of mass according to a chosen distance metric or medoids that ﬁnd 
the most centrally located data point. The most commonly used centroid is the arithmetic mean, which 
minimises squared Euclidean distance to all other points. 
 
Notes: 
 
The centroid represents the geometric centre of a plane figure, i.e., the arithmetic mean 
position of all the points in the figu

## Load all the pdfs

In [7]:
import os

pdfs = []

for root, dirs, files in os.walk('rag-dataset-main'):
    # print(root, dirs, files)
    for file in files:
        if file.endswith('.pdf'):
            pdfs.append(os.path.join(root, file))

pdfs

['rag-dataset-main/machine-learning/CSIT_(R22)_3-2_MACHINE LEARNING_DIGITAL NOTES.pdf',
 'rag-dataset-main/machine-learning/NotesOnMachineLearningForBTech-1.pdf',
 'rag-dataset-main/machine-learning/ML_notes_22.pdf',
 'rag-dataset-main/machine-learning/notes.pdf',
 'rag-dataset-main/machine-learning/lec01.pdf',
 'rag-dataset-main/machine-learning/COS324_Course_Notes.pdf',
 'rag-dataset-main/machine-learning/2505.03861v1.pdf',
 'rag-dataset-main/machine-learning/lecturenotes.pdf',
 'rag-dataset-main/machine-learning/6_390_lecture_notes_spring24.pdf',
 'rag-dataset-main/machine-learning/MACHINE LEARNING(R17A0534).pdf',
 'rag-dataset-main/machine-learning/01_ml-overview_notes.pdf']

In [8]:
docs = []

for pdf in pdfs:
    loader = PyMuPDFLoader(pdf)
    pages = loader.load()
    
    docs.extend(pages)

In [9]:
len(docs) # total number of pages loaded across all PDFs

1440

## Document Chunking

In [10]:
# pip install -qU langchain-text-splitters

In [11]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 100)

chunks = text_splitter.split_documents(docs)

In [12]:
len(docs), len(chunks)

(1440, 3371)
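
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a much simpler splitter. The sketch below is not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to break on separators such as paragraph and line boundaries, which is why actual chunks are usually shorter than 1000 characters. The function name `split_with_overlap` and the sample text are made up for illustration.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical 2500-character sample standing in for one long page.
sample = "".join(str(i % 10) for i in range(2500))
pieces = split_with_overlap(sample, chunk_size=1000, chunk_overlap=100)

print([len(p) for p in pieces])
print(pieces[0][-100:] == pieces[1][:100])  # the tail of one chunk is the head of the next
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.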

In [13]:
print(chunks[100].page_content)

CSIT DEPT-R22-MACHINE LEARNING 
48 
 
The multi-layer perceptron model is also known as the Backpropagation algorithm, which 
executes in two stages as follows: 
 
Forward Stage: Activation functions start from the input layer in the forward stage 
and terminate on the output layer. 
 
Backward Stage: In the backward stage, weight and bias values are modified as per 
the model's requirement. In this stage, the error between actual output and demanded 
originated backward on the output layer and ended on the input layer. 
Hence, a multi-layered perceptron model has considered as multiple artificial neural networks 
having various layers in which activation function does not remain linear, similar to a single 
layer perceptron model. Instead of linear, activation function can be executed as sigmoid, 
TanH, ReLU, etc., for deployment. 
A multi-layer perceptron model has greater processing power and can process linear and non-


In [14]:
len(chunks[100].page_content)

938

In [15]:
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4o-mini")

In [16]:
len(encoding.encode(chunks[2].page_content))

164

In [17]:
len(encoding.encode(docs[2].page_content))

314

## Document Vector Embedding

In [18]:
from langchain_ollama import OllamaEmbeddings

import faiss
from langchain_community.vectorstores import FAISS
from langchain_community.docstore.in_memory import InMemoryDocstore # keeps the Document objects in RAM; the FAISS index holds the vectors

In [19]:
embeddings = OllamaEmbeddings(model = 'nomic-embed-text', base_url = "http://localhost:11434")

In [20]:
single_vector = embeddings.embed_query("Hello there I am Aydie.")

In [21]:
len(single_vector), single_vector[:10]

(768,
 [-0.0056480016,
  0.012251444,
  -0.1503848,
  -0.051890645,
  0.02039635,
  0.020038703,
  -0.015681038,
  -0.038975507,
  -0.020349437,
  0.017903127])

In [22]:
index = faiss.IndexFlatL2(len(single_vector))
index, index.ntotal, index.d

(<faiss.swigfaiss.IndexFlatL2; proxy of <Swig Object of type 'faiss::IndexFlatL2 *' at 0x10fc41770> >,
 0,
 768)
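
Conceptually, a flat L2 index stores every vector as-is and answers a query by comparing it against all of them using squared Euclidean distance. The pure-Python sketch below mimics that exhaustive scan on toy 2-dimensional vectors; it is an illustration of the idea, not FAISS code, and `flat_l2_search` and the sample vectors are hypothetical.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(stored, query, k):
    """Return the k closest (distance, position) pairs by scanning every stored vector."""
    scored = sorted((squared_l2(vec, query), pos) for pos, vec in enumerate(stored))
    return scored[:k]


# Toy vectors standing in for the 768-dimensional embeddings.
stored_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
nearest = flat_l2_search(stored_vectors, query=[0.9, 0.1], k=2)

for distance, position in nearest:
    print(position, round(distance, 2))
```

An exhaustive scan is exact but its cost grows linearly with the number of stored vectors, which is acceptable for a few thousand chunks.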

In [23]:
vector_store = FAISS(
    embedding_function = embeddings,
    index = index,
    docstore = InMemoryDocstore(),
    index_to_docstore_id = {}
)

In [24]:
vector_store

<langchain_community.vectorstores.faiss.FAISS at 0x10fc6b620>

In [25]:
docs

[Document(metadata={'producer': 'www.ilovepdf.com', 'creator': 'Microsoft® Word 2016', 'creationdate': '2024-12-06T07:02:22+00:00', 'source': 'rag-dataset-main/machine-learning/CSIT_(R22)_3-2_MACHINE LEARNING_DIGITAL NOTES.pdf', 'file_path': 'rag-dataset-main/machine-learning/CSIT_(R22)_3-2_MACHINE LEARNING_DIGITAL NOTES.pdf', 'total_pages': 120, 'format': 'PDF 1.5', 'title': '', 'author': 'MRCETIT', 'subject': '', 'keywords': '', 'moddate': '2024-12-06T07:02:22+00:00', 'trapped': '', 'modDate': 'D:20241206070222Z', 'creationDate': "D:20241206070222+00'00'", 'page': 0}, page_content='DIGITAL NOTES \nOF \nMACHINE LEARNING \n[R22A6602] \n \n                            B. TECH III YEAR - II SEM \n(2024-2025) \n \n \n \n \n \n \n   PREPARED BY            \n                                           P.HARIKRISHNA \n \n             DEPARTMENT OF COMPUTER SCIENCE & INFORMATION TECHNOLOGY \n \n           MALLA REDDY COLLEGE OF ENGINEERING & TECHNOLOGY \n(Autonomous Institution – UGC, Govt. of 

In [None]:
ids = vector_store.add_documents(documents = chunks)

In [None]:
ids[:10], len(ids)

In [None]:
vector_store.index_to_docstore_id

In [None]:
# vector_store.save_local('machine_learning_vdb')

In [None]:
# new_vec_store = FAISS.load_local('machine_learning_vdb', embeddings = embeddings, allow_dangerous_deserialization = True)

In [None]:
# new_vec_store.index_to_docstore_id

## Retrieval

![Retrieval](flowcharts/Flowchart-2.png)

In [None]:
question = "What Is Machine Learning?"

In [None]:
docs = vector_store.search(query = question, search_type = 'similarity')

for doc in docs:
    print(doc.page_content)
    print("\n\n")

In [None]:
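# MMR: fetch the 100 nearest chunks, then keep 3 of them.
# lambda_mult = 1 ranks purely by relevance (minimum diversity); lower it toward 0 for more diverse results.
retriever = vector_store.as_retriever(
    search_type = "mmr",
    search_kwargs = {'k': 3, 'fetch_k': 100, 'lambda_mult': 1},
)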

In [None]:
docs = retriever.invoke(question)
for doc in docs:
    print(doc.page_content)
    print("\n\n")
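
To see what `lambda_mult` changes in an MMR (maximal marginal relevance) search, here is a simplified pure-Python sketch of the selection loop. It is an illustration of the idea under the usual MMR scoring rule, not LangChain's implementation; `mmr_select` and the toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query, candidates, k, lambda_mult):
    """Greedily pick k candidate positions, trading relevance against redundancy."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            # Similarity to the closest already-selected item (0.0 if none selected yet).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query_vec = [1.0, 0.0]
# Candidates 0 and 1 are near-duplicates; candidate 2 is less relevant but different.
candidate_vecs = [[1.0, 0.1], [1.0, 0.12], [0.6, -0.8]]

print(mmr_select(query_vec, candidate_vecs, k=2, lambda_mult=1.0))
print(mmr_select(query_vec, candidate_vecs, k=2, lambda_mult=0.5))
```

With `lambda_mult=1.0` the redundancy term vanishes and the two near-duplicates are returned; with `0.5` the second pick switches to the dissimilar candidate.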

In [None]:
docs = retriever.invoke("what is support vector machine")
for doc in docs:
    print(doc.page_content)
    print("\n\n")

In [None]:
docs = retriever.invoke("When should I use classification algorithm?")
for doc in docs:
    print(doc.page_content)
    print("\n\n")

In [None]:
docs = retriever.invoke("What is overfitting?")
for doc in docs:
    print(doc.page_content)
    print("\n\n")

## RAG with Llama 3.2 on Ollama

In [None]:
from langchain import hub
from langchain_core.output_parsers import StrOutputParser # extracts the model reply as a plain string
from langchain_core.runnables import RunnablePassthrough # forwards the question unchanged into the prompt
from langchain_core.prompts import ChatPromptTemplate # builds the prompt from the question and the retrieved context

from langchain_ollama import ChatOllama # chat model served by the local Ollama instance

In [None]:
model = ChatOllama(model = "llama3.2:1b", base_url = "http://localhost:11434/")

In [None]:
model.invoke('Hello')

In [None]:
prompt = hub.pull("rlm/rag-prompt") # pull a ready-made RAG prompt from the LangChain Hub

In [None]:
# Custom template; this overrides the prompt pulled from the hub in the previous cell.
prompt = """
    You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise. Make sure your answer is relevant to the question and it is answered from the context only.
    Question: {question}
    Context: {context}
    Answer:
"""

prompt = ChatPromptTemplate.from_template(prompt)

In [None]:
prompt

In [None]:
def format_docs(docs):
    return "\n\n".join([doc.page_content for doc in docs])

# print(format_docs(docs))

In [None]:
rag_chain = (
    {
        "context": retriever|format_docs,
        "question": RunnablePassthrough(),        
    }
    | prompt 
    | model
    | StrOutputParser()
)
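
The data flow through this chain can be spelled out with plain functions. The sketch below is a rough mental model only: `fake_retriever` and `fake_model` are hypothetical stand-ins for the FAISS retriever and `ChatOllama`, and the template is a shortened version of the prompt. The dictionary step feeds the same question to both branches, the template is filled in, the model answers, and the final step keeps only the text of the reply.

```python
def fake_retriever(question):
    # Hypothetical stand-in for the FAISS retriever: returns canned passages.
    return [
        "Machine learning is the study of algorithms that improve with data.",
        "Supervised learning uses labelled examples.",
    ]


def join_passages(passages):
    # Same role as format_docs: merge passages into one context string.
    return "\n\n".join(passages)


TEMPLATE = "Question: {question}\nContext: {context}\nAnswer:"


def fake_model(prompt_text):
    # Hypothetical stand-in for the chat model: echoes the first context line.
    context = prompt_text.split("Context: ", 1)[1]
    return {"content": context.split("\n", 1)[0]}


def run_chain(question):
    # 1. Fan the question out to the retrieval branch and the pass-through branch.
    inputs = {"context": join_passages(fake_retriever(question)), "question": question}
    # 2. Fill the prompt template.
    prompt_text = TEMPLATE.format(**inputs)
    # 3. Call the model, then 4. keep only the reply text (the output parser's job).
    message = fake_model(prompt_text)
    return message["content"]


answer = run_chain("what is machine learning?")
print(answer)
```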

In [None]:
question = "what is machine learning?"

In [None]:
output = rag_chain.invoke(question)

In [None]:
print(output)

In [None]:
question = "Give me 3 algorithm in machine learning?"
print(rag_chain.invoke(question))

In [None]:
question = "What is Overfitting and Underfitting"
print(rag_chain.invoke(question))