# Installing Required Libraries

In [None]:
!pip install langchain langchain-community langchain-huggingface sentence-transformers pypdf faiss-cpu


Collecting langchain-community
  Downloading langchain_community-0.4.1-py3-none-any.whl.metadata (3.0 kB)
Collecting langchain-huggingface
  Downloading langchain_huggingface-1.0.1-py3-none-any.whl.metadata (2.1 kB)
Collecting pypdf
  Downloading pypdf-6.3.0-py3-none-any.whl.metadata (7.1 kB)
Collecting faiss-cpu
  Downloading faiss_cpu-1.13.0-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (7.7 kB)
INFO: pip is looking at multiple versions of langchain-community to determine which version is compatible with other requirements. This could take a while.
Collecting langchain-community
  Downloading langchain_community-0.4-py3-none-any.whl.metadata (3.0 kB)
  Downloading langchain_community-0.3.31-py3-none-any.whl.metadata (3.0 kB)
Collecting requests<3,>=2 (from langchain)
  Downloading requests-2.32.5-py3-none-any.whl.metadata (4.9 kB)
Collecting dataclasses-json<0.7.0,>=0.6.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (2

# Importing Required Libraries

In [None]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings


# Loading PDF Document

In [None]:
from google.colab import drive

drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [None]:
loader = PyPDFLoader("/content/drive/MyDrive/Document Files for Colab/FAISS Published Paper.pdf")
documents = loader.load()

print(len(documents))
print(documents[0].page_content[:500])


25
THEFAISSLIBRARY
Matthijs Douze
FAIR, Meta
Alexandr Guzhva
Zilliz
Chengqi Deng
DeepSeek
Jeff Johnson
FAIR, Meta
Gergely Szilvasy
FAIR, Meta
Pierre-Emmanuel Mazar´e
FAIR, Meta
Maria Lomeli
FAIR, Meta
Lucas Hosseini
Skip Labs
Herv´e J´egou
FAIR, Meta
Abstract
Vector databases typically manage large collections of
embedding vectors. As AI applications are growing
rapidly, the number of embeddings that need to be
stored and indexed is increasing. The Faiss library is
dedicated to vector similarity se


# Split into Chunks (Recursive Chunking)

In [None]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=200
)

docs = text_splitter.split_documents(documents)
print("Total chunks:", len(docs))


Total chunks: 188
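
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a deliberate simplification and not the algorithm `RecursiveCharacterTextSplitter` uses: the real splitter first tries to break on separators such as paragraphs, lines, and words. The function name `sliding_window_chunks` and the sample string are hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap (simplified sketch)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each window repeats the last 4 characters of the previous one, which is the same idea as the 200-character overlap above: text near a chunk boundary appears in both neighbouring chunks.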


# Embedding model

In [None]:
# HuggingFaceEmbeddings was imported above from langchain_huggingface; it replaces the
# deprecated SentenceTransformerEmbeddings alias in langchain_community.
embedding_model = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")


# Build FAISS Vector Store

In [None]:
vectorstore = FAISS.from_documents(docs, embedding_model)


# Saving Vector Store to Disk
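This creates two files in the `faiss_store` directory:

index.faiss → the serialized FAISS index (the embedding vectors)

index.pkl → the pickled documents, metadata, and index-to-document ID mapping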


In [None]:
vectorstore.save_local("faiss_store")


# Loading the Vector Store Back from Disk

In [None]:
vectorstore = FAISS.load_local(
    "faiss_store",
    embedding_model,
    allow_dangerous_deserialization=True
)


# Perform Similarity Search

In [None]:
query = "Is FAISS a database?"
# Return the 3 chunks whose embeddings are closest to the query embedding
result = vectorstore.similarity_search(query, k=3)

In [None]:
print(result)

[Document(id='aab370b4-5757-4ee6-bb4b-dae483401b4b', metadata={'producer': 'pikepdf 8.15.1', 'creator': 'arXiv GenPDF (tex2pdf:e76afa9)', 'creationdate': '', 'author': 'Matthijs Douze; Alexandr Guzhva; Chengqi Deng; Jeff Johnson; Gergely Szilvasy; Pierre-Emmanuel Mazaré; Maria Lomeli; Lucas Hosseini; Hervé Jégou', 'doi': 'https://doi.org/10.48550/arXiv.2401.08281', 'license': 'http://creativecommons.org/licenses/by/4.0/', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.28 (TeX Live 2025) kpathsea version 6.4.1', 'title': 'The Faiss library', 'trapped': '/False', 'arxivid': 'https://arxiv.org/abs/2401.08281v4', 'source': '/content/drive/MyDrive/Document Files for Colab/FAISS Published Paper.pdf', 'total_pages': 25, 'page': 0, 'page_label': '1'}, page_content='tains a variety of indexing methods that commonly\ninvolve a chain of components (preprocessing, com-\npression, non-exhaustive search, etc.). In this paper,\nwe show that there exists a choice between a dozen\nind

In [None]:
for r in result:
    print("\n----- RESULT -----")
    print(r.page_content)
    print("Metadata:", r.metadata)



----- RESULT -----
tains a variety of indexing methods that commonly
involve a chain of components (preprocessing, com-
pression, non-exhaustive search, etc.). In this paper,
we show that there exists a choice between a dozen
index types, and the optimal one usually depends on
the problem’s constraints.
To summarize what Faiss isnot: Faiss does not ex-
tract features – it only indexes embeddings that have
been extracted by a different mechanism; Faiss is not
a service – it only provides functions that are run as
part of the calling process on the local machine; Faiss
is not a database – it does not provide concurrent write
access, load balancing, sharding, transaction manage-
ment or query optimization. The scope of the library
1
arXiv:2401.08281v4  [cs.LG]  23 Oct 2025
Metadata: {'producer': 'pikepdf 8.15.1', 'creator': 'arXiv GenPDF (tex2pdf:e76afa9)', 'creationdate': '', 'author': 'Matthijs Douze; Alexandr Guzhva; Chengqi Deng; Jeff Johnson; Gergely Szilvasy; Pierre-Emmanuel Mazaré
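
Conceptually, a similarity search embeds the query and ranks the stored vectors by closeness to it. The sketch below shows that idea as a brute-force search over toy 2-D vectors using only the standard library. It is an illustration, not how FAISS is implemented: the helper names and toy vectors are hypothetical, and cosine similarity is an assumption made for the example (the metric the real store uses depends on how it is configured).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return (index, score) pairs for the k vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [(i, score) for score, i in scored[:k]]


# Toy "embeddings"; real ones have hundreds of dimensions.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
toy_query = [1.0, 0.2]
neighbours = top_k(toy_query, toy_vectors, k=2)
print(neighbours)
```

A brute-force scan like this compares the query against every stored vector, which is exact but slow for large collections; the index types discussed in the Faiss paper exist to trade some of that accuracy for speed and memory.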

# Building a KNN Retriever (for RAG)

In [None]:
from langchain_community.retrievers import KNNRetriever

In [None]:
# KNNRetriever embeds every chunk again and keeps the vectors in memory;
# it does not reuse the FAISS vector store built above.
retriever = KNNRetriever.from_documents(docs, embedding_model)

In [None]:
query = "Is FAISS a database?"
retrieved_docs = retriever.invoke(query)

In [None]:
for r in retrieved_docs:
    print("\n----- RESULT -----")
    print(r.page_content)
    print("Metadata:", r.metadata)


----- RESULT -----
tains a variety of indexing methods that commonly
involve a chain of components (preprocessing, com-
pression, non-exhaustive search, etc.). In this paper,
we show that there exists a choice between a dozen
index types, and the optimal one usually depends on
the problem’s constraints.
To summarize what Faiss isnot: Faiss does not ex-
tract features – it only indexes embeddings that have
been extracted by a different mechanism; Faiss is not
a service – it only provides functions that are run as
part of the calling process on the local machine; Faiss
is not a database – it does not provide concurrent write
access, load balancing, sharding, transaction manage-
ment or query optimization. The scope of the library
1
arXiv:2401.08281v4  [cs.LG]  23 Oct 2025
Metadata: {'producer': 'pikepdf 8.15.1', 'creator': 'arXiv GenPDF (tex2pdf:e76afa9)', 'creationdate': '', 'author': 'Matthijs Douze; Alexandr Guzhva; Chengqi Deng; Jeff Johnson; Gergely Szilvasy; Pierre-Emmanuel Mazaré

In [None]:
# Initialise the accumulator so the cell does not depend on state left by other cells
context = ""
for r in retrieved_docs:
    context = context + r.page_content
context = context.strip('\n')
print(context)

work in modern systems, is trained so that dis-
tances between embeddings are aligned with the
task to perform.
• The vector index performs neighbor search
among the embedding vectors as accurately as
possible w.r.t. exact search results given the
agreed distance metric.
Faissis a library for ANNS. The core library is a
collection of C++ source files without external depen-
dencies. Faiss also provides a comprehensive Python
wrapper for its C++ core. It is designed to be used
both from simple scripts and as a building block of a
DBMS. In contrast with other libraries that focus on a
single indexing method, Faiss is a toolbox that con-
tains a variety of indexing methods that commonly
involve a chain of components (preprocessing, com-
pression, non-exhaustive search, etc.). In this paper,tains a variety of indexing methods that commonly
involve a chain of components (preprocessing, com-
pression, non-exhaustive search, etc.). In this paper,
we show that there exists a choice between a d
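
In the output above, chunks are glued together with no separator ("In this paper,tains a variety..."), so chunk boundaries are lost. A minimal sketch of one alternative is shown below: join chunk texts with a blank line and drop exact duplicates. The function name `build_context` and the toy chunks are hypothetical, and the sketch works on plain strings rather than `Document` objects.

```python
def build_context(chunks, separator="\n\n"):
    """Join chunk texts with a visible separator, skipping empty and repeated chunks."""
    seen = set()
    unique = []
    for chunk in chunks:
        text = chunk.strip()
        if text and text not in seen:
            seen.add(text)
            unique.append(text)
    return separator.join(unique)


toy_chunks = [
    "In this paper,",
    "tains a variety of indexing methods",
    "In this paper,",  # exact duplicate, will be dropped
]
toy_context = build_context(toy_chunks)
print(toy_context)
```

With retrieved documents, the same function could be called as `build_context(r.page_content for r in retrieved_docs)`. Note that only exact duplicates are removed; text repeated because of `chunk_overlap` still appears in both neighbouring chunks.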

In [None]:
print(retrieved_docs)

[Document(metadata={'producer': 'pikepdf 8.15.1', 'creator': 'arXiv GenPDF (tex2pdf:e76afa9)', 'creationdate': '', 'author': 'Matthijs Douze; Alexandr Guzhva; Chengqi Deng; Jeff Johnson; Gergely Szilvasy; Pierre-Emmanuel Mazaré; Maria Lomeli; Lucas Hosseini; Hervé Jégou', 'doi': 'https://doi.org/10.48550/arXiv.2401.08281', 'license': 'http://creativecommons.org/licenses/by/4.0/', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.28 (TeX Live 2025) kpathsea version 6.4.1', 'title': 'The Faiss library', 'trapped': '/False', 'arxivid': 'https://arxiv.org/abs/2401.08281v4', 'source': '/content/drive/MyDrive/Document Files for Colab/FAISS Published Paper.pdf', 'total_pages': 25, 'page': 0, 'page_label': '1'}, page_content='tains a variety of indexing methods that commonly\ninvolve a chain of components (preprocessing, com-\npression, non-exhaustive search, etc.). In this paper,\nwe show that there exists a choice between a dozen\nindex types, and the optimal one usually depen

# Integrating FAISS Vector Store with LLM to complete a RAG Pipeline

## Full LangChain Integration of the LLM and Running Inference

## Hugging Face API Endpoint


#### Setting Up the Environment Variable

In [None]:
import getpass
import os

# Read the token without echoing it to the notebook output
os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass.getpass(
    "Enter your Hugging Face API token: "
)

#### Installing Required Packages for LangChain Hugging Face Models

In [None]:
!pip install -qU  langchain-huggingface text-generation transformers google-search-results numexpr langchainhub sentencepiece jinja2 bitsandbytes accelerate

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain 0.3.27 requires langchain-core<1.0.0,>=0.3.72, but you have langchain-core 1.0.7 which is incompatible.

#### Instantiating the Hugging Face Model via a Hugging Face Endpoint

In [None]:
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    repo_id="deepseek-ai/DeepSeek-R1-0528",
    task="text-generation",
    max_new_tokens=512,
    do_sample=False,
    repetition_penalty=1.03,
    provider="auto"
)

chat_model = ChatHuggingFace(llm=llm)

#### Running Inference

In [None]:
def retrieve_context(query):
    """Retrieve the chunks most similar to the query and concatenate their text."""
    retrieved_docs = retriever.invoke(query)
    ret_data = ""
    for r in retrieved_docs:
        ret_data = ret_data + r.page_content
    return ret_data.strip('\n')

user_query = "Which is the fastest algorithm in FAISS for retrieval?"
context = retrieve_context(user_query)


In [None]:
from langchain_core.messages import (
    HumanMessage,
    SystemMessage,
)

messages = [
    SystemMessage(content="You are a helpful assistant. Answer the question below based on the context provided."),
    HumanMessage(
        content=f"Question: {user_query}\nContext: {context}"
    ),
]

ai_msg = chat_model.invoke(messages)

In [None]:
print(ai_msg.content)

<think>
Hmm, the user is asking about the fastest algorithm for retrieval in FAISS. I need to look carefully at the context provided since FAISS offers multiple index types with different speed/accuracy tradeoffs.

The context emphasizes that FAISS is a toolbox with various indexing methods, and explicitly states there's no single optimal index - the best choice depends on the problem's constraints. It mentions: "a dozen index types, and the optimal one usually depends on the problem’s constraints."

While listing specific algorithms like IVF, HNSW and Flat, the text actually avoids declaring any single method as universally fastest. This is important because each index type has different performance characteristics depending on factors like dataset size, available memory, and required accuracy.

The user might be hoping for a simple answer like "HNSW is always fastest", but the context suggests we should be more nuanced. I recall earlier passages confirm HNSW does offer very good spee
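
The response above begins with a `<think>` block holding the model's reasoning before the final answer. If only the answer is wanted, that block can be stripped before display. The sketch below assumes the reasoning is closed with a matching `</think>` tag (only the opening tag is visible in the truncated output above); `strip_reasoning` and `sample_reply` are hypothetical names and data used for illustration.

```python
import re

# Non-greedy match of one complete <think>...</think> block, across newlines.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Remove <think> blocks; if a block was never closed, drop everything after it."""
    cleaned = THINK_BLOCK.sub("", text)
    # An unclosed <think> means generation stopped mid-reasoning: nothing after it is an answer.
    cleaned = cleaned.split("<think>", 1)[0]
    return cleaned.strip()


sample_reply = (
    "<think>\nThe context lists several index types.\n</think>\n\n"
    "It depends on the constraints."
)
final_answer = strip_reasoning(sample_reply)
print(final_answer)
```

In the notebook this could be applied as `strip_reasoning(ai_msg.content)`. If the reasoning uses up the whole `max_new_tokens` budget, the function returns an empty string, which may be a sign that the token limit should be raised.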