### FAISS-based RAG System with LangChain
This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) system using a FAISS vector store and LangChain, with support for custom documents (PDFs, tweets, news) and transformer-based embeddings (via HuggingFace or OpenAI). Key components include:

✅ Loading & preparing documents (Document)

✅ Chunking text using RecursiveCharacterTextSplitter

✅ Embedding text with HuggingFace or OpenAI

✅ Storing vectors in FAISS (flat indexes using L2 distance or inner-product similarity)

✅ Performing semantic search with similarity_search()

✅ Handling docstore correctly with InMemoryDocstore and valid index_to_docstore_id

✅ Saving and loading vector index locally

✅ Understanding sync vs async document loading (load() vs alazy_load())

✅ Building a complete rag_chain using:

* RunnablePassthrough for input

* Retriever + formatter

* Prompt + model + StrOutputParser for final generation

In [78]:
import os
import warnings
from dotenv import load_dotenv

import faiss
from sklearn.metrics.pairwise import cosine_similarity

from langchain_openai import OpenAIEmbeddings
from langchain_huggingface import HuggingFaceEmbeddings

from langchain_community.vectorstores import FAISS
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_text_splitters import RecursiveCharacterTextSplitter

from langchain_google_genai import ChatGoogleGenerativeAI

from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain import hub


warnings.filterwarnings('ignore')
load_dotenv()

# load_dotenv() has already copied the .env values into os.environ.
# Assigning os.environ[key] = os.getenv(key) raises a TypeError when a key
# is missing (the value is None), so only report which keys are absent.
required_keys = [
    'OPENAI_API_KEY', 'HF_TOKEN', 'GOOGLE_API_KEY', 'LANGSMITH_PROJECT',
    'LANGSMITH_API_KEY', 'LANGSMITH_ENDPOINT', 'LANGSMITH_TRACING',
]
missing_keys = [key for key in required_keys if not os.getenv(key)]
if missing_keys:
    print(f"Missing environment variables: {missing_keys}")

OpenAI Embeddings

In [79]:
embeddings_openai = OpenAIEmbeddings(model="text-embedding-3-large")

text = 'My name is Bhagwat Chate'

# query_result = embeddings_openai.embed_query(text=text)
# print(len(query_result))

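# `dimensions` is set on the embeddings object, not passed to embed_query():
# embeddings_openai_1024 = OpenAIEmbeddings(model="text-embedding-3-large", dimensions=1024)
# query_result = embeddings_openai_1024.embed_query(text)
# print(len(query_result))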

In [80]:
embeddings_hf = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
query_result = embeddings_hf.embed_query(text)
print(len(query_result))

384


In [81]:
docs = ["India is great", 
        "India own 1983,2011 ODI world cup", 
        "India own T20 world cup in 2007, 2025"]
query1 = "who is the 2011 odi world cup winner"
query2 = "For Nuclear power what is the policy of Republic of India"

docs_embed = embeddings_hf.embed_documents(docs)

query_embed = embeddings_hf.embed_query(query1)

cosine_similarity([query_embed], docs_embed)

array([[0.29896229, 0.68161931, 0.43833797]])

#### Meaning:
* 0.2989 = similarity between query and doc[0] ("India is great")
* 0.6816 = similarity between query and doc[1] ("India own 1983,2011 ODI world cup")
* 0.4383 = similarity between query and doc[2] ("India own T20 world cup in 2007, 2025")

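#### How to interpret these values:
Cosine similarity ranges from -1 to +1. The bands below are rough rules of thumb; useful thresholds depend on the embedding model and the data.
* +1        → same direction (closest possible match in meaning)
* 0.7+      → strong semantic similarity
* 0.5–0.7   → moderate similarity
* 0.3–0.5   → weak relevance
* < 0.3     → not very related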
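To make the score concrete, here is a minimal pure-Python sketch of the same cosine formula that `cosine_similarity` computes. The function name `cosine` and the toy 2-D vectors are illustrative assumptions, not real embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 2-D vectors (hypothetical, chosen only to show the extremes)
same = cosine([1.0, 2.0], [2.0, 4.0])          # same direction, different length
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])    # perpendicular directions
opposite = cosine([1.0, 2.0], [-1.0, -2.0])    # opposite direction
print(same, orthogonal, opposite)
```

Only the direction of the vectors matters: scaling a vector changes its length but not its cosine score.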


In [82]:
index = faiss.IndexFlatL2(384)

| Feature        | `Flat`             | `IVF` (Inverted File Index)        | `HNSW` (Graph-based Index)          |
| -------------- | ------------------ | ---------------------------------- | ----------------------------------- |
| Type of search | Exact              | Approximate (cluster-based)        | Approximate (graph-based traversal) |
| Speed          | Slow (linear scan) | Fast (search only in top clusters) | Very fast (graph walk)              |


| Dataset size (vectors) | Recommended index                 |
| ---------------------- | --------------------------------- |
| Up to 100K (1 lakh)    | `IndexFlatL2` or `IndexFlatIP`    |
| Up to 1M               | `IndexIVFFlat` or `IndexHNSWFlat` |
| > 1M                   | `IndexIVFPQ` or `IndexHNSWFlat`   |
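The "exact, linear scan" behaviour of a flat index can be sketched in a few lines of plain Python. This is an illustration of the idea only, not how FAISS is implemented; `flat_l2_search` and the toy vectors are hypothetical names and data.

```python
def flat_l2_search(vectors, query, k):
    """Exact k-nearest-neighbour search: compare the query with every stored vector."""
    scored = []
    for position, vec in enumerate(vectors):
        # Squared L2 distance between the stored vector and the query
        dist = sum((v - q) ** 2 for v, q in zip(vec, query))
        scored.append((dist, position))
    scored.sort()  # smallest distance first
    return scored[:k]

stored = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
hits = flat_l2_search(stored, query=[0.9, 1.2], k=2)
print([position for _, position in hits])
```

Every query touches every stored vector, which is why flat indexes give exact results but slow down as the dataset grows. IVF and HNSW trade a little accuracy for looking at only part of the data.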


In [83]:
vector_store = FAISS(embedding_function=embeddings_hf,
                     index=index,
                     docstore=InMemoryDocstore(),
                     index_to_docstore_id={})

In [84]:
docs

['India is great',
 'India own 1983,2011 ODI world cup',
 'India own T20 world cup in 2007, 2025']

In [85]:
vector_store.add_texts(['India is great',
 'India own 1983,2011 ODI world cup',
 'India own T20 world cup in 2007, 2025'])

['cda99d6f-0f32-4847-b507-968fbf74b639',
 '2286e033-cf4c-4108-ab3b-f43faa5aaa6b',
 '96a96dde-0800-4a73-bc67-e686c155c33f']

The returned list means that the three input texts were embedded and stored in the FAISS vector store, and that LangChain assigned each one a unique UUID-based document ID.

In [86]:
vector_store.index_to_docstore_id

{0: 'cda99d6f-0f32-4847-b507-968fbf74b639',
 1: '2286e033-cf4c-4108-ab3b-f43faa5aaa6b',
 2: '96a96dde-0800-4a73-bc67-e686c155c33f'}
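The mapping above is what links a row position in the FAISS index to a document in the docstore. A rough sketch of that two-step lookup, using plain dictionaries as stand-ins (the short IDs and the `resolve` helper are hypothetical, not LangChain API):

```python
# Hypothetical stand-ins for the two lookups a LangChain FAISS store keeps in sync
index_to_docstore_id = {0: "id-a", 1: "id-b", 2: "id-c"}
docstore = {
    "id-a": "India is great",
    "id-b": "India own 1983,2011 ODI world cup",
    "id-c": "India own T20 world cup in 2007, 2025",
}

def resolve(positions):
    """Turn index row positions (what a vector search returns) into stored texts."""
    return [docstore[index_to_docstore_id[pos]] for pos in positions]

# Suppose the vector search returned rows 1 and 0, best match first
print(resolve([1, 0]))
```

If the mapping and the docstore fall out of sync, a search can return a row position that has no document behind it, which is why the store is created with an empty `index_to_docstore_id={}` and an empty `InMemoryDocstore()` and then filled through `add_texts`.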

In [87]:
results = vector_store.similarity_search("About which country we are talking here?", k=1)
results

[Document(id='cda99d6f-0f32-4847-b507-968fbf74b639', metadata={}, page_content='India is great')]

Example: 2

In [88]:
# from uuid import uuid4
from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
)

document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
)

document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
)

document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
)

document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
)

document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
)

document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
)

document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]

In [89]:
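# Note: index2 is never passed to the store, so it has no effect here.
# FAISS.from_documents builds its own index (a flat L2 index by default).
index2 = faiss.IndexFlatIP(384)

vector_store2 = FAISS.from_documents(
    documents=documents,
    embedding=embeddings_hf
)

# from_documents has already added `documents`; this call adds a second copy,
# which is why duplicate results appear in the searches below.
vector_store2.add_documents(documents=documents)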

['bd2b860b-52be-4356-bf94-54e98e571e48',
 '711ce4d3-04fe-40fe-97e5-3f30aef8c230',
 '49493781-8a5f-4096-9989-5156b6770579',
 '90d1b791-418f-4f10-8324-7fee39517ddf',
 '3f7bc88d-e7ca-4384-a14f-2710369b1e0a',
 '2bc790dd-ee14-4f25-a936-a296406931db',
 '1aadc5f0-dbee-4bbd-923a-20440f4fb13b',
 '84f6e0b5-f3d4-4386-994e-208548b58ce8',
 '9389d2d9-5a6d-4700-8603-49395c893297',
 '58a99115-62b5-44d0-8c34-41a297a069d3']

In [90]:
vector_store2.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=3 #hyperparameter
)

[Document(id='18f6cb74-b663-466d-ba02-4c27eae85170', metadata={'source': 'tweet'}, page_content='Building an exciting new project with LangChain - come check it out!'),
 Document(id='49493781-8a5f-4096-9989-5156b6770579', metadata={'source': 'tweet'}, page_content='Building an exciting new project with LangChain - come check it out!'),
 Document(id='b409a583-b91f-4add-9762-ac4a1b5c97bd', metadata={'source': 'tweet'}, page_content='LangGraph is the best framework for building stateful, agentic applications!')]
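The first two hits carry the same text under different IDs because the same documents were added to the store twice. One possible way to drop such repeats after retrieval is sketched below; the `Doc` class is a minimal stand-in for LangChain's `Document`, and `dedupe_by_content` is a hypothetical helper.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (assumption for this sketch)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def dedupe_by_content(docs):
    """Keep the first occurrence of each page_content, preserving result order."""
    seen = set()
    unique = []
    for doc in docs:
        if doc.page_content not in seen:
            seen.add(doc.page_content)
            unique.append(doc)
    return unique

hits = [
    Doc("Building an exciting new project with LangChain - come check it out!"),
    Doc("Building an exciting new project with LangChain - come check it out!"),
    Doc("LangGraph is the best framework for building stateful, agentic applications!"),
]
unique_hits = dedupe_by_content(hits)
print(len(unique_hits))
```

Avoiding the double insert in the first place is cleaner, since duplicates also use up slots in the top `k` results.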

In [91]:
result = vector_store2.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=3, #hyperparameter
    filter={'source':{"$eq":"tweet"}}
)

In [92]:
result[0].page_content

'Building an exciting new project with LangChain - come check it out!'
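The `filter` argument restricts results to documents whose metadata matches. The sketch below shows the matching rule for plain equality and for the `$eq` operator only; it is a simplified illustration, and `matches_filter` is a hypothetical helper rather than the LangChain implementation.

```python
def matches_filter(metadata, flt):
    """Return True when every key in the filter matches the document metadata."""
    for key, condition in flt.items():
        if isinstance(condition, dict):
            # Operator form, e.g. {"$eq": "tweet"}; only $eq is handled in this sketch
            if "$eq" not in condition or metadata.get(key) != condition["$eq"]:
                return False
        elif metadata.get(key) != condition:
            # Plain form, e.g. {"source": "tweet"}
            return False
    return True

candidates = [
    {"source": "tweet"},
    {"source": "news"},
    {"source": "website"},
]
kept = [m for m in candidates if matches_filter(m, {"source": {"$eq": "tweet"}})]
print(kept)
```

To my understanding, the LangChain FAISS wrapper applies the filter to a pool of candidates fetched by the vector search, so a very selective filter may return fewer than `k` documents unless the candidate pool is enlarged.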

In [93]:
retriever = vector_store2.as_retriever(search_kwargs={"k":3})

In [94]:
retriever.invoke("Langchain provide abstraction layer to make working with LLM easy")

[Document(id='18f6cb74-b663-466d-ba02-4c27eae85170', metadata={'source': 'tweet'}, page_content='Building an exciting new project with LangChain - come check it out!'),
 Document(id='49493781-8a5f-4096-9989-5156b6770579', metadata={'source': 'tweet'}, page_content='Building an exciting new project with LangChain - come check it out!'),
 Document(id='b409a583-b91f-4add-9762-ac4a1b5c97bd', metadata={'source': 'tweet'}, page_content='LangGraph is the best framework for building stateful, agentic applications!')]

In [95]:
vector_store2.save_local('my_1st_faiss_index')

In [96]:
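# The docstore is saved with pickle, so loading it can execute arbitrary code.
# Set allow_dangerous_deserialization=True only for index folders you created or trust.
new_vector_store = FAISS.load_local(
    "my_1st_faiss_index", embeddings_hf, allow_dangerous_deserialization=True
)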

In [97]:
result = new_vector_store.similarity_search("what is langchain", k=3)
print(result)

[Document(id='18f6cb74-b663-466d-ba02-4c27eae85170', metadata={'source': 'tweet'}, page_content='Building an exciting new project with LangChain - come check it out!'), Document(id='49493781-8a5f-4096-9989-5156b6770579', metadata={'source': 'tweet'}, page_content='Building an exciting new project with LangChain - come check it out!'), Document(id='b409a583-b91f-4add-9762-ac4a1b5c97bd', metadata={'source': 'tweet'}, page_content='LangGraph is the best framework for building stateful, agentic applications!')]


In [98]:
print(result[0].page_content)

Building an exciting new project with LangChain - come check it out!


RAG for PDF

In [99]:
# 1. Synchronous Loading
# What it does: Loads all PDF pages at once in a blocking (synchronous) manner.
# When to use: Simple scripts, notebooks, or small PDFs where immediate loading is okay.
# Behavior: Your program waits until the entire PDF is read and parsed.
# Performance: Fine for small/medium files, but can block the event loop if you're inside an async app (like FastAPI).


from langchain_community.document_loaders import PyPDFLoader

FILE_PATH = r"E:\Learning\Agentic_AI_V2\2-Langchain-Basics\LLAMA2.pdf"

loader = PyPDFLoader(FILE_PATH)

pages = loader.load()

In [100]:
len(pages)

77

In [102]:
# 2. Asynchronous Loading
# What it does: Loads pages one at a time asynchronously using Python's async for.
# When to use: Inside async environments (e.g., FastAPI, LangServe, any async-based pipeline).
# Behavior: Doesn't block your app — allows other async tasks to run while loading the PDF.
# Performance: Scales better in web servers or multi-client environments where responsiveness matters.

pages = []
async for page in loader.alazy_load():
    pages.append(page)

pages

[Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-07-20T00:30:36+00:00', 'author': '', 'keywords': '', 'moddate': '2023-07-20T00:30:36+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'E:\\Learning\\Agentic_AI_V2\\2-Langchain-Basics\\LLAMA2.pdf', 'total_pages': 77, 'page': 0, 'page_label': '1'}, page_content='Llama 2: Open Foundation and Fine-Tuned Chat Models\nHugo Touvron∗ Louis Martin† Kevin Stone†\nPeter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra\nPrajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen\nGuillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller\nCynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou\nHakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev
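A top-level `async for` works in a notebook because a notebook cell may use `await` directly. In a plain Python script the same loop has to live inside a coroutine that is started with `asyncio.run`. A minimal sketch, where `fake_alazy_load` is a hypothetical stand-in for `loader.alazy_load()`:

```python
import asyncio

async def fake_alazy_load(n_pages):
    """Hypothetical stand-in for loader.alazy_load(): yields one page at a time."""
    for page_number in range(n_pages):
        await asyncio.sleep(0)  # hand control back to the event loop between pages
        yield f"page {page_number}"

async def collect_pages(n_pages):
    """Gather pages from the async generator into a list."""
    collected = []
    async for page in fake_alazy_load(n_pages):
        collected.append(page)
    return collected

# In a script, start the event loop explicitly (inside a notebook, use `await` instead)
pages_demo = asyncio.run(collect_pages(3))
print(pages_demo)
```

Each `await` between pages is a point where other tasks, such as other requests in a web server, get a chance to run.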

In [103]:
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

split_docs = splitter.split_documents(pages)

len(split_docs)

615
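The effect of `chunk_size` and `chunk_overlap` can be seen with a simplified fixed-width splitter. This is only a sketch of the overlap idea: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences, which this hypothetical `chunk_text` helper does not do.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows where consecutive windows share an overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

chunks = chunk_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

The overlap repeats the end of one chunk at the start of the next, so a sentence cut at a chunk boundary still appears whole in at least one chunk.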

In [104]:
split_docs[0]

Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-07-20T00:30:36+00:00', 'author': '', 'keywords': '', 'moddate': '2023-07-20T00:30:36+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'E:\\Learning\\Agentic_AI_V2\\2-Langchain-Basics\\LLAMA2.pdf', 'total_pages': 77, 'page': 0, 'page_label': '1'}, page_content='Llama 2: Open Foundation and Fine-Tuned Chat Models\nHugo Touvron∗ Louis Martin† Kevin Stone†\nPeter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra\nPrajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen\nGuillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller\nCynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou\nHakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev'

In [105]:
index = faiss.IndexFlatIP(384)

docstore = InMemoryDocstore({})
vector_store_pdf = FAISS(
    embedding_function=embeddings_hf,
    index=index,
    docstore=docstore,
    index_to_docstore_id={}
)

vector_store_pdf.add_documents(documents=split_docs)

['88281d69-1677-4519-8d41-26b9688ec0d0',
 'e4603822-e0ba-4afc-a842-9ecb012de691',
 'c26bfc18-76a5-4c7a-a22c-889920541bd0',
 '2c15acd9-e7b1-48d0-959e-af3a2a7cb9b5',
 '3239ffcb-740e-4538-bc82-5af16d7b42ad',
 'b7aba053-6178-407a-abc8-2b773db7beda',
 'e88727b8-4155-4a2e-919f-1fc061e57ffb',
 'cec1b999-4252-40bc-a215-96d7b7376ea5',
 '9a779cb4-35f7-4895-9760-7ca9c845d301',
 '25d21076-a9d8-47b9-82e2-97d6cfff233b',
 'dc18578f-2188-4c25-a603-d47a6074c6ad',
 '2bb6703f-3c21-4031-bba2-11103489bf32',
 '5326c680-35e2-42c7-a3f3-e55474157194',
 'c16393e3-286d-4dcb-b54e-826e8ab35bbd',
 '35b867b0-5602-44ae-8de1-fa4b908331eb',
 '13bdfd20-bc05-454e-b8cf-36c884f66866',
 'b26945df-1647-4a76-95ab-a2e41cff431a',
 '9b76722f-d35b-4a4b-b54b-6eed87a204a2',
 '0d976d95-d3df-4777-8c7b-32035e866667',
 '2727da08-f864-4f4f-bb99-2f8030240292',
 '6a1e080e-c479-4025-ac91-c1aceea8084f',
 '021f9ee1-9eef-4b11-aa18-c4a17daebe37',
 '7a9b5ba3-0572-49e5-be89-cb41c22e1fba',
 'd154b0a6-2814-4a2b-9c4a-0e81a8d4cd04',
 'f40012dc-7e7e-

In [106]:
retriever = vector_store_pdf.as_retriever(search_kwargs={"k": 10})

In [107]:
retriever.invoke("what is the llama model")

[Document(id='f40012dc-7e7e-46f1-b430-887546f32d1b', metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-07-20T00:30:36+00:00', 'author': '', 'keywords': '', 'moddate': '2023-07-20T00:30:36+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'E:\\Learning\\Agentic_AI_V2\\2-Langchain-Basics\\LLAMA2.pdf', 'total_pages': 77, 'page': 3, 'page_label': '4'}, page_content='work (Section 6), and conclusions (Section 7).\n‡https://ai.meta.com/resources/models-and-libraries/llama/\n§We are delaying the release of the 34B model due to a lack of time to sufficiently red team.\n¶https://ai.meta.com/llama\n‖https://github.com/facebookresearch/llama\n4'),
 Document(id='0b6e33a9-0007-44dc-bccc-164e9e4c5017', metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-07-20T00:30:36+00:00', 'author': '

In [108]:
model=ChatGoogleGenerativeAI(model='gemini-1.5-flash')
prompt = hub.pull('rlm/rag-prompt')

In [109]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [110]:
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)
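The data flow through the chain can be mimicked with ordinary functions. In this sketch every component is a hypothetical stub (`fake_retriever`, `build_prompt`, `fake_model`), so it shows only how the question fans out into `context` and `question` before reaching the prompt and the model.

```python
def fake_retriever(question):
    """Stub retriever: returns placeholder chunks instead of searching a vector store."""
    return ["chunk A", "chunk B"]

def format_chunks(chunks):
    """Join retrieved chunks into one context string, as format_docs does."""
    return "\n\n".join(chunks)

def build_prompt(inputs):
    """Stub prompt template that fills in context and question."""
    return f"Context:\n{inputs['context']}\n\nQuestion: {inputs['question']}"

def fake_model(prompt_text):
    """Stub model: echoes the prompt so the flow is visible."""
    return "ANSWER based on: " + prompt_text

def rag_pipeline(question):
    # The dict step: the same question feeds the retriever branch and passes through unchanged
    inputs = {
        "context": format_chunks(fake_retriever(question)),
        "question": question,
    }
    return fake_model(build_prompt(inputs))

answer = rag_pipeline("what is llama model")
print(answer)
```

In the real chain, `RunnablePassthrough()` plays the role of passing the question through unchanged, and the `|` operator connects the steps in the same order.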

In [111]:
response = rag_chain.invoke("what is llama model")
print(response)

LLaMA (Large Language Model Meta AI) is a large language model developed by Meta AI.  It's intended for commercial and research use, with pretrained models adaptable for various natural language generation tasks and tuned models for assistant-like chat.  A Stanford version, Alpaca, is an instruction-following LLaMA model.
